---
license: apache-2.0
language:
- kab
tags:
- ocr
- paddleocr
- kabyle
- text-recognition
---
# Kabyle OCR Model – PaddleOCR Checkpoint
This is a **text recognition** model for the **Kabyle language** (written in Latin script), trained using PaddleOCR (PP‑OCRv3 architecture). The model was trained on synthetic text generated from Kabyle news corpora.
## Model Details
| Property | Value |
|------------------------|--------------------------------------------|
| Architecture | PP‑OCRv3 (CRNN) |
| Character set size | 109 Kabyle characters + 1 blank token |
| Image shape | 3×48×480 (height=48, width=480) |
| Max text length | 25 characters |
| Training data | 18,000 synthetic images (mini‑test) |
| Evaluation accuracy | 57% (on held‑out validation set) |
| Normalised edit distance | 0.96 (likely reported as 1 − distance, so higher is better) |

The character set includes both basic Latin letters and Kabyle‑specific characters: `č, ḍ, ɛ, ǧ, ɣ, ḥ, ṛ, ṣ, ṭ, ẓ` (and their uppercase variants).
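The two scores in the table measure different things, which is why a 57% accuracy can sit next to an edit-distance score of 0.96. The sketch below assumes accuracy is exact match per text line and that the edit-distance score follows the convention PaddleOCR's recognition metric appears to use (1 minus the mean edit distance, each distance divided by the longer string's length). The function names and sample strings are illustrative and are not output from this model.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


def recognition_metrics(predictions, references):
    """Return (exact-match accuracy, 1 - mean normalised edit distance)."""
    if len(predictions) != len(references) or not references:
        raise ValueError("need two non-empty lists of equal length")
    exact = 0
    dist_sum = 0.0
    for pred, ref in zip(predictions, references):
        exact += int(pred == ref)
        longest = max(len(pred), len(ref))
        # Two empty strings are identical, so their distance is 0.
        dist_sum += levenshtein(pred, ref) / longest if longest else 0.0
    n = len(references)
    return exact / n, 1.0 - dist_sum / n


# Hypothetical predictions, chosen only to illustrate the gap between metrics.
refs = ["aɣrum", "tamaziɣt", "azul"]
preds = ["aɣrum", "tamazirt", "azul"]
acc, sim = recognition_metrics(preds, refs)
print(f"accuracy={acc:.2f} edit_similarity={sim:.2f}")
```

A single wrong character makes a whole line count as an error for accuracy, while it only lowers the edit-based score slightly. Under these assumptions, the reported numbers would suggest that most errors are one or two characters per line.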
## Files in this repository
- `best_accuracy.pdparams` – Trained model weights (PaddlePaddle format)
- `kab_dict.txt` – Character dictionary (one character per line)
- `config.yml` – Full training configuration (including image shape, transforms, etc.)
- `inference.yml` – Inference settings (optional, used by some scripts)
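Because `kab_dict.txt` stores one character per line, it can be used to check in advance whether a piece of text is representable by the model. The following is a minimal sketch, not part of the repository: the helper names are hypothetical, the 25-character limit is taken from the table above, and the NFC normalisation step assumes the dictionary stores precomposed characters such as `ḍ` (U+1E0D) rather than a base letter plus a combining dot.

```python
import unicodedata
from pathlib import Path


def load_charset(dict_path):
    """Read a one-character-per-line dictionary file into a set."""
    lines = Path(dict_path).read_text(encoding="utf-8").splitlines()
    # Only empty lines are skipped, so a space entry (if present) is kept.
    return {line for line in lines if line != ""}


def unsupported_chars(text, charset):
    """Return characters of `text` missing from `charset`, in first-seen order."""
    # Assumption: the dictionary uses precomposed (NFC) code points.
    text = unicodedata.normalize("NFC", text)
    missing = []
    for ch in text:
        if ch not in charset and ch not in missing:
            missing.append(ch)
    return missing


def fits_model(text, charset, max_len=25):
    """True if `text` is short enough and uses only dictionary characters."""
    text = unicodedata.normalize("NFC", text)
    return len(text) <= max_len and not unsupported_chars(text, charset)


# Stand-in character set for demonstration; with the real file, use
# load_charset("kab_dict.txt") instead (the path is an assumption).
demo_charset = set("abcdefghijklmnopqrstuvwxyz") | set("čḍɛǧɣḥṛṣṭẓ")
print(unsupported_chars("taqbaylit 2024", demo_charset))
print(fits_model("taqbaylit", demo_charset))
```

Running such a check over training or evaluation text before rendering synthetic images helps catch labels the model cannot produce, for example digits or punctuation that are absent from the dictionary.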
## How to Use the Model
This is a test checkpoint. Do not use it in a production environment.