| --- |
| license: apache-2.0 |
| language: |
| - kab |
| tags: |
| - ocr |
| - paddleocr |
| - kabyle |
| - text-recognition |
| --- |
| |
| # Kabyle OCR Model – PaddleOCR Checkpoint |
|
|
| This is a **text recognition** model for the **Kabyle language** (written in Latin script), trained using PaddleOCR (PP‑OCRv3 architecture). The model was trained on synthetic text generated from Kabyle news corpora. |
|
|
| ## Model Details |
|
|
| | Property | Value | |
| |------------------------|--------------------------------------------| |
| | Architecture | PP‑OCRv3 (CRNN) | |
| | Character set size | 109 Kabyle characters + 1 blank token | |
| | Image shape | 3×48×480 (height=48, width=480) | |
| | Max text length | 25 characters | |
| | Training data | 18,000 synthetic images (mini‑test) | |
| | Evaluation accuracy | 57% (on held‑out validation set) | |
| | Normalised edit distance | 0.96 (PaddleOCR `norm_edit_dis`, higher is better) | |
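The two evaluation figures measure different things: accuracy counts only exact string matches, while the normalised edit-distance score gives credit for predictions that are nearly right. The sketch below shows roughly how such scores can be computed. It assumes the PaddleOCR convention of reporting `1 - (edit distance / longer string length)` averaged over samples; the function names `levenshtein` and `recognition_scores` are illustrative and are not part of this repository.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character edits needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # delete ca
                curr[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),   # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def recognition_scores(pairs):
    """pairs: list of (prediction, ground_truth) strings.

    Returns exact-match accuracy and an averaged similarity score,
    assumed here to mirror PaddleOCR's `acc` and `norm_edit_dis`.
    """
    if not pairs:
        raise ValueError("need at least one (prediction, target) pair")
    exact = 0
    dist_sum = 0.0
    for pred, target in pairs:
        exact += pred == target
        longest = max(len(pred), len(target), 1)  # avoid division by zero
        dist_sum += levenshtein(pred, target) / longest
    n = len(pairs)
    return {"acc": exact / n, "norm_edit_dis": 1 - dist_sum / n}
```

A single wrong character in a six-letter word makes that sample count as a miss for accuracy but lowers the similarity score only slightly, which is why a 57% accuracy can coexist with a score of 0.96.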
|
|
| The character set includes both basic Latin letters and Kabyle‑specific characters: |
| `č, ḍ, ɛ, ǧ, ɣ, ḥ, ṛ, ṣ, ṭ, ẓ` (and their uppercase variants). |
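Because several of these letters can be encoded either as one precomposed code point or as a base letter plus a combining mark, it can help to check text against the dictionary before building labels or judging predictions. The following is a minimal sketch, assuming `kab_dict.txt` stores precomposed (NFC) characters, one per line; that assumption, and the helper names `charset_from_lines`, `load_charset` and `unsupported_chars`, are illustrative rather than taken from this repository.

```python
import unicodedata


def charset_from_lines(lines):
    """Build a character set from dictionary lines (one character per line)."""
    charset = set()
    for line in lines:
        entry = line.rstrip("\r\n")
        if entry:
            # Normalise so that 'd' + combining dot below becomes a single 'ḍ'.
            charset.add(unicodedata.normalize("NFC", entry))
    return charset


def load_charset(path="kab_dict.txt"):
    """Read the dictionary file; the default path is an assumption."""
    with open(path, encoding="utf-8") as handle:
        return charset_from_lines(handle)


def unsupported_chars(text, charset):
    """Return the sorted characters of text that the dictionary does not cover."""
    normalised = unicodedata.normalize("NFC", text)
    return sorted({ch for ch in normalised if ch not in charset})
```

Any character this check reports (including spaces or punctuation, if the dictionary lacks them) cannot be produced by the model, so text containing it would never be recognised exactly.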
|
|
| ## Files in this repository |
|
|
| - `best_accuracy.pdparams` – Trained model weights (PaddlePaddle format) |
| - `kab_dict.txt` – Character dictionary (one character per line) |
| - `config.yml` – Full training configuration (including image shape, transforms, etc.) |
| - `inference.yml` – Inference settings (optional, used by some scripts) |
|
|
| ## How to Use the Model |
|
|
| This is a test checkpoint. Do not use it in a production environment. |
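For experimentation only, one possible way to try the checkpoint is PaddleOCR's `tools/infer_rec.py` script, which loads a training config plus weights and prints the predicted text. The sketch below only assembles and launches that command. It assumes a local clone of the PaddleOCR repository, that the files from this repository sit together in one folder, and that `config.yml` works with the standard `Global.*` overrides; the helper names and the example paths are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def build_infer_command(paddleocr_dir, image_path, model_dir="."):
    """Assemble the argument list for PaddleOCR's tools/infer_rec.py.

    paddleocr_dir: local clone of the PaddleOCR repository (assumed).
    model_dir: folder holding config.yml, kab_dict.txt and
               best_accuracy.pdparams from this repository.
    """
    model_dir = Path(model_dir).resolve()
    script = Path(paddleocr_dir).resolve() / "tools" / "infer_rec.py"
    return [
        sys.executable,
        str(script),
        "-c", str(model_dir / "config.yml"),
        "-o",
        # PaddleOCR expects the weights path without the .pdparams suffix.
        f"Global.pretrained_model={model_dir / 'best_accuracy'}",
        f"Global.character_dict_path={model_dir / 'kab_dict.txt'}",
        f"Global.infer_img={Path(image_path).resolve()}",
    ]


def run_inference(paddleocr_dir, image_path, model_dir="."):
    """Run the command from the PaddleOCR checkout; raises if the script fails."""
    cmd = build_infer_command(paddleocr_dir, image_path, model_dir)
    subprocess.run(cmd, cwd=paddleocr_dir, check=True)
```

Calling `run_inference("PaddleOCR", "word.png", "kabyle-ocr")` would then attempt recognition on a single cropped text-line image; all three paths here are placeholders. Paths are made absolute so that the command still resolves them when run from inside the PaddleOCR checkout.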