---
license: apache-2.0
language:
- kab
tags:
- ocr
- paddleocr
- kabyle
- text-recognition
---

# Kabyle OCR Model – PaddleOCR Checkpoint

This is a **text recognition** model for the **Kabyle language** (written in Latin script), trained using PaddleOCR (PP‑OCRv3 architecture). The model was trained on synthetic text generated from Kabyle news corpora.

## Model Details

| Property                 | Value                                 |
|--------------------------|---------------------------------------|
| Architecture             | PP‑OCRv3 (CRNN)                       |
| Character set size       | 109 Kabyle characters + 1 blank token |
| Image shape              | 3×48×480 (height=48, width=480)       |
| Max text length          | 25 characters                         |
| Training data            | 18,000 synthetic images (mini‑test)   |
| Evaluation accuracy      | 57% (on held‑out validation set)      |
| Normalised edit distance | 0.96                                  |

The character set includes both basic Latin letters and Kabyle‑specific characters: `č, ḍ, ɛ, ǧ, ɣ, ḥ, ṛ, ṣ, ṭ, ẓ` (and their uppercase variants).

## Files in this repository

- `best_accuracy.pdparams` – Trained model weights (PaddlePaddle format)
- `kab_dict.txt` – Character dictionary (one character per line)
- `config.yml` – Full training configuration (including image shape, transforms, etc.)
- `inference.yml` – Inference settings (optional, used by some scripts)

## How to Use the Model

This model is a test checkpoint. Do not use it in a production environment.
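
As a rough illustration of how the character dictionary and the blank token fit together, the sketch below decodes a sequence of per-timestep class indices into text using greedy CTC decoding. It is not the PaddleOCR API: the function names `load_charset` and `ctc_greedy_decode` are hypothetical, the five-character dictionary is a made-up stand-in for `kab_dict.txt`, and placing the blank token at index 0 is an assumption that should be checked against `config.yml`.

```python
from typing import Iterable, List, Sequence


def load_charset(lines: Iterable[str]) -> List[str]:
    """Build an index-to-character table from dictionary lines.

    Assumption: one character per line, with the blank token
    prepended at index 0 (so 109 characters give 110 classes).
    """
    chars = [line.rstrip("\r\n") for line in lines]
    chars = [c for c in chars if c != ""]  # skip empty lines
    return ["<blank>"] + chars


def ctc_greedy_decode(indices: Sequence[int], charset: List[str], blank: int = 0) -> str:
    """Collapse repeated indices, then drop blanks (greedy CTC decoding)."""
    out = []
    prev = None
    for idx in indices:
        # Emit a character only when it is not blank and differs
        # from the previous timestep's prediction.
        if idx != blank and idx != prev:
            out.append(charset[idx])
        prev = idx
    return "".join(out)


# Hypothetical stand-in for the contents of kab_dict.txt.
sample_dict_lines = ["a\n", "ɣ\n", "r\n", "u\n", "m\n"]
charset = load_charset(sample_dict_lines)

# Per-timestep argmax indices, as a recognition head might produce them.
predicted = [1, 1, 0, 2, 3, 3, 4, 0, 5, 5]
print(ctc_greedy_decode(predicted, charset))  # prints "aɣrum"
```

To try this with the real dictionary, pass an open file handle for `kab_dict.txt` (opened with `encoding="utf-8"`) to `load_charset`. A blank between two identical indices is what allows doubled letters to survive the collapse step.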