---
license: apache-2.0
language:
- kab
tags:
- ocr
- paddleocr
- kabyle
- text-recognition
---
# Kabyle OCR Model – PaddleOCR Checkpoint
This is a **text recognition** model for the **Kabyle language** (written in Latin script), trained using PaddleOCR (PP‑OCRv3 architecture). The model was trained on synthetic text generated from Kabyle news corpora.
## Model Details
| Property | Value |
|------------------------|--------------------------------------------|
| Architecture | PP‑OCRv3 (CRNN) |
| Character set size | 109 Kabyle characters + 1 blank token |
| Image shape | 3×48×480 (height=48, width=480) |
| Max text length | 25 characters |
| Training data | 18,000 synthetic images (mini‑test) |
| Evaluation accuracy | 57% (on held‑out validation set) |
| Normalised edit distance | 0.96 |
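
The two evaluation figures measure different things: accuracy counts a prediction as correct only when the whole string matches, while the normalised edit distance works at character level. The sketch below computes both. It assumes the 0.96 figure follows PaddleOCR's `norm_edit_dis` convention (1 minus the mean normalised Levenshtein distance, so higher is better); this model card does not confirm that, and the helper names and sample strings are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def evaluate(preds, targets):
    """Return (exact-match accuracy, 1 - mean normalised edit distance)."""
    if not preds or len(preds) != len(targets):
        raise ValueError("preds and targets must be non-empty and equal in length")
    exact = 0
    dist_sum = 0.0
    for pred, target in zip(preds, targets):
        exact += pred == target
        longest = max(len(pred), len(target))
        # Normalise by the longer string so each distance lies in [0, 1].
        dist_sum += levenshtein(pred, target) / longest if longest else 0.0
    n = len(preds)
    return exact / n, 1.0 - dist_sum / n


# Made-up example: one perfect prediction, one with a single wrong character.
acc, similarity = evaluate(["azul", "tamurt"], ["azul", "tamart"])
print(acc, round(similarity, 4))
```

A single wrong character is enough to fail the exact-match test, so under this reading a model can score modest whole-string accuracy while still getting most characters right.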
The character set includes both basic Latin letters and Kabyle‑specific characters:
`č, ḍ, ɛ, ǧ, ɣ, ḥ, ṛ, ṣ, ṭ, ẓ` (and their uppercase variants).
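
Since the dictionary holds one character per line, labels that encode these letters as a base letter plus a combining mark would presumably not match it. The following sketch normalises text to precomposed form before comparison. It assumes the dictionary stores NFC forms, which this repository does not state, and `normalise_label` is a hypothetical helper.

```python
import unicodedata


def normalise_label(text: str) -> str:
    """Compose base letter + combining mark pairs into single code points (NFC)."""
    return unicodedata.normalize("NFC", text)


# The ten Kabyle-specific lowercase letters listed above, forced to NFC.
KABYLE_LOWER = normalise_label("čḍɛǧɣḥṛṣṭẓ")
# Python's str.upper() maps each of them to a single uppercase code point.
KABYLE_UPPER = KABYLE_LOWER.upper()

# "d" followed by COMBINING DOT BELOW becomes the single character U+1E0D.
decomposed = "d\u0323"
print(len(decomposed), len(normalise_label(decomposed)))
```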
## Files in this repository
- `best_accuracy.pdparams` – Trained model weights (PaddlePaddle format)
- `kab_dict.txt` – Character dictionary (one character per line)
- `config.yml` – Full training configuration (including image shape, transforms, etc.)
- `inference.yml` – Inference settings (optional, used by some scripts)
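
The relationship between `kab_dict.txt` and the model's output classes can be sketched as follows. This assumes the common PaddleOCR CTC convention in which the blank token takes index 0 and the dictionary characters follow in file order; check `config.yml` before relying on it. The function names and the `<blank>` placeholder string are hypothetical.

```python
from pathlib import Path


def charset_from_lines(lines):
    """Build the class list: blank at index 0, then one entry per dictionary line."""
    chars = []
    for line in lines:
        char = line.rstrip("\r\n")  # strip only line endings, so a space entry survives
        if char != "":
            chars.append(char)
    return ["<blank>"] + chars


def load_charset(dict_path):
    """Read a one-character-per-line dictionary file as UTF-8."""
    text = Path(dict_path).read_text(encoding="utf-8")
    return charset_from_lines(text.split("\n"))


# With 109 dictionary entries, this yields the 110 classes noted in Model Details:
# charset = load_charset("kab_dict.txt")
# index_of = {ch: i for i, ch in enumerate(charset)}
```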
## How to Use the Model
This checkpoint is a test build. Do not use it in a production environment.
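
For experiments outside production, the recogniser's per-timestep scores have to be turned into text. Below is a minimal sketch of greedy CTC decoding, the scheme commonly paired with CRNN-style recognisers: pick the best class at each timestep, collapse consecutive repeats, then drop blanks. It assumes the blank token sits at index 0 and works on plain Python lists rather than Paddle tensors; the toy character set and scores are invented for illustration and are not taken from this model.

```python
def ctc_greedy_decode(scores, charset, blank=0):
    """Decode a sequence of per-timestep class scores into a string."""
    text = []
    prev = None
    for step in scores:
        best = max(range(len(step)), key=step.__getitem__)  # argmax over classes
        # Emit a character only when it is not blank and not a repeat of the
        # previous timestep; a blank between two equal classes keeps both.
        if best != blank and best != prev:
            text.append(charset[best])
        prev = best
    return "".join(text)


def one_hot(index, size):
    """Toy score vector with all mass on one class."""
    return [1.0 if i == index else 0.0 for i in range(size)]


toy_charset = ["<blank>", "a", "z", "u", "l"]
path = [1, 1, 0, 2, 3, 0, 4, 4]  # a a _ z u _ l l
toy_scores = [one_hot(i, len(toy_charset)) for i in path]
print(ctc_greedy_decode(toy_scores, toy_charset))  # azul
```

Running the real model additionally requires PaddlePaddle and the preprocessing defined in `config.yml`, which this sketch does not cover.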