---
license: apache-2.0
language:
- kab
tags:
- ocr
- paddleocr
- kabyle
- text-recognition
---

# Kabyle OCR Model – PaddleOCR Checkpoint

This is a **text recognition** model for the **Kabyle language** (written in Latin script), trained with PaddleOCR using the PP‑OCRv3 architecture on synthetic text images generated from Kabyle news corpora.

## Model Details

| Property                          | Value                                 |
|-----------------------------------|---------------------------------------|
| Architecture                      | PP‑OCRv3 (CRNN)                       |
| Character set size                | 109 Kabyle characters + 1 blank token |
| Image shape                       | 3×48×480 (height=48, width=480)       |
| Max text length                   | 25 characters                         |
| Training data                     | 18,000 synthetic images (mini‑test)   |
| Evaluation accuracy (exact match) | 57% (on held‑out validation set)      |
| Normalised edit-distance score    | 0.96 (1 − mean NED; higher is better) |
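The two metrics above can be recomputed from a list of predictions with a short sketch. It assumes the convention of PaddleOCR's `RecMetric` as we understand it: accuracy counts exact string matches, and the edit-distance figure is `1 − mean(NED)`, where each Levenshtein distance is divided by the length of the longer string (so higher is better, which is why it can be 0.96 while accuracy is only 57%). The functions `levenshtein` and `rec_metrics` and the example words are hypothetical illustrations, not part of PaddleOCR.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def rec_metrics(preds: list[str], targets: list[str]) -> tuple[float, float]:
    """Return (exact-match accuracy, 1 - mean normalised edit distance)."""
    if not preds or len(preds) != len(targets):
        raise ValueError("preds and targets must be non-empty and of equal length")
    correct = 0
    ned_sum = 0.0
    for pred, target in zip(preds, targets):
        correct += int(pred == target)
        # Normalise by the longer string; max(..., 1) guards against two empty strings.
        ned_sum += levenshtein(pred, target) / max(len(pred), len(target), 1)
    n = len(targets)
    return correct / n, 1.0 - ned_sum / n


# Hypothetical predictions vs. ground truth for two Kabyle words.
acc, norm_ed = rec_metrics(["azul", "aɣrom"], ["azul", "aɣrum"])
print(acc, norm_ed)  # 0.5 0.9
```

PaddleOCR's own metric may also strip spaces before comparing, so figures computed this way can differ slightly from the ones reported in the table.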

The character set includes both basic Latin letters and Kabyle‑specific characters:  
`č, ḍ, ɛ, ǧ, ɣ, ḥ, ṛ, ṣ, ṭ, ẓ` (and their uppercase variants).
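Several of these letters can be encoded either as a single precomposed code point or as a base letter plus a combining mark (for example `d` followed by U+0323 COMBINING DOT BELOW for `ḍ`), so text used as labels or compared against the model's output should be normalised consistently. The sketch below assumes the dictionary stores NFC (precomposed) forms; `unsupported_chars` and the toy character set are hypothetical stand-ins for the real `kab_dict.txt`.

```python
import unicodedata

# Kabyle-specific lowercase letters listed in this model card.
KABYLE_EXTRA = unicodedata.normalize("NFC", "čḍɛǧɣḥṛṣṭẓ")


def unsupported_chars(text: str, charset: set[str]) -> set[str]:
    """Return the characters of `text`, after NFC normalisation, missing from `charset`."""
    normalised = unicodedata.normalize("NFC", text)
    return {ch for ch in normalised if ch not in charset}


# Toy charset (assumption: the real one is whatever kab_dict.txt contains):
# basic Latin letters, the Kabyle-specific letters, their capitals, and a space.
latin = "abcdefghijklmnopqrstuvwxyz"
charset = set(latin + latin.upper() + KABYLE_EXTRA + KABYLE_EXTRA.upper() + " ")

print(unsupported_chars("d\u0323", charset))             # set()  ('d' + combining dot below -> 'ḍ')
print(sorted(unsupported_chars("azul 2024", charset)))  # ['0', '2', '4']
```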

## Files in this repository

- `best_accuracy.pdparams` – Trained model weights (PaddlePaddle format)
- `kab_dict.txt` – Character dictionary (one character per line)
- `config.yml` – Full training configuration (including image shape, transforms, etc.)
- `inference.yml` – Inference settings (optional, used by some scripts)
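`kab_dict.txt` maps the model's output indices back to characters. A minimal sketch of greedy (best-path) CTC decoding follows, assuming the usual PaddleOCR layout in which output index 0 is the blank token and index `i ≥ 1` is line `i` of the dictionary, which gives 109 + 1 = 110 classes. If `use_space_char` is enabled in `config.yml`, PaddleOCR also appends a space to the dictionary, so check the config before relying on the exact count. The function names and the toy score matrix are hypothetical; in practice the scores come from the model's output for one image.

```python
from pathlib import Path

BLANK = 0  # assumption: index 0 is the CTC blank, as in PaddleOCR's CTC decoder


def load_charset(dict_path: str) -> list[str]:
    """Read kab_dict.txt (one character per line) and prepend the blank token."""
    chars = Path(dict_path).read_text(encoding="utf-8").splitlines()
    return ["<blank>"] + chars


def ctc_greedy_decode(scores: list[list[float]], charset: list[str]) -> str:
    """Best-path CTC decoding: argmax per time step, merge repeats, drop blanks."""
    out = []
    prev = None
    for step in scores:
        idx = max(range(len(step)), key=lambda i: step[i])  # most likely class
        if idx != prev and idx != BLANK:
            out.append(charset[idx])
        prev = idx
    return "".join(out)


# Toy 3-character charset and per-time-step scores (real scores: T x 110 per image).
charset = ["<blank>", "a", "z", "ɣ"]
scores = [
    [0.1, 0.8, 0.05, 0.05],  # 'a'
    [0.1, 0.7, 0.1, 0.1],    # 'a' repeated -> merged
    [0.9, 0.05, 0.0, 0.05],  # blank separates repeats
    [0.1, 0.7, 0.1, 0.1],    # 'a' after a blank -> kept
    [0.0, 0.1, 0.1, 0.8],    # 'ɣ'
]
print(ctc_greedy_decode(scores, charset))  # aaɣ
```

The blank token is what lets CTC emit genuinely doubled letters: two identical consecutive predictions collapse into one, but the same letter on either side of a blank is kept twice.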

## How to Use the Model

This is a test checkpoint. Do not use it in a production environment.
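For experiments, input crops have to match the 3×48×480 image shape listed in the model details. The sketch below mirrors PaddleOCR's standard recognition preprocessing as we understand it: resize to height 48 while keeping the aspect ratio (capping the width at 480), scale pixels to [-1, 1], convert to channel-first layout and right-pad with zeros. It assumes a `uint8` NumPy array of shape H×W×3 in the channel order used during training, and it uses nearest-neighbour resizing to avoid extra dependencies, whereas PaddleOCR uses `cv2.resize`, so values will differ slightly. `preprocess_line_image` is a hypothetical helper; always check `config.yml` for the exact transforms.

```python
import math

import numpy as np

IMG_C, IMG_H, IMG_W = 3, 48, 480  # from the "Image shape" row of the model details


def preprocess_line_image(img: np.ndarray) -> np.ndarray:
    """Resize an H x W x 3 uint8 text-line crop to a 3 x 48 x 480 float32 array in [-1, 1]."""
    h, w = img.shape[:2]
    # Keep the aspect ratio at the target height, but never exceed the target width.
    resized_w = min(IMG_W, max(1, math.ceil(IMG_H * w / h)))
    # Nearest-neighbour resize (assumption: stand-in for cv2.resize's bilinear interpolation).
    rows = np.arange(IMG_H) * h // IMG_H
    cols = np.arange(resized_w) * w // resized_w
    resized = img[rows][:, cols].astype(np.float32)
    # Scale to [0, 1], then shift and stretch to [-1, 1].
    resized = (resized / 255.0 - 0.5) / 0.5
    # HWC -> CHW, then right-pad with zeros up to the fixed width.
    out = np.zeros((IMG_C, IMG_H, IMG_W), dtype=np.float32)
    out[:, :, :resized_w] = resized.transpose(2, 0, 1)
    return out


crop = np.full((32, 160, 3), 255, dtype=np.uint8)  # synthetic white 32x160 crop
x = preprocess_line_image(crop)[np.newaxis]         # add batch dimension -> (1, 3, 48, 480)
print(x.shape)  # (1, 3, 48, 480)
```

Padding on the right rather than stretching every crop to 480 pixels keeps character proportions similar to what the model saw during training; text lines wider than 480 pixels at height 48 are squeezed, which can hurt accuracy on long lines.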