boffire / kabyle-ocr-paddle

	---
	license: apache-2.0
	language:
	- kab
	tags:
	- ocr
	- paddleocr
	- kabyle
	- text-recognition
	---

	# Kabyle OCR Model – PaddleOCR Checkpoint

	This is a text recognition model for the Kabyle language (written in Latin script), trained with PaddleOCR using the PP‑OCRv3 architecture. It was trained on synthetic text images generated from Kabyle news corpora.

	## Model Details

	| Property | Value |
	|---|---|
	| Architecture | PP‑OCRv3 (CRNN) |
	| Character set size | 109 Kabyle characters + 1 blank token |
	| Image shape | 3×48×480 (channels=3, height=48, width=480) |
	| Max text length | 25 characters |
	| Training data | 18,000 synthetic images (mini‑test) |
	| Evaluation accuracy | 57% (on held‑out validation set) |
	| Normalised edit distance | 0.96 |
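
The card does not say how the two evaluation figures were computed. The sketch below shows one common way to compute them for a recognition model: exact-match accuracy over whole strings, and a similarity score equal to one minus the mean normalised edit distance. Reading the 0.96 above as such a similarity (higher is better) is an assumption, and the function names are hypothetical, not part of this repository.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def recognition_metrics(pairs):
    """Compute metrics from a list of (prediction, ground_truth) strings."""
    if not pairs:
        raise ValueError("need at least one (prediction, ground_truth) pair")
    exact = 0
    norm_dist_sum = 0.0
    for pred, target in pairs:
        exact += int(pred == target)
        # Normalise by the longer string; guard against two empty strings.
        longest = max(len(pred), len(target), 1)
        norm_dist_sum += levenshtein(pred, target) / longest
    n = len(pairs)
    return {
        "accuracy": exact / n,                  # share of strings that match exactly
        "similarity": 1.0 - norm_dist_sum / n,  # 1 - mean normalised edit distance
    }
```

Under this reading, a model can score high on similarity while exact-match accuracy stays much lower, because a single wrong character fails the whole string but barely moves the edit distance.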

	The character set includes both basic Latin letters and Kabyle‑specific characters:
	`č, ḍ, ɛ, ǧ, ɣ, ḥ, ṛ, ṣ, ṭ, ẓ` (and their uppercase variants).
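
Several of these letters can be encoded either as one precomposed code point or as a base letter plus a combining mark, and the two forms do not compare equal. The sketch below normalises text to NFC before checking it against a character set and against the 25-character limit from the table. It assumes the dictionary stores precomposed (NFC) forms, which the card does not state; the helper names are hypothetical.

```python
import unicodedata

# "Max text length" from the Model Details table.
MAX_TEXT_LENGTH = 25


def normalise_label(text):
    """Return the NFC form, so base letter + combining mark become one code point."""
    return unicodedata.normalize("NFC", text)


def unsupported_chars(text, charset):
    """List characters of the normalised text missing from charset, in first-seen order."""
    missing = []
    for ch in normalise_label(text):
        if ch not in charset and ch not in missing:
            missing.append(ch)
    return missing


def fits_max_length(text):
    """Check the normalised text against the model's maximum label length."""
    return len(normalise_label(text)) <= MAX_TEXT_LENGTH
```

For example, `normalise_label("c\u030c")` yields the single character `č`, so a label typed with combining marks is not wrongly reported as unsupported.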

	## Files in this repository

	- `best_accuracy.pdparams` – Trained model weights (PaddlePaddle format)
	- `kab_dict.txt` – Character dictionary (one character per line)
	- `config.yml` – Full training configuration (including image shape, transforms, etc.)
	- `inference.yml` – Inference settings (optional, used by some scripts)
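
As an illustration of how `kab_dict.txt` relates to the model output, the sketch below reads a one-character-per-line dictionary and applies greedy CTC decoding to a sequence of predicted class indices. It assumes the blank token sits at index 0 with the dictionary entries following in file order, which is how CTC label decoding is commonly set up but is not confirmed by this card. The function names are hypothetical.

```python
from pathlib import Path

# Assumption: class index 0 is the CTC blank; dictionary entry k maps to index k + 1.
BLANK_INDEX = 0


def load_charset(dict_path):
    """Read a one-character-per-line dictionary, keeping file order."""
    lines = Path(dict_path).read_text(encoding="utf-8").splitlines()
    # Skip empty lines only; a line holding a single space is a valid character.
    return [line for line in lines if line != ""]


def ctc_greedy_decode(indices, charset):
    """Collapse consecutive repeated indices, then drop blanks."""
    chars = []
    previous = None
    for idx in indices:
        if idx != previous and idx != BLANK_INDEX:
            chars.append(charset[idx - 1])  # shift by one to skip the blank slot
        previous = idx
    return "".join(chars)
```

With a four-entry dictionary `a, z, u, l`, the index sequence `[1, 1, 0, 2, 3, 3, 0, 4]` decodes to `azul`. A blank between two identical indices, as in `[1, 0, 1]`, keeps both characters.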

	## How to Use the Model

	This is a test model. Do not use it in a production environment.