---
title: Tatoeba Kabyle Corpus Standardisation Checker
emoji: 🔍
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
---
Upload any Kabyle text file (or a Tatoeba export) and instantly see:
* Which characters are **not** part of the official Kabyle alphabet (CLDR)
* How many times each occurs, with Unicode code-points and names
* The exact sentences that contain them, highlighted in red
* A downloadable CSV for further processing

Perfect for corpus maintainers who want to keep the Tatoeba Kabyle sentences standardised.
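
The checks listed above can be approximated offline with a few lines of Python. This is only a minimal sketch, not the code in `app.py`: the letter set in `KABYLE_LETTERS` is an illustrative assumption and has not been verified against CLDR, the allowed punctuation is a guess, and the function names `find_nonstandard` and `report_csv` are hypothetical. Red highlighting is left out.

```python
import csv
import io
import unicodedata
from collections import Counter

# Assumption: illustrative Kabyle Latin letters, NOT copied from CLDR.
KABYLE_LETTERS = unicodedata.normalize("NFC", "abcčdḍeɛfgǧɣhḥijklmnqrṛsṣtṭuwxyzẓ")

# Assumption: digits, whitespace and basic punctuation are also acceptable.
ALLOWED = (
    set(KABYLE_LETTERS)
    | set(KABYLE_LETTERS.upper())
    | set("0123456789")
    | set(" \t\n.,;:!?'\"-()")
)


def find_nonstandard(sentences):
    """Return (Counter of disallowed characters, list of sentences containing them)."""
    counts = Counter()
    flagged = []
    for sentence in sentences:
        # Normalise so that letter + combining mark compares equal to the precomposed form.
        text = unicodedata.normalize("NFC", sentence)
        bad = [ch for ch in text if ch not in ALLOWED]
        if bad:
            counts.update(bad)
            flagged.append(text)
    return counts, flagged


def report_csv(counts):
    """Render the character counts as CSV text with code points and Unicode names."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["char", "codepoint", "name", "count"])
    for ch, n in counts.most_common():
        writer.writerow([ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"), n])
    return buf.getvalue()


if __name__ == "__main__":
    # The second sentence uses Greek gamma (U+03B3) where Latin gamma (U+0263) is expected.
    sample = ["Azul fell-awen", "A\u03b3rum d aman"]
    counts, flagged = find_nonstandard(sample)
    print(report_csv(counts))
    print(flagged)
```

Look-alike characters such as Greek gamma or Greek epsilon typed in place of `ɣ` and `ɛ` are the kind of thing a check like this surfaces, since they render almost identically but have different code points.
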
For example, you can download the file `kab_clean_13102025.txt` from the repo and then upload it to the [checker Space](https://huggingface.co/spaces/Imsidag-community/kabyle-tatoeba-checker).