---
title: Tatoeba Kabyle Corpus Standardisation Checker
emoji: 🔍
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
---

Upload any Kabyle text file (or a Tatoeba export) and instantly see:

* Which characters are **not** part of the official Kabyle alphabet (CLDR)
* How many times each occurs, with Unicode code points and names
* The exact sentences that contain them, highlighted in red
* A downloadable CSV for further processing

Perfect for corpus maintainers who want to keep the Tatoeba Kabyle sentences standardised.

For example, you can download the file `kab_clean_13102025.txt` from the repo and upload it to the [checker space](https://huggingface.co/spaces/Imsidag-community/kabyle-tatoeba-checker).
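
The kind of check described above can be sketched in a few lines of Python. This is only an illustration, not the code in `app.py`: the `ALLOWED` set is a hand-written, possibly incomplete stand-in for the CLDR Kabyle alphabet, and `find_nonstandard` is a hypothetical helper name.

```python
import unicodedata
from collections import Counter

# ASSUMPTION: illustrative allow-list only. The real checker is described as
# using the CLDR Kabyle alphabet, which is not reproduced here.
_LETTERS = unicodedata.normalize("NFC", "abcčdḍeɛfgǧɣhḥijklmnqrṛsṣtṭuwxyzẓ")
ALLOWED = set(_LETTERS)
ALLOWED |= {ch.upper() for ch in _LETTERS}   # add the uppercase forms
ALLOWED |= set(" .,!?'-\n")                  # basic punctuation and whitespace


def find_nonstandard(sentences, allowed=ALLOWED):
    """Return (report, flagged) for an iterable of sentences.

    report  -- list of (char, code point, Unicode name, count), most frequent first
    flagged -- the sentences that contain at least one character outside `allowed`
    """
    counts = Counter()
    flagged = []
    for sentence in sentences:
        # Normalise to NFC so letters such as "ḍ" compare as a single code point.
        text = unicodedata.normalize("NFC", sentence)
        bad = [ch for ch in text if ch not in allowed]
        if bad:
            flagged.append(sentence)
            counts.update(bad)
    report = [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"), n)
        for ch, n in counts.most_common()
    ]
    return report, flagged


if __name__ == "__main__":
    # The second sentence uses a Greek gamma (U+03B3) in place of "ɣ" (U+0263).
    report, flagged = find_nonstandard(["Azul fell-awen!", "Aγrum d aman."])
    for row in report:
        print(row)
    print(flagged)
```

Passing the allowed set as a parameter keeps the sketch independent of any particular alphabet definition, so a list taken from CLDR can be swapped in without touching the function.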