---
title: Tatoeba Kabyle Corpus Standardisation Checker
emoji: 🔍
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
---

Upload any Kabyle text file (or a Tatoeba export) and instantly see:

* Which characters are **not** part of the official Kabyle alphabet (CLDR)
* How many times each occurs, with Unicode code points and names
* The exact sentences that contain them, highlighted in red
* A downloadable CSV for further processing

Perfect for corpus maintainers who want to keep the Tatoeba Kabyle sentences standardised.

For example, you can download the file `kab_clean_13102025.txt` from the repo and upload it to the [checker space](https://huggingface.co/spaces/Imsidag-community/kabyle-tatoeba-checker).
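
The kind of report listed above (non-alphabet characters, their counts, code points and names, the sentences containing them, and a CSV) can be approximated locally with the standard library. This is only a minimal sketch and not the code in `app.py`: the names `build_allowed`, `check_sentences` and `report_csv` are hypothetical, and `ASSUMED_LETTERS` is an illustrative letter set that was not copied from CLDR, so replace it with the exemplar set you actually rely on.

```python
import csv
import io
import unicodedata
from collections import Counter

# Assumption: an illustrative Kabyle Latin letter set, NOT copied from CLDR.
ASSUMED_LETTERS = "abcčdḍeɛfgǧɣhḥijklmnqrṛsṣtṭuwxyzẓ"
# Assumption: punctuation, digits and whitespace tolerated by this sketch.
ASSUMED_EXTRAS = " \n\t.,;:!?'\"-()0123456789"


def build_allowed(letters=ASSUMED_LETTERS, extras=ASSUMED_EXTRAS):
    """Return the allowed characters (lower and upper case) as a set."""
    # NFC guards against this source file being saved in decomposed form.
    nfc = unicodedata.normalize("NFC", letters)
    return set(nfc.lower()) | set(nfc.upper()) | set(extras)


def check_sentences(sentences, allowed=None):
    """Count disallowed characters and collect the sentences containing them."""
    allowed = build_allowed() if allowed is None else allowed
    counts = Counter()
    flagged = []
    for sentence in sentences:
        # Input is deliberately not normalised, so decomposed sequences
        # show up as combining marks; that is a choice of this sketch.
        bad = [ch for ch in sentence if ch not in allowed]
        if bad:
            counts.update(bad)
            flagged.append((sentence, sorted(set(bad))))
    return counts, flagged


def report_csv(counts):
    """Render the counts as CSV text with code points and Unicode names."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["char", "codepoint", "name", "count"])
    for ch, n in counts.most_common():
        name = unicodedata.name(ch, "UNKNOWN")
        writer.writerow([ch, f"U+{ord(ch):04X}", name, n])
    return buf.getvalue()
```

To try it on a local file, read one sentence per line, pass the list to `check_sentences`, and write the string returned by `report_csv` to disk. A typical hit would be a Greek capital gamma typed in place of the Latin `Ɣ`.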