metadata
title: Tatoeba Kabyle Corpus Standardisation Checker
emoji: 🔍
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
Upload any Kabyle text file (or a Tatoeba export) and instantly see:
- Which characters are not part of the official Kabyle alphabet (CLDR)
- How many times each one occurs, with its Unicode code point and name
- The exact sentences that contain them, highlighted in red
- A downloadable CSV for further processing
Perfect for corpus maintainers who want to keep the Tatoeba Kabyle sentences standardised.
For example, you can download the file kab_clean_13102025.txt from the repo and upload it to the checker Space.
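
The checks listed above can be approximated offline with a short script. The following is only a minimal sketch, not the code in app.py: the function names (`find_nonstandard`, `report_rows`, `to_csv`) are hypothetical, and the allowed character set is an illustrative assumption that should be replaced with the exemplar characters CLDR publishes for the `kab` locale.

```python
import csv
import io
import unicodedata
from collections import Counter

# ASSUMPTION: illustrative Kabyle alphabet only. Replace with the exemplar
# characters from CLDR's `kab` locale data for an authoritative check.
_LETTERS = unicodedata.normalize("NFC", "abcčdḍeɛfgǧɣhḥijklmnqrṛsṣtṭuwxyzẓ")
# ASSUMPTION: basic punctuation, digits and whitespace are treated as acceptable.
_OTHER = " \t\n.,;:!?'\"-()0123456789"
ALLOWED = set(_LETTERS) | set(_LETTERS.upper()) | set(_OTHER)


def find_nonstandard(sentences):
    """Count characters outside ALLOWED and collect the sentences containing them."""
    counts = Counter()
    examples = {}
    for sentence in sentences:
        # Normalise to NFC so that letter + combining mark matches precomposed letters.
        text = unicodedata.normalize("NFC", sentence)
        for ch in text:
            if ch not in ALLOWED:
                counts[ch] += 1
        # Record each sentence once per offending character.
        for ch in set(text) - ALLOWED:
            examples.setdefault(ch, []).append(text)
    return counts, examples


def report_rows(counts):
    """Build (character, code point, Unicode name, count) rows, most frequent first."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"), n)
        for ch, n in counts.most_common()
    ]


def to_csv(rows):
    """Serialise the report rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["char", "codepoint", "name", "count"])
    writer.writerows(rows)
    return buf.getvalue()
```

To try it on a local file, read the file line by line, pass the lines to `find_nonstandard`, and write the result of `to_csv(report_rows(counts))` to disk. Normalising to NFC before comparing is a design choice in this sketch, so that a letter typed as a base character plus a combining mark is not reported as non-standard.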