metadata
title: Tatoeba Kabyle Corpus Standardisation Checker
emoji: 🔍
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false

Upload any Kabyle text file (or a Tatoeba export) and instantly see:

  • Which characters are not part of the official Kabyle alphabet (CLDR)
  • How many times each occurs, with Unicode code-points and names
  • The exact sentences that contain them, highlighted in red
  • A downloadable CSV for further processing

Perfect for corpus maintainers who want to keep the Tatoeba Kabyle sentences standardised.

For example, you can download the file kab_clean_13102025.txt from the repo and then upload it to the checker Space.
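
The checks listed above can be approximated offline with a few lines of Python. The following is only a minimal sketch, not the code in app.py: the names `ASSUMED_KABYLE_LETTERS`, `find_nonstandard` and `to_csv` are hypothetical, and the alphabet is an illustrative assumption that may differ from the CLDR list the Space actually uses. Only letters are checked here; digits, punctuation and whitespace are ignored.

```python
import csv
import io
import unicodedata
from collections import Counter

# ASSUMPTION: lowercase Kabyle Latin letters, for illustration only.
# The Space's real alphabet (taken from CLDR) may differ from this set.
ASSUMED_KABYLE_LETTERS = set(
    unicodedata.normalize("NFC", "abcčdḍeɛfgǧɣhḥijklmnqrṛsṣtṭuwxyzẓ")
)


def find_nonstandard(sentences, allowed=ASSUMED_KABYLE_LETTERS):
    """Count letters outside `allowed` and collect the sentences containing them."""
    counts = Counter()
    flagged = []
    for sentence in sentences:
        # NFC makes "letter + combining mark" compare equal to the precomposed letter.
        text = unicodedata.normalize("NFC", sentence)
        bad = [ch for ch in text if ch.isalpha() and ch.lower() not in allowed]
        if bad:
            counts.update(bad)
            flagged.append(sentence)
    return counts, flagged


def to_csv(counts):
    """Render the counts as CSV text: character, code point, Unicode name, count."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["char", "codepoint", "name", "count"])
    for ch, n in counts.most_common():
        writer.writerow([ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"), n])
    return buf.getvalue()


if __name__ == "__main__":
    # The second sentence uses a Greek gamma (U+03B3) in place of ɣ (U+0263).
    sample = ["Azul fell-awen", "Aγrum d aman"]
    counts, flagged = find_nonstandard(sample)
    print(to_csv(counts))
    print(flagged)
```

Passing a different `allowed` set makes it possible to test against another alphabet without changing the functions.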