KalanaPabasara

---
title: SinCode
emoji: 🔤
colorFrom: blue
colorTo: yellow
sdk: streamlit
sdk_version: 1.32.0
app_file: app.py
pinned: false
---

SinCode: Context-Aware Dual-Neural Back-Transliteration for Code-Mixed Romanized Sinhala

A model-driven, context-aware back-transliteration system that converts Romanized Sinhala (Singlish) into native Sinhala script.

Architecture (v3)

Input sentence
    |
    v
Word Tokenizer
    |
    +-- Sinhala script? -------------------------> Pass through unchanged
    |
    +-- English vocab (len >= 3)? --------------> Pass through unchanged
    |
    `-- Singlish word?
            |
            v
     ByT5-small seq2seq
     (top-5 candidates)
            |
            v
     XLM-RoBERTa MLM reranker
     (contextual scoring)
            |
            v
      Best candidate
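
The routing in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not the code in app.py: the whitespace tokenizer, the `route_sentence` name, and the `generate` and `rerank` callables are all hypothetical stand-ins for the ByT5 and XLM-RoBERTa stages, and `len >= 3` is read here as "the word has at least three characters".

```python
import re
from typing import Callable, List, Set

# Sinhala Unicode block (U+0D80 to U+0DFF).
_SINHALA_CHAR = re.compile(r"[\u0D80-\u0DFF]")


def is_sinhala(word: str) -> bool:
    """Return True if the word contains at least one Sinhala character."""
    return bool(_SINHALA_CHAR.search(word))


def route_sentence(
    sentence: str,
    english_vocab: Set[str],
    generate: Callable[[str], List[str]],
    rerank: Callable[[List[str], int, List[str]], str],
) -> str:
    """Route each word through the three branches shown in the diagram.

    `generate` stands in for the seq2seq candidate generator and `rerank`
    for the contextual reranker; both are hypothetical placeholders.
    """
    words = sentence.split()  # assumption: simple whitespace tokenization
    output = []
    for index, word in enumerate(words):
        if is_sinhala(word):
            # Already in Sinhala script: pass through unchanged.
            output.append(word)
        elif len(word) >= 3 and word.lower() in english_vocab:
            # Known English word of sufficient length: pass through unchanged.
            output.append(word)
        else:
            # Treated as Singlish: generate candidates, then pick one in context.
            candidates = generate(word)
            output.append(rerank(words, index, candidates))
    return " ".join(output)
```

Passing the two model stages in as callables keeps the routing logic testable without loading any weights.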

Models

| Model | Role | Hub ID |
|---|---|---|
| ByT5-small | Singlish -> Sinhala candidate generation | Kalana001/byt5-small-singlish-sinhala |
| XLM-RoBERTa | Contextual MLM reranking | Kalana001/xlm-roberta-base-finetuned-sinhala |
| mBart50 | Full-sentence Sinhala output mode | Kalana001/mbart50-large-singlish-sinhala |
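
As a rough sketch of the candidate-generation step, the ByT5 checkpoint could be loaded with the Hugging Face `transformers` library and queried with beam search to obtain the top-5 candidates. The helper names `load_byt5` and `generate_candidates`, the `max_new_tokens` value, and the assumption that the model takes the raw Singlish word with no task prefix are illustrative guesses, not details confirmed by this README.

```python
from typing import Any, List, Tuple


def load_byt5(hub_id: str = "Kalana001/byt5-small-singlish-sinhala") -> Tuple[Any, Any]:
    """Load tokenizer and model from the Hub (assumes the repo is accessible)."""
    # Imported lazily so generate_candidates can be used with any compatible objects.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(hub_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(hub_id)
    return tokenizer, model


def generate_candidates(
    word: str,
    tokenizer: Any,
    model: Any,
    num_candidates: int = 5,
    max_new_tokens: int = 64,  # assumption: generous cap for a single word
) -> List[str]:
    """Return up to `num_candidates` distinct transliterations, best beam first."""
    inputs = tokenizer(word, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        num_beams=num_candidates,
        num_return_sequences=num_candidates,  # must not exceed num_beams
        max_new_tokens=max_new_tokens,
    )
    decoded = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    # Different beams can decode to the same string; keep the first occurrence.
    return list(dict.fromkeys(decoded))
```

A call such as `generate_candidates("mama", *load_byt5())` would then return the list that the reranker chooses from.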

Modes

  • Code-Mixed Output - Retains English words where contextually appropriate; Singlish words are transliterated using ByT5 + XLM-RoBERTa reranking.
  • Full Sinhala Output - Transliterates the entire sentence to Sinhala script using mBart50.

Environment Variables (optional)

Set these in HF Spaces -> Settings -> Repository secrets to enable Supabase feedback storage:

| Variable | Description |
|---|---|
| SUPABASE_URL | Supabase project URL |
| SUPABASE_ANON_KEY | Supabase anon key |
| SUPABASE_SERVICE_ROLE_KEY | Supabase service role key |
| SUPABASE_FEEDBACK_TABLE | Table name (default: feedback_submissions) |

If these variables are not set, feedback is saved locally to misc/feedback_submissions.jsonl.
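
The fallback behaviour could look roughly like the sketch below. It only resolves which backend to use and implements the local JSONL path; the function names, the returned dictionary shape, and the choice to prefer the service role key over the anon key are assumptions made for illustration, and the actual Supabase call is left out.

```python
import json
import os
from pathlib import Path
from typing import Any, Dict, Mapping, Optional

LOCAL_FEEDBACK_PATH = Path("misc/feedback_submissions.jsonl")
DEFAULT_TABLE = "feedback_submissions"


def resolve_feedback_backend(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Decide between Supabase and local storage from environment variables."""
    env = os.environ if env is None else env
    url = env.get("SUPABASE_URL")
    # Assumption: prefer the service role key when both keys are present.
    key = env.get("SUPABASE_SERVICE_ROLE_KEY") or env.get("SUPABASE_ANON_KEY")
    if url and key:
        return {
            "backend": "supabase",
            "url": url,
            "key": key,
            "table": env.get("SUPABASE_FEEDBACK_TABLE", DEFAULT_TABLE),
        }
    return {"backend": "local", "path": LOCAL_FEEDBACK_PATH}


def save_feedback_locally(record: Dict[str, Any], path: Path = LOCAL_FEEDBACK_PATH) -> None:
    """Append one feedback record as a single JSON line."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        # ensure_ascii=False keeps Sinhala text readable in the file.
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Writing one JSON object per line means the local file can be appended to safely and read back record by record.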