---
title: SinCode
emoji: 🔤
colorFrom: blue
colorTo: yellow
sdk: streamlit
sdk_version: 1.32.0
app_file: app.py
pinned: false
---
SinCode: Context-Aware Dual-Neural Back-Transliteration for Code-Mixed Romanized Sinhala
A model-driven, context-aware back-transliteration system that converts Romanized Sinhala (Singlish) to native Sinhala script.
Architecture (v3)
    Input sentence
          |
          v
    Word Tokenizer
          |
          +-- Sinhala script? -------------------------> Pass through unchanged
          |
          +-- English vocab (len >= 3)? --------------> Pass through unchanged
          |
          `-- Singlish word?
                  |
                  v
            ByT5-small seq2seq
            (top-5 candidates)
                  |
                  v
            XLM-RoBERTa MLM reranker
            (contextual scoring)
                  |
                  v
            Best candidate
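
The routing step at the top of the diagram can be sketched as follows. This is a minimal illustration, not the code in `app.py`: the whitespace tokenizer, the Unicode-range check for Sinhala script, and the tiny `ENGLISH_VOCAB` set are all assumptions made for the example.

```python
import re

# Assumption: a token counts as Sinhala script if it contains any character
# from the Unicode Sinhala block (U+0D80 to U+0DFF).
SINHALA_RE = re.compile(r"[\u0D80-\u0DFF]")

# Hypothetical stand-in for the English vocabulary the real app consults.
ENGLISH_VOCAB = {"the", "meeting", "office", "today", "call"}


def route_token(token, english_vocab=ENGLISH_VOCAB):
    """Classify one token as 'sinhala', 'english', or 'singlish'."""
    if SINHALA_RE.search(token):
        return "sinhala"   # already in native script: pass through
    if len(token) >= 3 and token.lower() in english_vocab:
        return "english"   # known English word of length >= 3: pass through
    return "singlish"      # everything else goes to the neural models


def route_sentence(sentence):
    # Whitespace splitting is an assumption; the real tokenizer may differ.
    return [(tok, route_token(tok)) for tok in sentence.split()]


if __name__ == "__main__":
    for tok, route in route_sentence("mama office යනවා"):
        print(f"{tok}: {route}")
```

Only tokens routed as `singlish` would be sent on to candidate generation and reranking; the other two classes are returned unchanged.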
Models
| Model | Role | Hub ID |
|---|---|---|
| ByT5-small | Singlish -> Sinhala candidate generation | Kalana001/byt5-small-singlish-sinhala |
| XLM-RoBERTa | Contextual MLM reranking | Kalana001/xlm-roberta-base-finetuned-sinhala |
| mBart50 | Full-sentence Sinhala output mode | Kalana001/mbart50-large-singlish-sinhala |
Modes
- Code-Mixed Output - Retains English words where contextually appropriate; Singlish words are transliterated using ByT5 + XLM-RoBERTa reranking.
- Full Sinhala Output - Transliterates the entire sentence to Sinhala script using mBart50.
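
The Code-Mixed Output flow described above might look roughly like the sketch below. The three callables (`needs_transliteration`, `generate`, `score_fn`) are hypothetical stand-ins for the routing check, the ByT5 candidate generator, and the XLM-RoBERTa contextual scorer; their names and signatures are assumptions for illustration, and the actual models are not loaded here.

```python
from typing import Callable, List, Sequence


def transliterate_code_mixed(
    tokens: Sequence[str],
    needs_transliteration: Callable[[str], bool],
    generate: Callable[[str], List[str]],
    score_fn: Callable[[List[str], str, List[str]], float],
) -> List[str]:
    """Transliterate Singlish tokens and keep every other token unchanged."""
    output: List[str] = []
    for i, token in enumerate(tokens):
        if not needs_transliteration(token):
            output.append(token)  # Sinhala or English: pass through
            continue
        left = list(output)             # already-resolved left context
        right = list(tokens[i + 1:])    # not-yet-resolved right context
        candidates = generate(token)    # e.g. top-5 beams from a seq2seq model
        # Keep the candidate the contextual scorer rates highest.
        best = max(candidates, key=lambda c: score_fn(left, c, right))
        output.append(best)
    return output


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without downloading any model.
    toy_candidates = {"mama": ["මාමා", "මම"], "yanawa": ["යනවා"]}
    preferred = {"මම", "යනවා"}
    result = transliterate_code_mixed(
        ["mama", "office", "yanawa"],
        needs_transliteration=lambda t: t in toy_candidates,
        generate=lambda t: toy_candidates[t],
        score_fn=lambda left, cand, right: 1.0 if cand in preferred else 0.0,
    )
    print(" ".join(result))
```

Passing the models in as callables keeps the control flow testable on its own; in a real deployment `generate` and `score_fn` would wrap the Hub models listed in the table above.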
Environment Variables (optional)
Set these in HF Spaces -> Settings -> Repository secrets to enable Supabase feedback storage:
| Variable | Description |
|---|---|
| `SUPABASE_URL` | Supabase project URL |
| `SUPABASE_ANON_KEY` | Supabase anon key |
| `SUPABASE_SERVICE_ROLE_KEY` | Supabase service role key |
| `SUPABASE_FEEDBACK_TABLE` | Table name (default: `feedback_submissions`) |
If not set, feedback is saved locally to misc/feedback_submissions.jsonl.
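
A sketch of how the fallback could be resolved at startup is shown below. The function names, the returned dictionary shape, and the choice to prefer the service role key over the anon key are assumptions for illustration; only the variable names, the default table name, and the local path come from this README.

```python
import json
import os
from pathlib import Path

DEFAULT_TABLE = "feedback_submissions"
LOCAL_PATH = Path("misc/feedback_submissions.jsonl")


def load_feedback_config(env=os.environ):
    """Pick the Supabase backend if configured, else the local JSONL file."""
    url = env.get("SUPABASE_URL")
    # Assumption: prefer the service role key when both keys are present.
    key = env.get("SUPABASE_SERVICE_ROLE_KEY") or env.get("SUPABASE_ANON_KEY")
    if url and key:
        return {
            "backend": "supabase",
            "url": url,
            "key": key,
            "table": env.get("SUPABASE_FEEDBACK_TABLE", DEFAULT_TABLE),
        }
    return {"backend": "local", "path": LOCAL_PATH}


def save_feedback_locally(record, path=LOCAL_PATH):
    """Append one feedback record as a single JSON line."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        # ensure_ascii=False keeps Sinhala text readable in the file.
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    print(load_feedback_config()["backend"])
```

Appending one JSON object per line keeps the local file easy to tail and to import into a table later.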