# small-100-Singlish-Sinhala-CodeMix2-CT2

CTranslate2 INT8 quantized version of [small-100-Singlish-Sinhala-CodeMix2](https://huggingface.co/savinugunarathna/small-100-Singlish-Sinhala-CodeMix2), a fine-tuned SMaLL-100 (M2M100-based) model for translating Singlish (Romanized Sinhala / English-Sinhala code-mixed text) into Sinhala script.
## What This Model Does

Translates informal Sri Lankan code-mixed text written in Roman script into Sinhala Unicode script.
| Input (Singlish) | Output (Sinhala) |
|---|---|
| mama heta school yanne nehe | මම හෙට පාසලට යන්නේ නෑ |
| ayye spotify eke audio tika nane | අයියේ ස්පොටිෆයි එකේ ශබ්ද ටික නෑනේ |
| Kumara sangakkara sri lankawe cricket legend kenek | කුමාර සංගක්කාර ශ්‍රී ලංකාවේ ක්‍රිකට් ලෙඩ්ජන් කෙනෙක් |
## Model Variants

| Format | Device | CER | WER | Acc | Size | ms/sent | vs Baseline |
|---|---|---|---|---|---|---|---|
| PyTorch FP32 (baseline) | GPU | 0.1386 | 0.3241 | 0.1263 | 3,458 MB | 139.9 | — |
| CT2 INT8 ⭐ | CPU | 0.1428 | 0.3344 | 0.1146 | 332 MB | 182.5 | ΔCER +0.0042 |
| CT2 FP16→FP32 | CPU | 0.1411 | 0.3300 | 0.1193 | 645 MB | 379.9 | ΔCER +0.0025 |
| CT2 INT8+FP16→INT8 | CPU | 0.1422 | 0.3333 | 0.1149 | 329 MB | 183.1 | ΔCER +0.0036 |
## CPU-only Speed Comparison

| Format | ms/sent | Speedup vs Slowest |
|---|---|---|
| CT2 INT8 ⭐ | 182.5 | 2.08× |
| CT2 INT8+FP16→INT8 | 183.1 | 2.08× |
| CT2 FP16→FP32 | 379.9 | 1.00× |
## Why CT2 INT8 Was Selected
- 90.4% smaller than the FP32 PyTorch model (332 MB vs 3,458 MB)
- 2.08× faster than CT2 FP16→FP32 on CPU
- Only +0.0042 CER degradation vs the GPU FP32 baseline
- Runs entirely on CPU — no GPU required for inference
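For reference, CTranslate2 INT8 conversions of Transformers models are typically produced with the `ct2-transformers-converter` CLI that ships with CTranslate2. The exact command used to build this model is not documented, so the following is an illustrative invocation, not the published recipe:

```shell
pip install ctranslate2 transformers

# Convert the original FP32 checkpoint to a CTranslate2 directory
# with dynamic INT8 quantization.
ct2-transformers-converter \
  --model savinugunarathna/small-100-Singlish-Sinhala-CodeMix2 \
  --output_dir small-100-Singlish-Sinhala-CodeMix2-CT2 \
  --quantization int8
```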
## Evaluation Dataset
- Dataset: Golden dataset — Singlish/Sinhala code-mixed test set
- Samples: 2,549 sentence pairs
- Metrics: CER (Character Error Rate), WER (Word Error Rate), Exact Match Accuracy
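The evaluation script is not published, but the reported metrics are standard. A minimal sketch, assuming CER and WER are Levenshtein edit distance normalized by reference length (characters and whitespace-split words respectively), and exact match is a strict string comparison:

```python
def levenshtein(a, b):
    """Edit distance between two sequences (characters or words)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]

def cer(hyp: str, ref: str) -> float:
    """Character Error Rate: char edits / reference length."""
    return levenshtein(hyp, ref) / max(len(ref), 1)

def wer(hyp: str, ref: str) -> float:
    """Word Error Rate: word edits / reference word count."""
    ref_words = ref.split()
    return levenshtein(hyp.split(), ref_words) / max(len(ref_words), 1)

def exact_match_accuracy(hyps, refs) -> float:
    """Fraction of hypotheses identical to their references."""
    return sum(h == r for h, r in zip(hyps, refs)) / len(refs)
```

Corpus-level CER/WER is often computed as total edits over total reference length rather than an average of per-sentence rates; which convention was used here is not stated.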
## Quick Start

### CTranslate2 (recommended)

```python
from huggingface_hub import snapshot_download
import ctranslate2
from transformers import AutoTokenizer

model_path = snapshot_download("savinugunarathna/small-100-Singlish-Sinhala-CodeMix2-CT2")
print(f"✓ Downloaded to: {model_path}")

translator = ctranslate2.Translator(
    model_path,
    device="cpu",
    compute_type="int8",
    intra_threads=4,
)
tokenizer = AutoTokenizer.from_pretrained(
    "savinugunarathna/small-100-Singlish-Sinhala-CodeMix2-CT2"
)

def translate(text: str) -> str:
    tokenizer.src_lang = "en"
    tokens = tokenizer.convert_ids_to_tokens(
        tokenizer.encode(text, add_special_tokens=True)
    )
    result = translator.translate_batch(
        [tokens],
        target_prefix=[["si"]],
        beam_size=3,
        max_decoding_length=128,
        repetition_penalty=1.2,
    )
    hyp = result[0].hypotheses[0]
    # Drop the "si" target-prefix token before decoding.
    # (str.strip("si") would strip any leading/trailing "s"/"i"
    # characters, not the prefix token.)
    if hyp and hyp[0] == "si":
        hyp = hyp[1:]
    hyp_ids = tokenizer.convert_tokens_to_ids(hyp)
    return tokenizer.decode(hyp_ids, skip_special_tokens=True).strip()

while True:
    text = input("\nSinglish: ").strip()
    if text.lower() == "q":
        break
    print(f"සිංහල: {translate(text)}")
```
### FastAPI REST API

```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from huggingface_hub import snapshot_download
import ctranslate2
from transformers import AutoTokenizer
import uvicorn

MODEL_CT2 = "savinugunarathna/small-100-Singlish-Sinhala-CodeMix2-CT2"
MODEL_BASE = "savinugunarathna/small-100-Singlish-Sinhala-CodeMix2"

print("Loading model...")
# ctranslate2.Translator expects a local directory, so fetch the repo first.
model_path = snapshot_download(MODEL_CT2)
translator = ctranslate2.Translator(model_path, device="cpu", compute_type="int8", intra_threads=4)
tokenizer = AutoTokenizer.from_pretrained(MODEL_BASE)
print("Ready!")

app = FastAPI(title="Singlish → Sinhala API", version="1.0")

class TranslateRequest(BaseModel):
    text: str
    beam_size: int = 3
    max_length: int = 128

class TranslateResponse(BaseModel):
    input: str
    translation: str

def _translate(text: str, beam_size: int = 3, max_length: int = 128) -> str:
    tokenizer.src_lang = "en"
    tokens = tokenizer.convert_ids_to_tokens(
        tokenizer.encode(text, add_special_tokens=True)
    )
    result = translator.translate_batch(
        [tokens],
        target_prefix=[["si"]],
        beam_size=beam_size,
        max_decoding_length=max_length,
        repetition_penalty=1.2,
    )
    hyp = result[0].hypotheses[0]
    # Drop the "si" target-prefix token before decoding.
    if hyp and hyp[0] == "si":
        hyp = hyp[1:]
    hyp_ids = tokenizer.convert_tokens_to_ids(hyp)
    return tokenizer.decode(hyp_ids, skip_special_tokens=True).strip()

@app.get("/health")
def health():
    return {"status": "ok", "model": MODEL_CT2}

@app.post("/translate", response_model=TranslateResponse)
def translate(req: TranslateRequest):
    if not req.text.strip():
        raise HTTPException(status_code=400, detail="text cannot be empty")
    return TranslateResponse(
        input=req.text,
        translation=_translate(req.text, req.beam_size, req.max_length),
    )

@app.post("/translate/batch")
def translate_batch(texts: list[str]):
    if not texts:
        raise HTTPException(status_code=400, detail="texts list cannot be empty")
    if len(texts) > 32:
        raise HTTPException(status_code=400, detail="max 32 texts per batch")
    return {"translations": [_translate(t) for t in texts]}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```
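With the server running locally on port 8000, the endpoints can be exercised with `curl`; the JSON fields mirror the `TranslateRequest` model above, and the example sentence is illustrative:

```shell
# Health check
curl http://localhost:8000/health

# Single translation (beam_size and max_length are optional)
curl -X POST http://localhost:8000/translate \
  -H "Content-Type: application/json" \
  -d '{"text": "mama heta school yanne nehe"}'

# Batch translation (max 32 texts per request)
curl -X POST http://localhost:8000/translate/batch \
  -H "Content-Type: application/json" \
  -d '["mama heta school yanne nehe", "ayye spotify eke audio tika nane"]'
```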
## Related Models

| Model | Format | Size | Notes |
|---|---|---|---|
| small-100-Singlish-Sinhala-CodeMix2 | PyTorch FP32 | 3,458 MB | Original fine-tuned model |
| small-100-Singlish-Sinhala-CodeMix2-CT2 | CT2 INT8 | 332 MB | This model — smallest, CPU-optimized |
## Technical Details

| Property | Value |
|---|---|
| Base architecture | M2M100 / small-100 |
| Quantization | Dynamic INT8 (CTranslate2) |
| Source language | English / Singlish (romanized) |
| Target language | Sinhala (si) |
| Source lang token | `en` |
| Target lang token | `si` |
| Beam size (eval) | 3 |
| Repetition penalty | 1.2 |
| Max decode length | 128 tokens |
| Evaluation samples | 2,549 |
## Citation

```bibtex
@misc{small100-singlish-sinhala-ct2,
  author    = {Savinu Gunarathna},
  title     = {small-100-Singlish-Sinhala-CodeMix2-CT2},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/savinugunarathna/small-100-Singlish-Sinhala-CodeMix2-CT2}
}
```