# small-100-Singlish-Sinhala-CodeMix2-CT2

CTranslate2 INT8 quantized version of [small-100-Singlish-Sinhala-CodeMix2](https://huggingface.co/savinugunarathna/small-100-Singlish-Sinhala-CodeMix2), a fine-tuned SMaLL-100 (M2M100-based) model for translating Singlish (Romanized Sinhala / English-Sinhala code-mixed text) into Sinhala script.
## What This Model Does

Translates informal Sri Lankan code-mixed text written in Roman script into Sinhala Unicode script.
| Input (Singlish) | Output (Sinhala) |
|---|---|
| mama heta school yanne nehe | මම හෙට පාසලට යන්නේ නෑ |
| ayye spotify eke audio tika nane | අයියේ ස්පොටිෆයි එකේ ශබ්ද ටික නෑනේ |
| Kumara sangakkara sri lankawe cricket legend kenek | කුමාර සංගක්කාර ශ්‍රී ලංකාවේ ක්‍රිකට් ලෙඩ්ජන් කෙනෙක් |
## Model Variants

| Format | Device | CER | WER | Acc | Size | ms/sent | vs Baseline |
|---|---|---|---|---|---|---|---|
| PyTorch FP32 (baseline) | GPU | 0.1386 | 0.3241 | 0.1263 | 3,458 MB | 139.9 | — |
| CT2 INT8 ⭐ | CPU | 0.1428 | 0.3344 | 0.1146 | 332 MB | 182.5 | ΔCER +0.0042 |
| CT2 FP16→FP32 | CPU | 0.1411 | 0.3300 | 0.1193 | 645 MB | 379.9 | ΔCER +0.0025 |
| CT2 INT8+FP16→INT8 | CPU | 0.1422 | 0.3333 | 0.1149 | 329 MB | 183.1 | ΔCER +0.0036 |
## CPU-only Speed Comparison

| Format | ms/sent | Speedup vs Slowest |
|---|---|---|
| CT2 INT8 ⭐ | 182.5 | 2.08× |
| CT2 INT8+FP16→INT8 | 183.1 | 2.08× |
| CT2 FP16→FP32 | 379.9 | 1.00× |
## Why CT2 INT8 Was Selected
- 90.4% smaller than the FP32 PyTorch model (332 MB vs 3,458 MB)
- 2.08× faster than CT2 FP16→FP32 on CPU
- Only +0.0042 CER degradation vs the GPU FP32 baseline
- Runs entirely on CPU — no GPU required for inference
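For reference, CTranslate2 INT8 conversions of Transformers models are typically produced with the `ct2-transformers-converter` CLI that ships with CTranslate2. The exact command used to build this model is not documented, so the following is an illustrative invocation, not the published recipe:

```shell
pip install ctranslate2 transformers

# Convert the original FP32 checkpoint to a CTranslate2 directory
# with dynamic INT8 quantization.
ct2-transformers-converter \
  --model savinugunarathna/small-100-Singlish-Sinhala-CodeMix2 \
  --output_dir small-100-Singlish-Sinhala-CodeMix2-CT2 \
  --quantization int8
```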
## Evaluation Dataset
- Dataset: Golden dataset — Singlish/Sinhala code-mixed test set
- Samples: 2,549 sentence pairs
- Metrics: CER (Character Error Rate), WER (Word Error Rate), Exact Match Accuracy
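The evaluation script is not published, but the reported metrics are standard. A minimal sketch, assuming CER and WER are Levenshtein edit distance normalized by reference length (characters and whitespace-split words respectively), and exact match is a strict string comparison:

```python
def levenshtein(a, b):
    """Edit distance between two sequences (characters or words)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]

def cer(hyp: str, ref: str) -> float:
    """Character Error Rate: char edits / reference length."""
    return levenshtein(hyp, ref) / max(len(ref), 1)

def wer(hyp: str, ref: str) -> float:
    """Word Error Rate: word edits / reference word count."""
    ref_words = ref.split()
    return levenshtein(hyp.split(), ref_words) / max(len(ref_words), 1)

def exact_match_accuracy(hyps, refs) -> float:
    """Fraction of hypotheses identical to their references."""
    return sum(h == r for h, r in zip(hyps, refs)) / len(refs)
```

Corpus-level CER/WER is often computed as total edits over total reference length rather than an average of per-sentence rates; which convention was used here is not stated.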
## Quick Start

### CTranslate2 (recommended)

```python
from huggingface_hub import snapshot_download
import ctranslate2
from transformers import AutoTokenizer

model_path = snapshot_download("savinugunarathna/small-100-Singlish-Sinhala-CodeMix2-CT2")
print(f"✓ Downloaded to: {model_path}")

translator = ctranslate2.Translator(
    model_path,
    device="cpu",
    compute_type="int8",
    intra_threads=4,
)
tokenizer = AutoTokenizer.from_pretrained(
    "savinugunarathna/small-100-Singlish-Sinhala-CodeMix2-CT2"
)

def translate(text: str) -> str:
    tokenizer.src_lang = "en"
    tokens = tokenizer.convert_ids_to_tokens(
        tokenizer.encode(text, add_special_tokens=True)
    )
    result = translator.translate_batch(
        [tokens],
        target_prefix=[["si"]],
        beam_size=3,
        max_decoding_length=128,
        repetition_penalty=1.2,
    )
    hyp = result[0].hypotheses[0]
    # Drop the "si" target-prefix token before decoding.
    # (str.strip("si") would strip any leading/trailing "s"/"i"
    # characters, not the prefix token.)
    if hyp and hyp[0] == "si":
        hyp = hyp[1:]
    hyp_ids = tokenizer.convert_tokens_to_ids(hyp)
    return tokenizer.decode(hyp_ids, skip_special_tokens=True).strip()

while True:
    text = input("\nSinglish: ").strip()
    if text.lower() == "q":
        break
    print(f"සිංහල: {translate(text)}")
```
### FastAPI REST API

```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from huggingface_hub import snapshot_download
import ctranslate2
from transformers import AutoTokenizer
import uvicorn

MODEL_CT2 = "savinugunarathna/small-100-Singlish-Sinhala-CodeMix2-CT2"
MODEL_BASE = "savinugunarathna/small-100-Singlish-Sinhala-CodeMix2"

print("Loading model...")
# ctranslate2.Translator expects a local directory, so fetch the repo first.
model_path = snapshot_download(MODEL_CT2)
translator = ctranslate2.Translator(model_path, device="cpu", compute_type="int8", intra_threads=4)
tokenizer = AutoTokenizer.from_pretrained(MODEL_BASE)
print("Ready!")

app = FastAPI(title="Singlish → Sinhala API", version="1.0")

class TranslateRequest(BaseModel):
    text: str
    beam_size: int = 3
    max_length: int = 128

class TranslateResponse(BaseModel):
    input: str
    translation: str

def _translate(text: str, beam_size: int = 3, max_length: int = 128) -> str:
    tokenizer.src_lang = "en"
    tokens = tokenizer.convert_ids_to_tokens(
        tokenizer.encode(text, add_special_tokens=True)
    )
    result = translator.translate_batch(
        [tokens],
        target_prefix=[["si"]],
        beam_size=beam_size,
        max_decoding_length=max_length,
        repetition_penalty=1.2,
    )
    hyp = result[0].hypotheses[0]
    # Drop the "si" target-prefix token before decoding.
    if hyp and hyp[0] == "si":
        hyp = hyp[1:]
    hyp_ids = tokenizer.convert_tokens_to_ids(hyp)
    return tokenizer.decode(hyp_ids, skip_special_tokens=True).strip()

@app.get("/health")
def health():
    return {"status": "ok", "model": MODEL_CT2}

@app.post("/translate", response_model=TranslateResponse)
def translate(req: TranslateRequest):
    if not req.text.strip():
        raise HTTPException(status_code=400, detail="text cannot be empty")
    return TranslateResponse(
        input=req.text,
        translation=_translate(req.text, req.beam_size, req.max_length),
    )

@app.post("/translate/batch")
def translate_batch(texts: list[str]):
    if not texts:
        raise HTTPException(status_code=400, detail="texts list cannot be empty")
    if len(texts) > 32:
        raise HTTPException(status_code=400, detail="max 32 texts per batch")
    return {"translations": [_translate(t) for t in texts]}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```
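With the server running locally on port 8000, the endpoints can be exercised with `curl`; the JSON fields mirror the `TranslateRequest` model above, and the example sentence is illustrative:

```shell
# Health check
curl http://localhost:8000/health

# Single translation (beam_size and max_length are optional)
curl -X POST http://localhost:8000/translate \
  -H "Content-Type: application/json" \
  -d '{"text": "mama heta school yanne nehe"}'

# Batch translation (max 32 texts per request)
curl -X POST http://localhost:8000/translate/batch \
  -H "Content-Type: application/json" \
  -d '["mama heta school yanne nehe", "ayye spotify eke audio tika nane"]'
```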
## Related Models

| Model | Format | Size | Notes |
|---|---|---|---|
| small-100-Singlish-Sinhala-CodeMix2 | PyTorch FP32 | 3,458 MB | Original fine-tuned model |
| small-100-Singlish-Sinhala-CodeMix2-CT2 | CT2 INT8 | 332 MB | This model — smallest, CPU-optimized |
## Technical Details

| Property | Value |
|---|---|
| Base architecture | M2M100 / small-100 |
| Quantization | Dynamic INT8 (CTranslate2) |
| Source language | English / Singlish (romanized) |
| Target language | Sinhala (si) |
| Source lang token | `en` |
| Target lang token | `si` |
| Beam size (eval) | 3 |
| Repetition penalty | 1.2 |
| Max decode length | 128 tokens |
| Evaluation samples | 2,549 |
## Citation

```bibtex
@misc{small100-singlish-sinhala-ct2,
  author    = {Savinu Gunarathna},
  title     = {small-100-Singlish-Sinhala-CodeMix2-CT2},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/savinugunarathna/small-100-Singlish-Sinhala-CodeMix2-CT2}
}
```