small-100-Singlish-Sinhala-CodeMix2-CT2

CTranslate2 INT8 quantized version of small-100-Singlish-Sinhala-CodeMix2 — a fine-tuned SMALL-100 model (a compact M2M100 derivative) for translating Singlish (romanized Sinhala / English-Sinhala code-mixed text) into Sinhala script.


What This Model Does

Translates informal Sri Lankan code-mixed text written in Roman script into Sinhala Unicode script.

| Input (Singlish) | Output (Sinhala) |
|---|---|
| mama heta school yanne nehe | මම හෙට පාසලට යන්නේ නෑ |
| ayye spotify eke audio tika nane | අයියේ ස්පොටිෆයි එකේ ශබ්ද ටික නෑනේ |
| Kumara sangakkara sri lankawe cricket legend kenek | කුමාර සංගක්කාර ශ්‍රී ලංකාවේ ක්‍රිකට් ලෙඩ්ජන් කෙනෙක් |

Model Variants

| Format | Device | CER | WER | Acc | Size | ms/sent | ΔCER vs baseline |
|---|---|---|---|---|---|---|---|
| PyTorch FP32 (baseline) | GPU | 0.1386 | 0.3241 | 0.1263 | 3,458 MB | 139.9 | – |
| CT2 INT8 | CPU | 0.1428 | 0.3344 | 0.1146 | 332 MB | 182.5 | +0.0042 |
| CT2 FP16→FP32 | CPU | 0.1411 | 0.3300 | 0.1193 | 645 MB | 379.9 | +0.0025 |
| CT2 INT8+FP16→INT8 | CPU | 0.1422 | 0.3333 | 0.1149 | 329 MB | 183.1 | +0.0036 |
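Variants like these are typically produced with CTranslate2's converter CLI. The exact invocation used for this card is not published, so the command below is an illustrative sketch using CTranslate2's standard flags:

```shell
# Convert the fine-tuned Transformers checkpoint to CTranslate2 format,
# applying dynamic INT8 weight quantization (the ~332 MB variant above).
ct2-transformers-converter \
  --model savinugunarathna/small-100-Singlish-Sinhala-CodeMix2 \
  --output_dir small-100-Singlish-Sinhala-CodeMix2-CT2 \
  --quantization int8
```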

CPU-only Speed Comparison

| Format | ms/sent | Speedup vs slowest |
|---|---|---|
| CT2 INT8 | 182.5 | 2.08× |
| CT2 INT8+FP16→INT8 | 183.1 | 2.07× |
| CT2 FP16→FP32 | 379.9 | 1.00× |

Why CT2 INT8 Was Selected

  • 90.4% smaller than the FP32 PyTorch model (332 MB vs 3,458 MB)
  • 2.08× faster than CT2 FP16→FP32 on CPU
  • Only +0.0042 CER degradation vs the GPU FP32 baseline
  • Runs entirely on CPU — no GPU required for inference
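These headline figures follow directly from the benchmark table above; a quick arithmetic check:

```python
# Recompute the selection criteria from the reported measurements.
fp32_mb, int8_mb = 3458, 332
size_reduction_pct = (1 - int8_mb / fp32_mb) * 100   # size saving vs PyTorch FP32
speedup = 379.9 / 182.5                               # CPU ms/sent: FP16→FP32 vs INT8
delta_cer = 0.1428 - 0.1386                           # INT8 CER minus GPU FP32 baseline CER

print(f"{size_reduction_pct:.1f}% smaller")  # 90.4% smaller
print(f"{speedup:.2f}x faster")              # 2.08x faster
print(f"dCER = {delta_cer:+.4f}")            # dCER = +0.0042
```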

Evaluation Dataset

  • Dataset: Golden dataset — Singlish/Sinhala code-mixed test set
  • Samples: 2,549 sentence pairs
  • Metrics: CER (Character Error Rate), WER (Word Error Rate), Exact Match Accuracy
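All three metrics are standard edit-distance-based measures. The evaluation script for this card is not published, so the following is a minimal, self-contained sketch of how they are conventionally computed:

```python
def edit_distance(ref, hyp):
    # Classic dynamic-programming Levenshtein distance over two sequences.
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                          # deletion
                        dp[j - 1] + 1,                      # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))  # substitution
            prev = cur
    return dp[n]

def cer(ref, hyp):
    # Character Error Rate: character edits / reference length.
    return edit_distance(list(ref), list(hyp)) / max(len(ref), 1)

def wer(ref, hyp):
    # Word Error Rate: word edits / reference word count.
    return edit_distance(ref.split(), hyp.split()) / max(len(ref.split()), 1)

def exact_match(refs, hyps):
    # Fraction of hypotheses identical to their reference.
    return sum(r == h for r, h in zip(refs, hyps)) / len(refs)
```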

Quick Start

CTranslate2 (recommended)

```python
from huggingface_hub import snapshot_download
import ctranslate2
from transformers import AutoTokenizer

model_path = snapshot_download("savinugunarathna/small-100-Singlish-Sinhala-CodeMix2-CT2")
print(f"✓ Downloaded to: {model_path}")

translator = ctranslate2.Translator(
    model_path,
    device="cpu",
    compute_type="int8",
    intra_threads=4,
)

tokenizer = AutoTokenizer.from_pretrained(
    "savinugunarathna/small-100-Singlish-Sinhala-CodeMix2-CT2"
)

def translate(text: str) -> str:
    tokenizer.src_lang = "en"
    tokens = tokenizer.convert_ids_to_tokens(
        tokenizer.encode(text, add_special_tokens=True)
    )

    result = translator.translate_batch(
        [tokens],
        target_prefix=[["si"]],
        beam_size=3,
        max_decoding_length=128,
        repetition_penalty=1.2,
    )

    hyp_ids = tokenizer.convert_tokens_to_ids(result[0].hypotheses[0])
    # Remove only the leading "si" target-language token; str.strip("si")
    # would instead strip any run of 's'/'i' characters from both ends.
    return tokenizer.decode(hyp_ids, skip_special_tokens=True).removeprefix("si").strip()

while True:
    text = input("\nSinglish (q to quit): ").strip()
    if text.lower() == "q":
        break
    print(f"සිංහල: {translate(text)}")
```
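The final decoding step above removes the `si` target-prefix token with `str.removeprefix` rather than `str.strip("si")`, because `strip` treats its argument as a character set, not a literal token. A small standalone illustration:

```python
# str.strip("si") removes ANY leading/trailing run of the characters
# 's' and 'i', which can destroy real text; str.removeprefix (Python 3.9+)
# removes only the exact leading token.
assert "sisal".strip("si") == "al"           # eats real characters
assert "sisal".removeprefix("si") == "sal"   # removes only the token

# With Sinhala output the prefix removal behaves as intended:
raw = "siමම හෙට පාසලට යන්නේ නෑ"
assert raw.removeprefix("si").strip() == "මම හෙට පාසලට යන්නේ නෑ"
```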

FastAPI REST API

```python
from fastapi import FastAPI, HTTPException
from huggingface_hub import snapshot_download
from pydantic import BaseModel
import ctranslate2
from transformers import AutoTokenizer
import uvicorn

MODEL_CT2  = "savinugunarathna/small-100-Singlish-Sinhala-CodeMix2-CT2"
MODEL_BASE = "savinugunarathna/small-100-Singlish-Sinhala-CodeMix2"

print("Loading model...")
# ctranslate2.Translator expects a local directory, so fetch the repo first.
model_path = snapshot_download(MODEL_CT2)
translator = ctranslate2.Translator(model_path, device="cpu", compute_type="int8", intra_threads=4)
tokenizer  = AutoTokenizer.from_pretrained(MODEL_BASE)
print("Ready!")

app = FastAPI(title="Singlish → Sinhala API", version="1.0")

class TranslateRequest(BaseModel):
    text: str
    beam_size: int = 3
    max_length: int = 128

class TranslateResponse(BaseModel):
    input: str
    translation: str

def _translate(text: str, beam_size: int = 3, max_length: int = 128) -> str:
    tokenizer.src_lang = "en"
    tokens  = tokenizer.convert_ids_to_tokens(
        tokenizer.encode(text, add_special_tokens=True)
    )
    result  = translator.translate_batch(
        [tokens],
        target_prefix=[["si"]],
        beam_size=beam_size,
        max_decoding_length=max_length,
        repetition_penalty=1.2,
    )
    hyp_ids = tokenizer.convert_tokens_to_ids(result[0].hypotheses[0])
    # Remove only the leading "si" target-language token.
    return tokenizer.decode(hyp_ids, skip_special_tokens=True).removeprefix("si").strip()

@app.get("/health")
def health():
    return {"status": "ok", "model": MODEL_CT2}

@app.post("/translate", response_model=TranslateResponse)
def translate(req: TranslateRequest):
    if not req.text.strip():
        raise HTTPException(status_code=400, detail="text cannot be empty")
    return TranslateResponse(
        input=req.text,
        translation=_translate(req.text, req.beam_size, req.max_length)
    )

@app.post("/translate/batch")
def translate_batch(texts: list[str]):
    if not texts:
        raise HTTPException(status_code=400, detail="texts list cannot be empty")
    if len(texts) > 32:
        raise HTTPException(status_code=400, detail="max 32 texts per batch")
    return {"translations": [_translate(t) for t in texts]}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```
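Once the server is running, the endpoints can be exercised with curl against port 8000 as configured above:

```shell
# Health check
curl -s http://localhost:8000/health

# Single translation
curl -s -X POST http://localhost:8000/translate \
  -H "Content-Type: application/json" \
  -d '{"text": "mama heta school yanne nehe"}'

# Batch translation (max 32 texts per request)
curl -s -X POST http://localhost:8000/translate/batch \
  -H "Content-Type: application/json" \
  -d '["mama heta school yanne nehe", "ayye spotify eke audio tika nane"]'
```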

Related Models

| Model | Format | Size | Notes |
|---|---|---|---|
| small-100-Singlish-Sinhala-CodeMix2 | PyTorch FP32 | 3,458 MB | Original fine-tuned model |
| small-100-Singlish-Sinhala-CodeMix2-CT2 | CT2 INT8 | 332 MB | This model — smallest, CPU-optimized |

Technical Details

| Property | Value |
|---|---|
| Base architecture | M2M100 / SMALL-100 |
| Quantization | Dynamic INT8 (CTranslate2) |
| Source language | English / Singlish (romanized) |
| Target language | Sinhala (si) |
| Source language token | en |
| Target language token | si |
| Beam size (eval) | 3 |
| Repetition penalty | 1.2 |
| Max decode length | 128 tokens |
| Evaluation samples | 2,549 |

Citation

```bibtex
@misc{small100-singlish-sinhala-ct2,
  author    = {Savinu Gunarathna},
  title     = {small-100-Singlish-Sinhala-CodeMix2-CT2},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/savinugunarathna/small-100-Singlish-Sinhala-CodeMix2-CT2}
}
```