Qwen2.5-0.5B Singlish → Sinhala Transliterator

This model transliterates Singlish (romanized Sinhala) to Sinhala Unicode script.

Training

  • Base model: Qwen/Qwen2.5-0.5B
  • Three-phase LoRA fine-tuning on ~1M phonetic pairs and ~12K ad hoc pairs
  • LoRA rank: 64, alpha: 128
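
To give a feel for what rank 64 and alpha 128 imply, here is a minimal arithmetic sketch. It assumes the standard LoRA formulation, where the low-rank update is scaled by alpha / rank, and it assumes a square 896 × 896 projection purely for illustration. That dimension, the helper names, and the choice of a single square layer are assumptions, not details stated in this model card.

```python
# Illustrative arithmetic only. HIDDEN and the helper names are assumptions
# made for this sketch; they are not taken from the model card.

def lora_scaling(rank: int, alpha: int) -> float:
    """Scaling applied to the low-rank update in standard LoRA (alpha / rank)."""
    return alpha / rank


def lora_extra_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_in x d_out weight matrix.

    LoRA learns A (rank x d_in) and B (d_out x rank), so the count is
    rank * (d_in + d_out).
    """
    return rank * (d_in + d_out)


RANK, ALPHA = 64, 128   # values listed under Training
HIDDEN = 896            # assumed projection size, for illustration only

print(lora_scaling(RANK, ALPHA))                # 2.0
print(lora_extra_params(HIDDEN, HIDDEN, RANK))  # 114688
```

With alpha set to twice the rank, the adapter update is weighted by a factor of 2 under this formulation.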

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Pudamya/Qwen2.5-Singlish-Sinhala")
tokenizer = AutoTokenizer.from_pretrained("Pudamya/Qwen2.5-Singlish-Sinhala")

def transliterate(text):
    # Build the chat-style prompt: a system instruction plus the user's Singlish text.
    messages = [
        {"role": "system", "content": "You are a Sinhala transliteration expert. Convert Singlish (romanized Sinhala) to Sinhala Unicode script accurately."},
        {"role": "user", "content": "Transliterate the following Singlish text to Sinhala:\n" + text},
    ]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    enc = tokenizer(prompt, return_tensors="pt")
    # Deterministic decoding: beam search with 2 beams, at most 80 new tokens.
    out = model.generate(**enc, max_new_tokens=80, do_sample=False, num_beams=2)
    # Drop the prompt tokens and decode only the newly generated part.
    return tokenizer.decode(out[0][enc["input_ids"].shape[1]:], skip_special_tokens=True).strip()

print(transliterate("mama kohomada"))  # → මම කොහොමද
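
Because the usage example caps generation at max_new_tokens=80, longer inputs may come back truncated. One possible workaround, sketched here under assumptions, is to split the input at word boundaries and transliterate each chunk separately. The helper names chunk_words and transliterate_long and the default of 8 words per chunk are hypothetical and untuned, and chunking may cost some cross-chunk context.

```python
from typing import Callable, List


def chunk_words(text: str, max_words: int = 8) -> List[str]:
    """Split text into chunks of at most max_words whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def transliterate_long(
    text: str,
    transliterate_fn: Callable[[str], str],
    max_words: int = 8,  # arbitrary default chosen for this sketch
) -> str:
    """Apply transliterate_fn to each chunk and join the results with spaces."""
    return " ".join(transliterate_fn(chunk) for chunk in chunk_words(text, max_words))
```

For example, transliterate_long(long_text, transliterate) would pass each chunk to the transliterate function defined above.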