Qwen2.5-0.5B Singlish → Sinhala Transliterator

This model transliterates Singlish (romanized Sinhala) to Sinhala Unicode script.

Training

Base model: Qwen/Qwen2.5-0.5B
Three-phase LoRA fine-tuning on ~1M phonetic pairs and ~12K ad hoc pairs
LoRA rank: 64, alpha: 128
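
To make the rank and alpha values concrete, here is a minimal sketch of what they imply under the standard LoRA formulation, where the low-rank update is scaled by alpha / rank. The card does not say whether a variant such as rank-stabilized LoRA was used, so the standard scaling is an assumption. The hidden size of 896 is taken from the public Qwen2.5-0.5B configuration and is also an assumption here, as are the helper names, which are purely illustrative.

```python
# Illustrative arithmetic only; this is not the training code for this model.

RANK = 64     # LoRA rank stated in the card
ALPHA = 128   # LoRA alpha stated in the card
HIDDEN = 896  # assumption: hidden size of Qwen2.5-0.5B


def lora_scaling(rank: int, alpha: int) -> float:
    """Scaling applied to the low-rank update B @ A in standard LoRA."""
    return alpha / rank


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added for one adapted weight matrix.

    A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * (d_in + d_out)


print(lora_scaling(RANK, ALPHA))                 # 2.0
print(lora_param_count(HIDDEN, HIDDEN, RANK))    # 114688
print(HIDDEN * HIDDEN)                           # 802816 weights in the full square matrix
```

With these settings, each adapted square projection would add about 115K trainable parameters against roughly 803K frozen ones, and the update is applied with a factor of 2.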

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Pudamya/Qwen2.5-Singlish-Sinhala"

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

def transliterate(text):
    # System and user prompts for the transliteration task.
    messages = [
        {"role": "system", "content": "You are a Sinhala transliteration expert. Convert Singlish (romanized Sinhala) to Sinhala Unicode script accurately."},
        {"role": "user", "content": "Transliterate the following Singlish text to Sinhala:\n" + text},
    ]
    # Render the chat template as a string that ends with the assistant turn marker.
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    enc = tokenizer(prompt, return_tensors="pt")
    # Deterministic decoding with a small beam; at most 80 new tokens are generated.
    out = model.generate(**enc, max_new_tokens=80, do_sample=False, num_beams=2)
    # Skip the prompt tokens and decode only the generated continuation.
    return tokenizer.decode(out[0][enc["input_ids"].shape[1]:], skip_special_tokens=True).strip()

print(transliterate("mama kohomada"))  # → මම කොහොමද
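
Because the model generates text freely, it can be useful to check that an output really is in Sinhala script before using it downstream. The sketch below is one possible check and is not part of the model or its repository: it assumes that a valid result uses only characters from the Unicode Sinhala block (U+0D80 to U+0DFF) for its letters, and the helper names are hypothetical.

```python
# Hypothetical post-check, independent of the model: verifies that every
# letter-like character in a string belongs to the Unicode Sinhala block.

SINHALA_START = 0x0D80
SINHALA_END = 0x0DFF


def is_sinhala_char(ch: str) -> bool:
    """True if the character lies in the Sinhala block (U+0D80..U+0DFF)."""
    return SINHALA_START <= ord(ch) <= SINHALA_END


def looks_sinhala(text: str) -> bool:
    """True if the text has at least one letter and all letters are Sinhala.

    Spaces, digits, punctuation, and joiners are ignored. Sinhala vowel signs
    are not alphabetic in Unicode terms, so they are included explicitly.
    """
    letters = [ch for ch in text if ch.isalpha() or is_sinhala_char(ch)]
    return bool(letters) and all(is_sinhala_char(ch) for ch in letters)


print(looks_sinhala("මම කොහොමද"))      # True
print(looks_sinhala("mama kohomada"))  # False
```

A caller could, for example, fall back to the original input or flag the sample for review when `looks_sinhala(transliterate(text))` is false.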

Model size: 0.5B params
Tensor type: BF16 (Safetensors)
