Pnar Translation Models
This model is a fine-tuned version of facebook/nllb-200-distilled-600M for English-to-Pnar (Jaintia) translation. It was trained on a custom dataset curated for this low-resource language spoken in Meghalaya, India.
| Property | Value |
|---|---|
| Base Model | facebook/nllb-200-distilled-600M |
| Type | Seq2Seq MT |
| Languages | English → Pnar (eng_Latn → pbv_Latn) |
| Technique | LoRA fine-tuning + Continuation Training |
| License | CC-BY-NC-4.0 (inherited from Meta's NLLB base model) |
| Training Data | Custom English–Pnar parallel corpus |
| Max Sequence Length | 128 tokens (truncation enabled) |
Training applied LoRA (Low-Rank Adaptation) to the base model on a corpus of parallel sentences, followed by a continuation-training pass.
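To illustrate the idea behind LoRA, here is a minimal NumPy sketch (not the actual training code, and the dimensions, rank, and scaling factor below are illustrative assumptions): instead of updating a full weight matrix `W`, LoRA learns two small matrices `A` and `B` of rank `r` and adds the scaled product `(alpha / r) * B @ A` to the frozen weight.

```python
import numpy as np

# Illustrative LoRA sketch: the pretrained weight W stays frozen, and only
# the low-rank factors A (r x d_in) and B (d_out x r) are trained.
d_out, d_in, r, alpha = 512, 512, 8, 16

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))    # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01 # small random init
B = np.zeros((d_out, r))                  # zero init, so W is unchanged at step 0

# Effective weight used at inference time.
W_adapted = W + (alpha / r) * (B @ A)

full_params = W.size            # parameters a full fine-tune would update
lora_params = A.size + B.size   # parameters LoRA actually trains
print(f"full fine-tune params: {full_params}")  # 262144
print(f"LoRA params:           {lora_params}")  # 8192 (~3.1% of full)
```

Because `B` starts at zero, the adapted model is initially identical to the base model, and training only moves it away from the pretrained weights through the low-rank update.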
Final Test Metrics:
While this model achieves strong test scores, users should be aware of its limitations.
```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "toiar/nllb-finetuned-english-pnar"
device = "cuda" if torch.cuda.is_available() else "cpu"

# NLLB tokenizers need the source language set so inputs are tagged correctly.
tokenizer = AutoTokenizer.from_pretrained(model_id, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_id).to(device)

def translate(text: str) -> str:
    inputs = tokenizer(text, return_tensors="pt").to(device)
    output = model.generate(
        **inputs,
        # Force the decoder to start generating in Pnar.
        forced_bos_token_id=tokenizer.convert_tokens_to_ids("pbv_Latn"),
        max_length=128,
        num_beams=5,
        early_stopping=True,
    )
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Example usage
text = "They are learning new skills to improve their future."
translation = translate(text)
print(f"English: {text}")
print(f"Pnar: {translation}")
```
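Because inputs beyond the 128-token limit are truncated, long passages should be split before translation. Below is a simple sketch of a greedy sentence-packing helper (the `chunk_sentences` name and the word-count proxy are assumptions; production code should measure length with the tokenizer's actual token counts):

```python
import re

def chunk_sentences(text, max_words=80):
    """Greedily pack sentences into chunks of at most max_words words.

    Word count is a rough proxy for token count; 80 words leaves headroom
    under the model's 128-token limit for subword expansion.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sent in sentences:
        n = len(sent.split())
        # Flush the current chunk if adding this sentence would overflow it.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sent)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks

# Usage with the translate() helper above:
# paragraph = " ".join(translate(chunk) for chunk in chunk_sentences(long_text))
```

Translating chunk by chunk trades some cross-sentence context for the guarantee that no text is silently dropped by truncation.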