# NLLB-200 600M - Swahili to Kalenjin (LoRA Adapter)
This is a LoRA adapter for facebook/nllb-200-distilled-600M, fine-tuned for translating Swahili (SWA) to Kalenjin (KLN).
## Evaluation Results (SWA -> KLN)

- BLEU: 40.24
- chrF: 62.38
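For reference, chrF is a character n-gram F-score, which is often more forgiving than BLEU for morphologically rich languages. A minimal single-sentence sketch of the metric (illustrative only; the reported scores were presumably computed with a standard toolkit such as sacrebleu, not this code):

```python
from collections import Counter

def char_ngrams(text: str, n: int) -> Counter:
    """Character n-grams with spaces removed (one common chrF convention)."""
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified single-sentence chrF: F-beta over character n-gram overlap."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        if hyp or ref:
            precisions.append(overlap / max(sum(hyp.values()), 1))
            recalls.append(overlap / max(sum(ref.values()), 1))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta with beta=2 weights recall twice as heavily as precision
    return 100 * (1 + beta**2) * p * r / (beta**2 * p + r)
```

A perfect match scores 100, a fully disjoint pair scores 0, and partial overlaps fall in between.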
## Technical Details & Training

- Hardware: trained locally for 42 epochs on a 5060 (8 GB) GPU, taking 13 hours.
- LoRA config: r=64, alpha=128, targeting ["q_proj", "v_proj", "k_proj", "out_proj", "fc1", "fc2"].
- Token strategy: Kalenjin uses "token hijacking", routing through the `luo_Latn` token space to prevent the catastrophic forgetting that comes with initializing a raw new token.
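The hijacking idea can be sketched in a few lines. Everything here (the mapping, the id value) is illustrative only, not the adapter's actual code:

```python
# Illustrative sketch of "token hijacking": instead of adding a brand-new,
# randomly initialized language token for Kalenjin, reuse the already-trained
# embedding slot of a related language (luo_Latn), so training starts from a
# meaningful representation instead of noise.
LUO_LATN_ID = 256100  # placeholder id, not the real NLLB vocabulary index

lang_code_to_id = {"luo_Latn": LUO_LATN_ID}
# The "hijack": the new surface form kln_Latn resolves to luo_Latn's token id.
lang_code_to_id["kln_Latn"] = lang_code_to_id["luo_Latn"]

def convert_lang_token_to_id(token: str) -> int:
    """Resolve a language code to its (possibly hijacked) vocabulary id."""
    return lang_code_to_id[token]
```

Because `kln_Latn` shares `luo_Latn`'s embedding slot, fine-tuning adapts an existing representation rather than learning one from a raw initialization.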
## Usage

```python
import torch
from peft import PeftModel
from transformers import AutoModelForSeq2SeqLM, NllbTokenizerFast

# Load the base model
model_id = "facebook/nllb-200-distilled-600M"
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Attach the LoRA adapter
adapter_id = "mutaician/nllb-swahili-kalenjin-v3"
model = PeftModel.from_pretrained(model, adapter_id)

# Load the tokenizer shipped with the adapter (it carries the language codes)
tokenizer = NllbTokenizerFast.from_pretrained(adapter_id)
tokenizer.src_lang = "swa_Latn"

text = "Habari yako?"
inputs = tokenizer(text, return_tensors="pt")

# Force the decoder to begin with the Kalenjin language token
target_lang_id = tokenizer.convert_tokens_to_ids("kln_Latn")

with torch.no_grad():
    generated_tokens = model.generate(
        **inputs,
        forced_bos_token_id=target_lang_id,
        num_beams=5,
        early_stopping=True,
        max_length=256,
    )

print(tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0])
```
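The `forced_bos_token_id` argument pins the first decoded token to the target-language code, and decoding continues from there. A toy greedy decoder makes the idea concrete (illustrative only, not how `generate` is implemented internally):

```python
def generate_with_forced_bos(step_fn, forced_bos_id, eos_id, max_length=10):
    """Toy greedy decoder: the first output token is fixed to the target
    language id (the effect of forced_bos_token_id), then step_fn picks
    each following token until EOS or max_length is reached."""
    tokens = [forced_bos_id]  # decoding is forced to start in the target language
    while len(tokens) < max_length:
        next_id = step_fn(tokens)  # stand-in for the model's next-token choice
        if next_id == eos_id:
            break
        tokens.append(next_id)
    return tokens
```

With the real model, passing the `kln_Latn` id here is what steers generation into Kalenjin rather than the source language.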