# Swahili-English Translation Model (General Domain Expansion)
This model is a fine-tuned version of Helsinki-NLP/opus-mt-mul-en, trained on a large corpus of general-domain Swahili-English translations while preserving helpline translation quality.
## Model Details
- Base Model: Helsinki-NLP/opus-mt-mul-en
- Language Pair: Swahili (sw) → English (en)
- Training Data:
  - CCAligned general corpus (~200k+ samples)
  - Helpline conversation data (oversampled 5x for domain retention)
- Special Features:
  - Domain-aware with `<HELPLINE>` and `<GENERAL>` tags
  - Optimized for both general and helpline translations
  - Knowledge distillation from helpline-specialized model
## Training Procedure

### Memory Optimizations
- CPU teacher offloading
- Gradient checkpointing
- Batch size: 8, Gradient accumulation: 16
### Training Hyperparameters
- Learning rate: 1.5e-5
- Epochs: 1
- Optimizer: AdamW
- LR Scheduler: Cosine with warmup
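Put together, the settings above correspond roughly to the following `transformers` configuration. This is an illustrative sketch, not the actual training script: the output directory and warmup ratio are assumptions, since the card does not state them.

```python
from transformers import Seq2SeqTrainingArguments

# Sketch of the stated configuration; output_dir and warmup_ratio are assumed.
# Effective batch size = 8 * 16 = 128 sentence pairs per optimizer step.
training_args = Seq2SeqTrainingArguments(
    output_dir="sw-en-opus-finetuned",  # assumed path
    per_device_train_batch_size=8,
    gradient_accumulation_steps=16,
    learning_rate=1.5e-5,
    num_train_epochs=1,
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,                  # warmup amount not stated in the card
    gradient_checkpointing=True,        # memory optimization
    save_strategy="steps",
    save_steps=500,                     # checkpoint every 500 steps
)
```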
## Performance
| Domain | BLEU | chrF |
|---|---|---|
| Helpline | X.XX | XX.X |
| General | X.XX | XX.X |
(Replace with actual metrics from training)
## Usage
```python
from transformers import MarianMTModel, MarianTokenizer

# Load model and tokenizer
model_name = "marlonbino/sw-en-opus-finetuned"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# For general translations
text = "<GENERAL> Habari za asubuhi"
inputs = tokenizer(text, return_tensors="pt", padding=True)
outputs = model.generate(**inputs)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translation)  # "Good morning"

# For helpline translations
text = "<HELPLINE> Ninahitaji msaada wa haraka"
inputs = tokenizer(text, return_tensors="pt", padding=True)
outputs = model.generate(**inputs)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translation)  # "I need urgent help"
```
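For batch translation, the domain tag has to be prepended to every input before tokenization. A minimal helper could look like the sketch below (the `tag_batch` function is an illustration, not part of the model's API):

```python
# Domain tags the model was trained with.
DOMAIN_TAGS = {"general": "<GENERAL>", "helpline": "<HELPLINE>"}

def tag_batch(texts, domain="general"):
    """Prepend the appropriate domain tag to each input sentence."""
    tag = DOMAIN_TAGS[domain]
    return [f"{tag} {t}" for t in texts]

batch = tag_batch(["Habari za asubuhi", "Asante sana"], domain="general")
# Then tokenize and translate the whole batch at once, e.g.:
# inputs = tokenizer(batch, return_tensors="pt", padding=True)
# outputs = model.generate(**inputs)
# translations = tokenizer.batch_decode(outputs, skip_special_tokens=True)
```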
## Limitations
- Optimized for Swahili to English (not bidirectional)
- Best performance with domain tags (`<GENERAL>` or `<HELPLINE>`)
- May struggle with very technical or specialized vocabulary outside training domains
## Training Details
- Framework: Transformers + PyTorch
- Hardware: Single GPU training
- Training Time: ~X hours
- Checkpoint Strategy: Every 500 steps for power failure recovery
## Citation
If you use this model, please cite:
```bibtex
@misc{sw-en-general-expanded,
  author    = {Your Name/Organization},
  title     = {Swahili-English General Domain Translation Model},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/marlonbino/sw-en-opus-finetuned}
}
```
## License
This model inherits the license from Helsinki-NLP/opus-mt-mul-en.
## Contact
For questions or issues, please open an issue on the model repository.