Tiri Tahi 3B - Genesis SFT (EN-DE/FR)
A supervised fine-tuned version of straker/tiri-tahi-3b-base-pt-bf16 for machine translation with translation memory (fuzzy match) augmentation.
Model Details
- Base model: Tiri Tahi 3B (MADLAD-400 architecture, T5-based encoder-decoder)
- Task: Machine translation with fuzzy match context
- Language pairs: English-German (EN-DE), English-French (EN-FR)
- Parameters: ~3B
Training Data
The model was fine-tuned on 72,230 translation pairs with 4,012 held out for validation:
| Language Pair | Training Samples |
|---|---|
| EN-DE | 44,592 |
| EN-FR | 27,638 |
Each training example includes up to 2 fuzzy matches from translation memory, providing the model with reference translations at varying similarity scores to improve output quality.
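As a rough illustration of what "up to 2 fuzzy matches" can mean in practice, the sketch below ranks translation memory entries by similarity to the source and keeps the best two. The scoring method (Python's difflib ratio), the 0.5 threshold, and the function name top_fuzzy_matches are all assumptions made for illustration; this card does not state how the training matches were retrieved or scored.

```python
from difflib import SequenceMatcher


def top_fuzzy_matches(source, memory, max_matches=2, min_score=0.5):
    """Return up to `max_matches` (score, tm_source, tm_target) tuples.

    `memory` is a list of (tm_source, tm_target) pairs. The similarity
    measure and threshold are illustrative assumptions, not the scoring
    used to build the training data.
    """
    scored = []
    for tm_source, tm_target in memory:
        # Character-level similarity in [0, 1]; 1.0 means identical strings.
        score = SequenceMatcher(None, source.lower(), tm_source.lower()).ratio()
        if score >= min_score:
            scored.append((score, tm_source, tm_target))
    # Highest-scoring matches first.
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:max_matches]
```

A production CAT tool would normally use its own translation memory engine and match scores in place of this helper.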
Input Format
The model uses the MADLAD-400 <2xx> prefix format, where the tag names the target language (for example <2de> for German or <2fr> for French), with fuzzy match context prepended to the source text:
<2de>source text to translate
When fuzzy matches are available, they are prepended as context to help guide the translation.
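The input format above might be assembled and run as in the following minimal sketch. It assumes the model loads through the standard Transformers seq2seq classes and that the repository ID is straker/tiri-tahi-3b-genesis-sft-en-de-fr. The helper names build_model_input and translate are hypothetical, and so is the placement of the fuzzy context between the language tag and the source, since the exact context layout is not specified here.

```python
def build_model_input(source, target_lang, fuzzy_context=""):
    """Build '<2xx>' + optional fuzzy context + source text.

    Assumption: the already-formatted context string goes between the
    language tag and the source, separated by one space. Adjust this to
    match the layout the model was actually trained on.
    """
    prefix = f"<2{target_lang}>"
    if fuzzy_context:
        return f"{prefix}{fuzzy_context} {source}"
    return f"{prefix}{source}"


def translate(text, model_id="straker/tiri-tahi-3b-genesis-sft-en-de-fr",
              max_new_tokens=256):
    """Translate one prepared input string (downloads the ~3B model)."""
    # Imported lazily so build_model_input works without these packages.
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16
    )
    # 1024 and 256 mirror the max source/target lengths used in training.
    inputs = tokenizer(text, return_tensors="pt", truncation=True,
                       max_length=1024)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

For example, translate(build_model_input("The printer is out of paper.", "de")) would request a German translation without any fuzzy match context.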
Training Procedure
Hyperparameters
| Parameter | Value |
|---|---|
| Learning rate | 1e-4 |
| LR scheduler | Cosine |
| Warmup steps | 50 |
| Batch size | 32 |
| Epochs | 5 |
| Weight decay | 0.01 |
| Label smoothing | 0.05 |
| Max source length | 1024 |
| Max target length | 256 |
| Precision | bf16 |
| Gradient checkpointing | Enabled |
| Optimizer | AdamW (fused) |
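For readers who want to reproduce a similar setup, the table could map onto Seq2SeqTrainingArguments keyword arguments roughly as sketched below. This is an assumption about how the values were applied, not the actual training script. In particular, the table does not say whether the batch size of 32 is per device or the effective batch size, and the helper names are hypothetical.

```python
import math

# Values copied from the table above; keys are Seq2SeqTrainingArguments
# parameter names from Hugging Face Transformers.
TRAINING_KWARGS = {
    "learning_rate": 1e-4,
    "lr_scheduler_type": "cosine",
    "warmup_steps": 50,
    "per_device_train_batch_size": 32,  # assumption: could be the effective size
    "num_train_epochs": 5,
    "weight_decay": 0.01,
    "label_smoothing_factor": 0.05,
    "bf16": True,
    "gradient_checkpointing": True,
    "optim": "adamw_torch_fused",
}

# Sequence limits are applied at tokenization time, not in the arguments.
MAX_SOURCE_LENGTH = 1024
MAX_TARGET_LENGTH = 256


def build_training_args(output_dir):
    """Create the arguments object (requires transformers and torch)."""
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)


def optimizer_steps(num_samples, batch_size, epochs):
    """Optimizer steps for a single uninterrupted run without accumulation."""
    return math.ceil(num_samples / batch_size) * epochs
```

Under those assumptions, optimizer_steps(72230, 32, 5) gives 11,290 steps, so the 50 warmup steps cover well under 1% of training before cosine decay begins.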
Training Results
| Metric | Value |
|---|---|
| Final train loss | 0.49 |
| Training time | ~2.5 hours (across resumed runs) |
| Train samples/sec | 79.18 |
Intended Uses
- Machine translation for EN-DE and EN-FR language pairs
- Translation memory-augmented machine translation (leveraging fuzzy matches)
- CAT (Computer-Assisted Translation) tool integration
Limitations
- Trained only on EN-DE and EN-FR; other language pairs may produce lower-quality output
- Performance depends on the quality and relevance of the provided fuzzy matches
- Not evaluated on standard MT benchmarks (BLEU, COMET) in this release
Framework Versions
- Transformers 4.57.6
- PyTorch 2.11.0+cu128
- Datasets 4.8.4
- Tokenizers 0.22.2
Model repository: straker/tiri-tahi-3b-genesis-sft-en-de-fr (base model: straker/tiri-tahi-3b-base-pt-bf16)