Tiri Tahi 3B - Genesis SFT (EN-DE/FR)

A supervised fine-tune (SFT) of straker/tiri-tahi-3b-base-pt-bf16 for machine translation augmented with translation memory (fuzzy match) context.

Model Details

  • Base model: Tiri Tahi 3B (MADLAD-400 architecture, T5-based encoder-decoder)
  • Task: Machine translation with fuzzy match context
  • Language pairs: English-German (EN-DE), English-French (EN-FR)
  • Parameters: ~3B

Training Data

The model was fine-tuned on 72,230 translation pairs, with a separate 4,012 pairs held out for validation:

Language Pair    Training Samples
EN-DE            44,592
EN-FR            27,638

Each training example includes up to 2 fuzzy matches from translation memory, providing the model with reference translations at varying similarity scores to improve output quality.
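
To illustrate what "up to 2 fuzzy matches" can look like in practice, here is a minimal sketch of translation memory lookup. It is not the retrieval method used to build the training data, which this card does not describe. The similarity measure (Python's difflib ratio), the min_score threshold, and the names FuzzyMatch and top_fuzzy_matches are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import List, NamedTuple, Tuple


class FuzzyMatch(NamedTuple):
    score: float  # similarity in [0, 1]; 1.0 means identical source segments
    source: str   # source segment stored in the translation memory
    target: str   # stored translation, used as a reference for the model


def top_fuzzy_matches(
    query: str,
    memory: List[Tuple[str, str]],
    max_matches: int = 2,    # the card states up to 2 matches per example
    min_score: float = 0.5,  # hypothetical cut-off, not taken from the card
) -> List[FuzzyMatch]:
    """Return up to `max_matches` memory entries most similar to `query`."""
    scored = [
        FuzzyMatch(SequenceMatcher(None, query.lower(), src.lower()).ratio(), src, tgt)
        for src, tgt in memory
    ]
    kept = [m for m in scored if m.score >= min_score]
    kept.sort(key=lambda m: m.score, reverse=True)  # best match first
    return kept[:max_matches]


memory = [
    ("Press the power button to start the device.",
     "Drücken Sie die Ein/Aus-Taste, um das Gerät zu starten."),
    ("Press the reset button to restart the device.",
     "Drücken Sie die Reset-Taste, um das Gerät neu zu starten."),
    ("The warranty covers two years.",
     "Die Garantie gilt zwei Jahre."),
]
matches = top_fuzzy_matches("Press the power button to restart the device.", memory)
for m in matches:
    print(f"{m.score:.2f}  {m.target}")
```

A production CAT tool would typically use its own match scoring (often token- or edit-distance-based); the point here is only the shape of the data: a short, ranked list of source/target pairs with scores.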

Input Format

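The model uses the MADLAD-400 target-language prefix format, <2xx>, where xx is the target language code (<2de> for German, <2fr> for French). Without fuzzy matches, the input is the prefix followed directly by the source text:

<2de>source text to translate

When fuzzy matches are available, they are prepended to the source text as context to guide the translation.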
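
The prefix format can be sketched in a few lines of Python. This covers only the case without fuzzy matches, because the card does not specify how match context is serialized. The helper names build_input and translate are hypothetical, the repository id is left as a parameter, and loading through AutoModelForSeq2SeqLM is an assumption based on the T5-style architecture listed under Model Details. The length limits reuse the max source and target lengths from training.

```python
SUPPORTED_TARGETS = {"de", "fr"}  # the two target languages this model was fine-tuned on


def build_input(source_text: str, target_lang: str) -> str:
    """Prefix the source with the MADLAD-400 target-language tag, e.g. <2de>."""
    if target_lang not in SUPPORTED_TARGETS:
        raise ValueError(f"unsupported target language: {target_lang!r}")
    return f"<2{target_lang}>{source_text}"


def translate(source_text: str, target_lang: str, model_id: str) -> str:
    """Translate one segment. transformers is imported here so that
    build_input stays usable without the library installed."""
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    inputs = tokenizer(
        build_input(source_text, target_lang),
        return_tensors="pt",
        truncation=True,
        max_length=1024,  # max source length used in training
    )
    output_ids = model.generate(**inputs, max_new_tokens=256)  # max target length used in training
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


prompt = build_input("The warranty covers two years.", "de")
print(prompt)
```

Calling translate downloads the full ~3B-parameter checkpoint, so in a real integration the tokenizer and model would be loaded once and reused across segments.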

Training Procedure

Hyperparameters

Parameter                 Value
Learning rate             1e-4
LR scheduler              Cosine
Warmup steps              50
Batch size                32
Epochs                    5
Weight decay              0.01
Label smoothing           0.05
Max source length         1024
Max target length         256
Precision                 bf16
Gradient checkpointing    Enabled
Optimizer                 AdamW (fused)
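
For readers who want to reproduce a similar setup, the table above maps roughly onto Hugging Face Seq2SeqTrainingArguments as sketched below. This is not the author's training script. In particular, treating the batch size of 32 as a single-device batch without gradient accumulation is an assumption, and the helpers make_training_args and total_optimizer_steps are hypothetical names. The step count also assumes all 72,230 training pairs are seen once per epoch.

```python
import math

# Keyword names follow transformers.Seq2SeqTrainingArguments;
# values are copied from the hyperparameter table.
TRAINING_KWARGS = {
    "learning_rate": 1e-4,
    "lr_scheduler_type": "cosine",
    "warmup_steps": 50,
    "per_device_train_batch_size": 32,  # assumption: one device, no gradient accumulation
    "num_train_epochs": 5,
    "weight_decay": 0.01,
    "label_smoothing_factor": 0.05,
    "bf16": True,
    "gradient_checkpointing": True,
    "optim": "adamw_torch_fused",
}

# Sequence limits are applied at tokenization time, not passed to the trainer.
MAX_SOURCE_LENGTH = 1024
MAX_TARGET_LENGTH = 256


def make_training_args(output_dir: str):
    """Build the arguments object; transformers is imported only when called."""
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)


def total_optimizer_steps(num_samples: int, batch_size: int, epochs: int) -> int:
    """Steps per epoch (last partial batch kept) times the number of epochs."""
    return math.ceil(num_samples / batch_size) * epochs


steps = total_optimizer_steps(
    72_230,
    TRAINING_KWARGS["per_device_train_batch_size"],
    TRAINING_KWARGS["num_train_epochs"],
)
warmup_share = TRAINING_KWARGS["warmup_steps"] / steps
print(steps, f"{warmup_share:.2%}")  # 11290 0.44%
```

Under these assumptions the 50 warmup steps cover well under 1% of training, so the learning rate follows the cosine decay for almost the entire run.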

Training Results

Metric               Value
Final train loss     0.49
Training time        ~2.5 hours (across resumed runs)
Train samples/sec    79.18

Intended Uses

  • Machine translation for EN-DE and EN-FR language pairs
  • Translation memory-augmented machine translation (leveraging fuzzy matches)
  • CAT (Computer-Assisted Translation) tool integration

Limitations

  • Trained only on EN-DE and EN-FR; other language pairs may produce lower-quality output
  • Performance depends on quality and relevance of provided fuzzy matches
  • Not evaluated on standard MT benchmarks (BLEU, COMET) in this release

Framework Versions

  • Transformers 4.57.6
  • PyTorch 2.11.0+cu128
  • Datasets 4.8.4
  • Tokenizers 0.22.2