# TV2EN Translation Model

Stage A: a translation LoRA adapter trained on a Tuvaluan-English parallel corpus.

## Model Details

### Model Description

- Base Model: Qwen/Qwen3-30B-A3B-Base
- Training Infrastructure: Tinker (Thinking Machines Lab)
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- Model Type: Decoder-only language model with a translation LoRA adapter
### Model Architecture
- LoRA Rank: r=32
- Max Sequence Length: 2048 tokens
- Target Modules: Attention and MLP layers
- Scaling: Adapted for low-resource Tuvaluan data
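Conceptually, a LoRA adapter leaves the base weights frozen and learns a low-rank update, W' = W + (alpha/r)·BA. The sketch below illustrates this in pure Python with toy dimensions (r=2 instead of the adapter's r=32, a 4×4 weight, and an assumed alpha value; all names are illustrative, not taken from the actual training run):

```python
# Toy illustration of the LoRA update W' = W + (alpha/r) * B @ A.
# Real adapters apply this to the attention/MLP projection matrices.

def matmul(X, Y):
    """Plain-Python matrix multiply for small dense matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

d, r, alpha = 4, 2, 4  # hidden size, LoRA rank, LoRA alpha (assumed values)

W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen base weight
B = [[0.1] * r for _ in range(d)]  # d x r, trained
A = [[0.5] * d for _ in range(r)]  # r x d, trained

delta = matmul(B, A)               # d x d low-rank update
scale = alpha / r
W_adapted = [[W[i][j] + scale * delta[i][j] for j in range(d)] for i in range(d)]

# Each delta entry is r * 0.1 * 0.5 = 0.1; scaled by alpha/r = 2, so 0.2 is added.
print(W_adapted[0][0])  # 1.2
```

Because B and A together have only 2·d·r parameters instead of d², the adapter stays small even when the base model is 30B parameters.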
### Capabilities
- Bidirectional Translation: Tuvaluan ↔ English
- Domain Coverage: Religious texts, articles, daily devotionals
- Language Pair: Tuvaluan (TVL) ↔ English (EN)
## Evaluation Results

### Overall Performance
- chrF++: 64.49
- BLEU: 46.74
- Exact Match: 2.78%
- Test Set Size: 1,260 examples
### By Translation Direction
- TVL→EN: chrF++ 59.88, BLEU 42.07
- EN→TVL: chrF++ 68.18, BLEU 49.92
### By Domain

Per-domain chrF++ scores for articles, Bible text, and daily devotionals have not yet been reported.
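chrF++ is a character n-gram F-score (augmented with word n-grams), which makes it more forgiving of morphological variation than word-level BLEU. As a rough illustration of what the metric measures, here is a simplified character-bigram F-score in pure Python; this is not the sacrebleu implementation used for actual scoring, and the example strings are hypothetical:

```python
from collections import Counter

def char_ngrams(text, n):
    """Character n-gram counts, ignoring spaces (simplified)."""
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def chrf_like(hyp, ref, n=2, beta=2.0):
    """F-beta score over character bigrams (recall-weighted, as in chrF)."""
    h, r = char_ngrams(hyp, n), char_ngrams(ref, n)
    overlap = sum((h & r).values())
    if not overlap:
        return 0.0
    prec = overlap / sum(h.values())
    rec = overlap / sum(r.values())
    return (1 + beta**2) * prec * rec / (beta**2 * prec + rec)

print(round(chrf_like("talofa koe", "talofa koutou"), 3))  # 0.673
```

The real chrF++ averages n-gram orders 1-6 for characters plus word unigrams and bigrams, but the core idea, partial credit for overlapping character sequences, is the same.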
## Training Details

### Data
- Training Set: 75,619 examples (~19M tokens)
- Validation Set: Held-out split used for monitoring validation loss
- Test Set: 1,260 examples covering multiple domains
- Source: Watch Tower Library Online (JW.org)
### Training Configuration
- Learning Rate: 2e-4
- Batch Size: 64
- Training Duration: 3 epochs (~3,546 steps)
- Optimizer: AdamW with learning rate scheduling
- Hardware: Tinker distributed training
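The reported step count follows from the data size and batch size above; a quick arithmetic check, assuming steps per epoch round up and no gradient accumulation:

```python
import math

# Sanity-check the reported ~3,546 training steps.
train_examples = 75_619
batch_size = 64
epochs = 3

steps_per_epoch = math.ceil(train_examples / batch_size)  # 1,182
total_steps = steps_per_epoch * epochs
print(total_steps)  # 3546
```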
### Training Curves
- Best validation loss: 0.5552 (step 2000)
- Final validation loss: 0.6261
- Training trend: Convergence visible, slight overfitting in epoch 3
## Intended Use

### Primary Use Cases
- Machine Translation: Translate between Tuvaluan and English
- Language Technology Research: Low-resource NLP, multilingual models
- Bible/Religious Text Translation: Specialized domain adaptation
- Cross-lingual Transfer: Leverage for related languages
### Out-of-Scope Uses
- Deployment without fine-tuning on additional domain data
- Use with languages other than Tuvaluan/English
- Real-time production without appropriate latency evaluation
## Limitations & Bias

### Known Limitations
- Low-resource language: Tuvaluan has only ~11k speakers, so available training data is limited
- Specific domains: Optimized for religious/educational texts
- Domain shift: May perform poorly on colloquial or technical content
- Known data issue (May 2025): some training examples contain truncated Tuvaluan text
### Potential Biases
- Content sourced from Watchtower publications reflects those theological positions
- Religious/doctrinal terminology bias
- Potential gender representation patterns from source materials
## How to Use

### With Hugging Face Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the frozen base model, then attach the Stage A LoRA adapter.
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-30B-A3B-Base",
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "FriezaForce/tvl-en-llm-translation-stage-a")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-30B-A3B-Base")

# Inference
prompt = "Translate to English: [Tuvaluan text]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Citation
```bibtex
@misc{tv2en_translation,
  title     = {Stage A: Translation LoRA adapter trained on Tuvaluan-English parallel corpus},
  author    = {cuboniks},
  year      = {2026},
  publisher = {Hugging Face Hub},
  url       = {https://huggingface.co/FriezaForce/tvl-en-llm-translation-stage-a}
}
```
## Model Card Contact
For questions or feedback, please open an issue on the Hub or contact the model authors.
Last Updated: 2026-03-17