TV2EN Translation Model - Stage A: Translation LoRA adapter trained on Tuvaluan-English parallel corpus

Model Details

Model Description

Stage A: Translation LoRA adapter trained on Tuvaluan-English parallel corpus

  • Base Model: Qwen/Qwen3-30B-A3B-Base
  • Training Infrastructure: Tinker (Thinking Machines Lab)
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • Model Type: Decoder-only causal language model with a translation LoRA adapter

Model Architecture

  • LoRA Rank: r=32
  • Max Sequence Length: 2048 tokens
  • Target Modules: Attention and MLP layers
  • Scaling: Adapted for low-resource Tuvaluan data
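
The adapter's configuration file is not reproduced in this card. As a rough illustration of the values listed above, a peft LoraConfig along these lines would match; the alpha, dropout, and exact Qwen3 module names are assumptions, not confirmed settings:

from peft import LoraConfig

# Sketch of a LoRA configuration consistent with the reported architecture.
# lora_alpha, lora_dropout, and the module names are assumptions.
lora_config = LoraConfig(
    r=32,                                        # reported LoRA rank
    lora_alpha=64,                               # assumed scaling factor
    lora_dropout=0.05,                           # assumed
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
    task_type="CAUSAL_LM",
)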

Capabilities

  • Bidirectional Translation: Tuvaluan ↔ English
  • Domain Coverage: Religious texts, articles, daily devotionals
  • Language Pair: Tuvaluan (TVL) ↔ English (EN)

Model Performance

Evaluation Results

Overall Performance

  • chrF++: 64.49
  • BLEU: 46.74
  • Exact Match: 2.78%
  • Test Set Size: 1,260 examples

By Translation Direction

  • TVL→EN: chrF++ 59.88, BLEU 42.07
  • EN→TVL: chrF++ 68.18, BLEU 49.92

By Domain

  • Per-domain chrF++ scores for Articles, Bible, and Daily Text are not reported for this release.
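
For reference, chrF++ and BLEU scores like those above can be computed with sacrebleu; a minimal sketch, assuming you already have lists of model outputs and gold reference translations:

import sacrebleu

# hypotheses: one model translation per test segment
# references: one list of gold translations per reference set
hypotheses = ["model translation of segment 1", "model translation of segment 2"]
references = [["reference translation of segment 1", "reference translation of segment 2"]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
chrf = sacrebleu.corpus_chrf(hypotheses, references, word_order=2)  # word_order=2 -> chrF++

print(f"BLEU:   {bleu.score:.2f}")
print(f"chrF++: {chrf.score:.2f}")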

Training Details

Data

  • Training Set: 75,619 examples (~19M tokens)
  • Validation Set: Held-out split used to track validation loss during training (size not reported)
  • Test Set: 1,260 examples covering multiple domains
  • Source: Watch Tower Library Online (JW.org)

Training Configuration

  • Learning Rate: 2e-4
  • Batch Size: 64
  • Training Duration: 3 epochs (~3,546 steps)
  • Optimizer: AdamW with learning rate scheduling
  • Hardware: Tinker distributed training
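
Training was run on Tinker, whose configuration is not shown here. For readers replicating the hyperparameters above with the Hugging Face Trainer instead, a roughly equivalent sketch (the scheduler, precision, and per-device/accumulation split are assumptions) could look like:

from transformers import TrainingArguments

# Hypothetical Hugging Face equivalent of the reported hyperparameters;
# the actual run used Tinker, so treat every unlisted value as an assumption.
training_args = TrainingArguments(
    output_dir="tvl-en-stage-a",
    learning_rate=2e-4,               # reported
    per_device_train_batch_size=8,    # assumed split of the
    gradient_accumulation_steps=8,    #   reported effective batch size of 64
    num_train_epochs=3,               # reported
    lr_scheduler_type="cosine",       # scheduler type not stated; assumed
    warmup_ratio=0.03,                # assumed
    optim="adamw_torch",              # AdamW, as reported
    bf16=True,                        # assumed precision
    logging_steps=50,
)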

Training Curves

  • Best validation loss: 0.5552 (step 2000)
  • Final validation loss: 0.6261
  • Training trend: Convergence visible, slight overfitting in epoch 3

Intended Use

Primary Use Cases

  1. Machine Translation: Translate between Tuvaluan and English
  2. Language Technology Research: Low-resource NLP, multilingual models
  3. Bible/Religious Text Translation: Specialized domain adaptation
  4. Cross-lingual Transfer: Starting point for adapting to related Polynesian languages

Out-of-Scope Uses

  • Deployment without fine-tuning on additional domain data
  • Use with languages other than Tuvaluan/English
  • Real-time production without appropriate latency evaluation

Limitations & Bias

Known Limitations

  • Low-resource language: Tuvaluan has roughly 11,000 speakers, so available training data is correspondingly limited
  • Specific domains: Optimized for religious/educational texts
  • Domain shift: May perform poorly on colloquial or technical content
  • May 2025 issue: Some training data has truncated Tuvaluan text

Potential Biases

  • Content sourced from Watchtower publications reflects those theological positions
  • Religious/doctrinal terminology bias
  • Potential gender representation patterns from source materials

How to Use

With Hugging Face Transformers

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and attach the Stage A translation adapter
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-30B-A3B-Base",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "FriezaForce/tvl-en-llm-translation-stage-a")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-30B-A3B-Base")

# Inference (Tuvaluan -> English)
prompt = "Translate to English: [Tuvaluan text]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
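
The reverse direction follows the same pattern. The exact prompt template used during training is not documented in this card, so the wording below is an assumption:

# English -> Tuvaluan (prompt wording mirrors the example above; assumed template)
prompt = "Translate to Tuvaluan: [English text]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))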

Citation

@misc{tv2en_translation,
  title={Stage A: Translation LoRA adapter trained on Tuvaluan-English parallel corpus},
  author={cuboniks},
  year={2026},
  publisher={Hugging Face Hub},
  url={https://huggingface.co/FriezaForce/tvl-en-llm-translation-stage-a}
}

Model Card Contact

For questions or feedback, please open an issue on the Hub or contact the model authors.


Last Updated: 2026-03-17
