TV2EN Translation Model - Stage A: Translation LoRA adapter trained on Tuvaluan-English parallel corpus

Model Details

Model Description

Stage A: Translation LoRA adapter trained on Tuvaluan-English parallel corpus

  • Base Model: Qwen/Qwen3-30B-A3B-Base
  • Training Infrastructure: Tinker (Thinking Machines Lab)
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • Model Type: Decoder-only causal language model with a translation LoRA adapter

Model Architecture

  • LoRA Rank: r=32
  • Max Sequence Length: 2048 tokens
  • Target Modules: Attention and MLP layers
  • Scaling: Adapted for low-resource Tuvaluan data
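
The adapter's configuration file is not reproduced in this card. As a rough illustration of the values listed above, a peft LoraConfig along these lines would match; the alpha, dropout, and exact Qwen3 module names are assumptions, not confirmed settings:

from peft import LoraConfig

# Sketch of a LoRA configuration consistent with the reported architecture.
# lora_alpha, lora_dropout, and the module names are assumptions.
lora_config = LoraConfig(
    r=32,                                        # reported LoRA rank
    lora_alpha=64,                               # assumed scaling factor
    lora_dropout=0.05,                           # assumed
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
    task_type="CAUSAL_LM",
)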

Capabilities

  • Bidirectional Translation: Tuvaluan ↔ English
  • Domain Coverage: Religious texts, articles, daily devotionals
  • Language Pair: Tuvaluan (TVL) ↔ English (EN)

Model Performance

Evaluation Results

Overall Performance

  • chrF++: 64.49
  • BLEU: 46.74
  • Exact Match: 2.78%
  • Test Set Size: 1,260 examples

By Translation Direction

  • TVL→EN: chrF++ 59.88, BLEU 42.07
  • EN→TVL: chrF++ 68.18, BLEU 49.92

By Domain

  • Per-domain chrF++ scores for Articles, Bible, and Daily Text are not reported for this release.
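
For reference, chrF++ and BLEU scores like those above can be computed with sacrebleu; a minimal sketch, assuming you already have lists of model outputs and gold reference translations:

import sacrebleu

# hypotheses: one model translation per test segment
# references: one list of gold translations per reference set
hypotheses = ["model translation of segment 1", "model translation of segment 2"]
references = [["reference translation of segment 1", "reference translation of segment 2"]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
chrf = sacrebleu.corpus_chrf(hypotheses, references, word_order=2)  # word_order=2 -> chrF++

print(f"BLEU:   {bleu.score:.2f}")
print(f"chrF++: {chrf.score:.2f}")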

Training Details

Data

  • Training Set: 75,619 examples (~19M tokens)
  • Validation Set: Held-out split used to track validation loss during training (size not reported)
  • Test Set: 1,260 examples covering multiple domains
  • Source: Watch Tower Library Online (JW.org)

Training Configuration

  • Learning Rate: 2e-4
  • Batch Size: 64
  • Training Duration: 3 epochs (~3,546 steps)
  • Optimizer: AdamW with learning rate scheduling
  • Hardware: Tinker distributed training
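
Training was run on Tinker, whose configuration is not shown here. For readers replicating the hyperparameters above with the Hugging Face Trainer instead, a roughly equivalent sketch (the scheduler, precision, and per-device/accumulation split are assumptions) could look like:

from transformers import TrainingArguments

# Hypothetical Hugging Face equivalent of the reported hyperparameters;
# the actual run used Tinker, so treat every unlisted value as an assumption.
training_args = TrainingArguments(
    output_dir="tvl-en-stage-a",
    learning_rate=2e-4,               # reported
    per_device_train_batch_size=8,    # assumed split of the
    gradient_accumulation_steps=8,    #   reported effective batch size of 64
    num_train_epochs=3,               # reported
    lr_scheduler_type="cosine",       # scheduler type not stated; assumed
    warmup_ratio=0.03,                # assumed
    optim="adamw_torch",              # AdamW, as reported
    bf16=True,                        # assumed precision
    logging_steps=50,
)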

Training Curves

  • Best validation loss: 0.5552 (step 2000)
  • Final validation loss: 0.6261
  • Training trend: Convergence visible, slight overfitting in epoch 3

Intended Use

Primary Use Cases

  1. Machine Translation: Translate between Tuvaluan and English
  2. Language Technology Research: Low-resource NLP, multilingual models
  3. Bible/Religious Text Translation: Specialized domain adaptation
  4. Cross-lingual Transfer: Starting point for adapting to related Polynesian languages

Out-of-Scope Uses

  • Deployment without fine-tuning on additional domain data
  • Use with languages other than Tuvaluan/English
  • Real-time production without appropriate latency evaluation

Limitations & Bias

Known Limitations

  • Low-resource language: Tuvaluan has roughly 11,000 speakers, so available training data is correspondingly limited
  • Specific domains: Optimized for religious/educational texts
  • Domain shift: May perform poorly on colloquial or technical content
  • May 2025 issue: Some training data has truncated Tuvaluan text

Potential Biases

  • Content sourced from Watchtower publications reflects those theological positions
  • Religious/doctrinal terminology bias
  • Potential gender representation patterns from source materials

How to Use

With Hugging Face Transformers

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and attach the Stage A translation adapter
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-30B-A3B-Base",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "FriezaForce/tvl-en-llm-translation-stage-a")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-30B-A3B-Base")

# Inference (Tuvaluan -> English)
prompt = "Translate to English: [Tuvaluan text]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
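
The reverse direction follows the same pattern. The exact prompt template used during training is not documented in this card, so the wording below is an assumption:

# English -> Tuvaluan (prompt wording mirrors the example above; assumed template)
prompt = "Translate to Tuvaluan: [English text]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))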

Citation

@misc{tv2en_translation,
  title={Stage A: Translation LoRA adapter trained on Tuvaluan-English parallel corpus},
  author={cuboniks},
  year={2026},
  publisher={Hugging Face Hub},
  url={https://huggingface.co/FriezaForce/tvl-en-llm-translation-stage-a}
}

Model Card Contact

For questions or feedback, please open an issue on the Hub or contact the model authors.


Last Updated: 2026-03-17
