# NLLB Trilingual Translation Model (English-Vietnamese-Japanese)
A fine-tuned and INT8 quantized NLLB-200-distilled-600M model for high-quality translation between English, Vietnamese, and Japanese. Optimized for fast CPU inference with ONNX Runtime.
## 🎯 Highlights
- **75% smaller**: 7 GB → 1.8 GB (INT8 quantization)
- **48% faster**: optimized for CPU inference
- **All 6 directions**: EN↔VI, EN↔JA, VI↔JA
- **Production ready**: ONNX format with Optimum integration
## 📊 Performance
| Metric | FP32 | INT8 (this model) |
|---|---|---|
| Model Size | 7 GB | 1.8 GB |
| Short Text | 0.44 s | 0.26 s |
| Long Text | 2.56 s | 1.33 s |

Benchmarked on CPU with `num_beams=1`.
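The headline percentages follow directly from the table; a quick arithmetic sanity check (the inputs are the rounded figures above):

```python
fp32_size, int8_size = 7.0, 1.8    # model size in GB, from the table
fp32_long, int8_long = 2.56, 1.33  # long-text latency in seconds, from the table

size_reduction = 1 - int8_size / fp32_size
latency_reduction = 1 - int8_long / fp32_long

print(f"Size reduction: {size_reduction:.0%}")       # ~74%, i.e. roughly 75% smaller
print(f"Latency reduction: {latency_reduction:.0%}")  # 48% faster on long text
```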
## 🌐 Supported Languages
| Source | Target | Code |
|---|---|---|
| English | Vietnamese | eng_Latn → vie_Latn |
| English | Japanese | eng_Latn → jpn_Jpan |
| Vietnamese | English | vie_Latn → eng_Latn |
| Vietnamese | Japanese | vie_Latn → jpn_Jpan |
| Japanese | English | jpn_Jpan → eng_Latn |
| Japanese | Vietnamese | jpn_Jpan → vie_Latn |
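The codes in the table are NLLB's FLORES-200 language tags. All six directions are simply the ordered pairs of the three languages, which can be enumerated from a small mapping (an illustrative helper, not part of the model's API):

```python
from itertools import permutations

# FLORES-200 language tags used by NLLB, keyed by ISO 639-1 code
NLLB_CODES = {"en": "eng_Latn", "vi": "vie_Latn", "ja": "jpn_Jpan"}

# Every ordered pair of distinct languages is a supported direction
DIRECTIONS = [(NLLB_CODES[a], NLLB_CODES[b]) for a, b in permutations(NLLB_CODES, 2)]

for src, tgt in DIRECTIONS:
    print(f"{src} → {tgt}")
```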
## 📝 Example Translations
| Direction | Input | Output |
|---|---|---|
| EN→VI | Hello, how are you? | Chào, bạn khỏe không? |
| EN→JA | Hello, how are you? | こんにちは、お元気ですか？ |
| VI→EN | Thời tiết hôm nay rất đẹp. | The weather is very beautiful today. |
| VI→JA | Thời tiết hôm nay rất đẹp. | 今日の天気はとても美しいです。 |
| JA→EN | 今日の天気はとても良いです。 | The weather is very good today. |
| JA→VI | 今日の天気はとても良いです。 | Thời tiết hôm nay rất tốt. |
## 🚀 Quick Start

### Python (Optimum + ONNX Runtime)
```python
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSeq2SeqLM

model_id = "sotalab/nllb-trilingual-en-vi-ja-onnx"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = ORTModelForSeq2SeqLM.from_pretrained(
    model_id,
    encoder_file_name="encoder_model_quantized.onnx",
    decoder_file_name="decoder_model_quantized.onnx",
    decoder_with_past_file_name="decoder_with_past_model_quantized.onnx",
)

def translate(text, src_lang, tgt_lang):
    tokenizer.src_lang = src_lang
    inputs = tokenizer(text, return_tensors="pt", max_length=256, truncation=True)
    tgt_lang_id = tokenizer.convert_tokens_to_ids(tgt_lang)
    outputs = model.generate(
        **inputs,
        forced_bos_token_id=tgt_lang_id,
        max_new_tokens=256,
        num_beams=1,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# English to Vietnamese
print(translate("Hello, how are you?", "eng_Latn", "vie_Latn"))
# Output: Chào, bạn khỏe không?

# English to Japanese
print(translate("Hello, how are you?", "eng_Latn", "jpn_Jpan"))
# Output: こんにちは、お元気ですか？

# Vietnamese to Japanese
print(translate("Tôi thích học tiếng Nhật.", "vie_Latn", "jpn_Jpan"))
# Output: 私は日本語を学ぶのが好きです。
```
### Batch Translation

```python
def translate_batch(texts, src_lang, tgt_lang):
    tokenizer.src_lang = src_lang
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=256)
    tgt_lang_id = tokenizer.convert_tokens_to_ids(tgt_lang)
    outputs = model.generate(**inputs, forced_bos_token_id=tgt_lang_id, max_new_tokens=256, num_beams=1)
    return [tokenizer.decode(out, skip_special_tokens=True) for out in outputs]

texts = ["Good morning", "How are you?", "Thank you very much"]
results = translate_batch(texts, "eng_Latn", "vie_Latn")
for text, result in zip(texts, results):
    print(f"{text} → {result}")
```
### Translator Class

```python
class TrilingualTranslator:
    LANG_CODES = {"en": "eng_Latn", "vi": "vie_Latn", "ja": "jpn_Jpan"}

    def __init__(self, model_id="sotalab/nllb-trilingual-en-vi-ja-onnx"):
        from transformers import AutoTokenizer
        from optimum.onnxruntime import ORTModelForSeq2SeqLM

        self.tokenizer = AutoTokenizer.from_pretrained(model_id)
        self.model = ORTModelForSeq2SeqLM.from_pretrained(
            model_id,
            encoder_file_name="encoder_model_quantized.onnx",
            decoder_file_name="decoder_model_quantized.onnx",
            decoder_with_past_file_name="decoder_with_past_model_quantized.onnx",
        )

    def translate(self, text, src="en", tgt="vi"):
        src_code = self.LANG_CODES[src]
        tgt_code = self.LANG_CODES[tgt]
        self.tokenizer.src_lang = src_code
        inputs = self.tokenizer(text, return_tensors="pt", max_length=256, truncation=True)
        tgt_lang_id = self.tokenizer.convert_tokens_to_ids(tgt_code)
        outputs = self.model.generate(**inputs, forced_bos_token_id=tgt_lang_id, max_new_tokens=256, num_beams=1)
        return self.tokenizer.decode(outputs[0], skip_special_tokens=True)

# Usage
translator = TrilingualTranslator()
print(translator.translate("Hello world", "en", "vi"))
print(translator.translate("Hello world", "en", "ja"))
```
## 📦 Model Files
| File | Size | Description |
|---|---|---|
| `encoder_model_quantized.onnx` | 399 MB | Encoder (INT8) |
| `decoder_model_quantized.onnx` | 698 MB | Decoder (INT8) |
| `decoder_with_past_model_quantized.onnx` | 674 MB | Decoder with KV-cache (INT8) |
| `tokenizer.json` | 31 MB | Tokenizer |
| **Total** | **~1.8 GB** | |
## 🔧 Training Details
- **Base Model**: facebook/nllb-200-distilled-600M
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **Training Hardware**: NVIDIA H100 / A100
- **Quantization**: Dynamic INT8 (ONNX Runtime)
- **Optimization**: Optimum library
## 🎮 Live Demo
Try the model: Trilingual Translator Space
## ⚠️ Limitations
- Optimized for general-purpose translation
- May not handle highly specialized technical content perfectly
- Best results with inputs under 256 tokens; longer inputs are truncated
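Because inputs beyond 256 tokens are truncated, long documents are best split into sentence-sized chunks and translated piecewise, then rejoined. A minimal sketch of such a splitter, using a naive regex sentence boundary and a character budget as a rough proxy for the token limit (both the splitter and the `max_chars` value are illustrative assumptions, not part of the model):

```python
import re

def chunk_text(text, max_chars=400):
    """Split text into chunks of whole sentences, each under max_chars.

    Naive regex splitter on sentence-final punctuation (Latin and Japanese);
    abbreviations like "U.S." will oversplit. max_chars only approximates
    the 256-token limit and should be tuned for your data.
    """
    sentences = [s for s in re.split(r"(?<=[.!?。！？])\s*", text.strip()) if s]
    chunks, current = [], ""
    for sent in sentences:
        if current and len(current) + len(sent) + 1 > max_chars:
            chunks.append(current)
            current = sent
        else:
            current = f"{current} {sent}".strip() if current else sent
    if current:
        chunks.append(current)
    return chunks

# Each chunk can then be passed to the translate() helper from Quick Start
# and the translated chunks joined back together.
```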
## 📄 License
This model is released for research use only.
## 🙏 Acknowledgments
- Meta AI for NLLB-200
- Hugging Face for Optimum and Transformers
- ONNX Runtime for inference optimization
## 📚 Citation

```bibtex
@misc{nllb-trilingual-2024,
  author    = {SotaLab},
  title     = {NLLB Trilingual Translation Model (EN-VI-JA) - INT8 ONNX},
  year      = {2024},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/sotalab/nllb-trilingual-en-vi-ja-onnx}
}
```