NLLB Trilingual Translation Model (English-Vietnamese-Japanese)

A fine-tuned and INT8 quantized NLLB-200-distilled-600M model for high-quality translation between English, Vietnamese, and Japanese. Optimized for fast CPU inference with ONNX Runtime.

🎯 Highlights

  • 75% smaller: 7 GB → 1.8 GB (INT8 quantization)
  • Up to 48% faster: optimized for CPU inference
  • All 6 directions: EN↔VI, EN↔JA, VI↔JA
  • Production ready: ONNX format with Optimum integration

📊 Performance

| Metric | FP32 | INT8 (this model) |
|--------|------|-------------------|
| Model size | 7 GB | 1.8 GB |
| Short text latency | 0.44 s | 0.26 s |
| Long text latency | 2.56 s | 1.33 s |

Benchmarked on CPU with num_beams=1.
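
The latencies above come from simple wall-clock timing; a minimal harness you could use to reproduce them (a sketch — `translate_fn` is any callable wrapping the model, such as the `translate` function defined in the Quick Start below):

```python
import time

def mean_latency(translate_fn, text, runs=5):
    """Average wall-clock latency of a translation callable, in seconds."""
    translate_fn(text)  # warm-up: the first call pays session/cache init costs
    start = time.perf_counter()
    for _ in range(runs):
        translate_fn(text)
    return (time.perf_counter() - start) / runs
```

For example, `mean_latency(lambda t: translate(t, "eng_Latn", "vie_Latn"), "Hello, how are you?")`.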

๐ŸŒ Supported Languages

Source Target Code
English Vietnamese eng_Latn โ†’ vie_Latn
English Japanese eng_Latn โ†’ jpn_Jpan
Vietnamese English vie_Latn โ†’ eng_Latn
Vietnamese Japanese vie_Latn โ†’ jpn_Jpan
Japanese English jpn_Jpan โ†’ eng_Latn
Japanese Vietnamese jpn_Jpan โ†’ vie_Latn
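
In code, the six directions are simply the ordered pairs of the three NLLB codes; a small sketch (the `LANG_CODES` mapping here is an illustrative helper, not part of the model's API):

```python
from itertools import permutations

# FLORES-200 codes used by the NLLB tokenizer
LANG_CODES = {"en": "eng_Latn", "vi": "vie_Latn", "ja": "jpn_Jpan"}

# every ordered pair of distinct languages gives the six supported directions
DIRECTIONS = [(LANG_CODES[a], LANG_CODES[b]) for a, b in permutations(LANG_CODES, 2)]
```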

๐Ÿ“ Example Translations

Direction Input Output
ENโ†’VI Hello, how are you? Chร o, bแบกn khแปe khรดng?
ENโ†’JA Hello, how are you? ใ“ใ‚“ใซใกใฏใ€ใŠๅ…ƒๆฐ—ใงใ™ใ‹๏ผŸ
VIโ†’EN Thแปi tiแบฟt hรดm nay rแบฅt ฤ‘แบนp. The weather is very beautiful today.
VIโ†’JA Thแปi tiแบฟt hรดm nay rแบฅt ฤ‘แบนp. ไปŠๆ—ฅใฎๅคฉๆฐ—ใฏใจใฆใ‚‚็พŽใ—ใ„ใงใ™ใ€‚
JAโ†’EN ไปŠๆ—ฅใฎๅคฉๆฐ—ใฏใจใฆใ‚‚่‰ฏใ„ใงใ™ใ€‚ The weather is very good today.
JAโ†’VI ไปŠๆ—ฅใฎๅคฉๆฐ—ใฏใจใฆใ‚‚่‰ฏใ„ใงใ™ใ€‚ Thแปi tiแบฟt hรดm nay rแบฅt tแป‘t.

🚀 Quick Start

Python (Optimum + ONNX Runtime)

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSeq2SeqLM

model_id = "sotalab/nllb-trilingual-en-vi-ja-onnx"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = ORTModelForSeq2SeqLM.from_pretrained(
    model_id,
    encoder_file_name="encoder_model_quantized.onnx",
    decoder_file_name="decoder_model_quantized.onnx",
    decoder_with_past_file_name="decoder_with_past_model_quantized.onnx",
)

def translate(text, src_lang, tgt_lang):
    tokenizer.src_lang = src_lang
    inputs = tokenizer(text, return_tensors="pt", max_length=256, truncation=True)
    tgt_lang_id = tokenizer.convert_tokens_to_ids(tgt_lang)
    
    outputs = model.generate(
        **inputs,
        forced_bos_token_id=tgt_lang_id,
        max_new_tokens=256,
        num_beams=1,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# English to Vietnamese
print(translate("Hello, how are you?", "eng_Latn", "vie_Latn"))
# Output: Chào, bạn khỏe không?

# English to Japanese
print(translate("Hello, how are you?", "eng_Latn", "jpn_Jpan"))
# Output: こんにちは、お元気ですか？

# Vietnamese to Japanese
print(translate("Tôi thích học tiếng Nhật.", "vie_Latn", "jpn_Jpan"))
# Output: 私は日本語を学ぶのが好きです。

Batch Translation

def translate_batch(texts, src_lang, tgt_lang):
    tokenizer.src_lang = src_lang
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=256)
    tgt_lang_id = tokenizer.convert_tokens_to_ids(tgt_lang)
    
    outputs = model.generate(**inputs, forced_bos_token_id=tgt_lang_id, max_new_tokens=256, num_beams=1)
    return [tokenizer.decode(out, skip_special_tokens=True) for out in outputs]

texts = ["Good morning", "How are you?", "Thank you very much"]
results = translate_batch(texts, "eng_Latn", "vie_Latn")
for text, result in zip(texts, results):
    print(f"{text} → {result}")

Translator Class

class TrilingualTranslator:
    LANG_CODES = {"en": "eng_Latn", "vi": "vie_Latn", "ja": "jpn_Jpan"}
    
    def __init__(self, model_id="sotalab/nllb-trilingual-en-vi-ja-onnx"):
        from transformers import AutoTokenizer
        from optimum.onnxruntime import ORTModelForSeq2SeqLM
        
        self.tokenizer = AutoTokenizer.from_pretrained(model_id)
        self.model = ORTModelForSeq2SeqLM.from_pretrained(
            model_id,
            encoder_file_name="encoder_model_quantized.onnx",
            decoder_file_name="decoder_model_quantized.onnx",
            decoder_with_past_file_name="decoder_with_past_model_quantized.onnx",
        )
    
    def translate(self, text, src="en", tgt="vi"):
        src_code = self.LANG_CODES[src]
        tgt_code = self.LANG_CODES[tgt]
        
        self.tokenizer.src_lang = src_code
        inputs = self.tokenizer(text, return_tensors="pt", max_length=256, truncation=True)
        tgt_lang_id = self.tokenizer.convert_tokens_to_ids(tgt_code)
        
        outputs = self.model.generate(**inputs, forced_bos_token_id=tgt_lang_id, max_new_tokens=256, num_beams=1)
        return self.tokenizer.decode(outputs[0], skip_special_tokens=True)

# Usage
translator = TrilingualTranslator()
print(translator.translate("Hello world", "en", "vi"))
print(translator.translate("Hello world", "en", "ja"))

๐Ÿ“ Model Files

File Size Description
encoder_model_quantized.onnx 399 MB Encoder (INT8)
decoder_model_quantized.onnx 698 MB Decoder (INT8)
decoder_with_past_model_quantized.onnx 674 MB Decoder with KV-cache (INT8)
tokenizer.json 31 MB Tokenizer
Total ~1.8 GB

🔧 Training Details

  • Base Model: facebook/nllb-200-distilled-600M
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • Training Hardware: NVIDIA H100 / A100
  • Quantization: Dynamic INT8 (ONNX Runtime)
  • Optimization: Optimum library
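
Dynamic INT8 quantization stores each weight tensor as 8-bit integers plus a floating-point scale, computed per tensor. The arithmetic can be sketched in NumPy (illustrative only — ONNX Runtime's actual kernels also handle zero-points and per-channel scales):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8 quantization: w ≈ scale * q."""
    scale = np.abs(w).max() / 127.0  # map the largest magnitude to 127
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
max_err = np.abs(dequantize(q, scale) - w).max()  # bounded by scale / 2
```

This is also why the model shrinks roughly 4x: each FP32 weight (4 bytes) becomes one INT8 value (1 byte) plus a negligible per-tensor scale.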

🎮 Live Demo

Try the model: Trilingual Translator Space

โš ๏ธ Limitations

  • Optimized for general-purpose translation
  • May not handle highly specialized technical content perfectly
  • Best results with sentences under 256 tokens
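
For inputs likely to exceed that budget, a common workaround is to split on sentence boundaries and translate chunk by chunk. A naive sketch (character count as a rough proxy for token count; `chunk_text` is an illustrative helper, not part of this repo):

```python
import re

def chunk_text(text, max_chars=400):
    """Split text into chunks of whole sentences, each at most ~max_chars."""
    # split after Latin or CJK sentence-final punctuation, keeping the delimiter
    sentences = [s for s in re.split(r"(?<=[.!?。！？])\s*", text) if s]
    chunks, buf = [], ""
    for sent in sentences:
        if buf and len(buf) + len(sent) + 1 > max_chars:
            chunks.append(buf)
            buf = sent
        else:
            buf = f"{buf} {sent}".strip() if buf else sent
    if buf:
        chunks.append(buf)
    return chunks

# translate each chunk independently, then rejoin:
# " ".join(translate(c, "eng_Latn", "vie_Latn") for c in chunk_text(long_text))
```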

📜 License

This model is released for research use only.

๐Ÿ™ Acknowledgments

📖 Citation

@misc{nllb-trilingual-2024,
  author = {SotaLab},
  title = {NLLB Trilingual Translation Model (EN-VI-JA) - INT8 ONNX},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/sotalab/nllb-trilingual-en-vi-ja-onnx}
}