EraX-Translator-V1.0-mlx-8bit

An 8-bit MLX-VLM conversion of erax-ai/EraX-Translator-V1.0 for Apple Silicon.

Notes

  • Converted locally with mlx-vlm 0.4.0.
  • Quantization result: 9.139 bits per weight.
  • Source model architecture: Gemma3ForConditionalGeneration.
  • This checkpoint is intended for translation tasks; it was tested here on translation into Vietnamese.
  • Local smoke test passed with mlx-vlm text generation.
  • Local evaluation covered 5 translation-into-Vietnamese cases with English, German, modern Chinese, and Classical Chinese source texts.
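The bits-per-weight figure above exceeding 8 is expected: MLX affine quantization stores per-group metadata alongside the 8-bit values, and some tensors stay in bf16. A rough sketch of the arithmetic, assuming float16 scales and biases and the default group size of 64:

```python
# Sketch: why an "8-bit" MLX quantization reports more than 8 bits per weight.
# Assumption: each group of 64 weights carries a float16 scale and a float16
# bias (32 extra bits per group); tensors left in bf16 raise the average more.
def effective_bpw(q_bits: int = 8, group_size: int = 64,
                  overhead_bits: int = 32) -> float:
    """Bits per weight for a fully quantized tensor."""
    return q_bits + overhead_bits / group_size

print(effective_bpw())  # 8.5 -- the floor; unquantized layers push 9.139 above it
```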

Conversion

python3 -m mlx_vlm convert \
  --hf-path erax-ai/EraX-Translator-V1.0 \
  --mlx-path /path/to/EraX-Translator-V1.0-mlx-8bit \
  --dtype bfloat16 \
  --quantize --q-bits 8

Quick Start

python3 - <<'PY'
from mlx_vlm import load
from mlx_vlm.generate import generate

model, processor = load('/path/to/EraX-Translator-V1.0-mlx-8bit')
messages = [
    {"role": "system", "content": "Bạn là trợ lý dịch thuật nhiều ngôn ngữ. Chỉ trả về bản dịch chính xác, không giải thích thêm."},
    {"role": "user", "content": "The weather is nice today, but the traffic is terrible.\n\nDịch sang tiếng Việt."},
]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
result = generate(model, processor, prompt, verbose=False, max_tokens=128, temperature=0.2, top_p=0.95, top_k=64)
print(result.text)
PY
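To translate several inputs with one loaded model, the message-building step from the snippet above can be factored out. A sketch (`build_messages` is a hypothetical helper, not part of mlx-vlm):

```python
# Hypothetical helper: build the same chat messages as the Quick Start
# snippet for any input text and target language.
SYSTEM_PROMPT = ("Bạn là trợ lý dịch thuật nhiều ngôn ngữ. "
                 "Chỉ trả về bản dịch chính xác, không giải thích thêm.")

def build_messages(text: str, target: str = "tiếng Việt") -> list:
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"{text}\n\nDịch sang {target}."},
    ]

# Feed each result to processor.apply_chat_template(...) and generate(...)
# exactly as in the Quick Start snippet.
```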

Validation Summary

  • Case count: 5
  • Avg generation speed: 64.37 tok/s
  • Avg wall time: 2.83s
  • Max peak memory: 6.31 GB
  • Avg similarity to reference set: 0.2025
  • Preamble leakage: 0 cases
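The metric behind the similarity figure above is not specified here. Purely as an illustration of the kind of score involved (an assumption, not the metric actually used), a character-level ratio can be computed with difflib:

```python
# Illustration only: one simple way to score candidate/reference similarity.
from difflib import SequenceMatcher

def similarity(candidate: str, reference: str) -> float:
    """Character-level similarity ratio in [0, 1]."""
    return SequenceMatcher(None, candidate, reference).ratio()

# similarity("Hôm nay trời đẹp", "Hôm nay thời tiết đẹp")  # partial match < 1.0
```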

Translation Quality

Observed locally:

  • Good fit for everyday chat, product copy, and general news translation into Vietnamese.
  • Much faster and lighter than the bf16 variant.
  • Slightly weaker than bf16 on harder German phrasing, Classical Chinese, and strict name fidelity.

Caution

This model is a translation-tuned checkpoint. It is not intended as a general-purpose coding or math model, and difficult literary or historical material may still require human review.
