# marian-mt-en-ru-high-precision
This is a specialized version of the [Helsinki-NLP/opus-mt-en-ru](https://huggingface.co/Helsinki-NLP/opus-mt-en-ru) model, fine-tuned on a thoroughly cleaned parallel corpus. The primary focus of this version is semantic accuracy and faithfulness to the source text.
## Process Description
The model was trained on a selection of 4 million sentence pairs. The key feature of the data preparation was rigorous filtering with LaBSE semantic embeddings: only pairs whose cross-lingual cosine similarity exceeded 0.8 were kept. This step eliminated noisy pairs, loose translations, and structural mismatches commonly found in standard open datasets.
As a result, the model demonstrates a stricter, "surgical" translation style that remains as close as possible to the original in both meaning and structure.
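The filtering step described above can be sketched as a simple cosine-similarity cutoff. The helper below is illustrative, not the actual pipeline: in practice the embeddings would come from a LaBSE encoder (e.g. via `sentence-transformers`), but the thresholding logic is independent of how the vectors are produced.

```python
import numpy as np

def filter_pairs(src_emb: np.ndarray, tgt_emb: np.ndarray,
                 pairs: list, threshold: float = 0.8) -> list:
    """Keep only sentence pairs whose embedding cosine similarity
    exceeds `threshold`. `src_emb` and `tgt_emb` are (n, d) arrays,
    where row i holds the embedding for pair i."""
    # L2-normalize so the row-wise dot product equals cosine similarity
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sims = np.einsum("ij,ij->i", src, tgt)
    return [pair for pair, sim in zip(pairs, sims) if sim > threshold]
```

With real data, `src_emb` and `tgt_emb` would be batches of English and Russian sentence embeddings from the same multilingual encoder, so that semantically matching pairs score high and loose translations fall below the cutoff.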
## Final Metrics
Quality was verified on a hold-out test set of 1,000 pairs from the same data distribution, which the model did not encounter during training.
| Metric | Original Model | High Precision Model | Improvement |
|---|---|---|---|
| SacreBLEU | 40.62 | 46.14 | +5.52 |
| COMET (wmt22-da) | 0.8945 | 0.9046 | +0.0101 |
## Usage
```python
from transformers import MarianMTModel, MarianTokenizer

model_name = "KvaytG/marian-mt-en-ru-high-precision"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Example of precise translation
text = "Workplace harmony is crucial, emphasizing group effort rather than individual accomplishments."
inputs = tokenizer(text, return_tensors="pt", padding=True)
output = model.generate(**inputs)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
## License
This model is released under the Apache License 2.0.
## Citation
```bibtex
@misc{kvaytg_marian_mt_en_ru_high_precision,
  author    = {KvaytG},
  title     = {High-precision English-Russian MarianMT model},
  year      = {2026},
  publisher = {Hugging Face},
  journal   = {Hugging Face Models},
  url       = {https://huggingface.co/KvaytG/marian-mt-en-ru-high-precision},
  note      = {Fine-tuned on 4 million LaBSE-filtered (>0.8) sentence pairs from the en-ru-parallel-20m corpus. Base model: Helsinki-NLP/opus-mt-en-ru.}
}
```