# marian-mt-en-ru-high-precision

This is a specialized version of the Helsinki-NLP/opus-mt-en-ru model, fine-tuned on a deeply cleaned parallel corpus. This version prioritizes semantic accuracy and faithfulness to the source text.

## Process Description

The model was trained on a selection of 4 million sentence pairs. The key feature of the data preparation was rigorous filtering with LaBSE semantic embeddings: only pairs whose source and target embeddings had a cosine similarity above 0.8 were kept. This process eliminated "noisy" pairs, loose translations, and structural mismatches commonly found in standard open datasets.
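The filtering step described above can be sketched as follows. This is a minimal illustration, not the actual pipeline: the helper names are hypothetical, and the toy vectors stand in for real LaBSE sentence embeddings, which would be produced by an embedding model.

```python
from math import sqrt

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def filter_pairs(pairs, src_embs, tgt_embs, threshold=0.8):
    """Keep only sentence pairs whose embedding similarity exceeds the threshold."""
    return [
        pair
        for pair, u, v in zip(pairs, src_embs, tgt_embs)
        if cosine_similarity(u, v) > threshold
    ]

# Toy 2-d vectors standing in for LaBSE embeddings: the first pair is
# well-aligned (high similarity), the second is a mismatch.
pairs = [
    ("Hello world", "Привет, мир"),
    ("Hello world", "Совсем другое предложение"),
]
src = [[1.0, 0.0], [1.0, 0.0]]
tgt = [[0.9, 0.1], [0.1, 0.9]]
print(filter_pairs(pairs, src, tgt))  # keeps only the well-aligned pair
```

In the real pipeline, `src_embs` and `tgt_embs` would be LaBSE embeddings of the English and Russian sides, and pairs at or below the 0.8 threshold would be discarded.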

As a result, the model demonstrates a stricter, "surgical" translation style that remains as close as possible to the original in both meaning and structure.

## Final Metrics

Quality was verified on a hold-out test set of 1,000 pairs from the same data distribution, which the model did not encounter during training.

| Metric | Original Model | High-Precision Model | Improvement |
|---|---|---|---|
| SacreBLEU | 40.62 | 46.14 | +5.52 |
| COMET (wmt22-da) | 0.8945 | 0.9046 | +0.0101 |

## Usage

```python
from transformers import MarianMTModel, MarianTokenizer

model_name = "KvaytG/marian-mt-en-ru-high-precision"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Example of precise translation
text = "Workplace harmony is crucial, emphasizing group effort rather than individual accomplishments."
inputs = tokenizer(text, return_tensors="pt", padding=True)
output = model.generate(**inputs)

# Decode the first (and only) sequence, dropping padding/EOS tokens
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

## License

This model is released under the Apache License 2.0.

## Citation

```bibtex
@misc{kvaytg_marian_mt_en_ru_high_precision,
  author       = {KvaytG},
  title        = {High-precision English-Russian MarianMT model},
  year         = {2026},
  publisher    = {Hugging Face},
  journal      = {Hugging Face Models},
  url          = {https://huggingface.co/KvaytG/marian-mt-en-ru-high-precision},
  note         = {Fine-tuned on 4 million LaBSE-filtered (>0.8) sentence pairs from the en-ru-parallel-20m corpus. Base model: Helsinki-NLP/opus-mt-en-ru.}
}
```