# ur-en Marian model
## Model summary

- Direction: ur -> en
- Architecture: Marian transformer
- Subword setup: SentencePiece spm12k-spm12k
- Primary uploaded checkpoint: best-bleu
- Training dataset selection: NLLB CCMatrix CCAligned OpenSubtitles Tanzil XLEnt
- Validation set: openlanguagedata_flores_plus
- Test set recipe: openlanguagedata_flores_plus
## Best validation metrics seen in training logs
- BLEU: 38.3259 at epoch 5 / update 277500
- CHRF: 60.8042 at epoch 5 / update 277500
- PERPLEXITY: 287.857 at epoch 1 / update 2500
## Files

- .gitattributes
- curated-floresdev.spm12k-spm12k.transformer.model1.npz.best-bleu.npz
- best-bleu.decoder.yml
- curated-floresdev.spm12k-spm12k.vocab.yml
- opus.src.spm12k-model
- opus.trg.spm12k-model
- curated-floresdev.transformer.mk
- README.md
- translate_with_marian.py
## Usage
This is a raw Marian model, not a Transformers conversion. To run it you need marian-decoder and spm_encode available locally.
Example:

```
python translate_with_marian.py input.txt -o output.txt --checkpoint best-bleu
```
You can also point to custom binaries:

```
python translate_with_marian.py input.txt -o output.txt \
  --marian-decoder /path/to/marian-decoder \
  --spm-encode /path/to/spm_encode
```
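The helper script itself isn't reproduced here, but the pipeline it presumably wraps can be sketched in a few lines: segment the Urdu input with spm_encode, translate the pieces with marian-decoder, then join the SentencePiece pieces back into plain text. This is a hypothetical minimal equivalent, assuming the standard spm_encode and marian-decoder CLIs and the file names from the listing above; the actual translate_with_marian.py may differ.

```python
import subprocess
from pathlib import Path

def build_decoder_cmd(config="best-bleu.decoder.yml", decoder="marian-decoder"):
    """Assemble the marian-decoder invocation for a given decoder config."""
    return [decoder, "-c", config, "--quiet"]

def spm_detok(line):
    """Undo SentencePiece segmentation: drop piece boundaries; '▁' marks spaces."""
    return line.replace(" ", "").replace("\u2581", " ").strip()

def translate(input_path, output_path, spm_model="opus.src.spm12k-model"):
    # 1) Segment the Urdu source with the source-side SentencePiece model.
    encoded = subprocess.run(
        ["spm_encode", f"--model={spm_model}"],
        input=Path(input_path).read_bytes(),
        capture_output=True, check=True,
    ).stdout
    # 2) Translate with marian-decoder; the config names the checkpoint and vocabs.
    decoded = subprocess.run(
        build_decoder_cmd(), input=encoded,
        capture_output=True, check=True,
    ).stdout.decode("utf-8")
    # 3) Detokenize and write plain-text English output.
    lines = [spm_detok(line) for line in decoded.splitlines()]
    Path(output_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Both binaries must be on PATH (or pass explicit paths, as the script's flags above allow).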
## Notes
- The decoder configs in this repo were rewritten to use relative paths so they work from a downloaded Hub snapshot.
- Review dataset and license compatibility before redistributing the model publicly.
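For reference, a relative-path decoder config of the kind described in the first note might look like the fragment below. The keys are standard Marian decoder options and the file names come from the listing above, but the specific values shown are illustrative, not copied from this repo's best-bleu.decoder.yml.

```yaml
# best-bleu.decoder.yml (illustrative) -- paths resolve relative to the snapshot root
relative-paths: true
models:
  - curated-floresdev.spm12k-spm12k.transformer.model1.npz.best-bleu.npz
vocabs:
  - curated-floresdev.spm12k-spm12k.vocab.yml
  - curated-floresdev.spm12k-spm12k.vocab.yml
beam-size: 6
normalize: 1.0
mini-batch: 16
maxi-batch: 100
maxi-batch-sort: src
```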