ur-en Marian model

Model summary

  • Direction: ur -> en
  • Architecture: Marian transformer
  • Subword setup: SentencePiece spm12k-spm12k
  • Primary uploaded checkpoint: best-bleu
  • Training dataset selection: NLLB, CCMatrix, CCAligned, OpenSubtitles, Tanzil, XLEnt
  • Validation set: openlanguagedata_flores_plus
  • Test set recipe: openlanguagedata_flores_plus

Best validation metrics seen in training logs

  • BLEU: 38.3259 at epoch 5 / update 277500
  • CHRF: 60.8042 at epoch 5 / update 277500
  • PERPLEXITY: 287.857 at epoch 1 / update 2500

Files

  • .gitattributes
  • curated-floresdev.spm12k-spm12k.transformer.model1.npz.best-bleu.npz
  • best-bleu.decoder.yml
  • curated-floresdev.spm12k-spm12k.vocab.yml
  • opus.src.spm12k-model
  • opus.trg.spm12k-model
  • curated-floresdev.transformer.mk
  • README.md
  • translate_with_marian.py

Usage

This is a raw Marian model, not a Transformers conversion. To run it you need marian-decoder and spm_encode available locally.
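Before running anything, it can help to confirm that both tools are actually reachable. The following is a minimal sketch, assuming the binaries are expected on PATH under the names `marian-decoder` and `spm_encode`; the helper name `find_missing_tools` is hypothetical and not part of this repo.

```python
import shutil


def find_missing_tools(tools=("marian-decoder", "spm_encode")):
    """Return the names of required executables that cannot be found on PATH.

    The default tool names are the ones this README mentions; the function
    itself is an illustrative helper, not something shipped with the model.
    """
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required tools found.")
```

If a tool is reported missing but is installed elsewhere, pass its location explicitly with the `--marian-decoder` or `--spm-encode` option of the wrapper script.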

Example:

python translate_with_marian.py input.txt -o output.txt --checkpoint best-bleu

You can also point to custom binaries:

python translate_with_marian.py input.txt -o output.txt \
  --marian-decoder /path/to/marian-decoder \
  --spm-encode /path/to/spm_encode
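For orientation, the sketch below shows the kind of pipeline such a wrapper typically runs: segment the source text with `spm_encode`, pass the pieces to `marian-decoder`, then join the output pieces back into plain text. It is an assumption about the general approach, not a copy of `translate_with_marian.py`. The function names are hypothetical, and the sketch assumes the source SentencePiece model is `opus.src.spm12k-model`, the decoder config is `best-bleu.decoder.yml`, and the decoder emits SentencePiece pieces that still need to be joined.

```python
import subprocess


def build_commands(marian_decoder="marian-decoder", spm_encode="spm_encode"):
    """Build the two command lines, using file names from this repo's file list."""
    encode_cmd = [spm_encode, "--model=opus.src.spm12k-model"]
    decode_cmd = [marian_decoder, "-c", "best-bleu.decoder.yml"]
    return encode_cmd, decode_cmd


def pieces_to_text(line):
    """Undo SentencePiece segmentation: drop the spaces between pieces,
    then turn the word-boundary marker (U+2581) back into a space."""
    return line.replace(" ", "").replace("\u2581", " ").strip()


def translate(lines, model_dir="."):
    """Translate a list of Urdu sentences; returns a list of English strings.

    Both tools run with cwd=model_dir so that bare file names (and relative
    paths inside the decoder config) resolve against the snapshot directory.
    """
    encode_cmd, decode_cmd = build_commands()
    source = "\n".join(lines) + "\n"
    pieces = subprocess.run(
        encode_cmd, input=source, capture_output=True,
        encoding="utf-8", check=True, cwd=model_dir,
    ).stdout
    decoded = subprocess.run(
        decode_cmd, input=pieces, capture_output=True,
        encoding="utf-8", check=True, cwd=model_dir,
    ).stdout
    return [pieces_to_text(line) for line in decoded.splitlines()]
```

Calling `translate(["..."], model_dir="/path/to/snapshot")` requires both binaries on PATH. For real use, prefer the wrapper script, which may handle details this sketch omits.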

Notes

  • The decoder configs in this repo were rewritten to use relative paths so they work from a downloaded Hub snapshot.
  • Review dataset and license compatibility before redistributing the model publicly.
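
Relative paths in the decoder config only resolve if the referenced files are where the config expects them. The sketch below checks a downloaded snapshot for the model, vocabulary, and SentencePiece files named in the file list above. It assumes these files sit in the same directory as `best-bleu.decoder.yml`, which has not been verified against the config itself; the helper name `missing_files` is hypothetical.

```python
from pathlib import Path

# File names taken from this repo's file list. That the decoder config
# refers to exactly these files, side by side, is an assumption.
EXPECTED_FILES = [
    "curated-floresdev.spm12k-spm12k.transformer.model1.npz.best-bleu.npz",
    "best-bleu.decoder.yml",
    "curated-floresdev.spm12k-spm12k.vocab.yml",
    "opus.src.spm12k-model",
    "opus.trg.spm12k-model",
]


def missing_files(snapshot_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from snapshot_dir."""
    root = Path(snapshot_dir)
    return [name for name in expected if not (root / name).is_file()]
```

An empty result from `missing_files("/path/to/snapshot")` means every listed file is present. It does not confirm that the paths inside the config point to them.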