# ur-en Marian model
## Model summary

- Direction: ur -> en
- Architecture: Marian transformer
- Subword setup: SentencePiece spm12k-spm12k
- Primary uploaded checkpoint: best-bleu
- Training dataset selection: NLLB CCMatrix CCAligned OpenSubtitles Tanzil XLEnt
- Validation set: openlanguagedata_flores_plus
- Test set recipe: openlanguagedata_flores_plus
## Best validation metrics seen in training logs
- BLEU: 38.3259 at epoch 5 / update 277500
- CHRF: 60.8042 at epoch 5 / update 277500
- PERPLEXITY: 287.857 at epoch 1 / update 2500
## Files

- .gitattributes
- curated-floresdev.spm12k-spm12k.transformer.model1.npz.best-bleu.npz
- best-bleu.decoder.yml
- curated-floresdev.spm12k-spm12k.vocab.yml
- opus.src.spm12k-model
- opus.trg.spm12k-model
- curated-floresdev.transformer.mk
- README.md
- translate_with_marian.py
## Usage
This is a raw Marian model, not a Transformers conversion. To run it you need marian-decoder and spm_encode available locally.
Example:

```
python translate_with_marian.py input.txt -o output.txt --checkpoint best-bleu
```
You can also point to custom binaries:

```
python translate_with_marian.py input.txt -o output.txt \
  --marian-decoder /path/to/marian-decoder \
  --spm-encode /path/to/spm_encode
```
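The helper script itself isn't reproduced here, but the pipeline it presumably wraps can be sketched in a few lines: segment the Urdu input with spm_encode, translate the pieces with marian-decoder, then join the SentencePiece pieces back into plain text. This is a hypothetical minimal equivalent, assuming the standard spm_encode and marian-decoder CLIs and the file names from the listing above; the actual translate_with_marian.py may differ.

```python
import subprocess
from pathlib import Path

def build_decoder_cmd(config="best-bleu.decoder.yml", decoder="marian-decoder"):
    """Assemble the marian-decoder invocation for a given decoder config."""
    return [decoder, "-c", config, "--quiet"]

def spm_detok(line):
    """Undo SentencePiece segmentation: drop piece boundaries; '▁' marks spaces."""
    return line.replace(" ", "").replace("\u2581", " ").strip()

def translate(input_path, output_path, spm_model="opus.src.spm12k-model"):
    # 1) Segment the Urdu source with the source-side SentencePiece model.
    encoded = subprocess.run(
        ["spm_encode", f"--model={spm_model}"],
        input=Path(input_path).read_bytes(),
        capture_output=True, check=True,
    ).stdout
    # 2) Translate with marian-decoder; the config names the checkpoint and vocabs.
    decoded = subprocess.run(
        build_decoder_cmd(), input=encoded,
        capture_output=True, check=True,
    ).stdout.decode("utf-8")
    # 3) Detokenize and write plain-text English output.
    lines = [spm_detok(line) for line in decoded.splitlines()]
    Path(output_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Both binaries must be on PATH (or pass explicit paths, as the script's flags above allow).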
## Notes
- The decoder configs in this repo were rewritten to use relative paths so they work from a downloaded Hub snapshot.
- Review dataset and license compatibility before redistributing the model publicly.
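For reference, a relative-path decoder config of the kind described in the first note might look like the fragment below. The keys are standard Marian decoder options and the file names come from the listing above, but the specific values shown are illustrative, not copied from this repo's best-bleu.decoder.yml.

```yaml
# best-bleu.decoder.yml (illustrative) -- paths resolve relative to the snapshot root
relative-paths: true
models:
  - curated-floresdev.spm12k-spm12k.transformer.model1.npz.best-bleu.npz
vocabs:
  - curated-floresdev.spm12k-spm12k.vocab.yml
  - curated-floresdev.spm12k-spm12k.vocab.yml
beam-size: 6
normalize: 1.0
mini-batch: 16
maxi-batch: 100
maxi-batch-sort: src
```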