How does this compare to HY-MT1.5-7B?
Hi there, I'd like to thank you for the detailed instructions on getting this working properly with llama.cpp; I couldn't find this information anywhere else. Right now I'm using Tencent's HY-MT1.5-7B for my translation workflows, and it does a pretty decent job for most of the languages I use it for. Have you run any tests comparing the two, and would this be a substantial improvement?
Thanks! I have not tested the HY-MT model yet, but it looks like it runs on llama.cpp, so I can make a quant and see how it does. I have BLEU results on opus and flores200 for various MT models on my HF benchlm space. This model benches better than plamo but worse than madlad400. I also finalized my recommended minimalist prompt templates for this model today, tested to work across the 4B, 12B, and 27B models. There is no information on this anywhere, so you just have to experiment, but the model rides on top of gemma3, so it at least has some instruction-following ability.
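Since the model rides on top of gemma3, a translation prompt would be wrapped in the standard gemma3 chat turns. Below is only a sketch of that general format; the instruction wording and the `build_prompt` helper are my own illustration here, not the finalized templates mentioned above:

```python
def build_prompt(text: str, src: str, tgt: str) -> str:
    """Wrap a bare translation instruction in gemma3-style chat turns.

    The <start_of_turn>/<end_of_turn> markers are the documented gemma3
    chat format; the instruction wording is a hypothetical example.
    """
    instruction = f"Translate the following text from {src} to {tgt}:\n{text}"
    return (
        "<start_of_turn>user\n"
        f"{instruction}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

print(build_prompt("Guten Morgen", "German", "English"))
```

With llama.cpp the rendered string would be passed as the raw prompt (e.g. via `-p` on `llama-cli`), leaving generation to stop at the end-of-turn token.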
@rsbdev I ran some evals on an HY-MT1.5-7B quant (a very-close-to-lossless Q6_K_H mixed-precision quant); results are here: https://huggingface.co/spaces/steampunque/benchlm . It can't translate en->de, and translategemma12b benches better on most of the tested languages. How it performs out in the wild on real translation tasks against other models would have to be evaluated manually; the evals may or may not be realistic at picking up good translations, due to limitations of the comparison metric (BLEU). Most likely you'd need a human fluent in both the source and target languages to accurately assess translation quality against other models.
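To make the BLEU limitation concrete: BLEU only rewards exact n-gram overlap with the reference, so a perfectly fluent paraphrase can score zero. A minimal pure-Python sketch of sentence-level BLEU (uniform weights, no smoothing; real harnesses use sacrebleu) shows this:

```python
import math
from collections import Counter

def sentence_bleu(hyp: str, ref: str, max_n: int = 4) -> float:
    """Plain sentence-level BLEU: geometric mean of 1..max_n-gram
    precisions times a brevity penalty. No smoothing, so any n-gram
    level with zero overlap zeroes the whole score."""
    hyp_toks, ref_toks = hyp.split(), ref.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams = Counter(tuple(hyp_toks[i:i + n])
                             for i in range(len(hyp_toks) - n + 1))
        ref_ngrams = Counter(tuple(ref_toks[i:i + n])
                             for i in range(len(ref_toks) - n + 1))
        overlap = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
        total = max(sum(hyp_ngrams.values()), 1)
        if overlap == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    bp = min(1.0, math.exp(1 - len(ref_toks) / len(hyp_toks)))
    return bp * math.exp(sum(log_precisions) / max_n)

ref = "the cat sat on the mat"
print(sentence_bleu("the cat sat on the mat", ref))        # exact match -> 1.0
print(sentence_bleu("a cat was sitting on the mat", ref))  # valid paraphrase -> 0.0
```

The paraphrase shares no 4-gram with the reference, so it scores 0.0 despite being a correct translation, which is exactly why a fluent human check is needed to rank models reliably.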