Benchmark comparison with stronger models

by RhiteshKS - opened Mar 12


Hi, thank you for releasing the model and the benchmark results.
I was wondering how Param-2-17B-A2.4B performs compared with stronger models in the 15B–30B parameter range (for example Qwen2.5-14B/32B, Mixtral, etc.). The models currently shown in the benchmark table appear to be relatively lightweight or distilled variants, which makes it hard to tell where Param-2 stands.
Since Param-2 is an MoE model with ~2.4B active parameters but a much larger total parameter count, it would be useful to see comparisons either with compute-matched models (e.g., Qwen2.5-3B) or with capacity-matched dense models in the mid-size range.
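To make the two comparison axes concrete, here is a rough sketch in Python. It assumes the nominal sizes read off the model names (17B total / 2.4B active for Param-2, and dense Qwen2.5 models where active = total), an arbitrary ±50% tolerance for what counts as "matched", and the common rule of thumb that a forward pass costs about 2 FLOPs per active parameter per token. The `Model` class and `matching` helper are hypothetical and this is not an official evaluation method.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    total_b: float   # total parameters, in billions (nominal, from the name)
    active_b: float  # parameters used per token, in billions


def flops_per_token(m: Model) -> float:
    """Approximate forward-pass FLOPs per token (rule of thumb: ~2 x active params)."""
    return 2 * m.active_b * 1e9


def matching(ref: Model, candidates, tol: float = 0.5):
    """Split candidates into compute-matched and capacity-matched groups.

    A candidate counts as matched on an axis if its value is within
    +/- tol (relative) of the reference model's value on that axis.
    The tolerance is an arbitrary assumption for illustration.
    """
    def close(a: float, b: float) -> bool:
        return abs(a - b) <= tol * b

    compute = [c.name for c in candidates if close(c.active_b, ref.active_b)]
    capacity = [c.name for c in candidates if close(c.total_b, ref.total_b)]
    return compute, capacity


# Nominal sizes; dense models use all parameters for every token.
param2 = Model("Param-2-17B-A2.4B", total_b=17.0, active_b=2.4)
candidates = [
    Model("Qwen2.5-3B", 3.0, 3.0),
    Model("Qwen2.5-14B", 14.0, 14.0),
    Model("Qwen2.5-32B", 32.0, 32.0),
]

compute, capacity = matching(param2, candidates)
print("compute-matched:", compute)
print("capacity-matched:", capacity)
print(f"Param-2 FLOPs/token ~ {flops_per_token(param2):.2e}")
```

Under these assumptions, Qwen2.5-3B lands in the compute-matched group and Qwen2.5-14B in the capacity-matched group, while a dense 14B model spends roughly 14 / 2.4 ≈ 5.8x the per-token compute of Param-2. That gap is why both kinds of comparison would be informative.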
Could you share any results or evaluations against such models?
Thank you.

indiaiquest

18 days ago

Check this out: https://medium.com/@indiai/india-first-llms-on-indian-languages-sarvam-30b-and-param2-17b-6676b637f3ab
