Benchmark comparison with stronger models
#6
by RhiteshKS - opened
Hi, thank you for releasing the model and the benchmark results.
I was wondering how Param-2-17B-A2.4B performs when compared with stronger models in the 15B to 30B parameter range (for example Qwen2.5-14B/32B, Mixtral, etc.). The models currently shown in the benchmark table appear to be relatively lightweight or distilled variants, which makes it a bit difficult to understand where it stands.
Since Param is a MoE model with ~2.4B active parameters but a much larger total capacity (~17B parameters, going by the name), it would be interesting to see comparisons either with compute-matched models (like Qwen2.5-3B) or with capacity-matched dense models in the mid-size range.
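To make the two comparison axes concrete, here is a rough back-of-the-envelope sketch. It assumes the common approximation of ~2 FLOPs per active parameter per token for a forward pass (ignoring attention-over-context and MoE routing overhead), and it takes parameter counts nominally from the model names (17B total / 2.4B active for Param; 3B, 14B and 32B for the dense Qwen2.5 models). Real parameter counts differ slightly, so treat the ratios as approximate.

```python
# Rough per-token forward-pass cost, using the common ~2 FLOPs per active
# parameter approximation. Ignores attention-over-context cost and MoE
# routing overhead, so numbers are only indicative.

def forward_flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs per token for the given active parameter count."""
    return 2.0 * active_params

# Assumption: nominal sizes read off the model names, not official counts.
# name: (total_params, active_params)
MODELS = {
    "Param-2-17B-A2.4B (MoE)": (17e9, 2.4e9),
    "Qwen2.5-3B (dense)": (3e9, 3e9),
    "Qwen2.5-14B (dense)": (14e9, 14e9),
    "Qwen2.5-32B (dense)": (32e9, 32e9),
}

PARAM_TOTAL, PARAM_ACTIVE = MODELS["Param-2-17B-A2.4B (MoE)"]


def ratios_vs_param(name: str) -> tuple[float, float]:
    """Return (per-token compute ratio, total-capacity ratio) relative to Param."""
    total, active = MODELS[name]
    compute_ratio = forward_flops_per_token(active) / forward_flops_per_token(PARAM_ACTIVE)
    capacity_ratio = total / PARAM_TOTAL
    return compute_ratio, capacity_ratio


if __name__ == "__main__":
    for model_name in MODELS:
        compute, capacity = ratios_vs_param(model_name)
        print(f"{model_name:26s} compute vs Param: {compute:5.2f}x  "
              f"capacity vs Param: {capacity:5.2f}x")
```

Under these assumptions Qwen2.5-3B lands close to Param on per-token compute, while Qwen2.5-14B is the closest on total parameters (which also roughly tracks memory footprint), which is why both kinds of comparison seem informative.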
Could you share any results or evaluations against such models?
Thank you.