Tags: Text Generation, GGUF, English, 256k context, Qwen3, Mixture of Experts, MOE, MOE Dense, 2 experts, 4Bx12, All use cases, bfloat16, finetune, thinking, reasoning, GPT-5.1-High-Reasoning-Distill, Gemini-3-Pro-Preview-High-Reasoning-Distill, Claude-4.5-Opus-High-Reasoning-Distill, Claude-Sonnet-4-Reasoning-Distill, Kimi-K2-Thinking-Distill, Gemini-2.5-Flash-Distill, Gemini-2.5-Flash-Lite-Preview-Distill, gpt-oss-120b-Distill, GLM-Flash-4.6-Distill, Open-R1-Distill, Command-A-Reasoning-Distill, conversational
Benchmarking (#2)
opened by daniel-dona
Have you tested the model against any well-known benchmarks?
All of the models (13) in the MoE were benchmarked individually prior to merging them into the MoE.
Benchmarking the combined model as it is now would give variable results, because of the gating AND the number of experts in use (plus the option to activate/deactivate experts).
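One way to get comparable numbers despite that variability is to pin the active-expert count at load time and run each benchmark pass under identical sampling settings. Below is a minimal sketch, not the author's setup, using llama-cpp-python's `kv_overrides`; the GGUF metadata key (`qwen3moe.expert_used_count`), the file path, and the prompt are assumptions for illustration, not details taken from this model card.

```python
# Minimal sketch: pin the number of experts used per token before benchmarking,
# so repeated runs of the same prompt are directly comparable.
# Assumes llama-cpp-python is installed and that the GGUF exposes the usual
# "<arch>.expert_used_count" metadata key for a Qwen3 MoE (assumed key name).
from llama_cpp import Llama

llm = Llama(
    model_path="model-4Bx12-Q8_0.gguf",  # hypothetical local GGUF path
    n_ctx=8192,                          # context window for the benchmark run
    seed=42,                             # fixed seed for reproducibility
    kv_overrides={"qwen3moe.expert_used_count": 2},  # assumed key: pin experts per token
)

# Greedy decoding (temperature 0) keeps the run deterministic for scoring.
out = llm(
    "Q: What is 17 * 24? Think step by step.\nA:",
    max_tokens=256,
    temperature=0.0,
)
print(out["choices"][0]["text"])
```

Changing only the `expert_used_count` override between runs would then isolate how the expert count affects benchmark scores, separately from the gating itself.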