nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 Text Generation • 124B • Updated 3 days ago • 486k • 328
FreedomIntelligence/medical-o1-reasoning-SFT Viewer • Updated Apr 22, 2025 • 90.1k • 7.14k • 1.08k
AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models Paper • 2406.13233 • Published Jun 19, 2024 • 1