Article Distilling 100B+ Models 40x Faster with TRL 📝 — TRL distillation for 100B+ teachers, 40x faster
Marco-MoE Collection A suite of multilingual MoE models with highly-sparse architectures • 5 items • Updated 7 days ago • 14
Article Welcome Gemma 4: Frontier multimodal intelligence on device • 13 days ago • 843