SlimQwen: Exploring the Pruning and Distillation in Large MoE Model Pre-training • Paper 2605.08738 • Published 4 days ago • 7
MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer • Paper 2509.16197 • Published Sep 19, 2025 • 58