mSFT: Addressing Dataset Mixtures Overfitting Heterogeneously in Multi-task SFT Paper • 2603.21606 • Published 22 days ago • 39
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B Paper • 2511.06221 • Published Nov 9, 2025 • 134
The Smol Training Playbook 📚: The secrets to building world-class LLMs • Featured • 3.1k