Lets keep it simple, Using simple architectures to outperform deeper and more complex architectures Paper • 1608.06037 • Published Aug 22, 2016 • 1
CIFAR10 to Compare Visual Recognition Performance between Deep Neural Networks and Humans Paper • 1811.07270 • Published Nov 18, 2018 • 1
A Survey of Context Engineering for Large Language Models Paper • 2507.13334 • Published Jul 17, 2025 • 263
Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning Paper • 2506.07044 • Published Jun 8, 2025 • 114
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published Jun 25, 2024 • 102
Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model Paper • 2503.07703 • Published Mar 10, 2025 • 37
Babel: Open Multilingual Large Language Models Serving Over 90% of Global Speakers Paper • 2503.00865 • Published Mar 2, 2025 • 64
Slamming: Training a Speech Language Model on One GPU in a Day Paper • 2502.15814 • Published Feb 19, 2025 • 69
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling Paper • 2502.06703 • Published Feb 10, 2025 • 153
Towards Best Practices for Open Datasets for LLM Training Paper • 2501.08365 • Published Jan 14, 2025 • 62
The GAN is dead; long live the GAN! A Modern GAN Baseline Paper • 2501.05441 • Published Jan 9, 2025 • 98
TinyLLaVA: A Framework of Small-scale Large Multimodal Models Paper • 2402.14289 • Published Feb 22, 2024 • 20