Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale Paper • 2603.25040 • Published 19 days ago • 130
MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding Paper • 2603.22458 • Published 21 days ago • 135
Attend Before Attention: Efficient and Scalable Video Understanding via Autoregressive Gazing Paper • 2603.12254 • Published Mar 12 • 22
InCoder-32B: Code Foundation Model for Industrial Scenarios Paper • 2603.16790 • Published 27 days ago • 308
Lost in Backpropagation: The LM Head is a Gradient Bottleneck Paper • 2603.10145 • Published Mar 10 • 13
Lost in Stories: Consistency Bugs in Long Story Generation by LLMs Paper • 2603.05890 • Published Mar 6 • 93
OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration Paper • 2602.05400 • Published Feb 5 • 352
Self-Hinting Language Models Enhance Reinforcement Learning Paper • 2602.03143 • Published Feb 3 • 31
Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text Paper • 2601.22975 • Published Jan 30 • 110
The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models Paper • 2601.15165 • Published Jan 21 • 73