LongCat-Next: Lexicalizing Modalities as Discrete Tokens Paper • 2603.27538 • Published 16 days ago • 141
Retrieval-Infused Reasoning Sandbox: A Benchmark for Decoupling Retrieval and Reasoning Capabilities Paper • 2601.21937 • Published Jan 29 • 19
CoDiQ: Test-Time Scaling for Controllable Difficult Question Generation Paper • 2602.01660 • Published Feb 2 • 7
CoDiQ: Test-Time Scaling for Controllable Difficult Question Generation Paper • 2602.01660 • Published Feb 2 • 7 • 3
CoDiQ: Test-Time Scaling for Controllable Difficult Question Generation Paper • 2602.01660 • Published Feb 2 • 7
Scaling Embeddings Outperforms Scaling Experts in Language Models Paper • 2601.21204 • Published Jan 29 • 102
OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs Paper • 2510.10689 • Published Oct 12, 2025 • 47
FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction Paper • 2508.11987 • Published Aug 16, 2025 • 72
Efficient Agents: Building Effective Agents While Reducing Cost Paper • 2508.02694 • Published Jul 24, 2025 • 86