OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration Paper • 2602.05400 • Published Feb 5 • 352
OpenWorldLib: A Unified Codebase and Definition of Advanced World Models Paper • 2604.04707 • Published 7 days ago • 199
FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization Paper • 2603.19835 • Published 23 days ago • 330
DataFlex: A Unified Framework for Data-Centric Dynamic Training of Large Language Models Paper • 2603.26164 • Published 16 days ago • 347
OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis Paper • 2603.20278 • Published 26 days ago • 94
BrowseComp-V^3: A Visual, Vertical, and Verifiable Benchmark for Multimodal Browsing Agents Paper • 2602.12876 • Published Feb 13 • 12
view article Article Build an Agent That Thinks Like a Data Scientist: How We Hit #1 on DABStep with Reusable Tool Generation about 1 month ago • 15
FS-Researcher: Test-Time Scaling for Long-Horizon Research Tasks with File-System-Based Agents Paper • 2602.01566 • Published Feb 2 • 52
TokSuite: Measuring the Impact of Tokenizer Choice on Language Model Behavior Paper • 2512.20757 • Published Dec 23, 2025 • 18
DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation Paper • 2601.09688 • Published Jan 14 • 127
Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning Paper • 2601.06943 • Published Jan 11 • 215
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published Jan 8 • 230