Seedance 2.0: Advancing Video Generation for World Complexity Paper • 2604.14148 • Published 3 days ago • 135
Turing Test on Screen: A Benchmark for Mobile GUI Agent Humanization Paper • 2604.09574 • Published Feb 24 • 30
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe Paper • 2604.13016 • Published 4 days ago • 77
Article Red Teaming with RL: Exploiting Tinker API for Harmful RL on 235B Model Jan 1 • 19
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability Paper • 2604.06628 • Published 10 days ago • 316
Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs? Paper • 2603.24472 • Published 24 days ago • 53
On the Role of Reasoning Patterns in the Generalization Discrepancy of Long Chain-of-Thought Supervised Fine-Tuning Paper • 2604.01702 • Published 14 days ago • 3
SIM1: Physics-Aligned Simulator as Zero-Shot Data Scaler in Deformable Worlds Paper • 2604.08544 • Published 9 days ago • 16
ATBench: A Diverse and Realistic Trajectory Benchmark for Long-Horizon Agent Safety Paper • 2604.02022 • Published 16 days ago • 15
Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents Paper • 2604.06132 • Published 11 days ago • 114
Rethink_SFT_generalization Collection Repo for paper Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability. • 40 items • Updated 7 days ago • 16
Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development Paper • 2603.27460 • Published 20 days ago • 68
OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data Paper • 2603.15594 • Published Mar 16 • 149
Code2Math: Can Your Code Agent Effectively Evolve Math Problems Through Exploration? Paper • 2603.03202 • Published Mar 3 • 17
Data Repetition Beats Data Scaling in Long-CoT Supervised Fine-Tuning Paper • 2602.11149 • Published Feb 11 • 17
Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report v1.5 Paper • 2602.14457 • Published Feb 16 • 29
Dr. Kernel: Reinforcement Learning Done Right for Triton Kernel Generations Paper • 2602.05885 • Published Feb 5 • 28
ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation Paper • 2601.21420 • Published Jan 29 • 42