OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language World Models Paper • 2604.10866 • Published 5 days ago • 58
Omni-SimpleMem: Autoresearch-Guided Discovery of Lifelong Multimodal Agent Memory Paper • 2604.01007 • Published 16 days ago • 31
Lost in Stories: Consistency Bugs in Long Story Generation by LLMs Paper • 2603.05890 • Published Mar 6 • 93
ACE: Attribution-Controlled Knowledge Editing for Multi-hop Factual Recall Paper • 2510.07896 • Published Oct 9, 2025 • 8
CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models Paper • 2602.17684 • Published Feb 4 • 22
Does Your Reasoning Model Implicitly Know When to Stop Thinking? Paper • 2602.08354 • Published Feb 9 • 263
Secure Code Generation via Online Reinforcement Learning with Vulnerability Reward Model Paper • 2602.07422 • Published Feb 7 • 22
Probability-Entropy Calibration: An Elastic Indicator for Adaptive Fine-tuning Paper • 2602.01745 • Published Feb 2 • 7
Improving Data and Reward Design for Scientific Reasoning in Large Language Models Paper • 2602.08321 • Published Feb 9 • 43
MSign: An Optimizer Preventing Training Instability in Large Language Models via Stable Rank Restoration Paper • 2602.01734 • Published Feb 2 • 32
PaperBanana: Automating Academic Illustration for AI Scientists Paper • 2601.23265 • Published Jan 30 • 223
MMDeepResearch-Bench: A Benchmark for Multimodal Deep Research Agents Paper • 2601.12346 • Published Jan 18 • 52
MegaFlow: Large-Scale Distributed Orchestration System for the Agentic Era Paper • 2601.07526 • Published Jan 12 • 23
JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence Paper • 2510.23538 • Published Oct 27, 2025 • 98
Scaling Language-Centric Omnimodal Representation Learning Paper • 2510.11693 • Published Oct 13, 2025 • 107
StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs Paper • 2509.22220 • Published Sep 26, 2025 • 66
UltraIF series Collection Open-Sourced model and data for ULTRAIF: Advancing Instruction Following from the Wild. • 6 items • Updated Apr 3, 2025 • 3
Well Begun is Half Done: Low-resource Preference Alignment by Weak-to-Strong Decoding Paper • 2506.07434 • Published Jun 9, 2025 • 7