From Runnable to Shippable: Multi-Agent Test-Driven Development for Generating Full-Stack Web Applications from Requirements Paper • 2605.17242 • Published 7 days ago • 12
From Reasoning Chains to Verifiable Subproblems: Curriculum Reinforcement Learning Enables Credit Assignment for LLM Reasoning Paper • 2605.22074 • Published 3 days ago • 2
CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence Paper • 2605.12882 • Published 11 days ago • 262
HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation Paper • 2604.28196 • Published 24 days ago • 71
LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model Paper • 2604.20796 • Published Apr 22 • 240
Crowded in B-Space: Calibrating Shared Directions for LoRA Merging Paper • 2604.16826 • Published Apr 18 • 18
An Efficient Heterogeneous Co-Design for Fine-Tuning on a Single GPU Paper • 2603.16428 • Published Mar 17 • 51
How Well Do Agentic Skills Work in the Wild: Benchmarking LLM Skill Usage in Realistic Settings Paper • 2604.04323 • Published Apr 6 • 41
DataFlex: A Unified Framework for Data-Centric Dynamic Training of Large Language Models Paper • 2603.26164 • Published Mar 27 • 364