-
End-to-End Goal-Driven Web Navigation
Paper • 1602.02261 • Published
Learning Language Games through Interaction
Paper • 1606.02447 • Published
Naturalizing a Programming Language via Interactive Learning
Paper • 1704.06956 • Published
Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration
Paper • 1802.08802 • Published • 2
Collections
Collections including paper arxiv:2602.09856
-
Code2World: A GUI World Model via Renderable Code Generation
Paper • 2602.09856 • Published • 202
How2Everything: Mining the Web for How-To Procedures to Evaluate and Improve LLMs
Paper • 2602.08808 • Published • 9
Thinking Makes LLM Agents Introverted: How Mandatory Thinking Can Backfire in User-Engaged Agents
Paper • 2602.07796 • Published • 7
QP-OneModel: A Unified Generative LLM for Multi-Task Query Understanding in Xiaohongshu Search
Paper • 2602.09901 • Published • 6
-
Towards Pixel-Level VLM Perception via Simple Points Prediction
Paper • 2601.19228 • Published • 18
Post-LayerNorm Is Back: Stable, ExpressivE, and Deep
Paper • 2601.19895 • Published • 27
Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision
Paper • 2601.19798 • Published • 43
OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models
Paper • 2601.21639 • Published • 51
-
The Debugging Decay Index: Rethinking Debugging Strategies for Code LLMs
Paper • 2506.18403 • Published • 3
ReCode: Updating Code API Knowledge with Reinforcement Learning
Paper • 2506.20495 • Published • 10
SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution
Paper • 2507.23348 • Published • 12
LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering
Paper • 2509.09614 • Published • 7
-
SWE-Universe: Scale Real-World Verifiable Environments to Millions
Paper • 2602.02361 • Published • 60
LongCodeZip: Compress Long Context for Code Language Models
Paper • 2510.00446 • Published • 108
Code2World: A GUI World Model via Renderable Code Generation
Paper • 2602.09856 • Published • 202
Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces
Paper • 2601.11868 • Published • 35
-
THINKSAFE: Self-Generated Safety Alignment for Reasoning Models
Paper • 2601.23143 • Published • 39
PaperBanana: Automating Academic Illustration for AI Scientists
Paper • 2601.23265 • Published • 223
Agentic Reasoning for Large Language Models
Paper • 2601.12538 • Published • 203
BabyVision: Visual Reasoning Beyond Language
Paper • 2601.06521 • Published • 201
-
Towards Scalable Pre-training of Visual Tokenizers for Generation
Paper • 2512.13687 • Published • 106
MMGR: Multi-Modal Generative Reasoning
Paper • 2512.14691 • Published • 121
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss
Paper • 2512.23447 • Published • 99
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation
Paper • 2512.23576 • Published • 66
-
CoRAG: Collaborative Retrieval-Augmented Generation
Paper • 2504.01883 • Published • 9
SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning
Paper • 2504.08600 • Published • 33
Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL
Paper • 2503.23157 • Published • 10
AI Agents: Evolution, Architecture, and Real-World Applications
Paper • 2503.12687 • Published • 2