- End-to-End Goal-Driven Web Navigation
  Paper • 1602.02261 • Published
- Learning Language Games through Interaction
  Paper • 1606.02447 • Published
- Naturalizing a Programming Language via Interactive Learning
  Paper • 1704.06956 • Published
- Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration
  Paper • 1802.08802 • Published • 2
Collections
Collections including paper arxiv:2512.15431
- Guided Self-Evolving LLMs with Minimal Human Supervision
  Paper • 2512.02472 • Published • 55
- DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle
  Paper • 2512.04324 • Published • 159
- QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management
  Paper • 2512.12967 • Published • 111
- Step-GUI Technical Report
  Paper • 2512.15431 • Published • 133

- UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents
  Paper • 2505.21496 • Published • 38
- Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation
  Paper • 2506.04614 • Published • 19
- ComfyUI-R1: Exploring Reasoning Models for Workflow Generation
  Paper • 2506.09790 • Published • 53
- WebSailor: Navigating Super-human Reasoning for Web Agent
  Paper • 2507.02592 • Published • 126

- Gemini Robotics: Bringing AI into the Physical World
  Paper • 2503.20020 • Published • 31
- Magma: A Foundation Model for Multimodal AI Agents
  Paper • 2502.13130 • Published • 58
- LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
  Paper • 2311.05437 • Published • 51
- OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
  Paper • 2410.23218 • Published • 49

- AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning
  Paper • 2402.15506 • Published • 18
- AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent
  Paper • 2404.03648 • Published • 29
- Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts
  Paper • 2405.19893 • Published • 34
- Parrot: Efficient Serving of LLM-based Applications with Semantic Variable
  Paper • 2405.19888 • Published • 7

- Grounding Computer Use Agents on Human Demonstrations
  Paper • 2511.07332 • Published • 107
- Qwen3-VL Technical Report
  Paper • 2511.21631 • Published • 161
- Step-GUI Technical Report
  Paper • 2512.15431 • Published • 133
- MAI-UI Technical Report: Real-World Centric Foundation GUI Agents
  Paper • 2512.22047 • Published • 30

- Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
  Paper • 2412.04454 • Published • 71
- GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents
  Paper • 2506.03143 • Published • 54
- Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning
  Paper • 2505.12370 • Published
- UIShift: Enhancing VLM-based GUI Agents through Self-supervised Reinforcement Learning
  Paper • 2505.12493 • Published

- lusxvr/nanoVLM-222M
  Image-Text-to-Text • 0.2B • Updated • 208 • 99
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
  Paper • 2503.09516 • Published • 39
- AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
  Paper • 2505.24863 • Published • 97
- QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
  Paper • 2505.17667 • Published • 88

- Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models
  Paper • 2410.02740 • Published • 53
- From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging
  Paper • 2410.01215 • Published • 39
- Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
  Paper • 2409.17146 • Published • 121
- EuroLLM: Multilingual Language Models for Europe
  Paper • 2409.16235 • Published • 29

- Can Large Language Models Understand Context?
  Paper • 2402.00858 • Published • 24
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 85
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 153
- SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
  Paper • 2401.17072 • Published • 25