UI-AGILE: Advancing GUI Agents with Effective Reinforcement Learning and Precise Inference-Time Grounding

[📖 Paper] [🤗 Checkpoints] [🤗 Data] [🤗 Daily Paper] [🚀 Github]

🔥 Overview

UI-AGILE is a framework designed to enhance Graphical User Interface (GUI) agents at both the training and inference stages. It addresses common challenges for agents built on Multimodal Large Language Models (MLLMs): poorly suited reasoning designs, ineffective rewards, and visual noise.

Key Features

Training Enhancements:
- Continuous Reward Function: Incentivizes high-precision grounding.
- "Simple Thinking" Reward: Balances planning depth with execution speed and grounding accuracy.
- Cropping-based Resampling: Mitigates the sparse reward problem and improves learning on complex tasks.
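
The continuous reward and cropping ideas above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `sharpness` and `scale` parameters, the exponential shaping, and the rule that a crop must keep the whole target box are all assumptions made for illustration. Boxes are integer `(x1, y1, x2, y2)` pixel coordinates.

```python
import math


def continuous_grounding_reward(point, bbox, sharpness=4.0):
    """Hypothetical reward: 0 outside the box, higher the closer to its center."""
    x, y = point
    x1, y1, x2, y2 = bbox
    if not (x1 <= x <= x2 and y1 <= y <= y2):
        return 0.0
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w, half_h = (x2 - x1) / 2, (y2 - y1) / 2
    # Normalized Chebyshev distance: 0 at the center, 1 on the box edge.
    dx = abs(x - cx) / half_w if half_w > 0 else 0.0
    dy = abs(y - cy) / half_h if half_h > 0 else 0.0
    d = max(dx, dy)
    # A hit always earns 1; the bonus decays smoothly with distance.
    return 1.0 + math.exp(-sharpness * d * d)


def crop_keeping_target(image_size, bbox, scale=0.5):
    """Hypothetical helper: pick a smaller crop that still contains bbox.

    Returns (crop_box, bbox_in_crop). Assumes bbox lies inside the image
    and 0 < scale <= 1.
    """
    width, height = image_size
    x1, y1, x2, y2 = bbox
    crop_w = max(int(width * scale), x2 - x1)
    crop_h = max(int(height * scale), y2 - y1)
    # Center the crop on the target, then clamp it to the image bounds.
    left = min(max((x1 + x2) // 2 - crop_w // 2, 0), width - crop_w)
    top = min(max((y1 + y2) // 2 - crop_h // 2, 0), height - crop_h)
    crop_box = (left, top, left + crop_w, top + crop_h)
    bbox_in_crop = (x1 - left, y1 - top, x2 - left, y2 - top)
    return crop_box, bbox_in_crop
```

With these assumptions, a click at the exact center of the target scores 2.0, a click on the box edge scores about 1.02, and a miss scores 0. A training loop could call `crop_keeping_target` on a sample whose rollouts all scored 0 and resample on the easier crop; the exact trigger and crop policy used by UI-AGILE are described in the paper, not here.
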
Inference Enhancements:
- Decomposed Grounding with Selection: Improves grounding accuracy on high-resolution displays by splitting the screenshot into smaller, manageable sub-images and selecting among the resulting candidates.
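
As a rough illustration of decomposed grounding with selection, the sketch below tiles a screenshot, runs a grounding call per tile, and keeps the highest-scoring candidate mapped back to full-image coordinates. Everything here is hypothetical: `split_into_tiles`, `ground_with_selection`, the grid and overlap defaults, and the `ground_fn` / `score_fn` callbacks (stand-ins for model calls) are illustrative names, not part of the UI-AGILE codebase.

```python
def split_into_tiles(width, height, cols=2, rows=2, overlap=0.1):
    """Return (left, top, right, bottom) boxes covering the image with overlap."""
    tile_w = width / cols
    tile_h = height / rows
    pad_x = int(tile_w * overlap)
    pad_y = int(tile_h * overlap)
    tiles = []
    for r in range(rows):
        for c in range(cols):
            # Grow each tile by the padding, clamped to the image bounds.
            left = max(int(c * tile_w) - pad_x, 0)
            top = max(int(r * tile_h) - pad_y, 0)
            right = min(int((c + 1) * tile_w) + pad_x, width)
            bottom = min(int((r + 1) * tile_h) + pad_y, height)
            tiles.append((left, top, right, bottom))
    return tiles


def ground_with_selection(width, height, ground_fn, score_fn, **tile_kwargs):
    """Ground in every tile and return the best point in full-image coordinates.

    ground_fn(tile) -> (x, y) in tile-local coordinates, or None (assumed API).
    score_fn(tile, point) -> float, higher is better (assumed API).
    """
    best = None
    for tile in split_into_tiles(width, height, **tile_kwargs):
        point = ground_fn(tile)
        if point is None:
            continue
        score = score_fn(tile, point)
        # Shift the tile-local prediction back into the original image frame.
        full_point = (tile[0] + point[0], tile[1] + point[1])
        if best is None or score > best[0]:
            best = (score, full_point)
    return None if best is None else best[1]
```

The overlap keeps elements that straddle a tile boundary fully visible in at least one tile. In a real pipeline, `ground_fn` would crop the tile and query the grounding model, and `score_fn` would rank candidates with a separate selection step.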

UI-AGILE-7B achieves state-of-the-art grounding performance on benchmarks like ScreenSpot-Pro and ScreenSpot-v2 while maintaining strong general agent capabilities.

⭐️ Citation

If you find this project useful, please cite:

@misc{lian2025uiagileadvancingguiagents,
      title={UI-AGILE: Advancing GUI Agents with Effective Reinforcement Learning and Precise Inference-Time Grounding}, 
      author={Shuquan Lian and Yuhang Wu and Jia Ma and Zihan Song and Bingqi Chen and Xiawu Zheng and Hui Li},
      year={2025},
      eprint={2507.22025},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2507.22025}, 
}

Model size: 4B params (Safetensors)
Tensor type: BF16

Paper for KDEGroup/UI-AGILE-3B: arXiv 2507.22025, published Jul 29, 2025

Evaluation results

Scores on likaixin/ScreenSpot-Pro (source: leaderboard):
- Overall: 45
- Android Studio (macOS): 42.5
- AutoCAD (Windows): 29.4
- Blender (Windows): 36.6
- DaVinci (macOS): 45.5
- EViews (Windows): 86
- Excel (macOS): 46.9
- Fruitloops (Windows): 28.1
- Illustrator (Windows): 25.8
- Inventor (Windows): 42.9