UI-AGILE: Advancing GUI Agents with Effective Reinforcement Learning and Precise Inference-Time Grounding
Paper: arXiv:2507.22025
UI-AGILE is a framework that improves Graphical User Interface (GUI) agents at both the training and inference stages. It targets three common problems in agents built on Multimodal Large Language Models (MLLMs): a dilemma in reasoning design, ineffective rewards, and visual noise.
UI-AGILE-7B achieves state-of-the-art grounding performance on benchmarks like ScreenSpot-Pro and ScreenSpot-v2 while maintaining strong general agent capabilities.
If you find this project useful, please cite:
@misc{lian2025uiagileadvancingguiagents,
      title={UI-AGILE: Advancing GUI Agents with Effective Reinforcement Learning and Precise Inference-Time Grounding},
      author={Shuquan Lian and Yuhang Wu and Jia Ma and Zihan Song and Bingqi Chen and Xiawu Zheng and Hui Li},
      year={2025},
      eprint={2507.22025},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2507.22025},
}