---
license: apache-2.0
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
pipeline_tag: image-text-to-text
---
| <p align="center"> | |
| <img src="https://github.com/HansenHua/EAPO-ICML26/raw/main/introduction.png" width="90%"></img> | |
| </p> | |
| EAPO (Exploration-Aware Policy Optimization) is a reinforcement learning framework for training agentic large language models to perform adaptive exploration during test-time interaction. Unlike prior methods that apply exploration uniformly across all states, EAPO enables agents to selectively explore only when environmental uncertainty is high, improving long-horizon reasoning and decision making in interactive environments such as GUI control, web navigation, and embodied tasks. | |
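The exact EAPO training objective is specified in the paper; purely as a rough, hypothetical illustration of uncertainty-gated exploration, the sketch below pays a per-step exploration bonus only where the policy's action-distribution entropy exceeds a threshold. The function name, threshold, and scale `beta` are illustrative placeholders, not values or code from the paper.

```python
import torch

def gated_exploration_bonus(logits: torch.Tensor,
                            entropy_threshold: float = 2.0,
                            beta: float = 0.01) -> torch.Tensor:
    """Illustrative uncertainty-gated exploration bonus (hypothetical, not the paper's objective).

    logits: (num_steps, num_actions) per-step action logits from the policy.
    Returns a per-step bonus that is non-zero only at high-entropy (uncertain) steps.
    """
    probs = torch.softmax(logits, dim=-1)
    entropy = -(probs * torch.log(probs.clamp_min(1e-9))).sum(dim=-1)  # per-step uncertainty
    gate = (entropy > entropy_threshold).float()                       # 1 only when uncertain
    return beta * gate * entropy                                       # added to the per-step reward

# Toy usage: 10 decision steps over 32 candidate actions.
bonus = gated_exploration_bonus(torch.randn(10, 32))
print(bonus)  # positive only at steps whose entropy exceeds the threshold
```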
| <p align="center"> | |
| <img src="https://github.com/HansenHua/EAPO-ICML26/raw/main/performance.jpg" width="50%"></img> | |
| </p> | |
Paper: Learning to Explore: Scaling Agentic Reasoning via Exploration-Aware Policy Optimization (https://arxiv.org/abs/2605.08978)

Code: https://github.com/HansenHua/EAPO-ICML26
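Because the model is fine-tuned from Qwen/Qwen2.5-VL-7B-Instruct, inference presumably follows the standard Qwen2.5-VL interface in 🤗 Transformers. A minimal sketch under that assumption (requires `transformers >= 4.49` and `qwen-vl-utils`); substitute this repository's model ID for the base-model ID, and note that the screenshot path and instruction are placeholders:

```python
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model_id = "Qwen/Qwen2.5-VL-7B-Instruct"  # substitute this repository's model ID
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Illustrative GUI-control style prompt; the image path is a placeholder.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "file:///path/to/screenshot.png"},
            {"type": "text", "text": "Open the settings menu and enable dark mode."},
        ],
    }
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=256)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```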