nielsr (HF Staff) committed on
Commit 2a30b90 · verified · 1 Parent(s): 6b1249d

Improve model card and add metadata

This PR improves the model card for the EAPO model. It adds:
- Relevant metadata: `pipeline_tag: image-text-to-text` and `library_name: transformers`.
- Links to the paper ([Learning to Explore: Scaling Agentic Reasoning via Exploration-Aware Policy Optimization](https://huggingface.co/papers/2605.08978)), the project page, and the official GitHub repository.
- A brief description of the model based on the abstract.
- The correct BibTeX citation.
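
One way to sanity-check the metadata listed above is to parse the README front matter and confirm that the expected keys are present. The sketch below is illustrative only: `parse_front_matter` and `README_HEAD` are hypothetical names that do not come from this PR, and the parser assumes flat `key: value` pairs rather than full YAML.

```python
def parse_front_matter(readme_text: str) -> dict[str, str]:
    """Parse a flat `key: value` front-matter block delimited by `---` lines.

    Hypothetical helper for illustration; it does not handle nested YAML.
    """
    lines = readme_text.splitlines()
    # Front matter must start on the very first line.
    if not lines or lines[0].strip() != "---":
        return {}
    metadata: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            # Closing delimiter reached: the block is complete.
            return metadata
        key, sep, value = line.partition(":")
        if sep and key.strip():
            metadata[key.strip()] = value.strip()
    # No closing delimiter was found, so treat the file as having no front matter.
    return {}


# Head of the README as it appears after this PR (title shortened here).
README_HEAD = """---
license: apache-2.0
library_name: transformers
pipeline_tag: image-text-to-text
---

# Learning to Explore
"""

metadata = parse_front_matter(README_HEAD)
expected_keys = ("license", "library_name", "pipeline_tag")
missing = [key for key in expected_keys if key not in metadata]
print(metadata)
print("missing keys:", missing)
```

A real check would more likely use a full YAML parser; the hand-rolled version is used here only to keep the sketch dependency-free.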

Files changed (1)
  1. README.md +37 -3
README.md CHANGED
@@ -1,3 +1,37 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ library_name: transformers
+ pipeline_tag: image-text-to-text
+ ---
+
+ # Learning to Explore: Scaling Agentic Reasoning via Exploration-Aware Policy Optimization (EAPO)
+
+ [**📄 Paper**](https://arxiv.org/abs/2605.08978) | [**🌐 Website**](https://xingyuan-project.github.io/m2cl.github.io/) | [**💻 Code**](https://github.com/HansenHua/EAPO-ICML26)
+
+ This repository contains the model weights for **EAPO**, presented in the paper [Learning to Explore: Scaling Agentic Reasoning via Exploration-Aware Policy Optimization](https://huggingface.co/papers/2605.08978), which was accepted at ICML 2026.
+
+ ## Introduction
+
+ EAPO is an exploration-aware reinforcement learning framework that enables LLM agents to adaptively explore only when uncertainty is high. It introduces a fine-grained reward function via variational inference to evaluate exploratory actions and an exploration-aware grouping mechanism to separate exploratory actions from task-completion actions during optimization.
+
+ The model is based on the **Qwen2.5-VL** architecture and demonstrates consistent improvements across challenging text-based and GUI-based agent benchmarks.
+
+ ## Resources
+
+ - **Paper:** [arXiv:2605.08978](https://arxiv.org/abs/2605.08978)
+ - **Repository:** [GitHub - HansenHua/EAPO-ICML26](https://github.com/HansenHua/EAPO-ICML26)
+ - **Project Page:** [Website](https://xingyuan-project.github.io/m2cl.github.io/)
+
+ ## Citation
+
+ If you find our paper and code useful in your research, please consider citing:
+
+ ```bibtex
+ @inproceedings{
+ hua2026learning,
+ title={Learning to Explore: Scaling Agentic Reasoning via Exploration-Aware Policy Optimization},
+ author={Xingyuan Hua and Sheng Yue and Ju Ren},
+ booktitle={The Forty-third International Conference on Learning Representations},
35
+ year={2026}
36
+ }
37
+ ```