🧠 MemPO: Self-Memory Policy Optimization for Long-Horizon Agents

📌 Model Description

Model name: NewBeeKing/MemPO_Qwen2.5-SFT-RL

This model is the reinforcement learning (RL) optimized version of NewBeeKing/MemPO_Qwen2.5-SFT, trained using the MemPO algorithm on the NewBeeKing/MemPO_RL-train-dataset.

After downloading the model, refer to the code repository https://github.com/TheNewBeeKing/MemPO for instructions on testing or training it.
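For a quick smoke test before diving into the repository, the checkpoint can be loaded like any other Hugging Face causal LM. This is a minimal sketch, not the official usage from the MemPO repo: the `generate` helper below is a hypothetical name, and the sampling settings are illustrative defaults.

```python
MODEL_ID = "NewBeeKing/MemPO_Qwen2.5-SFT-RL"


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the checkpoint and return a completion for `prompt`.

    The import is deferred so the module can be inspected without
    pulling in transformers; weights download on the first call.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="auto",   # follow the stored F32 weights or your override
        device_map="auto",    # place layers on available GPU(s)/CPU
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens so only the new completion is returned.
    new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

For agentic, long-horizon evaluation (where MemPO's memory management actually matters), use the harness in the linked repository rather than single-turn generation.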

MemPO is designed for long-horizon agent tasks, where the interaction history with the environment can grow rapidly and hurt both performance and stability. Instead of relying only on external memory retrieval, MemPO enables the policy model itself to proactively summarize, retain, and manage memory during interaction.

By assigning credit based on memory effectiveness, MemPO helps the agent keep crucial information while discarding less useful context, yielding substantially better token efficiency without sacrificing task performance.


✨ Abstract

Long-horizon agents face the challenge of growing context size during interaction with the environment, which degrades performance and stability. We propose the self-memory policy optimization algorithm (MemPO), which enables the agent to autonomously summarize and manage its memory during interaction. By improving the credit assignment mechanism, the policy model can selectively retain crucial information, significantly reducing token consumption while preserving task performance. Extensive experiments confirm that MemPO achieves absolute F1 score gains of 25.98% over the base model while reducing token usage by up to 73.12%.


🧱 Base Model

NewBeeKing/MemPO_Qwen2.5-SFT

📚 Dataset

NewBeeKing/MemPO_RL-train-dataset

πŸ“ Citation

@misc{li2025mempo,
      title={MemPO: Self-Memory Policy Optimization for Long-Horizon Agents}, 
      author={Ruoran Li and Xinghua Zhang and Haiyang Yu and Shitong Duan and Xiang Li and Wenxin Xiang and Chonghua Liao and Xudong Guo and Yongbin Li and Jinli Suo},
      year={2025},
      eprint={2603.00680},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2603.00680}, 
}
Model size: 7B parameters · Tensor type: F32 (Safetensors)