🧠 MemPO: Self-Memory Policy Optimization for Long-Horizon Agents

📌 Model Description

Model name: NewBeeKing/MemPO_Qwen2.5-SFT-RL

This model is the reinforcement learning (RL) optimized version of NewBeeKing/MemPO_Qwen2.5-SFT, trained using the MemPO algorithm on the NewBeeKing/MemPO_RL-train-dataset.

After downloading the model, refer to the code repository https://github.com/TheNewBeeKing/MemPO for instructions on testing or training it.
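For a quick smoke test before diving into the repository, the checkpoint can be loaded like any other Hugging Face causal LM. This is a minimal sketch, not the official usage from the MemPO repo: the `generate` helper below is a hypothetical name, and the sampling settings are illustrative defaults.

```python
MODEL_ID = "NewBeeKing/MemPO_Qwen2.5-SFT-RL"


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the checkpoint and return a completion for `prompt`.

    The import is deferred so the module can be inspected without
    pulling in transformers; weights download on the first call.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="auto",   # follow the stored F32 weights or your override
        device_map="auto",    # place layers on available GPU(s)/CPU
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens so only the new completion is returned.
    new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

For agentic, long-horizon evaluation (where MemPO's memory management actually matters), use the harness in the linked repository rather than single-turn generation.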

MemPO is designed for long-horizon agent tasks, where the interaction history with the environment can grow rapidly and hurt both performance and stability. Instead of relying only on external memory retrieval, MemPO enables the policy model itself to proactively summarize, retain, and manage memory during interaction.

By assigning credit based on memory effectiveness, MemPO helps the agent keep crucial information while discarding less useful context, yielding substantially better token efficiency without sacrificing task performance.


✨ Abstract

Long-horizon agents face the challenge of growing context size during interaction with the environment, which degrades performance and stability. We propose the self-memory policy optimization algorithm (MemPO), which enables the agent to autonomously summarize and manage its memory during interaction. By improving the credit assignment mechanism, the policy model can selectively retain crucial information, significantly reducing token consumption while preserving task performance. Extensive experiments confirm that MemPO achieves absolute F1 score gains of 25.98% over the base model while reducing token usage by up to 73.12%.


🧱 Base Model

NewBeeKing/MemPO_Qwen2.5-SFT

📚 Dataset

NewBeeKing/MemPO_RL-train-dataset

πŸ“ Citation

@misc{li2025mempo,
      title={MemPO: Self-Memory Policy Optimization for Long-Horizon Agents}, 
      author={Ruoran Li and Xinghua Zhang and Haiyang Yu and Shitong Duan and Xiang Li and Wenxin Xiang and Chonghua Liao and Xudong Guo and Yongbin Li and Jinli Suo},
      year={2025},
      eprint={2603.00680},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2603.00680}, 
}
Model size: 7B parameters · Tensor type: F32 (Safetensors)