---
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen2.5-7B-Instruct/blob/main/LICENSE
language:
- en
pipeline_tag: text-generation
base_model: Qwen/Qwen2.5-7B
tags:
- chat
---
# MemAgent (`agentic/memagent/`)

Code: https://github.com/LMIS-ORG/slime-agentic
Reproduces the core idea of MemAgent: compressing arbitrarily long documents into a fixed-size recurrent memory via a chunk-by-chunk LLM update loop, then answering questions from memory alone. RL (GRPO) is applied to all memory-update turns using a Multi-Conversation training objective, so the model learns to retain what matters across chunks without ever seeing the full context at once.
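The chunk-by-chunk update loop described above can be sketched as follows. This is a minimal illustration, not the repository's implementation: `call_llm` is a hypothetical stand-in for a chat-completion client, the prompts are illustrative rather than the trained templates, and the whitespace splitter is a proxy for a real tokenizer-based one.

```python
# Minimal sketch of the MemAgent chunked-memory loop (inference side).
# Assumptions: `call_llm(prompt) -> str` is supplied by the caller;
# prompt wording and chunking are illustrative only.

def split_into_chunks(text: str, chunk_tokens: int = 4096) -> list[str]:
    # Whitespace-token proxy for a tokenizer-based splitter.
    words = text.split()
    return [" ".join(words[i:i + chunk_tokens])
            for i in range(0, len(words), chunk_tokens)]

def answer_long_document(question: str, document: str, call_llm) -> str:
    memory = "No previous memory"
    for chunk in split_into_chunks(document):
        # Each turn overwrites the fixed-size memory; the full
        # document is never in context at once.
        memory = call_llm(
            f"Question: {question}\nMemory: {memory}\nChunk: {chunk}\n"
            "Update the memory with information relevant to the question."
        )
    # The final answer is produced from the memory alone.
    return call_llm(
        f"Question: {question}\nMemory: {memory}\n"
        "Answer the question, putting the final answer in \\boxed{}."
    )
```

Because memory size is fixed, per-step context cost is constant regardless of document length; only the number of update turns grows.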
## Architecture

```
Input: question + long document
    │
    ▼
memory = "No previous memory"
    │
┌──►for chunk in split(document, chunk_tokens):
│       │
└────── LLM(problem, memory, chunk) → updated memory   (loss_mask=1)
    │
    ▼
LLM(problem, memory) → final answer in \boxed{}        (loss_mask=0)
    │
    ▼
Reward: exact-match / F1 against ground truth
        (distributed evenly across all memory-update turns)
```
Each memory-update turn becomes an independent training sequence. The reward is amortised evenly across all update turns in the conversation (via `custom_convert`), matching the Multi-Conv RL objective in the MemAgent paper.
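The reward split above can be sketched in a few lines. This is an assumption-laden illustration of the Multi-Conv objective, not the actual `custom_convert` code; the `Turn` record and `to_training_samples` name are hypothetical.

```python
# Sketch of the Multi-Conv reward split: a trajectory-level reward is
# divided evenly over all memory-update turns (loss_mask=1), each of
# which is treated as an independent training sample. Names here are
# illustrative, not the repository's `custom_convert` implementation.

from dataclasses import dataclass

@dataclass
class Turn:
    tokens: list          # prompt + response tokens for this turn
    loss_mask: int        # 1 for memory-update turns, 0 for the answer turn

def to_training_samples(turns: list, reward: float):
    """Return (tokens, per_turn_reward) for every loss-masked turn."""
    update_turns = [t for t in turns if t.loss_mask == 1]
    per_turn = reward / len(update_turns) if update_turns else 0.0
    return [(t.tokens, per_turn) for t in update_turns]
```

Note that the final answer turn (loss_mask=0) contributes no training sample: it only determines the reward that is shared among the update turns.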
## Results
Evaluated on RULER-HQA across context lengths from 7K to 448K tokens (5 runs, best score reported):
| Model | 7K | 14K | 28K | 56K | 112K | 224K | 448K |
|---|---|---|---|---|---|---|---|
| MemAgent (ours) | 78.12 | 76.56 | 75.78 | 74.22 | 77.34 | 72.66 | 69.53 |
| QwenLong-L1-32B | 72.66 | 75.00 | 72.66 | 60.94 | 31.25 | 17.19 | 13.28 |
| Qwen2.5-Instruct-14B-1M | 60.16 | 60.94 | 50.00 | 57.03 | 50.00 | 37.50 | 8.59 |
| Qwen2.5-Instruct-7B-1M | 61.72 | 56.25 | 53.91 | 55.47 | 51.56 | 33.59 | 12.50 |
| DS-Distill-Qwen-32B | 70.31 | 66.41 | 65.62 | 46.88 | 23.44 | 13.28 | 7.81 |
| DS-Distill-Qwen-14B | 64.06 | 64.84 | 57.03 | 40.62 | 14.84 | 8.59 | 3.12 |
| DS-Distill-Qwen-7B | 30.47 | 12.50 | 3.12 | 0.00 | 0.00 | 0.78 | 0.00 |
MemAgent (ours) is trained from a 7B base model and consistently outperforms all baselines, including much larger models, at every context length.