license: apache-2.0
language:
  - en
tags:
  - large-language-model
  - multi-agent-systems
  - reinforcement-learning
  - agentic-ai
  - code
  - math

MetaAgent-X: Breaking the Ceiling of Automatic Multi-Agent Systems via End-to-End Reinforcement Learning

Paper 📑

Codebase 🚗

Project Page 🏆

Overview

MetaAgent-X is an end-to-end reinforcement learning framework for autonomous multi-agent systems.

Unlike conventional automatic MAS methods that rely on frozen models, hand-crafted prompts, or search-based workflows, MetaAgent-X trains one shared model to both design a multi-agent system and execute it. The model learns to generate task-adaptive agent roles, collaboration structures, and execution strategies through reinforcement learning.
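The design-then-execute idea described above can be illustrated with a minimal sketch. This is not the MetaAgent-X implementation: `Model`, `AgentSpec`, `design_mas`, `execute_mas`, and `stub_model` are hypothetical names, the JSON design format is an assumption, and the collaboration structure is reduced to a simple sequential chain. The only point it shows is that one model object serves both phases.

```python
import json
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for the shared policy: any function mapping a prompt to text.
Model = Callable[[str], str]


@dataclass
class AgentSpec:
    role: str
    instruction: str


def design_mas(model: Model, task: str) -> List[AgentSpec]:
    """Design phase: ask the model for a task-specific list of agents (as JSON)."""
    raw = model(
        f"DESIGN\nTask: {task}\n"
        "Return a JSON list of agents with 'role' and 'instruction'."
    )
    return [AgentSpec(a["role"], a["instruction"]) for a in json.loads(raw)]


def execute_mas(model: Model, task: str, agents: List[AgentSpec]) -> str:
    """Execution phase: the same model plays each role in turn (sequential chain)."""
    context = ""
    for agent in agents:
        prompt = (
            f"EXECUTE\nRole: {agent.role}\nInstruction: {agent.instruction}\n"
            f"Task: {task}\nPrevious output: {context}"
        )
        context = model(prompt)
    return context


def stub_model(prompt: str) -> str:
    """Toy deterministic model so the sketch runs without any LLM."""
    if prompt.startswith("DESIGN"):
        return json.dumps([
            {"role": "planner", "instruction": "Outline a solution."},
            {"role": "solver", "instruction": "Produce the final answer."},
        ])
    role = prompt.split("Role: ")[1].split("\n")[0]
    return f"[{role}] done"


agents = design_mas(stub_model, "Add two numbers")
answer = execute_mas(stub_model, "Add two numbers", agents)
print([a.role for a in agents], answer)
```

In a real system the stub would be replaced by calls to the trained model, and the designed structure could be richer than a chain (for example, parallel agents or review loops); the sketch leaves those choices open.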

MetaAgent-X adapts across domains and reports state-of-the-art performance on code benchmarks (LiveCodeBench, APPS, CodeContests) and math benchmarks (AIME24, AIME25, OlympiadBench).

Key Features

- One model for both design and execution: the same model acts as both the MAS designer and the task executor.
- End-to-end reinforcement learning: the model is optimized directly from downstream task outcomes.
- Autonomous multi-agent system generation: the model learns to construct and execute agent swarms for complex reasoning tasks.
- Cross-domain generalization: strong performance on both coding and mathematical reasoning benchmarks.
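To make "optimized directly from downstream task outcomes" concrete, here is a hedged sketch of outcome-based credit assignment. The source does not specify the RL algorithm, so everything here is an assumption for illustration: the binary exact-match reward, the mean-reward baseline, and the names `Step`, `Rollout`, `outcome_reward`, and `step_advantages` are all hypothetical. The sketch only shows a single task-level reward being shared by both design and execution steps of the same rollout.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    phase: str  # "design" or "execute"
    text: str   # what the model generated at this step


@dataclass
class Rollout:
    steps: List[Step]
    final_answer: str


def outcome_reward(rollout: Rollout, reference: str) -> float:
    """Hypothetical binary reward: 1.0 if the final answer matches, else 0.0."""
    return 1.0 if rollout.final_answer.strip() == reference.strip() else 0.0


def step_advantages(rollouts: List[Rollout], reference: str) -> List[List[float]]:
    """Give every step of a rollout the same baseline-subtracted outcome reward."""
    rewards = [outcome_reward(r, reference) for r in rollouts]
    baseline = sum(rewards) / len(rewards)  # mean reward over the sampled rollouts
    return [[reward - baseline] * len(r.steps) for r, reward in zip(rollouts, rewards)]


rollouts = [
    Rollout(
        [Step("design", "planner + solver"), Step("execute", "plan"), Step("execute", "42")],
        final_answer="42",
    ),
    Rollout(
        [Step("design", "solo solver"), Step("execute", "41")],
        final_answer="41",
    ),
]
advantages = step_advantages(rollouts, reference="42")
print(advantages)  # [[0.5, 0.5, 0.5], [-0.5, -0.5]]
```

Because the design step receives the same signal as the execution steps, a design that leads to a correct final answer is reinforced together with the execution that followed it.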

Results

The following table reports the performance of MetaAgent-X_RL.
Numbers in parentheses denote absolute gains over the single-agent baseline.

| Domain | Benchmark | MetaAgent-X_RL |
|---|---|---|
| Code | LiveCodeBench | 41.00 |
| Code | APPS | 38.00 |
| Code | CodeContests | 17.00 |
| Math | AIME24 | 40.00 |
| Math | AIME25 | 33.33 |
| Math | OlympiadBench | 61.00 |
| Overall | Average | 38.33 |

Citation

@misc{zhang2026metaagentxbreakingceiling,
      title={MetaAgent-X: Breaking the Ceiling of Automatic Multi-Agent Systems via End-to-End Reinforcement Learning},
      author={Yaolun Zhang and Yujie Zhao and Nan Wang and Yiran Wu and Jiayu Chang and Yizhao Chen and Qingyun Wu and Jishen Zhao and Huazheng Wang},
      year={2026},
      eprint={2605.14212},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2605.14212}, 
}