metadata
license: apache-2.0
language:
  - en
tags:
  - large-language-model
  - multi-agent-systems
  - reinforcement-learning
  - agentic-ai
  - code
  - math

MetaAgent-X: Breaking the Ceiling of Automatic Multi-Agent Systems via End-to-End Reinforcement Learning

Paper 📑

Codebase 🚗

Project Page 🏆

Overview

MetaAgent-X is an end-to-end reinforcement learning framework for autonomous multi-agent systems.

Unlike conventional automatic multi-agent system (MAS) methods that rely on frozen models, hand-crafted prompts, or search-based workflows, MetaAgent-X trains a single shared model both to design a multi-agent system and to execute it. Through reinforcement learning, the model learns to generate task-adaptive agent roles, collaboration structures, and execution strategies.
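To make the "design, then execute" split concrete, here is a minimal Python sketch in which one model callable is prompted twice: first to emit a list of agent roles, then to play each of those roles in turn. Everything in it (`design_mas`, `execute_mas`, `AgentSpec`, the prompt formats, and the deterministic `toy_model` standing in for an LLM) is hypothetical and not taken from the MetaAgent-X codebase. It only illustrates that the designer and the executors share one set of weights.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interface: a "model" is any callable mapping a prompt to a completion.
Model = Callable[[str], str]


@dataclass
class AgentSpec:
    """One agent role in a generated multi-agent system (hypothetical schema)."""
    name: str
    instruction: str


def design_mas(model: Model, task: str) -> List[AgentSpec]:
    """Designer phase: the model proposes roles, one per line, as 'name: instruction'."""
    plan = model(f"DESIGN\nTask: {task}\nList agent roles as 'name: instruction'.")
    agents = []
    for line in plan.splitlines():
        if ":" in line:
            name, instruction = line.split(":", 1)
            agents.append(AgentSpec(name.strip(), instruction.strip()))
    return agents


def execute_mas(model: Model, task: str, agents: List[AgentSpec]) -> str:
    """Executor phase: the SAME model plays each role in turn and sees earlier messages."""
    transcript, output = "", ""
    for agent in agents:
        prompt = (f"EXECUTE as {agent.name}\nRole: {agent.instruction}\n"
                  f"Task: {task}\nSo far:\n{transcript}")
        output = model(prompt)
        transcript += f"[{agent.name}] {output}\n"
    return output  # the last agent's message is taken as the final answer


def toy_model(prompt: str) -> str:
    """Deterministic stand-in for an LLM so the sketch runs end to end."""
    if prompt.startswith("DESIGN"):
        return "solver: compute the answer\nverifier: recheck the solver's answer"
    task_line = next(ln for ln in prompt.splitlines() if ln.startswith("Task:"))
    expected = sum(int(n) for n in re.findall(r"\d+", task_line))
    if prompt.startswith("EXECUTE as solver"):
        return str(expected)
    # Verifier: accept the solver's answer only if it matches a recomputation.
    claimed = re.findall(r"\[solver\] (\d+)", prompt)
    return claimed[-1] if claimed and int(claimed[-1]) == expected else str(expected)


task = "What is 19 + 23?"
agents = design_mas(toy_model, task)
answer = execute_mas(toy_model, task, agents)
print([a.name for a in agents], answer)  # ['solver', 'verifier'] 42
```

The README says the model also generates collaboration structures and execution strategies; the flat sequential pipeline used here is a simplifying assumption to keep the sketch short.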

MetaAgent-X demonstrates strong cross-domain adaptation and achieves state-of-the-art performance across both code and math benchmarks.

Key Features

  • One model for both design and execution: the same model acts as both the MAS designer and the task executor.
  • End-to-end reinforcement learning: the model is optimized directly from downstream task outcomes.
  • Autonomous multi-agent system generation: the model learns to construct and execute agent swarms for complex reasoning tasks.
  • Cross-domain generalization: strong performance on both coding and mathematical reasoning benchmarks.
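The README does not name the RL algorithm, so the following is only a sketch under an assumed GRPO-style objective: several rollouts of the same task are scored by their downstream outcome, rewards are normalized within the group, and each rollout's advantage is applied to every token the shared model produced, in both the design turn and the execution turns. `Rollout`, `group_advantages`, and `policy_gradient_loss` are hypothetical names, and the log-probabilities are made-up inputs rather than outputs of a real model.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List


@dataclass
class Rollout:
    """One sampled episode for a task (hypothetical structure)."""
    design_logprobs: List[float]     # log-probs of tokens emitted while designing the MAS
    execution_logprobs: List[float]  # log-probs of tokens the same model emitted while executing it
    reward: float                    # downstream outcome, e.g. 1.0 if the final answer is correct


def group_advantages(rollouts: List[Rollout], eps: float = 1e-6) -> List[float]:
    """Assumed GRPO-style advantage: (reward - group mean) / (group std + eps)."""
    rewards = [r.reward for r in rollouts]
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def policy_gradient_loss(rollouts: List[Rollout]) -> float:
    """REINFORCE-style surrogate: -advantage * log-prob, averaged over all tokens.

    Design and execution tokens receive the same episode-level advantage, so a
    single outcome reward updates both roles of the shared model.
    """
    total, count = 0.0, 0
    for rollout, adv in zip(rollouts, group_advantages(rollouts)):
        for logprob in rollout.design_logprobs + rollout.execution_logprobs:
            total += -adv * logprob
            count += 1
    return total / count


rollouts = [
    Rollout(design_logprobs=[-0.1, -0.2], execution_logprobs=[-0.3], reward=1.0),
    Rollout(design_logprobs=[-0.5], execution_logprobs=[-0.4, -0.6], reward=0.0),
]
print([round(a, 3) for a in group_advantages(rollouts)])  # [1.0, -1.0]
print(round(policy_gradient_loss(rollouts), 3))           # -0.15
```

In a real trainer the log-probabilities would be tensors carrying gradients; plain floats are used here so the arithmetic can be checked by hand. Minimizing this loss raises the log-probabilities of the successful rollout's tokens and lowers those of the failed one, whether they came from the design turn or the execution turns.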

Results

The following table reports the performance of MetaAgent-X_RL on each benchmark.

Domain    Benchmark        MetaAgent-X_RL
Code      LiveCodeBench    41.00
Code      APPS             38.00
Code      CodeContests     17.00
Math      AIME24           40.00
Math      AIME25           33.33
Math      OlympiadBench    61.00
Overall   Average          38.33

Citation

@misc{zhang2026metaagentxbreakingceiling,
      title={MetaAgent-X: Breaking the Ceiling of Automatic Multi-Agent Systems via End-to-End Reinforcement Learning},
      author={Yaolun Zhang and Yujie Zhao and Nan Wang and Yiran Wu and Jiayu Chang and Yizhao Chen and Qingyun Wu and Jishen Zhao and Huazheng Wang},
      year={2026},
      eprint={2605.14212},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2605.14212}, 
}