---
license: apache-2.0
language:
- en
tags:
- large-language-model
- multi-agent-systems
- reinforcement-learning
- agentic-ai
- code
- math
---

# MetaAgent-X: Breaking the Ceiling of Automatic Multi-Agent Systems via End-to-End Reinforcement Learning

[Paper 📑](https://arxiv.org/abs/2605.14212) [Codebase 🚗](https://github.com/pettingllms-ai/PettingLLMs) [Project Page 🏆](https://mercury7353.github.io/MetaAgent-X-Page/)

## Overview

**MetaAgent-X** is an end-to-end reinforcement learning framework for autonomous multi-agent systems (MAS). Unlike conventional automatic MAS methods that rely on frozen models, hand-crafted prompts, or search-based workflows, MetaAgent-X trains one shared model to both **design** a multi-agent system and **execute** it. Through reinforcement learning, the model learns to generate task-adaptive agent roles, collaboration structures, and execution strategies.

MetaAgent-X demonstrates strong cross-domain adaptation and achieves state-of-the-art performance on both **code** and **math** benchmarks.

## Key Features

- **One model for both design and execution**: the same model acts as both the MAS designer and the task executor.
- **End-to-end reinforcement learning**: the model is optimized directly from downstream task outcomes.
- **Autonomous multi-agent system generation**: the model learns to construct and execute agent swarms for complex reasoning tasks.
- **Cross-domain generalization**: strong performance on both coding and mathematical reasoning benchmarks.

## Results

The following table reports the performance of **MetaAgent-XRL**. Numbers in parentheses denote absolute gains over the single-agent baseline.

| Domain | Benchmark | MetaAgent-XRL |
|---|---|---:|
| Code | LiveCodeBench | **41.00** |
| Code | APPS | **38.00** |
| Code | CodeContests | **17.00** |
| Math | AIME24 | **40.00** |
| Math | AIME25 | **33.33** |
| Math | OlympiadBench | **61.00** |
| Overall | Average | **38.33** |

## Citation

```
@misc{zhang2026metaagentxbreakingceiling,
  title={MetaAgent-X: Breaking the Ceiling of Automatic Multi-Agent Systems via End-to-End Reinforcement Learning},
  author={Yaolun Zhang and Yujie Zhao and Nan Wang and Yiran Wu and Jiayu Chang and Yizhao Chen and Qingyun Wu and Jishen Zhao and Huazheng Wang},
  year={2026},
  eprint={2605.14212},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2605.14212},
}
```