Model Merging in the Era of Large Language Models: Methods, Applications, and Future Directions
Abstract
Model merging combines the parameters of multiple neural networks into a single model without additional training. As fine-tuned large language models (LLMs) proliferate, merging offers a computationally efficient alternative to ensembles and full retraining, enabling practitioners to compose specialized capabilities at minimal cost. This survey examines model merging in the LLM era through the FUSE taxonomy, organized along Foundations, Unification Strategies, Scenarios, and Ecosystem. We first establish the theoretical underpinnings of merging, including loss landscape geometry and mode connectivity, then systematically review the algorithmic space spanning weight averaging, task vector arithmetic, sparsification-enhanced methods, mixture-of-experts architectures, and evolutionary optimization. We further examine downstream applications across multi-task learning, safety alignment, domain specialization, and federated learning, and survey the supporting ecosystem of tools and evaluation benchmarks. Finally, we identify key open challenges and future directions, aiming to equip researchers and practitioners with a structured foundation for advancing model merging.
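As a concrete illustration of two of the simplest strategies in the algorithmic space surveyed here, the sketch below shows uniform weight averaging and task vector arithmetic over PyTorch state dicts. This is a minimal sketch under the assumption that all models share an architecture; the function names and scaling coefficients are illustrative, not an API defined by the paper.

import torch

def merge_average(state_dicts):
    # Uniform weight averaging ("model soup"): per-parameter mean across models.
    return {k: torch.stack([sd[k] for sd in state_dicts]).mean(dim=0)
            for k in state_dicts[0]}

def merge_task_arithmetic(base, finetuned, coeffs):
    # Task vector arithmetic: tau_i = theta_i - theta_base;
    # merged = theta_base + sum_i lambda_i * tau_i.
    merged = {k: v.clone() for k, v in base.items()}
    for ft, lam in zip(finetuned, coeffs):
        for k in merged:
            merged[k] += lam * (ft[k] - base[k])
    return merged

Setting all coefficients to 1 recovers plain task addition, while negative coefficients correspond to the "negation" use case (removing a capability) that task arithmetic methods support.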