MHA2MLA-VLM: Enabling DeepSeek's Economical Multi-Head Latent Attention across Vision-Language Models
Paper: [arXiv:2601.11464](https://arxiv.org/abs/2601.11464)
Research Paper: "MHA2MLA-VLM: Enabling DeepSeek's Economical Multi-Head Latent Attention across Vision-Language Models"
This is a Stage 1 checkpoint for the MHA2MLA-VLM conversion process: an intermediate model trained with partial RoPE (rope_dim=32), whose purpose is to provide weight initialization for Stage 2 MHA2MLA-VLM models via SVD decomposition.
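As a rough illustration of what partial RoPE means here, the sketch below rotates only the first `rope_dim` channels of each query/key head and passes the remaining channels through unrotated. The function name, tensor layout, and the choice of *which* channels keep RoPE are assumptions for illustration; the released code may select the retained rotary channels by a different criterion.

```python
# Minimal partial-RoPE sketch (assumption: the rotated slice is the
# first `rope_dim` channels of each head; names are illustrative).
import torch

def apply_partial_rope(q, k, cos, sin, rope_dim=32):
    """Rotate only the first `rope_dim` channels of q/k; pass the rest through.

    q, k:     (batch, heads, seq, head_dim)
    cos, sin: (seq, rope_dim) rotary tables for the rotated slice
    """
    def rotate_half(x):
        # Standard RoPE helper: swap and negate the two halves of the slice.
        x1, x2 = x.chunk(2, dim=-1)
        return torch.cat((-x2, x1), dim=-1)

    # Split each head into a rotated slice and a pass-through slice.
    q_rot, q_pass = q[..., :rope_dim], q[..., rope_dim:]
    k_rot, k_pass = k[..., :rope_dim], k[..., rope_dim:]
    q_rot = q_rot * cos + rotate_half(q_rot) * sin
    k_rot = k_rot * cos + rotate_half(k_rot) * sin
    return torch.cat((q_rot, q_pass), dim=-1), torch.cat((k_rot, k_pass), dim=-1)
```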
This Stage 1 model is primarily used to generate SVD initialization weights for the following Stage 2 MLA models, one per latent KV dimension `d_kv` (a sketch of the SVD step follows the list):
- LLaVA-NeXT-8B-MLA-stage2-rope32-d_kv_32
- LLaVA-NeXT-8B-MLA-stage2-rope32-d_kv_64
- LLaVA-NeXT-8B-MLA-stage2-rope32-d_kv_128
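A minimal sketch of the SVD initialization step, assuming a joint low-rank factorization: the Stage 1 key/value projection weights (their non-RoPE channels) are stacked and truncated to rank `d_kv`, yielding a down-projection into the latent space and an up-projection back out. Function and argument names here are illustrative, not the repo's actual API, and whether K and V are factored jointly or separately follows the released code.

```python
# Hedged sketch of SVD-based Stage 2 weight initialization.
import torch

def svd_init(w_k_nope, w_v, d_kv):
    """Jointly factor stacked K/V weights into down/up projections.

    w_k_nope, w_v: (out_dim, hidden) Stage 1 projection weights
    Returns W_down (d_kv, hidden) and W_up (2*out_dim, d_kv) such that
    W_up @ W_down is the best rank-d_kv approximation of [w_k_nope; w_v].
    """
    w_kv = torch.cat([w_k_nope, w_v], dim=0)        # (2*out_dim, hidden)
    u, s, vh = torch.linalg.svd(w_kv, full_matrices=False)
    u, s, vh = u[:, :d_kv], s[:d_kv], vh[:d_kv, :]  # rank-d_kv truncation
    sqrt_s = torch.sqrt(s)
    w_up = u * sqrt_s                                # (2*out_dim, d_kv)
    w_down = sqrt_s[:, None] * vh                    # (d_kv, hidden)
    return w_down, w_up
```

Running this with `d_kv` set to 32, 64, or 128 would produce initializations corresponding to the three Stage 2 checkpoints listed above.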
Citation:

```bibtex
@misc{fan2026mha2mlavlmenablingdeepseekseconomical,
      title={MHA2MLA-VLM: Enabling DeepSeek's Economical Multi-Head Latent Attention across Vision-Language Models},
      author={Xiaoran Fan and Zhichao Sun and Tao Ji and Lixing Shen and Tao Gui},
      year={2026},
      eprint={2601.11464},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2601.11464},
}
```