
Research Paper: "MHA2MLA-VLM: Enabling DeepSeek's Economical Multi-Head Latent Attention across Vision-Language Models"

Model Description

This is the Stage 1 checkpoint of the MHA2MLA-VLM conversion process: an intermediate model trained with partial RoPE (rope_dim=32) whose weights provide, via SVD decomposition, the initialization for the Stage 2 MHA2MLA-VLM models.
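The SVD-based initialization described above can be illustrated with a minimal numpy sketch. This is a hypothetical illustration of the general technique (truncated SVD of the key/value projections down to a d_kv-dimensional latent), not the authors' exact code; matrix names and sizes are assumptions chosen for the demo.

```python
import numpy as np

# Hypothetical sketch of SVD initialization for an MLA latent of size d_kv.
# W_k and W_v stand in for a layer's MHA key/value projection matrices
# (small random matrices here; the real model uses the 8B checkpoint's weights).
rng = np.random.default_rng(0)
hidden, d_kv = 256, 32            # d_kv = target latent dimension (32/64/128 in Stage 2)
W_k = rng.standard_normal((hidden, hidden))
W_v = rng.standard_normal((hidden, hidden))

# Stack the K and V projections and take a rank-d_kv truncated SVD.
W_kv = np.concatenate([W_k, W_v], axis=1)           # (hidden, 2*hidden)
U, S, Vt = np.linalg.svd(W_kv, full_matrices=False)

# Candidate initializations: a down-projection into the latent space and
# an up-projection back to the stacked K/V dimensions.
W_down = U[:, :d_kv] * S[:d_kv]                     # (hidden, d_kv)
W_up = Vt[:d_kv, :]                                 # (d_kv, 2*hidden)

# The rank-d_kv product is the best rank-d_kv approximation of W_kv.
approx = W_down @ W_up
print(approx.shape)
```

The truncated factors give Stage 2 a starting point that already approximates the Stage 1 attention projections, which is why the Stage 1 checkpoint must be trained (with partial RoPE) before the decomposition is meaningful.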

Usage

This Stage 1 model is primarily used to generate SVD initialization weights for Stage 2 MLA models:

  • Step 1: Download this Stage 1 checkpoint
  • Step 2: For MHA2MLA-VLM models using the Partial-RoPE MKL method, download the MKL file.
  • Step 3: Download and use the corresponding Stage 2 MHA2MLA-VLM models for inference. Refer to each Stage 2 model's README for detailed inference instructions. Choose from different d_kv dimensions:
    • LLaVA-NeXT-8B-MLA-stage2-rope32-d_kv_32
    • LLaVA-NeXT-8B-MLA-stage2-rope32-d_kv_64
    • LLaVA-NeXT-8B-MLA-stage2-rope32-d_kv_128
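The steps above can be sketched as a small helper that maps a chosen d_kv to the matching Stage 2 repository. The `cnxup/` organization prefix on the Stage 2 repo ids is an assumption (inferred from this model's repo id); the actual download would go through `huggingface_hub.snapshot_download`, shown here only in a comment.

```python
# Stage 1 repo id for this checkpoint.
STAGE1_REPO = "cnxup/LLaVA-NeXT-8B-MLA-stage1-rope32"


def stage2_repo(d_kv: int) -> str:
    """Return the Stage 2 repo id for a supported latent dimension d_kv.

    The org prefix is assumed to match the Stage 1 repo; adjust if the
    Stage 2 models live elsewhere.
    """
    if d_kv not in (32, 64, 128):
        raise ValueError("d_kv must be 32, 64, or 128")
    return f"cnxup/LLaVA-NeXT-8B-MLA-stage2-rope32-d_kv_{d_kv}"


# Downloading the checkpoints would use huggingface_hub, e.g.:
#   from huggingface_hub import snapshot_download
#   snapshot_download(repo_id=STAGE1_REPO)
#   snapshot_download(repo_id=stage2_repo(64))
print(stage2_repo(64))
```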

Citation

@misc{fan2026mha2mlavlmenablingdeepseekseconomical,
      title={MHA2MLA-VLM: Enabling DeepSeek's Economical Multi-Head Latent Attention across Vision-Language Models}, 
      author={Xiaoran Fan and Zhichao Sun and Tao Ji and Lixing Shen and Tao Gui},
      year={2026},
      eprint={2601.11464},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2601.11464}, 
}
Model size: 8B params · Tensor type: BF16 · Format: Safetensors
