MambaMia-Qwen2-7B

This is the MambaMia model based on Qwen/Qwen2-7B-Instruct, designed for efficient long-form video understanding.

Model Description

MambaMia is a State-Space-Model-based hierarchical compression method for efficient video understanding in Large Multimodal Models (LMMs). It addresses the computational cost and information redundancy of processing long videos.

Key Features:

  • Hierarchical compression using State-Space Models (SSM)
  • Gated attention mechanism
  • Learnable sampling strategy
  • Significantly lower memory usage and faster inference than existing LMM baselines
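To give an intuition for the ideas above, here is a minimal toy sketch of gated, SSM-style hierarchical compression of video frame tokens. It is purely illustrative (NumPy only, randomly initialized weights) and is not the actual MambaMia implementation: frame tokens are scanned with a gated linear recurrence, and one summary state is kept per window, shrinking the token sequence passed to the LMM.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_ssm_compress(tokens, window, Wa, Wg):
    """Compress (T, d) frame tokens to (T // window, d) summary tokens.

    Illustrative stand-in for SSM-based hierarchical compression:
    a gated linear recurrence scans the tokens, and the hidden state
    is emitted once per window as that window's summary token.
    """
    T, d = tokens.shape
    h = np.zeros(d)
    out = []
    for t in range(T):
        x = tokens[t]
        a = sigmoid(x @ Wa)                # input-dependent decay (SSM-style)
        g = sigmoid(x @ Wg)                # gate on the incoming token
        h = a * h + (1.0 - a) * (g * x)    # gated linear recurrence
        if (t + 1) % window == 0:          # keep one state per window
            out.append(h.copy())
    return np.stack(out)

d = 8
Wa = rng.normal(size=(d, d)) * 0.1         # hypothetical toy parameters
Wg = rng.normal(size=(d, d)) * 0.1
frames = rng.normal(size=(64, d))          # 64 frame tokens
compressed = gated_ssm_compress(frames, window=8, Wa=Wa, Wg=Wg)
print(compressed.shape)                    # 8x fewer tokens: (8, 8)
```

Here a 64-token sequence is reduced 8x; in the real model the compressed tokens (not raw frame tokens) are what the language backbone attends over, which is where the memory and speed savings come from.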

Usage

Please refer to the MambaMia repository for detailed usage instructions.

Citation

If you use this model, please cite:

@misc{kim2025mambamia,
  title={MambaMia: A State-Space-Model-Based Compression for Efficient Video Understanding in Large Multimodal Models},
  author={Geewook Kim and Minjoon Seo},
  year={2025},
  eprint={2506.13564},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2506.13564}
}

License

This model is released under the Apache 2.0 License, following the base model's license.

Acknowledgements
