MambaMia: A State-Space-Model-Based Compression for Efficient Video Understanding in Large Multimodal Models
Paper: arXiv:2506.13564
This is the MambaMia model based on Qwen/Qwen2-7B-Instruct, designed for efficient long-form video understanding.
MambaMia is a State-Space-Model-based hierarchical compression method for efficient video understanding in Large Multimodal Models (LMMs). It addresses the computational cost and information redundancy challenges in processing long videos.
Key Features:
- State-Space-Model-based hierarchical compression of video-frame tokens
- Reduced computational cost and information redundancy when processing long-form videos
- Built on the Qwen/Qwen2-7B-Instruct language model backbone
Please refer to the MambaMia repository for detailed usage instructions.
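MambaMia's actual inference pipeline lives in the repository linked above. As a generic illustration of the kind of preprocessing long-video LMMs typically rely on (not MambaMia's specific method), a uniform frame sampler that caps how many frames are fed to the model might look like this; the function name and parameters are hypothetical:

```python
def sample_frame_indices(num_frames: int, num_samples: int) -> list[int]:
    """Pick `num_samples` evenly spaced frame indices from a video.

    Hypothetical helper for illustration only: long videos have far more
    frames than a multimodal model can afford to encode, so a fixed budget
    of frames is sampled at a uniform stride before tokenization.
    """
    if num_samples >= num_frames:
        # Short video: keep every frame.
        return list(range(num_frames))
    step = num_frames / num_samples
    # Take the midpoint of each of the `num_samples` equal-length segments.
    return [int(step * i + step / 2) for i in range(num_samples)]
```

For example, `sample_frame_indices(1000, 8)` spreads 8 indices evenly across a 1000-frame video; the sampled frames would then be encoded and compressed before reaching the language model.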
If you use this model, please cite:
```bibtex
@misc{kim2025mambamia,
      title={MambaMia: A State-Space-Model-Based Compression for Efficient Video Understanding in Large Multimodal Models},
      author={Geewook Kim and Minjoon Seo},
      year={2025},
      eprint={2506.13564},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2506.13564}
}
```
This model is released under the Apache 2.0 License, consistent with the base model's license.