---
license: mit
---

# Diff-V2M: A Hierarchical Conditional Diffusion Model with Explicit Rhythmic Modeling for Video-to-Music Generation

This repository hosts the training checkpoints of **[Diff-V2M (AAAI'26)](https://arxiv.org/abs/2511.09090)**.

## Overview

Diff-V2M is a hierarchical conditional diffusion model with explicit rhythmic modeling and multi-view feature conditioning, achieving state-of-the-art results in video-to-music generation.

## Model Sources

- **Repository:** https://github.com/Tayjsl97/Diff-V2M
- **Demo:** [demo page](https://tayjsl97.github.io/Diff-V2M-Demo)

## Citation

If you use our models in your research, please cite them as follows:

```bib
@inproceedings{ji2026diff,
  title={Diff-V2M: A Hierarchical Conditional Diffusion Model with Explicit Rhythmic Modeling for Video-to-Music Generation},
  author={Ji, Shulei and Wang, Zihao and Yu, Jiaxing and Yang, Xiangyuan and Li, Shuyu and Wu, Songruoyao and Zhang, Kejun},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2026}
}
```