CoMoVi: Co-Generation of 3D Human Motions and Realistic Videos
Chengfeng Zhao¹, Jiazhi Shu², Yubo Zhao¹, Tianyu Huang³, Jiahao Lu¹, Zekai Gu¹, Chengwei Ren¹, Zhiyang Dou⁴, Qing Shuai⁵, Yuan Liu¹
¹HKUST, ²SCUT, ³CUHK, ⁴MIT, ⁵ZJU
Corresponding author
Acknowledgments
We thank the following projects, which we referred to and benefited from:
- VideoX-Fun: the video generation model training framework;
- CameraHMR: the excellent SMPL estimator used to produce pseudo labels;
- Champ: the data processing pipeline.
Citation
@article{zhao2026comovi,
title={CoMoVi: Co-Generation of 3D Human Motions and Realistic Videos},
author={Zhao, Chengfeng and Shu, Jiazhi and Zhao, Yubo and Huang, Tianyu and Lu, Jiahao and Gu, Zekai and Ren, Chengwei and Dou, Zhiyang and Shuai, Qing and Liu, Yuan},
journal={arXiv preprint arXiv:2601.10632},
year={2026}
}