---
license: mit
---
This repository provides the pre-trained CDT-S and CDT-B models in the `pretrained` folder. The inference code is available in the [CDT GitHub repository](https://github.com/ali-vilab/CDT).
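
As a minimal sketch, the weights could be fetched with the `huggingface_hub` client, restricting the download to the `pretrained` folder. The repository id below is a placeholder, and the helper names and local directory are hypothetical choices for illustration, not part of the official CDT code.

```python
from pathlib import Path

# Hypothetical placeholder: replace with this repository's actual Hub id.
REPO_ID = "<namespace>/<repo-name>"


def pretrained_patterns(folder: str = "pretrained") -> list[str]:
    """Return glob patterns that select only files under `folder`."""
    return [f"{folder.strip('/')}/*"]


def download_pretrained(repo_id: str = REPO_ID, local_dir: str = "./CDT-weights") -> Path:
    """Download only the `pretrained` folder and return its local path."""
    # Imported lazily so the pattern helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    root = snapshot_download(
        repo_id=repo_id,
        allow_patterns=pretrained_patterns(),  # skip everything outside `pretrained/`
        local_dir=local_dir,  # assumed target directory
    )
    return Path(root) / "pretrained"
```

Calling `download_pretrained()` with the real repository id should return the local `pretrained` directory, which can then be passed to the inference code linked above.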

If you find this work useful in your research, please consider citing:
```bibtex
@article{yang2025rethinking,
  title={Rethinking Video Tokenization: A Conditioned Diffusion-based Approach},
  author={Yang, Nianzu and Li, Pandeng and Zhao, Liming and Li, Yang and Xie, Chen-Wei and Tang, Yehui and Lu, Xudong and Liu, Zhihang and Zheng, Yun and Liu, Yu and Yan, Junchi},
  journal={arXiv preprint arXiv:2503.03708},
  year={2025}
}
```

Feel free to reach out to me at [yangnianzu@sjtu.edu.cn](mailto:yangnianzu@sjtu.edu.cn) with any questions.