# Adaptive Video Distillation
### Mitigating Oversaturation and Temporal Collapse in Few-Step Generation
[Project Page](https://Adaptive-Video-Distillation.github.io/)
> **Adaptive Video Distillation**
> Yuyang You*, Yongzhi Li*, Jiahui Li, Yadong Mu, Quan Chen, Peng ...
> *CVPR 2026*
---
## Overview
This is the official repository for AVD (Adaptive Video Distillation), a video model distillation method based on DMD (Distribution Matching Distillation). It addresses the oversaturation and slow-motion artifacts that arise when distilling video generation models, and it can learn from new data during distillation training.
## Environment Setup
```bash
conda create -n AVD python=3.10 -y
conda activate AVD
pip install torch torchvision
pip install -r requirements.txt
python setup.py develop
```
Also download the Wan base model from [here](https://github.com/Wan-Video/Wan2.1) and save it to `wan_models/Wan2.1-T2V-1.3B/`.
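The Wan2.1 README lists Hugging Face as one download source for the weights. Below is a minimal sketch of fetching them with `huggingface_hub`. It assumes the `Wan-AI/Wan2.1-T2V-1.3B` repository id from the Wan2.1 README and an installed `huggingface_hub` package. Neither is stated in this repository, and `download_wan_base` and `has_wan_base` are hypothetical helper names:

```python
from pathlib import Path

# Assumption: repo id taken from the Wan2.1 README, not from this repository.
WAN_REPO_ID = "Wan-AI/Wan2.1-T2V-1.3B"
# Target directory named in the setup instructions above.
WAN_LOCAL_DIR = Path("wan_models/Wan2.1-T2V-1.3B")


def download_wan_base(local_dir: Path = WAN_LOCAL_DIR) -> Path:
    """Download the Wan base model into local_dir and return that path."""
    # Imported lazily so this file loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    local_dir.mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id=WAN_REPO_ID, local_dir=str(local_dir))
    return local_dir


def has_wan_base(local_dir: Path = WAN_LOCAL_DIR) -> bool:
    """Return True if local_dir exists and contains at least one entry."""
    return local_dir.is_dir() and any(local_dir.iterdir())
```

Calling `download_wan_base()` once from the repository root should populate the directory; `has_wan_base()` is only a quick presence check and does not validate the files.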
## Inference Example
First download the checkpoints: [Autoregressive Model](https://huggingface.co/).
### Inference Script
```bash
python ./tests/wan/test_bidirectional_fewstep.py
```
## Training and Evaluation
### Dataset Preparation
We use the [MixKit Dataset](https://huggingface.co/datasets/LanguageBind/Open-Sora-Plan-v1.1.0/tree/main/all_mixkit) (6K videos) as a toy example for distillation.
To prepare the dataset, follow the steps below. Alternatively, download the final LMDB dataset from [here](https://huggingface.co/tianweiy/CausVid/tree/main/mixkit_latents_lmdb).
```bash
# Download and extract the videos from the MixKit dataset
python distillation_data/download_mixkit.py --local_dir XXX
# Convert the videos to 480x832 resolution with 81 frames at 16 fps
python distillation_data/process_mixkit.py --input_dir XXX --output_dir XXX --width 832 --height 480 --fps 16
# Precompute the VAE latents
torchrun --nproc_per_node 8 distillation_data/compute_vae_latent.py --input_video_folder XXX --output_latent_folder XXX --info_path sample_dataset/video_mixkit_6484_caption.json
# Combine everything into an LMDB dataset
python causvid/ode_data/create_lmdb_iterative.py --data_path XXX --lmdb_path XXX
```
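For orientation, the clip and latent sizes implied by these steps can be worked out by hand. The sketch below assumes the compression factors of the Wan2.1 VAE (4x temporal, 8x spatial, 16 latent channels). These come from the Wan2.1 release rather than from this repository, and `latent_shape` is a hypothetical helper, not part of the codebase:

```python
def latent_shape(frames=81, height=480, width=832,
                 t_stride=4, s_stride=8, channels=16):
    """Return (latent_frames, channels, latent_h, latent_w) for a causal video VAE.

    Assumption: the first frame is encoded alone and every further group of
    t_stride frames becomes one latent frame, so frames must be t_stride * k + 1.
    """
    if (frames - 1) % t_stride != 0:
        raise ValueError("frames must be of the form t_stride * k + 1")
    if height % s_stride or width % s_stride:
        raise ValueError("height and width must be divisible by s_stride")
    latent_frames = (frames - 1) // t_stride + 1
    return latent_frames, channels, height // s_stride, width // s_stride


clip_seconds = 81 / 16  # 81 frames at 16 fps
print(latent_shape())   # (21, 16, 60, 104)
print(clip_seconds)     # 5.0625
```

Under these assumptions, each 81-frame clip lasts roughly five seconds and is stored as 21 latent frames of size 60x104.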
### Training
Before training, set your wandb account information in the corresponding config file.
Bidirectional DMD training:
```bash
torchrun --nnodes 1 --nproc_per_node=8 --master_port 29502 \
causvid/train_distillation_regression.py \
--config_path configs/wan_bidirectional_dmd.yaml
```
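`torchrun` starts one process per GPU and tells each process its position through environment variables. Lowering `--nproc_per_node` is the usual change for running on fewer GPUs, though the config may also need adjusting. As a minimal sketch of what each launched process sees: `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` are set by `torchrun`, while `read_dist_env` is a hypothetical helper and not necessarily how this repository's training script reads them:

```python
import os


def read_dist_env(env=os.environ):
    """Read the process layout that torchrun exports, with single-process defaults."""
    return {
        "rank": int(env.get("RANK", 0)),
        "local_rank": int(env.get("LOCAL_RANK", 0)),
        "world_size": int(env.get("WORLD_SIZE", 1)),
    }


# With `--nnodes 1 --nproc_per_node=8`, WORLD_SIZE is 1 * 8 = 8 in every process.
print(read_dist_env({"RANK": "3", "LOCAL_RANK": "3", "WORLD_SIZE": "8"}))
```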
## Citation
BibTeX entry for the arXiv version:
```bib
@misc{you2026adaptivevideodistillationmitigating,
title={Adaptive Video Distillation: Mitigating Oversaturation
and Temporal Collapse in Few-Step Generation},
author={Yuyang You and Yongzhi Li and Jiahui Li
and Yadong Mu and Quan Chen and Peng Jiang},
year={2026},
eprint={2603.21864},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2603.21864},
}
```
## Acknowledgments
Our implementation is largely based on [CausVid](https://github.com/tianweiy/CausVid) and the [Wan](https://github.com/Wan-Video/Wan2.1) model suite.