# Adaptive Video Distillation
### Mitigating Oversaturation and Temporal Collapse in Few-Step Generation

[Project Page](https://Adaptive-Video-Distillation.github.io/)  


> **Adaptive Video Distillation**  
> Yuyang You*, Yongzhi Li*, Jiahui Li, Yadong Mu, Quan Chen, Peng ...  
> *CVPR 2026*  

---

## Overview

This is the official repository for AVD (Adaptive Video Distillation), a video model distillation method based on DMD (Distribution Matching Distillation). It addresses the oversaturation and slow-motion artifacts that arise when distilling video generation models, and it can learn from new data during distillation training.


## Environment Setup

```bash
conda create -n AVD python=3.10 -y
conda activate AVD
pip install torch torchvision 
pip install -r requirements.txt 
python setup.py develop
```

Also download the Wan base model from [here](https://github.com/Wan-Video/Wan2.1) and save it to `wan_models/Wan2.1-T2V-1.3B/`.
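
Before running inference or training, it can help to confirm that the base model landed where the scripts expect it. The following is a minimal sketch, not part of this repository: the helper name `check_model_dir` is hypothetical, and it only checks that the directory exists and is non-empty, since the exact file names inside the Wan release are not listed here.

```python
from pathlib import Path


def check_model_dir(root: str = "wan_models/Wan2.1-T2V-1.3B") -> bool:
    """Return True if `root` is an existing directory with at least one entry.

    Hypothetical helper: it does not validate individual checkpoint files.
    """
    path = Path(root)
    return path.is_dir() and any(path.iterdir())


if __name__ == "__main__":
    ok = check_model_dir()
    print("Wan base model directory found" if ok
          else "Wan base model directory missing or empty")
```

Run it from the repository root so that the relative path resolves correctly.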

## Inference Example 

First download the checkpoints: [Autoregressive Model](https://huggingface.co/).


### Inference Script

```bash 
python ./tests/wan/test_bidirectional_fewstep.py
```

## Training and Evaluation  

### Dataset Preparation 

We use the [MixKit Dataset](https://huggingface.co/datasets/LanguageBind/Open-Sora-Plan-v1.1.0/tree/main/all_mixkit) (6K videos) as a toy example for distillation. 

To prepare the dataset, follow the steps below. Alternatively, download the prebuilt LMDB dataset from [here](https://huggingface.co/tianweiy/CausVid/tree/main/mixkit_latents_lmdb).

```bash
# Download and extract the videos from the MixKit dataset
python distillation_data/download_mixkit.py --local_dir XXX

# Convert the videos to 480x832x81 (height x width x frames)
python distillation_data/process_mixkit.py --input_dir XXX --output_dir XXX --width 832 --height 480 --fps 16

# Precompute the VAE latents
torchrun --nproc_per_node 8 distillation_data/compute_vae_latent.py --input_video_folder XXX --output_latent_folder XXX --info_path sample_dataset/video_mixkit_6484_caption.json

# Combine everything into an LMDB dataset
python causvid/ode_data/create_lmdb_iterative.py --data_path XXX --lmdb_path XXX
```
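
As a sanity check on the processed data, the latent shape that each 480x832x81 clip should produce can be estimated ahead of time. The sketch below assumes the Wan2.1 VAE uses 16 latent channels with 4x temporal and 8x spatial compression, with the first frame encoded on its own. These values are not stated in this README, so verify them against the model config. The function name `latent_shape` is hypothetical.

```python
def latent_shape(frames: int = 81, height: int = 480, width: int = 832,
                 channels: int = 16, t_stride: int = 4, s_stride: int = 8):
    """Estimate the (C, T, H, W) latent shape for one clip.

    Assumes a causal video VAE: the first frame maps to one latent frame and
    every following group of `t_stride` frames maps to one more.
    """
    if (frames - 1) % t_stride != 0:
        raise ValueError("frames must be of the form t_stride * k + 1")
    if height % s_stride or width % s_stride:
        raise ValueError("height and width must be divisible by s_stride")
    latent_frames = (frames - 1) // t_stride + 1
    return (channels, latent_frames, height // s_stride, width // s_stride)


print(latent_shape())  # (16, 21, 60, 104)
```

If the tensors written by `compute_vae_latent.py` differ from this estimate, the resize step or the assumed compression factors are the first things to check.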

### Training

Before training, update the wandb account information in the config you plan to use.

Bidirectional DMD Training

```bash
torchrun --nnodes 1 --nproc_per_node=8 --master_port 29502 \
    causvid/train_distillation_regression.py \
    --config_path configs/wan_bidirectional_dmd.yaml
```
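
To locate the wandb fields mentioned above before launching, a plain-text scan of the config is enough. This is a minimal sketch: the helper `find_wandb_lines` is hypothetical, and it makes no assumption about the actual key names beyond their containing the string `wandb`.

```python
from pathlib import Path


def find_wandb_lines(config_path: str):
    """Return (line_number, text) pairs for config lines that mention 'wandb'."""
    hits = []
    lines = Path(config_path).read_text().splitlines()
    for number, line in enumerate(lines, start=1):
        if "wandb" in line.lower():
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    for number, text in find_wandb_lines("configs/wan_bidirectional_dmd.yaml"):
        print(f"{number}: {text}")
```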

## Citation 

BibTeX entry for the arXiv version:

```bib
@misc{you2026adaptivevideodistillationmitigating,
      title={Adaptive Video Distillation: Mitigating Oversaturation
             and Temporal Collapse in Few-Step Generation},
      author={Yuyang You and Yongzhi Li and Jiahui Li
              and Yadong Mu and Quan Chen and Peng Jiang},
      year={2026},
      eprint={2603.21864},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.21864},
}

```

## Acknowledgments

Our implementation is largely based on the [CausVid](https://github.com/tianweiy/CausVid) codebase and the [Wan](https://github.com/Wan-Video/Wan2.1) model suite.