theyouyy committed on
Commit 24d4481 · verified · 1 Parent(s): c8c5581

Update README.md

Files changed (1)
  1. README.md +99 -11
README.md CHANGED
@@ -1,13 +1,101 @@
  ---
- license: mit
- language:
- - en
- - zh
- base_model:
- - Wan-AI/Wan2.1-T2V-1.3B
- ---
- # {{ADV}}: {{Adaptive Video Distillation}}

- [![Paper](https://arxiv.org/abs/2603.21864)](https://arxiv.org/abs/{{arxiv_id}})
- [![Project Page](https://adaptive-video-distillation.github.io/)]({{project_page_url}})
- [![GitHub](https://github.com/yuyangyou/Adaptive-Video-Distillation)]({{github_url}})
+ # Adaptive Video Distillation
+ ### Mitigating Oversaturation and Temporal Collapse in Few-Step Generation
+
+ [Project Page](https://Adaptive-Video-Distillation.github.io/)
+
+ <video width="480" height="270" controls>
+ <source src="docs/sample.mp4" type="video/mp4">
+ Your browser does not support the video tag.
+ </video>
+
+ > **Adaptive Video Distillation**
+ > Yuyang You*, Yongzhi Li*, Jiahui Li, Yadong Mu, Quan Chen, Peng ...
+ > *CVPR 2026*
+
  ---

+ ## Overview
+
+ This is the official repository for ADV (Adaptive Video Distillation), a video model distillation method based on DMD (Distribution Matching Distillation). It addresses the oversaturation and slow-motion issues that arise when distilling video generation models, and it can learn from new data during distillation training.
+
+
+ ## Environment Setup
+
+ ```bash
+ conda create -n AVD python=3.10 -y
+ conda activate AVD
+ pip install torch torchvision
+ pip install -r requirements.txt
+ python setup.py develop
+ ```
+
+ Also download the Wan base model from [here](https://github.com/Wan-Video/Wan2.1) and save it to `wan_models/Wan2.1-T2V-1.3B/`.
+
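As a rough sketch of the download step above, the helper below checks whether the target directory already holds files and otherwise prints a `huggingface-cli download` command. The Hugging Face repo id `Wan-AI/Wan2.1-T2V-1.3B` is an assumption taken from the `base_model` field of the removed front matter (the README itself links to the GitHub page), and `ensure_wan_model` is a hypothetical helper name, not something the repository provides. The check only tests that the directory is non-empty, not that the weights are complete.

```bash
#!/usr/bin/env bash
# Hypothetical helper: not part of the repository.
ensure_wan_model() {
  local dir="${1:-wan_models/Wan2.1-T2V-1.3B}"   # target path from the README
  local repo="Wan-AI/Wan2.1-T2V-1.3B"            # assumed Hugging Face repo id
  if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
    echo "base model directory already populated: $dir"
  else
    # Only print the command; run it yourself after checking the repo id.
    echo "huggingface-cli download $repo --local-dir $dir"
  fi
}

ensure_wan_model
```

Printing the command instead of running it keeps the sketch free of network access; copy the printed line into a shell once the repo id has been confirmed.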
+ ## Inference Example
+
+ First download the checkpoints: [Autoregressive Model](https://huggingface.co/).
+
+
+ ### Inference Script
+
+ ```bash
+ python ./tests/wan/test_bidirectional_fewstep.py
+ ```
+
+ ## Training and Evaluation
+
+ ### Dataset Preparation
+
+ We use the [MixKit Dataset](https://huggingface.co/datasets/LanguageBind/Open-Sora-Plan-v1.1.0/tree/main/all_mixkit) (6K videos) as a toy example for distillation.
+
+ To prepare the dataset, follow the steps below. Alternatively, download the final LMDB dataset from [here](https://huggingface.co/tianweiy/CausVid/tree/main/mixkit_latents_lmdb).
+
+ ```bash
+ # download and extract the videos from the MixKit dataset
+ python distillation_data/download_mixkit.py --local_dir XXX
+
+ # convert the videos to 480x832x81 (height x width x frames)
+ python distillation_data/process_mixkit.py --input_dir XXX --output_dir XXX --width 832 --height 480 --fps 16
+
+ # precompute the VAE latents
+ torchrun --nproc_per_node 8 distillation_data/compute_vae_latent.py --input_video_folder XXX --output_latent_folder XXX --info_path sample_dataset/video_mixkit_6484_caption.json
+
+ # combine everything into an LMDB dataset
+ python causvid/ode_data/create_lmdb_iterative.py --data_path XXX --lmdb_path XXX
+ ```
+
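The four commands above leave every path as `XXX`. The sketch below shows one way the placeholders might chain together, assuming each step reads the directory written by the previous one; that data flow is an assumption, not something the README states. The directory names under `DATA_ROOT`, the `run` and `prepare_mixkit` helpers, and the `DRY_RUN` and `NUM_GPUS` variables are all hypothetical; the script names and flags are copied from the commands above. By default the script only prints the commands.

```bash
#!/usr/bin/env bash
# Sketch only: directory layout and helper names are hypothetical.
set -euo pipefail

DATA_ROOT="${DATA_ROOT:-./data/mixkit}"
RAW_DIR="$DATA_ROOT/raw"                 # downloaded and extracted videos
VIDEO_DIR="$DATA_ROOT/videos_480x832"    # resized 16 fps clips
LATENT_DIR="$DATA_ROOT/latents"          # precomputed VAE latents
LMDB_DIR="$DATA_ROOT/lmdb"               # final LMDB dataset
NUM_GPUS="${NUM_GPUS:-8}"                # the README uses 8 processes
DRY_RUN="${DRY_RUN:-1}"                  # 1 = print only, 0 = execute

run() {
  # Print the command; execute it only when DRY_RUN=0.
  printf '%s\n' "$*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

prepare_mixkit() {
  run python distillation_data/download_mixkit.py --local_dir "$RAW_DIR"
  run python distillation_data/process_mixkit.py --input_dir "$RAW_DIR" \
    --output_dir "$VIDEO_DIR" --width 832 --height 480 --fps 16
  run torchrun --nproc_per_node "$NUM_GPUS" distillation_data/compute_vae_latent.py \
    --input_video_folder "$VIDEO_DIR" --output_latent_folder "$LATENT_DIR" \
    --info_path sample_dataset/video_mixkit_6484_caption.json
  run python causvid/ode_data/create_lmdb_iterative.py \
    --data_path "$LATENT_DIR" --lmdb_path "$LMDB_DIR"
}

prepare_mixkit
```

Running the script with `DRY_RUN=0` from the repository root would execute the steps in order and stop at the first failure because of `set -e`.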
+ ## Training
+
+ Please first modify the wandb account information in the respective config.
+
+ Bidirectional DMD Training
+
+ ```bash
+ torchrun --nnodes 1 --nproc_per_node=8 --master_port 29502 \
+ causvid/train_distillation_regression.py \
+ --config_path configs/wan_bidirectional_dmd.yaml
+
+ ```
+
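The training command above hard-codes 8 processes on a single node. As a small sketch, the hypothetical helper `train_cmd` below rebuilds the same command with the process count and port as parameters. The script path, config path and default port are copied from the README; whether the config's batch size or learning rate needs adjusting for fewer GPUs is not stated there, so treat any value other than 8 as untested.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the launch command instead of running it.
train_cmd() {
  local num_gpus="${1:-8}"          # README default: 8 processes on one node
  local master_port="${2:-29502}"   # port used in the README command
  echo "torchrun --nnodes 1 --nproc_per_node=$num_gpus --master_port $master_port" \
       "causvid/train_distillation_regression.py" \
       "--config_path configs/wan_bidirectional_dmd.yaml"
}

train_cmd 4   # prints the command for a 4-GPU node
```

Calling `train_cmd` with no arguments prints the README command unchanged.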
+ ## Citation
+
+ Here is the BibTeX entry for the arXiv version:
+
+ ```bib
+ @misc{you2026adaptivevideodistillationmitigating,
+ title={Adaptive Video Distillation: Mitigating Oversaturation
+ and Temporal Collapse in Few-Step Generation},
+ author={Yuyang You and Yongzhi Li and Jiahui Li
+ and Yadong Mu and Quan Chen and Peng Jiang},
+ year={2026},
+ eprint={2603.21864},
+ archivePrefix={arXiv},
+ primaryClass={cs.CV},
+ url={https://arxiv.org/abs/2603.21864},
+ }
+
+ ```
+
+ ## Acknowledgments
+
+ Our implementation is largely based on [CausVid](https://github.com/tianweiy/CausVid) and the [Wan](https://github.com/Wan-Video/Wan2.1) model suite.