quanhaol committed on
Commit
6bfb8db
·
verified ·
1 Parent(s): a50fa48

Create README.md

  1. README.md +216 -0
README.md ADDED
@@ -0,0 +1,216 @@
1
+ ---
2
+ license: mit
3
+ datasets:
4
+ - quanhaol/MagicData
5
+ base_model:
6
+ - quanhaol/Wan2.2-TI2V-5B-Turbo
7
+ - Wan-AI/Wan2.2-TI2V-5B
8
+ tags:
9
+ - image-to-video
10
+ - Trajectory-Control
11
+ - Fewstep-video-gen
12
+ ---
13
+ <br>
14
+ <a href="https://arxiv.org/pdf/2603.12146"><img src="https://img.shields.io/static/v1?label=Paper&message=2603.12146&color=red&logo=arxiv"></a>
15
+ <a href="https://quanhaol.github.io/flashmotion-site/"><img src="https://img.shields.io/static/v1?label=Project&message=Page&color=green&logo=github-pages"></a>
16
+ <a href="https://huggingface.co/quanhaol/FlashMotion"><img src="https://img.shields.io/badge/🤗_HuggingFace-Model-ffbd45.svg" alt="HuggingFace"></a>
17
+ <a href="https://huggingface.co/datasets/quanhaol/FlashBench"><img src="https://img.shields.io/badge/🤗_HuggingFace-Benchmark-ffbd45.svg" alt="HuggingFace"></a>
18
+
19
+ > **FlashMotion: Few-Step Controllable Video Generation with Trajectory Guidance**
20
+ > <br>
21
+ > [Quanhao Li](https://github.com/quanhaol)<sup>1</sup>, [Zhen Xing](https://chenhsing.github.io/)<sup>1</sup>, [Rui Wang](https://scholar.google.com/citations?user=116smmsAAAAJ&hl=en)<sup>1</sup>, Haidong Cao<sup>1</sup>, [Qi Dai](https://daiqi1989.github.io/)<sup>2</sup>, Daoguo Dong<sup>1</sup> and [Zuxuan Wu](https://zxwu.azurewebsites.net/)<sup>1</sup>
22
+ >
23
+ > <sup>1</sup> Fudan University; <sup>2</sup> Microsoft Research Asia
24
+
25
+ ## 💡 Abstract
26
+
27
+ Recent advances in trajectory-controllable video generation have achieved remarkable progress. Previous methods mainly use adapter-based architectures for precise motion control along predefined trajectories.
28
+ However, all these methods rely on a multi-step denoising process, leading to substantial time redundancy and computational overhead.
29
+ While existing video distillation methods successfully distill multi-step generators into few-step ones, directly applying these approaches to trajectory-controllable video generation results in noticeable degradation in both video quality and trajectory accuracy.
30
+ To bridge this gap, we introduce **FlashMotion**, a novel training framework designed for few-step trajectory-controllable video generation.
31
+ We first train a trajectory adapter on a multi-step video generator for precise trajectory control.
32
+ Then, we distill the generator into a few-step version to accelerate video generation.
33
+ Finally, we finetune the adapter using a hybrid strategy that combines diffusion and adversarial objectives, aligning it with the few-step generator to produce high-quality, trajectory-accurate videos.
34
+ For evaluation, we introduce **FlashBench**, a benchmark for long-sequence trajectory-controllable video generation that measures both video quality and trajectory accuracy across varying numbers of foreground objects.
35
+ Experiments on two adapter architectures show that FlashMotion surpasses existing video distillation methods and previous multi-step models in both visual quality and trajectory consistency.
36
+
37
+
38
+ ## 📣 Updates
39
+ - `2026/03/13` 🔥🔥 We released FlashMotion, including its training code, inference code, model weights, and evaluation benchmark.
40
+ - `2026/02` 🔥🔥🔥 FlashMotion has been accepted to CVPR 2026!
41
+
42
+ ## 📑 Table of Contents
43
+
44
+ - [💡 Abstract](#-abstract)
45
+ - [📣 Updates](#-updates)
46
+ - [📑 Table of Contents](#-table-of-contents)
47
+ - [✅ TODO List](#-todo-list)
48
+ - [🐍 Installation](#-installation)
49
+ - [📦 Model Weights](#-model-weights)
50
+   - [Folder Structure](#folder-structure)
51
+   - [Download Links](#download-links)
52
+ - [⛽️ Dataset Prepare](#️-dataset-prepare)
53
+ - [🔄 Inference](#-inference)
54
+   - [Scripts](#scripts)
55
+ - [🏎️ Train](#️-train)
56
+   - [SlowAdapter Training](#slowadapter-training)
57
+   - [FastGenerator Training](#fastgenerator-training)
58
+   - [FastAdapter Training](#fastadapter-training)
59
+ - [🤝 Acknowledgements](#-acknowledgements)
60
+ - [📚 Contact](#-contact)
61
+
62
+ ## ✅ TODO List
63
+
64
+ - [x] Release our inference code and model weights
65
+ - [x] Release our training code
66
+ - [x] Release our evaluation benchmark
67
+
68
+ ## 🐍 Installation
69
+
70
+ ```bash
71
+ # Clone this repository.
72
+ git clone https://github.com/quanhaol/FlashMotion
73
+ cd FlashMotion
74
+
75
+ # Install requirements
76
+ conda create -n flashmotion python=3.10 -y
77
+ conda activate flashmotion
78
+ pip install -r requirements.txt
79
+ pip install flash-attn --no-build-isolation
80
+ python setup.py develop
81
+ ```
82
+
83
+ ## 📦 Model Weights
84
+
85
+ ### Folder Structure
86
+
87
+ ```
88
+ FlashMotion
89
+ └── ckpts
90
+     ├── FastGenerator
91
+     │   ├── model.pt
92
+     ├── SlowAdapter
93
+     │   ├── ResNet
94
+     │       └── model.pt
95
+     │   ├── ControlNet
96
+     │       └── model.pt
97
+     ├── FastAdapter
98
+     │   ├── ResNet
99
+     │       └── model.pt
100
+     │   ├── ControlNet
101
+     │       └── model.pt
102
+ ```
103
+
104
+ ### Download Links
105
+
106
+ Please use the following commands to download the model weights:
107
+
108
+ ```bash
109
+ pip install "huggingface_hub[hf_transfer]"
110
+ HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download quanhaol/FlashMotion --local-dir ckpts
111
+ HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download Wan-AI/Wan2.2-TI2V-5B --local-dir wan_models/Wan2.2-TI2V-5B
112
+ ```
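Once the downloads finish, it can help to confirm that `ckpts` matches the folder structure listed above. The following is a minimal sketch, not part of the FlashMotion repository: the helper name `missing_checkpoints` is hypothetical, and the relative paths are simply read off the tree above.

```python
from pathlib import Path

# Relative paths copied from the folder tree above (assumed to be complete).
EXPECTED_CHECKPOINTS = [
    "FastGenerator/model.pt",
    "SlowAdapter/ResNet/model.pt",
    "SlowAdapter/ControlNet/model.pt",
    "FastAdapter/ResNet/model.pt",
    "FastAdapter/ControlNet/model.pt",
]


def missing_checkpoints(ckpt_root):
    """Return the expected checkpoint files that are absent under ckpt_root.

    Hypothetical helper for illustration; it only checks that files exist.
    """
    root = Path(ckpt_root)
    return [rel for rel in EXPECTED_CHECKPOINTS if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints("ckpts")
    if missing:
        print("Missing checkpoint files:")
        for rel in missing:
            print(f"  ckpts/{rel}")
    else:
        print("All expected checkpoint files are present.")
```

Run from the repository root, the script lists any files that still need to be downloaded. It does not validate the contents of the weights.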
113
+
114
+ ## ⛽️ Dataset Prepare
115
+ All three training stages of FlashMotion use [MagicData](https://huggingface.co/datasets/quanhaol/MagicData), an open-source dataset built for trajectory-controllable video generation.
116
+ Please follow [this README](https://huggingface.co/datasets/quanhaol/MagicData) to download the data and extract it to a suitable path on your machine.
117
+
118
+ The dataset should be organized as follows:
119
+ ```
120
+ MagicData
121
+ ├── videos
122
+ │   ├── videoid_1.mp4
123
+ │   ├── videoid_2.mp4
124
+ │   ├── ...
125
+ ├── masks
126
+ │   ├── videoid_1
127
+ │   │   ├── annotated_frame_00000.png
128
+ │   │   ├── annotated_frame_00001.png
129
+ │   │   ├── ...
130
+ │   ├── videoid_2
131
+ │   │   ├── ...
132
+ ├── boxs
133
+ │   ├── videoid_1
134
+ │   │   ├── annotated_frame_00000.png
135
+ │   │   ├── annotated_frame_00001.png
136
+ │   │   ├── ...
137
+ │   ├── videoid_2
138
+ │   │   ├── ...
139
+ ├── MagicData.csv # detailed information of each video
140
+ ```
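Before training, a quick consistency check of the extracted dataset can catch incomplete downloads. The sketch below is an illustration only, not part of the FlashMotion or MagicData tooling. It assumes the layout shown above, namely that each `videos/<id>.mp4` has matching `masks/<id>/` and `boxs/<id>/` folders containing `annotated_frame_*.png` files. The function name `videos_missing_annotations` is hypothetical.

```python
from pathlib import Path


def videos_missing_annotations(dataset_root):
    """Map each video id to the annotation folders ("masks"/"boxs") it lacks.

    A folder counts as missing when it does not exist or holds no
    annotated_frame_*.png files. Hypothetical helper for illustration.
    """
    root = Path(dataset_root)
    problems = {}
    for video in sorted((root / "videos").glob("*.mp4")):
        video_id = video.stem
        missing = [
            kind
            for kind in ("masks", "boxs")
            if not any((root / kind / video_id).glob("annotated_frame_*.png"))
        ]
        if missing:
            problems[video_id] = missing
    return problems


if __name__ == "__main__":
    # "MagicData" is a placeholder; point it at your own extraction path.
    for video_id, missing in videos_missing_annotations("MagicData").items():
        print(f"{video_id}: missing {', '.join(missing)}")
```

An empty result means every video found under `videos` has at least one mask frame and one box frame. The check does not compare frame counts or read `MagicData.csv`.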
141
+
142
+ ## 🔄 Inference
143
+ Inference requires around 42 GiB of GPU memory with the ResNet FastAdapter and 50 GiB with the ControlNet FastAdapter, both tested on a single NVIDIA A100 GPU.
144
+
145
+ ⚡️⚡️⚡️ Denoising a video takes only 11 seconds with the ResNet Adapter and around 24 seconds with the ControlNet Adapter.
146
+
147
+ ### Scripts
148
+
149
+ We provide demo scripts for running both types of trajectory adapter.
150
+ ```bash
151
+ # Demo inference script of each adapter type
152
+ bash running_scripts/inference/i2v_control_fewstep_controlnet.sh
153
+ bash running_scripts/inference/i2v_control_fewstep_resnet.sh
154
+ ```
155
+ We also provide a sample input image and trajectory maps in `./assets`.
156
+
157
+ Feel free to replace the `--prompt`, `--image`, and `--trajectory` arguments with your own prompt, input image, and trajectory maps.
158
+ > **Note**: If you want to build your own trajectory maps, please refer to the box trajectory construction pipeline introduced in [MagicMotion](https://github.com/quanhaol/MagicMotion/tree/main/trajectory_construction#box-trajectory).
159
+
160
+ ## 🏎️ Train
161
+
162
+ We provide scripts for all three training stages of FlashMotion: training the SlowAdapter, the FastGenerator, and the FastAdapter.
163
+
164
+ ### SlowAdapter Training
165
+ In this stage, we first train the SlowAdapter using the mask annotations in MagicData, and then finetune it using bounding boxes as the trajectory map conditions.
166
+ ```bash
167
+ # Demo training script of SlowAdapter
168
+ bash running_scripts/train/stage1_mask.sh
169
+ bash running_scripts/train/stage1_box.sh
170
+ ```
171
+
172
+ ### FastGenerator Training
173
+ In this stage, we distill the Wan2.2-TI2V-5B model into a 4-step image-to-video generation model, named the FastGenerator.
174
+ ```bash
175
+ # Demo training script of FastGenerator
176
+ bash running_scripts/train/stage2.sh
177
+ ```
178
+
179
+ ### FastAdapter Training
180
+ In this stage, we train the FastAdapter to fit the FastGenerator and enable few-step trajectory-controllable video generation.
181
+ ```bash
182
+ # Demo training script of FastAdapter
183
+ bash running_scripts/train/stage3.sh
184
+ ```
185
+
186
+ ## 🤝 Acknowledgements
187
+
188
+ We would like to express our gratitude to the following open-source projects that have been instrumental in the development of our project:
189
+
190
+ - [Wan](https://github.com/Wan-Video/Wan2.2): An open-source base video generation model.
191
+ - [Self-Forcing](https://github.com/guandeh17/Self-Forcing) and [CausVid](https://github.com/tianweiy/CausVid): Two frameworks that pioneered the distillation of video generation models.
192
+ - [MagicMotion](https://github.com/quanhaol/MagicMotion): An open-source trajectory-controllable video generation framework.
193
+ - [Wan2.2-TI2V-5B-Turbo](https://github.com/quanhaol/Wan2.2-TI2V-5B-Turbo): An open-source step-distillation framework for image-to-video generation that distills the Wan2.2-TI2V-5B model into 4 steps.
194
+
195
+
196
+ Special thanks to the contributors of these libraries for their hard work and dedication!
197
+
198
+ ## 📚 Contact
199
+
200
+ If you have any suggestions or find our work helpful, feel free to contact us.
201
+
202
+ Email: liqh24@m.fudan.edu.cn
203
+
204
+ If you find our work useful, <b>please consider giving a star to this GitHub repository and citing it</b>:
205
+
206
+ ```bibtex
207
+ @misc{li2026flashmotionfewstepcontrollablevideo,
208
+ title={FlashMotion: Few-Step Controllable Video Generation with Trajectory Guidance},
209
+ author={Quanhao Li and Zhen Xing and Rui Wang and Haidong Cao and Qi Dai and Daoguo Dong and Zuxuan Wu},
210
+ year={2026},
211
+ eprint={2603.12146},
212
+ archivePrefix={arXiv},
213
+ primaryClass={cs.CV},
214
+ url={https://arxiv.org/abs/2603.12146},
215
+ }
216
+ ```