---
license: mit
datasets:
- quanhaol/MagicData
base_model:
- quanhaol/Wan2.2-TI2V-5B-Turbo
- Wan-AI/Wan2.2-TI2V-5B
tags:
- image-to-video
- Trajectory-Control
- Fewstep-video-gen
---
> **FlashMotion: Few-Step Controllable Video Generation with Trajectory Guidance**
>
> [Quanhao Li](https://github.com/quanhaol)<sup>1</sup>, [Zhen Xing](https://chenhsing.github.io/)<sup>1</sup>, [Rui Wang](https://scholar.google.com/citations?user=116smmsAAAAJ&hl=en)<sup>1</sup>, Haidong Cao<sup>1</sup>, [Qi Dai](https://daiqi1989.github.io/)<sup>2</sup>, Daoguo Dong<sup>1</sup> and [Zuxuan Wu](https://zxwu.azurewebsites.net/)<sup>1</sup>
>
> <sup>1</sup> Fudan University; <sup>2</sup> Microsoft Research Asia

## 💡 Abstract

Recent advances in trajectory-controllable video generation have achieved remarkable progress. Previous methods mainly use adapter-based architectures for precise motion control along predefined trajectories. However, all of these methods rely on a multi-step denoising process, leading to substantial time redundancy and computational overhead. While existing video distillation methods successfully distill multi-step generators into few-step ones, directly applying these approaches to trajectory-controllable video generation results in noticeable degradation in both video quality and trajectory accuracy. To bridge this gap, we introduce **FlashMotion**, a novel training framework designed for few-step trajectory-controllable video generation. We first train a trajectory adapter on a multi-step video generator for precise trajectory control. Then, we distill the generator into a few-step version to accelerate video generation. Finally, we finetune the adapter using a hybrid strategy that combines diffusion and adversarial objectives, aligning it with the few-step generator to produce high-quality, trajectory-accurate videos. For evaluation, we introduce **FlashBench**, a benchmark for long-sequence trajectory-controllable video generation that measures both video quality and trajectory accuracy across varying numbers of foreground objects. Experiments on two adapter architectures show that FlashMotion surpasses existing video distillation methods and previous multi-step models in both visual quality and trajectory consistency.
## 📣 Updates

- `2026/03/13` 🔥🔥 We released FlashMotion, including its training code, inference code, model weights, and the evaluation benchmark.
- `2026/02` 🔥🔥🔥 FlashMotion has been accepted to CVPR 2026!

## 📑 Table of Contents

- [💡 Abstract](#-abstract)
- [📣 Updates](#-updates)
- [📑 Table of Contents](#-table-of-contents)
- [✅ TODO List](#-todo-list)
- [🐍 Installation](#-installation)
- [📦 Model Weights](#-model-weights)
  - [Folder Structure](#folder-structure)
  - [Download Links](#download-links)
- [⛽️ Dataset Prepare](#️-dataset-prepare)
- [🔄 Inference](#-inference)
  - [Scripts](#scripts)
- [🏎️ Train](#️-train)
  - [SlowAdapter Training](#slowadapter-training)
  - [FastGenerator Training](#fastgenerator-training)
  - [FastAdapter Training](#fastadapter-training)
- [🤝 Acknowledgements](#-acknowledgements)
- [📚 Contact](#-contact)

## ✅ TODO List

- [x] Release our inference code and model weights
- [x] Release our training code
- [x] Release our evaluation benchmark

## 🐍 Installation

```bash
# Clone this repository.
git clone https://github.com/quanhaol/FlashMotion
cd FlashMotion

# Install requirements
conda create -n flashmotion python=3.10 -y
conda activate flashmotion
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
python setup.py develop
```

## 📦 Model Weights

### Folder Structure

```
FlashMotion
└── ckpts
    ├── FastGenerator
    │   └── model.pt
    ├── SlowAdapter
    │   ├── ResNet
    │   │   └── model.pt
    │   └── ControlNet
    │       └── model.pt
    └── FastAdapter
        ├── ResNet
        │   └── model.pt
        └── ControlNet
            └── model.pt
```

### Download Links

Please use the following commands to download the model weights:

```bash
pip install "huggingface_hub[hf_transfer]"
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download quanhaol/FlashMotion --local-dir ckpts
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download Wan-AI/Wan2.2-TI2V-5B --local-dir wan_models/Wan2.2-TI2V-5B
```

## ⛽️ Dataset Prepare

All three training stages of FlashMotion use [MagicData](https://huggingface.co/datasets/quanhaol/MagicData), an open-source dataset built for trajectory-controllable video generation. Please follow [this README](https://huggingface.co/datasets/quanhaol/MagicData) to download and extract the data to a suitable path on your machine. The dataset is organized as follows:

```
MagicData
├── videos
│   ├── videoid_1.mp4
│   ├── videoid_2.mp4
│   └── ...
├── masks
│   ├── videoid_1
│   │   ├── annotated_frame_00000.png
│   │   ├── annotated_frame_00001.png
│   │   └── ...
│   ├── videoid_2
│   │   └── ...
├── boxs
│   ├── videoid_1
│   │   ├── annotated_frame_00000.png
│   │   ├── annotated_frame_00001.png
│   │   └── ...
│   ├── videoid_2
│   │   └── ...
└── MagicData.csv  # detailed information of each video
```

## 🔄 Inference

Inference requires around 42 GiB of GPU memory with the ResNet FastAdapter and around 50 GiB with the ControlNet FastAdapter, both tested on a single NVIDIA A100 GPU.
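As a quick sanity check after downloading MagicData, you can verify that every video has matching per-frame annotations. The sketch below is not part of this repository; `check_magicdata` is a hypothetical helper, and the `videos`/`masks`/`boxs` directory names and `annotated_frame_*.png` pattern are assumed from the layout shown above:

```python
from pathlib import Path


def check_magicdata(root: str) -> list[str]:
    """Return '<video_id>:<annotation_kind>' entries for missing annotations."""
    base = Path(root)
    missing = []
    for video in sorted((base / "videos").glob("*.mp4")):
        vid = video.stem
        # Each video should have a per-frame annotation folder under masks/ and boxs/.
        for kind in ("masks", "boxs"):
            ann_dir = base / kind / vid
            if not ann_dir.is_dir() or not any(ann_dir.glob("annotated_frame_*.png")):
                missing.append(f"{vid}:{kind}")
    return missing
```

Calling `check_magicdata("MagicData")` should return an empty list when the extraction completed cleanly; any returned entries point at videos whose mask or box folders are absent or empty.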
⚡️⚡️⚡️ It takes only about 11 seconds to denoise a video with the ResNet Adapter, and around 24 seconds with the ControlNet Adapter.

### Scripts

We provide demo scripts for both types of trajectory adapter:

```bash
# Demo inference script for each adapter type
bash running_scripts/inference/i2v_control_fewstep_controlnet.sh
bash running_scripts/inference/i2v_control_fewstep_resnet.sh
```

We also provide a sample input image and trajectory maps in `./assets`. Feel free to replace `--prompt`, `--image`, and `--trajectory` with your own input prompt, input image, and input trajectory maps.

> **Note**: If you want to build your own trajectory maps, please refer to the box trajectory construction pipeline introduced in [MagicMotion](https://github.com/quanhaol/MagicMotion/tree/main/trajectory_construction#box-trajectory).

## 🏎️ Train

We provide scripts for all three training stages of FlashMotion: training the SlowAdapter, the FastGenerator, and the FastAdapter.

### SlowAdapter Training

In this stage, we first train the SlowAdapter using the mask annotations in MagicData, and then finetune it using bounding boxes as the trajectory-map conditions.

```bash
# Demo training scripts for the SlowAdapter
bash running_scripts/train/stage1_mask.sh
bash running_scripts/train/stage1_box.sh
```

### FastGenerator Training

In this stage, we distill the Wan2.2-TI2V-5B model into a 4-step image-to-video generation model, named the FastGenerator.

```bash
# Demo training script for the FastGenerator
bash running_scripts/train/stage2.sh
```

### FastAdapter Training

In this stage, we train the FastAdapter to align with the FastGenerator and enable few-step trajectory-controllable video generation.
```bash
# Demo training script for the FastAdapter
bash running_scripts/train/stage3.sh
```

## 🤝 Acknowledgements

We would like to express our gratitude to the following open-source projects that have been instrumental in the development of our project:

- [Wan](https://github.com/Wan-Video/Wan2.2): An open-source base video generation model.
- [Self-Forcing](https://github.com/guandeh17/Self-Forcing) and [CausVid](https://github.com/tianweiy/CausVid): Two frameworks that pioneered the distillation of video generation models.
- [MagicMotion](https://github.com/quanhaol/MagicMotion): An open-source trajectory-controllable video generation framework.
- [Wan2.2-TI2V-5B-Turbo](https://github.com/quanhaol/Wan2.2-TI2V-5B-Turbo): An open-source step-distillation image-to-video framework that distills the Wan2.2-TI2V-5B model into 4 steps.

Special thanks to the contributors of these libraries for their hard work and dedication!

## 📚 Contact

If you have any suggestions or find our work helpful, feel free to contact us at liqh24@m.fudan.edu.cn.

If you find our work useful, please consider starring this GitHub repository and citing it:

```bibtex
@misc{li2026flashmotionfewstepcontrollablevideo,
  title={FlashMotion: Few-Step Controllable Video Generation with Trajectory Guidance},
  author={Quanhao Li and Zhen Xing and Rui Wang and Haidong Cao and Qi Dai and Daoguo Dong and Zuxuan Wu},
  year={2026},
  eprint={2603.12146},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2603.12146},
}
```