---
license: other
language:
- en
base_model:
- Wan-AI/Wan2.1-T2V-14B-Diffusers
pipeline_tag: text-to-video
tags:
- Any-Step
- Text-to-Video
- Image-to-Video
- Video-to-Video
---
# AnyFlow
<p align="center">
🖥️ <a href="https://github.com/NVlabs/AnyFlow">GitHub</a> ｜ 🤗 <a href="https://huggingface.co/collections/nvidia/anyflow">Hugging Face</a> ｜ <a href="https://arxiv.org/">Paper</a> ｜ <a href="https://nvlabs.github.io/AnyFlow">Website</a>
<br>
</p>
-----
**AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation**
In this repository, we present **AnyFlow**, the first any-step video diffusion framework built on flow maps. **AnyFlow** offers these key features:
- ⚡ **Any-Step Generation**: Unlike traditional distilled models tied to fixed step budgets, **AnyFlow** enables a single model to adapt to arbitrary inference budgets. It achieves high-quality few-step generation while providing stable improvements as more sampling steps are added.
- **Multiple Architectures**: **AnyFlow** supports any-step distillation for both **causal** and **bidirectional** video diffusion models.
- 🎬 **Multiple Tasks**: **AnyFlow** supports Text-to-Video, Image-to-Video, and Video-to-Video generation within one causal video diffusion model.
- **Scalable Performance**: **AnyFlow** is validated from **1.3B** up to **14B** parameters.
This directory contains **AnyFlow-FAR-Wan2.1-14B-Diffusers** (a 14B causal video diffusion model) in Hugging Face Diffusers format, derived from the [**Wan2.1-T2V-14B-Diffusers**](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B-Diffusers) text-to-video backbone.
## Video Demos
<div align="center">
<video width="80%" autoplay loop muted playsinline controls>
<source src="https://nvlabs.github.io/AnyFlow/assets/videos/demo_video.m4v" type="video/mp4">
Your browser does not support the video tag.
</video>
</div>
## 🔥 Latest News!!
* May 4, 2026: We've released the codebase and weights of AnyFlow.
## Quickstart
### Setup Environment
**1️⃣ Create Conda Environment**
```bash
conda create -n far python=3.10
conda activate far
```
**2️⃣ Install PyTorch and Dependencies**
```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt --no-build-isolation
```
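Before running the examples below, it can help to confirm that the packages they import are visible in the active environment. The following is a minimal sketch and not part of the AnyFlow codebase: the helper name `check_environment` is hypothetical, and the package list is inferred from the imports used in this README (`far` is assumed to be importable when running from the repository root).

```python
import importlib.util


def check_environment(packages=("torch", "diffusers", "far")):
    """Return {package_name: bool} indicating whether each package can be located."""
    status = {}
    for name in packages:
        # find_spec locates a top-level package without importing it
        status[name] = importlib.util.find_spec(name) is not None
    return status


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

A `MISSING` entry for `far` usually means the script is not being run from the repository root, assuming the package lives there.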
### Model Download
| Model | Tasks | Resolution | Download Link |
| ----- | ----- | ---------- | ------------- |
| `AnyFlow-FAR-Wan2.1-1.3B-Diffusers` | T2V, I2V, V2V | 480P | 🤗 [Hugging Face](https://huggingface.co/nvidia/AnyFlow-FAR-Wan2.1-1.3B-Diffusers) |
| `AnyFlow-FAR-Wan2.1-14B-Diffusers` | T2V, I2V, V2V | 480P | 🤗 [Hugging Face](https://huggingface.co/nvidia/AnyFlow-FAR-Wan2.1-14B-Diffusers) |
| `AnyFlow-Wan2.1-T2V-14B-Diffusers` | T2V | 480P | 🤗 [Hugging Face](https://huggingface.co/nvidia/AnyFlow-Wan2.1-T2V-14B-Diffusers) |
| `AnyFlow-Wan2.1-T2V-1.3B-Diffusers` | T2V | 480P | 🤗 [Hugging Face](https://huggingface.co/nvidia/AnyFlow-Wan2.1-T2V-1.3B-Diffusers) |
Download models using 🤗 `hf download`:
```bash
pip install "huggingface_hub[cli]"
hf download nvidia/AnyFlow-FAR-Wan2.1-1.3B-Diffusers --repo-type model --local-dir experiments/pretrained_models/AnyFlow-FAR-Wan2.1-1.3B-Diffusers
```
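The same download can also be scripted from Python with `huggingface_hub.snapshot_download`. This is a sketch under assumptions: the helper names `local_dir_for` and `download_model` are hypothetical, and the directory layout simply mirrors the `--local-dir` used in the CLI command above.

```python
from pathlib import Path


def local_dir_for(repo_id, root="experiments/pretrained_models"):
    """Map 'nvidia/<name>' to '<root>/<name>', matching the CLI example's layout."""
    return Path(root) / repo_id.split("/")[-1]


def download_model(repo_id, root="experiments/pretrained_models"):
    """Download a full model repository snapshot and return its local path."""
    # Imported lazily so the path helper works without huggingface_hub installed
    from huggingface_hub import snapshot_download

    target = local_dir_for(repo_id, root)
    snapshot_download(repo_id=repo_id, repo_type="model", local_dir=str(target))
    return target


# Example (downloads many gigabytes, so it is left commented out):
# download_model("nvidia/AnyFlow-FAR-Wan2.1-14B-Diffusers")
```

The returned path can be passed to `from_pretrained` in place of the Hub model ID.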
### Run Text-to-Video Generation with Diffusers
```python
import torch
from diffusers.utils import export_to_video

from far.pipelines.pipeline_far_wan_anyflow import FARWanAnyFlowPipeline

model_id = "nvidia/AnyFlow-FAR-Wan2.1-14B-Diffusers"
pipeline = FARWanAnyFlowPipeline.from_pretrained(model_id).to('cuda', dtype=torch.bfloat16)

prompt = "CG game concept digital art, a majestic elephant with a vibrant tusk and sleek fur running swiftly towards a herd of its kind."

# Generate 81 frames at 480x832 with 4 sampling steps and a fixed seed
video = pipeline(
    prompt=prompt,
    height=480,
    width=832,
    num_frames=81,
    num_inference_steps=4,
    generator=torch.Generator('cuda').manual_seed(0),
).frames[0]

# Write the generated frames to disk at 16 fps
export_to_video(video, "output.mp4", fps=16)
```
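Because AnyFlow is described as adapting to arbitrary inference budgets, one quick way to see the trade-off is to run the same prompt and seed at several values of `num_inference_steps`. The sketch below is illustrative only: `sweep_step_budgets` is a hypothetical helper, the budgets `(1, 2, 4, 8)` are arbitrary example values rather than recommended settings, and the call arguments are copied from the text-to-video example above.

```python
def sweep_step_budgets(pipeline, prompt, budgets=(1, 2, 4, 8), seed=0, make_generator=None):
    """Call `pipeline` once per step budget and return {num_steps: frames}."""
    results = {}
    for steps in budgets:
        kwargs = dict(
            prompt=prompt,
            height=480,
            width=832,
            num_frames=81,
            num_inference_steps=steps,
        )
        if make_generator is not None:
            # Re-create the generator so every budget starts from the same seed
            kwargs["generator"] = make_generator(seed)
        results[steps] = pipeline(**kwargs).frames[0]
    return results


# Usage with the objects from the example above (requires a GPU):
# videos = sweep_step_budgets(
#     pipeline, prompt,
#     make_generator=lambda s: torch.Generator('cuda').manual_seed(s),
# )
# for steps, frames in videos.items():
#     export_to_video(frames, f"output_{steps}steps.mp4", fps=16)
```

Re-seeding per run keeps the initial noise identical, so differences between the outputs come from the step budget alone.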
### Run Image-to-Video Generation with Diffusers
```python
import torch
from diffusers.utils import export_to_video
from PIL import Image
from torchvision import transforms

from far.pipelines.pipeline_far_wan_anyflow import FARWanAnyFlowPipeline

model_id = "nvidia/AnyFlow-FAR-Wan2.1-14B-Diffusers"
pipeline = FARWanAnyFlowPipeline.from_pretrained(model_id).to('cuda', dtype=torch.bfloat16)

# load image
image_path = 'assets/example_image.jpg'
prompt = 'A towering, battle-scarred humanoid robot walking through the skeletal remains of a city ruin.'

image = Image.open(image_path).convert('RGB')
# Resize to 480x832, convert to a [0, 1] tensor, and add two leading dimensions
# so the conditioning image has shape [1, 1, 3, 480, 832]
image = transforms.ToTensor()(transforms.Resize([480, 832])(image)).unsqueeze(0).unsqueeze(0)

video = pipeline(
    prompt=prompt,
    context_sequence={'raw': image},
    height=480,
    width=832,
    num_frames=81,
    num_inference_steps=4,
    generator=torch.Generator('cuda').manual_seed(0),
).frames[0]

# Write the generated frames to disk at 16 fps
export_to_video(video, "output.mp4", fps=16)
```
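`transforms.Resize([480, 832])` resizes to exactly that size, so an input whose aspect ratio differs from 832:480 is stretched. If that distortion matters, one option is to center-crop the image to the target aspect ratio before resizing. The helper below is a hypothetical sketch, not part of AnyFlow, and this README does not state which preprocessing the released weights expect.

```python
def center_crop_box(src_width, src_height, target_width=832, target_height=480):
    """Return (left, top, right, bottom) of the largest centered crop with the target aspect ratio."""
    # Compare aspect ratios by integer cross-multiplication to avoid float rounding
    if src_width * target_height > src_height * target_width:
        # Source is wider than the target: keep full height, trim the sides
        crop_h = src_height
        crop_w = src_height * target_width // target_height
    else:
        # Source is taller than (or equal to) the target: keep full width, trim top and bottom
        crop_w = src_width
        crop_h = src_width * target_height // target_width
    left = (src_width - crop_w) // 2
    top = (src_height - crop_h) // 2
    return (left, top, left + crop_w, top + crop_h)


# Usage with a PIL image (PIL reports size as (width, height)):
# image = image.crop(center_crop_box(*image.size))
```

For a 1920x1080 input this yields the box `(24, 0, 1896, 1080)`, a 1872x1080 region with the same 832:480 ratio.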
### Run Video-to-Video Generation with Diffusers
```python
import torch
from diffusers.utils import export_to_video
import decord
from torchvision import transforms

from far.pipelines.pipeline_far_wan_anyflow import FARWanAnyFlowPipeline

# Make decord return torch tensors instead of its own NDArray type
decord.bridge.set_bridge('torch')

model_id = "nvidia/AnyFlow-FAR-Wan2.1-14B-Diffusers"
pipeline = FARWanAnyFlowPipeline.from_pretrained(model_id).to('cuda', dtype=torch.bfloat16)

# load video
video_path = 'assets/example_video.mp4'
prompt = "A focused trail runner's powerful strides through a dense, sun-dappled forest."

# NOTE: `select_frame_indices` and `num_cond_frames` are not defined in this snippet.
# Import or define a helper that resamples frame indices to 16 fps, and set
# `num_cond_frames` to the number of conditioning frames, before running.
video_reader = decord.VideoReader(video_path)
frame_idxs = select_frame_indices(len(video_reader), video_reader.get_avg_fps(), target_fps=16)[:num_cond_frames]
frames = video_reader.get_batch(frame_idxs)

# [T, H, W, C] uint8 -> [T, C, H, W] float in [0, 1], then resize and add a batch dimension
frames = (frames / 255.0).float().permute(0, 3, 1, 2).contiguous()
frames = transforms.Resize([480, 832])(frames).unsqueeze(0)

video = pipeline(
    prompt=prompt,
    context_sequence={'raw': frames},
    height=480,
    width=832,
    num_frames=81,
    num_inference_steps=4,
    generator=torch.Generator('cuda').manual_seed(0),
).frames[0]

# Write the generated frames to disk at 16 fps
export_to_video(video, "output.mp4", fps=16)
```
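The example above calls `select_frame_indices` without showing it. The following is one possible stand-in that matches the call signature used there; it is an assumption for illustration, and the helper shipped with the AnyFlow codebase may behave differently. It walks the source video in steps of `source_fps / target_fps` and takes the floor of each position.

```python
def select_frame_indices(num_frames, source_fps, target_fps=16):
    """Pick source-frame indices that approximate playback at `target_fps`."""
    if num_frames <= 0 or source_fps <= 0 or target_fps <= 0:
        return []
    step = source_fps / target_fps  # source frames advanced per output frame
    indices = []
    i = 0
    while True:
        idx = int(i * step)  # floor to the nearest earlier source frame
        if idx >= num_frames:
            break
        indices.append(idx)
        i += 1
    return indices


# A 32 fps clip with 10 frames keeps every second frame at 16 fps
print(select_frame_indices(10, 32.0, target_fps=16))  # [0, 2, 4, 6, 8]
```

When the source frame rate is below the target, this version repeats frames rather than interpolating between them.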
## License
This model is released under the NVIDIA One-Way Noncommercial License ([NSCLv1](LICENSE.md)).
Under the NVIDIA One-Way Noncommercial License (NSCLv1), NVIDIA confirms:
* Models are not for commercial use.
* NVIDIA does not claim ownership to any outputs generated using the Models or Derivative Models.
## Citation
If you find our work helpful, please cite us.
```bibtex
@article{gu2026anyflow,
title={AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation},
author={Gu, Yuchao and Fang, Guian and Jiang, Yuxin and Mao, Weijia and Han, Song and Cai, Han and Shou, Mike Zheng},
journal={arXiv preprint arXiv:2605.13724},
year={2026}
}
@article{gu2025long,
title={Long-Context Autoregressive Video Modeling with Next-Frame Prediction},
author={Gu, Yuchao and Mao, Weijia and Shou, Mike Zheng},
journal={arXiv preprint arXiv:2503.19325},
year={2025}
}
```
## Acknowledgements
This codebase is built on [Diffusers](https://github.com/huggingface/diffusers). We also refer to implementations from [FAR](https://github.com/showlab/FAR), [Self-Forcing](https://github.com/guandeh17/Self-Forcing), and [TiM](https://github.com/WZDTHU/TiM). We thank the authors for open-sourcing their work.