---
license: other
license_name: nvidia-open-model-license
license_link: >-
https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
pipeline_tag: text-to-video
tags:
- text-to-video
- multi-shot
- NVFP4
- video-generation
- diffusion
- long-video
- longlive2
- wan2.2
---
<p align="center">
<img src="logo.png" alt="LongLive2.0 logo" width="100%">
</p>
# LongLive2.0 5B NVFP4 Denoising Step 4
[Paper](https://arxiv.org/abs/2605.18739)
[Code](https://github.com/NVlabs/LongLive)
[Video](https://www.youtube.com/watch?v=7oQALy32fiU)
[LongLive-2.0-5B](https://huggingface.co/Efficient-Large-Model/LongLive-2.0-5B)
[LongLive-2.0-5B-NVFP4-S4](https://huggingface.co/Efficient-Large-Model/LongLive-2.0-5B-NVFP4-S4)
[Project page](https://nvlabs.github.io/LongLive/LongLive2/)
[Docs](https://nvlabs.github.io/LongLive/LongLive2/docs/)
This repository hosts the 4-step NVFP4 checkpoint of LongLive2.0 5B, for
inference with the LongLive2.0 release code:
https://github.com/NVlabs/LongLive
LongLive2.0 inference loads the Wan2.2-TI2V-5B generator, applies the
few-step DMD adapter when a separate LoRA checkpoint is provided, and runs the
generator with NVFP4 weight quantization plus optional FP4 KV-cache
quantization.
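To make the term concrete: NVFP4 stores 4-bit E2M1 values in blocks of 16 that share one scale factor. The sketch below is a plain-Python illustration of block-wise FP4 fake quantization, not the FourOverSix implementation. The function names are hypothetical, the scale is kept as a Python float instead of being rounded to FP8, and ties are not rounded to even. The last function shows one way an error-driven scale choice could work: it compares two candidate scalings and keeps the one with the lower mean squared error. Whether the `mse` scale rules in the config behave exactly like this is an assumption.

```python
# Magnitudes representable in E2M1 (FP4); the sign is stored separately.
FP4_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def round_to_fp4(x: float) -> float:
    """Round x to the nearest signed E2M1 value, saturating at +/-6."""
    magnitude = min(FP4_GRID, key=lambda g: abs(abs(x) - g))
    return -magnitude if x < 0 else magnitude


def quantize_block(block, target_max=6.0):
    """Scale the block so its absmax maps to target_max, round, then rescale."""
    absmax = max(abs(v) for v in block)
    if absmax == 0.0:
        return [0.0] * len(block), 1.0
    scale = absmax / target_max
    return [round_to_fp4(v / scale) * scale for v in block], scale


def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def quantize_block_mse(block, targets=(6.0, 4.0)):
    """Try each candidate target_max and keep the lowest-error result."""
    results = [quantize_block(block, t) for t in targets]
    return min(results, key=lambda r: mean_squared_error(block, r[0]))
```

For example, `quantize_block_mse([4.0, 2.0, 1.0, 0.5])` keeps the scaling that maps the block maximum to 4, because every value then lands exactly on the FP4 grid.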
## Installation
The NVFP4 path uses a stricter environment than the default BF16 release path.
We recommend keeping it in a separate conda environment.
```bash
git clone https://github.com/wileewang/LongLive2.0.git
cd LongLive2.0
conda create -n longlive2_nvfp4 python=3.12 -y
conda activate longlive2_nvfp4
pip install -r requirements.txt
pip install --upgrade --index-url https://download.pytorch.org/whl/cu128 \
  torch==2.10.0 torchvision==0.25.0
```
Build the NVFP4 / FP4 extensions:
```bash
cd fouroversix
pip install ninja packaging psutil "setuptools>=77.0.3"
# B200 / GB200 / GB300
export CUDA_ARCHS=100
# RTX 50/60 series, if needed
# export CUDA_ARCHS=120
pip install --no-build-isolation -e .
cd ..
git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
git checkout v2.8.3
pip install -U pip setuptools wheel ninja packaging
pip install --no-build-isolation -e .
cd ..
cd utils/kernel
python setup.py build_ext --inplace
cd ../..
```
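If you are unsure which `CUDA_ARCHS` value applies to your GPU, a small helper can derive it from the device's compute capability. This is a sketch under assumptions: `cuda_archs_for` and `detect_cuda_archs` are hypothetical names, and the mapping (major and minor digits concatenated) covers only the two values mentioned in the build steps above.

```python
def cuda_archs_for(major: int, minor: int) -> str:
    """Map a CUDA compute capability to a CUDA_ARCHS value, e.g. (10, 0) -> "100"."""
    arch = f"{major}{minor}"
    # Only the two architectures named in the build instructions are accepted.
    if arch not in {"100", "120"}:
        raise ValueError(f"compute capability {major}.{minor} is not covered by this sketch")
    return arch


def detect_cuda_archs(device_index: int = 0) -> str:
    """Query the local GPU through PyTorch (requires torch and a visible CUDA device)."""
    import torch  # imported lazily so cuda_archs_for works without torch installed

    major, minor = torch.cuda.get_device_capability(device_index)
    return cuda_archs_for(major, minor)
```

Calling `detect_cuda_archs()` in the activated environment returns the string to export as `CUDA_ARCHS` before building.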
Quick environment check:
```bash
python -c "import torch, torchvision; print(torch.__version__, torch.version.cuda); print(torchvision.__version__)"
python -c "import flash_attn; print(flash_attn.__version__)"
python -c "import fouroversix; from utils.quant import LongLiveQuantizationConfig, quantize_to_fp4"
python -c "from utils.kernel.kv_dequant import dequantize_kv_cache_fp4"
```
The released LongLive2.0 checkpoint is sufficient for standard inference. You
only need to download the original Wan2.2-TI2V-5B components if you want to run
training, initialize from the original Wan weights, or use code paths that
explicitly load the base Wan model files:
```bash
huggingface-cli download Wan-AI/Wan2.2-TI2V-5B \
  --local-dir wan_models/Wan2.2-TI2V-5B
```
Download this checkpoint repository:
```bash
huggingface-cli download Efficient-Large-Model/LongLive-2.0-5B-NVFP4-S4 \
  --local-dir checkpoints/longlive2_5b_nvfp4_4step
```
## Configure Inference
Edit `configs/nvfp4/inference_nvfp4.yaml`.
For the released 4-step NVFP4 checkpoint, keep
`inference.sampling_steps: 4`:
```yaml
checkpoints:
generator_ckpt: checkpoints/longlive2_5b_nvfp4_4step/path/to/generator.pt
lora_ckpt: null
merge_lora: false
data:
data_path: /path/to/inference_prompts
image_or_video_shape:
- 1
- 384
- 48
- 44
- 80
output_folder: videos/longlive2_nvfp4_4step
num_samples: 1
num_output_frames: 384
inference:
sampling_steps: 4
sink_size: 8
guidance_scale: 1.0
multi_shot_sink: true
multi_shot_rope_offset: 8
kv_quant: true
kv_quant_scale_rule: mse
kv_quant_backend: cuda
streaming_vae: false
async_vae: false
vae_type: wan
model_quant: true
model_quant_use_transformer_engine: false
model_quant_scale_rule: mse
model_quant_activation_scale_rule: mse
model_quant_weight_scale_rule: mse
model_quant_gradient_scale_rule: mse
```
Replace the checkpoint filename above with the actual file in this repository.
If this repository contains a separate DMD LoRA checkpoint instead of a merged
generator, set `checkpoints.lora_ckpt` to that LoRA file and set
`merge_lora: true`, then add the LoRA adapter config:
```yaml
adapter:
  type: lora
  rank: 128
  alpha: 128
  dropout: 0.0
  dtype: bfloat16
  apply_to_critic: true
  verbose: true
```
If `checkpoints.lora_ckpt` is `null`, remove the `adapter` section.
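The rules above can be checked before launching a run. The sketch below works on an already loaded config dictionary and is not part of the release code; `lookup` and `check_nvfp4_config` are hypothetical helpers. Because the exact nesting of keys in your config file may differ, `lookup` searches the top level and one section deep rather than assuming a fixed layout. It does assume that `adapter` is a top-level section.

```python
from typing import Any, Mapping


def lookup(cfg: Mapping[str, Any], key: str, default: Any = None) -> Any:
    """Return cfg[key] from the top level or from any one-level-deep section."""
    if key in cfg:
        return cfg[key]
    for section in cfg.values():
        if isinstance(section, Mapping) and key in section:
            return section[key]
    return default


def check_nvfp4_config(cfg: Mapping[str, Any]) -> list[str]:
    """Collect human-readable problems; an empty list means the checks passed."""
    problems = []
    if lookup(cfg, "sampling_steps") != 4:
        problems.append("sampling_steps should be 4 for the 4-step checkpoint")
    if "path/to" in str(lookup(cfg, "generator_ckpt", "")):
        problems.append("generator_ckpt still contains the placeholder path")
    has_lora = lookup(cfg, "lora_ckpt") is not None
    has_adapter = "adapter" in cfg  # assumption: adapter is a top-level section
    if has_lora and not has_adapter:
        problems.append("lora_ckpt is set but the adapter section is missing")
    if has_lora and not lookup(cfg, "merge_lora", False):
        problems.append("lora_ckpt is set but merge_lora is not true")
    if not has_lora and has_adapter:
        problems.append("lora_ckpt is null, so the adapter section should be removed")
    return problems
```

With PyYAML installed, the dictionary can come from `yaml.safe_load` applied to the contents of `configs/nvfp4/inference_nvfp4.yaml`.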
Do not set `model_quant_use_transformer_engine: true` when loading a FourOverSix
materialized NVFP4 checkpoint. FourOverSix checkpoints store
`quantized_weight_*` buffers and should be loaded through the FourOverSix path.
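To tell which kind of checkpoint you have, you can scan its tensor names for the `quantized_weight_*` prefix. The helper below is a minimal sketch: the function name and the example key names are hypothetical, and it assumes you can obtain the checkpoint's tensor names as strings (for instance from the keys of a loaded state dict).

```python
from typing import Iterable


def has_fouroversix_buffers(tensor_names: Iterable[str]) -> bool:
    """True if any dotted tensor name has a component starting with `quantized_weight_`."""
    return any(
        part.startswith("quantized_weight_")
        for name in tensor_names
        for part in name.split(".")
    )


# Hypothetical key names, for illustration only.
example_names = ["blocks.0.ffn.quantized_weight_values", "blocks.0.norm.weight"]
use_transformer_engine = not has_fouroversix_buffers(example_names)
```

If the helper returns `True`, keep `model_quant_use_transformer_engine: false`.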
## Prompt Folder
`data.data_path` can be either:
- a `.txt` file, where each line is one single-shot prompt; or
- a directory of multi-shot prompt folders.
Example multi-shot prompt folder:
```text
inference_prompts/
  robot_lab_demo/
    0.json
    1.json
    2.json
    shot_durations.txt
```
Each JSON file contains:
```json
{
  "caption": "A compact silver robot with one blue optic explores a clean robotics lab."
}
```
`shot_durations.txt` is optional. If provided, each number is the number of
temporal chunks assigned to the corresponding caption, for example:
```text
2 2 4
```
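The folder layout described above can be generated with a short script. This is a sketch, not a tool shipped with the release: `write_multishot_prompt` is a hypothetical helper, and writing the durations as one space-separated line with a trailing newline is an assumption based on the example.

```python
import json
from pathlib import Path
from typing import Optional, Sequence


def write_multishot_prompt(
    root: str,
    name: str,
    captions: Sequence[str],
    shot_durations: Optional[Sequence[int]] = None,
) -> Path:
    """Create <root>/<name>/ with 0.json, 1.json, ... and optional shot_durations.txt."""
    if shot_durations is not None and len(shot_durations) != len(captions):
        raise ValueError("shot_durations needs exactly one entry per caption")
    folder = Path(root) / name
    folder.mkdir(parents=True, exist_ok=True)
    for index, caption in enumerate(captions):
        payload = json.dumps({"caption": caption}, indent=2)
        (folder / f"{index}.json").write_text(payload, encoding="utf-8")
    if shot_durations is not None:
        # One number of temporal chunks per caption, in caption order.
        line = " ".join(str(int(d)) for d in shot_durations)
        (folder / "shot_durations.txt").write_text(line + "\n", encoding="utf-8")
    return folder
```

Point `data.data_path` at the `root` directory so that each subfolder is treated as one multi-shot prompt.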
## Run
Single node, 4 GPUs:
```bash
torchrun --standalone --nnodes=1 --nproc_per_node=4 inference.py \
  --config_path configs/nvfp4/inference_nvfp4.yaml
```
Single GPU:
```bash
python inference.py --config_path configs/nvfp4/inference_nvfp4.yaml
```
Or use the helper script, which reads `NUM_GPUS` / `num_gpus` when provided:
```bash
scripts/inference_nvfp4.sh configs/nvfp4/inference_nvfp4.yaml
```
Outputs are written to `output_folder`.
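If you launch runs from Python, the two command forms above can be built from a GPU count. The sketch below only assembles the argument list and does not execute anything; `build_launch_command` is a hypothetical helper that mirrors the single-GPU and single-node `torchrun` commands shown in this section.

```python
import shlex


def build_launch_command(
    num_gpus: int,
    config_path: str = "configs/nvfp4/inference_nvfp4.yaml",
) -> list[str]:
    """Return the argv for single-GPU (python) or single-node multi-GPU (torchrun) inference."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if num_gpus == 1:
        return ["python", "inference.py", "--config_path", config_path]
    return [
        "torchrun",
        "--standalone",
        "--nnodes=1",
        f"--nproc_per_node={num_gpus}",
        "inference.py",
        "--config_path",
        config_path,
    ]


# Render the command as a shell string for logging or copy-paste.
command_line = shlex.join(build_launch_command(4))
```

The resulting list can be passed to `subprocess.run` from the repository root.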
## Notes
- This model card is for the **4-step** NVFP4 checkpoint. Use
`inference.sampling_steps: 4`.
- `model_quant` enables NVFP4 generator inference.
- `inference.kv_quant` enables FP4 KV-cache storage and requires the
`utils/kernel` extension.
- `inference.multi_shot_sink` enables the multi-shot attention sink.
- `inference.multi_shot_rope_offset` controls the multi-shot RoPE offset.
- `inference.streaming_vae`, `inference.async_vae`, `inference.vae_type`, and
`inference.vae_device` control streaming or asynchronous VAE decode.
## License/Terms of Use
GOVERNING TERMS: This trial service is governed by the [NVIDIA API Trial Terms of Service](https://assets.ngc.nvidia.com/products/api-catalog/legal/NVIDIA%20API%20Trial%20Terms%20of%20Service.pdf). Use of this model is governed by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
## Citation
```bibtex
@article{longlive_2,
  title={LongLive2.0: An NVFP4 Parallel Infrastructure for Long Video Generation},
  author={Chen, Yukang and Wang, Luozhou and Huang, Wei and Yang, Shuai and Zhang, Bohan and Xiao, Yicheng and Chu, Ruihang and Mao, Weian and Hu, Qixin and Liu, Shaoteng and Zhao, Yuyang and Mao, Huizi and Chen, Ying-Cong and Xie, Enze and Qi, Xiaojuan and Han, Song},
  journal={arXiv preprint arXiv:2605.18739},
  year={2026}
}
```