---
license: other
license_name: nvidia-open-model-license
license_link: >-
  https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
pipeline_tag: text-to-video
tags:
  - text-to-video
  - multi-shot
  - NVFP4
  - video-generation
  - diffusion
  - long-video
  - longlive2
  - wan2.2
---

<p align="center">
  <img src="logo.png" alt="LongLive2.0 logo" width="100%">
</p>

# LongLive2.0 5B NVFP4 Denoising Step 4

[![Paper](https://img.shields.io/badge/ArXiv-Paper-brown)](https://arxiv.org/abs/2605.18739)
[![Code](https://img.shields.io/badge/GitHub-Code-blue)](https://github.com/NVlabs/LongLive)
[![Video](https://img.shields.io/badge/YouTube-Video-red)](https://www.youtube.com/watch?v=7oQALy32fiU)
[![Models](https://img.shields.io/badge/Model-BF16-yellow)](https://huggingface.co/Efficient-Large-Model/LongLive-2.0-5B)
[![Models](https://img.shields.io/badge/Model-NVFP4-orange)](https://huggingface.co/Efficient-Large-Model/LongLive-2.0-5B-NVFP4-S4)
[![Demo](https://img.shields.io/badge/Demo-Page-brightgreen)](https://nvlabs.github.io/LongLive/LongLive2/)
[![Docs](https://img.shields.io/badge/Full-Documentation-green)](https://nvlabs.github.io/LongLive/LongLive2/docs/)

This repository hosts the 4-step LongLive2.0 5B NVFP4 checkpoint (four
denoising steps per generation) for inference with the LongLive2.0 release code:

https://github.com/NVlabs/LongLive

LongLive2.0 inference loads the Wan2.2-TI2V-5B generator, applies the
few-step DMD adapter when a separate LoRA checkpoint is provided, and runs the
generator with NVFP4 weight quantization plus optional FP4 KV-cache
quantization.

## Installation

The NVFP4 path has stricter environment requirements than the default BF16
release path: it pins specific PyTorch and torchvision versions and builds
custom CUDA extensions. We recommend keeping it in a separate conda environment.

```bash
git clone https://github.com/wileewang/LongLive2.0.git
cd LongLive2.0

conda create -n longlive2_nvfp4 python=3.12 -y
conda activate longlive2_nvfp4

pip install -r requirements.txt
pip install --upgrade --index-url https://download.pytorch.org/whl/cu128 \
  torch==2.10.0 torchvision==0.25.0
```

Build the NVFP4 / FP4 extensions:

```bash
cd fouroversix
pip install ninja packaging psutil "setuptools>=77.0.3"

# B200 / GB200 / GB300
export CUDA_ARCHS=100

# RTX 50/60 series, if needed
# export CUDA_ARCHS=120

pip install --no-build-isolation -e .
cd ..

git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
git checkout v2.8.3
pip install -U pip setuptools wheel ninja packaging
pip install --no-build-isolation -e .
cd ..

cd utils/kernel
python setup.py build_ext --inplace
cd ../..
```
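If you are unsure which `CUDA_ARCHS` value applies to your machine, the choice can be sketched as a small lookup on the GPU's compute capability. This is a minimal sketch, not part of the release code: `pick_cuda_archs` is a hypothetical helper, and the mapping from compute-capability major version (10 for the B200 class, 12 for the RTX 50 class) to the two values above is an assumption based on NVIDIA's published capabilities.

```python
def pick_cuda_archs(capability):
    """Map a (major, minor) compute capability to a CUDA_ARCHS value.

    Hypothetical helper: only the two values named in the build
    instructions are covered; anything else is rejected, not guessed.
    """
    major, _minor = capability
    if major == 10:  # assumed: B200 / GB200 / GB300 class
        return "100"
    if major == 12:  # assumed: RTX 50 series class
        return "120"
    raise ValueError(f"no CUDA_ARCHS value documented for capability {capability}")


# Example with a hard-coded capability tuple.
print(pick_cuda_archs((10, 0)))
```

On a machine with PyTorch installed, the tuple can come from `torch.cuda.get_device_capability()`.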

Quick environment check:

```bash
python -c "import torch, torchvision; print(torch.__version__, torch.version.cuda); print(torchvision.__version__)"
python -c "import flash_attn; print(flash_attn.__version__)"
python -c "import fouroversix; from utils.quant import LongLiveQuantizationConfig, quantize_to_fp4"
python -c "from utils.kernel.kv_dequant import dequantize_kv_cache_fp4"
```
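The one-line checks above stop at the first failing import. As an alternative, here is a minimal sketch that tries every module and reports all failures together. `check_imports` and `REQUIRED` are hypothetical names, not part of the release code; the module names are copied from the commands above, and the script is assumed to run from the repository root so that `utils` resolves.

```python
import importlib


def check_imports(module_names):
    """Try to import each module; return {name: None, or an error string}."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # build problems may not raise ImportError
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


# Module names taken from the one-line checks in this section.
REQUIRED = [
    "torch",
    "torchvision",
    "flash_attn",
    "fouroversix",
    "utils.quant",
    "utils.kernel.kv_dequant",
]

if __name__ == "__main__":
    for name, error in check_imports(REQUIRED).items():
        print(f"{name}: {'ok' if error is None else error}")
```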

The released LongLive2.0 checkpoint is sufficient for standard inference. You
only need to download the original Wan2.2-TI2V-5B components if you want to run
training, initialize from the original Wan weights, or use code paths that
explicitly load the base Wan model files:

```bash
huggingface-cli download Wan-AI/Wan2.2-TI2V-5B \
  --local-dir wan_models/Wan2.2-TI2V-5B
```

Download this checkpoint repository:

```bash
huggingface-cli download Perflow-Shuai/LongLive-2.0-5B-NVFP4-4Step \
  --local-dir checkpoints/longlive2_5b_nvfp4_4step
```

## Configure Inference

Edit `configs/nvfp4/inference_nvfp4.yaml`.

For the released 4-step NVFP4 checkpoint, keep
`inference.sampling_steps: 4`:

```yaml
checkpoints:
  generator_ckpt: checkpoints/longlive2_5b_nvfp4_4step/path/to/generator.pt
  lora_ckpt: null

merge_lora: false

data:
  data_path: /path/to/inference_prompts
  image_or_video_shape:
  - 1
  - 384
  - 48
  - 44
  - 80

output_folder: videos/longlive2_nvfp4_4step
num_samples: 1
num_output_frames: 384

inference:
  sampling_steps: 4
  sink_size: 8
  guidance_scale: 1.0
  multi_shot_sink: true
  multi_shot_rope_offset: 8
  kv_quant: true
  kv_quant_scale_rule: mse
  kv_quant_backend: cuda
  streaming_vae: false
  async_vae: false
  vae_type: wan

model_quant: true
model_quant_use_transformer_engine: false
model_quant_scale_rule: mse
model_quant_activation_scale_rule: mse
model_quant_weight_scale_rule: mse
model_quant_gradient_scale_rule: mse
```

Replace the placeholder `path/to/generator.pt` above with the actual checkpoint file in this repository.
If this repository contains a separate DMD LoRA checkpoint instead of a merged
generator, set `checkpoints.lora_ckpt` to that LoRA file and set
`merge_lora: true`, then add the LoRA adapter config:

```yaml
adapter:
  type: lora
  rank: 128
  alpha: 128
  dropout: 0.0
  dtype: bfloat16
  apply_to_critic: true
  verbose: true
```

If `checkpoints.lora_ckpt` is `null`, remove the `adapter` section.

Do not set `model_quant_use_transformer_engine: true` when loading a FourOverSix
materialized NVFP4 checkpoint. FourOverSix checkpoints store
`quantized_weight_*` buffers and should be loaded through the FourOverSix path.
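
The configuration rules in this section can be collected into a small pre-flight check. The sketch below works on an already-parsed config dictionary. `check_nvfp4_config` is a hypothetical helper, not part of the release code, and it checks only the rules stated above.

```python
def check_nvfp4_config(cfg):
    """Return a list of problems found in a parsed config dict."""
    problems = []
    inference = cfg.get("inference") or {}
    checkpoints = cfg.get("checkpoints") or {}

    # The released checkpoint is a 4-step model.
    steps = inference.get("sampling_steps")
    if steps != 4:
        problems.append(f"inference.sampling_steps should be 4, got {steps!r}")

    # LoRA checkpoint, merge_lora, and the adapter section go together.
    lora = checkpoints.get("lora_ckpt")
    if lora is None and "adapter" in cfg:
        problems.append("lora_ckpt is null, so remove the adapter section")
    if lora is not None:
        if not cfg.get("merge_lora", False):
            problems.append("lora_ckpt is set, so merge_lora should be true")
        if "adapter" not in cfg:
            problems.append("lora_ckpt is set, so an adapter section is needed")

    # FourOverSix checkpoints must not go through Transformer Engine.
    if cfg.get("model_quant") and cfg.get("model_quant_use_transformer_engine"):
        problems.append("model_quant_use_transformer_engine must stay false")
    return problems


# Subset of the example config above, written as a plain dict.
example = {
    "checkpoints": {"generator_ckpt": "path/to/generator.pt", "lora_ckpt": None},
    "merge_lora": False,
    "inference": {"sampling_steps": 4},
    "model_quant": True,
    "model_quant_use_transformer_engine": False,
}
print(check_nvfp4_config(example))
```

If PyYAML is installed, the dictionary can be produced with `yaml.safe_load` on the config file; whether the release code parses the file the same way is not confirmed here.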

## Prompt Folder

`data.data_path` can be either:

- a `.txt` file, where each line is one single-shot prompt; or
- a directory of multi-shot prompt folders.

Example multi-shot prompt folder:

```text
inference_prompts/
  robot_lab_demo/
    0.json
    1.json
    2.json
    shot_durations.txt
```

Each JSON file contains:

```json
{
  "caption": "A compact silver robot with one blue optic explores a clean robotics lab."
}
```

`shot_durations.txt` is optional. If provided, each number is the number of
temporal chunks assigned to the corresponding caption, for example:

```text
2 2 4
```
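
A prompt folder with this layout can be generated programmatically. The following is a minimal sketch: `write_multishot_prompt` is a hypothetical helper, not part of the release code. The second and third captions are made-up placeholders, and the requirement of one duration per caption is an assumption drawn from the description above.

```python
import json
import tempfile
from pathlib import Path


def write_multishot_prompt(root, name, captions, shot_durations=None):
    """Create <root>/<name>/0.json, 1.json, ... and optional shot_durations.txt."""
    if shot_durations is not None and len(shot_durations) != len(captions):
        raise ValueError("need exactly one duration per caption")
    folder = Path(root) / name
    folder.mkdir(parents=True, exist_ok=True)
    for index, caption in enumerate(captions):
        payload = json.dumps({"caption": caption}, indent=2) + "\n"
        (folder / f"{index}.json").write_text(payload, encoding="utf-8")
    if shot_durations is not None:
        line = " ".join(str(n) for n in shot_durations) + "\n"
        (folder / "shot_durations.txt").write_text(line, encoding="utf-8")
    return folder


# Example: write into a temporary directory and list the result.
with tempfile.TemporaryDirectory() as tmp:
    folder = write_multishot_prompt(
        tmp,
        "robot_lab_demo",
        [
            "A compact silver robot with one blue optic explores a clean robotics lab.",
            "Placeholder caption for the second shot.",
            "Placeholder caption for the third shot.",
        ],
        shot_durations=[2, 2, 4],
    )
    print(sorted(p.name for p in folder.iterdir()))
```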

## Run

Single node, 4 GPUs:

```bash
torchrun --standalone --nnodes=1 --nproc_per_node=4 inference.py \
  --config_path configs/nvfp4/inference_nvfp4.yaml
```

Single GPU:

```bash
python inference.py --config_path configs/nvfp4/inference_nvfp4.yaml
```

Or use the helper script, which reads `NUM_GPUS` / `num_gpus` when provided:

```bash
bash scripts/inference_nvfp4.sh configs/nvfp4/inference_nvfp4.yaml
```

Outputs are written to `output_folder`.
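
The choice between the two launch commands above can be sketched as a function of the GPU count. This is an illustration only: `build_launch_command` is a hypothetical helper, and it does not claim to reproduce the logic of `scripts/inference_nvfp4.sh`, which is not shown here.

```python
import shlex


def build_launch_command(config_path, num_gpus=1):
    """Return the argument list for single-GPU or single-node multi-GPU runs."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if num_gpus == 1:
        return ["python", "inference.py", "--config_path", config_path]
    return [
        "torchrun",
        "--standalone",
        "--nnodes=1",
        f"--nproc_per_node={num_gpus}",
        "inference.py",
        "--config_path",
        config_path,
    ]


# Print the 4-GPU command as a single shell-ready line.
print(shlex.join(build_launch_command("configs/nvfp4/inference_nvfp4.yaml", 4)))
```

The returned list can be passed directly to `subprocess.run` from the repository root.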

## Notes

- This model card is for the **4-step** NVFP4 checkpoint. Use
  `inference.sampling_steps: 4`.
- `model_quant` enables NVFP4 generator inference.
- `inference.kv_quant` enables FP4 KV-cache storage and requires the
  `utils/kernel` extension.
- `inference.multi_shot_sink` enables the multi-shot attention sink.
- `inference.multi_shot_rope_offset` controls the multi-shot RoPE offset.
- `inference.streaming_vae`, `inference.async_vae`, `inference.vae_type`, and
  `inference.vae_device` control streaming or asynchronous VAE decode.

## License/Terms of Use

GOVERNING TERMS: This trial service is governed by the [NVIDIA API Trial Terms of Service](https://assets.ngc.nvidia.com/products/api-catalog/legal/NVIDIA%20API%20Trial%20Terms%20of%20Service.pdf). Use of this model is governed by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).

## Citation

```bibtex
@article{longlive_2,
  title={LongLive2.0: An NVFP4 Parallel Infrastructure for Long Video Generation},
  author={Chen, Yukang and Wang, Luozhou and Huang, Wei and Yang, Shuai and Zhang, Bohan and Xiao, Yicheng and Chu, Ruihang and Mao, Weian and Hu, Qixin and Liu, Shaoteng and Zhao, Yuyang and Mao, Huizi and Chen, Ying-Cong and Xie, Enze and Qi, Xiaojuan and Han, Song},
  journal={arXiv preprint arXiv:2605.18739},
  year={2026}
}
```