Yukang committed · Commit 3a00b9a · verified · 1 Parent(s): 1e029d8

Create README.md

Files changed (1)
  1. README.md +255 -0 (added)
---
license: other
license_name: nvidia-open-model-license
license_link: >-
  https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
pipeline_tag: text-to-video
tags:
- text-to-video
- multi-shot
- NVFP4
- video-generation
- diffusion
- long-video
- longlive2
- wan2.2
---

<p align="center">
  <img src="logo.png" alt="LongLive2.0 logo" width="100%">
</p>

# LongLive2.0 5B NVFP4 Denoising Step 2

This repository hosts the 2-step denoising NVFP4 checkpoint of LongLive2.0 5B,
for inference with the LongLive2.0 release code:

https://github.com/NVlabs/LongLive

LongLive2.0 inference loads the Wan2.2-TI2V-5B generator, applies the
few-step DMD adapter when a separate LoRA checkpoint is provided, and runs the
generator with NVFP4 weight quantization plus optional FP4 KV-cache
quantization.

## Installation

The NVFP4 path uses a stricter environment than the default BF16 release path.
We recommend keeping it in a separate conda environment.

```bash
git clone https://github.com/wileewang/LongLive2.0.git
cd LongLive2.0

conda create -n longlive2_nvfp4 python=3.12 -y
conda activate longlive2_nvfp4

pip install -r requirements.txt
pip install --upgrade --index-url https://download.pytorch.org/whl/cu128 \
  torch==2.10.0 torchvision==0.25.0
```

Build the NVFP4 / FP4 extensions:

```bash
cd fouroversix
pip install ninja packaging psutil "setuptools>=77.0.3"

# B200 / GB200 / GB300
export CUDA_ARCHS=100

# RTX 50/60 series, if needed
# export CUDA_ARCHS=120

pip install --no-build-isolation -e .
cd ..

git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
git checkout v2.8.3
pip install -U pip setuptools wheel ninja packaging
pip install --no-build-isolation -e .
cd ..

cd utils/kernel
python setup.py build_ext --inplace
cd ../..
```

Quick environment check:

```bash
python -c "import torch, torchvision; print(torch.__version__, torch.version.cuda); print(torchvision.__version__)"
python -c "import flash_attn; print(flash_attn.__version__)"
python -c "import fouroversix; from utils.quant import LongLiveQuantizationConfig, quantize_to_fp4"
python -c "from utils.kernel.kv_dequant import dequantize_kv_cache_fp4"
```
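
The checks above can also be collected into one script that reports every missing package at once instead of stopping at the first failed import. This is a minimal sketch, not part of the release code: `find_missing` is a hypothetical helper, and the package list only mirrors the top-level packages imported in the commands above.

```python
import importlib.util


def find_missing(module_names):
    """Return the names that cannot be located in the current environment."""
    missing = []
    for name in module_names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised when a parent package is absent or a module has no spec.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Top-level packages used by the NVFP4 path, taken from the commands above.
    required = ["torch", "torchvision", "flash_attn", "fouroversix"]
    missing = find_missing(required)
    print("missing:", ", ".join(missing) if missing else "none")
```

`find_spec` only locates a package without importing it, so a package that is installed but fails to load (for example, an extension built for the wrong CUDA architecture) is still only caught by the `python -c` imports.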

The released LongLive2.0 checkpoint is sufficient for standard inference. You
only need to download the original Wan2.2-TI2V-5B components if you want to run
training, initialize from the original Wan weights, or use code paths that
explicitly load the base Wan model files:

```bash
huggingface-cli download Wan-AI/Wan2.2-TI2V-5B \
  --local-dir wan_models/Wan2.2-TI2V-5B
```

Download this checkpoint repository:

```bash
huggingface-cli download Perflow-Shuai/LongLive-2.0-5B-NVFP4-2Step \
  --local-dir checkpoints/longlive2_5b_nvfp4_2step
```

## Configure Inference

Edit `configs/nvfp4/inference_nvfp4.yaml`.

For the released 2-step NVFP4 checkpoint, keep
`inference.sampling_steps: 2`:

```yaml
checkpoints:
  generator_ckpt: checkpoints/longlive2_5b_nvfp4_2step/path/to/generator.pt
  lora_ckpt: null

merge_lora: false

data:
  data_path: /path/to/inference_prompts
  image_or_video_shape:
    - 1
    - 384
    - 48
    - 44
    - 80

output_folder: videos/longlive2_nvfp4_2step
num_samples: 1
num_output_frames: 384

inference:
  sampling_steps: 2
  sink_size: 8
  guidance_scale: 1.0
  multi_shot_sink: true
  multi_shot_rope_offset: 8
  kv_quant: true
  kv_quant_scale_rule: mse
  kv_quant_backend: cuda
  streaming_vae: false
  async_vae: false
  vae_type: wan

model_quant: true
model_quant_use_transformer_engine: false
model_quant_scale_rule: mse
model_quant_activation_scale_rule: mse
model_quant_weight_scale_rule: mse
model_quant_gradient_scale_rule: mse
```

Replace the checkpoint filename above with the actual file in this repository.
If this repository contains a separate DMD LoRA checkpoint instead of a merged
generator, set `checkpoints.lora_ckpt` to that LoRA file and set
`merge_lora: true`, then add the LoRA adapter config:

```yaml
adapter:
  type: lora
  rank: 128
  alpha: 128
  dropout: 0.0
  dtype: bfloat16
  apply_to_critic: true
  verbose: true
```

If `checkpoints.lora_ckpt` is `null`, remove the `adapter` section.

Do not set `model_quant_use_transformer_engine: true` when loading a FourOverSix
materialized NVFP4 checkpoint. FourOverSix checkpoints store
`quantized_weight_*` buffers and should be loaded through the FourOverSix path.
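
The rules in this section can be checked mechanically before launching a run. The sketch below is illustrative only: `check_nvfp4_config` is a hypothetical helper, it takes the config as an already-parsed nested dict, and it assumes `merge_lora`, `adapter`, and the `model_quant*` keys sit at the top level of the file.

```python
def check_nvfp4_config(cfg):
    """Return a list of problems found in an inference config given as a nested dict."""
    problems = []
    inference = cfg.get("inference", {})
    checkpoints = cfg.get("checkpoints", {})

    # The released checkpoint is a 2-step checkpoint.
    if inference.get("sampling_steps") != 2:
        problems.append("inference.sampling_steps should be 2")

    has_lora = checkpoints.get("lora_ckpt") is not None
    has_adapter = "adapter" in cfg
    if has_lora and not has_adapter:
        problems.append("lora_ckpt is set but the adapter section is missing")
    if has_lora and not cfg.get("merge_lora", False):
        problems.append("lora_ckpt is set but merge_lora is not true")
    if not has_lora and has_adapter:
        problems.append("adapter section is present but lora_ckpt is null")

    # FourOverSix checkpoints should not go through the Transformer Engine path.
    if cfg.get("model_quant") and cfg.get("model_quant_use_transformer_engine"):
        problems.append("model_quant_use_transformer_engine should be false")
    return problems


merged = {
    "checkpoints": {"generator_ckpt": "generator.pt", "lora_ckpt": None},
    "merge_lora": False,
    "inference": {"sampling_steps": 2},
    "model_quant": True,
    "model_quant_use_transformer_engine": False,
}
print(check_nvfp4_config(merged))  # prints []
```

An empty list means the config follows the rules described here; it does not confirm that the checkpoint files exist or load correctly.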

## Prompt Folder

`data.data_path` can be either:

- a `.txt` file, where each line is one single-shot prompt; or
- a directory of multi-shot prompt folders.

Example multi-shot prompt folder:

```text
inference_prompts/
  robot_lab_demo/
    0.json
    1.json
    2.json
    shot_durations.txt
```

Each JSON file contains:

```json
{
  "caption": "A compact silver robot with one blue optic explores a clean robotics lab."
}
```

`shot_durations.txt` is optional. If provided, each number is the number of
temporal chunks assigned to the corresponding caption, for example:

```text
2 2 4
```
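
A folder in this layout can be generated from a list of captions. The following is a rough sketch under stated assumptions: `write_multi_shot_prompt` is a hypothetical helper, the second and third captions are placeholders invented for illustration, and the durations are written in the same one-line, space-separated form as the example above.

```python
import json
import tempfile
from pathlib import Path


def write_multi_shot_prompt(root, name, captions, durations=None):
    """Create <root>/<name>/ with one JSON file per shot and optional durations."""
    if durations is not None and len(durations) != len(captions):
        raise ValueError("need exactly one duration per caption")
    folder = Path(root) / name
    folder.mkdir(parents=True, exist_ok=True)
    for index, caption in enumerate(captions):
        # Shots are numbered from 0: 0.json, 1.json, ...
        payload = json.dumps({"caption": caption}, indent=2)
        (folder / f"{index}.json").write_text(payload, encoding="utf-8")
    if durations is not None:
        # One line of space-separated chunk counts, e.g. "2 2 4".
        line = " ".join(str(d) for d in durations) + "\n"
        (folder / "shot_durations.txt").write_text(line, encoding="utf-8")
    return folder


with tempfile.TemporaryDirectory() as tmp:
    folder = write_multi_shot_prompt(
        tmp,
        "robot_lab_demo",
        [
            "A compact silver robot with one blue optic explores a clean robotics lab.",
            "Placeholder caption for the second shot.",
            "Placeholder caption for the third shot.",
        ],
        durations=[2, 2, 4],
    )
    print(sorted(p.name for p in folder.iterdir()))
    # ['0.json', '1.json', '2.json', 'shot_durations.txt']
```

Point `data.data_path` at the parent directory (`inference_prompts/` in the example), not at the individual shot folder.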

## Run

Single node, 4 GPUs:

```bash
torchrun --standalone --nnodes=1 --nproc_per_node=4 inference.py \
  --config_path configs/nvfp4/inference_nvfp4.yaml
```

Single GPU:

```bash
python inference.py --config_path configs/nvfp4/inference_nvfp4.yaml
```

Or use the helper script, which reads `NUM_GPUS` / `num_gpus` when provided:

```bash
scripts/inference_nvfp4.sh configs/nvfp4/inference_nvfp4.yaml
```

Outputs are written to `output_folder`.
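
When launching from Python (for example, from a job scheduler wrapper), the two command forms above can be selected by GPU count. This is only a sketch: `build_launch_command` is a hypothetical helper that reproduces the commands shown in this section, and it does not describe what `scripts/inference_nvfp4.sh` does internally.

```python
import shlex


def build_launch_command(config_path, num_gpus=1):
    """Return the argv list for single-GPU or single-node multi-GPU inference."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if num_gpus == 1:
        return ["python", "inference.py", "--config_path", config_path]
    return [
        "torchrun", "--standalone", "--nnodes=1", f"--nproc_per_node={num_gpus}",
        "inference.py", "--config_path", config_path,
    ]


cmd = build_launch_command("configs/nvfp4/inference_nvfp4.yaml", num_gpus=4)
print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` from the repository root.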

## Notes

- This model card is for the **2-step** NVFP4 checkpoint. Use
  `inference.sampling_steps: 2`.
- `model_quant` enables NVFP4 generator inference.
- `inference.kv_quant` enables FP4 KV-cache storage and requires the
  `utils/kernel` extension.
- `inference.multi_shot_sink` enables the multi-shot attention sink.
- `inference.multi_shot_rope_offset` controls the multi-shot RoPE offset.
- `inference.streaming_vae`, `inference.async_vae`, `inference.vae_type`, and
  `inference.vae_device` control streaming or asynchronous VAE decode.
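
The `image_or_video_shape` value in the config can be related to the output video size with a little arithmetic. The sketch below rests on two assumptions that are not stated in this repository: that the five entries are `[batch, latent_frames, channels, latent_height, latent_width]`, and that the Wan2.2-TI2V-5B VAE compresses by 4x in time and 16x in each spatial dimension, with the first frame kept uncompressed in time. `latent_to_pixel_shape` is a hypothetical helper, and the frame count actually produced by the LongLive2.0 decoder may differ.

```python
def latent_to_pixel_shape(shape, temporal_stride=4, spatial_stride=16):
    """Map [batch, frames, channels, height, width] in latent space to pixel-space sizes."""
    batch, frames, channels, height, width = shape
    return {
        "batch": batch,
        # Assumed causal VAE: the first latent frame maps to one pixel frame,
        # each later latent frame expands to `temporal_stride` pixel frames.
        "frames": (frames - 1) * temporal_stride + 1,
        "height": height * spatial_stride,
        "width": width * spatial_stride,
    }


print(latent_to_pixel_shape([1, 384, 48, 44, 80]))
# {'batch': 1, 'frames': 1533, 'height': 704, 'width': 1280}
```

Under these assumptions, the example config corresponds to a 1280x704 video of about 1533 frames.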

## License/Terms of Use

GOVERNING TERMS: This trial service is governed by the [NVIDIA API Trial Terms of Service](https://assets.ngc.nvidia.com/products/api-catalog/legal/NVIDIA%20API%20Trial%20Terms%20of%20Service.pdf). Use of this model is governed by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).

## Citation

```bibtex
@article{longlive_2,
  title={LongLive2.0: An NVFP4 Parallel Infrastructure for Long Video Generation},
  author={Chen, Yukang and Wang, Luozhou and Huang, Wei and Yang, Shuai and Zhang, Bohan and Xiao, Yicheng and Chu, Ruihang and Mao, Weian and Hu, Qixin and Liu, Shaoteng and Zhao, Yuyang and Mao, Huizi and Chen, Ying-Cong and Xie, Enze and Qi, Xiaojuan and Han, Song},
  journal={arXiv preprint arXiv},
  year={2026}
}
```