Yukang committed on
Commit
89fbde1
·
verified ·
1 Parent(s): 3af7b5c

Create README.md

  1. README.md +254 -0
README.md ADDED
@@ -0,0 +1,254 @@
+ ---
+ license: other
+ license_name: nvidia-open-model-license
+ license_link: >-
+   https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
+ pipeline_tag: text-to-video
+ tags:
+ - text-to-video
+ - multi-shot
+ - video-generation
+ - diffusion
+ - long-video
+ - longlive2
+ - wan2.2
+ ---
+
+ <p align="center">
+   <img src="logo.png" alt="LongLive2.0 logo" width="100%">
+ </p>
+
+ # LongLive2.0 5B NVFP4 Denoising Step 4
+
+ This repository hosts the 4-step-denoising LongLive2.0 5B NVFP4 checkpoint, for inference
+ with the LongLive2.0 release code:
+
+ https://github.com/NVlabs/LongLive
+
+ LongLive2.0 inference loads the Wan2.2-TI2V-5B generator, applies the
+ few-step DMD adapter when a separate LoRA checkpoint is provided, and runs the
+ generator with NVFP4 weight quantization plus optional FP4 KV-cache
+ quantization.
+
+ ## Installation
+
+ The NVFP4 path uses a stricter environment than the default BF16 release path.
+ We recommend keeping it in a separate conda environment.
+
+ ```bash
+ git clone https://github.com/wileewang/LongLive2.0.git
+ cd LongLive2.0
+
+ conda create -n longlive2_nvfp4 python=3.12 -y
+ conda activate longlive2_nvfp4
+
+ pip install -r requirements.txt
+ pip install --upgrade --index-url https://download.pytorch.org/whl/cu128 \
+   torch==2.10.0 torchvision==0.25.0
+ ```
+
+ Build the NVFP4 / FP4 extensions:
+
+ ```bash
+ cd fouroversix
+ pip install ninja packaging psutil "setuptools>=77.0.3"
+
+ # B200 / GB200 / GB300
+ export CUDA_ARCHS=100
+
+ # RTX 50/60 series, if needed
+ # export CUDA_ARCHS=120
+
+ pip install --no-build-isolation -e .
+ cd ..
+
+ git clone https://github.com/Dao-AILab/flash-attention.git
+ cd flash-attention
+ git checkout v2.8.3
+ pip install -U pip setuptools wheel ninja packaging
+ pip install --no-build-isolation -e .
+ cd ..
+
+ cd utils/kernel
+ python setup.py build_ext --inplace
+ cd ../..
+ ```
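
The choice of `CUDA_ARCHS` in the build commands above can be written down as a small lookup. This is only an illustrative sketch: `cuda_archs_for` and `ARCH_BY_MAJOR` are hypothetical names that do not exist in the LongLive2.0 repository, and keying the table on the compute-capability major version is an assumption drawn from the two comments in the build block.

```python
# Hypothetical helper, not part of the LongLive2.0 repository.
# Keys are CUDA compute-capability major versions (an assumption);
# values are the CUDA_ARCHS settings named in the build commands.
ARCH_BY_MAJOR = {
    10: "100",  # listed for B200 / GB200 / GB300
    12: "120",  # listed for RTX 50/60 series
}


def cuda_archs_for(major: int) -> str:
    """Return the CUDA_ARCHS string for a compute-capability major version."""
    try:
        return ARCH_BY_MAJOR[major]
    except KeyError:
        raise ValueError(
            f"compute capability {major}.x is not covered by the build notes"
        ) from None


if __name__ == "__main__":
    # With PyTorch and a visible GPU, the major version could be read from
    # torch.cuda.get_device_capability()[0]; here it is passed in directly.
    print(f"export CUDA_ARCHS={cuda_archs_for(10)}")
```

Raising on an unknown major version is a deliberate choice in this sketch, so that an unsupported GPU is reported before the extension build starts.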
+
+ Quick environment check:
+
+ ```bash
+ python -c "import torch, torchvision; print(torch.__version__, torch.version.cuda); print(torchvision.__version__)"
+ python -c "import flash_attn; print(flash_attn.__version__)"
+ python -c "import fouroversix; from utils.quant import LongLiveQuantizationConfig, quantize_to_fp4"
+ python -c "from utils.kernel.kv_dequant import dequantize_kv_cache_fp4"
+ ```
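
As a lighter first pass before the commands above, the top-level packages can be located without importing them. The sketch below is an assumption-labeled convenience, not part of the release code: `missing_modules` is a hypothetical helper, and it only reports whether a package can be found, not whether its compiled extension actually loads.

```python
import importlib.util

# Top-level module names taken from the quick-check commands above.
REQUIRED_MODULES = ["torch", "torchvision", "flash_attn", "fouroversix"]


def missing_modules(names):
    """Return the names that cannot be located on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required top-level modules were found.")
```

The full `python -c` commands remain the real test, since they also exercise the CUDA kernels' import paths.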
+
+ The released LongLive2.0 checkpoint is sufficient for standard inference. You
+ only need to download the original Wan2.2-TI2V-5B components if you want to run
+ training, initialize from the original Wan weights, or use code paths that
+ explicitly load the base Wan model files:
+
+ ```bash
+ huggingface-cli download Wan-AI/Wan2.2-TI2V-5B \
+   --local-dir wan_models/Wan2.2-TI2V-5B
+ ```
+
+ Download this checkpoint repository:
+
+ ```bash
+ huggingface-cli download Perflow-Shuai/LongLive-2.0-5B-NVFP4-4Step \
+   --local-dir checkpoints/longlive2_5b_nvfp4_4step
+ ```
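
Once the download finishes, the inference config needs the path of the generator file inside that folder. A minimal sketch for listing candidate files follows; `find_weight_files` is a hypothetical helper, and the set of file extensions is an assumption, since this model card does not name the checkpoint file.

```python
from pathlib import Path

# Assumed weight-file extensions; the actual files in the repository may differ.
WEIGHT_SUFFIXES = {".pt", ".pth", ".safetensors"}


def find_weight_files(root):
    """Return weight-like files under root as sorted paths relative to root."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix in WEIGHT_SUFFIXES
    )


if __name__ == "__main__":
    for name in find_weight_files("checkpoints/longlive2_5b_nvfp4_4step"):
        print(name)
```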
+
+ ## Configure Inference
+
+ Edit `configs/nvfp4/inference_nvfp4.yaml`.
+
+ For the released 4-step NVFP4 checkpoint, keep
+ `inference.sampling_steps: 4`:
+
+ ```yaml
+ checkpoints:
+   generator_ckpt: checkpoints/longlive2_5b_nvfp4_4step/path/to/generator.pt
+   lora_ckpt: null
+
+ merge_lora: false
+
+ data:
+   data_path: /path/to/inference_prompts
+   image_or_video_shape:
+     - 1
+     - 384
+     - 48
+     - 44
+     - 80
+
+ output_folder: videos/longlive2_nvfp4_4step
+ num_samples: 1
+ num_output_frames: 384
+
+ inference:
+   sampling_steps: 4
+   sink_size: 8
+   guidance_scale: 1.0
+   multi_shot_sink: true
+   multi_shot_rope_offset: 8
+   kv_quant: true
+   kv_quant_scale_rule: mse
+   kv_quant_backend: cuda
+   streaming_vae: false
+   async_vae: false
+   vae_type: wan
+
+ model_quant: true
+ model_quant_use_transformer_engine: false
+ model_quant_scale_rule: mse
+ model_quant_activation_scale_rule: mse
+ model_quant_weight_scale_rule: mse
+ model_quant_gradient_scale_rule: mse
+ ```
+
+ Replace the placeholder `path/to/generator.pt` above with the actual checkpoint
+ file in this repository. If this repository contains a separate DMD LoRA
+ checkpoint instead of a merged generator, set `checkpoints.lora_ckpt` to that
+ LoRA file, set `merge_lora: true`, and add the LoRA adapter config:
+
+ ```yaml
+ adapter:
+   type: lora
+   rank: 128
+   alpha: 128
+   dropout: 0.0
+   dtype: bfloat16
+   apply_to_critic: true
+   verbose: true
+ ```
+
+ If `checkpoints.lora_ckpt` is `null`, remove the `adapter` section.
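
The rules above (4 sampling steps, and the pairing of `lora_ckpt`, `merge_lora`, and `adapter`) can be checked mechanically before launching a run. The sketch below works on a config that has already been parsed into a plain dictionary; `check_inference_config` is a hypothetical helper and is not part of the release code.

```python
def check_inference_config(cfg):
    """Return a list of problems found in a parsed config dictionary."""
    problems = []

    steps = cfg.get("inference", {}).get("sampling_steps")
    if steps != 4:
        problems.append(f"inference.sampling_steps should be 4, got {steps!r}")

    lora = cfg.get("checkpoints", {}).get("lora_ckpt")
    has_adapter = "adapter" in cfg
    if lora is None and has_adapter:
        problems.append("lora_ckpt is null, so the adapter section should be removed")
    if lora is not None and not has_adapter:
        problems.append("lora_ckpt is set, so an adapter section is required")
    if lora is not None and not cfg.get("merge_lora", False):
        problems.append("lora_ckpt is set, so merge_lora should be true")
    return problems


if __name__ == "__main__":
    # Mirrors the merged-generator settings shown in the YAML above.
    cfg = {
        "checkpoints": {"generator_ckpt": "path/to/generator.pt", "lora_ckpt": None},
        "merge_lora": False,
        "inference": {"sampling_steps": 4},
    }
    print(check_inference_config(cfg) or "config looks consistent")
```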
+
+ Do not set `model_quant_use_transformer_engine: true` when loading a FourOverSix
+ materialized NVFP4 checkpoint. FourOverSix checkpoints store
+ `quantized_weight_*` buffers and should be loaded through the FourOverSix path.
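
For orientation, the `image_or_video_shape` entry and `num_output_frames` in the config above can be translated into approximate pixel-space numbers. Everything in this sketch is an assumption that the model card does not state: that the five entries mean batch, latent frames, latent channels, latent height, and latent width; that the second entry must equal `num_output_frames`; and that the VAE compresses 4x in time and 16x in each spatial dimension. `describe_shape` is a hypothetical helper.

```python
# Assumed compression factors; verify against the release code before relying on them.
TEMPORAL_FACTOR = 4
SPATIAL_FACTOR = 16


def describe_shape(shape, num_output_frames):
    """Convert an assumed latent shape into approximate pixel-space sizes."""
    batch, frames, channels, height, width = shape
    if frames != num_output_frames:
        raise ValueError("shape[1] and num_output_frames disagree")
    return {
        "pixel_height": height * SPATIAL_FACTOR,
        "pixel_width": width * SPATIAL_FACTOR,
        # Assumes the first latent frame decodes to one pixel frame and
        # every later latent frame decodes to TEMPORAL_FACTOR pixel frames.
        "pixel_frames": (frames - 1) * TEMPORAL_FACTOR + 1,
    }


if __name__ == "__main__":
    print(describe_shape([1, 384, 48, 44, 80], num_output_frames=384))
```

Under these assumptions the example config would correspond to 1280x704 output, but the actual frame count depends on how the release code decodes latents.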
+
+ ## Prompt Folder
+
+ `data.data_path` can be either:
+
+ - a `.txt` file, where each line is one single-shot prompt; or
+ - a directory of multi-shot prompt folders.
+
+ Example multi-shot prompt folder:
+
+ ```text
+ inference_prompts/
+   robot_lab_demo/
+     0.json
+     1.json
+     2.json
+     shot_durations.txt
+ ```
+
+ Each JSON file contains:
+
+ ```json
+ {
+   "caption": "A compact silver robot with one blue optic explores a clean robotics lab."
+ }
+ ```
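
A folder in this layout can be generated from a list of captions. The following is a minimal sketch, assuming the files are numbered from `0.json` in shot order as in the example tree; `write_shot_captions` is a hypothetical helper, and the second caption in the usage example is made up for illustration.

```python
import json
from pathlib import Path


def write_shot_captions(folder, captions):
    """Write one {"caption": ...} JSON file per shot: 0.json, 1.json, ..."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    for index, caption in enumerate(captions):
        payload = json.dumps({"caption": caption}, indent=2)
        (folder / f"{index}.json").write_text(payload + "\n", encoding="utf-8")
    return folder


if __name__ == "__main__":
    write_shot_captions(
        "inference_prompts/robot_lab_demo",
        [
            "A compact silver robot with one blue optic explores a clean robotics lab.",
            "The robot stops at a workbench and inspects a circuit board.",  # illustrative
        ],
    )
```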
+
+ `shot_durations.txt` is optional. If provided, each number is the number of
+ temporal chunks assigned to the corresponding caption, for example:
+
+ ```text
+ 2 2 4
+ ```
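
To check that a prompt folder is internally consistent, the captions and durations can be paired up before a run. This sketch assumes the durations are whitespace-separated integers, as in the example, and that every `.json` file in the folder has a numeric name; `load_shots` is a hypothetical helper and may not match how the release code reads these files.

```python
import json
from pathlib import Path


def load_shots(folder):
    """Return (caption, chunks) pairs; chunks is None without shot_durations.txt."""
    folder = Path(folder)
    # Sort numerically so that 10.json comes after 9.json.
    files = sorted(folder.glob("*.json"), key=lambda p: int(p.stem))
    captions = [json.loads(p.read_text(encoding="utf-8"))["caption"] for p in files]

    durations_file = folder / "shot_durations.txt"
    if not durations_file.exists():
        return [(caption, None) for caption in captions]

    text = durations_file.read_text(encoding="utf-8")
    chunks = [int(token) for token in text.split()]
    if len(chunks) != len(captions):
        raise ValueError(f"{len(chunks)} durations for {len(captions)} captions")
    return list(zip(captions, chunks))


if __name__ == "__main__":
    for caption, chunks in load_shots("inference_prompts/robot_lab_demo"):
        print(chunks, caption)
```

With the example file `2 2 4` and three captions, the shots would receive 2, 2, and 4 temporal chunks respectively.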
+
+ ## Run
+
+ Single node, 4 GPUs:
+
+ ```bash
+ torchrun --standalone --nnodes=1 --nproc_per_node=4 inference.py \
+   --config_path configs/nvfp4/inference_nvfp4.yaml
+ ```
+
+ Single GPU:
+
+ ```bash
+ python inference.py --config_path configs/nvfp4/inference_nvfp4.yaml
+ ```
+
+ Or use the helper script, which reads `NUM_GPUS` / `num_gpus` when provided:
+
+ ```bash
+ scripts/inference_nvfp4.sh configs/nvfp4/inference_nvfp4.yaml
+ ```
+
+ Outputs are written to `output_folder`.
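
The choice between the two launch commands above can be sketched as a function of the GPU count. This is not a description of what `scripts/inference_nvfp4.sh` does internally; `build_command` is a hypothetical helper, and defaulting to one GPU when `NUM_GPUS` is unset is an assumption.

```python
import os
import shlex


def build_command(config_path, num_gpus):
    """Return the launch command for the given GPU count as an argument list."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if num_gpus == 1:
        return ["python", "inference.py", "--config_path", config_path]
    return [
        "torchrun", "--standalone", "--nnodes=1",
        f"--nproc_per_node={num_gpus}",
        "inference.py", "--config_path", config_path,
    ]


if __name__ == "__main__":
    gpus = int(os.environ.get("NUM_GPUS", "1"))  # default of 1 is an assumption
    print(shlex.join(build_command("configs/nvfp4/inference_nvfp4.yaml", gpus)))
```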
+
+ ## Notes
+
+ - This model card is for the **4-step** NVFP4 checkpoint. Use
+   `inference.sampling_steps: 4`.
+ - `model_quant` enables NVFP4 generator inference.
+ - `inference.kv_quant` enables FP4 KV-cache storage and requires the
+   `utils/kernel` extension.
+ - `inference.multi_shot_sink` enables the multi-shot attention sink.
+ - `inference.multi_shot_rope_offset` controls the multi-shot RoPE offset.
+ - `inference.streaming_vae`, `inference.async_vae`, `inference.vae_type`, and
+   `inference.vae_device` control streaming or asynchronous VAE decode.
+
+ ## License/Terms of Use
+
+ GOVERNING TERMS: This trial service is governed by the [NVIDIA API Trial Terms of Service](https://assets.ngc.nvidia.com/products/api-catalog/legal/NVIDIA%20API%20Trial%20Terms%20of%20Service.pdf). Use of this model is governed by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
+
+ ## Citation
+
+ ```bibtex
+ @article{longlive_2,
+   title={LongLive2.0: An NVFP4 Parallel Infrastructure for Long Video Generation},
+   author={Chen, Yukang and Wang, Luozhou and Huang, Wei and Yang, Shuai and Zhang, Bohan and Xiao, Yicheng and Chu, Ruihang and Mao, Weian and Hu, Qixin and Liu, Shaoteng and Zhao, Yuyang and Mao, Huizi and Chen, Ying-Cong and Xie, Enze and Qi, Xiaojuan and Han, Song},
+   journal={arXiv preprint arXiv},
+   year={2026}
+ }
+ ```