	---
	license: other
	license_name: nvidia-open-model-license
	license_link: >-
	https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
	pipeline_tag: text-to-video
	tags:
	- text-to-video
	- multi-shot
	- NVFP4
	- video-generation
	- diffusion
	- long-video
	- longlive2
	- wan2.2
	---

	<p align="center">
	<img src="logo.png" alt="LongLive2.0 logo" width="100%">
	</p>

	# LongLive2.0 5B NVFP4 Denoising Step 2

	[![Paper](https://img.shields.io/badge/ArXiv-Paper-brown)](https://arxiv.org/abs/2605.18739)
	[![Code](https://img.shields.io/badge/GitHub-Code-blue)](https://github.com/NVlabs/LongLive)
	[![Video](https://img.shields.io/badge/YouTube-Video-red)](https://www.youtube.com/watch?v=7oQALy32fiU)
	[![Models](https://img.shields.io/badge/Model-BF16-yellow)](https://huggingface.co/Efficient-Large-Model/LongLive-2.0-5B)
	[![Models](https://img.shields.io/badge/Model-NVFP4-orange)](https://huggingface.co/Efficient-Large-Model/LongLive-2.0-5B-NVFP4-S4)
	[![Demo](https://img.shields.io/badge/Demo-Page-brightgreen)](https://nvlabs.github.io/LongLive/LongLive2/)
	[![Docs](https://img.shields.io/badge/Full-Documentation-green)](https://nvlabs.github.io/LongLive/LongLive2/docs/)

	This repository hosts the 2-step LongLive2.0 5B NVFP4 checkpoint (two denoising
	steps) for inference with the LongLive2.0 release code:

	https://github.com/NVlabs/LongLive

	LongLive2.0 inference loads the Wan2.2-TI2V-5B generator, applies the
	few-step DMD adapter when a separate LoRA checkpoint is provided, and runs the
	generator with NVFP4 weight quantization plus optional FP4 KV-cache
	quantization.

	## Installation

	The NVFP4 path uses a stricter environment than the default BF16 release path.
	We recommend keeping it in a separate conda environment.

	```bash
	git clone https://github.com/wileewang/LongLive2.0.git
	cd LongLive2.0

	conda create -n longlive2_nvfp4 python=3.12 -y
	conda activate longlive2_nvfp4

	pip install -r requirements.txt
	pip install --upgrade --index-url https://download.pytorch.org/whl/cu128 \
	torch==2.10.0 torchvision==0.25.0
	```

	Build the NVFP4 / FP4 extensions:

	```bash
	cd fouroversix
	pip install ninja packaging psutil "setuptools>=77.0.3"

	# B200 / GB200 / GB300
	export CUDA_ARCHS=100

	# RTX 50/60 series, if needed
	# export CUDA_ARCHS=120

	pip install --no-build-isolation -e .
	cd ..

	git clone https://github.com/Dao-AILab/flash-attention.git
	cd flash-attention
	git checkout v2.8.3
	pip install -U pip setuptools wheel ninja packaging
	pip install --no-build-isolation -e .
	cd ..

	cd utils/kernel
	python setup.py build_ext --inplace
	cd ../..
	```

	Quick environment check:

	```bash
	python -c "import torch, torchvision; print(torch.__version__, torch.version.cuda); print(torchvision.__version__)"
	python -c "import flash_attn; print(flash_attn.__version__)"
	python -c "import fouroversix; from utils.quant import LongLiveQuantizationConfig, quantize_to_fp4"
	python -c "from utils.kernel.kv_dequant import dequantize_kv_cache_fp4"
	```
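
The same check can be sketched as a small Python script that reports every missing package at once instead of stopping at the first failed import. This is only an illustration: `find_missing` and `REQUIRED` are hypothetical names, and the package list is taken from the commands above.

```python
import importlib.util

# Top-level packages named in the environment check above.
REQUIRED = ("torch", "torchvision", "flash_attn", "fouroversix")


def find_missing(modules=REQUIRED):
    """Return the names in `modules` that cannot be located on the import path."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

`importlib.util.find_spec` only locates a package and does not import it, so this does not prove that the compiled extensions load. The `python -c` commands above remain the stronger check.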

	The released LongLive2.0 checkpoint is sufficient for standard inference. You
	only need to download the original Wan2.2-TI2V-5B components if you want to run
	training, initialize from the original Wan weights, or use code paths that
	explicitly load the base Wan model files:

	```bash
	huggingface-cli download Wan-AI/Wan2.2-TI2V-5B \
	--local-dir wan_models/Wan2.2-TI2V-5B
	```

	Download this checkpoint repository:

	```bash
	huggingface-cli download Perflow-Shuai/LongLive-2.0-5B-NVFP4-2Step \
	--local-dir checkpoints/longlive2_5b_nvfp4_2step
	```

	## Configure Inference

	Edit `configs/nvfp4/inference_nvfp4.yaml`.

	For the released 2-step NVFP4 checkpoint, keep
	`inference.sampling_steps: 2`:

	```yaml
	checkpoints:
	  generator_ckpt: checkpoints/longlive2_5b_nvfp4_2step/path/to/generator.pt
	  lora_ckpt: null

	merge_lora: false

	data:
	  data_path: /path/to/inference_prompts
	  image_or_video_shape:
	    - 1
	    - 384
	    - 48
	    - 44
	    - 80

	output_folder: videos/longlive2_nvfp4_2step
	num_samples: 1
	num_output_frames: 384

	inference:
	  sampling_steps: 2
	  sink_size: 8
	  guidance_scale: 1.0
	  multi_shot_sink: true
	  multi_shot_rope_offset: 8
	  kv_quant: true
	  kv_quant_scale_rule: mse
	  kv_quant_backend: cuda
	  streaming_vae: false
	  async_vae: false
	  vae_type: wan

	model_quant: true
	model_quant_use_transformer_engine: false
	model_quant_scale_rule: mse
	model_quant_activation_scale_rule: mse
	model_quant_weight_scale_rule: mse
	model_quant_gradient_scale_rule: mse
	```

	Replace the checkpoint filename above with the actual file in this repository.
	If this repository contains a separate DMD LoRA checkpoint instead of a merged
	generator, set `checkpoints.lora_ckpt` to that LoRA file and set
	`merge_lora: true`, then add the LoRA adapter config:

	```yaml
	adapter:
	  type: lora
	  rank: 128
	  alpha: 128
	  dropout: 0.0
	  dtype: bfloat16
	  apply_to_critic: true
	  verbose: true
	```

	If `checkpoints.lora_ckpt` is `null`, remove the `adapter` section.

	Do not set `model_quant_use_transformer_engine: true` when loading a FourOverSix
	materialized NVFP4 checkpoint. FourOverSix checkpoints store
	`quantized_weight_*` buffers and should be loaded through the FourOverSix path.
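
The rules in this section can be checked before launching a run. The sketch below is not part of the release code: `check_nvfp4_config` is a hypothetical helper, and it assumes the YAML has already been parsed into a dict with `checkpoints`, `inference`, and `adapter` as nested sections and `merge_lora` and `model_quant_use_transformer_engine` at the top level.

```python
def check_nvfp4_config(cfg):
    """Return a list of human-readable problems found in a parsed config dict."""
    problems = []
    inference = cfg.get("inference") or {}
    checkpoints = cfg.get("checkpoints") or {}

    # The released checkpoint is meant to run with two sampling steps.
    if inference.get("sampling_steps") != 2:
        problems.append("inference.sampling_steps should be 2 for this checkpoint")

    # The adapter section and the LoRA checkpoint must appear together.
    has_lora = checkpoints.get("lora_ckpt") is not None
    has_adapter = "adapter" in cfg
    if has_lora and not has_adapter:
        problems.append("lora_ckpt is set but the adapter section is missing")
    if has_lora and not cfg.get("merge_lora", False):
        problems.append("lora_ckpt is set but merge_lora is not true")
    if not has_lora and has_adapter:
        problems.append("lora_ckpt is null, so the adapter section should be removed")

    # FourOverSix checkpoints should not go through the Transformer Engine path.
    if cfg.get("model_quant_use_transformer_engine", False):
        problems.append(
            "model_quant_use_transformer_engine should be false for a FourOverSix checkpoint"
        )
    return problems


example = {
    "checkpoints": {"generator_ckpt": "path/to/generator.pt", "lora_ckpt": None},
    "merge_lora": False,
    "inference": {"sampling_steps": 2},
    "model_quant": True,
    "model_quant_use_transformer_engine": False,
}
print(check_nvfp4_config(example))  # prints []
```

An empty list means none of the rules above were violated. It does not confirm that the checkpoint paths exist.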

	## Prompt Folder

	`data.data_path` can be either:

	- a `.txt` file, where each line is one single-shot prompt; or
	- a directory of multi-shot prompt folders.

	Example multi-shot prompt folder:

	```text
	inference_prompts/
	  robot_lab_demo/
	    0.json
	    1.json
	    2.json
	    shot_durations.txt
	```

	Each JSON file contains:

	```json
	{
	"caption": "A compact silver robot with one blue optic explores a clean robotics lab."
	}
	```

	`shot_durations.txt` is optional. If provided, each number is the number of
	temporal chunks assigned to the corresponding caption, for example:

	```text
	2 2 4
	```
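
A folder in this layout can be generated with a few lines of Python. This is a minimal sketch, assuming one JSON file per shot, numbered from 0, and a single space-separated line in `shot_durations.txt`, as in the examples above. `write_multi_shot_prompt` is a hypothetical helper, and the second and third captions are placeholders.

```python
import json
from pathlib import Path


def write_multi_shot_prompt(folder, captions, shot_durations=None):
    """Write 0.json, 1.json, ... and an optional shot_durations.txt into `folder`."""
    if shot_durations is not None and len(shot_durations) != len(captions):
        raise ValueError("shot_durations needs exactly one entry per caption")

    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)

    # One JSON file per shot, each holding a single "caption" field.
    for index, caption in enumerate(captions):
        with open(folder / f"{index}.json", "w", encoding="utf-8") as f:
            json.dump({"caption": caption}, f, ensure_ascii=False, indent=2)

    # Optional: number of temporal chunks per caption, space-separated on one line.
    if shot_durations is not None:
        line = " ".join(str(int(d)) for d in shot_durations)
        (folder / "shot_durations.txt").write_text(line + "\n", encoding="utf-8")
    return folder


if __name__ == "__main__":
    write_multi_shot_prompt(
        "inference_prompts/robot_lab_demo",
        [
            "A compact silver robot with one blue optic explores a clean robotics lab.",
            "Placeholder caption for the second shot.",
            "Placeholder caption for the third shot.",
        ],
        shot_durations=[2, 2, 4],
    )
```

Point `data.data_path` at the parent directory (`inference_prompts` here), not at the individual prompt folder.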

	## Run

	Single node, 4 GPUs:

	```bash
	torchrun --standalone --nnodes=1 --nproc_per_node=4 inference.py \
	--config_path configs/nvfp4/inference_nvfp4.yaml
	```

	Single GPU:

	```bash
	python inference.py --config_path configs/nvfp4/inference_nvfp4.yaml
	```

	Or use the helper script, which reads `NUM_GPUS` / `num_gpus` when provided:

	```bash
	scripts/inference_nvfp4.sh configs/nvfp4/inference_nvfp4.yaml
	```

	Outputs are written to `output_folder`.
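
The choice between the two launch commands above can also be scripted. The sketch below is not the logic of `scripts/inference_nvfp4.sh`: `build_inference_command` is a hypothetical helper, and falling back to the `NUM_GPUS` environment variable (default 1) is an assumption made for illustration.

```python
import os
import shlex


def build_inference_command(config_path, num_gpus=None):
    """Return the launch command as an argument list for 1 or more GPUs."""
    if num_gpus is None:
        # Assumed fallback: read NUM_GPUS from the environment, default to 1.
        num_gpus = int(os.environ.get("NUM_GPUS", "1"))
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")

    if num_gpus == 1:
        # Single GPU: plain Python launch.
        return ["python", "inference.py", "--config_path", config_path]

    # Single node, several GPUs: one torchrun worker per GPU.
    return [
        "torchrun",
        "--standalone",
        "--nnodes=1",
        f"--nproc_per_node={num_gpus}",
        "inference.py",
        "--config_path",
        config_path,
    ]


if __name__ == "__main__":
    cmd = build_inference_command("configs/nvfp4/inference_nvfp4.yaml", num_gpus=4)
    print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.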

	## Notes

	- This model card is for the 2-step NVFP4 checkpoint. Use
	`inference.sampling_steps: 2`.
	- `model_quant` enables NVFP4 generator inference.
	- `inference.kv_quant` enables FP4 KV-cache storage and requires the
	`utils/kernel` extension.
	- `inference.multi_shot_sink` enables the multi-shot attention sink.
	- `inference.multi_shot_rope_offset` controls the multi-shot RoPE offset.
	- `inference.streaming_vae`, `inference.async_vae`, `inference.vae_type`, and
	`inference.vae_device` control streaming or asynchronous VAE decode.

	## License/Terms of Use

	GOVERNING TERMS: This trial service is governed by the [NVIDIA API Trial Terms of Service](https://assets.ngc.nvidia.com/products/api-catalog/legal/NVIDIA%20API%20Trial%20Terms%20of%20Service.pdf). Use of this model is governed by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).

	## Citation

	```bibtex
	@article{longlive_2,
	title={LongLive2.0: An NVFP4 Parallel Infrastructure for Long Video Generation},
	author={Chen, Yukang and Wang, Luozhou and Huang, Wei and Yang, Shuai and Zhang, Bohan and Xiao, Yicheng and Chu, Ruihang and Mao, Weian and Hu, Qixin and Liu, Shaoteng and Zhao, Yuyang and Mao, Huizi and Chen, Ying-Cong and Xie, Enze and Qi, Xiaojuan and Han, Song},
	journal={arXiv preprint arXiv:2605.18739},
	year={2026}
	}
	```