---
license: other
license_name: nvidia-open-model-license
license_link: >-
  https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
pipeline_tag: text-to-video
tags:
  - text-to-video
  - multi-shot
  - NVFP4
  - video-generation
  - diffusion
  - long-video
  - longlive2
  - wan2.2
---

<p align="center">
  <img src="logo.png" alt="LongLive2.0 logo" width="100%">
</p>

# LongLive2.0 5B NVFP4 Denoising Step 2

[![Paper](https://img.shields.io/badge/ArXiv-Paper-brown)](https://arxiv.org/abs/2605.18739)
[![Code](https://img.shields.io/badge/GitHub-Code-blue)](https://github.com/NVlabs/LongLive)
[![Video](https://img.shields.io/badge/YouTube-Video-red)](https://www.youtube.com/watch?v=7oQALy32fiU)
[![Models](https://img.shields.io/badge/Model-BF16-yellow)](https://huggingface.co/Efficient-Large-Model/LongLive-2.0-5B)
[![Models](https://img.shields.io/badge/Model-NVFP4-orange)](https://huggingface.co/Efficient-Large-Model/LongLive-2.0-5B-NVFP4-S4)
[![Demo](https://img.shields.io/badge/Demo-Page-brightgreen)](https://nvlabs.github.io/LongLive/LongLive2/)
[![Docs](https://img.shields.io/badge/Full-Documentation-green)](https://nvlabs.github.io/LongLive/LongLive2/docs/)

This repository hosts the 2-step denoising NVFP4 checkpoint of LongLive2.0 5B,
for inference with the LongLive2.0 release code:

https://github.com/NVlabs/LongLive

LongLive2.0 inference loads the Wan2.2-TI2V-5B generator, applies the
few-step DMD adapter when a separate LoRA checkpoint is provided, and runs the
generator with NVFP4 weight quantization plus optional FP4 KV-cache
quantization.

## Installation

The NVFP4 path uses a stricter environment than the default BF16 release path.
We recommend keeping it in a separate conda environment.

```bash
git clone https://github.com/wileewang/LongLive2.0.git
cd LongLive2.0

conda create -n longlive2_nvfp4 python=3.12 -y
conda activate longlive2_nvfp4

pip install -r requirements.txt
pip install --upgrade --index-url https://download.pytorch.org/whl/cu128 \
  torch==2.10.0 torchvision==0.25.0
```

Build the NVFP4 / FP4 extensions:

```bash
cd fouroversix
pip install ninja packaging psutil "setuptools>=77.0.3"

# B200 / GB200 / GB300
export CUDA_ARCHS=100

# RTX 50/60 series, if needed
# export CUDA_ARCHS=120

pip install --no-build-isolation -e .
cd ..

git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
git checkout v2.8.3
pip install -U pip setuptools wheel ninja packaging
pip install --no-build-isolation -e .
cd ..

cd utils/kernel
python setup.py build_ext --inplace
cd ../..
```
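
The `CUDA_ARCHS` value exported above depends on the GPU's compute capability. As a minimal sketch, assuming compute capability 10.x corresponds to the B200-class setting and 12.x to the RTX 50-class setting, the hypothetical helpers below (`cuda_archs_for` and `detect_cuda_archs` are not part of the release code) pick the value to export:

```python
def cuda_archs_for(major: int, minor: int = 0) -> str:
    """Map a CUDA compute capability to the CUDA_ARCHS value used in this card."""
    # Assumption: only the two architecture families named in this card are covered.
    table = {10: "100", 12: "120"}
    if major not in table:
        raise ValueError(
            f"compute capability {major}.{minor} is not covered by this card"
        )
    return table[major]


def detect_cuda_archs() -> str:
    """Query the first visible GPU through PyTorch and return its CUDA_ARCHS value."""
    import torch  # imported lazily so the mapping above works without PyTorch

    if not torch.cuda.is_available():
        raise RuntimeError("no CUDA device is visible")
    major, minor = torch.cuda.get_device_capability(0)
    return cuda_archs_for(major, minor)
```

Calling `detect_cuda_archs()` on the build machine returns the string to assign to `CUDA_ARCHS` before running `pip install --no-build-isolation -e .`.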

Quick environment check:

```bash
python -c "import torch, torchvision; print(torch.__version__, torch.version.cuda); print(torchvision.__version__)"
python -c "import flash_attn; print(flash_attn.__version__)"
python -c "import fouroversix; from utils.quant import LongLiveQuantizationConfig, quantize_to_fp4"
python -c "from utils.kernel.kv_dequant import dequantize_kv_cache_fp4"
```

The released LongLive2.0 checkpoint is sufficient for standard inference. You
only need to download the original Wan2.2-TI2V-5B components if you want to run
training, initialize from the original Wan weights, or use code paths that
explicitly load the base Wan model files:

```bash
huggingface-cli download Wan-AI/Wan2.2-TI2V-5B \
  --local-dir wan_models/Wan2.2-TI2V-5B
```

Download this checkpoint repository:

```bash
huggingface-cli download Perflow-Shuai/LongLive-2.0-5B-NVFP4-2Step \
  --local-dir checkpoints/longlive2_5b_nvfp4_2step
```

## Configure Inference

Edit `configs/nvfp4/inference_nvfp4.yaml`.

For the released 2-step NVFP4 checkpoint, keep
`inference.sampling_steps: 2`:

```yaml
checkpoints:
  generator_ckpt: checkpoints/longlive2_5b_nvfp4_2step/path/to/generator.pt
  lora_ckpt: null

merge_lora: false

data:
  data_path: /path/to/inference_prompts
  image_or_video_shape:
  - 1
  - 384
  - 48
  - 44
  - 80

output_folder: videos/longlive2_nvfp4_2step
num_samples: 1
num_output_frames: 384

inference:
  sampling_steps: 2
  sink_size: 8
  guidance_scale: 1.0
  multi_shot_sink: true
  multi_shot_rope_offset: 8
  kv_quant: true
  kv_quant_scale_rule: mse
  kv_quant_backend: cuda
  streaming_vae: false
  async_vae: false
  vae_type: wan

model_quant: true
model_quant_use_transformer_engine: false
model_quant_scale_rule: mse
model_quant_activation_scale_rule: mse
model_quant_weight_scale_rule: mse
model_quant_gradient_scale_rule: mse
```

Replace the checkpoint filename above with the actual file in this repository.
If this repository contains a separate DMD LoRA checkpoint instead of a merged
generator, set `checkpoints.lora_ckpt` to that LoRA file and set
`merge_lora: true`, then add the LoRA adapter config:

```yaml
adapter:
  type: lora
  rank: 128
  alpha: 128
  dropout: 0.0
  dtype: bfloat16
  apply_to_critic: true
  verbose: true
```

If `checkpoints.lora_ckpt` is `null`, remove the `adapter` section.

Do not set `model_quant_use_transformer_engine: true` when loading a FourOverSix
materialized NVFP4 checkpoint. FourOverSix checkpoints store
`quantized_weight_*` buffers and should be loaded through the FourOverSix path.
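
The rules in this section can be checked before launching a run. The following is a minimal sketch, assuming the YAML has already been parsed into a plain dict; `check_nvfp4_config` is a hypothetical helper and not part of the release code:

```python
from typing import Any


def check_nvfp4_config(cfg: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a parsed inference config (empty if none)."""
    problems = []
    inference = cfg.get("inference", {})
    checkpoints = cfg.get("checkpoints", {})

    # This card's checkpoint is meant to run with two sampling steps.
    if inference.get("sampling_steps") != 2:
        problems.append("inference.sampling_steps should be 2")

    # The adapter section and the LoRA checkpoint must appear together.
    has_lora = checkpoints.get("lora_ckpt") is not None
    has_adapter = "adapter" in cfg
    if has_lora and not has_adapter:
        problems.append("lora_ckpt is set but the adapter section is missing")
    if not has_lora and has_adapter:
        problems.append("lora_ckpt is null, so the adapter section should be removed")
    if has_lora and not cfg.get("merge_lora", False):
        problems.append("merge_lora should be true when lora_ckpt is set")

    # FourOverSix checkpoints must not go through the Transformer Engine path.
    if cfg.get("model_quant_use_transformer_engine", False):
        problems.append("model_quant_use_transformer_engine should be false")

    return problems
```

The dict can come from any YAML loader, for example PyYAML's `yaml.safe_load`. An empty return value means none of the checks above failed; it does not guarantee that the checkpoint paths exist.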

## Prompt Folder

`data.data_path` can be either:

- a `.txt` file, where each line is one single-shot prompt; or
- a directory of multi-shot prompt folders.

Example multi-shot prompt folder:

```text
inference_prompts/
  robot_lab_demo/
    0.json
    1.json
    2.json
    shot_durations.txt
```

Each JSON file contains:

```json
{
  "caption": "A compact silver robot with one blue optic explores a clean robotics lab."
}
```

`shot_durations.txt` is optional. If provided, it lists one number per caption,
in order: the number of temporal chunks assigned to that caption. For example:

```text
2 2 4
```
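
The folder layout above can be generated from a list of captions. This is a minimal sketch, not the release code's own tooling: `write_multi_shot_prompt` is a hypothetical helper, and it assumes the loader accepts pretty-printed JSON and a trailing newline in `shot_durations.txt`.

```python
import json
from pathlib import Path


def write_multi_shot_prompt(root, name, captions, durations=None):
    """Create root/name with one JSON file per shot and an optional shot_durations.txt."""
    if durations is not None and len(durations) != len(captions):
        raise ValueError("need exactly one duration per caption")

    folder = Path(root) / name
    folder.mkdir(parents=True, exist_ok=True)

    # Shots are numbered from 0 in the order they should appear.
    for index, caption in enumerate(captions):
        (folder / f"{index}.json").write_text(
            json.dumps({"caption": caption}, indent=2) + "\n", encoding="utf-8"
        )

    # Durations are written as space-separated chunk counts on a single line.
    if durations is not None:
        line = " ".join(str(int(d)) for d in durations)
        (folder / "shot_durations.txt").write_text(line + "\n", encoding="utf-8")

    return folder
```

For example, `write_multi_shot_prompt("inference_prompts", "robot_lab_demo", captions, [2, 2, 4])` with three captions produces `0.json`, `1.json`, `2.json`, and `shot_durations.txt`; point `data.data_path` at `inference_prompts`.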

## Run

Single node, 4 GPUs:

```bash
torchrun --standalone --nnodes=1 --nproc_per_node=4 inference.py \
  --config_path configs/nvfp4/inference_nvfp4.yaml
```

Single GPU:

```bash
python inference.py --config_path configs/nvfp4/inference_nvfp4.yaml
```

Or use the helper script, which reads `NUM_GPUS` / `num_gpus` when provided:

```bash
scripts/inference_nvfp4.sh configs/nvfp4/inference_nvfp4.yaml
```

Outputs are written to `output_folder`.
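
When launching from Python, the choice between the two direct commands above can be sketched as follows. `build_launch_command` is a hypothetical helper that mirrors the commands shown in this section; it is not the logic of `scripts/inference_nvfp4.sh`, which this card does not describe.

```python
import shlex


def build_launch_command(config_path: str, num_gpus: int = 1) -> list[str]:
    """Return the argument list for a single-GPU or single-node multi-GPU run."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")

    script_args = ["inference.py", "--config_path", config_path]
    if num_gpus == 1:
        # Single GPU: plain Python, no distributed launcher.
        return ["python", *script_args]

    # Single node, several GPUs: one torchrun worker per GPU.
    return [
        "torchrun",
        "--standalone",
        "--nnodes=1",
        f"--nproc_per_node={num_gpus}",
        *script_args,
    ]


if __name__ == "__main__":
    # Print the command for a 4-GPU run so it can be inspected or copied.
    print(shlex.join(build_launch_command("configs/nvfp4/inference_nvfp4.yaml", 4)))
```

The returned list can be passed directly to `subprocess.run` from the repository root.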

## Notes

- This model card is for the **2-step** NVFP4 checkpoint. Use
  `inference.sampling_steps: 2`.
- `model_quant` enables NVFP4 generator inference.
- `inference.kv_quant` enables FP4 KV-cache storage and requires the
  `utils/kernel` extension.
- `inference.multi_shot_sink` enables the multi-shot attention sink.
- `inference.multi_shot_rope_offset` controls the multi-shot RoPE offset.
- `inference.streaming_vae`, `inference.async_vae`, `inference.vae_type`, and
  `inference.vae_device` control streaming or asynchronous VAE decode.

## License/Terms of Use

GOVERNING TERMS: This trial service is governed by the [NVIDIA API Trial Terms of Service](https://assets.ngc.nvidia.com/products/api-catalog/legal/NVIDIA%20API%20Trial%20Terms%20of%20Service.pdf). Use of this model is governed by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).

## Citation

```bibtex
@article{longlive_2,
  title={LongLive2.0: An NVFP4 Parallel Infrastructure for Long Video Generation},
  author={Chen, Yukang and Wang, Luozhou and Huang, Wei and Yang, Shuai and Zhang, Bohan and Xiao, Yicheng and Chu, Ruihang and Mao, Weian and Hu, Qixin and Liu, Shaoteng and Zhao, Yuyang and Mao, Huizi and Chen, Ying-Cong and Xie, Enze and Qi, Xiaojuan and Han, Song},
  journal={arXiv preprint arXiv:2605.18739},
  year={2026}
}
```