---
license: apache-2.0
pipeline_tag: text-to-video
tags:
- text-to-video
- video-generation
- diffusion
- long-video
- longlive2
- wan2.2
- nvfp4
- 2-step
---

# LongLive2.0 5B NVFP4 2-Step Checkpoint

This repository hosts the LongLive2.0 5B NVFP4 2-step checkpoint for inference
with the LongLive2.0 release code:

https://github.com/wileewang/LongLive2.0

LongLive2.0 inference loads the Wan2.2-TI2V-5B generator, applies the
few-step DMD adapter when a separate LoRA checkpoint is provided, and runs the
generator with NVFP4 weight quantization plus optional FP4 KV-cache
quantization.

## Installation

The NVFP4 path uses a stricter environment than the default BF16 release path.
We recommend keeping it in a separate conda environment.

```bash
git clone https://github.com/wileewang/LongLive2.0.git
cd LongLive2.0
conda create -n longlive2_nvfp4 python=3.12 -y
conda activate longlive2_nvfp4
pip install -r requirements.txt
pip install --upgrade --index-url https://download.pytorch.org/whl/cu128 \
  torch==2.10.0 torchvision==0.25.0
```

Build the NVFP4 / FP4 extensions:

```bash
cd fouroversix
pip install ninja packaging psutil "setuptools>=77.0.3"
# B200 / GB200 / GB300
export CUDA_ARCHS=100
# RTX 50/60 series, if needed
# export CUDA_ARCHS=120
pip install --no-build-isolation -e .
cd ..

git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
git checkout v2.8.3
pip install -U pip setuptools wheel ninja packaging
pip install --no-build-isolation -e .
cd ..

cd utils/kernel
python setup.py build_ext --inplace
cd ../..
```
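
The `CUDA_ARCHS` values in the build step are the GPU compute capability written without the dot (10.0 becomes `100`, 12.0 becomes `120`). As a minimal sketch, assuming PyTorch is already installed and that the extension build accepts the value for your GPU, the helper below derives that string from the first visible device. The function names are illustrative and are not part of the LongLive2.0 code.

```python
def cuda_archs_from_capability(major: int, minor: int) -> str:
    """Format a (major, minor) compute capability as a CUDA_ARCHS-style string."""
    return f"{major}{minor}"


def detect_cuda_archs():
    """Return the string for GPU 0, or None when no CUDA device is visible."""
    import torch  # imported lazily so the formatting helper works without PyTorch

    if not torch.cuda.is_available():
        return None
    major, minor = torch.cuda.get_device_capability(0)
    return cuda_archs_from_capability(major, minor)


if __name__ == "__main__":
    try:
        print("CUDA_ARCHS =", detect_cuda_archs())
    except ImportError:
        print("PyTorch is not installed; cannot query the GPU.")
```

If the printed value is neither `100` nor `120`, treat the build as untested for that GPU rather than assuming it will work.
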

Quick environment check:

```bash
python -c "import torch, torchvision; print(torch.__version__, torch.version.cuda); print(torchvision.__version__)"
python -c "import flash_attn; print(flash_attn.__version__)"
python -c "import fouroversix; from utils.quant import LongLiveQuantizationConfig, quantize_to_fp4"
python -c "from utils.kernel.kv_dequant import dequantize_kv_cache_fp4"
```
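
The one-line checks stop at the first failing import. A small sketch of an alternative, assuming it is run from the LongLive2.0 repository root so that the `utils` package is importable, reports every module that cannot be located in one pass. `find_missing` and `REQUIRED` are illustrative names. The check only locates modules; it does not prove that a compiled extension loads, so the commands above remain the stronger test.

```python
import importlib.util

# Module names taken from the import checks above.
REQUIRED = [
    "torch",
    "torchvision",
    "flash_attn",
    "fouroversix",
    "utils.quant",
    "utils.kernel.kv_dequant",
]


def find_missing(modules):
    """Return the subset of `modules` that the import system cannot locate."""
    missing = []
    for name in modules:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised when a parent package is absent or has no usable spec.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED)
    print("missing modules:", ", ".join(missing) if missing else "none")
```
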

The released LongLive2.0 checkpoint is sufficient for standard inference. You
only need to download the original Wan2.2-TI2V-5B components if you want to run
training, initialize from the original Wan weights, or use code paths that
explicitly load the base Wan model files:

```bash
huggingface-cli download Wan-AI/Wan2.2-TI2V-5B \
  --local-dir wan_models/Wan2.2-TI2V-5B
```

Download this checkpoint repository:

```bash
huggingface-cli download Perflow-Shuai/LongLive-2.0-5B-NVFP4-2Step \
  --local-dir checkpoints/longlive2_5b_nvfp4_2step
```

## Configure Inference

Edit `configs/nvfp4/inference_nvfp4.yaml`.
For the released 2-step NVFP4 checkpoint, keep
`inference.sampling_steps: 2`:

```yaml
checkpoints:
  generator_ckpt: checkpoints/longlive2_5b_nvfp4_2step/path/to/generator.pt
  lora_ckpt: null
  merge_lora: false
data:
  data_path: /path/to/inference_prompts
  image_or_video_shape:
  - 1
  - 384
  - 48
  - 44
  - 80
  output_folder: videos/longlive2_nvfp4_2step
  num_samples: 1
  num_output_frames: 384
inference:
  sampling_steps: 2
  sink_size: 8
  guidance_scale: 1.0
  multi_shot_sink: true
  multi_shot_rope_offset: 8
  kv_quant: true
  kv_quant_scale_rule: mse
  kv_quant_backend: cuda
  streaming_vae: false
  async_vae: false
  vae_type: wan
  model_quant: true
  model_quant_use_transformer_engine: false
  model_quant_scale_rule: mse
  model_quant_activation_scale_rule: mse
  model_quant_weight_scale_rule: mse
  model_quant_gradient_scale_rule: mse
```

Replace the checkpoint filename above with the actual file in this repository.
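
To find the actual filename after the download, one option is to list candidate weight files under the local checkpoint directory. The sketch below is a generic directory walk, not a LongLive2.0 utility; the suffix list is an assumption about how the weights might be stored, and `list_checkpoint_files` is an illustrative name.

```python
from pathlib import Path


def list_checkpoint_files(root, suffixes=(".pt", ".pth", ".safetensors")):
    """Return sorted paths (relative to `root`) of files with a weight-like suffix."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file() and path.suffix in suffixes
    )


if __name__ == "__main__":
    # Directory used by the download command in this model card.
    for name in list_checkpoint_files("checkpoints/longlive2_5b_nvfp4_2step"):
        print(name)
```

Each printed path is relative to the download directory, so prepend `checkpoints/longlive2_5b_nvfp4_2step/` when copying it into `generator_ckpt`.
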

If this repository contains a separate DMD LoRA checkpoint instead of a merged
generator, set `checkpoints.lora_ckpt` to that LoRA file and set
`merge_lora: true`, then add the LoRA adapter config:

```yaml
adapter:
  type: lora
  rank: 128
  alpha: 128
  dropout: 0.0
  dtype: bfloat16
  apply_to_critic: true
  verbose: true
```

If `checkpoints.lora_ckpt` is `null`, remove the `adapter` section.

Do not set `model_quant_use_transformer_engine: true` when loading a FourOverSix
materialized NVFP4 checkpoint. FourOverSix checkpoints store
`quantized_weight_*` buffers and should be loaded through the FourOverSix path.
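
The rules in this section can be checked before launching a run. The following is a minimal sketch, not part of the release code: it takes the config as a plain dictionary (for example, the result of loading the YAML file with a YAML parser) and looks keys up by name at any nesting depth, so it does not depend on the exact layout of the file. `find_key` and `check_config` are hypothetical helpers.

```python
_MISSING = object()  # sentinel so that an explicit null (None) is not mistaken for "absent"


def find_key(cfg, key):
    """Return the first value stored under `key` at any depth, or _MISSING."""
    if isinstance(cfg, dict):
        if key in cfg:
            return cfg[key]
        for value in cfg.values():
            found = find_key(value, key)
            if found is not _MISSING:
                return found
    return _MISSING


def check_config(cfg):
    """Return a list of human-readable problems; an empty list means no rule was violated."""
    problems = []
    if find_key(cfg, "sampling_steps") != 2:
        problems.append("sampling_steps should be 2 for the 2-step checkpoint")

    lora = find_key(cfg, "lora_ckpt")
    has_lora = lora is not _MISSING and lora is not None
    has_adapter = find_key(cfg, "adapter") is not _MISSING
    if has_lora and not has_adapter:
        problems.append("lora_ckpt is set but the adapter section is missing")
    if not has_lora and has_adapter:
        problems.append("lora_ckpt is null; remove the adapter section")

    if find_key(cfg, "model_quant_use_transformer_engine") is True:
        problems.append(
            "model_quant_use_transformer_engine is true; "
            "do not use it with a FourOverSix materialized checkpoint"
        )
    return problems
```

The checker cannot tell whether a checkpoint was materialized by FourOverSix, so the last message is a reminder rather than a definite error.
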

## Prompt Folder

`data.data_path` can be either:

- a `.txt` file, where each line is one single-shot prompt; or
- a directory of multi-shot prompt folders.

Example multi-shot prompt folder:

```text
inference_prompts/
  robot_lab_demo/
    0.json
    1.json
    2.json
    shot_durations.txt
```

Each JSON file contains:

```json
{
  "caption": "A compact silver robot with one blue optic explores a clean robotics lab."
}
```
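
A folder in this layout can be generated from a list of captions. The sketch below writes one numbered JSON file per caption with the `caption` field shown above; `write_prompt_folder` is an illustrative helper, and starting the numbering at `0` follows the example listing rather than a documented requirement.

```python
import json
from pathlib import Path


def write_prompt_folder(root, name, captions):
    """Create root/name/ and write 0.json, 1.json, ... with one caption each."""
    folder = Path(root) / name
    folder.mkdir(parents=True, exist_ok=True)
    for index, caption in enumerate(captions):
        payload = json.dumps({"caption": caption}, indent=2)
        (folder / f"{index}.json").write_text(payload, encoding="utf-8")
    return folder
```

For example, `write_prompt_folder("inference_prompts", "robot_lab_demo", captions)` with three captions produces `0.json` through `2.json` in the folder shown earlier.
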

`shot_durations.txt` is optional. If provided, it lists one number per caption;
each number is the number of temporal chunks assigned to the corresponding
caption. For example, with three captions:

```text
2 2 4
```
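
To make the example concrete, the sketch below parses such a line and shows which chunk indices each shot would cover. It rests on three assumptions that are not confirmed by the release code: numbers are whitespace-separated, there must be exactly one positive number per caption, and shots occupy consecutive chunks in caption order. Both function names are illustrative.

```python
def parse_shot_durations(text, num_captions):
    """Parse whitespace-separated chunk counts and check there is one per caption."""
    durations = [int(token) for token in text.split()]
    if len(durations) != num_captions:
        raise ValueError(
            f"expected {num_captions} durations, got {len(durations)}"
        )
    if any(count <= 0 for count in durations):
        raise ValueError("every duration must be a positive number of chunks")
    return durations


def chunk_ranges(durations):
    """Turn per-shot chunk counts into (start, end) chunk ranges, end exclusive."""
    ranges, start = [], 0
    for count in durations:
        ranges.append((start, start + count))
        start += count
    return ranges


if __name__ == "__main__":
    durations = parse_shot_durations("2 2 4", num_captions=3)
    print(chunk_ranges(durations))  # [(0, 2), (2, 4), (4, 8)]
```

Under these assumptions the three shots use 8 chunks in total, with the last shot twice as long as each of the first two.
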

## Run

Single node, 4 GPUs:

```bash
torchrun --standalone --nnodes=1 --nproc_per_node=4 inference.py \
  --config_path configs/nvfp4/inference_nvfp4.yaml
```

Single GPU:

```bash
python inference.py --config_path configs/nvfp4/inference_nvfp4.yaml
```

Or use the helper script, which reads `NUM_GPUS` / `num_gpus` when provided:

```bash
scripts/inference_nvfp4.sh configs/nvfp4/inference_nvfp4.yaml
```

Outputs are written to `output_folder`.
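
When launching from Python, for example from a notebook or a job scheduler, the same two commands can be assembled from a GPU count. This is a sketch that mirrors the commands above; it is not the logic of `scripts/inference_nvfp4.sh`, and `build_launch_command` is a hypothetical name.

```python
import shlex


def build_launch_command(config_path, num_gpus=1):
    """Return the argument list for a single-GPU or single-node multi-GPU run."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if num_gpus == 1:
        return ["python", "inference.py", "--config_path", config_path]
    return [
        "torchrun",
        "--standalone",
        "--nnodes=1",
        f"--nproc_per_node={num_gpus}",
        "inference.py",
        "--config_path",
        config_path,
    ]


if __name__ == "__main__":
    command = build_launch_command("configs/nvfp4/inference_nvfp4.yaml", num_gpus=4)
    print(shlex.join(command))
```

Passing the list to `subprocess.run(command, check=True)` from the repository root would start the run.
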

## Notes

- This model card is for the **2-step** NVFP4 checkpoint. Use
  `inference.sampling_steps: 2`.
- `model_quant` enables NVFP4 generator inference.
- `inference.kv_quant` enables FP4 KV-cache storage and requires the
  `utils/kernel` extension.
- `inference.multi_shot_sink` enables the multi-shot attention sink.
- `inference.multi_shot_rope_offset` controls the multi-shot RoPE offset.
- `inference.streaming_vae`, `inference.async_vae`, `inference.vae_type`, and
  `inference.vae_device` control streaming or asynchronous VAE decode.