Efficient-Large-Model/LongLive-2.0-5B

+---
+license: other
+license_name: nvidia-open-model-license
+license_link: >-
+  https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
+pipeline_tag: text-to-video
+tags:
+  - text-to-video
+  - multi-shot
+  - video-generation
+  - diffusion
+  - long-video
+  - longlive2
+  - wan2.2
+---
+<p align="center">
+  <img src="https://github.com/wileewang/LongLive2.0/blob/release-clean-merge/assets/longlive2/logo.png?raw=true" alt="LongLive2.0 logo" width="100%">
+</p>
+# LongLive2.0 5B Checkpoints
+This repository hosts temporary LongLive2.0 5B checkpoints for inference with
+the LongLive2.0 release code:
+https://github.com/wileewang/LongLive2.0
+The checkpoint package contains two parts:
+- **Base generator checkpoint**: the Wan2.2-TI2V-5B generator after autoregressive (AR) training.
+- **LoRA checkpoint**: the few-step LoRA adapter distilled with DMD (distribution matching distillation).
+LongLive2.0 inference loads the base generator first, applies the LoRA modules,
+and then loads the LoRA weights.
+## Installation
+```bash
+git clone https://github.com/wileewang/LongLive2.0.git
+cd LongLive2.0
+conda create -n longlive2 python=3.10 -y
+conda activate longlive2
+pip install torch==2.8.0 torchvision==0.23.0 --index-url https://download.pytorch.org/whl/cu128
+pip install -r requirements.txt
+pip install flash-attn --no-build-isolation
+```
+The released LongLive2.0 checkpoint is sufficient for standard inference. You
+only need to download the original Wan2.2-TI2V-5B components if you want to run
+training, initialize from the original Wan weights, or use code paths that
+explicitly load the base Wan model files:
+```bash
+huggingface-cli download Wan-AI/Wan2.2-TI2V-5B \
+  --local-dir wan_models/Wan2.2-TI2V-5B
+```
+Download this checkpoint repository:
+```bash
+huggingface-cli download Perflow-Shuai/longlive_2.0_5B_tmp_20260507 \
+  --local-dir checkpoints/longlive2_5b
+```
+## Configure Inference
+Edit `configs/inference.yaml`:
+```yaml
+checkpoints:
+  generator_ckpt: checkpoints/longlive2_5b/path/to/base_generator.pt
+  lora_ckpt: checkpoints/longlive2_5b/path/to/dmd_lora.pt
+adapter:
+  type: lora
+  rank: 128
+  alpha: 128
+  dropout: 0.0
+  verbose: true
+data:
+  data_path: /path/to/inference_prompts
+output_folder: videos/longlive2
+num_samples: 1
+inference:
+  sampling_steps: 4
+  sink_size: 8
+  guidance_scale: 1.0
+  multi_shot_sink: true
+  multi_shot_rope_offset: 8
+```
+The `path/to/...` entries above are placeholders; replace them with the actual checkpoint files in this repository.
+If the LoRA checkpoint is not used, remove the `adapter` section and leave
+`lora_ckpt` unset.
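Because the checkpoint paths in the example config are placeholders, it can help to check them before launching a long run. The following is a minimal sketch, not part of the LongLive2.0 code: `missing_checkpoints` is a hypothetical helper. It assumes the config has already been parsed into a dictionary (for example with PyYAML's `yaml.safe_load`), that relative paths resolve against the launch directory, and that `generator_ckpt` is always required while `lora_ckpt` may be left unset.

```python
from pathlib import Path


def missing_checkpoints(cfg: dict, root: str = ".") -> list[str]:
    """Return a description of each configured checkpoint that cannot be found.

    Hypothetical helper: only the key names come from configs/inference.yaml.
    """
    ckpts = cfg.get("checkpoints") or {}
    problems = []
    # Assumption: the base generator is mandatory, the LoRA checkpoint is optional.
    for key, required in (("generator_ckpt", True), ("lora_ckpt", False)):
        path = ckpts.get(key)
        if not path:
            if required:
                problems.append(f"{key}: <unset>")
            continue
        if not (Path(root) / path).is_file():
            problems.append(f"{key}: {path}")
    return problems


if __name__ == "__main__":
    # The placeholder paths from the example config are reported as missing.
    example = {
        "checkpoints": {
            "generator_ckpt": "checkpoints/longlive2_5b/path/to/base_generator.pt",
            "lora_ckpt": "checkpoints/longlive2_5b/path/to/dmd_lora.pt",
        }
    }
    for entry in missing_checkpoints(example):
        print("missing checkpoint ->", entry)
```

An empty result means every configured checkpoint file was found. It does not check that the files are valid checkpoints.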
+## Prompt Folder
+`data.data_path` is passed to `MultiTextConcatDataset` in `inference.py`. It can
+be either:
+- a `.txt` file, where each line is one single-shot prompt; or
+- a directory of multi-shot prompt folders.
+For a directory input, the code supports both of the following layouts. The
+direct caption-root layout is the simplest:
+```text
+inference_prompts/
+  robot_lab_demo/
+    0.json
+    1.json
+    2.json
+    shot_durations.txt
+```
+It also supports a dataset root with an outer `caption/` folder:
+```text
+inference_prompts/
+  caption/
+    robot_lab_demo/
+      0.json
+      1.json
+      2.json
+      shot_durations.txt
+```
+Each JSON file contains:
+```json
+{
+  "caption": "A compact silver robot with one blue optic explores a clean robotics lab."
+}
+```
+`shot_durations.txt` is optional. If provided, it lists one number per caption,
+in caption order; each number is the number of temporal chunks assigned to that
+caption. For example, for the three captions above:
+```text
+2 2 4
+```
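The direct caption-root layout described above can be generated with a short script. This is a minimal sketch rather than a tool from the LongLive2.0 repository: `write_multishot_prompt` is a hypothetical helper, and it assumes that `shot_durations.txt` holds a single line of space-separated integers, as in the example above, with one number per caption. The second and third captions are made up for illustration.

```python
import json
from pathlib import Path
from typing import Optional, Sequence


def write_multishot_prompt(
    root: str,
    name: str,
    captions: Sequence[str],
    durations: Optional[Sequence[int]] = None,
) -> Path:
    """Create <root>/<name>/ with 0.json, 1.json, ... and an optional shot_durations.txt."""
    if durations is not None and len(durations) != len(captions):
        raise ValueError("expected one duration per caption")
    folder = Path(root) / name
    folder.mkdir(parents=True, exist_ok=True)
    for index, caption in enumerate(captions):
        # Each shot gets its own JSON file with a single "caption" field.
        (folder / f"{index}.json").write_text(
            json.dumps({"caption": caption}, indent=2) + "\n", encoding="utf-8"
        )
    if durations is not None:
        # Assumed format: one line of space-separated chunk counts, e.g. "2 2 4".
        (folder / "shot_durations.txt").write_text(
            " ".join(str(d) for d in durations) + "\n", encoding="utf-8"
        )
    return folder


if __name__ == "__main__":
    write_multishot_prompt(
        "inference_prompts",
        "robot_lab_demo",
        captions=[
            "A compact silver robot with one blue optic explores a clean robotics lab.",
            # Illustrative captions, not taken from the LongLive2.0 repository.
            "The same robot picks up a small tool from a workbench.",
            "The robot rolls toward the lab exit as the lights dim.",
        ],
        durations=[2, 2, 4],
    )
```

Setting `data.data_path` to `inference_prompts` then points inference at the generated folder.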
+## Run
+Single node, 8 GPUs:
+```bash
+torchrun --standalone --nnodes=1 --nproc_per_node=8 inference.py \
+  --config_path configs/inference.yaml
+```
+Single GPU:
+```bash
+python inference.py --config_path configs/inference.yaml
+```
+Outputs are written to the directory set by `output_folder` in the config (`videos/longlive2` in the example above).
+## Notes
+- The base checkpoint and LoRA checkpoint should be loaded together for the
+  few-step DMD model.
+- `inference.sampling_steps` controls the number of denoising steps.
+- `inference.multi_shot_sink` enables the multi-shot attention sink.
+- `inference.multi_shot_rope_offset` controls the multi-shot RoPE offset.
+- For NVFP4 inference, use the separate NVFP4 config and setup instructions in
+  the LongLive2.0 documentation.
+## License/Terms of Use
+GOVERNING TERMS: This trial service is governed by the [NVIDIA API Trial Terms of Service](https://assets.ngc.nvidia.com/products/api-catalog/legal/NVIDIA%20API%20Trial%20Terms%20of%20Service.pdf). Use of this model is governed by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
+## Citation
+```bibtex
+@article{longlive_2,
+  title={LongLive2.0: An NVFP4 Parallel Infrastructure for Long Video Generation},
+  author={Chen, Yukang and Wang, Luozhou and Huang, Wei and Yang, Shuai and Zhang, Bohan and Xiao, Yicheng and Chu, Ruihang and Mao, Weian and Hu, Qixin and Liu, Shaoteng and Zhao, Yuyang and Mao, Huizi and Chen, Ying-Cong and Xie, Enze and Qi, Xiaojuan and Han, Song},
+  journal={arXiv preprint arXiv},
+  year={2026}
+}
+```