How to use Leon1000/Ltx2.3-VBVR-lora-I2V with Diffusers:

```shell
pip install -U diffusers transformers accelerate
```

```python
import torch
from diffusers import DiffusionPipeline

# switch to "mps" for apple devices
pipe = DiffusionPipeline.from_pretrained("Lightricks/LTX-2.3", dtype=torch.bfloat16, device_map="cuda")
pipe.load_lora_weights("Leon1000/Ltx2.3-VBVR-lora-I2V")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]
```
Commit 8f3afe1 · 0 parent(s)

Duplicate from LiconStudio/Ltx2.3-VBVR-lora-I2V

Co-authored-by: LiconStudio <LiconStudio@users.noreply.huggingface.co>

- .gitattributes +43 -0
- 240K.mp4 +3 -0
- Ltx2.3-Licon-VBVR-I2V-240K-R32.safetensors +3 -0
- Ltx2.3-Licon-VBVR-I2V-390K-R32.safetensors +3 -0
- Ltx2.3-Licon-VBVR-I2V-96000-R32.safetensors +3 -0
- README.md +136 -0
- S3.mp4 +3 -0
- VBVR-official-comfyui.safetensors +3 -0
- original01.mp4 +3 -0

.gitattributes
ADDED
@@ -0,0 +1,43 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
step_000000_1.mp4 filter=lfs diff=lfs merge=lfs -text
step_006000_1.mp4 filter=lfs diff=lfs merge=lfs -text
15000_01.mp4 filter=lfs diff=lfs merge=lfs -text
official02.mp4 filter=lfs diff=lfs merge=lfs -text
original.mp4 filter=lfs diff=lfs merge=lfs -text
240K.mp4 filter=lfs diff=lfs merge=lfs -text
original01.mp4 filter=lfs diff=lfs merge=lfs -text
S3.mp4 filter=lfs diff=lfs merge=lfs -text

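Each attribute line above routes paths that match its pattern through Git LFS. As a rough illustration only, the sketch below checks which files from this commit would match, using Python's `fnmatch` on the bare file name. This is an approximation: real gitattributes matching treats patterns containing `/` or `**` differently, so only slash-free patterns are included. The `lfs_tracked` helper and the shortened pattern list are hypothetical.

```python
from fnmatch import fnmatchcase

# A subset of the slash-free patterns from the .gitattributes in this commit.
LFS_PATTERNS = ["*.safetensors", "*.bin", "*.ckpt", "240K.mp4", "original01.mp4", "S3.mp4"]

def lfs_tracked(filename: str, patterns=LFS_PATTERNS) -> bool:
    """Return True if the bare file name matches any pattern (fnmatch approximation)."""
    return any(fnmatchcase(filename, pattern) for pattern in patterns)

files = [
    ".gitattributes",
    "240K.mp4",
    "Ltx2.3-Licon-VBVR-I2V-240K-R32.safetensors",
    "README.md",
    "VBVR-official-comfyui.safetensors",
]
tracked = [name for name in files if lfs_tracked(name)]
print(tracked)
```

Files that match are stored in the repository as small pointer files rather than as their full contents, which is why the media and weight files in this commit each show only three added lines.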
240K.mp4
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1e058a1890a47b23ada2264201ec8f33b12f070ecadeb55152c9283ea589a409
size 1048149

Ltx2.3-Licon-VBVR-I2V-240K-R32.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9b242af2c92a077a48138c607fffe865defed75f5e1eb23cb77135323513ce21
size 554006432

Ltx2.3-Licon-VBVR-I2V-390K-R32.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4cb77bc088b04fd69fe5a711356aec05e1a9d503b47e6e252a7f2577c7716a18
size 554006432

Ltx2.3-Licon-VBVR-I2V-96000-R32.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ec5ff61d3e2959babf01112a8d7f00776273415573aa29a82f2af00070fd1408
size 554006432

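The three-line entries above are Git LFS pointer files: a spec version, an object id (hash algorithm plus hex digest), and the size of the real file in bytes. A minimal sketch of reading one, using the pointer for `Ltx2.3-Licon-VBVR-I2V-96000-R32.safetensors` as input; `parse_lfs_pointer` is a hypothetical helper, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its fields."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid is stored as "<hash-algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }

pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:ec5ff61d3e2959babf01112a8d7f00776273415573aa29a82f2af00070fd1408
size 554006432
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size_bytes"])
```

The three rank-32 LoRA files report the same size, 554006432 bytes, while their digests differ, which is what one would expect from checkpoints that share a shape but hold different weights.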
README.md
ADDED
@@ -0,0 +1,136 @@
---
license: other
license_name: ltx-2-community-license-agreement
license_link: https://github.com/Lightricks/LTX-2/blob/main/LICENSE
language:
- en
- zh
library_name: diffusers
tags:
- video-generation
- video-reasoning
- logical-reasoning
- lora
- ltx-2.3
base_model:
- Lightricks/LTX-2.3
---

# LTX-2.3 VBVR LoRA - Video Reasoning

LoRA fine-tuned weights for LTX-2.3 22B on the VBVR (A Very Big Video Reasoning Suite) dataset.

## Training Data

**To ensure training quality, we preprocessed all 1,000,000 videos from the official dataset and sample from them at random during training to maintain data diversity. We adopt the official parameters, batch_size=16 and rank=32; the moderate rank guards against the catastrophic forgetting that an excessively large rank can cause.**

The VBVR dataset contains 100 reasoning task categories, with ~10,000 variants per task, totaling ~1M videos. Main task types include:

- **Object Trajectory**: Objects moving to target positions
- **Physical Reasoning**: Rolling balls, collisions, gravity
- **Causal Relationships**: Conditional triggers, chain reactions
- **Spatial Relationships**: Relative positions, path planning

## Model Details

| Item | Details |
|------|---------|
| Base Model | ltx-2.3-22b-dev |
| Training Method | LoRA Fine-tuning |
| LoRA Rank | 32 |
| Effective Batch Size | 16 |
| Mixed Precision | BF16 |

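To make the rank figure concrete: LoRA leaves a dense weight matrix of shape `d_out × d_in` frozen and trains a low-rank update `B · A`, where `A` is `rank × d_in` and `B` is `d_out × rank`. The sketch below counts the trainable parameters for one such adapter at rank 32. The layer width of 4096 is an illustrative assumption, not a dimension taken from LTX-2.3.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters of one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)

def full_param_count(d_in: int, d_out: int) -> int:
    """Parameters of the frozen dense weight matrix that the adapter modifies."""
    return d_in * d_out

# Hypothetical square projection; 4096 is illustrative, not an LTX-2.3 dimension.
d_in = d_out = 4096
rank = 32  # the rank listed in the Model Details table

lora = lora_param_count(d_in, d_out, rank)
full = full_param_count(d_in, d_out)
print(lora, full, f"{lora / full:.2%}")
```

Because the adapter size grows linearly with rank, a larger rank gives the adapter more capacity to move the model away from its base behavior, which is the trade-off the Training Data note refers to.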
## TODO List

### Dataset Release Plan

| Dataset | Videos | Status |
|---------|--------|--------|
| VBVR-96K | 96,000 | ✅ Released |
| VBVR-240K | 240,000 | ✅ Released |
| VBVR-reinforce | 240K+150K | ✅ Released |

## LoRA Capabilities

This LoRA adapter enhances the base LTX-2.3 model for production video generation workflows:

- **Enhanced Complex Prompt Understanding**: Accurately interprets multi-object, multi-condition prompts with detailed spatial descriptions and temporal sequences, reducing prompt misinterpretation in production scenarios.

- **Improved Motion Dynamics**: Generates smooth, physically plausible object movements with natural acceleration, deceleration, and trajectory curves, avoiding robotic or unnatural motion patterns.

- **Temporal Consistency**: Maintains object appearance, lighting, and scene coherence throughout the video sequence, reducing flickering and frame-to-frame artifacts common in generated videos.

- **Precise Timing Control**: Enables accurate control over action duration, pacing, and synchronization between multiple moving elements based on prompt semantics.

- **Multi-Object Interaction**: Handles complex scenes with multiple objects interacting simultaneously, including collisions, following, avoiding, and coordinated movements.

- **Camera and Framing Stability**: Maintains consistent camera perspective and framing throughout the sequence, avoiding unwanted camera shake or unexpected viewpoint changes.


## Training Configuration

### Stage 1: VBVR Foundation (96K)
| Config | Value |
|--------|-------|
| Dataset | 96K VBVR general videos |
| Learning Rate | 1e-4 |
| Scheduler | Cosine |
| Batch Size | 1 × 16 (gradient accumulation) |
| Optimizer | AdamW |
| Max Grad Norm | 1.0 |
| Target Modules | `to_q`, `to_k`, `to_v`, `to_out.0`, `ff.net.0.proj`, `ff.net.2` |
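
The Stage 1 table lists a cosine scheduler with a learning rate of 1e-4. A minimal sketch of a cosine decay schedule follows. The total step count, the absence of warmup, and the final learning rate of 0 are all assumptions for illustration; the README does not state them.

```python
import math

def cosine_lr(step: int, total_steps: int, base_lr: float = 1e-4, min_lr: float = 0.0) -> float:
    """Cosine decay from base_lr at step 0 to min_lr at total_steps (no warmup)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

total_steps = 1000  # hypothetical; the optimizer step count is not given in the README
for step in (0, total_steps // 2, total_steps):
    print(step, cosine_lr(step, total_steps))
```

With these assumptions the rate starts at the listed 1e-4, passes through half that value at the midpoint, and reaches the floor at the final step.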

### Stage 2: VBVR Extended (240K)
| Config | Value |
|--------|-------|
| Dataset | 240K general videos |
| Learning Rate | 1e-4 |
| Scheduler | Cosine |
| Batch Size | 1 × 16 (gradient accumulation) |
| Optimizer | AdamW |
| Max Grad Norm | 1.0 |
| Target Modules | `to_q`, `to_k`, `to_v`, `to_out.0`, `ff.net.0.proj`, `ff.net.2` |
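
The "1 × 16 (gradient accumulation)" entry reads as a micro-batch of 1 accumulated over 16 steps, giving the effective batch size of 16 listed under Model Details, and "Max Grad Norm 1.0" reads as clipping by global gradient norm. The pure-Python sketch below illustrates both ideas on toy values. The helper names and the gradient values are hypothetical, and a real training loop would use the framework's own utilities.

```python
import math

def accumulate(micro_batch_grads):
    """Average per-micro-batch gradients, as accumulation does before one optimizer step."""
    n = len(micro_batch_grads)
    return [sum(column) / n for column in zip(*micro_batch_grads)]

def clip_by_global_norm(grads, max_norm=1.0):
    """Scale gradients so that their global L2 norm is at most max_norm."""
    total = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / total) if total > 0 else 1.0
    return [g * scale for g in grads], total

micro_batch_size, accumulation_steps = 1, 16
effective_batch_size = micro_batch_size * accumulation_steps

# Toy gradients for a 2-parameter model: 16 identical micro-batches (illustrative values).
micro_grads = [[3.0, 4.0] for _ in range(accumulation_steps)]
avg = accumulate(micro_grads)
clipped, norm_before = clip_by_global_norm(avg, max_norm=1.0)
print(effective_batch_size, norm_before, clipped)
```

Clipping rescales the whole gradient vector rather than individual entries, so the update direction is preserved while its length is capped.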

### Stage 3: General + Hard Reasoning (390K)
| Config | Value |
|--------|-------|
| Dataset | 240K general videos + 150K high-difficulty reasoning videos |
| Learning Rate | 5e-5 |
| Scheduler | Cosine |
| Batch Size | 1 × 16 (gradient accumulation) |
| Optimizer | AdamW |
| Max Grad Norm | 1.0 |
| Target Modules | `to_q`, `to_k`, `to_v`, `to_out.0` (FFN frozen) |

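The three stages differ in which modules receive LoRA adapters: Stages 1 and 2 adapt the attention projections and the feed-forward layers, while Stage 3 adapts only the attention projections. The sketch below shows one way such a target list could select modules, assuming names are matched by dotted-path suffix. The module paths, the `select_modules` helper, and the suffix-matching rule are illustrative assumptions, not the trainer's actual code.

```python
ATTENTION = ("to_q", "to_k", "to_v", "to_out.0")
FEED_FORWARD = ("ff.net.0.proj", "ff.net.2")

# Target lists as given in the stage tables.
STAGE_TARGETS = {
    "stage1": ATTENTION + FEED_FORWARD,
    "stage2": ATTENTION + FEED_FORWARD,
    "stage3": ATTENTION,  # FFN frozen
}

def select_modules(module_names, targets):
    """Keep module names whose dotted path equals or ends with a target suffix."""
    return [
        name for name in module_names
        if any(name == t or name.endswith("." + t) for t in targets)
    ]

# Hypothetical module paths for a single transformer block.
names = [
    "blocks.0.attn1.to_q",
    "blocks.0.attn1.to_out.0",
    "blocks.0.ff.net.0.proj",
    "blocks.0.ff.net.2",
    "blocks.0.norm1",
]
stage1 = select_modules(names, STAGE_TARGETS["stage1"])
stage3 = select_modules(names, STAGE_TARGETS["stage3"])
print(len(stage1), len(stage3))
```

Under this assumed matching rule, dropping the two feed-forward suffixes in Stage 3 is all it takes to leave those layers without adapters.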
## Video Demo

### Training Progress Comparison

<div style="display: flex; gap: 10px; flex-wrap: wrap;">

<div style="flex: 1; min-width: 300px;">
<video src="https://huggingface.co/LiconStudio/Ltx2.3-VBVR-lora-I2V/resolve/main/original01.mp4" controls style="width: 100%;"></video>
<p style="text-align: center; margin: 8px 0 0 0;"><strong>Original Model</strong></p>
</div>

<div style="flex: 1; min-width: 300px;">
<video src="https://huggingface.co/LiconStudio/Ltx2.3-VBVR-lora-I2V/resolve/main/240K.mp4" controls style="width: 100%;"></video>
<p style="text-align: center; margin: 8px 0 0 0;"><strong>240K</strong></p>
</div>

<div style="flex: 1; min-width: 300px;">
<video src="https://huggingface.co/LiconStudio/Ltx2.3-VBVR-lora-I2V/resolve/main/S3.mp4" controls style="width: 100%;"></video>
<p style="text-align: center; margin: 8px 0 0 0;"><strong>390K</strong></p>
</div>

</div>

## Dataset

This model is trained on the VBVR (A Very Big Video Reasoning Suite) dataset from [video-reason.com](https://video-reason.com/).


## Contact

For questions or suggestions, please open an issue on Hugging Face or contact the author directly.

S3.mp4
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:795a0b2a0e26ceef3cb458c18cb9747f71907f98208843588efe6e7607e5a73f
size 1086541

VBVR-official-comfyui.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:09583c64089a23f6998d2619c55aa332387859962df6eeef48a73c518e73213c
size 428150664

original01.mp4
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d3ab3b8a278805971462b9ac7bf6f98eb65d5da9e72a17eb3ed60fcf7a6a3be0
size 1321129