---
license: other
license_name: ltx-2-community-license-agreement
license_link: https://github.com/Lightricks/LTX-2/blob/main/LICENSE
language:
- en
- zh
library_name: diffusers
tags:
- video-generation
- video-reasoning
- logical-reasoning
- lora
- ltx-2.3
base_model:
- Lightricks/LTX-2.3
---
# LTX-2.3 VBVR LoRA - Video Reasoning
LoRA fine-tuned weights for LTX-2.3 22B on the VBVR (A Very Big Video Reasoning Suite) dataset.
## Training Data
**To ensure training quality, we preprocessed all 1,000,000 videos from the official dataset and sample from them randomly during training to maintain data diversity. We adopt the official parameters, batch_size=16 and rank=32; the moderate rank helps avoid the catastrophic forgetting that an excessively large rank can cause.**
The VBVR dataset contains 100 reasoning task categories, with ~10,000 variants per task, totaling ~1M videos. Main task types include:
- **Object Trajectory**: Objects moving to target positions
- **Physical Reasoning**: Rolling balls, collisions, gravity
- **Causal Relationships**: Conditional triggers, chain reactions
- **Spatial Relationships**: Relative positions, path planning
## Model Details
| Item | Details |
|------|---------|
| Base Model | ltx-2.3-22b-dev |
| Training Method | LoRA Fine-tuning |
| LoRA Rank | 32 |
| Effective Batch Size | 16 |
| Mixed Precision | BF16 |
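To give a feel for what rank 32 means in practice, here is a minimal sketch of the extra parameters a LoRA pair adds to one linear layer. The layer width `HIDDEN` is a hypothetical value chosen only for illustration; the actual layer sizes of the base model are not stated in this card.

```python
def lora_extra_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Hypothetical layer width, for illustration only (not taken from the model).
HIDDEN = 4096
RANK = 32  # LoRA rank from the Model Details table

full = HIDDEN * HIDDEN                          # weights in the frozen linear layer
extra = lora_extra_params(HIDDEN, HIDDEN, RANK)  # trainable LoRA weights
ratio = extra / full

print(f"frozen: {full:,}  lora: {extra:,}  ratio: {ratio:.4f}")
```

Under this assumed width, the adapter trains about 1.6% as many weights as the layer it wraps, which is why a modest rank keeps the update small relative to the base model.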
## TODO List
### Dataset Release Plan
| Dataset | Videos | Status |
|---------|--------|--------|
| VBVR-96K | 96,000 | ✅ Released |
| VBVR-240K | 240,000 | ✅ Released |
| VBVR-reinforce | 240K+150K | ✅ Released |
## LoRA Capabilities
This LoRA adapter is intended to improve the base LTX-2.3 model for production video generation workflows in the following areas:
- **Enhanced Complex Prompt Understanding**: Accurately interprets multi-object, multi-condition prompts with detailed spatial descriptions and temporal sequences, reducing prompt misinterpretation in production scenarios.
- **Improved Motion Dynamics**: Generates smooth, physically plausible object movements with natural acceleration, deceleration, and trajectory curves, avoiding robotic or unnatural motion patterns.
- **Temporal Consistency**: Maintains object appearance, lighting, and scene coherence throughout the video sequence, reducing flickering and frame-to-frame artifacts common in generated videos.
- **Precise Timing Control**: Enables accurate control over action duration, pacing, and synchronization between multiple moving elements based on prompt semantics.
- **Multi-Object Interaction**: Handles complex scenes with multiple objects interacting simultaneously, including collisions, following, avoiding, and coordinated movements.
- **Camera and Framing Stability**: Maintains consistent camera perspective and framing throughout the sequence, avoiding unwanted camera shake or unexpected viewpoint changes.
## Training Configuration
### Stage 1: VBVR Foundation (96K)
| Config | Value |
|--------|-------|
| Dataset | 96K VBVR general videos |
| Learning Rate | 1e-4 |
| Scheduler | Cosine |
| Batch Size | 1 × 16 (gradient accumulation) |
| Optimizer | AdamW |
| Max Grad Norm | 1.0 |
| Target Modules | `to_q`, `to_k`, `to_v`, `to_out.0`, `ff.net.0.proj`, `ff.net.2` |
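The batch size row reads as a micro-batch of 1 accumulated over 16 steps, giving an effective batch of 16. The sketch below works out how many optimizer updates a single pass over the 96K videos would take under that reading. It assumes one device and one pass over the data, neither of which is stated in this card.

```python
import math


def optimizer_steps(num_videos: int, micro_batch: int = 1, grad_accum: int = 16) -> int:
    """Optimizer updates needed for one pass over the data.

    Assumes a single device; with data parallelism the effective batch
    would also be multiplied by the number of devices.
    """
    effective_batch = micro_batch * grad_accum
    return math.ceil(num_videos / effective_batch)


stage1_steps = optimizer_steps(96_000)
print(stage1_steps)
```

The same helper can be called with a different `num_videos` to estimate the length of a pass over a larger dataset.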
### Stage 2: VBVR Extended (240K)
| Config | Value |
|--------|-------|
| Dataset | 240K general videos |
| Learning Rate | 1e-4 |
| Scheduler | Cosine |
| Batch Size | 1 × 16 (gradient accumulation) |
| Optimizer | AdamW |
| Max Grad Norm | 1.0 |
| Target Modules | `to_q`, `to_k`, `to_v`, `to_out.0`, `ff.net.0.proj`, `ff.net.2` |
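As an illustration of the cosine scheduler listed above, here is a minimal sketch of a cosine decay starting from the peak learning rate of 1e-4. It assumes no warmup and a decay to zero, and `TOTAL_STEPS` is a hypothetical value; the card specifies none of these details.

```python
import math


def cosine_lr(step: int, total_steps: int, peak_lr: float = 1e-4, min_lr: float = 0.0) -> float:
    """Cosine decay from peak_lr at step 0 to min_lr at total_steps (no warmup assumed)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 15_000  # hypothetical schedule length, for illustration only

for step in (0, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, f"{cosine_lr(step, TOTAL_STEPS):.2e}")
```

The rate starts at the peak, passes through half the peak at the midpoint, and reaches the minimum at the final step.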
### Stage 3: General + Hard Reasoning (390K)
| Config | Value |
|--------|-------|
| Dataset | 240K general videos + 150K high-difficulty reasoning videos |
| Learning Rate | 5e-5 |
| Scheduler | Cosine |
| Batch Size | 1 × 16 (gradient accumulation) |
| Optimizer | AdamW |
| Max Grad Norm | 1.0 |
| Target Modules | `to_q`, `to_k`, `to_v`, `to_out.0` (FFN frozen) |
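The difference between the target module lists can be sketched as a simple suffix match on module names: Stages 1 and 2 adapt both the attention projections and the feed-forward layers, while Stage 3 adapts attention only. The module paths in `names` are hypothetical examples, not names read from the actual model, and the matching rule is an assumption about how such lists are typically applied.

```python
ATTENTION_TARGETS = ("to_q", "to_k", "to_v", "to_out.0")
FFN_TARGETS = ("ff.net.0.proj", "ff.net.2")


def is_lora_target(module_name: str, targets: tuple) -> bool:
    """True if the module name equals a target or ends with '.<target>'."""
    return any(module_name == t or module_name.endswith("." + t) for t in targets)


# Hypothetical module paths, for illustration only.
names = [
    "transformer_blocks.0.attn1.to_q",
    "transformer_blocks.0.attn1.to_out.0",
    "transformer_blocks.0.ff.net.0.proj",
    "transformer_blocks.0.ff.net.2",
    "transformer_blocks.0.norm1",
]

# Stages 1 and 2: attention and feed-forward layers are adapted.
stage12 = [n for n in names if is_lora_target(n, ATTENTION_TARGETS + FFN_TARGETS)]
# Stage 3: feed-forward layers are left out, only attention is adapted.
stage3 = [n for n in names if is_lora_target(n, ATTENTION_TARGETS)]

print(len(stage12), len(stage3))
```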
## Video Demo
### Training Progress Comparison
<div style="display: flex; gap: 10px; flex-wrap: wrap;">
<div style="flex: 1; min-width: 300px;">
<video src="https://huggingface.co/LiconStudio/Ltx2.3-VBVR-lora-I2V/resolve/main/original01.mp4" controls style="width: 100%;"></video>
<p style="text-align: center; margin: 8px 0 0 0;"><strong>Original Model</strong></p>
</div>
<div style="flex: 1; min-width: 300px;">
<video src="https://huggingface.co/LiconStudio/Ltx2.3-VBVR-lora-I2V/resolve/main/240K.mp4" controls style="width: 100%;"></video>
<p style="text-align: center; margin: 8px 0 0 0;"><strong>240K</strong></p>
</div>
<div style="flex: 1; min-width: 300px;">
<video src="https://huggingface.co/LiconStudio/Ltx2.3-VBVR-lora-I2V/resolve/main/S3.mp4" controls style="width: 100%;"></video>
<p style="text-align: center; margin: 8px 0 0 0;"><strong>390K</strong></p>
</div>
</div>
## Dataset
This model is trained on the VBVR (A Very Big Video Reasoning Suite) dataset from [video-reason.com](https://video-reason.com/).
## Contact
For questions or suggestions, please open an issue on Hugging Face or contact the author directly.