| license: other | |
| license_name: ltx-2-community-license-agreement | |
| license_link: https://github.com/Lightricks/LTX-2/blob/main/LICENSE | |
| language: | |
| - en | |
| - zh | |
| library_name: diffusers | |
| tags: | |
| - video-generation | |
| - video-reasoning | |
| - logical-reasoning | |
| - lora | |
| - ltx-2.3 | |
| base_model: | |
| - Lightricks/LTX-2.3 | |
| # LTX-2 VBVR LoRA - Video Reasoning | |
| LoRA fine-tuned weights for LTX-2.3 22B on the VBVR (A Very Big Video Reasoning Suite) dataset. | |
| ## Training Data | |
| **To ensure training quality, we preprocessed all 1,000,000 videos from the official dataset and sample from them at random during training to maintain data diversity. We follow the official parameters (batch_size=16, rank=32); the moderate rank helps avoid the catastrophic forgetting that an excessively large rank can cause.** | |
| The VBVR dataset contains 100 reasoning task categories, with ~10,000 variants per task, totaling ~1M videos. Main task types include: | |
| - **Object Trajectory**: Objects moving to target positions | |
| - **Physical Reasoning**: Rolling balls, collisions, gravity | |
| - **Causal Relationships**: Conditional triggers, chain reactions | |
| - **Spatial Relationships**: Relative positions, path planning | |
| ## Model Details | |
| | Item | Details | | |
| |------|---------| | |
| | Base Model | ltx-2.3-22b-dev | | |
| | Training Method | LoRA Fine-tuning | | |
| | LoRA Rank | 32 | | |
| | Effective Batch Size | 16 | | |
| | Mixed Precision | BF16 | | |
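To give a sense of what rank 32 means in terms of trainable parameters, here is a minimal sketch of the standard LoRA parameter count for a single linear layer. The layer width of 4096 is a hypothetical value chosen for illustration; it is not taken from the LTX-2.3 architecture.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters LoRA adds to one linear layer.

    LoRA learns a low-rank update B @ A, where A has shape (rank, d_in)
    and B has shape (d_out, rank).
    """
    return rank * d_in + d_out * rank


def full_param_count(d_in: int, d_out: int) -> int:
    """Parameters in the frozen dense weight matrix (bias ignored)."""
    return d_in * d_out


# Hypothetical square projection; the real LTX-2.3 widths may differ.
HIDDEN = 4096
RANK = 32  # LoRA rank from the Model Details table

added = lora_param_count(HIDDEN, HIDDEN, RANK)
full = full_param_count(HIDDEN, HIDDEN)

if __name__ == "__main__":
    print(f"LoRA params per layer: {added}")
    print(f"Dense params per layer: {full}")
    print(f"Trainable fraction: {added / full:.4%}")
```

Under these assumptions the adapter trains only a small fraction of each adapted layer, which is consistent with the stated goal of limiting catastrophic forgetting by keeping the rank moderate.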
| ## TODO List | |
| ### Dataset Release Plan | |
| | Dataset | Videos | Status | | |
| |---------|--------|--------| | |
| | VBVR-96K | 96,000 | ✅ Released | | |
| | VBVR-240K | 240,000 | ✅ Released | | |
| | VBVR-reinforce | 240K+150K | ✅ Released | | |
| ## LoRA Capabilities | |
| This LoRA adapter enhances the base LTX-2 model for production video generation workflows: | |
| - **Enhanced Complex Prompt Understanding**: Accurately interprets multi-object, multi-condition prompts with detailed spatial descriptions and temporal sequences, reducing prompt misinterpretation in production scenarios. | |
| - **Improved Motion Dynamics**: Generates smooth, physically plausible object movements with natural acceleration, deceleration, and trajectory curves, avoiding robotic or unnatural motion patterns. | |
| - **Temporal Consistency**: Maintains object appearance, lighting, and scene coherence throughout the video sequence, reducing flickering and frame-to-frame artifacts common in generated videos. | |
| - **Precise Timing Control**: Enables accurate control over action duration, pacing, and synchronization between multiple moving elements based on prompt semantics. | |
| - **Multi-Object Interaction**: Handles complex scenes with multiple objects interacting simultaneously, including collisions, following, avoiding, and coordinated movements. | |
| - **Camera and Framing Stability**: Maintains consistent camera perspective and framing throughout the sequence, avoiding unwanted camera shake or unexpected viewpoint changes. | |
| ## Training Configuration | |
| ### Stage 1: VBVR Foundation (96K) | |
| | Config | Value | | |
| |--------|-------| | |
| | Dataset | 96K VBVR general videos | | |
| | Learning Rate | 1e-4 | | |
| | Scheduler | Cosine | | |
| | Batch Size | 1 × 16 (gradient accumulation) | | |
| | Optimizer | AdamW | | |
| | Max Grad Norm | 1.0 | | |
| | Target Modules | `to_q`, `to_k`, `to_v`, `to_out.0`, `ff.net.0.proj`, `ff.net.2` | | |
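The Stage 1 settings can be read as follows: a micro-batch of 1 accumulated over 16 steps gives an effective batch of 16, and the learning rate decays from 1e-4 along a cosine curve. The sketch below illustrates that arithmetic in plain Python. The total step count assumes a single pass over the 96K videos, and the absence of warmup and the floor of 0.0 are also assumptions; the card does not state the number of epochs, warmup steps, or minimum learning rate.

```python
import math

MICRO_BATCH = 1        # per-step batch size from the table
GRAD_ACCUM_STEPS = 16  # gradient accumulation from the table
EFFECTIVE_BATCH = MICRO_BATCH * GRAD_ACCUM_STEPS

NUM_VIDEOS = 96_000
# Assumption: one pass over the data; the real schedule length is not stated.
TOTAL_STEPS = NUM_VIDEOS // EFFECTIVE_BATCH


def cosine_lr(step: int, total_steps: int, base_lr: float = 1e-4,
              min_lr: float = 0.0) -> float:
    """Cosine decay from base_lr at step 0 to min_lr at total_steps (no warmup)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(f"effective batch: {EFFECTIVE_BATCH}, optimizer steps: {TOTAL_STEPS}")
    for step in (0, TOTAL_STEPS // 2, TOTAL_STEPS):
        print(step, cosine_lr(step, TOTAL_STEPS))
```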
| ### Stage 2: VBVR Extended (240K) | |
| | Config | Value | | |
| |--------|-------| | |
| | Dataset | 240K VBVR general videos | | |
| | Learning Rate | 1e-4 | | |
| | Scheduler | Cosine | | |
| | Batch Size | 1 × 16 (gradient accumulation) | | |
| | Optimizer | AdamW | | |
| | Max Grad Norm | 1.0 | | |
| | Target Modules | `to_q`, `to_k`, `to_v`, `to_out.0`, `ff.net.0.proj`, `ff.net.2` | | |
| ### Stage 3: General + Hard Reasoning (390K) | |
| | Config | Value | | |
| |--------|-------| | |
| | Dataset | 240K general videos + 150K high-difficulty reasoning videos | | |
| | Learning Rate | 5e-5 | | |
| | Scheduler | Cosine | | |
| | Batch Size | 1 × 16 (gradient accumulation) | | |
| | Optimizer | AdamW | | |
| | Max Grad Norm | 1.0 | | |
| | Target Modules | `to_q`, `to_k`, `to_v`, `to_out.0` (FFN frozen) | | |
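The three stage tables above differ mainly in dataset size, learning rate, and which modules receive LoRA updates. The sketch below collects those values in one place and shows how the Stage 3 target list leaves the feed-forward layers untouched. It is not the author's training configuration: the `Stage` class, the `is_adapted` helper, and the example module paths are hypothetical, and the suffix-style name matching is an assumption modeled on how LoRA libraries such as PEFT commonly resolve a list of target modules.

```python
from dataclasses import dataclass

ATTENTION_MODULES = ("to_q", "to_k", "to_v", "to_out.0")
FFN_MODULES = ("ff.net.0.proj", "ff.net.2")


@dataclass(frozen=True)
class Stage:
    name: str
    num_videos: int
    learning_rate: float
    target_modules: tuple


# Values copied from the three stage tables.
STAGES = (
    Stage("Stage 1", 96_000, 1e-4, ATTENTION_MODULES + FFN_MODULES),
    Stage("Stage 2", 240_000, 1e-4, ATTENTION_MODULES + FFN_MODULES),
    Stage("Stage 3", 240_000 + 150_000, 5e-5, ATTENTION_MODULES),
)


def is_adapted(module_name: str, targets: tuple) -> bool:
    """Return True if module_name equals a target or ends with '.<target>'."""
    return any(module_name == t or module_name.endswith("." + t) for t in targets)


# Hypothetical module paths, for illustration only.
EXAMPLE_MODULES = (
    "transformer_blocks.0.attn1.to_q",
    "transformer_blocks.0.attn1.to_out.0",
    "transformer_blocks.0.ff.net.0.proj",
    "transformer_blocks.0.ff.net.2",
)

if __name__ == "__main__":
    for stage in STAGES:
        adapted = [m for m in EXAMPLE_MODULES if is_adapted(m, stage.target_modules)]
        print(stage.name, stage.num_videos, stage.learning_rate, adapted)
```

Restricting Stage 3 to the attention projections and halving the learning rate narrows what can change in the final stage, which fits the card's note that the FFN is frozen there.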
| ## Video Demo | |
| ### Training Progress Comparison | |
| <div style="display: flex; gap: 10px; flex-wrap: wrap;"> | |
| <div style="flex: 1; min-width: 300px;"> | |
| <video src="https://huggingface.co/LiconStudio/Ltx2.3-VBVR-lora-I2V/resolve/main/original01.mp4" controls style="width: 100%;"></video> | |
| <p style="text-align: center; margin: 8px 0 0 0;"><strong>Original Model</strong></p> | |
| </div> | |
| <div style="flex: 1; min-width: 300px;"> | |
| <video src="https://huggingface.co/LiconStudio/Ltx2.3-VBVR-lora-I2V/resolve/main/240K.mp4" controls style="width: 100%;"></video> | |
| <p style="text-align: center; margin: 8px 0 0 0;"><strong>240K</strong></p> | |
| </div> | |
| <div style="flex: 1; min-width: 300px;"> | |
| <video src="https://huggingface.co/LiconStudio/Ltx2.3-VBVR-lora-I2V/resolve/main/S3.mp4" controls style="width: 100%;"></video> | |
| <p style="text-align: center; margin: 8px 0 0 0;"><strong>390K</strong></p> | |
| </div> | |
| </div> | |
| ## Dataset | |
| This model was trained on the VBVR (A Very Big Video Reasoning Suite) dataset from [video-reason.com](https://video-reason.com/). | |
| ## Contact | |
| For questions or suggestions, please open an issue on Hugging Face or contact the author directly. |