Duplicate from LiconStudio/Ltx2.3-VBVR-lora-I2V

Co-authored-by: LiconStudio <LiconStudio@users.noreply.huggingface.co>

Files changed (9)

.gitattributes +43 -0
240K.mp4 +3 -0
Ltx2.3-Licon-VBVR-I2V-240K-R32.safetensors +3 -0
Ltx2.3-Licon-VBVR-I2V-390K-R32.safetensors +3 -0
Ltx2.3-Licon-VBVR-I2V-96000-R32.safetensors +3 -0
README.md +136 -0
S3.mp4 +3 -0
VBVR-official-comfyui.safetensors +3 -0
original01.mp4 +3 -0

.gitattributes ADDED

	@@ -0,0 +1,43 @@

+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+step_000000_1.mp4 filter=lfs diff=lfs merge=lfs -text
+step_006000_1.mp4 filter=lfs diff=lfs merge=lfs -text
+15000_01.mp4 filter=lfs diff=lfs merge=lfs -text
+official02.mp4 filter=lfs diff=lfs merge=lfs -text
+original.mp4 filter=lfs diff=lfs merge=lfs -text
+240K.mp4 filter=lfs diff=lfs merge=lfs -text
+original01.mp4 filter=lfs diff=lfs merge=lfs -text
+S3.mp4 filter=lfs diff=lfs merge=lfs -text
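
The attribute rules above route matching files through Git LFS. Note that there is no `*.mp4` rule, so each video is listed by name. A minimal sketch of how one might check which files a rule set covers follows; `lfs_patterns` and `is_lfs_tracked` are hypothetical helpers, and `fnmatch` only approximates Git's pattern matching (it is adequate for the simple basename globs here, not for every `.gitattributes` pattern).

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def lfs_patterns(gitattributes_text: str) -> list[str]:
    """Collect the patterns whose attributes include filter=lfs."""
    patterns = []
    for line in gitattributes_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and "filter=lfs" in parts[1:]:
            patterns.append(parts[0])
    return patterns


def is_lfs_tracked(path: str, patterns: list[str]) -> bool:
    """Approximate check: slash-free patterns are matched against the basename."""
    name = PurePosixPath(path).name
    for pattern in patterns:
        target = path if "/" in pattern else name
        if fnmatchcase(target, pattern):
            return True
    return False


# A subset of the rules from the diff above.
SAMPLE = """\
*.safetensors filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
240K.mp4 filter=lfs diff=lfs merge=lfs -text
original01.mp4 filter=lfs diff=lfs merge=lfs -text
S3.mp4 filter=lfs diff=lfs merge=lfs -text
"""

patterns = lfs_patterns(SAMPLE)
for candidate in ("Ltx2.3-Licon-VBVR-I2V-240K-R32.safetensors", "240K.mp4", "README.md"):
    print(candidate, is_lfs_tracked(candidate, patterns))
```

Under these rules a new video with a different name would be committed as a regular Git blob unless its own line (or a `*.mp4` rule) is added.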

240K.mp4 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1e058a1890a47b23ada2264201ec8f33b12f070ecadeb55152c9283ea589a409
+size 1048149
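
Each binary in this commit is stored as a three-line Git LFS pointer like the one above (spec version, `oid` with a SHA-256 digest, and size in bytes). As a rough sketch, a downloaded file could be checked against its pointer as follows; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, not part of any Git LFS tooling.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict) -> bool:
    """Return True if the file's size and hash agree with the pointer."""
    file_path = Path(path)
    if file_path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with file_path.open("rb") as handle:
        # Read in 1 MiB chunks so large checkpoints do not need to fit in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer content copied from the 240K.mp4 entry in this diff.
POINTER_240K = """\
version https://git-lfs.github.com/spec/v1
oid sha256:1e058a1890a47b23ada2264201ec8f33b12f070ecadeb55152c9283ea589a409
size 1048149
"""

pointer = parse_lfs_pointer(POINTER_240K)
print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer("240K.mp4", pointer)` on a local copy would then confirm that the download is complete and uncorrupted.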

Ltx2.3-Licon-VBVR-I2V-240K-R32.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9b242af2c92a077a48138c607fffe865defed75f5e1eb23cb77135323513ce21
+size 554006432

Ltx2.3-Licon-VBVR-I2V-390K-R32.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4cb77bc088b04fd69fe5a711356aec05e1a9d503b47e6e252a7f2577c7716a18
+size 554006432

Ltx2.3-Licon-VBVR-I2V-96000-R32.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ec5ff61d3e2959babf01112a8d7f00776273415573aa29a82f2af00070fd1408
+size 554006432

README.md ADDED

	@@ -0,0 +1,136 @@

+---
+license: other
+license_name: ltx-2-community-license-agreement
+license_link: https://github.com/Lightricks/LTX-2/blob/main/LICENSE
+language:
+- en
+- zh
+library_name: diffusers
+tags:
+- video-generation
+- video-reasoning
+- logical-reasoning
+- lora
+- ltx-2.3
+base_model:
+- Lightricks/LTX-2.3
+---
+# LTX-2.3 VBVR LoRA - Video Reasoning
+LoRA fine-tuned weights for LTX-2.3 22B on the VBVR (A Very Big Video Reasoning Suite) dataset.
+## Training Data
+**To ensure training quality, we preprocessed all 1,000,000 videos from the official dataset and sample from them at random during training to maintain data diversity. We use the official parameters, batch_size=16 and rank=32; the moderate rank helps avoid the catastrophic forgetting that an excessively large rank can cause.**
+The VBVR dataset contains 100 reasoning task categories, with ~10,000 variants per task, totaling ~1M videos. Main task types include:
+- **Object Trajectory**: Objects moving to target positions
+- **Physical Reasoning**: Rolling balls, collisions, gravity
+- **Causal Relationships**: Conditional triggers, chain reactions
+- **Spatial Relationships**: Relative positions, path planning
+## Model Details
+| Item | Details |
+|------|---------|
+| Base Model | ltx-2.3-22b-dev |
+| Training Method | LoRA Fine-tuning |
+| LoRA Rank | 32 |
+| Effective Batch Size | 16 |
+| Mixed Precision | BF16 |
+## TODO List
+### Dataset Release Plan
+| Dataset | Videos | Status |
+|---------|--------|--------|
+| VBVR-96K | 96,000 | ✅ Released |
+| VBVR-240K | 240,000 | ✅ Released |
+| VBVR-reinforce | 240K+150K | ✅ Released |
+## LoRA Capabilities
+This LoRA adapter enhances the base LTX-2.3 model for production video generation workflows:
+- **Enhanced Complex Prompt Understanding**: Accurately interprets multi-object, multi-condition prompts with detailed spatial descriptions and temporal sequences, reducing prompt misinterpretation in production scenarios.
+- **Improved Motion Dynamics**: Generates smooth, physically plausible object movements with natural acceleration, deceleration, and trajectory curves, avoiding robotic or unnatural motion patterns.
+- **Temporal Consistency**: Maintains object appearance, lighting, and scene coherence throughout the video sequence, reducing flickering and frame-to-frame artifacts common in generated videos.
+- **Precise Timing Control**: Enables accurate control over action duration, pacing, and synchronization between multiple moving elements based on prompt semantics.
+- **Multi-Object Interaction**: Handles complex scenes with multiple objects interacting simultaneously, including collisions, following, avoiding, and coordinated movements.
+- **Camera and Framing Stability**: Maintains consistent camera perspective and framing throughout the sequence, avoiding unwanted camera shake or unexpected viewpoint changes.
+## Training Configuration
+### Stage 1: VBVR Foundation (96K)
+| Config | Value |
+|--------|-------|
+| Dataset | 96K VBVR general videos |
+| Learning Rate | 1e-4 |
+| Scheduler | Cosine |
+| Batch Size | 1 × 16 (gradient accumulation) |
+| Optimizer | AdamW |
+| Max Grad Norm | 1.0 |
+| Target Modules | `to_q`, `to_k`, `to_v`, `to_out.0`, `ff.net.0.proj`, `ff.net.2` |
+### Stage 2: VBVR Extended (240K)
+| Config | Value |
+|--------|-------|
+| Dataset | 240K general videos |
+| Learning Rate | 1e-4 |
+| Scheduler | Cosine |
+| Batch Size | 1 × 16 (gradient accumulation) |
+| Optimizer | AdamW |
+| Max Grad Norm | 1.0 |
+| Target Modules | `to_q`, `to_k`, `to_v`, `to_out.0`, `ff.net.0.proj`, `ff.net.2` |
+### Stage 3: General + Hard Reasoning (390K)
+| Config | Value |
+|--------|-------|
+| Dataset | 240K general videos + 150K high-difficulty reasoning videos |
+| Learning Rate | 5e-5 |
+| Scheduler | Cosine |
+| Batch Size | 1 × 16 (gradient accumulation) |
+| Optimizer | AdamW |
+| Max Grad Norm | 1.0 |
+| Target Modules | `to_q`, `to_k`, `to_v`, `to_out.0` (FFN frozen) |
+## Video Demo
+### Training Progress Comparison
+<div style="display: flex; gap: 10px; flex-wrap: wrap;">
+<div style="flex: 1; min-width: 300px;">
+<video src="https://huggingface.co/LiconStudio/Ltx2.3-VBVR-lora-I2V/resolve/main/original01.mp4" controls style="width: 100%;"></video>
+<p style="text-align: center; margin: 8px 0 0 0;"><strong>Original Model</strong></p>
+</div>
+<div style="flex: 1; min-width: 300px;">
+<video src="https://huggingface.co/LiconStudio/Ltx2.3-VBVR-lora-I2V/resolve/main/240K.mp4" controls style="width: 100%;"></video>
+<p style="text-align: center; margin: 8px 0 0 0;"><strong>240K</strong></p>
+</div>
+<div style="flex: 1; min-width: 300px;">
+<video src="https://huggingface.co/LiconStudio/Ltx2.3-VBVR-lora-I2V/resolve/main/S3.mp4" controls style="width: 100%;"></video>
+<p style="text-align: center; margin: 8px 0 0 0;"><strong>390K</strong></p>
+</div>
+</div>
+## Dataset
+This model is trained on the VBVR (A Very Big Video Reasoning Suite) dataset from [video-reason.com](https://video-reason.com/).
+## Contact
+For questions or suggestions, please open an issue on Hugging Face or contact the author directly.
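
The three training stages described in the README can be summarized in a few lines of arithmetic. The sketch below restates the stage tables as data. It assumes that "1 × 16 (gradient accumulation)" means a micro-batch of 1 accumulated over 16 steps, and that each dataset size is consumed exactly once per stage; the README confirms neither, so the step counts are illustrative only. The `Stage` class is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    general_videos: int
    hard_videos: int
    learning_rate: float
    micro_batch: int = 1   # assumed reading of "1 x 16"
    grad_accum: int = 16   # assumed gradient accumulation steps

    @property
    def total_videos(self) -> int:
        return self.general_videos + self.hard_videos

    @property
    def effective_batch(self) -> int:
        return self.micro_batch * self.grad_accum

    def steps_for_one_pass(self) -> int:
        """Optimizer steps needed to see every video once (assumption: one pass)."""
        return self.total_videos // self.effective_batch


STAGES = [
    Stage("Stage 1: VBVR Foundation", 96_000, 0, 1e-4),
    Stage("Stage 2: VBVR Extended", 240_000, 0, 1e-4),
    Stage("Stage 3: General + Hard Reasoning", 240_000, 150_000, 5e-5),
]

for stage in STAGES:
    print(
        f"{stage.name}: {stage.total_videos} videos, "
        f"effective batch {stage.effective_batch}, "
        f"{stage.steps_for_one_pass()} steps per pass"
    )
```

The Stage 3 total of 240K + 150K is 390K videos, which matches the `Ltx2.3-Licon-VBVR-I2V-390K-R32.safetensors` checkpoint name and the "390K" demo label.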

S3.mp4 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:795a0b2a0e26ceef3cb458c18cb9747f71907f98208843588efe6e7607e5a73f
+size 1086541

VBVR-official-comfyui.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:09583c64089a23f6998d2619c55aa332387859962df6eeef48a73c518e73213c
+size 428150664

original01.mp4 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d3ab3b8a278805971462b9ac7bf6f98eb65d5da9e72a17eb3ed60fcf7a6a3be0
+size 1321129