Leon1000 and LiconStudio committed
Commit 8f3afe1 · 0 parent(s)

Duplicate from LiconStudio/Ltx2.3-VBVR-lora-I2V

Co-authored-by: LiconStudio <LiconStudio@users.noreply.huggingface.co>

.gitattributes ADDED
@@ -0,0 +1,43 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ step_000000_1.mp4 filter=lfs diff=lfs merge=lfs -text
+ step_006000_1.mp4 filter=lfs diff=lfs merge=lfs -text
+ 15000_01.mp4 filter=lfs diff=lfs merge=lfs -text
+ official02.mp4 filter=lfs diff=lfs merge=lfs -text
+ original.mp4 filter=lfs diff=lfs merge=lfs -text
+ 240K.mp4 filter=lfs diff=lfs merge=lfs -text
+ original01.mp4 filter=lfs diff=lfs merge=lfs -text
+ S3.mp4 filter=lfs diff=lfs merge=lfs -text
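The `.gitattributes` above routes files to Git LFS by pattern: every `*.safetensors` checkpoint matches a wildcard rule, while the demo videos are listed by exact name because there is no `*.mp4` rule. The Python sketch below approximates that routing with `fnmatch`: patterns without a slash are compared against the basename, and patterns with a slash against the full repo-relative path. Git's own matcher treats `**` and anchoring differently, so this is an illustration under that simplifying assumption, not a reimplementation; `lfs_patterns` and `is_lfs_tracked` are hypothetical helper names.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def lfs_patterns(gitattributes_text: str) -> list[str]:
    """Collect the patterns whose attributes include filter=lfs."""
    patterns = []
    for raw in gitattributes_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, *attrs = line.split()
        if "filter=lfs" in attrs:
            patterns.append(pattern)
    return patterns


def is_lfs_tracked(path: str, patterns: list[str]) -> bool:
    """Approximate check: would `path` be stored as an LFS pointer?

    Simplification (assumption): patterns without "/" match the basename at
    any depth; patterns containing "/" match the whole repo-relative path.
    """
    p = PurePosixPath(path)
    for pattern in patterns:
        target = str(p) if "/" in pattern else p.name
        if fnmatchcase(target, pattern):
            return True
    return False


# A few rules copied from the .gitattributes in this commit.
GITATTRIBUTES = """\
*.safetensors filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
S3.mp4 filter=lfs diff=lfs merge=lfs -text
"""

if __name__ == "__main__":
    pats = lfs_patterns(GITATTRIBUTES)
    for name in ["Ltx2.3-Licon-VBVR-I2V-390K-R32.safetensors", "S3.mp4", "README.md", "other.mp4"]:
        print(name, is_lfs_tracked(name, pats))
```

This also shows why a new demo clip would be committed as a regular Git blob unless its name (or an `*.mp4` rule) is added to `.gitattributes`.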
240K.mp4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1e058a1890a47b23ada2264201ec8f33b12f070ecadeb55152c9283ea589a409
+ size 1048149
Ltx2.3-Licon-VBVR-I2V-240K-R32.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9b242af2c92a077a48138c607fffe865defed75f5e1eb23cb77135323513ce21
+ size 554006432
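Each `.safetensors` and `.mp4` file in this commit is stored as a Git LFS pointer rather than as the binary itself: a `version` line, a SHA-256 `oid`, and the byte `size`. One way to confirm that a downloaded checkpoint matches its pointer is to stream-hash the file and compare both values, as in the Python sketch below. The helper names are hypothetical and not part of any Hugging Face or git-lfs API; `POINTER_240K` is copied from the `Ltx2.3-Licon-VBVR-I2V-240K-R32.safetensors` pointer above, and the local file path is an assumption.

```python
import hashlib
import io
from pathlib import Path
from typing import BinaryIO

LFS_SPEC_V1 = "https://git-lfs.github.com/spec/v1"


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS v1 pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    if fields.get("version") != LFS_SPEC_V1:
        raise ValueError("not a Git LFS v1 pointer")
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"sha256": digest, "size": int(fields["size"])}


def verify_stream(stream: BinaryIO, pointer_text: str, chunk_size: int = 1 << 20) -> bool:
    """Hash a binary stream in chunks; compare digest and byte count with the pointer."""
    expected = parse_lfs_pointer(pointer_text)
    h = hashlib.sha256()
    size = 0
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
        size += len(chunk)
    return size == expected["size"] and h.hexdigest() == expected["sha256"]


def verify_file(path, pointer_text: str) -> bool:
    """Open a local file and verify it against an LFS pointer."""
    with Path(path).open("rb") as f:
        return verify_stream(f, pointer_text)


# Pointer for Ltx2.3-Licon-VBVR-I2V-240K-R32.safetensors, copied from this commit.
POINTER_240K = """\
version https://git-lfs.github.com/spec/v1
oid sha256:9b242af2c92a077a48138c607fffe865defed75f5e1eb23cb77135323513ce21
size 554006432
"""

if __name__ == "__main__":
    print(parse_lfs_pointer(POINTER_240K))
    # Assumption: the checkpoint was downloaded into the working directory.
    local = Path("Ltx2.3-Licon-VBVR-I2V-240K-R32.safetensors")
    if local.exists():
        print("matches pointer:", verify_file(local, POINTER_240K))
    # Self-check on in-memory bytes.
    data = b"example payload"
    ptr = f"version {LFS_SPEC_V1}\noid sha256:{hashlib.sha256(data).hexdigest()}\nsize {len(data)}\n"
    print(verify_stream(io.BytesIO(data), ptr))
```

Hashing in 1 MiB chunks keeps memory use flat even for the roughly 554 MB LoRA files.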
Ltx2.3-Licon-VBVR-I2V-390K-R32.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4cb77bc088b04fd69fe5a711356aec05e1a9d503b47e6e252a7f2577c7716a18
+ size 554006432
Ltx2.3-Licon-VBVR-I2V-96000-R32.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ec5ff61d3e2959babf01112a8d7f00776273415573aa29a82f2af00070fd1408
+ size 554006432
README.md ADDED
@@ -0,0 +1,136 @@
+ ---
+ license: other
+ license_name: ltx-2-community-license-agreement
+ license_link: https://github.com/Lightricks/LTX-2/blob/main/LICENSE
+ language:
+ - en
+ - zh
+ library_name: diffusers
+ tags:
+ - video-generation
+ - video-reasoning
+ - logical-reasoning
+ - lora
+ - ltx-2.3
+ base_model:
+ - Lightricks/LTX-2.3
+ ---
+
+ # LTX-2 VBVR LoRA - Video Reasoning
+
+ LoRA fine-tuned weights for LTX-2.3 22B on the VBVR (A Very Big Video Reasoning Suite) dataset.
+
+ ## Training Data
+
+ **To ensure training quality, we preprocessed all 1,000,000 videos in the official dataset and sampled from them randomly during training to maintain data diversity. We adopted the official parameters, batch_size=16 and rank=32; the rank is kept at 32 because an excessively large rank risks catastrophic forgetting.**
+
+ The VBVR dataset contains 100 reasoning task categories, with ~10,000 variants per task, totaling ~1M videos. The main task types include:
+
+ - **Object Trajectory**: Objects moving to target positions
+ - **Physical Reasoning**: Rolling balls, collisions, gravity
+ - **Causal Relationships**: Conditional triggers, chain reactions
+ - **Spatial Relationships**: Relative positions, path planning
+
+ ## Model Details
+
+ | Item | Details |
+ |------|---------|
+ | Base Model | ltx-2.3-22b-dev |
+ | Training Method | LoRA Fine-tuning |
+ | LoRA Rank | 32 |
+ | Effective Batch Size | 16 |
+ | Mixed Precision | BF16 |
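The table fixes the LoRA rank at 32. In LoRA, each adapted linear layer of shape `d_in → d_out` gains two trainable factors, `A` (`rank × d_in`) and `B` (`d_out × rank`), so the number of added parameters grows linearly with rank. The Python sketch below does that arithmetic. The layer widths, the 4× FFN expansion and the block count are hypothetical placeholders, not the actual LTX-2.3 22B dimensions, and the function names are illustrative.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one linear layer: A (rank x d_in) + B (d_out x rank)."""
    return rank * (d_in + d_out)


def dense_params(d_in: int, d_out: int) -> int:
    """Weights in the frozen dense matrix being adapted (bias ignored)."""
    return d_in * d_out


def adapter_params(layer_shapes: dict, rank: int, num_blocks: int) -> int:
    """Total LoRA parameters for `num_blocks` identical transformer blocks."""
    per_block = sum(lora_params(d_in, d_out, rank) for d_in, d_out in layer_shapes.values())
    return per_block * num_blocks


# HYPOTHETICAL shapes for one block; substitute the real model's dimensions.
HIDDEN = 4096
BLOCK_SHAPES = {
    "to_q": (HIDDEN, HIDDEN),
    "to_k": (HIDDEN, HIDDEN),
    "to_v": (HIDDEN, HIDDEN),
    "to_out.0": (HIDDEN, HIDDEN),
    "ff.net.0.proj": (HIDDEN, 4 * HIDDEN),
    "ff.net.2": (4 * HIDDEN, HIDDEN),
}

if __name__ == "__main__":
    one = lora_params(HIDDEN, HIDDEN, rank=32)
    print("one projection:", one, "vs dense:", dense_params(HIDDEN, HIDDEN))
    for r in (16, 32, 64):
        # num_blocks=48 is also a HYPOTHETICAL value.
        print("rank", r, "->", adapter_params(BLOCK_SHAPES, rank=r, num_blocks=48))
```

Doubling the rank doubles the adapter size, which is one concrete reading of why a moderate rank was chosen.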
+
+ ## TODO List
+
+ ### Dataset Release Plan
+
+ | Dataset | Videos | Status |
+ |---------|--------|--------|
+ | VBVR-96K | 96,000 | ✅ Released |
+ | VBVR-240K | 240,000 | ✅ Released |
+ | VBVR-reinforce | 240K+150K | ✅ Released |
+
+ ## LoRA Capabilities
+
+ This LoRA adapter enhances the base LTX-2.3 model for production video-generation workflows:
+
+ - **Enhanced Complex Prompt Understanding**: Accurately interprets multi-object, multi-condition prompts with detailed spatial descriptions and temporal sequences, reducing prompt misinterpretation in production scenarios.
+
+ - **Improved Motion Dynamics**: Generates smooth, physically plausible object movements with natural acceleration, deceleration, and trajectory curves, avoiding robotic or unnatural motion patterns.
+
+ - **Temporal Consistency**: Maintains object appearance, lighting, and scene coherence throughout the video sequence, reducing flickering and frame-to-frame artifacts common in generated videos.
+
+ - **Precise Timing Control**: Enables accurate control over action duration, pacing, and synchronization between multiple moving elements based on prompt semantics.
+
+ - **Multi-Object Interaction**: Handles complex scenes with multiple objects interacting simultaneously, including collisions, following, avoiding, and coordinated movements.
+
+ - **Camera and Framing Stability**: Maintains consistent camera perspective and framing throughout the sequence, avoiding unwanted camera shake or unexpected viewpoint changes.
+
+
+ ## Training Configuration
+
+ ### Stage 1: VBVR Foundation (96K)
+ | Config | Value |
+ |--------|-------|
+ | Dataset | 96K VBVR general videos |
+ | Learning Rate | 1e-4 |
+ | Scheduler | Cosine |
+ | Batch Size | 1 × 16 (gradient accumulation) |
+ | Optimizer | AdamW |
+ | Max Grad Norm | 1.0 |
+ | Target Modules | `to_q`, `to_k`, `to_v`, `to_out.0`, `ff.net.0.proj`, `ff.net.2` |
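In the usual reading of gradient accumulation, "Batch Size 1 × 16" means one sample per forward/backward pass, with gradients averaged over 16 passes before each optimizer step; that yields the effective batch size of 16 listed in Model Details. "Max Grad Norm 1.0" then rescales the accumulated gradient whenever its global L2 norm exceeds 1.0. The dependency-free Python sketch below shows that order of operations on a toy two-parameter regression. It is illustrative only: the toy model and data are invented, and a plain SGD update stands in for AdamW to keep it short.

```python
import math


def sample_grad(params: list, x: float, y: float) -> list:
    """Gradient of 0.5 * (w*x + b - y)**2 with respect to (w, b)."""
    w, b = params
    err = w * x + b - y
    return [err * x, err]


def accumulate_gradients(params: list, micro_batches: list) -> list:
    """Average per-micro-batch gradients (micro-batch size 1) over all accumulation steps."""
    steps = len(micro_batches)
    total = [0.0] * len(params)
    for x, y in micro_batches:
        g = sample_grad(params, x, y)
        # Dividing by `steps` makes the running sum equal the mean over the effective batch.
        total = [t + gi / steps for t, gi in zip(total, g)]
    return total


def clip_by_global_norm(grads: list, max_norm: float) -> list:
    """Scale all gradients down together if their global L2 norm exceeds max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    return [g * (max_norm / norm) for g in grads]


def train_step(params: list, micro_batches: list, lr: float, max_grad_norm: float = 1.0) -> list:
    """One optimizer step: accumulate, clip, then update (SGD stands in for AdamW)."""
    grads = accumulate_gradients(params, micro_batches)
    grads = clip_by_global_norm(grads, max_grad_norm)
    return [p - lr * g for p, g in zip(params, grads)]


if __name__ == "__main__":
    batch = [(i / 16, 2.0 * (i / 16) + 1.0) for i in range(16)]  # 16 micro-batches of size 1
    params = [0.0, 0.0]
    for _ in range(200):
        params = train_step(params, batch, lr=0.5)
    print(params)
```

Clipping is applied once to the averaged gradient, not to each micro-batch, so it bounds the size of each optimizer step.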
+
+ ### Stage 2: VBVR Extended (240K)
+ | Config | Value |
+ |--------|-------|
+ | Dataset | 240K general videos |
+ | Learning Rate | 1e-4 |
+ | Scheduler | Cosine |
+ | Batch Size | 1 × 16 (gradient accumulation) |
+ | Optimizer | AdamW |
+ | Max Grad Norm | 1.0 |
+ | Target Modules | `to_q`, `to_k`, `to_v`, `to_out.0`, `ff.net.0.proj`, `ff.net.2` |
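Every stage uses a cosine scheduler, with a peak learning rate of 1e-4 in Stages 1 and 2 and 5e-5 in Stage 3. The README does not state a warmup length, total step count, or floor value, so the Python sketch below treats all three as assumptions (defaulting to no warmup and a floor of 0). It implements the common half-cosine decay `lr = min_lr + 0.5 * (base_lr - min_lr) * (1 + cos(pi * progress))`; `cosine_lr` is a hypothetical helper, not the trainer's actual scheduler.

```python
import math


def cosine_lr(step: int, total_steps: int, base_lr: float,
              min_lr: float = 0.0, warmup_steps: int = 0) -> float:
    """Learning rate at `step` under optional linear warmup followed by half-cosine decay."""
    if step < warmup_steps:
        # Linear warmup from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(max(progress, 0.0), 1.0)  # hold at min_lr after total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    TOTAL = 10_000  # HYPOTHETICAL step count; not stated in the README
    for step in (0, 2_500, 5_000, 7_500, 10_000):
        # Peak rates from the stage tables: 1e-4 (Stages 1-2) and 5e-5 (Stage 3).
        print(step, cosine_lr(step, TOTAL, 1e-4), cosine_lr(step, TOTAL, 5e-5))
```

Halving the peak rate in Stage 3 halves the whole curve, so the later fine-tuning on harder videos takes proportionally smaller steps throughout.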
+
+ ### Stage 3: General + Hard Reasoning (390K)
+ | Config | Value |
+ |--------|-------|
+ | Dataset | 240K general videos + 150K high-difficulty reasoning videos |
+ | Learning Rate | 5e-5 |
+ | Scheduler | Cosine |
+ | Batch Size | 1 × 16 (gradient accumulation) |
+ | Optimizer | AdamW |
+ | Max Grad Norm | 1.0 |
+ | Target Modules | `to_q`, `to_k`, `to_v`, `to_out.0` (FFN frozen) |
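Stage 3 keeps LoRA only on the attention projections and leaves the feed-forward (FFN) layers frozen. The Python sketch below shows how the two target lists from these tables would partition module names, assuming the common PEFT-style rule that a target matches a module whose full name equals it or ends with `.` plus the target. The module names are hypothetical diffusers-style examples, not names read from the actual LTX-2.3 checkpoint.

```python
STAGES_1_2_TARGETS = ("to_q", "to_k", "to_v", "to_out.0", "ff.net.0.proj", "ff.net.2")
STAGE_3_TARGETS = ("to_q", "to_k", "to_v", "to_out.0")  # FFN frozen


def matches_target(module_name: str, targets) -> bool:
    """True if the module name equals a target or ends with '.' + target (assumed rule)."""
    return any(module_name == t or module_name.endswith("." + t) for t in targets)


def split_modules(module_names, targets):
    """Partition module names into (LoRA-adapted, left frozen)."""
    adapted = [n for n in module_names if matches_target(n, targets)]
    frozen = [n for n in module_names if not matches_target(n, targets)]
    return adapted, frozen


# HYPOTHETICAL module names for a single transformer block.
MODULES = [
    "transformer_blocks.0.attn1.to_q",
    "transformer_blocks.0.attn1.to_k",
    "transformer_blocks.0.attn1.to_v",
    "transformer_blocks.0.attn1.to_out.0",
    "transformer_blocks.0.ff.net.0.proj",
    "transformer_blocks.0.ff.net.2",
    "transformer_blocks.0.norm1",
]

if __name__ == "__main__":
    for label, targets in (("stages 1-2", STAGES_1_2_TARGETS), ("stage 3", STAGE_3_TARGETS)):
        adapted, frozen = split_modules(MODULES, targets)
        print(label, "adapted:", adapted)
        print(label, "frozen:", frozen)
```

Requiring a `.` before the target keeps a name such as `audio_to_q` from matching `to_q` by accident.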
+
+ ## Video Demo
+
+ ### Training Progress Comparison
+
+ <div style="display: flex; gap: 10px; flex-wrap: wrap;">
+
+ <div style="flex: 1; min-width: 300px;">
+ <video src="https://huggingface.co/LiconStudio/Ltx2.3-VBVR-lora-I2V/resolve/main/original01.mp4" controls style="width: 100%;"></video>
+ <p style="text-align: center; margin: 8px 0 0 0;"><strong>Original Model</strong></p>
+ </div>
+
+ <div style="flex: 1; min-width: 300px;">
+ <video src="https://huggingface.co/LiconStudio/Ltx2.3-VBVR-lora-I2V/resolve/main/240K.mp4" controls style="width: 100%;"></video>
+ <p style="text-align: center; margin: 8px 0 0 0;"><strong>240K</strong></p>
+ </div>
+
+ <div style="flex: 1; min-width: 300px;">
+ <video src="https://huggingface.co/LiconStudio/Ltx2.3-VBVR-lora-I2V/resolve/main/S3.mp4" controls style="width: 100%;"></video>
+ <p style="text-align: center; margin: 8px 0 0 0;"><strong>390K</strong></p>
+ </div>
+
+ </div>
+
+ ## Dataset
+
+ This model is trained on the VBVR (Video Benchmark for Video Reasoning) dataset from [video-reason.com](https://video-reason.com/).
+
+
+ ## Contact
+
+ For questions or suggestions, please open a discussion in this repository's Community tab on Hugging Face or contact the author directly.
S3.mp4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:795a0b2a0e26ceef3cb458c18cb9747f71907f98208843588efe6e7607e5a73f
+ size 1086541
VBVR-official-comfyui.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:09583c64089a23f6998d2619c55aa332387859962df6eeef48a73c518e73213c
+ size 428150664
original01.mp4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d3ab3b8a278805971462b9ac7bf6f98eb65d5da9e72a17eb3ed60fcf7a6a3be0
+ size 1321129