GuardSkill committed
Commit a9e6298 · verified · Parent: 11206b9

Update README.md

Files changed (1):
  1. README.md +201 -68

README.md CHANGED
@@ -2,130 +2,263 @@
  license: apache-2.0
  language:
  - en
- library_name: other
  tags:
- - video-generation
  - video-editing
  - ltx-video
- - lora
- - safetensors
- - image-to-video
  - watermark-removal
  - super-resolution
- pipeline_tag: image-to-video
  ---

  # LTX2.3-ICEdit-Insight

  <table>
  <tr>
- <td align="center"><img src="./assets/effects/output_004_bigger.webp" alt="Super-resolution preview" width="700"/></td>
  </tr>
  </table>

- Research-oriented model release for task-aware video restoration and editing under the `LTX-2.3` framework.
-
- Project links: [GitHub project](https://github.com/Valiant-Cat/LTX2-ICEdit-Insight) | [Valiant Cat on Hugging Face](https://huggingface.co/valiantcat)
-
- This repository contains:
-
- - `ltx-2.3-edit-insight-dev-fp8.safetensors`: the all-in-one Insight checkpoint (transformer + video VAE + audio VAE + text projection + vocoder)
- - `ltx2.3-video-upscale-v2.safetensors`: IC-LoRA for video super-resolution and detail recovery
- - `ltx2.3-ic-watermarkeRM.safetensors`: IC-LoRA for video watermark removal and occlusion restoration
-
- These weights are intended for the project's `run_pipeline.py` workflow. The recommended default is single-stage inference, in which the IC-LoRA guidance remains active throughout the full-resolution denoising pass.
-
- ## Research Positioning
-
- `ltx-2.3-edit-insight-dev-fp8.safetensors` is not a bare deployment checkpoint. It is the unified base model release for the Insight branch of this project: a task-aware spatiotemporal editing backbone that consolidates the diffusion transformer, video VAE, audio VAE, text projection module, and vocoder into a single reproducible artifact.
-
- From a research perspective, the checkpoint supports controlled video restoration and editing under a shared latent diffusion formulation. The paired IC-LoRA adapters specialize the backbone toward structure-preserving super-resolution and watermark-aware content recovery, while the unified packaging keeps the full generative stack aligned for repeatable experiments and downstream ablations.
-
- ## English Overview

- This package is built for the Insight version of the project's LTX-2.3 editing pipeline. Instead of shipping only the task adapters, it also includes the corresponding Insight base checkpoint so the workflow can be reproduced with the exact model assets used by the project.

- Recommended usage:

- - run the companion `run_pipeline.py`
- - keep single-stage inference enabled by default
- - load one task LoRA at a time, depending on the editing goal

- ## 🧠 Training

- This model was trained and optimized by the AI Laboratory of Chongqing Valiant Cat Technology Co., Ltd.

- Visit [vvicat.com](https://vvicat.com/) for business collaborations or research partnerships.
-
- ## 🧩 Integration with ComfyUI
-
- This model works with the modified ComfyUI [workflows](https://github.com/Valiant-Cat/LTX2-ICEdit-Insight/tree/main/workflows) provided by the project.
-
- For ComfyUI-based editing, load the base model in the UNet-side model path required by the workflow, then attach the task-specific IC-LoRA for the corresponding edit objective.
-
- ## Files
  | File | Purpose |
  | --- | --- |
- | `ltx-2.3-edit-insight-dev-fp8.safetensors` | All-in-one Insight base checkpoint |
- | `ltx2.3-video-upscale-v2.safetensors` | Super-resolution / detail enhancement IC-LoRA |
- | `ltx2.3-ic-watermarkeRM.safetensors` | Watermark removal / occlusion restoration IC-LoRA |

  ## Showcase

  <table>
  <tr>
- <td align="center"><img src="./assets/effects/output_004.webp" alt="Super-resolution preview 1" width="600"/></td>
- <td align="center"><img src="./assets/effects/output_005.webp" alt="Super-resolution preview 2" width="600"/></td>
  </tr>
  </table>

- ## Usage With This Project

- Run all commands from the project root.

- ### Super-resolution

  ```bash
  PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
  python run_pipeline.py \
- --mode upscale \
- --video ./inputs/input.mp4 \
- --prompt "Convert the video to ultra-high definition quality, rebuilding high-frequency details while eliminating artifacts." \
- --output ./outputs/output_upscale.mp4 \
  --height 1184 --width 704 --num-frames 97 \
  --fps 24.0 --seed 42 \
  --sigma-profile workflow \
  --model-checkpoint ./models/checkpoints/ltx-2.3-edit-insight-dev-fp8.safetensors \
- --lora ./models/loras/ltx2.3-train/ltx2.3-video-upscale-v2.safetensors
  ```

- ### Watermark removal

  ```bash
  PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
  python run_pipeline.py \
  --mode watermark_rm \
- --video ./inputs/input.mp4 \
  --prompt "Remove short-video platform watermarks and related occlusions from the video, restoring a clean, clear, and natural original image." \
  --output ./outputs/output_watermark_rm.mp4 \
  --height 1184 --width 704 --num-frames 97 \
  --fps 24.0 --seed 1546 \
  --sigma-profile workflow \
  --model-checkpoint ./models/checkpoints/ltx-2.3-edit-insight-dev-fp8.safetensors \
- --lora ./models/loras/ltx2.3-train/ltx2.3-ic-watermarkeRM.safetensors
  ```

- ## Notes

- - Single-stage inference is the default recommendation.
- - In two-stage mode, the second-stage refinement does not keep the IC-LoRA constraint, which can increase content drift.
- - Frame count must satisfy `8k + 1`.
- - Single-stage output height and width should be multiples of `32`.

  ## License

- This repository is released under the Apache 2.0 license.
  license: apache-2.0
  language:
  - en
+ library_name: diffusers
+ base_model:
+ - Lightricks/LTX-2.3
+ pipeline_tag: image-to-video
  tags:
  - video-editing
+ - video-restoration
  - ltx-video
+ - ltx-2-3
+ - dit
+ - ic-lora
  - watermark-removal
+ - subtitle-removal
  - super-resolution
+ - hd-enhancement
+ - joyfox
  ---

  # LTX2.3-ICEdit-Insight

  <table>
  <tr>
+ <td align="center"><img src="./assets/effects/output_004_bigger.webp" alt="LTX2.3-ICEdit-Insight preview" width="700"/></td>
  </tr>
  </table>

+ **LTX2.3-ICEdit-Insight** is a task-aware video restoration and editing model family developed by **JoyFox Lab**, built on top of the **LTX-2.3 DiT-based audio-video foundation model**.
+
+ This release focuses on four practical video editing directions:
+
+ - **Video Restoration**: degradation recovery, compression cleanup, blur and noise reduction, and damaged detail restoration.
+ - **Video HD Enhancement**: super-resolution, detail reconstruction, texture sharpening, and perceptual quality improvement.
+ - **Watermark Removal**: logo cleanup, semi-transparent overlay removal, and occlusion-aware background reconstruction.
+ - **Subtitle Removal**: hard subtitle removal, caption cleanup, text overlay removal, and temporally stable inpainting.
+
+ Unlike conventional frame-level enhancement pipelines, this model family operates as a **generative video restoration system** in latent video space. It is designed to preserve global structure, camera motion, object identity, and temporal consistency while reconstructing missing or degraded visual content.
+
+ Project links: [GitHub project](https://github.com/Valiant-Cat/LTX2-ICEdit-Insight) | [JoyFox on Hugging Face](https://huggingface.co/joyfox)
+
+ ## Model Files
+
  | File | Purpose |
  | --- | --- |
+ | `ltx-2.3-edit-insight-dev-fp8.safetensors` | Unified Insight base checkpoint for LTX-2.3 editing |
+ | `ltx2.3-video-restoration-general.safetensors` | Video restoration, artifact cleanup, blur and noise recovery |
+ | `ltx2.3-ic-video-upscale-general.safetensors` | Video HD enhancement, super-resolution, and detail recovery |
+ | `ltx2.3-ic-watermark-remove-general.safetensors` | Watermark removal and occlusion-aware reconstruction |
+ | `ltx2.3-ic-subtitles-remove-general.safetensors` | Subtitle removal and text overlay cleanup |
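
Each task uses exactly one adapter, passed to `run_pipeline.py` via `--lora`. The lookup below is a minimal sketch (a hypothetical helper, not part of the project code) that maps each `--mode` value used in the command examples to the corresponding adapter file from the table:

```python
# Hypothetical helper: map each run_pipeline.py --mode value to the
# adapter file listed in the table above. Not part of the project code.
ADAPTER_FILES = {
    "restoration": "ltx2.3-video-restoration-general.safetensors",
    "hd": "ltx2.3-ic-video-upscale-general.safetensors",
    "watermark_rm": "ltx2.3-ic-watermark-remove-general.safetensors",
    "subtitle_rm": "ltx2.3-ic-subtitles-remove-general.safetensors",
}

def lora_path(mode: str, root: str = "./models/loras/ltx2.3-train") -> str:
    """Return the --lora argument for a given editing mode."""
    if mode not in ADAPTER_FILES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(ADAPTER_FILES)}")
    return f"{root}/{ADAPTER_FILES[mode]}"

print(lora_path("watermark_rm"))
# → ./models/loras/ltx2.3-train/ltx2.3-ic-watermark-remove-general.safetensors
```

Loading a single adapter at a time keeps the base checkpoint's behavior predictable across tasks.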

  ## Showcase

  <table>
  <tr>
+ <td align="center"><b>Video Restoration</b></td>
+ <td align="center"><b>Video HD Enhancement</b></td>
+ </tr>
+ <tr>
+ <td align="center"><img src="./assets/effects/output_004.webp" alt="Video restoration preview" width="600"/></td>
+ <td align="center"><img src="./assets/effects/视频高清对比效果.webp" alt="Video HD enhancement preview" width="600"/></td>
+ </tr>
+ <tr>
+ <td align="center"><b>Watermark Removal</b></td>
+ <td align="center"><b>Subtitle Removal</b></td>
+ </tr>
+ <tr>
+ <td align="center"><img src="./assets/effects/去水印对比效果.webp" alt="Watermark removal preview" width="600"/></td>
+ <td align="center"><img src="./assets/effects/去字幕对比效果.webp" alt="Subtitle removal preview" width="600"/></td>
  </tr>
  </table>

+ ## Script Usage
+
+ Run all scripts from the project root.
+
+ ```bash
+ bash run_restoration.sh
+ bash run_hd.sh
+ bash run_hd.sh /path/to/input.mp4
+ bash run_watermark_rm.sh
+ bash run_watermark_rm.sh /path/to/input.mp4
+ bash run_subtitle_rm.sh
+ bash run_subtitle_rm.sh /path/to/input.mp4
+ ```

+ ## Command Examples

+ ### Video Restoration

  ```bash
  PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
  python run_pipeline.py \
+ --mode restoration \
+ --video ./inputs/input_480p.mp4 \
+ --prompt "Convert the video to ultra-high-definition quality while removing artifacts and rebuilding high-frequency details." \
+ --output ./outputs/output_restoration.mp4 \
  --height 1184 --width 704 --num-frames 97 \
  --fps 24.0 --seed 42 \
  --sigma-profile workflow \
+ --streaming-prefetch-count 2 \
  --model-checkpoint ./models/checkpoints/ltx-2.3-edit-insight-dev-fp8.safetensors \
+ --lora ./models/loras/ltx2.3-train/ltx2.3-video-restoration-general.safetensors
  ```

+ ### Video HD Enhancement
+
+ ```bash
+ PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
+ python run_pipeline.py \
+ --mode hd \
+ --video ./inputs/input_480p.mp4 \
+ --prompt "Convert the video to ultra-high-definition quality, significantly improving clarity, fine detail richness, texture fidelity, and overall perceptual sharpness." \
+ --output ./outputs/output_hd.mp4 \
+ --height 1184 --width 704 --num-frames 97 \
+ --fps 24.0 --seed 42 \
+ --sigma-profile workflow \
+ --streaming-prefetch-count 2 \
+ --model-checkpoint ./models/checkpoints/ltx-2.3-edit-insight-dev-fp8.safetensors \
+ --lora ./models/loras/ltx2.3-train/ltx2.3-ic-video-upscale-general.safetensors
+ ```
+
+ ### Watermark Removal

  ```bash
  PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
  python run_pipeline.py \
  --mode watermark_rm \
+ --video ./inputs/input_480p.mp4 \
  --prompt "Remove short-video platform watermarks and related occlusions from the video, restoring a clean, clear, and natural original image." \
  --output ./outputs/output_watermark_rm.mp4 \
  --height 1184 --width 704 --num-frames 97 \
  --fps 24.0 --seed 1546 \
  --sigma-profile workflow \
+ --streaming-prefetch-count 2 \
  --model-checkpoint ./models/checkpoints/ltx-2.3-edit-insight-dev-fp8.safetensors \
+ --lora ./models/loras/ltx2.3-train/ltx2.3-ic-watermark-remove-general.safetensors
  ```

+ ### Subtitle Removal

+ ```bash
+ PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
+ python run_pipeline.py \
+ --mode subtitle_rm \
+ --video ./inputs/input_480p.mp4 \
+ --prompt "Remove subtitles, captions, and related text occlusions from the video, restoring a clean and natural underlying image." \
+ --output ./outputs/output_subtitle_rm.mp4 \
+ --height 1184 --width 704 --num-frames 97 \
+ --fps 24.0 --seed 42 \
+ --sigma-profile workflow \
+ --streaming-prefetch-count 2 \
+ --model-checkpoint ./models/checkpoints/ltx-2.3-edit-insight-dev-fp8.safetensors \
+ --lora ./models/loras/ltx2.3-train/ltx2.3-ic-subtitles-remove-general.safetensors
+ ```
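
For batch jobs, the argument list shared by the four commands can be assembled programmatically. The sketch below only mirrors the flags shown in the examples; `run_pipeline.py` itself is project code and is not reproduced here, and `build_cmd` is a hypothetical helper name:

```python
# Sketch: assemble the run_pipeline.py argument list used in the
# examples above, e.g. for batch processing many input clips.
import shlex

CHECKPOINT = "./models/checkpoints/ltx-2.3-edit-insight-dev-fp8.safetensors"

def build_cmd(mode, video, prompt, output, lora, seed=42,
              height=1184, width=704, num_frames=97):
    """Build the argv list matching the documented command template."""
    return [
        "python", "run_pipeline.py",
        "--mode", mode,
        "--video", video,
        "--prompt", prompt,
        "--output", output,
        "--height", str(height), "--width", str(width),
        "--num-frames", str(num_frames),
        "--fps", "24.0", "--seed", str(seed),
        "--sigma-profile", "workflow",
        "--streaming-prefetch-count", "2",
        "--model-checkpoint", CHECKPOINT,
        "--lora", lora,
    ]

cmd = build_cmd(
    "subtitle_rm", "./inputs/input_480p.mp4",
    "Remove subtitles and text occlusions.",
    "./outputs/output_subtitle_rm.mp4",
    "./models/loras/ltx2.3-train/ltx2.3-ic-subtitles-remove-general.safetensors",
)
print(shlex.join(cmd))  # shell-quoted command line, ready for logging
```

Passing the list to `subprocess.run(cmd, check=True)` avoids shell-quoting issues with prompts that contain spaces and punctuation.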
+
+ ## Key Improvements
+
+ ### Task-Aware IC-Edit Framework
+
+ We introduce a task-aware IC-Edit training framework for LTX-2.3, in which each restoration direction is optimized with dedicated instruction conditioning and task-specific IC-LoRA adapters.
+
+ The model is trained not only to improve visual quality but also to understand the editing goal behind each restoration task, including watermark removal, subtitle cleanup, damaged-region recovery, and high-definition enhancement.
+
+ ### LTX-2.3 DiT Backbone Adaptation
+
+ The model family is built on the LTX-2.3 foundation architecture, a diffusion-transformer video model designed for high-fidelity image-to-video and video generation workflows.
+
+ Our adaptation targets video restoration by improving:
+
+ - latent-space editability
+ - instruction-following behavior
+ - frame-to-frame stability
+ - high-frequency detail recovery
+ - local reconstruction around degraded or occluded regions
+
+ ### Spatiotemporal Consistency Optimization
+
+ Video restoration requires more than strong single-frame quality. We optimize temporal consistency so that restored areas remain stable across adjacent frames.
+
+ This reduces common artifacts such as:
+
+ - flickering textures
+ - unstable reconstructed backgrounds
+ - inconsistent watermark removal
+ - subtitle ghosting
+ - frame-wise color shift
+ - detail popping during motion
+
+ ### Degradation-Aware Training Curriculum
+
+ The training curriculum covers realistic video defects, including:
+
+ - compression artifacts
+ - motion blur
+ - sensor noise
+ - low-bitrate video
+ - text overlays
+ - hard subtitles
+ - semi-transparent watermarks
+ - platform logos
+ - local occlusions
+ - low-resolution inputs
+
+ This improves generalization across short videos, social-media clips, mobile footage, downloaded videos, and compressed production material.
+
+ ### Occlusion-Aware Reconstruction
+
+ For watermark and subtitle removal, the model is optimized to reconstruct the hidden visual content behind occluded regions.
+
+ Instead of smearing or blurring the target area, it uses surrounding spatial context and temporal cues to infer plausible background structure, object boundaries, lighting, and texture continuity.
+
+ ### Frequency-Enhanced HD Restoration
+
+ For HD enhancement, the model improves perceptual sharpness and fine visual detail through frequency-aware restoration training.
+
+ This is especially helpful for recovering:
+
+ - hair strands
+ - fabric texture
+ - skin detail
+ - product edges
+ - background patterns
+ - typography-like fine structures
+ - natural image clarity
+
+ ## Inference Notes
+
+ - Single-stage inference is recommended for most editing tasks.
+ - Two-stage refinement can improve visual polish but may weaken task-specific LoRA constraints.
+ - Watermark and subtitle removal perform best when the occluded area is stable and not excessively large.
+ - HD enhancement quality depends on input resolution, motion complexity, and compression level.
+ - Higher output resolution improves detail but requires more VRAM.
+ - For strong-motion videos, conservative denoising settings are recommended to preserve temporal structure.
+ - Frame count must satisfy the `8k + 1` rule.
+ - Output height and width should be multiples of `32` in single-stage inference.
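
The two geometry rules can be checked before launching a run. A minimal sketch, assuming hypothetical helper names (`snap_num_frames`, `snap_dim` are not project functions):

```python
# Hypothetical pre-flight check for the two geometry rules above:
# --num-frames must be of the form 8k + 1, and height/width must be
# multiples of 32 in single-stage inference.

def snap_num_frames(n: int) -> int:
    """Largest valid frame count (8k + 1) not exceeding n."""
    if n < 1:
        raise ValueError("need at least one frame")
    return ((n - 1) // 8) * 8 + 1

def snap_dim(d: int) -> int:
    """Largest multiple of 32 not exceeding d."""
    if d < 32:
        raise ValueError("dimension must be at least 32")
    return (d // 32) * 32

print(snap_num_frames(100), snap_dim(1190), snap_dim(704))
# → 97 1184 704  (97 and 704 match the example commands above)
```

Snapping down rather than up keeps VRAM usage at or below what the requested size would need.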
+
+ ## Training
+
+ This model family was trained and optimized by **JoyFox Lab** (**Chengdu Xuanhu Technology Co., Ltd.**).
+
+ The training pipeline includes:
+
+ - task-aware video restoration data construction
+ - degradation synthesis and curriculum training
+ - IC-LoRA specialization for the four editing directions
+ - temporal consistency regularization
+ - occlusion-aware reconstruction training
+ - high-frequency perceptual enhancement
+ - instruction-guided video editing optimization
+
+ ## Contact
+
+ For research collaboration, commercial licensing, or workflow integration, contact:
+
+ - `z@vvicat.com`

  ## License

+ Licensed under **Apache 2.0**.
+
+ Please also review the license terms of the upstream LTX-2.3 base model when using or redistributing derivative checkpoints.