---
license: apache-2.0
language:
  - en
library_name: diffusers
base_model:
  - Lightricks/LTX-2.3
pipeline_tag: video-to-video
tags:
  - video-editing
  - video-restoration
  - ltx-video
  - ltx-2-3
  - dit
  - ic-lora
  - watermark-removal
  - subtitle-removal
  - super-resolution
  - hd-enhancement
  - joyfox
---

# LTX2.3-ICEdit-Insight

<table>
  <tr>
    <td align="center"><img src="./assets/effects/output_004.webp" alt="Video restoration preview" width="420"/></td>
    <td align="center"><img src="./assets/effects/视频高清对比效果.webp" alt="Video HD enhancement preview" width="420"/></td>
  </tr>
  <tr>
    <td align="center"><img src="./assets/effects/去水印对比效果.webp" alt="Watermark removal preview" width="420"/></td>
    <td align="center"><img src="./assets/effects/去字幕对比效果.webp" alt="Subtitle removal preview" width="420"/></td>
  </tr>
</table>

**LTX2.3-ICEdit-Insight** is a task-aware video restoration and editing model family developed by **JoyFox Lab**, built on top of the **LTX-2.3 DiT-based audio-video foundation model**.

This release focuses on four practical video editing directions:

- **Video Restoration**: degradation recovery, compression cleanup, blur and noise reduction, and damaged detail restoration.
- **Video HD Enhancement**: super-resolution, detail reconstruction, texture sharpening, and perceptual quality improvement.
- **Watermark Removal**: logo cleanup, semi-transparent overlay removal, and occlusion-aware background reconstruction.
- **Subtitle Removal**: hard subtitle removal, caption cleanup, text overlay removal, and temporally stable inpainting.

Unlike conventional frame-level enhancement pipelines, this model family operates as a **generative video restoration system** in latent video space. It is designed to preserve global structure, camera motion, object identity, and temporal consistency while reconstructing missing or degraded visual content.

Project links: [GitHub project](https://github.com/Valiant-Cat/LTX2-ICEdit-Insight) | [JoyFox on Hugging Face](https://huggingface.co/joyfox)

## 📦 Model Files

| File | Purpose |
| --- | --- |
| `ltx-2.3-edit-insight-dev-fp8.safetensors` | Unified Insight base checkpoint for LTX-2.3 editing |
| `ltx2.3-video-restoration-general.safetensors` | Video restoration, artifact cleanup, blur and noise recovery |
| `ltx2.3-ic-video-upscale-general.safetensors` | Video HD enhancement, super-resolution, and detail recovery |
| `ltx2.3-ic-watermark-remove-general.safetensors` | Watermark removal and occlusion-aware reconstruction |
| `ltx2.3-ic-subtitles-remove-general.safetensors` | Subtitle removal and text overlay cleanup |

## 🎬 Showcase

<table>
  <tr>
    <td align="center"><b>Video Restoration</b></td>
    <td align="center"><b>Video HD Enhancement</b></td>
  </tr>
  <tr>
    <td align="center"><img src="./assets/effects/output_004.webp" alt="Video restoration preview" width="600"/></td>
    <td align="center"><img src="./assets/effects/视频高清对比效果.webp" alt="Video HD enhancement preview" width="600"/></td>
  </tr>
  <tr>
    <td align="center"><b>Watermark Removal</b></td>
    <td align="center"><b>Subtitle Removal</b></td>
  </tr>
  <tr>
    <td align="center"><img src="./assets/effects/去水印对比效果.webp" alt="Watermark removal preview" width="600"/></td>
    <td align="center"><img src="./assets/effects/去字幕对比效果.webp" alt="Subtitle removal preview" width="600"/></td>
  </tr>
</table>

<br/>

<table>
  <tr>
    <td align="center"><b>Video Restoration</b></td>
    <td align="center"><b>Video HD Enhancement</b></td>
  </tr>
  <tr>
    <td align="center"><img src="./assets/effects/视频修复对比效果2.webp" alt="Video restoration preview 2" width="600"/></td>
    <td align="center"><img src="./assets/effects/视频高清对比效果2.webp" alt="Video HD enhancement preview 2" width="600"/></td>
  </tr>
  <tr>
    <td align="center"><b>Watermark Removal</b></td>
    <td align="center"><b>Subtitle Removal</b></td>
  </tr>
  <tr>
    <td align="center"><img src="./assets/effects/去水印对比效果2.webp" alt="Watermark removal preview 2" width="600"/></td>
    <td align="center"><img src="./assets/effects/去字幕对比效果2.webp" alt="Subtitle removal preview 2" width="600"/></td>
  </tr>
</table>

## 🚀 Script Usage

Run all scripts from the project root. In the forms shown with an argument, the argument is the path to the input video.

```bash
bash run_restoration.sh
bash run_hd.sh
bash run_hd.sh /path/to/input.mp4
bash run_watermark_rm.sh
bash run_watermark_rm.sh /path/to/input.mp4
bash run_subtitle_rm.sh
bash run_subtitle_rm.sh /path/to/input.mp4
```

## 💻 Command Examples

### Video Restoration

```bash
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
python run_pipeline.py \
  --mode restoration \
  --video ./inputs/input_480p.mp4 \
  --prompt "Convert the video to ultra-high-definition quality while removing artifacts and rebuilding high-frequency details." \
  --output ./outputs/output_restoration.mp4 \
  --height 1184 --width 704 --num-frames 97 \
  --fps 24.0 --seed 42 \
  --sigma-profile workflow \
  --streaming-prefetch-count 2 \
  --model-checkpoint ./models/checkpoints/ltx-2.3-edit-insight-dev-fp8.safetensors \
  --lora ./models/loras/ltx2.3-train/ltx2.3-video-restoration-general.safetensors
```

### Video HD Enhancement

```bash
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
python run_pipeline.py \
  --mode hd \
  --video ./inputs/input_480p.mp4 \
  --prompt "Convert the video to ultra-high-definition quality, significantly improving clarity, fine detail richness, texture fidelity, and overall perceptual sharpness." \
  --output ./outputs/output_hd.mp4 \
  --height 1184 --width 704 --num-frames 97 \
  --fps 24.0 --seed 42 \
  --sigma-profile workflow \
  --streaming-prefetch-count 2 \
  --model-checkpoint ./models/checkpoints/ltx-2.3-edit-insight-dev-fp8.safetensors \
  --lora ./models/loras/ltx2.3-train/ltx2.3-ic-video-upscale-general.safetensors
```

### Watermark Removal

```bash
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
python run_pipeline.py \
  --mode watermark_rm \
  --video ./inputs/input_480p.mp4 \
  --prompt "Remove short-video platform watermarks and related occlusions from the video, restoring a clean, clear, and natural original image." \
  --output ./outputs/output_watermark_rm.mp4 \
  --height 1184 --width 704 --num-frames 97 \
  --fps 24.0 --seed 1546 \
  --sigma-profile workflow \
  --streaming-prefetch-count 2 \
  --model-checkpoint ./models/checkpoints/ltx-2.3-edit-insight-dev-fp8.safetensors \
  --lora ./models/loras/ltx2.3-train/ltx2.3-ic-watermark-remove-general.safetensors
```

### Subtitle Removal

```bash
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
python run_pipeline.py \
  --mode subtitle_rm \
  --video ./inputs/input_480p.mp4 \
  --prompt "Remove subtitles, captions, and related text occlusions from the video, restoring a clean and natural underlying image." \
  --output ./outputs/output_subtitle_rm.mp4 \
  --height 1184 --width 704 --num-frames 97 \
  --fps 24.0 --seed 42 \
  --sigma-profile workflow \
  --streaming-prefetch-count 2 \
  --model-checkpoint ./models/checkpoints/ltx-2.3-edit-insight-dev-fp8.safetensors \
  --lora ./models/loras/ltx2.3-train/ltx2.3-ic-subtitles-remove-general.safetensors
```
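
The four commands above differ only in mode, prompt, seed, output path, and LoRA file. As a rough sketch, the shared arguments could be assembled in Python as shown here. Every flag and path is copied from the examples above; the `build_command` helper and the `LORA_BY_MODE` table are hypothetical conveniences, not part of the project.

```python
import shlex

CHECKPOINT = "./models/checkpoints/ltx-2.3-edit-insight-dev-fp8.safetensors"
LORA_DIR = "./models/loras/ltx2.3-train"

# Mode name -> LoRA file, as paired in the command examples.
LORA_BY_MODE = {
    "restoration": "ltx2.3-video-restoration-general.safetensors",
    "hd": "ltx2.3-ic-video-upscale-general.safetensors",
    "watermark_rm": "ltx2.3-ic-watermark-remove-general.safetensors",
    "subtitle_rm": "ltx2.3-ic-subtitles-remove-general.safetensors",
}


def build_command(mode, video, prompt, output, seed=42):
    """Assemble the run_pipeline.py argument list for one task."""
    if mode not in LORA_BY_MODE:
        raise ValueError(f"unknown mode: {mode!r}")
    return [
        "python", "run_pipeline.py",
        "--mode", mode,
        "--video", video,
        "--prompt", prompt,
        "--output", output,
        "--height", "1184", "--width", "704", "--num-frames", "97",
        "--fps", "24.0", "--seed", str(seed),
        "--sigma-profile", "workflow",
        "--streaming-prefetch-count", "2",
        "--model-checkpoint", CHECKPOINT,
        "--lora", f"{LORA_DIR}/{LORA_BY_MODE[mode]}",
    ]


if __name__ == "__main__":
    cmd = build_command(
        "subtitle_rm",
        "./inputs/input_480p.mp4",
        "Remove subtitles, captions, and related text occlusions from the video, "
        "restoring a clean and natural underlying image.",
        "./outputs/output_subtitle_rm.mp4",
    )
    # Print the shell-quoted command; nothing is executed here.
    print(shlex.join(cmd))
```

To actually launch the job, the list can be passed to `subprocess.run(cmd, check=True)` with `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` set in the environment, matching the shell examples.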

## ✨ Key Improvements

### Task-Aware IC-Edit Framework

We introduce a task-aware IC-Edit training framework for LTX-2.3, where each restoration direction is optimized with dedicated instruction conditioning and task-specific IC-LoRA adapters.

The model is trained not only to improve visual quality, but also to understand the editing goal behind different restoration tasks, including watermark removal, subtitle cleanup, damaged region recovery, and high-definition enhancement.

### LTX-2.3 DiT Backbone Adaptation

The model family is built on LTX-2.3, a diffusion-transformer (DiT) video foundation model designed for high-fidelity image-to-video and video generation workflows.

Our adaptation targets video restoration by improving:

- latent-space editability
- instruction-following behavior
- frame-to-frame stability
- high-frequency detail recovery
- local reconstruction around degraded or occluded regions

### Spatiotemporal Consistency Optimization

Video restoration requires more than strong single-frame quality. We optimize temporal consistency so that restored areas remain stable across adjacent frames.

This reduces common artifacts such as:

- flickering textures
- unstable reconstructed backgrounds
- inconsistent watermark removal
- subtitle ghosting
- frame-wise color shift
- detail popping during motion
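
One crude way to spot flicker in an output is to track the mean absolute difference between consecutive frames over a region that should be static, such as the area where a watermark was removed. The sketch below is only an illustrative diagnostic and is not the metric or loss used to train these models. It uses plain Python lists for self-containedness and assumes non-empty, equally sized grayscale frames; in practice an array library would be the usual choice.

```python
def mean_abs_frame_diff(frames):
    """Mean absolute pixel difference for each pair of consecutive frames.

    `frames` is a list of equally sized, non-empty grayscale frames, each a
    list of rows of floats. Returns one score per adjacent frame pair.
    """
    scores = []
    for prev, curr in zip(frames, frames[1:]):
        total = 0.0
        count = 0
        for row_prev, row_curr in zip(prev, curr):
            for a, b in zip(row_prev, row_curr):
                total += abs(a - b)
                count += 1
        scores.append(total / count)
    return scores


if __name__ == "__main__":
    static = [[[0.5, 0.5], [0.5, 0.5]]] * 3
    flicker = [
        [[0.0, 0.0], [0.0, 0.0]],
        [[1.0, 1.0], [1.0, 1.0]],
        [[0.0, 0.0], [0.0, 0.0]],
    ]
    print(mean_abs_frame_diff(static))   # [0.0, 0.0]
    print(mean_abs_frame_diff(flicker))  # [1.0, 1.0]
```

Genuine motion also raises this score, so it is only meaningful on regions where the underlying content is expected to stay still.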

### Degradation-Aware Training Curriculum

The training curriculum covers realistic video defects including:

- compression artifacts
- motion blur
- sensor noise
- low-bitrate video
- text overlays
- hard subtitles
- semi-transparent watermarks
- platform logos
- local occlusions
- low-resolution inputs

This improves generalization across short videos, social-media clips, mobile footage, downloaded videos, and compressed production material.

### Occlusion-Aware Reconstruction

For watermark and subtitle removal, the model is optimized to reconstruct the hidden visual content behind occluded regions.

Instead of smearing or blurring the target area, it uses surrounding spatial context and temporal cues to infer plausible background structure, object boundaries, lighting, and texture continuity.

### Frequency-Enhanced HD Restoration

For HD enhancement, the model improves perceptual sharpness and fine visual detail through frequency-aware restoration training.

This is especially helpful for recovering:

- hair strands
- fabric texture
- skin detail
- product edges
- background patterns
- typography-like fine structures
- natural image clarity

## 🧠 Inference Notes

- Single-stage inference is recommended for most editing tasks.
- Two-stage refinement can improve visual polish but may weaken task-specific LoRA constraints.
- Watermark and subtitle removal perform best when the occlusion area is stable and not excessively large.
- HD enhancement quality depends on input resolution, motion complexity, and compression level.
- Higher output resolution improves detail but requires more VRAM.
- For videos with strong motion, conservative denoising settings are recommended to preserve temporal structure.
- Frame count should follow the `8k + 1` rule, for example `97` (`8 × 12 + 1`), as used in the command examples.
- Output height and width should be multiples of `32` in single-stage inference, for example `1184 × 704`.
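
The last two constraints are easy to check before launching a job. The helpers below are a minimal sketch with hypothetical names; they are not part of `run_pipeline.py`. Treating `k = 0` (a single frame) as valid is an assumption, since the notes above state only the `8k + 1` form.

```python
def is_valid_frame_count(n):
    """True if n has the form 8k + 1 for some integer k >= 0 (assumed range)."""
    return n >= 1 and (n - 1) % 8 == 0


def floor_frame_count(n):
    """Largest value of the form 8k + 1 that does not exceed n (minimum 1)."""
    return max(1, ((n - 1) // 8) * 8 + 1)


def is_valid_size(height, width, multiple=32):
    """True if both dimensions are positive multiples of `multiple`."""
    return (
        height > 0
        and width > 0
        and height % multiple == 0
        and width % multiple == 0
    )


def round_down_to_multiple(value, multiple=32):
    """Round value down to a multiple of `multiple` (at least one multiple)."""
    return max(multiple, (value // multiple) * multiple)


if __name__ == "__main__":
    print(is_valid_frame_count(97))      # True  (97 = 8 * 12 + 1)
    print(floor_frame_count(100))        # 97
    print(is_valid_size(1184, 704))      # True
    print(round_down_to_multiple(1200))  # 1184
    print(round_down_to_multiple(720))   # 704
```

Rounding down rather than to the nearest value keeps the requested frame count and resolution as upper bounds, which is the safer default when VRAM is the limiting factor.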

## 🏗️ Training

This model family was trained and optimized by **JoyFox Lab** (**Chengdu Xuanhu Technology Co., Ltd.**).

The training pipeline includes:

- task-aware video restoration data construction
- degradation synthesis and curriculum training
- IC-LoRA specialization for four editing directions
- temporal consistency regularization
- occlusion-aware reconstruction training
- high-frequency perceptual enhancement
- instruction-guided video editing optimization

## 📬 Contact

For research collaboration, commercial licensing, or workflow integration, contact:

- `z@vvicat.com`

## 📜 License

Licensed under **Apache 2.0**.

Please also review the license terms of the upstream LTX-2.3 base model when using or redistributing derivative checkpoints.