Commit 3502451 (verified) by GuardSkill, parent: cb9712d

Add updated model card

Files changed (1): README.md (+135, -0)
---
license: apache-2.0
language:
- en
- zh
library_name: other
tags:
- video-generation
- video-editing
- ltx-video
- lora
- safetensors
- image-to-video
- watermark-removal
- super-resolution
pipeline_tag: image-to-video
---

# LTX2.3-ICEdit-Insight

Research-oriented model release for task-aware video restoration and editing under the `LTX-2.3` framework.

This repository contains:

- `ltx-2.3-edit-insight-dev-fp8.safetensors`
  - the all-in-one Insight checkpoint
  - includes the transformer, video VAE, audio VAE, text projection, and vocoder
- `ltx2.3-video-upscale-v2.safetensors`
  - IC-LoRA for video super-resolution and detail recovery
- `ltx2.3-ic-watermarkeRM.safetensors`
  - IC-LoRA for video watermark removal and occlusion restoration

These weights are intended to be used with the project's `run_pipeline.py` workflow. The recommended default is single-stage inference, where the IC-LoRA guidance remains active through the full-resolution denoising pass.
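
One rough way to confirm which sub-modules an all-in-one safetensors checkpoint bundles is to group its tensor names by their top-level key prefix. The sketch below is generic: the example tensor names and prefixes are hypothetical placeholders, not the actual key layout of this release.

```python
from collections import Counter

def count_prefixes(names):
    """Count tensors per top-level module prefix (text before the first '.')."""
    counts = Counter()
    for name in names:
        counts[name.split(".", 1)[0]] += 1
    return dict(counts)

# With the safetensors library installed, the real key list can be read via:
#   from safetensors import safe_open
#   with safe_open("ltx-2.3-edit-insight-dev-fp8.safetensors", framework="pt") as f:
#       names = list(f.keys())
# The names below are hypothetical, for illustration only.
names = [
    "transformer.blocks.0.attn.to_q.weight",
    "transformer.blocks.0.attn.to_k.weight",
    "vae.encoder.conv_in.weight",
    "vocoder.output_layer.weight",
]
print(count_prefixes(names))
```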

## Research Positioning

`ltx-2.3-edit-insight-dev-fp8.safetensors` is not presented as a bare deployment checkpoint. It is the unified base-model release for the Insight branch of this project: a task-aware spatiotemporal editing backbone that consolidates the diffusion transformer, video VAE, audio VAE, text projection module, and vocoder into a single reproducible artifact.

From a research perspective, the checkpoint is intended to support controlled video restoration and editing under a shared latent-diffusion formulation. The paired IC-LoRA adapters specialize the backbone toward structure-preserving super-resolution and watermark-aware content recovery, while the unified checkpoint packaging keeps the full generative stack aligned for repeatable experiments and downstream ablations.

## Project Summary

This is the Hugging Face model repository used by the current project. It contains one all-in-one Insight base model and two task-specific IC-LoRAs:

- super-resolution enhancement: `ltx2.3-video-upscale-v2.safetensors`
- video watermark removal: `ltx2.3-ic-watermarkeRM.safetensors`
- Insight base model: `ltx-2.3-edit-insight-dev-fp8.safetensors`

The overall positioning is a unified editing framework for video super-resolution, watermark removal, and detail recovery. Building on `LTX-2.3`, the project uses task-aware IC-LoRAs and reference-latent conditioning to strengthen structure recovery and texture-detail control.

Here, `ltx-2.3-edit-insight-dev-fp8.safetensors` is not an ordinary repackaged inference base model but the unified research base-model release of this project's Insight branch. It organizes the diffusion transformer, video VAE, audio VAE, text projection, and vocoder into a single all-in-one checkpoint, supporting reproducible experimental setups for structure-preserving video restoration, detail reconstruction, and task-directed editing.

Current recommended usage:

- use this project's `run_pipeline.py`
- use single-stage inference by default
- switch LoRAs per task rather than stacking both LoRAs at once

## English Overview

This package is built for the Insight version of the project's LTX-2.3 editing pipeline. Instead of shipping only task adapters, it also includes the corresponding Insight base checkpoint so the workflow can be reproduced with the exact model assets used by the project.

Recommended usage:

- run the companion `run_pipeline.py`
- keep single-stage inference enabled by default
- load one task LoRA at a time depending on the editing goal

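The one-LoRA-at-a-time rule can be captured in a small helper that maps each pipeline mode to its matching adapter. `run_pipeline.py` does not ship such a helper; this is a sketch using the mode names and file paths from the commands in this card.

```python
# Hypothetical helper, not part of run_pipeline.py: select exactly one
# task LoRA per mode instead of stacking both adapters at once.
TASK_LORAS = {
    "upscale": "./models/loras/ltx2.3-train/ltx2.3-video-upscale-v2.safetensors",
    "watermark_rm": "./models/loras/ltx2.3-train/ltx2.3-ic-watermarkeRM.safetensors",
}

def lora_for(mode: str) -> str:
    """Return the single LoRA path matching a pipeline mode."""
    try:
        return TASK_LORAS[mode]
    except KeyError:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(TASK_LORAS)}")
```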
## Files

| File | Purpose |
| --- | --- |
| `ltx-2.3-edit-insight-dev-fp8.safetensors` | All-in-one Insight base checkpoint |
| `ltx2.3-video-upscale-v2.safetensors` | Super-resolution / detail enhancement IC-LoRA |
| `ltx2.3-ic-watermarkeRM.safetensors` | Watermark removal / occlusion restoration IC-LoRA |
| `assets/effects/output_004.webp` | Effect preview |
| `assets/effects/output_005.webp` | Effect preview |

## Super-Resolution Showcase

The following previews are included directly from the current project assets.

<table>
  <tr>
    <td align="center"><img src="./assets/effects/output_004.webp" alt="Super-resolution preview 1" width="600"/></td>
    <td align="center"><img src="./assets/effects/output_005.webp" alt="Super-resolution preview 2" width="600"/></td>
  </tr>
</table>

## Usage With This Project

Run all commands from the project root.

### Super-resolution

```bash
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
python run_pipeline.py \
  --mode upscale \
  --video ./inputs/input.mp4 \
  --prompt "Convert the video to ultra-high definition quality, rebuilding high-frequency details while eliminating artifacts." \
  --output ./outputs/output_upscale.mp4 \
  --height 1184 --width 704 --num-frames 97 \
  --fps 24.0 --seed 42 \
  --sigma-profile workflow \
  --model-checkpoint ./models/checkpoints/ltx-2.3-edit-insight-dev-fp8.safetensors \
  --lora ./models/loras/ltx2.3-train/ltx2.3-video-upscale-v2.safetensors
```

### Watermark removal

```bash
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
python run_pipeline.py \
  --mode watermark_rm \
  --video ./inputs/input.mp4 \
  --prompt "Remove short-video platform watermarks and related occlusions from the video, restoring a clean, clear, and natural original image." \
  --output ./outputs/output_watermark_rm.mp4 \
  --height 1184 --width 704 --num-frames 97 \
  --fps 24.0 --seed 1546 \
  --sigma-profile workflow \
  --model-checkpoint ./models/checkpoints/ltx-2.3-edit-insight-dev-fp8.safetensors \
  --lora ./models/loras/ltx2.3-train/ltx2.3-ic-watermarkeRM.safetensors
```

## Notes

- Single-stage inference is the default recommendation.
- In two-stage mode, the second-stage refinement does not keep the IC-LoRA constraint, which can increase content drift.
- The frame count must satisfy `8k + 1` (e.g. `97 = 8 × 12 + 1`).
- Single-stage output height and width should be multiples of `32`.
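
The two shape constraints above can be checked before launching a run. This is a small standalone sketch, not part of the pipeline, and the function names are hypothetical.

```python
def valid_num_frames(n: int) -> bool:
    # Frame count must satisfy 8k + 1 for a non-negative integer k.
    return n >= 1 and (n - 1) % 8 == 0

def valid_single_stage_dims(height: int, width: int) -> bool:
    # Single-stage output height and width should be multiples of 32.
    return height > 0 and width > 0 and height % 32 == 0 and width % 32 == 0

# The example commands above use --height 1184 --width 704 --num-frames 97,
# which satisfy both constraints.
```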

## License

This repository is released under the Apache 2.0 license.