Add updated model card
README.md
ADDED
|
@@ -0,0 +1,135 @@
|
| 1 |
+
---
|
| 2 |
+
license: apache-2.0
|
| 3 |
+
language:
|
| 4 |
+
- en
|
| 5 |
+
- zh
|
| 6 |
+
library_name: other
|
| 7 |
+
tags:
|
| 8 |
+
- video-generation
|
| 9 |
+
- video-editing
|
| 10 |
+
- ltx-video
|
| 11 |
+
- lora
|
| 12 |
+
- safetensors
|
| 13 |
+
- image-to-video
|
| 14 |
+
- watermark-removal
|
| 15 |
+
- super-resolution
|
| 16 |
+
pipeline_tag: image-to-video
|
| 17 |
+
---
|
| 18 |
+
|
| 19 |
+
# LTX2.3-ICEdit-Insight
|
| 20 |
+
|
| 21 |
+
Research-oriented model release for task-aware video restoration and editing under the `LTX-2.3` framework.
|
| 22 |
+
|
| 23 |
+
This repository contains:
|
| 24 |
+
|
| 25 |
+
- `ltx-2.3-edit-insight-dev-fp8.safetensors`
|
| 26 |
+
  - the all-in-one Insight checkpoint
|
| 27 |
+
  - includes transformer + video VAE + audio VAE + text projection + vocoder
|
| 28 |
+
- `ltx2.3-video-upscale-v2.safetensors`
|
| 29 |
+
  - IC-LoRA for video super-resolution and detail recovery
|
| 30 |
+
- `ltx2.3-ic-watermarkeRM.safetensors`
|
| 31 |
+
  - IC-LoRA for video watermark removal and occlusion restoration
|
| 32 |
+
|
| 33 |
+
These weights are intended to be used with the project's `run_pipeline.py` workflow. The recommended default is single-stage inference, where the IC-LoRA guidance remains active through the full-resolution denoising pass.
|
| 34 |
+
|
| 35 |
+
## Research Positioning
|
| 36 |
+
|
| 37 |
+
`ltx-2.3-edit-insight-dev-fp8.safetensors` is not presented as a bare deployment checkpoint. It is the unified base model release for the Insight branch of this project: a task-aware spatiotemporal editing backbone that consolidates the diffusion transformer, video VAE, audio VAE, text projection module, and vocoder into a single reproducible artifact.
|
| 38 |
+
|
| 39 |
+
From a research perspective, the checkpoint is intended to support controlled video restoration and editing under a shared latent diffusion formulation. The paired IC-LoRA adapters specialize the backbone toward structure-preserving super-resolution and watermark-aware content recovery, while the unified checkpoint packaging keeps the full generative stack aligned for repeatable experiments and downstream ablations.
|
| 40 |
+
|
| 41 |
+
## Chinese Description
|
| 42 |
+
|
| 43 |
+
This is the Hugging Face model repository used by the current project. It contains one all-in-one Insight base model and two task-specific IC-LoRAs:
|
| 44 |
+
|
| 45 |
+
- Super-resolution enhancement: `ltx2.3-video-upscale-v2.safetensors`
|
| 46 |
+
- Video watermark removal: `ltx2.3-ic-watermarkeRM.safetensors`
|
| 47 |
+
- Insight base model: `ltx-2.3-edit-insight-dev-fp8.safetensors`
|
| 48 |
+
|
| 49 |
+
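Overall, the project is positioned as a unified editing framework for video super-resolution, watermark removal, and detail recovery. It builds on `LTX-2.3` with task-aware IC-LoRAs and uses reference-latent conditioning to strengthen structure recovery and control over texture detail.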
|
| 50 |
+
|
| 51 |
+
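`ltx-2.3-edit-insight-dev-fp8.safetensors` is not an ordinary repackaged inference base model; it is the unified, research-oriented base model release for this project's Insight branch. It organizes the diffusion transformer, video VAE, audio VAE, text projection, and vocoder into a single all-in-one checkpoint, supporting reproducible experimental setups for structure-preserving video restoration, detail reconstruction, and task-directed editing.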
|
| 52 |
+
|
| 53 |
+
Currently recommended usage:
|
| 54 |
+
|
| 55 |
+
- Use this project's `run_pipeline.py`
|
| 56 |
+
- Use single-stage inference by default
|
| 57 |
+
- Switch LoRAs per task rather than stacking both LoRAs at once
|
| 58 |
+
|
| 59 |
+
## English Overview
|
| 60 |
+
|
| 61 |
+
This package is built for the Insight version of the project's LTX-2.3 editing pipeline. Instead of shipping only task adapters, it also includes the corresponding Insight base checkpoint so the workflow can be reproduced with the exact model assets used by the project.
|
| 62 |
+
|
| 63 |
+
Recommended usage:
|
| 64 |
+
|
| 65 |
+
- run the companion `run_pipeline.py`
|
| 66 |
+
- keep single-stage inference enabled by default
|
| 67 |
+
- load one task LoRA at a time depending on the editing goal
|
| 68 |
+
|
| 69 |
+
## Files
|
| 70 |
+
|
| 71 |
+
| File | Purpose |
|
| 72 |
+
| --- | --- |
|
| 73 |
+
| `ltx-2.3-edit-insight-dev-fp8.safetensors` | All-in-one Insight base checkpoint |
|
| 74 |
+
| `ltx2.3-video-upscale-v2.safetensors` | Super-resolution / detail enhancement IC-LoRA |
|
| 75 |
+
| `ltx2.3-ic-watermarkeRM.safetensors` | Watermark removal / occlusion restoration IC-LoRA |
|
| 76 |
+
| `assets/effects/output_004.webp` | Effect preview |
|
| 77 |
+
| `assets/effects/output_005.webp` | Effect preview |
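
Because the base checkpoint bundles several components in one `.safetensors` file, it can help to inspect what the file contains before loading it. The sketch below reads only the safetensors header (an 8-byte little-endian length followed by a JSON table of tensors) using the Python standard library, then counts tensors by the first dotted component of their names. The helper names are illustrative assumptions, and the key prefixes actually used inside this checkpoint are not documented here, so the grouping is only a rough guide.

```python
import json
import struct
from collections import Counter
from pathlib import Path


def read_safetensors_header(path):
    """Return the tensor table of a .safetensors file without loading weights."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return header


def count_by_prefix(header):
    """Count tensors by the first dotted component of their name (assumed grouping)."""
    return Counter(name.split(".", 1)[0] for name in header)


if __name__ == "__main__":
    # Path as used in the usage commands of this README; adjust if stored elsewhere.
    ckpt = Path("./models/checkpoints/ltx-2.3-edit-insight-dev-fp8.safetensors")
    if ckpt.exists():
        table = read_safetensors_header(ckpt)
        for prefix, n in count_by_prefix(table).most_common():
            print(f"{prefix}: {n} tensors")
```

Each header entry also carries `dtype` and `shape` fields, so the same table can be used to check which tensors are stored in FP8 without reading any weight data.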
|
| 78 |
+
|
| 79 |
+
## Super-Resolution Showcase
|
| 80 |
+
|
| 81 |
+
The following previews are included directly from the current project assets.
|
| 82 |
+
|
| 83 |
+
<table>
|
| 84 |
+
<tr>
|
| 85 |
+
<td align="center"><img src="./assets/effects/output_004.webp" alt="Super-resolution preview 1" width="600"/></td>
|
| 86 |
+
<td align="center"><img src="./assets/effects/output_005.webp" alt="Super-resolution preview 2" width="600"/></td>
|
| 87 |
+
</tr>
|
| 88 |
+
</table>
|
| 89 |
+
|
| 90 |
+
## Usage With This Project
|
| 91 |
+
|
| 92 |
+
Run all commands from the project root.
|
| 93 |
+
|
| 94 |
+
### Super-resolution
|
| 95 |
+
|
| 96 |
+
```bash
|
| 97 |
+
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
|
| 98 |
+
python run_pipeline.py \
|
| 99 |
+
--mode upscale \
|
| 100 |
+
--video ./inputs/input.mp4 \
|
| 101 |
+
--prompt "Convert the video to ultra-high definition quality, rebuilding high-frequency details while eliminating artifacts." \
|
| 102 |
+
--output ./outputs/output_upscale.mp4 \
|
| 103 |
+
--height 1184 --width 704 --num-frames 97 \
|
| 104 |
+
--fps 24.0 --seed 42 \
|
| 105 |
+
--sigma-profile workflow \
|
| 106 |
+
--model-checkpoint ./models/checkpoints/ltx-2.3-edit-insight-dev-fp8.safetensors \
|
| 107 |
+
--lora ./models/loras/ltx2.3-train/ltx2.3-video-upscale-v2.safetensors
|
| 108 |
+
```
|
| 109 |
+
|
| 110 |
+
### Watermark removal
|
| 111 |
+
|
| 112 |
+
```bash
|
| 113 |
+
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
|
| 114 |
+
python run_pipeline.py \
|
| 115 |
+
--mode watermark_rm \
|
| 116 |
+
--video ./inputs/input.mp4 \
|
| 117 |
+
--prompt "Remove short-video platform watermarks and related occlusions from the video, restoring a clean, clear, and natural original image." \
|
| 118 |
+
--output ./outputs/output_watermark_rm.mp4 \
|
| 119 |
+
--height 1184 --width 704 --num-frames 97 \
|
| 120 |
+
--fps 24.0 --seed 1546 \
|
| 121 |
+
--sigma-profile workflow \
|
| 122 |
+
--model-checkpoint ./models/checkpoints/ltx-2.3-edit-insight-dev-fp8.safetensors \
|
| 123 |
+
--lora ./models/loras/ltx2.3-train/ltx2.3-ic-watermarkeRM.safetensors
|
| 124 |
+
```
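
The two commands above share every flag and differ only in mode, prompt, seed, output path, and LoRA file. As a minimal sketch of how a wrapper could enforce the one-LoRA-per-task recommendation, the following Python snippet builds the argument list from a small task table. The `TASKS` table and the `build_command` helper are hypothetical conveniences rather than part of the project, and only flags that appear in the commands above are used.

```python
import shlex

CHECKPOINT = "./models/checkpoints/ltx-2.3-edit-insight-dev-fp8.safetensors"
LORA_DIR = "./models/loras/ltx2.3-train"

# Hypothetical task table: one LoRA, prompt, and seed per mode, copied from the commands above.
TASKS = {
    "upscale": {
        "lora": "ltx2.3-video-upscale-v2.safetensors",
        "seed": 42,
        "prompt": ("Convert the video to ultra-high definition quality, rebuilding "
                   "high-frequency details while eliminating artifacts."),
    },
    "watermark_rm": {
        "lora": "ltx2.3-ic-watermarkeRM.safetensors",
        "seed": 1546,
        "prompt": ("Remove short-video platform watermarks and related occlusions from "
                   "the video, restoring a clean, clear, and natural original image."),
    },
}


def build_command(mode, video, output, height=1184, width=704, num_frames=97, fps=24.0):
    """Build the run_pipeline.py argument list for a single task (exactly one LoRA)."""
    task = TASKS[mode]  # raises KeyError for an unknown mode
    return [
        "python", "run_pipeline.py",
        "--mode", mode,
        "--video", video,
        "--prompt", task["prompt"],
        "--output", output,
        "--height", str(height), "--width", str(width),
        "--num-frames", str(num_frames),
        "--fps", str(fps), "--seed", str(task["seed"]),
        "--sigma-profile", "workflow",
        "--model-checkpoint", CHECKPOINT,
        "--lora", f"{LORA_DIR}/{task['lora']}",
    ]


if __name__ == "__main__":
    cmd = build_command("upscale", "./inputs/input.mp4", "./outputs/output_upscale.mp4")
    # Print a copy-pasteable shell line; pass `cmd` to subprocess.run(...) to execute it.
    print(shlex.join(cmd))
```

Keeping the LoRA choice inside the task table means switching tasks swaps the adapter instead of stacking a second one on top of the first.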
|
| 125 |
+
|
| 126 |
+
## Notes
|
| 127 |
+
|
| 128 |
+
- Single-stage inference is the default recommendation.
|
| 129 |
+
- In two-stage mode, the second-stage refinement does not keep the IC-LoRA constraint, which can increase content drift.
|
| 130 |
+
- Frame count must have the form `8k + 1` for an integer `k`; the examples above use `97 = 8 × 12 + 1`.
|
| 131 |
+
- Single-stage output height and width should each be a multiple of `32`; the examples above use `1184 = 32 × 37` and `704 = 32 × 22`.
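
The frame-count and resolution constraints in the notes above are easy to check before launching a long run. The following is a small sketch of such a pre-flight check; the function names are hypothetical, and rounding down to the nearest valid value is an assumption made here for illustration, not documented pipeline behavior.

```python
def is_valid_frame_count(n: int) -> bool:
    """True when n is a positive integer of the form 8k + 1."""
    return n >= 1 and (n - 1) % 8 == 0


def snap_frame_count(n: int) -> int:
    """Largest value of the form 8k + 1 that does not exceed n (minimum 1)."""
    return max(1, ((n - 1) // 8) * 8 + 1)


def snap_dimension(size: int, multiple: int = 32) -> int:
    """Round size down to a multiple of `multiple` (never below one multiple)."""
    return max(multiple, (size // multiple) * multiple)


if __name__ == "__main__":
    print(is_valid_frame_count(97))                    # True
    print(snap_frame_count(100))                       # 97
    print(snap_dimension(1200), snap_dimension(720))   # 1184 704
```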
|
| 132 |
+
|
| 133 |
+
## License
|
| 134 |
+
|
| 135 |
+
This repository is released under the Apache 2.0 license.
|