---
title: LTX 2.3 Studio
emoji: 🎬
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: "5.50.0"
app_file: app.py
python_version: "3.11"
suggested_hardware: zero-a10g
hf_oauth: false
preload_from_hub:
  - Comfy-Org/ltx-2 split_files/text_encoders/gemma_3_12B_it.safetensors
  - Kijai/LTX2.3_comfy diffusion_models/ltx-2.3-22b-dev_transformer_only_bf16.safetensors,loras/ltx-2.3-22b-distilled-lora-dynamic_fro09_avg_rank_105_bf16.safetensors,text_encoders/ltx-2.3_text_projection_bf16.safetensors,vae/LTX23_audio_vae_bf16.safetensors,vae/LTX23_video_vae_bf16.safetensors,vae/taeltx2_3.safetensors
  - Lightricks/LTX-2-19b-IC-LoRA-Detailer ltx-2-19b-ic-lora-detailer.safetensors
  - Lightricks/LTX-2-19b-LoRA-Camera-Control-Jib-Down ltx-2-19b-lora-camera-control-jib-down.safetensors
  - Lightricks/LTX-2-19b-LoRA-Camera-Control-Jib-Up ltx-2-19b-lora-camera-control-jib-up.safetensors
  - Lightricks/LTX-2-19b-LoRA-Camera-Control-Static ltx-2-19b-lora-camera-control-static.safetensors
  - Lightricks/LTX-2.3 ltx-2.3-22b-distilled-lora-384.safetensors,ltx-2.3-spatial-upscaler-x2-1.0.safetensors
  - Lightricks/LTX-2.3-22b-IC-LoRA-Union-Control ltx-2.3-22b-ic-lora-union-control-ref0.5.safetensors
  - google/gemma-3-12b-it-qat-q4_0-unquantized gemma-3-12b-it/model-00001-of-00005.safetensors,gemma-3-12b-it/model-00002-of-00005.safetensors,gemma-3-12b-it/model-00003-of-00005.safetensors,gemma-3-12b-it/model-00004-of-00005.safetensors,gemma-3-12b-it/model-00005-of-00005.safetensors,gemma-3-12b-it/model.safetensors.index.json,gemma-3-12b-it/preprocessor_config.json,gemma-3-12b-it/tokenizer.model
---

# LTX 2.3 Studio

A single-process Gradio app that wraps [LTX-2.3](https://huggingface.co/Lightricks/LTX-2.3), Lightricks' open 22B-parameter video generation model, in one focused UI. Six modes (text · image · audio · lipsync · keyframe · style) share the same ComfyUI All-In-One workflow. The app runs locally on Apple Silicon (MPS) or NVIDIA (CUDA) and deploys to Hugging Face Spaces (ZeroGPU).

[![Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Spaces-Live-E0A458?style=flat-square)](https://huggingface.co/spaces/techfreakworm/LTX2.3-Studio)
[![GitHub stars](https://img.shields.io/github/stars/techfreakworm/ltx2.3-AIO-generator?style=flat-square&color=E0A458)](https://github.com/techfreakworm/ltx2.3-AIO-generator/stargazers)
[![License: MIT](https://img.shields.io/badge/License-MIT-E0A458?style=flat-square)](LICENSE)
[![Python 3.11](https://img.shields.io/badge/Python-3.11-E0A458?style=flat-square&logo=python&logoColor=white)](pyproject.toml)
[![Powered by ComfyUI](https://img.shields.io/badge/backend-ComfyUI-E0A458?style=flat-square)](https://github.com/comfyanonymous/ComfyUI)
[![Built on LTX-2.3](https://img.shields.io/badge/model-LTX--2.3%2022B-E0A458?style=flat-square)](https://huggingface.co/Lightricks/LTX-2.3)

→ **Live demo:** https://huggingface.co/spaces/techfreakworm/LTX2.3-Studio

---

## What's inside

Six modes are wired through the same ComfyUI All-In-One workflow. Each mode exposes only the inputs it actually consumes, so the form stays short and focused.

| Mode | Inputs | Output | Notes |
|---|---|---|---|
| **Text → Video** | Prompt (+ optional audio prompt) | mp4 (+ optional wav) | The core mode. Camera-control LoRAs auto-applied by keyword. |
| **Audio → Video** | Prompt + audio track | mp4 with the input audio preserved | Conditions motion on the audio waveform. |
| **Image → Video** | Image + prompt | mp4 (+ optional audio) | Image-conditioned generation. |
| **Lipsync** | Image + audio | mp4 with audio | Viseme-aligned mouth motion. |
| **Keyframe** | First + last frames + prompt | mp4 | Latent interpolation between two anchors. |
| **Style Transfer** | Source video + style image | mp4 | IC-LoRA restyle; motion preserved from source. |

Every mode offers **Fast / Balanced / Quality** presets, which scale the step count by ×1, ×1.5, and ×3 respectively. A per-mode ZeroGPU duration estimator adapts the call timeout to the requested workload.
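A minimal sketch of how the preset multipliers could scale a mode's step count. The names `PRESET_MULTIPLIERS` and `steps_for_preset`, the base of 8 steps in the usage line, and rounding up to a whole step are all assumptions for illustration, not the app's actual code.

```python
import math

# Hypothetical multipliers mirroring the Fast / Balanced / Quality presets.
PRESET_MULTIPLIERS = {"Fast": 1.0, "Balanced": 1.5, "Quality": 3.0}


def steps_for_preset(base_steps: int, preset: str) -> int:
    """Scale a mode's base step count by the preset multiplier, rounding up."""
    try:
        factor = PRESET_MULTIPLIERS[preset]
    except KeyError:
        raise ValueError(f"unknown preset: {preset!r}") from None
    return math.ceil(base_steps * factor)


# Usage with a hypothetical base of 8 steps.
for name in PRESET_MULTIPLIERS:
    print(name, steps_for_preset(8, name))
```

Rounding up keeps a fractional multiplier such as ×1.5 from ever dropping below the requested quality level.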

---

## Quick start (local)

Requires **Python 3.11**, ~80 GB free disk for the weight set, and ~24 GB VRAM (CUDA) or ~32 GB unified memory (Apple Silicon).

```bash
git clone --recurse-submodules https://github.com/techfreakworm/ltx2.3-AIO-generator
cd ltx2.3-AIO-generator
bash setup.sh           # creates .venv, installs ComfyUI + pinned custom nodes + app deps
source .venv/bin/activate
python app.py           # http://127.0.0.1:7860
```
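`setup.sh` installs the custom nodes at pinned commits, and `_bootstrap()` does the same on Spaces. A minimal sketch of turning such a pin list into git commands follows; the dict shape and the SHA values are placeholders, not the real contents of `app.CUSTOM_NODES_PINNED`, and naming each checkout after its repository is an assumption.

```python
from pathlib import Path

# Placeholder pins (repo URL -> commit SHA). The real pins live in
# app.CUSTOM_NODES_PINNED, whose actual structure may differ.
PINNED_NODES: dict[str, str] = {
    "https://github.com/Lightricks/ComfyUI-LTXVideo": "0123456789abcdef0123456789abcdef01234567",
    "https://github.com/kijai/ComfyUI-KJNodes": "89abcdef0123456789abcdef0123456789abcdef",
}


def clone_commands(pins: dict[str, str], custom_nodes_dir: Path) -> list[list[str]]:
    """Build argv lists that clone each repo and check out its pinned commit."""
    commands: list[list[str]] = []
    for url, sha in pins.items():
        # Name the checkout directory after the repository (an assumption).
        dest = custom_nodes_dir / url.rstrip("/").rsplit("/", 1)[-1]
        commands.append(["git", "clone", url, str(dest)])
        commands.append(["git", "-C", str(dest), "checkout", sha])
    return commands


# Print the commands; pass each argv to subprocess.run(argv, check=True) to execute.
for argv in clone_commands(PINNED_NODES, Path("comfyui/custom_nodes")):
    print(" ".join(argv))
```

Checking out an exact SHA rather than a branch keeps node behaviour reproducible when upstream repositories move.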

The first run resolves model weights into your HF cache (`~/.cache/huggingface/hub/`) and symlinks them into `comfyui/models/<comfy_type>/`. Subsequent starts skip the download. Expect ~70 GB of weights pulled on a cold first run.
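A minimal sketch of the symlinking step, assuming each weight has already been resolved to a file in the HF cache (for example by `huggingface_hub.hf_hub_download`, which returns the local path). The helper name `link_into_comfy` is hypothetical, and keeping the cached filename unchanged is an assumption.

```python
import os
from pathlib import Path


def link_into_comfy(cached_file: Path, comfy_root: Path, comfy_type: str) -> Path:
    """Symlink a cached weight into comfyui/models/<comfy_type>/, idempotently."""
    target_dir = comfy_root / "models" / comfy_type
    target_dir.mkdir(parents=True, exist_ok=True)
    link = target_dir / cached_file.name
    if link.is_symlink():
        if Path(os.readlink(link)) == cached_file:
            return link  # already points at the right cache entry
        link.unlink()  # stale link, e.g. after the cache moved
    elif link.exists():
        return link  # a real file is already there; never clobber it
    link.symlink_to(cached_file)
    return link
```

Because the link only points into the cache, a second start finds it already in place and downloads nothing.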

**Apple Silicon notes.** `PYTORCH_ENABLE_MPS_FALLBACK=1` is set automatically so the few MPS-unsupported ops fall back to CPU. ComfyUI's VRAM autodetect picks the right tier; override with `LTX23_AIO_VRAM=lowvram|normalvram|highvram` if you need to force one.
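A sketch of the environment handling described above, assuming the override is translated into one of ComfyUI's own `--lowvram` / `--normalvram` / `--highvram` command-line flags. The function name and the flag-mapping approach are illustrative, not the app's actual code.

```python
import os
import sys
from collections.abc import MutableMapping
from typing import Optional

VALID_TIERS = ("lowvram", "normalvram", "highvram")


def configure_runtime(env: Optional[MutableMapping[str, str]] = None) -> list[str]:
    """Apply the MPS fallback on macOS and map LTX23_AIO_VRAM to a ComfyUI flag.

    Call this before importing torch so the MPS variable is in place early.
    """
    if env is None:
        env = os.environ
    if sys.platform == "darwin":
        # Ops that MPS does not implement fall back to CPU instead of raising.
        env.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")
    tier = env.get("LTX23_AIO_VRAM", "").strip().lower()
    if not tier:
        return []  # no override: leave tier selection to ComfyUI's autodetect
    if tier not in VALID_TIERS:
        raise ValueError(f"LTX23_AIO_VRAM must be one of {VALID_TIERS}, got {tier!r}")
    return [f"--{tier}"]  # hypothetically appended to ComfyUI's argv before parsing
```

Rejecting unknown values early surfaces a typo in the variable instead of silently running on the autodetected tier.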

**LAN access** (phone or tablet on the same Wi-Fi): `python app.py` binds `0.0.0.0:7860`, so you can visit `http://<your-LAN-IP>:7860` from another device. On macOS, if the connection is refused, allow incoming connections for `python` in System Settings → Network → Firewall.
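To find the address to type on the phone, a small helper like the one below can print the LAN URL. It uses the common UDP-connect trick (connecting a UDP socket sends no packets; the OS only picks the outbound interface) and falls back to loopback when there is no route. `lan_url` is a hypothetical name, not part of the app.

```python
import socket


def lan_url(port: int = 7860) -> str:
    """Best-effort guess of the URL other devices on the LAN should use."""
    ip = "127.0.0.1"
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        try:
            # connect() on a UDP socket sends nothing; it only selects a route.
            s.connect(("192.0.2.1", 80))  # TEST-NET-1 documentation address
            ip = s.getsockname()[0]
        except OSError:
            pass  # no usable route: keep the loopback fallback
    return f"http://{ip}:{port}"


print(lan_url())
```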

## Quick start (HF Spaces)

This repo is a Gradio Space. The Pro tier provides ZeroGPU (A10G) access and the per-call duration budget needed for the Balanced and Quality presets.

```bash
git remote add space https://huggingface.co/spaces/<your-handle>/LTX2.3-Studio
git push space master:main       # local branch is master; HF Space deploys from main
```

> ⚠ The refspec `master:main` matters. The local default branch is `master` (GitHub convention), but the HF Space deploys from `main`. A bare `git push space master` creates a separate `master` branch on the Space remote, which does NOT trigger a deploy.

The Space's `preload_from_hub` directive (see the YAML at the top of this file) bakes ~111 GB of weights into the build image. `app.py:_bootstrap()` then:

1. Clones ComfyUI + pinned custom nodes into `~/comfyui` on cold start (ZeroGPU container freezes preserve them across calls)
2. Mirrors the read-only preload cache into `~/hf-cache-rw/`. This works around a build-user vs. runtime-user permissions trap: preloaded files are root-owned, while the app runs as uid 1000 and cannot write to them, so any lazy download into that cache would fail with `Permission denied`
3. Stages seed input files into `comfyui/input/` so workflow loaders don't error before any user upload arrives

Subsequent requests hit the warm cache, so inference from the second call onward generates no network traffic.
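Steps 2 and 3 can be sketched as below, assuming that "mirroring" means recreating the directory tree under a writable root and symlinking each read-only file into it; the real `_bootstrap()` may do this differently. Pointing `huggingface_hub` at the mirror through the `HF_HUB_CACHE` environment variable, the preload cache location in the usage comment, and both function names are likewise assumptions.

```python
import os
import shutil
from pathlib import Path


def mirror_readonly_tree(src: Path, dst: Path) -> int:
    """Recreate src's directories under dst (writable) and symlink every file."""
    linked = 0
    for dirpath, _dirnames, filenames in os.walk(src):
        rel = Path(dirpath).relative_to(src)
        (dst / rel).mkdir(parents=True, exist_ok=True)  # owned by the runtime user
        for name in filenames:
            link = dst / rel / name
            if not link.exists() and not link.is_symlink():
                link.symlink_to(Path(dirpath) / name)  # read-only data, shared
                linked += 1
    return linked


def stage_seed_inputs(seed_dir: Path, input_dir: Path) -> list[str]:
    """Copy placeholder inputs so loader nodes validate before any upload."""
    input_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for seed in sorted(seed_dir.iterdir()):
        target = input_dir / seed.name
        if seed.is_file() and not target.exists():
            shutil.copy2(seed, target)
            staged.append(seed.name)
    return staged


# Usage sketch (the preload cache location is assumed):
# mirror_readonly_tree(Path.home() / ".cache/huggingface/hub", Path.home() / "hf-cache-rw")
# os.environ["HF_HUB_CACHE"] = str(Path.home() / "hf-cache-rw")
# stage_seed_inputs(Path("assets/seed_inputs"), Path.home() / "comfyui/input")
```

New directories in the mirror belong to the runtime user, so any later download can write beside the symlinked preloaded files.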

**ZeroGPU duration estimator.** Each generate call carries a dynamic `@spaces.GPU(duration=N)`, where N is computed from mode, preset, and frame count and clamped to `[60, 900] s`. On timeout (`"GPU task aborted"`), the handler automatically retries once at 2× the duration.
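A minimal sketch of a clamped estimator and the single retry. The per-mode base and per-frame costs are invented placeholders that would need calibrating, and catching any exception whose message contains `"GPU task aborted"` is an assumption about how the abort surfaces. Whether the retried duration is re-clamped is not stated above, so this sketch simply doubles it. In the app, a function like `estimate_duration` would feed `@spaces.GPU(duration=...)`; here it is called directly so the logic runs without the `spaces` package.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

MIN_S, MAX_S = 60, 900
# Placeholder costs in seconds; not measured values.
BASE_COST = {"t2v": 40, "i2v": 45, "a2v": 50, "lipsync": 60, "keyframe": 55, "style": 70}
PER_FRAME = {"t2v": 0.5, "i2v": 0.5, "a2v": 0.6, "lipsync": 0.7, "keyframe": 0.6, "style": 0.9}
PRESET_FACTOR = {"Fast": 1.0, "Balanced": 1.5, "Quality": 3.0}


def estimate_duration(mode: str, preset: str, frames: int) -> int:
    """Estimate GPU seconds for one call, clamped to [MIN_S, MAX_S]."""
    raw = (BASE_COST[mode] + PER_FRAME[mode] * frames) * PRESET_FACTOR[preset]
    return int(min(MAX_S, max(MIN_S, raw)))


def run_with_retry(run: Callable[[int], T], duration: int) -> T:
    """Call run(duration); on a ZeroGPU abort, retry exactly once at 2x duration."""
    try:
        return run(duration)
    except Exception as exc:
        if "GPU task aborted" not in str(exc):
            raise  # unrelated failures propagate unchanged
        return run(duration * 2)
```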

---

## Architecture

```
                                     ┌──────────────────────────────────┐
                          browser ──▶│   app.py: Gradio Blocks          │
                                     │   header · drawer · 6 mode tabs  │
                                     └──────────────────┬───────────────┘
                                                        │
                                                        ▼
                                     ┌──────────────────────────────────┐
                                     │   backend.py                     │
                                     │   ComfyUILibraryBackend          │
                                     │   @spaces.GPU(duration=callable) │
                                     │   calls PromptExecutor directly  │
                                     └──────────────────┬───────────────┘
                                                        │
       ┌──────────────┬──────────────┬──────────────────┴──────┬──────────────────┐
       ▼              ▼              ▼                         ▼                  ▼
   modes.py       models.py      workflow.py                ui.py              tools/
   per-mode       walk + ensure  load + patch               per-mode form      extract_modes.py
   parameterize   from HF cache  API-format JSON            builders           (regen workflows/)
                                                        │
                                                        ▼
                                     ┌──────────────────────────────────┐
                                     │   comfyui/                       │
                                     │   submodule (local)              │
                                     │   runtime clone at ~/comfyui     │
                                     │   on HF Spaces                   │
                                     │                                  │
                                     │   ├── custom_nodes/ (pinned SHAs)│
                                     │   └── models/ → HF cache symlinks│
                                     └──────────────────────────────────┘
```

**One backend, one process.** The `@spaces.GPU` decorator is the only divergence between the local and Spaces runtimes. ComfyUI manages VRAM through its tiered presets, so no `empty_cache()` calls need to be sprinkled elsewhere.
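One common way to keep `@spaces.GPU` as the only local/Spaces divergence is a no-op stand-in when the `spaces` package is not installed. This is a sketch of that pattern, not the app's actual code. It handles both the bare `@GPU` form and the `@GPU(duration=...)` form; the `generate` function and its constant 120 s duration are hypothetical. The `spaces` package is documented as effect-free outside ZeroGPU, so the stand-in only matters when the package is absent altogether.

```python
import functools

try:
    from spaces import GPU  # available on Hugging Face Spaces
except ImportError:
    def GPU(func=None, **_kwargs):
        """Local no-op stand-in for spaces.GPU; duration and other options are ignored."""
        def decorate(f):
            @functools.wraps(f)
            def wrapper(*args, **kwargs):
                return f(*args, **kwargs)
            return wrapper
        # Bare @GPU passes the function directly; @GPU(duration=...) does not.
        return decorate(func) if callable(func) else decorate


@GPU(duration=lambda *args, **kwargs: 120)  # callable duration, hypothetical value
def generate(prompt: str) -> str:
    return f"video for {prompt!r}"
```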

**Workflow as data.** Each of the six modes is a user-exported API-format JSON in `workflows/`. The mode handler patches a deep copy of the template (`modes.parameterize_fn`) and hands it to ComfyUI's `PromptExecutor`. Updating the master workflow takes three steps: edit it in the ComfyUI editor → export it → run `python tools/extract_modes.py --master ... --out workflows`.
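ComfyUI's API-format JSON maps each node id to an object with `class_type` and `inputs`, so a parameterize step can be as small as deep-copying the template and overwriting named inputs. The node ids, the two-node template, and the `parameterize` helper below are hypothetical stand-ins; the real ones live in `modes.py` and `workflows/*.json`. The hand-off to `PromptExecutor` is not shown.

```python
import copy
import json
from typing import Any

# Tiny stand-in for a workflows/*.json template (node ids are made up).
TEMPLATE: dict[str, Any] = json.loads("""
{
  "6":  {"class_type": "CLIPTextEncode", "inputs": {"text": "", "clip": ["4", 0]}},
  "25": {"class_type": "RandomNoise", "inputs": {"noise_seed": 0}}
}
""")


def parameterize(template: dict[str, Any], overrides: dict[tuple[str, str], Any]) -> dict[str, Any]:
    """Return a patched deep copy; the shared template is never mutated."""
    prompt = copy.deepcopy(template)
    for (node_id, input_name), value in overrides.items():
        inputs = prompt[node_id]["inputs"]  # KeyError surfaces a stale node id early
        if input_name not in inputs:
            raise KeyError(f"node {node_id} has no input {input_name!r}")
        inputs[input_name] = value
    return prompt


patched = parameterize(TEMPLATE, {("6", "text"): "a red fox at dawn", ("25", "noise_seed"): 42})
```

Failing loudly on unknown node ids or input names catches drift after `workflows/` is regenerated, instead of silently queuing an unpatched graph.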

---

## Project layout

```
.
├── app.py              # Gradio Blocks entry, _bootstrap, _on_generate, mode tabs
├── backend.py          # ComfyUILibraryBackend, @spaces.GPU, duration estimator
├── modes.py            # MODE_REGISTRY + per-mode parameterize_fn + node-id constants
├── models.py           # MODEL_REGISTRY, walk_workflow_for_models, ensure_models
├── ui.py               # render_status, _render_idle, mode-form layout primitives
├── workflow.py         # load_template, set_input helpers
├── workflows/          # API-format mode JSONs (do not hand-edit)
│   ├── t2v.json
│   ├── i2v.json
│   ├── a2v.json
│   ├── lipsync.json
│   ├── keyframe.json
│   └── style.json
├── assets/seed_inputs/ # placeholder image / audio / video for cold-start staging
├── tools/
│   └── extract_modes.py  # regenerate workflows/ from a master ComfyUI export
├── docs/
│   ├── future_improvements.md
│   └── superpowers/{specs,plans}/  # spec + implementation plans per feature
├── tests/              # L1 + L3 in CI; L2 with --comfy-real; L4 GPU smoke
├── README.md           # this file (HF Space YAML + project intro)
├── CLAUDE.md           # project facts + gotchas (what & why)
├── AGENTS.md           # tool-agnostic agent rulebook
├── SKILLS.md           # process / debugging / deployment (how)
├── requirements.txt    # pinned deps
├── pyproject.toml      # ruff + pytest config (py311)
├── setup.sh            # venv + ComfyUI + custom nodes bootstrap
└── comfyui/            # git submodule (local) / runtime clone target (Spaces)
```

---

## Tech stack

- **[Gradio 5.50](https://gradio.app/)**: UI shell, native components, `gr.Progress(track_tqdm=True)`
- **[ComfyUI](https://github.com/comfyanonymous/ComfyUI)**: library-mode `PromptExecutor` (pinned commit; submodule locally, runtime-cloned on Spaces)
- **[LTX-2.3 22B](https://huggingface.co/Lightricks/LTX-2.3)** by Lightricks: primary diffusion transformer (BF16 weights via [Kijai/LTX2.3_comfy](https://huggingface.co/Kijai/LTX2.3_comfy))
- **[Gemma 3 12B](https://huggingface.co/google/gemma-3-12b-it)** by Google: multimodal text encoder (requires the full 5-shard model; text-only checkpoints crash on meta-tensor allocation in SDPA)
- **Custom nodes** (pinned SHAs in `app.CUSTOM_NODES_PINNED`):
  - [Lightricks/ComfyUI-LTXVideo](https://github.com/Lightricks/ComfyUI-LTXVideo): LTX sampler / decoder nodes
  - [kijai/ComfyUI-KJNodes](https://github.com/kijai/ComfyUI-KJNodes): utility nodes
  - [rgthree/rgthree-comfy](https://github.com/rgthree/rgthree-comfy): Power-Lora-Loader
  - [Kosinkadink/ComfyUI-VideoHelperSuite](https://github.com/Kosinkadink/ComfyUI-VideoHelperSuite): video I/O
  - [pythongosssss/ComfyUI-Custom-Scripts](https://github.com/pythongosssss/ComfyUI-Custom-Scripts): string / dict helpers
  - [city96/ComfyUI-GGUF](https://github.com/city96/ComfyUI-GGUF): GGUF transformer loader
  - [Fannovel16/comfyui_controlnet_aux](https://github.com/Fannovel16/comfyui_controlnet_aux): DWPose for Lipsync/Style preprocessors
  - [evanspearman/ComfyMath](https://github.com/evanspearman/ComfyMath): math nodes for the workflow's keyframe path
  - [Smirnov75/ComfyUI-mxToolkit](https://github.com/Smirnov75/ComfyUI-mxToolkit): utility nodes
  - [DoctorDiffusion/ComfyUI-MediaMixer](https://github.com/DoctorDiffusion/ComfyUI-MediaMixer): `FinalFrameSelector`
- **[HF Spaces ZeroGPU](https://huggingface.co/zero-gpu)** (A10G): `@spaces.GPU(duration=…)` for queue-priority signalling and per-call timeout
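Because a partial Gemma checkpoint is a known failure mode, a cheap pre-flight check is to read the sharded checkpoint's `model.safetensors.index.json` (the standard Hugging Face sharded-checkpoint index, whose `weight_map` maps each tensor name to a shard file) and confirm that every referenced shard exists. The helper name is hypothetical; the file names follow the `gemma-3-12b-it/` entries in the YAML header.

```python
import json
from pathlib import Path


def missing_shards(model_dir: Path) -> list[str]:
    """List shard files referenced by the index but absent from model_dir."""
    index = json.loads((model_dir / "model.safetensors.index.json").read_text())
    shards = sorted(set(index["weight_map"].values()))
    return [name for name in shards if not (model_dir / name).is_file()]
```

Raising on a non-empty result before the graph executes turns an opaque meta-tensor crash inside SDPA into a readable "shard missing" error.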

---

## Design

Theme: **Topaz Cinema Slate**, with a slate substrate (`#1A1F26`), a warm amber accent (`#E0A458`) used sparingly, and IBM Plex Sans throughout. It is defined as `_TOPAZ_THEME` + `_CUSTOM_CSS` in `app.py`.

Layout: hamburger drawer. At ≥1024 px a 220 px sidebar (mode buttons, model status, settings) is pinned; below 1024 px it slides in as a fixed overlay via the `.aio-shell.drawer-open` class. The header carries a live mode tag (T2V/A2V/I2V/LIPSYNC/KEY/STYLE) that JS updates without a server round-trip.

Spec, plan, and design rationale live under `docs/superpowers/specs/` and `docs/superpowers/plans/`.

---

## Notes on running

- **First inference is slow.** Cold-start workflow validation plus model loading for the active node graph takes roughly 30 to 90 s. Subsequent calls within the same session reuse the loaded models.
- **VRAM tier** is auto-detected; override with `LTX23_AIO_VRAM=lowvram|normalvram|highvram`.
- **ZeroGPU duration cap.** The per-call estimator clamps to `[60, 900] s`. If a generation aborts with `"GPU task aborted"`, the handler retries once at 2× the duration. The duration value is a queue-priority signal, not a billing cap.
- **Output directory.** Local: `comfyui/output/LTX2.3/`. Spaces: `~/comfyui/output/LTX2.3/`. Both are whitelisted via `allowed_paths=` at launch (Gradio 5 file-access policy).
- **Local LAN testing.** The app binds `0.0.0.0:7860`. On macOS, allow incoming connections for `python` in the firewall settings if a connection from your phone is refused.
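A sketch of resolving the output directory and the matching `allowed_paths` entry. Detecting Spaces through the `SPACE_ID` environment variable (which Hugging Face sets inside a Space) is an assumption about how the app tells the two runtimes apart. `launch_kwargs` is a hypothetical helper whose result would be unpacked into Gradio's `demo.launch(**...)`; `server_name`, `server_port`, and `allowed_paths` are existing `launch()` parameters.

```python
from collections.abc import Mapping
from pathlib import Path


def launch_kwargs(env: Mapping[str, str], repo_root: Path, home: Path) -> dict:
    """Pick the output dir for local vs. Spaces and whitelist it for Gradio."""
    comfy_root = home / "comfyui" if env.get("SPACE_ID") else repo_root / "comfyui"
    output_dir = comfy_root / "output" / "LTX2.3"
    return {
        "server_name": "0.0.0.0",  # reachable from other devices on the LAN
        "server_port": 7860,
        "allowed_paths": [str(output_dir)],  # Gradio 5 file-access whitelist
    }
```

Passing the environment and home directory in explicitly keeps the helper testable without touching the real process environment.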

---

## License

MIT for the AIO app code (see `LICENSE`).

- [ComfyUI](https://github.com/comfyanonymous/ComfyUI) is GPL-3.0.
- LTX-2.3 and Lightricks-published LoRAs / auxiliaries retain Lightricks' open-source licensing; see the individual model cards on Hugging Face.
- Gemma 3 weights are subject to Google's [Gemma Terms of Use](https://ai.google.dev/gemma/terms).
- Each pinned custom node retains its own license; see the linked repositories.

## Credits

- **LTX-2.3** by [Lightricks](https://github.com/Lightricks)
- **ComfyUI** by [comfyanonymous](https://github.com/comfyanonymous)
- **Gemma 3** by [Google DeepMind](https://github.com/google-deepmind)
- **All-In-One ComfyUI workflow** that this app wraps, by [Danielle Falco](https://www.youtube.com/@FutuTek) (FutuTek)
- **Workflow nodes** by Lightricks, [kijai](https://github.com/kijai), [rgthree](https://github.com/rgthree), [Kosinkadink](https://github.com/Kosinkadink), [pythongosssss](https://github.com/pythongosssss), [city96](https://github.com/city96), [Fannovel16](https://github.com/Fannovel16), [evanspearman](https://github.com/evanspearman), [Smirnov75](https://github.com/Smirnov75), [DoctorDiffusion](https://github.com/DoctorDiffusion)

Built by [@techfreakworm](https://huggingface.co/techfreakworm). Drop a ♥ on the [Space](https://huggingface.co/spaces/techfreakworm/LTX2.3-Studio) if it's useful, and follow there for what's next.