---
title: LTX 2.3 Studio
emoji: 🎬
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: "5.50.0"
app_file: app.py
python_version: "3.11"
suggested_hardware: zero-a10g
hf_oauth: false
preload_from_hub:
- Comfy-Org/ltx-2 split_files/text_encoders/gemma_3_12B_it.safetensors
- Kijai/LTX2.3_comfy diffusion_models/ltx-2.3-22b-dev_transformer_only_bf16.safetensors,loras/ltx-2.3-22b-distilled-lora-dynamic_fro09_avg_rank_105_bf16.safetensors,text_encoders/ltx-2.3_text_projection_bf16.safetensors,vae/LTX23_audio_vae_bf16.safetensors,vae/LTX23_video_vae_bf16.safetensors,vae/taeltx2_3.safetensors
- Lightricks/LTX-2-19b-IC-LoRA-Detailer ltx-2-19b-ic-lora-detailer.safetensors
- Lightricks/LTX-2-19b-LoRA-Camera-Control-Jib-Down ltx-2-19b-lora-camera-control-jib-down.safetensors
- Lightricks/LTX-2-19b-LoRA-Camera-Control-Jib-Up ltx-2-19b-lora-camera-control-jib-up.safetensors
- Lightricks/LTX-2-19b-LoRA-Camera-Control-Static ltx-2-19b-lora-camera-control-static.safetensors
- Lightricks/LTX-2.3 ltx-2.3-22b-distilled-lora-384.safetensors,ltx-2.3-spatial-upscaler-x2-1.0.safetensors
- Lightricks/LTX-2.3-22b-IC-LoRA-Union-Control ltx-2.3-22b-ic-lora-union-control-ref0.5.safetensors
- google/gemma-3-12b-it-qat-q4_0-unquantized gemma-3-12b-it/model-00001-of-00005.safetensors,gemma-3-12b-it/model-00002-of-00005.safetensors,gemma-3-12b-it/model-00003-of-00005.safetensors,gemma-3-12b-it/model-00004-of-00005.safetensors,gemma-3-12b-it/model-00005-of-00005.safetensors,gemma-3-12b-it/model.safetensors.index.json,gemma-3-12b-it/preprocessor_config.json,gemma-3-12b-it/tokenizer.model
---
# LTX 2.3 Studio
A single-process Gradio app that wraps [LTX-2.3](https://huggingface.co/Lightricks/LTX-2.3), Lightricks' open 22B video generation model, under one focused UI. Six modes (text · image · audio · lipsync · keyframe · style) share the same ComfyUI All-In-One workflow. Runs locally on Apple Silicon (MPS) or NVIDIA (CUDA) and deploys to Hugging Face Spaces (ZeroGPU).
**Live demo:** https://huggingface.co/spaces/techfreakworm/LTX2.3-Studio
---
## What's inside
All six modes are wired through the same ComfyUI All-In-One workflow. Each mode exposes only the inputs it actually consumes, so the form stays short and focused.
| Mode | Inputs | Output | Notes |
|---|---|---|---|
| **Text → Video** | Prompt (+ optional audio prompt) | mp4 (+ optional wav) | The core mode. Camera-control LoRAs auto-applied by keyword. |
| **Audio → Video** | Prompt + audio track | mp4 with the input audio preserved | Conditions motion on the audio waveform. |
| **Image → Video** | Image + prompt | mp4 (+ optional audio) | Image-conditioned generation. |
| **Lipsync** | Image + audio | mp4 with audio | Viseme-aligned mouth motion. |
| **Keyframe** | First + last frames + prompt | mp4 | Latent interpolation between two anchors. |
| **Style Transfer** | Source video + style image | mp4 | IC-LoRA restyle; motion preserved from source. |
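
The keyword-driven LoRA selection mentioned in the Text-to-Video row could work roughly as in the sketch below. The trigger phrases and the `pick_camera_lora` helper are assumptions made for illustration; only the LoRA file names are taken from the `preload_from_hub` list at the top of this file. The match is a naive substring check.

```python
from typing import Optional

# Hypothetical trigger phrases; the file names match the preload_from_hub entries.
CAMERA_LORAS = {
    "jib up": "ltx-2-19b-lora-camera-control-jib-up.safetensors",
    "jib down": "ltx-2-19b-lora-camera-control-jib-down.safetensors",
    "static": "ltx-2-19b-lora-camera-control-static.safetensors",
}


def pick_camera_lora(prompt: str) -> Optional[str]:
    """Return the first camera-control LoRA whose trigger phrase appears in the prompt."""
    lowered = prompt.lower()
    for phrase, filename in CAMERA_LORAS.items():
        if phrase in lowered:
            return filename
    return None  # no camera keyword found, so no LoRA is applied
```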
Every mode carries **Fast / Balanced / Quality** presets (steps × 1, × 1.5, × 3). A per-mode ZeroGPU duration estimator adapts the call timeout to the requested workload.
---
## Quick start (local)
Requires **Python 3.11**, ~80 GB free disk for the weight set, and ~24 GB VRAM (CUDA) or ~32 GB unified memory (Apple Silicon).
```bash
git clone --recurse-submodules https://github.com/techfreakworm/ltx2.3-AIO-generator
cd ltx2.3-AIO-generator
bash setup.sh # creates .venv, installs ComfyUI + pinned custom nodes + app deps
source .venv/bin/activate
python app.py # http://127.0.0.1:7860
```
The first run resolves model weights into your HF cache (`~/.cache/huggingface/hub/`) and symlinks them into `comfyui/models/<comfy_type>/`. Subsequent starts skip the download. Expect ~70 GB of weights pulled on a cold first run.
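
A minimal sketch of the symlink step, assuming the weight file already sits in the HF cache (for example, at the path returned by `huggingface_hub.hf_hub_download`). The `link_into_comfy` helper and its arguments are hypothetical and not taken from `models.py`.

```python
import os
from pathlib import Path


def link_into_comfy(cached_file: Path, models_root: Path, comfy_type: str) -> Path:
    """Symlink a cached weight file into models_root/<comfy_type>/ and return the link path."""
    target_dir = models_root / comfy_type
    target_dir.mkdir(parents=True, exist_ok=True)
    link = target_dir / cached_file.name
    if link.is_symlink() or link.exists():
        return link  # already linked on a previous start
    os.symlink(cached_file.resolve(), link)
    return link
```

Because the function returns early when the link is already present, calling it on every start is cheap, which fits the "subsequent starts skip the download" behaviour described above.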
**Apple Silicon notes.** `PYTORCH_ENABLE_MPS_FALLBACK=1` is set automatically so the few MPS-unsupported ops fall back to CPU. ComfyUI's VRAM autodetect picks the right tier; override with `LTX23_AIO_VRAM=lowvram|normalvram|highvram` if you need to force one.
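
One way the `LTX23_AIO_VRAM` override could be turned into ComfyUI's `--lowvram` / `--normalvram` / `--highvram` flags is sketched below. The `vram_args` helper is hypothetical, and how the app actually forwards the tier to ComfyUI is an assumption.

```python
import os
from typing import List, Mapping, Optional

VALID_TIERS = ("lowvram", "normalvram", "highvram")


def vram_args(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a ComfyUI-style flag for a forced VRAM tier, or [] to keep autodetect."""
    env = os.environ if env is None else env
    tier = env.get("LTX23_AIO_VRAM", "").strip().lower()
    if not tier:
        return []  # variable unset: leave the choice to autodetect
    if tier not in VALID_TIERS:
        raise ValueError(f"LTX23_AIO_VRAM must be one of {VALID_TIERS}, got {tier!r}")
    return [f"--{tier}"]
```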
**LAN access** (phone / tablet on the same WiFi): `python app.py` binds `0.0.0.0:7860`. Visit `http://<your-LAN-IP>:7860` from another device. On macOS, allow inbound connections for `python` in System Settings → Network → Firewall if the connection is refused.
## Quick start (HF Spaces)
This repo is a Gradio Space. The Pro tier provides ZeroGPU (A10G) access and the per-call duration budget needed for the Balanced and Quality presets.
```bash
git remote add space https://huggingface.co/spaces/<your-handle>/LTX2.3-Studio
git push space master:main # local branch is master; HF Space deploys from main
```
> ⚠ The refspec `master:main` matters. The local default branch is `master` (GitHub convention); the HF Space deploys from `main`. A bare `git push space master` creates an orphan remote branch that does NOT trigger a deploy.
The Space's `preload_from_hub` directive (see the YAML at the top of this file) bakes ~111 GB of weights into the build image. `app.py:_bootstrap()` then:
1. Clones ComfyUI + pinned custom nodes into `~/comfyui` on cold start (ZeroGPU container freezes preserve them across calls)
2. Mirrors the read-only preload cache into `~/hf-cache-rw/`. This works around the build-user-vs-runtime-user permissions trap: preloaded files are root-owned, the app runs as uid 1000 and cannot write to them, so any lazy download to the cache would fail with `Permission denied`.
3. Stages seed input files into `comfyui/input/` so workflow loaders don't error before any user upload arrives
Subsequent requests hit the warm cache, so there is no network traffic from the second inference onward.
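
The cache-mirroring step can be sketched as follows. This is an assumption about the approach, not the code in `_bootstrap()`: it rebuilds the directory tree in a writable location and symlinks each file back to the read-only original, so later downloads can write alongside the preloaded files. The `mirror_cache` name is hypothetical.

```python
import os
from pathlib import Path


def mirror_cache(read_only_root: Path, writable_root: Path) -> int:
    """Recreate the tree under writable_root with symlinks to the originals.

    Returns the number of links created on this call.
    """
    src_root = read_only_root.resolve()
    created = 0
    for dirpath, _dirnames, filenames in os.walk(src_root):
        dest_dir = writable_root / Path(dirpath).relative_to(src_root)
        dest_dir.mkdir(parents=True, exist_ok=True)  # directories are real and writable
        for name in filenames:
            dest = dest_dir / name
            if dest.is_symlink() or dest.exists():
                continue  # mirrored on an earlier cold start
            os.symlink(Path(dirpath) / name, dest)
            created += 1
    return created
```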
**ZeroGPU duration estimator.** Each generate call carries a dynamic `@spaces.GPU(duration=N)` calculated from mode, preset, and frame count. The value is clamped to `[60, 900] s`. On timeout (`"GPU task aborted"`), the handler auto-retries once at 2× duration.
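
The estimate, clamp, and retry logic can be sketched as below. The per-frame costs, the use of `RuntimeError`, and the decision to keep the retry inside the same 900 s ceiling are all assumptions for illustration; only the clamp range, the 2× retry, and the preset multipliers come from this README.

```python
from typing import Callable

MIN_S, MAX_S = 60, 900  # clamp range stated above

# Hypothetical seconds-per-frame costs; the real estimator's constants are not shown here.
SECONDS_PER_FRAME = {"t2v": 1.0, "i2v": 1.0, "a2v": 1.2, "lipsync": 1.5, "keyframe": 1.2, "style": 2.0}
PRESET_FACTOR = {"fast": 1.0, "balanced": 1.5, "quality": 3.0}


def estimate_duration(mode: str, preset: str, frames: int) -> int:
    """Estimate the GPU seconds for one call and clamp to [MIN_S, MAX_S]."""
    raw = SECONDS_PER_FRAME[mode] * PRESET_FACTOR[preset] * frames
    return int(min(MAX_S, max(MIN_S, raw)))


def run_with_retry(generate: Callable[[int], str], duration: int) -> str:
    """Call generate(duration); on 'GPU task aborted' retry once at 2x duration."""
    try:
        return generate(duration)
    except RuntimeError as exc:  # assumed exception type
        if "GPU task aborted" not in str(exc):
            raise
        return generate(min(MAX_S, duration * 2))
```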
---
## Architecture
```
                 ┌──────────────────────────────────┐
   browser  ───▶ │   app.py: Gradio Blocks          │
                 │   header · drawer · 6 mode tabs  │
                 └──────────────────┬───────────────┘
                                    │
                                    ▼
                 ┌──────────────────────────────────┐
                 │   backend.py                     │
                 │   ComfyUILibraryBackend          │
                 │   @spaces.GPU(duration=callable) │
                 │   calls PromptExecutor directly  │
                 └──────────────────┬───────────────┘
                                    │
   ┌─────────────┬──────────────┬───┴───────────────┬───────────────┐
   ▼             ▼              ▼                   ▼               ▼
 modes.py      models.py      workflow.py        ui.py          tools/
 per-mode      walk + ensure  load + patch       per-mode form  extract_modes.py
 parameterize  from HF cache  API-format JSON    builders       (regen workflows/)
                                    │
                                    ▼
                 ┌──────────────────────────────────┐
                 │   comfyui/                       │
                 │     submodule (local)            │
                 │     runtime clone at ~/comfyui   │
                 │       on HF Spaces               │
                 │                                  │
                 │   ├── custom_nodes/ (pinned SHAs)│
                 │   └── models/ → HF cache symlinks│
                 └──────────────────────────────────┘
```
**One backend, one process.** The `@spaces.GPU` decorator is the only divergence between local and Spaces runtime. ComfyUI manages VRAM via its tiered presets, so no `empty_cache()` sprinkling is needed elsewhere.
**Workflow as data.** Each of the six modes is a user-exported API-format JSON in `workflows/`. The mode handler patches a deep-copied template (`modes.parameterize_fn`) and hands it to ComfyUI's `PromptExecutor`. Updating the master workflow is a three-step ritual: edit in the ComfyUI editor → export → `python tools/extract_modes.py --master ... --out workflows`.
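
The template-patching idea can be illustrated with the sketch below. It relies on ComfyUI's API format, where each node id maps to a `class_type` and an `inputs` dict. The node ids, the `ExampleSampler` class, and the `set_input` / `parameterize` signatures are made up for this example and may differ from the helpers in `workflow.py` and `modes.py`.

```python
import copy
from typing import Any, Dict

Workflow = Dict[str, Dict[str, Any]]

# Tiny stand-in template in API format: node id -> class_type + inputs.
# Node ids and "ExampleSampler" are invented for the example.
TEMPLATE: Workflow = {
    "6": {"class_type": "CLIPTextEncode", "inputs": {"text": "", "clip": ["4", 0]}},
    "12": {"class_type": "ExampleSampler", "inputs": {"steps": 8, "seed": 0}},
}


def set_input(workflow: Workflow, node_id: str, name: str, value: Any) -> None:
    """Overwrite one input on one node, failing loudly if the input is missing."""
    inputs = workflow[node_id]["inputs"]
    if name not in inputs:
        raise KeyError(f"node {node_id} has no input {name!r}")
    inputs[name] = value


def parameterize(template: Workflow, prompt: str, steps: int, seed: int) -> Workflow:
    """Return a patched deep copy; the shared template is never mutated."""
    wf = copy.deepcopy(template)
    set_input(wf, "6", "text", prompt)
    set_input(wf, "12", "steps", steps)
    set_input(wf, "12", "seed", seed)
    return wf
```

Deep-copying before patching keeps the loaded template reusable across requests, and the `KeyError` surfaces a stale node id early if the exported workflow changes.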
---
## Project layout
```
.
├── app.py                 # Gradio Blocks entry, _bootstrap, _on_generate, mode tabs
├── backend.py             # ComfyUILibraryBackend, @spaces.GPU, duration estimator
├── modes.py               # MODE_REGISTRY + per-mode parameterize_fn + node-id constants
├── models.py              # MODEL_REGISTRY, walk_workflow_for_models, ensure_models
├── ui.py                  # render_status, _render_idle, mode-form layout primitives
├── workflow.py            # load_template, set_input helpers
├── workflows/             # API-format mode JSONs (do not hand-edit)
│   ├── t2v.json
│   ├── i2v.json
│   ├── a2v.json
│   ├── lipsync.json
│   ├── keyframe.json
│   └── style.json
├── assets/seed_inputs/    # placeholder image / audio / video for cold-start staging
├── tools/
│   └── extract_modes.py   # regenerate workflows/ from a master ComfyUI export
├── docs/
│   ├── future_improvements.md
│   └── superpowers/{specs,plans}/   # spec + implementation plans per feature
├── tests/                 # L1 + L3 in CI; L2 with --comfy-real; L4 GPU smoke
├── README.md              # this file (HF Space YAML + project intro)
├── CLAUDE.md              # project facts + gotchas (what & why)
├── AGENTS.md              # tool-agnostic agent rulebook
├── SKILLS.md              # process / debugging / deployment (how)
├── requirements.txt       # pinned deps
├── pyproject.toml         # ruff + pytest config (py311)
├── setup.sh               # venv + ComfyUI + custom nodes bootstrap
└── comfyui/               # git submodule (local) / runtime clone target (Spaces)
```
---
## Tech stack
- **[Gradio 5.50](https://gradio.app/)**: UI shell, native components, `gr.Progress(track_tqdm=True)`
- **[ComfyUI](https://github.com/comfyanonymous/ComfyUI)**: library-mode `PromptExecutor` (pinned commit; submodule locally, runtime-cloned on Spaces)
- **[LTX-2.3 22B](https://huggingface.co/Lightricks/LTX-2.3)** by Lightricks: primary diffusion transformer (BF16 weights via [Kijai/LTX2.3_comfy](https://huggingface.co/Kijai/LTX2.3_comfy))
- **[Gemma 3 12B](https://huggingface.co/google/gemma-3-12b-it)** by Google: multimodal text encoder (requires the full 5-shard model; text-only checkpoints crash on meta-tensor allocation in SDPA)
- **Custom nodes** (pinned SHAs in `app.CUSTOM_NODES_PINNED`):
  - [Lightricks/ComfyUI-LTXVideo](https://github.com/Lightricks/ComfyUI-LTXVideo): LTX sampler / decoder nodes
  - [kijai/ComfyUI-KJNodes](https://github.com/kijai/ComfyUI-KJNodes): utility nodes
  - [rgthree/rgthree-comfy](https://github.com/rgthree/rgthree-comfy): Power-Lora-Loader
  - [Kosinkadink/ComfyUI-VideoHelperSuite](https://github.com/Kosinkadink/ComfyUI-VideoHelperSuite): video I/O
  - [pythongosssss/ComfyUI-Custom-Scripts](https://github.com/pythongosssss/ComfyUI-Custom-Scripts): string / dict helpers
  - [city96/ComfyUI-GGUF](https://github.com/city96/ComfyUI-GGUF): GGUF transformer loader
  - [Fannovel16/comfyui_controlnet_aux](https://github.com/Fannovel16/comfyui_controlnet_aux): DWPose for Lipsync/Style preprocessors
  - [evanspearman/ComfyMath](https://github.com/evanspearman/ComfyMath): math nodes for the workflow's keyframe path
  - [Smirnov75/ComfyUI-mxToolkit](https://github.com/Smirnov75/ComfyUI-mxToolkit): utility nodes
  - [DoctorDiffusion/ComfyUI-MediaMixer](https://github.com/DoctorDiffusion/ComfyUI-MediaMixer): `FinalFrameSelector`
- **[HF Spaces ZeroGPU](https://huggingface.co/zero-gpu)** (A10G): `@spaces.GPU(duration=…)` for queue-priority signalling and per-call timeout
---
## Design
Theme: **Topaz Cinema Slate**: slate substrate `#1A1F26`, warm amber accent `#E0A458` used sparingly, IBM Plex Sans throughout. Defined as `_TOPAZ_THEME` + `_CUSTOM_CSS` in `app.py`.
Layout: hamburger drawer. Pinned 220 px sidebar at ≥1024 px (mode buttons + model status + settings); below 1024 px it slides in as a fixed overlay via the `.aio-shell.drawer-open` class. The header carries a live mode tag (T2V/A2V/I2V/LIPSYNC/KEY/STYLE) updated by JS without a server round-trip.
Spec, plan, and design rationale live under `docs/superpowers/specs/` and `docs/superpowers/plans/`.
---
## Notes on running
- **First inference is slow.** Cold-start workflow validation + model load on the active node graph takes ~30 to 90 s. Subsequent calls within the same session reuse loaded models.
- **VRAM tier** is auto-detected; override with `LTX23_AIO_VRAM=lowvram|normalvram|highvram`.
- **ZeroGPU duration cap.** The per-call estimator clamps to `[60, 900] s`. If a generation aborts with `"GPU task aborted"`, the handler retries once at 2× duration. The duration field is the queue-priority signal, not a billing cap.
- **Output directory.** Local: `comfyui/output/LTX2.3/`. Spaces: `~/comfyui/output/LTX2.3/`. Both are whitelisted via `allowed_paths=` on launch (Gradio 5 file-access policy).
- **Local LAN testing.** Bound to `0.0.0.0:7860`. macOS firewall: allow inbound connections for `python` if a connection from your phone is refused.
---
## License
MIT for the AIO app code (see `LICENSE`).
- [ComfyUI](https://github.com/comfyanonymous/ComfyUI) is GPL-3.0.
- LTX-2.3 and Lightricks-published LoRAs / auxiliaries retain Lightricks' open-source licensing; see the individual model cards on Hugging Face.
- Gemma 3 weights are subject to Google's [Gemma Terms of Use](https://ai.google.dev/gemma/terms).
- Each pinned custom node retains its own license; see the linked repositories.
## Credits
- **LTX-2.3** by [Lightricks](https://github.com/Lightricks)
- **ComfyUI** by [comfyanonymous](https://github.com/comfyanonymous)
- **Gemma 3** by [Google DeepMind](https://github.com/google-deepmind)
- **All-In-One ComfyUI workflow** that this app wraps, by [Danielle Falco](https://www.youtube.com/@FutuTek) (FutuTek)
- **Workflow nodes** by Lightricks, [kijai](https://github.com/kijai), [rgthree](https://github.com/rgthree), [Kosinkadink](https://github.com/Kosinkadink), [pythongosssss](https://github.com/pythongosssss), [city96](https://github.com/city96), [Fannovel16](https://github.com/Fannovel16), [evanspearman](https://github.com/evanspearman), [Smirnov75](https://github.com/Smirnov75), [DoctorDiffusion](https://github.com/DoctorDiffusion)
Built by [@techfreakworm](https://huggingface.co/techfreakworm). Drop a ♥ on the [Space](https://huggingface.co/spaces/techfreakworm/LTX2.3-Studio) if it's useful, and follow there for what's next.