| title: LTX 2.3 CPU | |
| emoji: 🎬 | |
| colorFrom: indigo | |
| colorTo: pink | |
| sdk: docker | |
| app_port: 7860 | |
| pinned: false | |
| license: other | |
| # LTX 2.3 CPU: Feasibility Reference + ZeroGPU Recipe | |
| 22B-parameter LTX-Video 2.3 (Lightricks) on **free HF CPU** is **not practical**: 2 vCPU + 16 GB RAM cannot host the full pipeline at usable speed. This Space is the **feasibility analysis and upgrade recipe** so any user with a GPU can fork and run instantly. | |
| ## TL;DR | |
| | Tier | Hardware | LTX 2.3 distilled-1.1 viable? (time per 2-sec clip) | Cost | | |
| |---|---|---|---| | |
| | Free CPU | 2 vCPU + 16 GB | No: models barely fit at Q3_K_M, ~60-120 min if it even completes | n/a | | |
| | CPU Upgrade | 8 vCPU + 32 GB | Marginal: ~30-60 min | $0.30/clip | | |
| | ZeroGPU | A100 quota slot | Yes: ~25-40 sec | free w/ Pro | | |
| | GPU L40S | 48 GB VRAM | Yes: ~8 sec | $1/hr | | |
| ## Model paths analysed | |
| - **Path A: Unsloth distilled-1.1 Q3_K_M** (`unsloth/LTX-2.3-GGUF` → `distilled-1.1/ltx-2.3-22b-distilled-1.1-Q3_K_M.gguf`, ~10.6 GB). Cleanest 8-step distilled DiT and the best CPU candidate (smallest weights). Requires the ComfyUI-GGUF loader. | |
| - **Path C: 10Eros fine-tune + cond_safe distill LoRA** (`vantagewithai/LTX2.3-10Eros-GGUF` + cond_safe LoRA). 10Eros is a *fine-tune*, NOT distilled; its README warns that *"larger distilled LoRAs will harm the model's fine tune"*. Riskier; needs LoRA tuning. Not a 1:1 replacement for Path A. | |
| Recommendation: **Path A** for the CPU build (smallest, distilled). Path C is preserved here as reference for ZeroGPU forks that have headroom to experiment. | |
| ## Text encoder constraint | |
| You **cannot swap** the text encoder. LTX 2.3 was trained with `google/gemma-3-12b-it`, so the diffusion transformer (DiT) is bound to that encoder's embedding space. Other LLMs, including newer or smaller ones such as Qwen3.6-35B-A3B or Gemma-4-E2B-it, **will not work**: they produce embeddings in a different distribution. | |
| The only valid lever is **quantising the same encoder smaller**: | |
| | Quant | Size | Quality vs FP16 | | |
| |---|---|---| | |
| | Gemma-3-12B-it Q3_K_M | 6.0 GB | ~98% | | |
| | Gemma-3-12B-it Q4_K_M | 7.4 GB | ~99.5% | | |
| | Gemma-3-12B-it Q5_K_M | 8.6 GB | ~99.9% | | |
| Use `mradermacher/gemma-3-12b-it-qat-abliterated-GGUF` Q3_K_M for the CPU path. | |
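To see how each encoder quant combines with the ~10.6 GB Path A DiT, the sizes from the table can be added up. This is a minimal sketch using only the README's own figures; `fmt_gb` and `weights_report` are hypothetical helper names, and the totals count weights only (no VAE, no activations), so real usage would be higher.

```bash
# Weights-only footprint: Path A DiT plus each Gemma-3-12B-it quant.
# Sizes are in tenths of a GB because shell arithmetic is integer-only.
DIT_TENTHS=106   # 10.6 GB Q3_K_M DiT (figure from this README)

# fmt_gb TENTHS: hypothetical helper, prints 166 as 16.6
fmt_gb() { printf '%d.%d' "$(( $1 / 10 ))" "$(( $1 % 10 ))"; }

# weights_report RAM_GB: hypothetical helper, one line per encoder quant.
# The body runs in a subshell so its variables stay local.
weights_report() (
  ram_tenths=$(( $1 * 10 ))
  for entry in Q3_K_M:60 Q4_K_M:74 Q5_K_M:86; do
    name=${entry%%:*}      # quant label
    enc=${entry##*:}       # encoder size in tenths of a GB
    total=$(( DIT_TENTHS + enc ))
    if [ "$total" -gt "$ram_tenths" ]; then verdict="over"; else verdict="within"; fi
    printf '%s: %s GB of weights, %s the %d GB limit\n' \
      "$name" "$(fmt_gb "$total")" "$verdict" "$1"
  done
)

weights_report 16
```

Even the smallest pairing is over the free tier's 16 GB before anything else is loaded, while `weights_report 32` reports all three as within the CPU Upgrade limit.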
| ## ZeroGPU fork recipe | |
| Fork this Space to your account, change `sdk: docker` to `sdk: gradio`, change the hardware tier to **ZeroGPU**, and replace `app.py` with the GPU variant in `gpu_app.py`. That's it. | |
| ```bash | |
| huggingface-cli repo duplicate WeReCooking/ltx-2.3-cpu YourUsername/ltx-2.3-zerogpu | |
| # Then edit README.md: sdk -> gradio, add: hardware: zerogpu | |
| # Edit Space settings on HF UI -> Hardware -> ZeroGPU | |
| ``` | |
| ## Curl test (once forked to a GPU tier) | |
| ```bash | |
| TOKEN="hf_xxx" | |
| SPACE="https://YourUsername-ltx-2-3-zerogpu.hf.space" | |
| EVT=$(curl -s -X POST "$SPACE/gradio_api/call/generate" \ | |
| -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \ | |
| -d '{"data":["A woman walking through a neon-lit Tokyo alley at night, cinematic", 2.0, 8]}' \ | |
| | python -c "import sys,json;print(json.load(sys.stdin)['event_id'])") | |
| curl -sN "$SPACE/gradio_api/call/generate/$EVT" -H "Authorization: Bearer $TOKEN" | |
| ``` | |
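The second `curl` call above streams server-sent events rather than returning a single JSON body. As a rough sketch, the final payload can be pulled out with `awk`, assuming the stream marks the result with an `event: complete` line followed by a `data:` line. `final_result` is a hypothetical helper name, and the sample stream below is made up for illustration; the real payload shape depends on what `gpu_app.py` returns.

```bash
# final_result: hypothetical helper. Reads an SSE stream on stdin and prints
# the JSON payload of the first data line that follows "event: complete".
final_result() {
  awk '
    /^event: complete/ { want = 1; next }
    want && /^data: /  { sub(/^data: /, ""); print; exit }
  '
}

# Made-up sample stream, for illustration only.
printf '%s\n' \
  'event: heartbeat' \
  'data: null' \
  '' \
  'event: complete' \
  'data: [{"path": "/tmp/gradio/example/out.mp4"}]' \
  | final_result
```

Against a live fork, the same helper would sit at the end of the pipeline: `curl -sN "$SPACE/gradio_api/call/generate/$EVT" -H "Authorization: Bearer $TOKEN" | final_result`.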
| ## Logs (SSE) | |
| ```bash | |
| curl -N -H "Authorization: Bearer $TOKEN" "https://huggingface.co/api/spaces/WeReCooking/ltx-2.3-cpu/logs/build" | |
| curl -N -H "Authorization: Bearer $TOKEN" "https://huggingface.co/api/spaces/WeReCooking/ltx-2.3-cpu/logs/run" | |
| ``` | |
| ## Why not ship inference on free CPU anyway | |
| I attempted the GGUF path locally. Findings: | |
| - 10.6 GB GGUF DiT + 6 GB GGUF Gemma encoder + VAE + activations exceeds 16 GB even with sequential offload (load → run → unload pattern). The encoder needs to stay resident during the DiT's classifier-free guidance branch (or be reloaded per step, which is ~50× slower). | |
| - 2 vCPU × 22B params at Q3_K_M ≈ 120 sec per diffusion step → the 8-step distilled schedule takes ~16 min for the DiT loop alone. Adding encode, VAE decode, and offload swaps gives a realistic 60-90 min for a 2-sec, 384×256 clip. The HF Space request timeout is 1 hour. The math doesn't close. | |
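The timing argument in the second finding can be checked with a few lines of shell arithmetic. This is a sketch using the estimates quoted above (they are estimates, not benchmarks); `dit_loop_min` is a hypothetical helper name.

```bash
# Figures quoted in the findings above; all are estimates.
SEC_PER_STEP=120   # approx. seconds per diffusion step on 2 vCPU
STEPS=8            # distilled-1.1 step count
TIMEOUT_MIN=60     # HF Space request timeout quoted above

# dit_loop_min SEC_PER_STEP STEPS: hypothetical helper, loop time in whole minutes
dit_loop_min() { echo $(( $1 * $2 / 60 )); }

loop=$(dit_loop_min "$SEC_PER_STEP" "$STEPS")
echo "DiT loop: ${loop} min"
echo "Left for encode, VAE decode, and offload swaps: $(( TIMEOUT_MIN - loop )) min"
```

With these inputs the loop takes 16 min, leaving 44 min for everything else. The 60-90 min end-to-end estimate implies 44-74 min of overhead, so even the optimistic case lands on the timeout with no margin.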
| The honest path on free CPU is **not to ship a broken Generate button**, but to ship the recipe and demos instead. | |