metadata
title: LTX 2.3 CPU
emoji: 🎬
colorFrom: indigo
colorTo: pink
sdk: docker
app_port: 7860
pinned: false
license: other

LTX 2.3 CPU: Feasibility Reference + ZeroGPU Recipe

Running the 22B-parameter LTX-Video 2.3 (Lightricks) on the free Hugging Face CPU tier is not practical: 2 vCPU + 16 GB RAM cannot host the full pipeline at usable speed. This Space holds the feasibility analysis and an upgrade recipe, so anyone with GPU access can fork it and run the model.

TL;DR

| Tier | Hardware | LTX 2.3 distilled-1.1 viable? | Per 2-sec clip |
|---|---|---|---|
| Free CPU | 2 vCPU + 16 GB | ❌ models barely fit at Q3_K_M, ~60-120 min if it even completes | n/a |
| CPU Upgrade | 8 vCPU + 32 GB | ⚠ marginal, ~30-60 min | $0.30/clip |
| ZeroGPU | A100 quota slot | ✅ ~25-40 sec | free w/ Pro |
| GPU L40S | 48 GB VRAM | ✅ ~8 sec | $1/hr |

Model paths analysed

  • Path A: Unsloth distilled-1.1 Q3_K_M (unsloth/LTX-2.3-GGUF → distilled-1.1/ltx-2.3-22b-distilled-1.1-Q3_K_M.gguf, ~10.6 GB). Cleanest 8-step distilled DiT. Best CPU candidate (smallest weights). Requires the ComfyUI-GGUF loader.
  • Path C: 10Eros fine-tune + cond_safe distill LoRA (vantagewithai/LTX2.3-10Eros-GGUF + cond_safe LoRA). 10Eros is a fine-tune, NOT distilled; its README warns "larger distilled LoRAs will harm the model's fine tune". Riskier; needs LoRA tuning. Not a 1:1 replacement for Path A.

Recommendation: Path A for the CPU build (smallest, distilled). Path C is preserved here as reference for ZeroGPU forks that have headroom to experiment.

Text encoder constraint

You cannot swap the text encoder. LTX 2.3 was trained with google/gemma-3-12b-it, so the diffusion transformer (DiT) is bound to its embedding space. Smaller or newer LLMs such as Qwen3.6-35B-A3B or Gemma-4-E2B-it will not work, because they produce embeddings from a different distribution.

The only valid lever is quantising the same encoder smaller:

| Quant | Size | Quality vs FP16 |
|---|---|---|
| Gemma-3-12B-it Q3_K_M | 6.0 GB | ~98% |
| Gemma-3-12B-it Q4_K_M | 7.4 GB | ~99.5% |
| Gemma-3-12B-it Q5_K_M | 8.6 GB | ~99.9% |

Use mradermacher/gemma-3-12b-it-qat-abliterated-GGUF Q3_K_M for the CPU path.
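To make the size trade-off concrete, here is a minimal sketch that picks the largest encoder quant fitting a given RAM budget. The sizes are copied from the table above; the helper name `pick_encoder_quant` and the budgets in the loop are hypothetical, and the comparison looks at file size only, not runtime overhead.

```python
# Encoder quant file sizes in GB, copied from the table above.
ENCODER_QUANTS_GB = {
    "Q3_K_M": 6.0,
    "Q4_K_M": 7.4,
    "Q5_K_M": 8.6,
}


def pick_encoder_quant(budget_gb):
    """Return the largest quant whose file fits in budget_gb, or None.

    Hypothetical helper: it compares file sizes only and ignores
    runtime overhead such as activations.
    """
    fitting = [
        (size, name)
        for name, size in ENCODER_QUANTS_GB.items()
        if size <= budget_gb
    ]
    if not fitting:
        return None
    # max() on (size, name) tuples selects the biggest file that still fits.
    return max(fitting)[1]


if __name__ == "__main__":
    # Illustrative budgets, not measurements.
    for budget in (5.0, 7.0, 8.0, 12.0):
        print(budget, pick_encoder_quant(budget))
```

On a tight budget the helper falls back to Q3_K_M, which matches the recommendation for the CPU path.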

ZeroGPU fork recipe

Fork this Space to your account, change sdk: docker → sdk: gradio, change the hardware tier to ZeroGPU, and replace app.py with the GPU variant in gpu_app.py. That's it.

huggingface-cli repo duplicate WeReCooking/ltx-2.3-cpu YourUsername/ltx-2.3-zerogpu
# Then edit README.md: sdk -> gradio, add: hardware: zerogpu
# Edit Space settings on HF UI -> Hardware -> ZeroGPU
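The README edit in the recipe can also be scripted. The sketch below rewrites only the `sdk:` key and leaves every other key untouched. It assumes the front matter sits between `---` lines at the top of the file, and `switch_sdk_to_gradio` is a hypothetical helper name, not part of any Hugging Face tooling.

```python
import re


def switch_sdk_to_gradio(readme_text):
    """Rewrite the `sdk:` key in a README's YAML front matter to `gradio`.

    Assumes the front matter is delimited by `---` lines at the very top
    of the file. Text without front matter is returned unchanged.
    """
    match = re.match(r"\A---\n(.*?\n)---\n", readme_text, flags=re.DOTALL)
    if match is None:
        return readme_text
    front = match.group(1)
    # Only touch the sdk line inside the front matter, never the body.
    new_front = re.sub(r"(?m)^sdk:[ \t]*\S+[ \t]*$", "sdk: gradio", front)
    return "---\n" + new_front + "---\n" + readme_text[match.end():]


if __name__ == "__main__":
    sample = "---\ntitle: LTX 2.3 CPU\nsdk: docker\napp_port: 7860\n---\n\nBody text.\n"
    print(switch_sdk_to_gradio(sample))
```

The hardware tier is still set in the Space settings, as in the steps above; this sketch does not change it.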

Curl test (once forked to a GPU tier)

TOKEN="hf_xxx"
SPACE="https://YourUsername-ltx-2-3-zerogpu.hf.space"

EVT=$(curl -s -X POST "$SPACE/gradio_api/call/generate" \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -d '{"data":["A woman walking through a neon-lit Tokyo alley at night, cinematic", 2.0, 8]}' \
  | python -c "import sys,json;print(json.load(sys.stdin)['event_id'])")
curl -sN "$SPACE/gradio_api/call/generate/$EVT" -H "Authorization: Bearer $TOKEN"
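The second curl call streams server-sent events rather than a single JSON body. A minimal sketch of how a client could pull the final payload out of that stream is shown below. It assumes the stream ends with an event named `complete` whose `data:` line is JSON, as in Gradio's curl flow; the function names and the sample payload are hypothetical.

```python
import json


def parse_sse(stream_text):
    """Split an SSE body into (event, data) pairs.

    Handles only the `event:` and `data:` fields, which is all this
    sketch needs; other SSE fields such as `id:` are ignored.
    """
    events = []
    event_name, data_lines = None, []
    for line in stream_text.splitlines():
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and (event_name is not None or data_lines):
            # A blank line terminates the current event.
            events.append((event_name, "\n".join(data_lines)))
            event_name, data_lines = None, []
    if event_name is not None or data_lines:
        # Flush a trailing event that was not followed by a blank line.
        events.append((event_name, "\n".join(data_lines)))
    return events


def final_result(stream_text):
    """Return the decoded JSON payload of the last `complete` event, or None."""
    for name, data in reversed(parse_sse(stream_text)):
        if name == "complete":
            return json.loads(data)
    return None


if __name__ == "__main__":
    # Hypothetical stream contents, for illustration only.
    sample = 'event: heartbeat\ndata: null\n\nevent: complete\ndata: ["clip.mp4"]\n\n'
    print(final_result(sample))
```

Feeding the saved output of the second curl command to `final_result` would return the decoded `data` list, or `None` if the stream ended without a `complete` event.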

Logs (SSE)

curl -N -H "Authorization: Bearer $TOKEN" "https://huggingface.co/api/spaces/WeReCooking/ltx-2.3-cpu/logs/build"
curl -N -H "Authorization: Bearer $TOKEN" "https://huggingface.co/api/spaces/WeReCooking/ltx-2.3-cpu/logs/run"

Why not ship inference on free CPU anyway

I attempted the GGUF path locally. Findings:

  • 10.6 GB GGUF DiT + 6 GB GGUF Gemma encoder + VAE + activations exceeds 16 GB even with sequential offload (load → run → unload). The encoder needs to stay resident during the DiT's classifier-free guidance branch (or be reloaded per step, which is 50× slower).
  • 2 vCPU × 22B params at Q3_K_M ≈ 120 sec per diffusion step → the 8-step distilled schedule takes ~16 min for the DiT loop alone. Add encode + VAE decode + offload swaps and a 2-sec, 384×256 clip realistically takes 60-90 min. The HF Space request timeout is 1 hour. The math doesn't close.
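The two findings above reduce to simple arithmetic, sketched here with the figures quoted in this README. The function names are hypothetical, and the VAE and activation sizes are left out because no numbers are given for them; the weights alone already overshoot the RAM limit.

```python
def weights_headroom_gb(ram_gb, dit_gb, encoder_gb):
    """RAM left after loading DiT and encoder weights (negative = does not fit)."""
    return ram_gb - (dit_gb + encoder_gb)


def dit_loop_seconds(sec_per_step, steps):
    """Total time spent in the diffusion loop."""
    return sec_per_step * steps


def remaining_budget_seconds(timeout_sec, sec_per_step, steps):
    """Time left inside the request timeout for everything except the DiT loop."""
    return timeout_sec - dit_loop_seconds(sec_per_step, steps)


if __name__ == "__main__":
    # Figures from this README: 16 GB RAM, 10.6 GB DiT, 6 GB encoder,
    # ~120 s per step, 8 steps, 1 hour request timeout.
    headroom = weights_headroom_gb(16.0, 10.6, 6.0)
    loop = dit_loop_seconds(120, 8)
    left = remaining_budget_seconds(3600, 120, 8)
    print(f"headroom before VAE/activations: {headroom:.1f} GB")
    print(f"DiT loop: {loop / 60:.0f} min")
    print(f"left for encode + VAE decode + swaps: {left / 60:.0f} min")
```

With these inputs the headroom is negative before the VAE and activations are counted, and the encode, decode, and swap stages would have to finish in 44 minutes to stay under the timeout.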

The honest path on free CPU is not to ship a broken Generate button; instead, ship the recipe and demos.