Nekochu

Initial LTX 2.3 CPU feasibility Space

ce45d75 11 days ago

	---
	title: LTX 2.3 CPU
	emoji: 🎬
	colorFrom: indigo
	colorTo: pink
	sdk: docker
	app_port: 7860
	pinned: false
	license: other
	---

	# LTX 2.3 CPU — Feasibility Reference + ZeroGPU Recipe

	Running the 22B-parameter LTX-Video 2.3 model (Lightricks) on the free HF CPU tier is not practical: 2 vCPU + 16 GB RAM cannot host the full pipeline at usable speed. This Space holds the feasibility analysis and an upgrade recipe, so anyone with GPU access can fork it and run inference without redoing the analysis.

	## TL;DR

	| Tier | Hardware | LTX 2.3 distilled-1.1 viable? | Time per 2-sec clip | Cost |
	|---|---|---|---|---|
	| Free CPU | 2 vCPU + 16 GB | ❌ models barely fit at Q3_K_M | ~60-120 min, if it completes at all | n/a |
	| CPU Upgrade | 8 vCPU + 32 GB | ⚠ marginal | ~30-60 min | $0.30/clip |
	| ZeroGPU | A100 quota slot | ✅ | ~25-40 sec | free w/ Pro |
	| GPU L40S | 48 GB VRAM | ✅ | ~8 sec | $1/hr |

	## Model paths analysed

	- Path A — Unsloth distilled-1.1 Q3_K_M (`unsloth/LTX-2.3-GGUF` → `distilled-1.1/ltx-2.3-22b-distilled-1.1-Q3_K_M.gguf`, ~10.6 GB). Cleanest 8-step distilled DiT. Best CPU candidate (smallest weights). Requires ComfyUI-GGUF loader.
	- Path C — 10Eros fine-tune + cond_safe distill LoRA (`vantagewithai/LTX2.3-10Eros-GGUF` + cond_safe LoRA). 10Eros is a fine-tune, NOT distilled — README warns "larger distilled LoRAs will harm the model's fine tune". Riskier; needs LoRA tuning. Not a 1:1 replacement for Path A.

	Recommendation: Path A for the CPU build (smallest, distilled). Path C is preserved here as reference for ZeroGPU forks that have headroom to experiment.

	## Text encoder constraint

	You cannot swap the text encoder. LTX 2.3 was trained with `google/gemma-3-12b-it`, so the diffusion transformer (DiT) is bound to that encoder's embedding space. Smaller or newer LLMs such as Qwen3.6-35B-A3B or Gemma-4-E2B-it will not work: their embeddings follow a different distribution from the one the DiT was trained on.

	The only valid lever is quantising the same encoder more aggressively:

	| Quant | Size | Quality vs FP16 |
	|---|---|---|
	| Gemma-3-12B-it Q3_K_M | 6.0 GB | ~98% |
	| Gemma-3-12B-it Q4_K_M | 7.4 GB | ~99.5% |
	| Gemma-3-12B-it Q5_K_M | 8.6 GB | ~99.9% |

	Use `mradermacher/gemma-3-12b-it-qat-abliterated-GGUF` Q3_K_M for the CPU path.
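
To make the sizing concrete, the sketch below adds each encoder quant from the table to the 10.6 GB Path A DiT and compares the sum with the RAM of the two CPU tiers. It counts weights only, assumes both models are resident at the same time, and ignores the VAE, activations, and OS overhead, so real headroom is smaller. The names `weights_footprint_gb` and `fits` are illustrative, not part of this Space.

```python
# Sizes are taken from this README; everything else is an illustrative assumption.
DIT_GB = 10.6  # Path A: distilled-1.1 Q3_K_M GGUF
ENCODER_GB = {"Q3_K_M": 6.0, "Q4_K_M": 7.4, "Q5_K_M": 8.6}
TIER_RAM_GB = {"Free CPU": 16, "CPU Upgrade": 32}


def weights_footprint_gb(quant: str) -> float:
    """DiT plus text-encoder weights in GB (no VAE, no activations)."""
    return round(DIT_GB + ENCODER_GB[quant], 1)


def fits(quant: str, tier: str) -> bool:
    """True if the weights alone are smaller than the tier's RAM."""
    return weights_footprint_gb(quant) < TIER_RAM_GB[tier]


if __name__ == "__main__":
    for quant in ENCODER_GB:
        row = ", ".join(f"{tier}: {fits(quant, tier)}" for tier in TIER_RAM_GB)
        print(f"{quant}: {weights_footprint_gb(quant)} GB -> {row}")
```

Under these assumptions even the smallest encoder quant leaves no room on the 16 GB tier, while every quant in the table fits within 32 GB.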

	## ZeroGPU fork recipe

	Fork this Space to your account, change `sdk: docker` → `sdk: gradio`, change the hardware tier to ZeroGPU, and replace `app.py` with the GPU variant in `gpu_app.py`. That's it.

	```bash
	huggingface-cli repo duplicate WeReCooking/ltx-2.3-cpu YourUsername/ltx-2.3-zerogpu
	# Then edit README.md: sdk -> gradio, add: hardware: zerogpu
	# Edit Space settings on HF UI -> Hardware -> ZeroGPU
	```
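
The README edit in the recipe (`sdk: docker` to `sdk: gradio`) can also be scripted. The following is a minimal sketch that rewrites one key inside the leading YAML front matter and leaves the rest of the file alone. The helper name `set_front_matter_key` is hypothetical, and it only handles flat `key: value` lines like the ones in this README; the hardware tier itself is still selected in the Space settings.

```python
def set_front_matter_key(readme: str, key: str, value: str) -> str:
    """Return README text with `key: value` set in the leading YAML front matter.

    Only flat `key: value` lines are supported (an assumption that holds for
    this Space's README); nested YAML would need a real YAML parser.
    """
    lines = readme.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("README has no YAML front matter")
    try:
        # Index of the closing '---' delimiter.
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        raise ValueError("front matter is not closed") from None

    for i in range(1, end):
        if lines[i].split(":", 1)[0].strip() == key:
            lines[i] = f"{key}: {value}"  # overwrite the existing entry
            break
    else:
        lines.insert(end, f"{key}: {value}")  # key absent: add before closing '---'
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "---\ntitle: LTX 2.3 CPU\nsdk: docker\napp_port: 7860\n---\n\n# Body\n"
    print(set_front_matter_key(sample, "sdk", "gradio"))
```

Text outside the front matter is never touched, so a line such as `sdk: docker` in the README body stays as written.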

	## Curl test (once forked to a GPU tier)

	```bash
	TOKEN="hf_xxx"   # your HF access token
	SPACE="https://YourUsername-ltx-2-3-zerogpu.hf.space"

	# Step 1: submit the job; the JSON response carries an event_id.
	EVT=$(curl -s -X POST "$SPACE/gradio_api/call/generate" \
	  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
	  -d '{"data":["A woman walking through a neon-lit Tokyo alley at night, cinematic", 2.0, 8]}' \
	  | python -c "import sys,json;print(json.load(sys.stdin)['event_id'])")
	# Step 2: stream the result events (SSE) for that event_id.
	curl -sN "$SPACE/gradio_api/call/generate/$EVT" -H "Authorization: Bearer $TOKEN"
	```
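
The same two-step call can be sketched in Python with only the standard library: a POST that returns an `event_id`, then a GET that streams server-sent events until a `complete` or `error` event arrives. This is a sketch, not the Space's client: the endpoint name `generate` and the `[prompt, 2.0, 8]` argument order come from the curl example above, reading the numbers as clip seconds and step count is an assumption, and `build_payload`, `parse_sse`, and `generate` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Iterable, Iterator, Tuple


def build_payload(prompt: str, seconds: float = 2.0, steps: int = 8) -> bytes:
    """JSON body in the shape used by the curl example (argument meaning assumed)."""
    return json.dumps({"data": [prompt, seconds, steps]}).encode("utf-8")


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, object]]:
    """Yield (event, decoded_data) pairs from 'event:' / 'data:' line pairs."""
    event = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:") and event is not None:
            body = line[len("data:"):].strip()
            yield event, (json.loads(body) if body else None)
            event = None


def generate(space_url: str, token: str, prompt: str) -> Tuple[str, object]:
    """Submit a job and block until the stream reports 'complete' or 'error'."""
    headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
    submit = urllib.request.Request(
        f"{space_url}/gradio_api/call/generate",
        data=build_payload(prompt), headers=headers, method="POST",
    )
    with urllib.request.urlopen(submit) as resp:
        event_id = json.load(resp)["event_id"]

    stream = urllib.request.Request(
        f"{space_url}/gradio_api/call/generate/{event_id}", headers=headers
    )
    with urllib.request.urlopen(stream) as resp:
        decoded = (chunk.decode("utf-8") for chunk in resp)  # one line per chunk
        for event, data in parse_sse(decoded):
            if event in ("complete", "error"):
                return event, data
    return "closed", None  # stream ended without a terminal event
```

Calling `generate("https://YourUsername-ltx-2-3-zerogpu.hf.space", "hf_xxx", "a prompt")` would return the terminal event name and its decoded data; intermediate events are skipped.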

	## Logs (SSE)

	```bash
	curl -N -H "Authorization: Bearer $TOKEN" "https://huggingface.co/api/spaces/WeReCooking/ltx-2.3-cpu/logs/build"
	curl -N -H "Authorization: Bearer $TOKEN" "https://huggingface.co/api/spaces/WeReCooking/ltx-2.3-cpu/logs/run"
	```

	## Why not ship inference on free CPU anyway

	I attempted the GGUF path locally. Findings:
	- 10.6 GB GGUF DiT + 6 GB GGUF Gemma encoder + VAE + activations exceeds 16 GB even with sequential offload (load → run → unload): the two weight files alone total 16.6 GB. The encoder needs to stay resident during the DiT's classifier-free guidance branch (or be re-loaded per step → 50× slower).
	- 2 vCPU × 22B params at Q3_K_M ≈ 120 sec per diffusion step → 8 distilled steps ≈ 16 min for the DiT loop alone, plus encode + VAE decode + offload swaps → realistically 60-90 min for a 2-sec, 384×256 clip. The HF Space request timeout is 1 hour, so the math doesn't close.
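
The timing argument is simple arithmetic, worked through below as a sketch. All inputs are the estimates quoted in the findings above (120 sec per step, 8 steps, a 60-90 min end-to-end range, a 1-hour timeout); the variable names are illustrative and nothing here is a new measurement.

```python
# Inputs copied from the findings above; these are estimates, not benchmarks.
SEC_PER_STEP = 120           # per diffusion step on 2 vCPU at Q3_K_M
STEPS = 8                    # distilled schedule
TOTAL_MIN_RANGE = (60, 90)   # end-to-end estimate for a 2-sec, 384x256 clip
TIMEOUT_MIN = 60             # HF Space request timeout

# Time spent in the DiT denoising loop alone.
dit_loop_min = SEC_PER_STEP * STEPS / 60

# What is left for text encoding, VAE decode, and offload swaps.
overhead_min = tuple(total - dit_loop_min for total in TOTAL_MIN_RANGE)


def hits_timeout(total_min: float) -> bool:
    """True if a run of this length reaches or passes the request timeout."""
    return total_min >= TIMEOUT_MIN


if __name__ == "__main__":
    print(f"DiT loop: {dit_loop_min:.0f} min")
    print(f"Implied overhead: {overhead_min[0]:.0f}-{overhead_min[1]:.0f} min")
    print("Best case hits timeout:", hits_timeout(TOTAL_MIN_RANGE[0]))
```

The DiT loop is a minority of the estimated wall time; most of the budget goes to encoding, decoding, and swapping models in and out of RAM, which is why even the optimistic end of the range sits at the timeout.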

	The honest path on free CPU is not to ship a broken Generate button but to ship the recipe and demos.