---
title: Z-Image Studio
emoji: ⚡
colorFrom: yellow
colorTo: red
sdk: gradio
sdk_version: "5.50.0"
app_file: app.py
python_version: "3.11"
suggested_hardware: zero-a10g
hf_oauth: false
preload_from_hub:
- Tongyi-MAI/Z-Image transformer/*,text_encoder/*,vae/*,tokenizer/*,scheduler/*,model_index.json
- Tongyi-MAI/Z-Image-Turbo transformer/*,scheduler/*,model_index.json
- alibaba-pai/Z-Image-Turbo-Fun-Controlnet-Union-2.1 Z-Image-Turbo-Fun-Controlnet-Union-2.1-8steps.safetensors
- lllyasviel/Annotators RealESRGAN_x4plus.pth
---
# Z-Image Studio
A single-process Gradio app that wraps [Tongyi-MAI Z-Image](https://huggingface.co/Tongyi-MAI/Z-Image) and [Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo) with ControlNet and a 2× upscaler under one focused UI. It runs locally on Apple Silicon (MPS) or NVIDIA (CUDA) and deploys to Hugging Face Spaces (ZeroGPU).
→ **Live demo:** https://huggingface.co/spaces/techfreakworm/z-image-studio
---
## What's inside
Three tabs. Same DiffSynth `ZImagePipeline` underneath. Progressive disclosure — the form starts short and reveals controls only when you ask for them.
| Mode | Model | What it does |
|---|---|---|
| **Text → Image** | Z-Image (25 steps, cfg=4) · Z-Image-Turbo (8 steps, cfg=1) | Prompt-to-image. Toggle the model on the fly; the form swaps Steps / CFG / Negative-Prompt defaults to match. |
| **ControlNet** | Z-Image-Turbo + Fun-Controlnet-Union 2.1 | Canny / Depth / Pose preprocessors with a **live preview** of the processed control image. |
| **Upscale** | RealESRGAN x4 → Z-Image-Turbo refinement | Effective 2× upscale with diffusion-based detail restoration (5-step img2img at denoise 0.33). |
Each tab carries an optional LoRA toggle. When enabled, it exposes a compact `.safetensors` slot and a strength slider. The toggle label tells you which model's LoRA is accepted (Z-Image vs Z-Image-Turbo) and updates as you flip the radio.
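For context on what a `.safetensors` slot can inspect cheaply: the file format begins with an 8-byte little-endian header length followed by a JSON header that names every tensor. The sketch below reads only that header. The function names are illustrative and not taken from this repo's `lora.py`, and how the app maps tensor names to Z-Image vs Z-Image-Turbo is not shown here.

```python
import json
import struct
from pathlib import Path


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without loading tensors."""
    with Path(path).open("rb") as fh:
        # The file starts with an unsigned 64-bit little-endian header length.
        (header_len,) = struct.unpack("<Q", fh.read(8))
        return json.loads(fh.read(header_len))


def tensor_names(path):
    """List tensor names, skipping the optional __metadata__ entry."""
    header = read_safetensors_header(path)
    return sorted(name for name in header if name != "__metadata__")
```

Because only the header is read, a check like this stays fast even for multi-gigabyte files.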
---
## Quick start (local)
Requires **Python 3.11**, ~50 GB free disk for the weight set, and ~24 GB VRAM (CUDA) or ~32 GB unified memory (Apple Silicon).
```bash
git clone https://github.com/techfreakworm/z-image-studio
cd z-image-studio
bash setup.sh # creates .venv, installs requirements
source .venv/bin/activate
python app.py # http://127.0.0.1:7860
```
The first run resolves model weights into your HF cache (`~/.cache/huggingface/hub/`). Subsequent starts are fast — the app symlinks the cache snapshots into DiffSynth's expected `./models/<repo>/` layout so nothing re-downloads.
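A minimal sketch of the symlink step described above, using only the standard library. `link_snapshot` and its parameters are hypothetical names for illustration; the app's own helper may differ, and resolving the snapshot directory inside the HF cache is assumed to have happened already.

```python
import os
from pathlib import Path


def link_snapshot(snapshot_dir, models_root, repo_id):
    """Symlink a cached snapshot directory to <models_root>/<repo_id>."""
    target = Path(models_root) / repo_id
    target.parent.mkdir(parents=True, exist_ok=True)
    if target.is_symlink() or target.exists():
        return target  # already linked on a previous start; nothing to do
    os.symlink(Path(snapshot_dir).resolve(), target, target_is_directory=True)
    return target
```

Calling it again on a later start is a no-op, which matches the "nothing re-downloads" behaviour described above.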
**Apple Silicon notes:** `PYTORCH_ENABLE_MPS_FALLBACK=1` is set automatically so the few MPS-unsupported ops fall back to CPU. DiffSynth's free-VRAM check (CUDA-only) is bypassed on MPS — module swapping still works.
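A sketch of how the fallback flag and device choice might be wired, under stated assumptions: `pick_device` is a hypothetical helper, not the function in `models.py`, and setting the variable before `torch` is imported is the commonly recommended order rather than something this README confirms.

```python
import os

# Set the flag early; setdefault leaves any value the user exported untouched.
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")


def pick_device(cuda_available, mps_available):
    """Prefer CUDA, then MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"
```

In real code the two flags would typically come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.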
## Quick start (HF Spaces)
```bash
git remote add space https://huggingface.co/spaces/<your-handle>/z-image-studio
git push space main
```
The Space's `preload_from_hub` directive pre-downloads the ~47 GB weight set at build time. `app.py:_bootstrap()` mirrors the read-only build cache into `~/hf-cache-rw/` and symlinks every snapshot into `./models/<repo>/`. Pipeline construction on the first request finds everything locally, so no network access is needed from the second inference onward.
## Architecture
```
┌──────────────────────────────┐
browser ──▶ │ app.py — Gradio Blocks │
│ (header + CTA + 3 tabs) │
└──────────────┬───────────────┘
│
▼
┌──────────────────────────────┐
│ backend.py │
│ ZImageStudioBackend │
│ @spaces.GPU(duration=…) │
│ one DiffSynth pipeline, │
│ two transformers in pool │
└──────────────┬───────────────┘
│
┌───────────────┬───────────┴────────┬──────────────────┐
▼ ▼ ▼ ▼
modes.py preprocessors.py upscale.py lora.py
3 handlers Canny/Depth/Pose RealESRGAN x4 safetensors
(controlnet_aux) + 0.5 resize sniff + apply/revert
```
**One pipeline instance**, both transformers (Base + Turbo) preloaded into the pool, swapped per request by indexing into `pool.model`. Shared encoder + VAE + tokenizer between Base and Turbo — no duplication.
`@spaces.GPU(duration=callable)` decorates the generate method at module load time on Spaces. The duration estimator derives a value from mode, model, steps, and image area, then clamps it to `[60, 180] s`. When ZeroGPU surfaces a "GPU task aborted" error, the handler retries once at 2× duration.
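The clamp-and-retry behaviour can be sketched as below. Only the `[60, 180]` range and the single 2× retry come from this README. Every coefficient, the function names, and the use of `RuntimeError` as the abort signal are illustrative assumptions, not the code in `backend.py`.

```python
MIN_S, MAX_S = 60, 180  # clamp range quoted in this README


def estimate_duration(mode, model, steps, width, height):
    """Rough seconds estimate; all coefficients here are hypothetical."""
    per_step = 4.0 if model == "base" else 2.5  # assumed s/step at 1 megapixel
    overhead = {"t2i": 20, "controlnet": 30, "upscale": 40}.get(mode, 20)
    megapixels = (width * height) / (1024 * 1024)
    raw = overhead + steps * per_step * megapixels
    return int(min(MAX_S, max(MIN_S, raw)))


def run_with_retry(generate, duration):
    """Call generate(duration); on an abort, retry once with twice the duration."""
    try:
        return generate(duration)
    except RuntimeError as err:  # assumed exception type for the abort
        if "GPU task aborted" not in str(err):
            raise
        return generate(duration * 2)
```

With these made-up coefficients, an 8-step Turbo request at 1024² lands on the 60 s floor, while a 25-step Base request at 2048² hits the 180 s ceiling.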
## Project layout
```
.
├── app.py # Gradio Blocks entry, bootstrap, event handlers, CTA
├── backend.py # ZImageStudioBackend; @spaces.GPU; duration estimator
├── modes.py # call_t2i / call_controlnet / call_upscale pure handlers
├── models.py # device autodetect, MODEL_CONFIGS, cache mirror + symlink
├── lora.py # safetensors header sniff + apply/revert ctx
├── preprocessors.py # Canny (cv2) + Depth (depth_midas) + Pose (openpose)
├── upscale.py # RealESRGAN x4 wrapper + basicsr/torchvision shim
├── ui.py # Per-tab Gradio component builders
├── theme.py # Soft Dark Restraint palette + minimal CSS
├── tooltips.py # Centralised info= strings
├── requirements.txt # pinned deps
├── pyproject.toml # ruff + pytest config (py311)
├── setup.sh # venv bootstrap
└── tests/ # 70 passing (L1+L2 in CI); GPU smoke in -m gpu
```
## Tech stack
- **[Gradio 5.50](https://gradio.app/)** — UI shell, native components, `gr.Progress(track_tqdm=True)`
- **[DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio)** — Z-Image pipeline + model pool + VRAM management
- **[Z-Image / Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image)** by Tongyi-MAI
- **[Z-Image-Turbo-Fun-Controlnet-Union-2.1](https://huggingface.co/alibaba-pai/Z-Image-Turbo-Fun-Controlnet-Union-2.1)** by Alibaba PAI
- **[RealESRGAN](https://github.com/xinntao/Real-ESRGAN)** weights via [`lllyasviel/Annotators`](https://huggingface.co/lllyasviel/Annotators)
- **[controlnet_aux](https://github.com/huggingface/controlnet_aux)** for Depth (MiDaS) and Pose (OpenPose)
- **HF Spaces ZeroGPU** (A10G) — `@spaces.GPU(duration=…)` queue priority
## Design
Theme: **Soft Dark Restraint** — warm dark substrate `#1A1614`, cream ink `#F0E8DD`, one accent `#FFB02E` used sparingly (live radio dot, slider fill, primary button, progress fill, brand period). Inter throughout. No display fonts, no shadows, no gradients. The accent is rationed so the generated image stays the visual focus.
Disclosure patterns — controls appear when they're needed:
- `Use a LoRA` checkbox → file slot + strength slider appear inline
- Model = Base → Negative Prompt + CFG slider appear (Turbo runs cfg=1 so they'd be no-ops)
- `Advanced` accordion → Width / Height / Seed live inside, collapsed by default
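The model-dependent disclosure above can be expressed as a small pure function. The step and CFG defaults are the ones listed in the "What's inside" table. The dictionary name, key names, and `form_state` itself are hypothetical and do not mirror `MODEL_CONFIGS` in `models.py`.

```python
MODEL_DEFAULTS = {
    "base": {"steps": 25, "cfg": 4.0},
    "turbo": {"steps": 8, "cfg": 1.0},
}


def form_state(model):
    """Return default values plus visibility flags for the selected model."""
    defaults = MODEL_DEFAULTS[model]
    # Turbo runs at cfg=1, so guidance controls would have no effect there.
    show_guidance = model == "base"
    return {
        "steps": defaults["steps"],
        "cfg": defaults["cfg"],
        "show_negative_prompt": show_guidance,
        "show_cfg_slider": show_guidance,
    }
```

In a Gradio handler, a dictionary like this could feed `gr.update(value=..., visible=...)` calls from the model radio's change event.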
Spec + plan + design rationale live under `docs/superpowers/`.
## Notes on running
- **First inference is slow.** Cold-start pipeline construction (~30 – 60 s on MPS, ~10 – 20 s on CUDA) is amortised across the whole session. Subsequent requests hit warm cache.
- **MPS Macs:** Z-Image-Turbo at 8 steps + 1024² produces an image in ~30 – 60 s. Base at 25 steps is closer to 2 min. Upscale on 1024² → 2048² adds ~30 s on the refinement pass.
- **ZeroGPU duration cap.** The estimator clamps at 180 s. If a generation aborts, the handler retries once at 2× duration. The duration field per call is the queue-priority signal, not a billing cap.
## License
MIT for the app code (see `LICENSE`). DiffSynth-Studio is Apache-2.0. Z-Image and Z-Image-Turbo retain their respective Tongyi-MAI licenses. RealESRGAN weights are BSD-3-Clause via the xinntao/Real-ESRGAN repository.
## Credits
Z-Image and Z-Image-Turbo by [Tongyi-MAI](https://github.com/Tongyi-MAI). DiffSynth-Studio by the [ModelScope](https://github.com/modelscope) team. ControlNet Union 2.1 by [Alibaba PAI](https://github.com/alibaba). Built by [@techfreakworm](https://huggingface.co/techfreakworm) — drop a ♥ on the [Space](https://huggingface.co/spaces/techfreakworm/z-image-studio) if it's useful.