# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project goal
Build a Hugging Face **Space** (ZeroGPU, Gradio SDK) that lets users upload / pick images and get
**pairwise or one-vs-many style-similarity scores** produced by **MegaStyle-Encoder** from the paper
*MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent Text-to-Image Style Mapping*
([arXiv:2604.08364](https://arxiv.org/abs/2604.08364), [project page](https://jeoyal.github.io/MegaStyle/),
[upstream code](https://github.com/Tencent/MegaStyle)).
The Space is a **comparison/analysis tool**, not a style-transfer demo: it should surface a
similarity matrix / ranked grid / heatmap, not generate images. The style-transfer sibling model
(`megastyle_flux.safetensors`) is explicitly **out of scope** unless the user asks for it; a ZeroGPU
slot can load SigLIP but cannot sanely run a 12B FLUX model.
## The model contract (critical and easy to get wrong)
MegaStyle-Encoder is **not** a standalone architecture. It is **SigLIP** with fine-tuned weights:
- **Backbone**: `google/siglip-so400m-patch14-384` loaded via `transformers.SiglipVisionModel`
- **Processor**: `transformers.AutoProcessor.from_pretrained("google/siglip-so400m-patch14-384")`
- **Weights**: `megastyle_encoder.pth` from [Gaojunyao/MegaStyle](https://huggingface.co/Gaojunyao/MegaStyle)
(~857 MB). Loaded with `torch.load(..., map_location="cpu")`, then:
  - The checkpoint may be either a raw `state_dict` or `{"model": state_dict}`; handle both.
- Apply with `model.load_state_dict(state, strict=False)` (non-strict is correct, upstream does this).
- **Embedding**: `model(pixel_values=...).pooler_output`, then **L2-normalize** (`emb / emb.norm(p=2, dim=-1, keepdim=True)`).
- **Similarity**: plain dot product of two normalized embeddings (= cosine similarity, range roughly [-1, 1]).
Reference implementation is `style_score.py` in the upstream repo; treat it as the authoritative
contract for any inference code in this Space. Do not "improve" the preprocessing, pooling choice, or
normalization without strong reason; the checkpoint was trained to produce a useful metric only under
exactly this pipeline.
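A minimal end-to-end sketch of that contract, assuming the repo/file names above (no batching or
error handling; `test.png` / `ref.png` are placeholder paths):

```python
import torch
from huggingface_hub import hf_hub_download
from PIL import Image
from transformers import SiglipImageProcessor, SiglipVisionModel

SIGLIP_ID = "google/siglip-so400m-patch14-384"

processor = SiglipImageProcessor.from_pretrained(SIGLIP_ID)
model = SiglipVisionModel.from_pretrained(SIGLIP_ID)

# Fetch and apply the fine-tuned weights; the checkpoint may be wrapped or raw.
ckpt = hf_hub_download("Gaojunyao/MegaStyle", "megastyle_encoder.pth")
state = torch.load(ckpt, map_location="cpu")
if "model" in state:  # {"model": state_dict} wrapper
    state = state["model"]
model.load_state_dict(state, strict=False)  # non-strict, matching upstream
model.eval()

@torch.no_grad()
def embed(img: Image.Image) -> torch.Tensor:
    pixel_values = processor(images=img.convert("RGB"), return_tensors="pt").pixel_values
    emb = model(pixel_values=pixel_values).pooler_output
    return emb / emb.norm(p=2, dim=-1, keepdim=True)  # L2-normalize

# Cosine similarity = dot product of the two normalized embeddings.
score = (embed(Image.open("test.png")) * embed(Image.open("ref.png"))).sum().item()
```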
## Deployment target: ZeroGPU (but the workload is GPU-light)
This Space targets **ZeroGPU** (Hugging Face's serverless GPU pool, currently backed by H200)
because it's the free GPU tier. The actual workload (one SigLIP-so400m forward pass over up
to 9 images) is small enough to run on CPU in 15-30s, or on any CUDA device in under a second.
We're not picking H200 for capability reasons; we take whatever ZeroGPU allocates.
**Viable alternatives** if ZeroGPU isn't available (no HF Pro, etc.):
- **CPU basic Space** (free, no Pro): drop the `spaces` import and `@spaces.GPU` decorator; everything
  else stays the same. Comparison takes 15-30s instead of <1s.
- **Dedicated T4/L4/A10G paid tier**: no code change; just pick hardware in Space Settings.
ZeroGPU-specific constraints that shape the code today:
- GPU-using functions **must** be decorated with `@spaces.GPU(duration=...)` and the `spaces`
  package must be imported **before** `torch`. GPU tensors/models cannot live at module scope;
  move `.to("cuda")` and the forward pass **inside** the decorated function. Guard `MODEL.to(device)`
  with a `next(MODEL.parameters()).device != device` check so repeat calls don't re-migrate
  (see the sketch after this list).
- The model is ~857 MB (`megastyle_encoder.pth`) + SigLIP ~1.8 GB. Load on CPU at module scope
via `hf_hub_download`; `.to(device)` inside the decorated function on first invocation.
- **`duration`** on `@spaces.GPU` is currently 30s (sized for 1 test image + up to 8 references).
Over-estimating wastes the user's ZeroGPU daily quota. Bump only if batch grows.
- **ZeroGPU hardware is NOT set in frontmatter**; it's configured in the Space's
  *Settings → Hardware* panel. The `hardware:` frontmatter key that exists for regular Spaces
  does not apply here. Don't reintroduce it.
- **Python**: ZeroGPU supports only Python **3.10.13** and **3.12.12**. Pin `python_version: "3.10"`
in frontmatter.
- **PyTorch**: ZeroGPU's supported wheel list is `>=2.1.0`. Keep the pin compatible.
- `sdk: gradio`, `sdk_version: 5.50.0` (not the 5.9.1 default HF sets at Space creation; see the
  `sdk_version` convention below). Gradio 4+ is the only SDK currently supported by ZeroGPU.
- Requirements live in `requirements.txt` at repo root: `torch`, `transformers`, `Pillow`,
  `spaces`, `huggingface_hub` (no `gradio` line; see Conventions). Pin `transformers>=4.45` so
  `SiglipVisionModel` exists.
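A sketch of the decorator/device pattern from the first bullet above (shape only; the real
`app.py` is authoritative):

```python
import spaces  # must be imported before torch on ZeroGPU
import torch
from transformers import SiglipVisionModel

# Module scope: CPU only. hf_hub_download / weight loading also happens here.
MODEL = SiglipVisionModel.from_pretrained("google/siglip-so400m-patch14-384").eval()

@spaces.GPU(duration=30)
def compute_embeddings(pixel_values: torch.Tensor) -> torch.Tensor:
    # Inside the decorated function CUDA is always available on ZeroGPU.
    if next(MODEL.parameters()).device.type != "cuda":
        MODEL.to("cuda")  # migrate once; repeat calls skip this
    with torch.no_grad():
        emb = MODEL(pixel_values=pixel_values.to("cuda")).pooler_output
    return (emb / emb.norm(p=2, dim=-1, keepdim=True)).cpu()
```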
## Repo shape
Current layout (single-file app is appropriate at this size):
- `app.py` – Gradio Blocks UI + `@spaces.GPU`-decorated inference. Single entry point.
- `requirements.txt` – dependency pins (sketch below).
- `README.md` – HF Space frontmatter (sdk, python_version, license) + user-facing description.
  Doubles as the Space's landing page.
- `LICENSE` – MIT. Matches the `license: mit` claim in README frontmatter and the upstream
  model's license.
- `.gitignore` – standard Python + HF cache + Gradio `flagged/` directory.
- `CLAUDE.md` – this file.
Split `app.py` into a separate `megastyle.py` module only if inference logic grows past ~150 lines
or needs unit tests without Gradio. Not needed today.
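For reference, a `requirements.txt` consistent with the pins this file names (versions come from
the ZeroGPU constraints above; verify before bumping):

```text
torch>=2.1.0
transformers>=4.45
Pillow
spaces
huggingface_hub
```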
## UI & scoring conventions (locked by review)
- **Up to 8 reference images.** Extras are clipped at compute time; the result markdown notes
  when this happens.
- **Headline score is mean cosine similarity** across per-reference scores. Per-reference
breakdown is surfaced in a table so users can spot outliers dragging down the mean.
- **Label bands** (heuristic, calibrated for SigLIP-family cosine ranges where unrelated images
  typically sit 0.4-0.6):
| Cosine | Label | Emoji |
|--------|-------|-------|
| ≥ 0.75 | Strong style match | 🟢 |
| 0.65-0.75 | Good style match | 🟢 |
| 0.55-0.65 | Moderate style match | 🟡 |
| 0.45-0.55 | Weak style match | 🟠 |
| < 0.45 | Minimal style match | 🔴 |
  If you ever have calibration data (e.g., per-style-pair cosine distributions from the MegaStyle
  dataset), retighten these. Until then, do not represent the label as ground truth; the raw
  cosine is the source of truth. A mapping sketch appears after this list.
- **Do not display a pseudo-percentage.** An earlier iteration mapped cosine → `(x+1)/2 * 100`,
  which compresses useful signal and misleads users. The raw three-decimal cosine + the label is
  the display contract.
- **Verdict styling uses emoji prefix, not inline HTML `<span>`.** `gr.Markdown`'s HTML handling
varies across Gradio versions; emoji is reliably rendered.
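A sketch of the band mapping and headline-score conventions above (thresholds copied from the
table; they remain heuristic, and the raw cosine stays the source of truth):

```python
def style_label(cosine: float) -> tuple[str, str]:
    """Map a cosine score to the (label, emoji) bands in the table above."""
    if cosine >= 0.75:
        return "Strong style match", "🟢"
    if cosine >= 0.65:
        return "Good style match", "🟢"
    if cosine >= 0.55:
        return "Moderate style match", "🟡"
    if cosine >= 0.45:
        return "Weak style match", "🟠"
    return "Minimal style match", "🔴"

# Headline = mean over per-reference cosines, shown raw to three decimals.
scores = [0.71, 0.64, 0.58]  # illustrative per-reference values
mean = sum(scores) / len(scores)
label, emoji = style_label(mean)
headline = f"{emoji} {label} (mean cosine {mean:.3f})"
```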
## Deploying and observing the Space
The live Space is at **`olfronar/megastyle-comparison`**
(https://huggingface.co/spaces/olfronar/megastyle-comparison). Local working copy is git-backed
with `origin` pointing at the Space's git URL; normal `git push origin main` triggers a rebuild.
### Publish a change
```bash
git add <files> && git commit -m "..." && git push origin main
```
For one-off blob pushes without a git clone:
```bash
hf upload olfronar/megastyle-comparison . --repo-type space --commit-message "..."
```
### Set hardware (ZeroGPU, etc.)
Hardware is *not* set via README frontmatter. Use `HfApi.request_space_hardware`:
```python
from huggingface_hub import HfApi
HfApi().request_space_hardware("olfronar/megastyle-comparison", hardware="zero-a10g")
```
Valid flavors include `cpu-basic`, `cpu-upgrade`, `t4-small`, `a10g-small`, `a10g-large`,
`a100-large`, and `zero-a10g` (the legacy name still used for the ZeroGPU pool, which is
currently backed by H200 hardware). You can also toggle via the Space's *Settings → Hardware*
page; both paths write to the same field.
### Observe build and run logs
HF exposes Server-Sent-Events streams for both phases. They require the user's HF token (read
it from `~/.cache/huggingface/token` after `hf auth login`). Capped-timeout curl works for
point-in-time snapshots:
```bash
# Build phase (docker layers, pip install)
curl -s --max-time 20 \
-H "Authorization: Bearer $(cat ~/.cache/huggingface/token)" \
"https://huggingface.co/api/spaces/olfronar/megastyle-comparison/logs/build" | tail -80
# Run phase (Python stdout/stderr, Gradio startup)
curl -s --max-time 20 \
-H "Authorization: Bearer $(cat ~/.cache/huggingface/token)" \
"https://huggingface.co/api/spaces/olfronar/megastyle-comparison/logs/run" | tail -80
```
Each event line is JSON with `data` and `timestamp` fields. Because it's SSE, the stream is
long-lived; always use `--max-time` to bound it.
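A sketch for flattening a captured snapshot into readable lines. It assumes the standard SSE
`data:` prefix around each JSON event; verify against a real capture before relying on it:

```python
import json
import sys

# Usage: curl ... logs/run | python3 parse_logs.py   (script name is hypothetical)
for raw in sys.stdin:
    raw = raw.strip()
    if not raw.startswith("data:"):
        continue  # skip SSE separators / keep-alives
    try:
        event = json.loads(raw[len("data:"):])
    except json.JSONDecodeError:
        continue  # tolerate partial lines from a truncated stream
    print(event.get("timestamp", "?"), event.get("data", ""))
```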
### Check high-level state
```bash
curl -s "https://huggingface.co/api/spaces/olfronar/megastyle-comparison" | \
python3 -c "import sys, json; d=json.load(sys.stdin); r=d.get('runtime',{}); \
print('stage:', r.get('stage'), 'hardware:', r.get('hardware'), 'sha:', d.get('sha','?')[:7])"
```
`stage` values: `BUILDING` → `RUNNING` (healthy) or `RUNTIME_ERROR` / `BUILD_FAILED` (check logs).
### Common fix cycles
- **Runtime import / missing dep**: edit `requirements.txt` or the affected import, commit, push,
  stream the run logs until `RUNNING`. Most fixes don't need a full rebuild; layer caching makes
  re-pushes with unchanged deps very fast.
- **Hardware change**: one `HfApi().request_space_hardware(...)` call; no push, no rebuild.
- **Stuck build**: use the *Settings → Factory Reboot* UI as a last resort; our Dockerfile is the
  HF default so factory-reboot is safe.
## Working with the Hugging Face Hub
- Model weights (`megastyle_encoder.pth`) should be pulled at runtime via `huggingface_hub.hf_hub_download`,
not committed. The Space's build cache keeps it warm between restarts.
- License of the checkpoint is MIT (per `Gaojunyao/MegaStyle`); SigLIP is Apache-2.0. The Space's own
code should ship under a compatible license.
- Paper citation (if surfaced in UI): `gao2026megastyle` bibtex is in the upstream repo README.
## Conventions
- **Use `SiglipImageProcessor`, not `AutoProcessor`.** Upstream's `style_score.py` uses
`AutoProcessor.from_pretrained(SIGLIP_ID)` which loads both the image processor *and* the SigLIP
tokenizer; the tokenizer requires `sentencepiece`, which isn't in the ZeroGPU base image and
  crashes at import. Vision-only inference has no use for the tokenizer. Load
  `SiglipImageProcessor.from_pretrained(SIGLIP_ID)` directly; the `pixel_values` output is
  identical.
- **Control the gradio version via `sdk_version` in README frontmatter, not `requirements.txt`.**
HF's Dockerfile injects `gradio[oauth]==<sdk_version>` into the pip install line alongside our
`requirements.txt`. Putting a conflicting `gradio` pin in `requirements.txt` (e.g.
`gradio>=5.25.0`) crashes the build with `Cannot install gradio==<sdk_version> and gradio>=...
because these package versions have conflicting dependencies`. Bump `sdk_version` instead;
`requirements.txt` should not name gradio at all.
- **`sdk_version: 5.50.0` or newer is required.** Gradio 5.9.1 (the version HF defaults to at
Space creation) ships a `gradio_client.utils.get_type` that crashes on boolean JSON schemas
(valid JSON-Schema shorthand for accept-anything). `gr.Dataframe` triggers this in `/api/info`,
flooding run logs with `TypeError: argument of type 'bool' is not iterable`. Fixed in later
  5.x. Also launch with `show_api=False` as a belt-and-suspenders measure so the endpoint isn't
  exposed at all; we don't need programmatic API access for a visual-only demo.
- **Match upstream preprocessing exactly.** If you find yourself tempted to resize, center-crop, or
  change color-space conversion manually instead of going through the SigLIP image processor, stop:
  the metric will silently degrade.
- **Don't cast weights to fp16/bf16 on load unless explicitly needed.** The ZeroGPU slot handles
  fp32 SigLIP fine; precision changes affect similarity scores measurably.
- **Device string**: the upstream `style_score.py` probes `cuda` → `npu` → `cpu` (sketch below). On
  ZeroGPU it will always be `cuda` inside the decorated function; outside (e.g. cold start), keep it
  on CPU.
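A sketch of that probe order; the `torch.npu` attribute exists only when Ascend's `torch_npu`
package is installed, hence the `getattr` guard:

```python
import torch

def pick_device() -> str:
    """Probe cuda → npu → cpu, mirroring upstream style_score.py."""
    if torch.cuda.is_available():
        return "cuda"
    npu = getattr(torch, "npu", None)  # present only with torch_npu installed
    if npu is not None and npu.is_available():
        return "npu"
    return "cpu"
```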