# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project goal

Build a Hugging Face **Space** (ZeroGPU, Gradio SDK) that lets users upload / pick images and get
**pairwise or one-vs-many style-similarity scores** produced by **MegaStyle-Encoder** from the paper
*MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent Text-to-Image Style Mapping*
([arXiv:2604.08364](https://arxiv.org/abs/2604.08364), [project page](https://jeoyal.github.io/MegaStyle/),
[upstream code](https://github.com/Tencent/MegaStyle)).

The Space is a **comparison/analysis tool**, not a style-transfer demo: it should surface a
similarity matrix / ranked grid / heatmap, not generate images. The style-transfer sibling model
(`megastyle_flux.safetensors`) is explicitly **out of scope** unless the user asks for it; a ZeroGPU
slot can load SigLIP but cannot sanely run a 12B FLUX model.
## The model contract (critical; easy to get wrong)

MegaStyle-Encoder is **not** a standalone architecture. It is **SigLIP** with fine-tuned weights:

- **Backbone**: `google/siglip-so400m-patch14-384` loaded via `transformers.SiglipVisionModel`
- **Processor**: upstream uses `transformers.AutoProcessor.from_pretrained("google/siglip-so400m-patch14-384")`;
  this Space loads `SiglipImageProcessor.from_pretrained(...)` instead to avoid the tokenizer
  (see Conventions). The `pixel_values` output is identical.
- **Weights**: `megastyle_encoder.pth` from [Gaojunyao/MegaStyle](https://huggingface.co/Gaojunyao/MegaStyle)
  (~857 MB). Loaded with `torch.load(..., map_location="cpu")`, then:
  - The checkpoint may be either a raw `state_dict` or `{"model": state_dict}`; handle both.
  - Apply with `model.load_state_dict(state, strict=False)` (non-strict is correct; upstream does this).
- **Embedding**: `model(pixel_values=...).pooler_output`, then **L2-normalize** (`emb / emb.norm(p=2, dim=-1, keepdim=True)`).
- **Similarity**: plain dot product of two normalized embeddings (= cosine similarity, range roughly [-1, 1]).

The reference implementation is `style_score.py` in the upstream repo; treat it as the authoritative
contract for any inference code in this Space. Do not "improve" the preprocessing, pooling choice, or
normalization without strong reason; the checkpoint was trained to produce a useful metric only under
exactly this pipeline.
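The contract above can be sketched as follows. This is a minimal illustration, not the Space's actual `app.py`: function names are illustrative, and the heavy `transformers` import is kept local so the pure helpers can be exercised without downloading any weights.

```python
import torch

SIGLIP_ID = "google/siglip-so400m-patch14-384"


def unwrap_state_dict(ckpt):
    # Checkpoint may be a raw state_dict or {"model": state_dict}; handle both.
    return ckpt["model"] if isinstance(ckpt, dict) and "model" in ckpt else ckpt


def l2_normalize(emb):
    # Contract: L2-normalize pooler_output before comparing.
    return emb / emb.norm(p=2, dim=-1, keepdim=True)


def style_similarity(a, b):
    # Dot product of normalized embeddings = cosine similarity, roughly [-1, 1].
    return float((l2_normalize(a) * l2_normalize(b)).sum(dim=-1))


def load_encoder(weights_path):
    # Not exercised without the ~857 MB checkpoint; shown for shape only.
    from transformers import SiglipVisionModel

    model = SiglipVisionModel.from_pretrained(SIGLIP_ID)
    state = unwrap_state_dict(torch.load(weights_path, map_location="cpu"))
    model.load_state_dict(state, strict=False)  # non-strict, as upstream does
    return model.eval()


@torch.no_grad()
def embed(model, pixel_values):
    return l2_normalize(model(pixel_values=pixel_values).pooler_output)
```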
## Deployment target: ZeroGPU (but the workload is GPU-light)

This Space targets **ZeroGPU** (Hugging Face's serverless GPU pool, currently backed by H200)
because it's the free GPU tier. The actual workload (one SigLIP-so400m forward pass over up
to 9 images) is small enough to run on CPU in 15-30 s, or on any CUDA device in under a second.
We're not picking H200 for capability reasons; we take whatever ZeroGPU allocates.

**Viable alternatives** if ZeroGPU isn't available (no HF Pro, etc.):

- **CPU basic Space** (free, no Pro): drop the `spaces` import and the `@spaces.GPU` decorator; everything
  else stays the same. Comparison takes 15-30 s instead of <1 s.
- **Dedicated T4/L4/A10G paid tier**: no code change; just pick hardware in Space Settings.
ZeroGPU-specific constraints that shape the code today:

- GPU-using functions **must** be decorated with `@spaces.GPU(duration=...)`, and the `spaces`
  package must be imported **before** `torch`. GPU tensors/models cannot live at module scope;
  move `.to("cuda")` and the forward pass **inside** the decorated function. Guard `MODEL.to(device)`
  with a `next(MODEL.parameters()).device != device` check so repeat calls don't re-migrate.
- The model is ~857 MB (`megastyle_encoder.pth`) plus SigLIP at ~1.8 GB. Load on CPU at module scope
  via `hf_hub_download`; `.to(device)` inside the decorated function on first invocation.
- **`duration`** on `@spaces.GPU` is currently 30 s (sized for 1 test image + up to 8 references).
  Over-estimating wastes the user's ZeroGPU daily quota. Bump only if the batch grows.
- **ZeroGPU hardware is NOT set in frontmatter**; it's configured in the Space's
  *Settings → Hardware* panel. The `hardware:` frontmatter key that exists for regular Spaces
  does not apply here. Don't reintroduce it.
- **Python**: ZeroGPU supports only Python **3.10.13** and **3.12.12**. Pin `python_version: "3.10"`
  in frontmatter.
- **PyTorch**: ZeroGPU's supported wheel list is `>=2.1.0`. Keep the pin compatible.
- `sdk: gradio`, with `sdk_version` pinned in README frontmatter (it must be 5.50.0 or newer; see
  Conventions). Gradio 4+ is the only SDK currently supported by ZeroGPU.
- Requirements live in `requirements.txt` at repo root: `torch`, `transformers`,
  `Pillow`, `spaces`, `huggingface_hub`. Do not list `gradio` there; HF injects it from
  `sdk_version` (see Conventions). Pin `transformers>=4.45` so `SiglipVisionModel` exists.
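The decorator-plus-device-guard pattern above can be sketched as below. This is an illustration, not the Space's real code: `TinyModel`/`compare` are stand-ins, and the `ImportError` shim only exists so the sketch also runs locally where the `spaces` package is absent.

```python
# On ZeroGPU, `spaces` must be imported before torch.
try:
    import spaces
except ImportError:  # local / CPU-basic fallback: make @spaces.GPU a no-op
    class _NoopSpaces:
        @staticmethod
        def GPU(duration=30):
            return lambda fn: fn

    spaces = _NoopSpaces()

import torch

# Stand-in for the real encoder; real code loads SigLIP on CPU at module scope.
MODEL = torch.nn.Linear(4, 4)


@spaces.GPU(duration=30)
def compare(x: torch.Tensor) -> torch.Tensor:
    device = "cuda" if torch.cuda.is_available() else "cpu"
    # Guard: migrate only when the model isn't already on the target device,
    # so repeat calls don't re-migrate.
    if next(MODEL.parameters()).device != torch.device(device):
        MODEL.to(device)
    with torch.no_grad():
        return MODEL(x.to(device)).cpu()
```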
## Repo shape

Current layout (a single-file app is appropriate at this size):

- `app.py`: Gradio Blocks UI + `@spaces.GPU`-decorated inference. Single entry point.
- `requirements.txt`: dependency pins.
- `README.md`: HF Space frontmatter (sdk, python_version, license) + user-facing description.
  Doubles as the Space's landing page.
- `LICENSE`: MIT. Matches the `license: mit` claim in README frontmatter and the upstream
  model's license.
- `.gitignore`: standard Python + HF cache + Gradio `flagged/` directory.
- `CLAUDE.md`: this file.

Split `app.py` into a separate `megastyle.py` module only if inference logic grows past ~150 lines
or needs unit tests without Gradio. Not needed today.
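For reference, a README frontmatter shape consistent with the constraints in this file (the `title` and `emoji` values are illustrative; keep whatever the Space already uses):

```yaml
---
title: MegaStyle Comparison   # illustrative
emoji: 🎨                     # illustrative
sdk: gradio
sdk_version: 5.50.0           # must be >= 5.50.0, see Conventions
python_version: "3.10"
app_file: app.py
license: mit
# No `hardware:` key: ZeroGPU is set in Settings -> Hardware, not here.
---
```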
## UI & scoring conventions (locked by review)

- **Up to 8 reference images.** Clipped silently at compute time, with a note in the result markdown.
- **The headline score is the mean cosine similarity** across per-reference scores. The per-reference
  breakdown is surfaced in a table so users can spot outliers dragging down the mean.
- **Label bands** (heuristic, calibrated for SigLIP-family cosine ranges, where unrelated images
  typically sit at 0.4-0.6):

  | Cosine | Label | Emoji |
  |--------|-------|-------|
  | ≥ 0.75 | Strong style match | 🟢 |
  | 0.65-0.75 | Good style match | 🟢 |
  | 0.55-0.65 | Moderate style match | 🟡 |
  | 0.45-0.55 | Weak style match | 🟠 |
  | < 0.45 | Minimal style match | 🔴 |

  If you ever have calibration data (e.g., per-style-pair cosine distributions from the MegaStyle
  dataset), retighten these. Until then, do not represent the label as ground truth; the raw
  cosine is the source of truth.
- **Do not display a pseudo-percentage.** An earlier iteration mapped cosine → `(x+1)/2 * 100`,
  which compresses useful signal and misleads users. The raw three-decimal cosine plus the label is
  the display contract.
- **Verdict styling uses an emoji prefix, not inline HTML `<span>`.** `gr.Markdown`'s HTML handling
  varies across Gradio versions; emoji renders reliably.
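The band-and-verdict rules above amount to a small display helper, sketched below. Names are illustrative, and the colored-circle emojis are an assumption about which glyphs the bands use; the raw cosine remains the source of truth.

```python
def style_label(cosine: float) -> tuple[str, str]:
    """Map a raw cosine to a heuristic (emoji, label) band. Display-only."""
    bands = [
        (0.75, "🟢", "Strong style match"),
        (0.65, "🟢", "Good style match"),
        (0.55, "🟡", "Moderate style match"),
        (0.45, "🟠", "Weak style match"),
    ]
    for lo, emoji, label in bands:
        if cosine >= lo:
            return emoji, label
    return "🔴", "Minimal style match"


def verdict_markdown(cosine: float) -> str:
    # Emoji prefix + three-decimal raw cosine; no pseudo-percentage, no HTML.
    emoji, label = style_label(cosine)
    return f"{emoji} **{label}** (cosine {cosine:.3f})"
```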
## Deploying and observing the Space

The live Space is at **`olfronar/megastyle-comparison`**
(https://huggingface.co/spaces/olfronar/megastyle-comparison). The local working copy is git-backed,
with `origin` pointing at the Space's git URL, so a normal `git push origin main` triggers a rebuild.

### Publish a change

```bash
git add <files> && git commit -m "..." && git push origin main
```

For one-off blob pushes without a git clone:

```bash
hf upload olfronar/megastyle-comparison . --repo-type space --commit-message "..."
```
### Set hardware (ZeroGPU, etc.)

Hardware is *not* set via README frontmatter. Use `HfApi.request_space_hardware`:

```python
from huggingface_hub import HfApi

HfApi().request_space_hardware("olfronar/megastyle-comparison", hardware="zero-a10g")
```

Valid flavors include `cpu-basic`, `cpu-upgrade`, `t4-small`, `a10g-small`, `a10g-large`,
`a100-large`, and `zero-a10g` (the legacy name still used for the ZeroGPU pool, which is
currently backed by H200 hardware). You can also toggle via the Space's *Settings → Hardware*
page; both paths write to the same field.
### Observe build and run logs

HF exposes Server-Sent-Events streams for both phases. They require the user's HF token (read
it from `~/.cache/huggingface/token` after `hf auth login`). A capped-timeout curl works for
point-in-time snapshots:

```bash
# Build phase (docker layers, pip install)
curl -s --max-time 20 \
  -H "Authorization: Bearer $(cat ~/.cache/huggingface/token)" \
  "https://huggingface.co/api/spaces/olfronar/megastyle-comparison/logs/build" | tail -80

# Run phase (Python stdout/stderr, Gradio startup)
curl -s --max-time 20 \
  -H "Authorization: Bearer $(cat ~/.cache/huggingface/token)" \
  "https://huggingface.co/api/spaces/olfronar/megastyle-comparison/logs/run" | tail -80
```

Each event line is JSON with `data` and `timestamp` fields. Because it's SSE, the stream is
long-lived; always use `--max-time` to bound it.
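If a log line ever needs programmatic handling, a stdlib-only parser could look like the sketch below. The `data:` prefix handling and the sample line are assumptions about the stream shape, not a verified HF log format.

```python
import json


def parse_log_event(line: str):
    """Parse one SSE log line into (timestamp, data), or None for blanks.

    Assumes each event is a JSON object with `timestamp` and `data` fields,
    optionally behind an SSE `data:` prefix (an assumption, see above).
    """
    line = line.strip()
    if line.startswith("data:"):
        line = line[len("data:"):].strip()
    if not line:
        return None
    event = json.loads(line)
    return event.get("timestamp"), event.get("data")


# Illustrative event line, not a captured real log entry:
sample = 'data: {"timestamp": "2025-01-01T00:00:00Z", "data": "Running on local URL"}'
```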
### Check high-level state

```bash
curl -s "https://huggingface.co/api/spaces/olfronar/megastyle-comparison" | \
  python3 -c "import sys, json; d=json.load(sys.stdin); r=d.get('runtime',{}); \
print('stage:', r.get('stage'), 'hardware:', r.get('hardware'), 'sha:', d.get('sha','?')[:7])"
```

`stage` values: `BUILDING` → `RUNNING` (healthy), or `RUNTIME_ERROR` / `BUILD_FAILED` (check logs).
### Common fix cycles

- **Runtime import / missing dep**: edit `requirements.txt` or the affected import, commit, push,
  and stream the run logs until `RUNNING`. Most fixes don't need a full rebuild; layer caching makes
  re-pushes with unchanged deps very fast.
- **Hardware change**: one `HfApi().request_space_hardware(...)` call; no push, no rebuild.
- **Stuck build**: use the *Settings → Factory Reboot* UI as a last resort; our Dockerfile is the
  HF default, so a factory reboot is safe.
## Working with the Hugging Face Hub

- Model weights (`megastyle_encoder.pth`) should be pulled at runtime via `huggingface_hub.hf_hub_download`,
  not committed. The Space's build cache keeps them warm between restarts.
- The checkpoint's license is MIT (per `Gaojunyao/MegaStyle`); SigLIP is Apache-2.0. The Space's own
  code should ship under a compatible license.
- Paper citation (if surfaced in the UI): the `gao2026megastyle` bibtex is in the upstream repo README.
## Conventions

- **Use `SiglipImageProcessor`, not `AutoProcessor`.** Upstream's `style_score.py` uses
  `AutoProcessor.from_pretrained(SIGLIP_ID)`, which loads both the image processor *and* the SigLIP
  tokenizer; the tokenizer requires `sentencepiece`, which isn't in the ZeroGPU base image and
  crashes at import. Vision-only inference has no use for the tokenizer. Load
  `SiglipImageProcessor.from_pretrained(SIGLIP_ID)` directly; the `pixel_values` output is
  identical.
- **Control the Gradio version via `sdk_version` in README frontmatter, not `requirements.txt`.**
  HF's Dockerfile injects `gradio[oauth]==<sdk_version>` into the pip install line alongside our
  `requirements.txt`. Putting a conflicting `gradio` pin in `requirements.txt` (e.g.
  `gradio>=5.25.0`) crashes the build with `Cannot install gradio==<sdk_version> and gradio>=...
  because these package versions have conflicting dependencies`. Bump `sdk_version` instead;
  `requirements.txt` should not name gradio at all.
- **`sdk_version: 5.50.0` or newer is required.** Gradio 5.9.1 (the version HF defaults to at
  Space creation) ships a `gradio_client.utils.get_type` that crashes on boolean JSON schemas
  (valid JSON-Schema shorthand for accept-anything). `gr.Dataframe` triggers this in `/api/info`,
  flooding the run logs with `TypeError: argument of type 'bool' is not iterable`. Fixed in later
  5.x. Also launch with `show_api=False` as belt-and-suspenders, so the endpoint isn't exposed
  at all; we don't need programmatic API access for a visual-only demo.
- **Match upstream preprocessing exactly.** If you find yourself tempted to resize, center-crop, or
  change color-space conversion manually instead of going through the SigLIP image processor, stop:
  the metric will silently degrade.
- **Don't cast weights to fp16/bf16 on load unless explicitly needed.** ZeroGPU hardware handles fp32
  SigLIP fine, and precision changes measurably affect similarity scores.
- **Device string**: the upstream `style_score.py` probes `cuda` → `npu` → `cpu`. On ZeroGPU it
  will always be `cuda` inside the decorated function; outside (e.g. at cold start), keep it on CPU.
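The device probe in the last convention can be sketched as below. The `npu` branch is hedged behind `hasattr`, since torch builds without Ascend support have no `torch.npu`; this mirrors the upstream order but is not a copy of `style_score.py`.

```python
import torch


def pick_device() -> str:
    # Mirror upstream's cuda -> npu -> cpu probe order.
    if torch.cuda.is_available():
        return "cuda"
    # torch.npu only exists in Ascend-enabled torch builds; guard the access.
    if hasattr(torch, "npu") and torch.npu.is_available():
        return "npu"
    return "cpu"
```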