# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project goal

Build a Hugging Face **Space** (ZeroGPU, Gradio SDK) that lets users upload / pick images and get **pairwise or one-vs-many style-similarity scores** produced by **MegaStyle-Encoder** from the paper *MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent Text-to-Image Style Mapping* ([arXiv:2604.08364](https://arxiv.org/abs/2604.08364), [project page](https://jeoyal.github.io/MegaStyle/), [upstream code](https://github.com/Tencent/MegaStyle)).

The Space is a **comparison/analysis tool**, not a style-transfer demo — it should surface a similarity matrix / ranked grid / heatmap, not generate images. The style-transfer sibling model (`megastyle_flux.safetensors`) is explicitly **out of scope** unless the user asks for it — a ZeroGPU slot can load SigLIP but cannot sanely run a 12B FLUX model.

## The model contract (critical — easy to get wrong)

MegaStyle-Encoder is **not** a standalone architecture. It is **SigLIP** with fine-tuned weights:

- **Backbone**: `google/siglip-so400m-patch14-384` loaded via `transformers.SiglipVisionModel`
- **Processor**: upstream uses `transformers.AutoProcessor.from_pretrained("google/siglip-so400m-patch14-384")`; in this Space, load `SiglipImageProcessor` directly instead (see Conventions — the tokenizer half of `AutoProcessor` drags in `sentencepiece`).
- **Weights**: `megastyle_encoder.pth` from [Gaojunyao/MegaStyle](https://huggingface.co/Gaojunyao/MegaStyle) (~857 MB). Loaded with `torch.load(..., map_location="cpu")`, then:
  - The checkpoint may be either a raw `state_dict` or `{"model": state_dict}` — handle both.
  - Apply with `model.load_state_dict(state, strict=False)` (non-strict is correct, upstream does this).
- **Embedding**: `model(pixel_values=...).pooler_output`, then **L2-normalize** (`emb / emb.norm(p=2, dim=-1, keepdim=True)`).
- **Similarity**: plain dot product of two normalized embeddings (= cosine similarity, range roughly [-1, 1]).

Reference implementation is `style_score.py` in the upstream repo — treat it as the authoritative contract for any inference code in this Space. Do not "improve" the preprocessing, pooling choice, or normalization without strong reason; the checkpoint was trained to produce a useful metric only under exactly this pipeline.

## Deployment target: ZeroGPU (but the workload is GPU-light)

This Space targets **ZeroGPU** (Hugging Face's serverless GPU pool, currently backed by H200) because it's the free GPU tier. The actual workload — one SigLIP-so400m forward pass over up to 9 images — is small enough to run on CPU in 15–30s, or on any CUDA device in under a second. We're not picking H200 for capability reasons; we take whatever ZeroGPU allocates.

**Viable alternatives** if ZeroGPU isn't available (no HF Pro, etc.):

- **CPU basic Space** (free, no Pro): drop the `spaces` import and the `@spaces.GPU` decorator; everything else stays the same. Comparison takes 15–30s instead of <1s.
- **Dedicated T4/L4/A10G paid tier**: no code change; just pick hardware in Space Settings.

ZeroGPU-specific constraints that shape the code today (a load/score sketch follows this list):

- GPU-using functions **must** be decorated with `@spaces.GPU(duration=...)` and the `spaces` package must be imported **before** `torch`. GPU tensors/models cannot live at module scope — move `.to("cuda")` and the forward pass **inside** the decorated function. Guard `MODEL.to(device)` with a device check (e.g. `next(MODEL.parameters()).device.type != "cuda"` — comparing a `torch.device` against a plain string is always unequal) so repeat calls don't re-migrate.
- The model is ~857 MB (`megastyle_encoder.pth`) + SigLIP ~1.8 GB. Load on CPU at module scope via `hf_hub_download`; `.to(device)` inside the decorated function on first invocation.
- **`duration`** on `@spaces.GPU` is currently 30s (sized for 1 test image + up to 8 references). Over-estimating wastes the user's ZeroGPU daily quota. Bump only if the batch grows.
- **ZeroGPU hardware is NOT set in frontmatter** — it's configured in the Space's *Settings → Hardware* panel. The `hardware:` frontmatter key that exists for regular Spaces does not apply here. Don't reintroduce it.
- **Python**: ZeroGPU supports only Python **3.10.13** and **3.12.12**. Pin `python_version: "3.10"` in frontmatter.
- **PyTorch**: ZeroGPU's supported wheel list is `>=2.1.0`. Keep the pin compatible.
- `sdk: gradio`, `sdk_version: 5.50.0` (see Conventions — 5.9.1, HF's default at Space creation, ships a crashing `gradio_client` bug). Gradio 4+ is the only SDK currently supported by ZeroGPU.
- Requirements live in `requirements.txt` at repo root: `torch`, `transformers`, `Pillow`, `spaces`, `huggingface_hub` — and no `gradio` pin (see Conventions). Pin `transformers>=4.45` so `SiglipVisionModel` exists.
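A minimal sketch of the whole pipeline under these constraints — the contract per the section above, `duration` per this list. Names like `style_scores` are illustrative; wherever this diverges from upstream `style_score.py`, the upstream file wins:

```python
import spaces  # must be imported before torch on ZeroGPU
import torch
from huggingface_hub import hf_hub_download
from transformers import SiglipImageProcessor, SiglipVisionModel

SIGLIP_ID = "google/siglip-so400m-patch14-384"

# Module scope: CPU only — ZeroGPU forbids touching CUDA here.
PROCESSOR = SiglipImageProcessor.from_pretrained(SIGLIP_ID)  # vision-only; no sentencepiece
MODEL = SiglipVisionModel.from_pretrained(SIGLIP_ID)

state = torch.load(hf_hub_download("Gaojunyao/MegaStyle", "megastyle_encoder.pth"),
                   map_location="cpu")
if "model" in state:  # checkpoint may be a raw state_dict or {"model": state_dict}
    state = state["model"]
MODEL.load_state_dict(state, strict=False)  # non-strict matches upstream
MODEL.eval()


@spaces.GPU(duration=30)
def style_scores(test_image, reference_images):
    # Compare device.type, not the device object, so repeat calls don't re-migrate.
    if next(MODEL.parameters()).device.type != "cuda":
        MODEL.to("cuda")
    batch = PROCESSOR(images=[test_image, *reference_images], return_tensors="pt")
    with torch.no_grad():
        emb = MODEL(pixel_values=batch.pixel_values.to("cuda")).pooler_output
    emb = emb / emb.norm(p=2, dim=-1, keepdim=True)  # L2-normalize
    return (emb[1:] @ emb[0]).tolist()  # cosine of test image vs each reference
```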
## Repo shape

Current layout (a single-file app is appropriate at this size):

- `app.py` — Gradio Blocks UI + `@spaces.GPU`-decorated inference. Single entry point.
- `requirements.txt` — dependency pins.
- `README.md` — HF Space frontmatter (sdk, python_version, license) + user-facing description. Doubles as the Space's landing page.
- `LICENSE` — MIT. Matches the `license: mit` claim in README frontmatter and the upstream model's license.
- `.gitignore` — standard Python + HF cache + Gradio `flagged/` directory.
- `CLAUDE.md` — this file.

Split `app.py` into a separate `megastyle.py` module only if inference logic grows past ~150 lines or needs unit tests without Gradio. Not needed today.

## UI & scoring conventions (locked by review)

- **Up to 8 reference images.** Extra uploads are clipped at compute time, with a note in the result markdown.
- **Headline score is the mean cosine similarity** across per-reference scores. The per-reference breakdown is surfaced in a table so users can spot outliers dragging down the mean.
- **Label bands** (heuristic, calibrated for SigLIP-family cosine ranges where unrelated images typically sit at 0.4–0.6; a band-mapping sketch follows this list):

  | Cosine | Label | Emoji |
  |--------|-------|-------|
  | ≥ 0.75 | Strong style match | 🟢 |
  | 0.65–0.75 | Good style match | 🟢 |
  | 0.55–0.65 | Moderate style match | 🟡 |
  | 0.45–0.55 | Weak style match | 🟠 |
  | < 0.45 | Minimal style match | 🔴 |

  If you ever have calibration data (e.g., per-style-pair cosine distributions from the MegaStyle dataset), retighten these. Until then, do not represent the label as ground truth — the raw cosine is the source of truth.
- **Do not display a pseudo-percentage.** An earlier iteration mapped cosine → `(x+1)/2 * 100`, which compresses useful signal and misleads users. The raw three-decimal cosine + the label is the display contract.
- **Verdict styling uses an emoji prefix, not inline HTML.** `gr.Markdown`'s HTML handling varies across Gradio versions; emoji is reliably rendered.
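A hedged sketch of the band mapping — thresholds straight from the table above; the function name is illustrative:

```python
def verdict(cosine: float) -> str:
    """Map a raw cosine to its display band. The raw cosine stays the source of truth."""
    bands = [
        (0.75, "🟢 Strong style match"),
        (0.65, "🟢 Good style match"),
        (0.55, "🟡 Moderate style match"),
        (0.45, "🟠 Weak style match"),
    ]
    for threshold, label in bands:
        if cosine >= threshold:
            return f"{label} ({cosine:.3f})"
    return f"🔴 Minimal style match ({cosine:.3f})"
```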
## Deploying and observing the Space

The live Space is at **`olfronar/megastyle-comparison`** (https://huggingface.co/spaces/olfronar/megastyle-comparison). The local working copy is git-backed with `origin` pointing at the Space's git URL — a normal `git push origin main` triggers a rebuild.

### Publish a change

```bash
git add -A && git commit -m "..." && git push origin main
```

For one-off blob pushes without a git clone:

```bash
hf upload olfronar/megastyle-comparison . --repo-type space --commit-message "..."
```

### Set hardware (ZeroGPU, etc.)

Hardware is *not* set via README frontmatter. Use `HfApi.request_space_hardware`:

```python
from huggingface_hub import HfApi

HfApi().request_space_hardware("olfronar/megastyle-comparison", hardware="zero-a10g")
```

Valid flavors include `cpu-basic`, `cpu-upgrade`, `t4-small`, `a10g-small`, `a10g-large`, `a100-large`, and `zero-a10g` (the legacy name still used for the ZeroGPU pool, which is currently backed by H200 hardware). You can also toggle via the Space's *Settings → Hardware* page; both paths write to the same field.

### Observe build and run logs

HF exposes Server-Sent Events (SSE) streams for both phases. They require the user's HF token (read it from `~/.cache/huggingface/token` after `hf auth login`). Capped-timeout `curl` works for point-in-time snapshots:

```bash
# Build phase (docker layers, pip install)
curl -s --max-time 20 \
  -H "Authorization: Bearer $(cat ~/.cache/huggingface/token)" \
  "https://huggingface.co/api/spaces/olfronar/megastyle-comparison/logs/build" | tail -80

# Run phase (Python stdout/stderr, Gradio startup)
curl -s --max-time 20 \
  -H "Authorization: Bearer $(cat ~/.cache/huggingface/token)" \
  "https://huggingface.co/api/spaces/olfronar/megastyle-comparison/logs/run" | tail -80
```

Each event line is JSON with `data` and `timestamp` fields. Because it's SSE, the stream is long-lived — always use `--max-time` to bound it.
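The same snapshot from Python, as a sketch — endpoint and event shape per the description above; the `data:` line prefix is the standard SSE framing and is an assumption here, as is `requests` being available:

```python
import json
from pathlib import Path

import requests

token = Path("~/.cache/huggingface/token").expanduser().read_text().strip()
url = "https://huggingface.co/api/spaces/olfronar/megastyle-comparison/logs/run"

try:
    # The read timeout is our stop condition: the SSE stream never ends on its own.
    with requests.get(url, headers={"Authorization": f"Bearer {token}"},
                      stream=True, timeout=(5, 20)) as resp:
        for raw in resp.iter_lines():
            if raw.startswith(b"data:"):  # SSE event lines look like `data: {...}`
                event = json.loads(raw[len(b"data:"):])
                print(event.get("timestamp"), event.get("data"))
except requests.exceptions.Timeout:
    pass  # expected — we only wanted a point-in-time snapshot
```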
### Check high-level state

```bash
curl -s "https://huggingface.co/api/spaces/olfronar/megastyle-comparison" | \
  python3 -c "import sys, json; d=json.load(sys.stdin); r=d.get('runtime',{}); \
  print('stage:', r.get('stage'), 'hardware:', r.get('hardware'), 'sha:', d.get('sha','?')[:7])"
```

`stage` values: `BUILDING` → `RUNNING` (healthy) or `RUNTIME_ERROR` / `BUILD_FAILED` (check logs).

### Common fix cycles

- **Runtime import / missing dep**: edit `requirements.txt` or the affected import, commit, push, and stream the run logs until `RUNNING`. Most fixes don't need a full rebuild — layer caching makes re-pushes with unchanged deps very fast.
- **Hardware change**: one `HfApi().request_space_hardware(...)` call — no push, no rebuild.
- **Stuck build**: use the *Settings → Factory Reboot* UI as a last resort; our Dockerfile is the HF default, so a factory reboot is safe.

## Working with the Hugging Face Hub

- Model weights (`megastyle_encoder.pth`) should be pulled at runtime via `huggingface_hub.hf_hub_download`, not committed. The Space's build cache keeps them warm between restarts.
- The checkpoint's license is MIT (per `Gaojunyao/MegaStyle`); SigLIP is Apache-2.0. The Space's own code should ship under a compatible license.
- Paper citation (if surfaced in the UI): the `gao2026megastyle` bibtex is in the upstream repo README.

## Conventions

- **Use `SiglipImageProcessor`, not `AutoProcessor`.** Upstream's `style_score.py` uses `AutoProcessor.from_pretrained(SIGLIP_ID)`, which loads both the image processor *and* the SigLIP tokenizer; the tokenizer requires `sentencepiece`, which isn't in the ZeroGPU base image and crashes at import. Vision-only inference has no use for the tokenizer. Load `SiglipImageProcessor.from_pretrained(SIGLIP_ID)` directly — the `pixel_values` output is identical.
- **Control the Gradio version via `sdk_version` in README frontmatter, not `requirements.txt`.** HF's Dockerfile injects `gradio[oauth]==<sdk_version>` into the pip install line alongside our `requirements.txt`. Putting a conflicting `gradio` pin in `requirements.txt` (e.g. `gradio>=5.25.0`) crashes the build with `Cannot install gradio== and gradio>=... because these package versions have conflicting dependencies`. Bump `sdk_version` instead; `requirements.txt` should not name gradio at all.
- **`sdk_version: 5.50.0` or newer is required.** Gradio 5.9.1 (the version HF defaults to at Space creation) ships a `gradio_client.utils.get_type` that crashes on boolean JSON schemas (valid JSON-Schema shorthand for accept-anything). `gr.Dataframe` triggers this in `/api/info`, flooding the run logs with `TypeError: argument of type 'bool' is not iterable`. Fixed in later 5.x. Also launch with `show_api=False` as belt-and-suspenders so the endpoint isn't exposed at all — we don't need programmatic API access for a visual-only demo.
- **Match upstream preprocessing exactly.** If you find yourself tempted to resize, center-crop, or change color-space conversion manually instead of going through the SigLIP image processor, stop — the metric will silently degrade.
- **Don't cast weights to fp16/bf16 on load unless explicitly needed.** ZeroGPU hardware handles fp32 SigLIP fine; precision changes affect similarity scores measurably.
- **Device string**: upstream's `style_score.py` probes for `cuda` → `npu` → `cpu` (see the sketch below). On ZeroGPU it will always be `cuda` inside the decorated function; outside (e.g. at cold start), keep it on CPU.
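A minimal sketch of that probe, assuming the Ascend `torch_npu` plugin registers `torch.npu` when present:

```python
import torch


def pick_device() -> str:
    # Mirror upstream style_score.py's probe order: cuda -> npu -> cpu.
    if torch.cuda.is_available():
        return "cuda"
    # torch.npu only exists once the torch_npu plugin has been imported.
    if hasattr(torch, "npu") and torch.npu.is_available():
        return "npu"
    return "cpu"
```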