# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project goal
Build a Hugging Face Space (ZeroGPU, Gradio SDK) that lets users upload or pick images and get pairwise or one-vs-many style-similarity scores produced by MegaStyle-Encoder from the paper *MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent Text-to-Image Style Mapping* (arXiv:2604.08364, project page, upstream code).
The Space is a comparison/analysis tool, not a style-transfer demo: it should surface a similarity matrix / ranked grid / heatmap, not generate images. The style-transfer sibling model (`megastyle_flux.safetensors`) is explicitly out of scope unless the user asks for it; a ZeroGPU slot can load SigLIP but cannot sanely run a 12B FLUX model.
## The model contract (critical and easy to get wrong)
MegaStyle-Encoder is not a standalone architecture. It is SigLIP with fine-tuned weights:

- Backbone: `google/siglip-so400m-patch14-384`, loaded via `transformers.SiglipVisionModel`.
- Processor: `transformers.AutoProcessor.from_pretrained("google/siglip-so400m-patch14-384")`.
- Weights: `megastyle_encoder.pth` from `Gaojunyao/MegaStyle` (~857 MB). Loaded with `torch.load(..., map_location="cpu")`, then:
  - The checkpoint may be either a raw `state_dict` or `{"model": state_dict}`; handle both.
  - Apply with `model.load_state_dict(state, strict=False)` (non-strict is correct; upstream does this).
- Embedding: `model(pixel_values=...).pooler_output`, then L2-normalize (`emb / emb.norm(p=2, dim=-1, keepdim=True)`).
- Similarity: plain dot product of two normalized embeddings (= cosine similarity, range roughly [-1, 1]).
Reference implementation is `style_score.py` in the upstream repo; treat it as the authoritative contract for any inference code in this Space. Do not "improve" the preprocessing, pooling choice, or normalization without strong reason: the checkpoint was trained to produce a useful metric only under exactly this pipeline.
## Deployment target: ZeroGPU (but the workload is GPU-light)
This Space targets ZeroGPU (Hugging Face's serverless GPU pool, currently backed by H200) because it's the free GPU tier. The actual workload, one SigLIP-so400m forward pass over up to 9 images, is small enough to run on CPU in 15–30s, or on any CUDA device in under a second. We're not picking H200 for capability reasons; we take whatever ZeroGPU allocates.
Viable alternatives if ZeroGPU isn't available (no HF Pro, etc.):

- CPU basic Space (free, no Pro): drop the `spaces` import and the `@spaces.GPU` decorator; everything else stays the same. Comparison takes 15–30s instead of <1s.
- Dedicated T4/L4/A10G paid tier: no code change; just pick hardware in Space Settings.
ZeroGPU-specific constraints that shape the code today:
- GPU-using functions must be decorated with `@spaces.GPU(duration=...)`, and the `spaces` package must be imported before `torch`. GPU tensors/models cannot live at module scope: move `.to("cuda")` and the forward pass inside the decorated function. Guard `MODEL.to(device)` with a `next(MODEL.parameters()).device != device` check so repeat calls don't re-migrate.
- The model is ~857 MB (`megastyle_encoder.pth`) plus SigLIP at ~1.8 GB. Load on CPU at module scope via `hf_hub_download`; call `.to(device)` inside the decorated function on first invocation.
- `duration` on `@spaces.GPU` is currently 30s (sized for 1 test image + up to 8 references). Over-estimating wastes the user's ZeroGPU daily quota. Bump it only if the batch grows.
- ZeroGPU hardware is NOT set in frontmatter; it's configured in the Space's Settings → Hardware panel. The `hardware:` frontmatter key that exists for regular Spaces does not apply here. Don't reintroduce it.
- Python: ZeroGPU supports only Python 3.10.13 and 3.12.12. Pin `python_version: "3.10"` in frontmatter.
- PyTorch: ZeroGPU's supported wheel list is `>=2.1.0`. Keep the pin compatible.
- `sdk: gradio`, with `sdk_version` pinned in README frontmatter (must be 5.50.0 or newer; see Conventions). Gradio 4+ is the only SDK currently supported by ZeroGPU.
- Requirements live in `requirements.txt` at repo root: `torch`, `transformers`, `Pillow`, `spaces`, `huggingface_hub`. Gradio itself comes from `sdk_version`, not `requirements.txt` (see Conventions). Pin `transformers>=4.45` so `SiglipVisionModel` exists.
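These constraints can be condensed into a minimal app skeleton. This is an illustrative sketch, not the Space's actual `app.py`: `MODEL` is a stand-in module, `ensure_on` and `compare` are hypothetical names, and the `try/except` mirrors the CPU-basic fallback described above.

```python
try:
    import spaces                     # must be imported before torch on ZeroGPU
    gpu = spaces.GPU(duration=30)     # sized for 1 test image + up to 8 references
except ImportError:
    gpu = lambda f: f                 # CPU-basic fallback: decorator becomes a no-op
import torch

# Stand-in module; the real app loads the MegaStyle encoder on CPU here.
MODEL = torch.nn.Linear(1, 1)


def ensure_on(model, device: str):
    """Migrate once; repeat calls on an already-migrated model are no-ops."""
    if next(model.parameters()).device != torch.device(device):
        model.to(device)
    return model


@gpu
def compare(test_image, reference_images):
    # Inside the decorated function a GPU is attached; outside (cold start), stay on CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    ensure_on(MODEL, device)
    # ...preprocessing, forward pass, and cosine scoring happen here...
```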
## Repo shape
Current layout (a single-file app is appropriate at this size):

- `app.py`: Gradio Blocks UI + `@spaces.GPU`-decorated inference. Single entry point.
- `requirements.txt`: dependency pins.
- `README.md`: HF Space frontmatter (sdk, python_version, license) + user-facing description. Doubles as the Space's landing page.
- `LICENSE`: MIT. Matches the `license: mit` claim in README frontmatter and the upstream model's license.
- `.gitignore`: standard Python + HF cache + Gradio `flagged/` directory.
- `CLAUDE.md`: this file.
Split `app.py` into a separate `megastyle.py` module only if inference logic grows past ~150 lines or needs unit tests without Gradio. Not needed today.
## UI & scoring conventions (locked by review)
- Up to 8 reference images. Clipped silently at compute time, with a note in the result markdown.
- The headline score is the mean cosine similarity across per-reference scores. A per-reference breakdown is surfaced in a table so users can spot outliers dragging down the mean.
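A minimal sketch of these two conventions in pure Python; the clipping is illustrated on the per-reference score list, and `summarize` is a hypothetical name, not the app's actual helper:

```python
MAX_REFS = 8  # hard cap on reference images


def summarize(per_ref_scores):
    """Clip silently to MAX_REFS and return the headline mean plus what was kept.

    The clipped flag lets the UI add a note to the result markdown.
    """
    kept = per_ref_scores[:MAX_REFS]
    clipped = len(per_ref_scores) > MAX_REFS
    return sum(kept) / len(kept), kept, clipped
```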
Label bands (heuristic, calibrated for SigLIP-family cosine ranges where unrelated images typically sit 0.4–0.6):

| Cosine | Label | Emoji |
|--------|-------|-------|
| ≥ 0.75 | Strong style match | 🟢 |
| 0.65–0.75 | Good style match | 🟢 |
| 0.55–0.65 | Moderate style match | 🟡 |
| 0.45–0.55 | Weak style match | 🟠 |
| < 0.45 | Minimal style match | 🔴 |

If you ever have calibration data (e.g., per-style-pair cosine distributions from the MegaStyle dataset), retighten these. Until then, do not represent the label as ground truth: the raw cosine is the source of truth.
- Do not display a pseudo-percentage. An earlier iteration mapped cosine to `(x+1)/2 * 100`, which compresses useful signal and misleads users. The raw three-decimal cosine plus the label is the display contract.
- Verdict styling uses an emoji prefix, not an inline HTML `<span>`. `gr.Markdown`'s HTML handling varies across Gradio versions; emoji is reliably rendered.
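The bands and display contract above can be encoded in one small function. Names (`BANDS`, `verdict`) are illustrative, not the app's actual identifiers:

```python
BANDS = [  # (lower threshold, label, emoji); heuristic bands, not ground truth
    (0.75, "Strong style match",   "🟢"),
    (0.65, "Good style match",     "🟢"),
    (0.55, "Moderate style match", "🟡"),
    (0.45, "Weak style match",     "🟠"),
]


def verdict(cosine: float) -> str:
    """Emoji prefix + label + raw three-decimal cosine: the display contract."""
    for threshold, label, emoji in BANDS:
        if cosine >= threshold:
            return f"{emoji} {label} ({cosine:.3f})"
    return f"🔴 Minimal style match ({cosine:.3f})"
```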
## Deploying and observing the Space
The live Space is at olfronar/megastyle-comparison (https://huggingface.co/spaces/olfronar/megastyle-comparison). The local working copy is git-backed, with `origin` pointing at the Space's git URL; a normal `git push origin main` triggers a rebuild.
### Publish a change
```shell
git add <files> && git commit -m "..." && git push origin main
```
For one-off blob pushes without a git clone:

```shell
hf upload olfronar/megastyle-comparison . --repo-type space --commit-message "..."
```
### Set hardware (ZeroGPU, etc.)
Hardware is not set via README frontmatter. Use `HfApi.request_space_hardware`:
```python
from huggingface_hub import HfApi
HfApi().request_space_hardware("olfronar/megastyle-comparison", hardware="zero-a10g")
```
Valid flavors include `cpu-basic`, `cpu-upgrade`, `t4-small`, `a10g-small`, `a10g-large`, `a100-large`, and `zero-a10g` (the legacy name still used for the ZeroGPU pool, which is currently backed by H200 hardware). You can also toggle via the Space's Settings → Hardware page; both paths write to the same field.
### Observe build and run logs
HF exposes Server-Sent Events (SSE) streams for both phases. They require the user's HF token (read it from `~/.cache/huggingface/token` after `hf auth login`). A capped-timeout `curl` works for point-in-time snapshots:
```shell
# Build phase (docker layers, pip install)
curl -s --max-time 20 \
  -H "Authorization: Bearer $(cat ~/.cache/huggingface/token)" \
  "https://huggingface.co/api/spaces/olfronar/megastyle-comparison/logs/build" | tail -80

# Run phase (Python stdout/stderr, Gradio startup)
curl -s --max-time 20 \
  -H "Authorization: Bearer $(cat ~/.cache/huggingface/token)" \
  "https://huggingface.co/api/spaces/olfronar/megastyle-comparison/logs/run" | tail -80
```
Each event line is JSON with `data` and `timestamp` fields. Because it's SSE, the stream is long-lived; always use `--max-time` to bound it.
### Check high-level state
```shell
curl -s "https://huggingface.co/api/spaces/olfronar/megastyle-comparison" | \
  python3 -c "import sys, json; d=json.load(sys.stdin); r=d.get('runtime',{}); \
print('stage:', r.get('stage'), 'hardware:', r.get('hardware'), 'sha:', d.get('sha','?')[:7])"
```
`stage` values: `BUILDING` → `RUNNING` (healthy), or `RUNTIME_ERROR` / `BUILD_FAILED` (check logs).
### Common fix cycles
- Runtime import / missing dep: edit `requirements.txt` or the affected import, commit, push, stream the run logs until `RUNNING`. Most fixes don't need a full rebuild; layer caching makes re-pushes with unchanged deps very fast.
- Hardware change: one `HfApi().request_space_hardware(...)` call; no push, no rebuild.
- Stuck build: use the Settings → Factory Reboot UI as a last resort; our Dockerfile is the HF default, so a factory reboot is safe.
## Working with the Hugging Face Hub
- Model weights (`megastyle_encoder.pth`) should be pulled at runtime via `huggingface_hub.hf_hub_download`, not committed. The Space's build cache keeps them warm between restarts.
- The checkpoint's license is MIT (per `Gaojunyao/MegaStyle`); SigLIP is Apache-2.0. The Space's own code should ship under a compatible license.
- Paper citation (if surfaced in the UI): the `gao2026megastyle` bibtex entry is in the upstream repo README.
## Conventions
- Use `SiglipImageProcessor`, not `AutoProcessor`. Upstream's `style_score.py` uses `AutoProcessor.from_pretrained(SIGLIP_ID)`, which loads both the image processor and the SigLIP tokenizer; the tokenizer requires `sentencepiece`, which isn't in the ZeroGPU base image and crashes at import. Vision-only inference has no use for the tokenizer. Load `SiglipImageProcessor.from_pretrained(SIGLIP_ID)` directly; the `pixel_values` output is identical.
- Control the Gradio version via `sdk_version` in README frontmatter, not `requirements.txt`. HF's Dockerfile injects `gradio[oauth]==<sdk_version>` into the pip install line alongside our `requirements.txt`. Putting a conflicting `gradio` pin in `requirements.txt` (e.g. `gradio>=5.25.0`) crashes the build with `Cannot install gradio==<sdk_version> and gradio>=...` because these package versions have conflicting dependencies. Bump `sdk_version` instead; `requirements.txt` should not name gradio at all.
- `sdk_version: 5.50.0` or newer is required. Gradio 5.9.1 (the version HF defaults to at Space creation) ships a `gradio_client.utils.get_type` that crashes on boolean JSON schemas (valid JSON-Schema shorthand for accept-anything). `gr.Dataframe` triggers this in `/api/info`, flooding run logs with `TypeError: argument of type 'bool' is not iterable`. Fixed in later 5.x. Also launch with `show_api=False` as belt-and-suspenders so the endpoint isn't exposed at all; we don't need programmatic API access for a visual-only demo.
- Match upstream preprocessing exactly. If you find yourself tempted to resize, center-crop, or change color-space conversion manually instead of going through the SigLIP image processor, stop: the metric will silently degrade.
- Don't cast weights to fp16/bf16 on load unless explicitly needed. ZeroGPU hardware handles fp32 SigLIP fine; precision changes affect similarity scores measurably.
- Device string: upstream `style_score.py` probes for `cuda` → `npu` → `cpu`. On ZeroGPU it will always be `cuda` inside the decorated function; outside (e.g. cold start), keep the model on CPU.
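A sketch of that probe order (hypothetical `pick_device` name; the `getattr` guard is there because `torch.npu` only exists when the NPU plugin is installed):

```python
import torch


def pick_device() -> str:
    """Upstream style_score.py's probe order: cuda, then npu, then cpu."""
    if torch.cuda.is_available():
        return "cuda"
    npu = getattr(torch, "npu", None)  # present only with the torch_npu plugin
    if npu is not None and npu.is_available():
        return "npu"
    return "cpu"
```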