CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project goal

Build a Hugging Face Space (ZeroGPU, Gradio SDK) that lets users upload / pick images and get pairwise or one-vs-many style-similarity scores produced by MegaStyle-Encoder from the paper MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent Text-to-Image Style Mapping (arXiv:2604.08364, project page, upstream code).

The Space is a comparison/analysis tool, not a style-transfer demo — it should surface a similarity matrix / ranked grid / heatmap, not generate images. The style-transfer sibling model (megastyle_flux.safetensors) is explicitly out of scope unless the user asks for it — a ZeroGPU slot can load SigLIP but cannot sanely run a 12B FLUX model.

The model contract (critical — easy to get wrong)

MegaStyle-Encoder is not a standalone architecture. It is SigLIP with fine-tuned weights:

  • Backbone: google/siglip-so400m-patch14-384 loaded via transformers.SiglipVisionModel
  • Processor: transformers.AutoProcessor.from_pretrained("google/siglip-so400m-patch14-384")
  • Weights: megastyle_encoder.pth from Gaojunyao/MegaStyle (~857 MB). Loaded with torch.load(..., map_location="cpu"), then:
    • The checkpoint may be either a raw state_dict or {"model": state_dict} — handle both.
    • Apply with model.load_state_dict(state, strict=False) (non-strict is correct, upstream does this).
  • Embedding: model(pixel_values=...).pooler_output, then L2-normalize (emb / emb.norm(p=2, dim=-1, keepdim=True)).
  • Similarity: plain dot product of two normalized embeddings (= cosine similarity, range roughly [-1, 1]).

Reference implementation is style_score.py in the upstream repo — treat it as the authoritative contract for any inference code in this Space. Do not "improve" the preprocessing, pooling choice, or normalization without strong reason; the checkpoint was trained to produce a useful metric only under exactly this pipeline.
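The last two contract bullets reduce to simple vector math. A dependency-free sketch (plain Python lists stand in for torch tensors; function names are illustrative, not upstream's):

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length, like emb / emb.norm(p=2, dim=-1, keepdim=True)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def style_similarity(a, b):
    """Dot product of two L2-normalized embeddings = cosine similarity, range [-1, 1]."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

# Parallel directions score 1.0; orthogonal directions score 0.0.
print(round(style_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]), 3))  # 1.0
print(round(style_similarity([1.0, 0.0], [0.0, 1.0]), 3))            # 0.0
```

Note that because both embeddings are normalized first, the plain dot product is exactly cosine similarity; there is no separate division step at scoring time.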

Deployment target: ZeroGPU (but the workload is GPU-light)

This Space targets ZeroGPU (Hugging Face's serverless GPU pool, currently backed by H200) because it's the free GPU tier. The actual workload — one SigLIP-so400m forward pass over up to 9 images — is small enough to run on CPU in 15–30s, or on any CUDA device in under a second. We're not picking H200 for capability reasons; we take whatever ZeroGPU allocates.

Viable alternatives if ZeroGPU isn't available (no HF Pro, etc.):

  • CPU basic Space (free, no Pro): drop spaces import and @spaces.GPU decorator; everything else stays the same. Comparison takes 15–30s instead of <1s.
  • Dedicated T4/L4/A10G paid tier: no code change; just pick hardware in Space Settings.

ZeroGPU-specific constraints that shape the code today:

  • GPU-using functions must be decorated with @spaces.GPU(duration=...) and the spaces package must be imported before torch. GPU tensors/models cannot live at module scope — move .to("cuda") and the forward pass inside the decorated function. Guard MODEL.to(device) with a next(MODEL.parameters()).device != device check so repeat calls don't re-migrate.
  • The model is ~857 MB (megastyle_encoder.pth) + SigLIP ~1.8 GB. Load on CPU at module scope via hf_hub_download; .to(device) inside the decorated function on first invocation.
  • duration on @spaces.GPU is currently 30s (sized for 1 test image + up to 8 references). Over-estimating wastes the user's ZeroGPU daily quota. Bump only if batch grows.
  • ZeroGPU hardware is NOT set in frontmatter — it's configured in the Space's Settings → Hardware panel. The hardware: frontmatter key that exists for regular Spaces does not apply here. Don't reintroduce it.
  • Python: ZeroGPU supports only Python 3.10.13 and 3.12.12. Pin python_version: "3.10" in frontmatter.
  • PyTorch: ZeroGPU's supported wheel list is >=2.1.0. Keep the pin compatible.
  • sdk: gradio, sdk_version: 5.50.0 (not the 5.9.1 default HF sets at Space creation; see Conventions for why). Gradio 4+ is the only SDK currently supported by ZeroGPU.
  • Requirements live in requirements.txt at repo root: gradio, torch, transformers, Pillow, spaces, huggingface_hub. Pin transformers>=4.45 so SiglipVisionModel exists.
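The constraints above fold into a module layout roughly like this. This is a sketch, not the live app.py: it assumes the Space environment (spaces, torch, transformers, huggingface_hub installed) and mirrors the model-contract section; only the names taken from that section are real.

```python
import spaces  # must be imported before torch on ZeroGPU
import torch
from huggingface_hub import hf_hub_download
from transformers import SiglipVisionModel, SiglipImageProcessor

SIGLIP_ID = "google/siglip-so400m-patch14-384"

# Module scope: CPU only. GPU tensors/models must not live here.
PROCESSOR = SiglipImageProcessor.from_pretrained(SIGLIP_ID)
MODEL = SiglipVisionModel.from_pretrained(SIGLIP_ID)

ckpt_path = hf_hub_download("Gaojunyao/MegaStyle", "megastyle_encoder.pth")
state = torch.load(ckpt_path, map_location="cpu")
if isinstance(state, dict) and "model" in state:
    state = state["model"]          # checkpoint may wrap the raw state_dict
MODEL.load_state_dict(state, strict=False)  # non-strict is the upstream contract
MODEL.eval()

@spaces.GPU(duration=30)  # sized for 1 test image + up to 8 references
def embed(images):
    # Guard so repeat calls don't re-migrate the model to the GPU.
    if next(MODEL.parameters()).device.type != "cuda":
        MODEL.to("cuda")
    inputs = PROCESSOR(images=images, return_tensors="pt").to("cuda")
    with torch.no_grad():
        emb = MODEL(pixel_values=inputs.pixel_values).pooler_output
    return emb / emb.norm(p=2, dim=-1, keepdim=True)
```

On a CPU basic Space, the same layout works by dropping the spaces import and decorator and leaving the model where it loaded.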

Repo shape

Current layout (single-file app is appropriate at this size):

  • app.py — Gradio Blocks UI + @spaces.GPU-decorated inference. Single entry point.
  • requirements.txt — dependency pins.
  • README.md — HF Space frontmatter (sdk, python_version, license) + user-facing description. Doubles as the Space's landing page.
  • LICENSE — MIT. Matches the license: mit claim in README frontmatter and the upstream model's license.
  • .gitignore — standard Python + HF cache + Gradio flagged/ directory.
  • CLAUDE.md — this file.

Split app.py into a separate megastyle.py module only if inference logic grows past ~150 lines or needs unit tests without Gradio. Not needed today.

UI & scoring conventions (locked by review)

  • Up to 8 reference images. Clipped silently at compute time with a note in the result markdown.

  • Headline score is mean cosine similarity across per-reference scores. Per-reference breakdown is surfaced in a table so users can spot outliers dragging down the mean.

  • Label bands (heuristic, calibrated for SigLIP-family cosine ranges where unrelated images typically sit 0.4–0.6):

    Cosine      Label                 Emoji
    ≥ 0.75      Strong style match    🟢
    0.65–0.75   Good style match      🟢
    0.55–0.65   Moderate style match  🟡
    0.45–0.55   Weak style match      🟠
    < 0.45      Minimal style match   🔴

    If you ever have calibration data (e.g., per-style-pair cosine distributions from the MegaStyle dataset), retighten these. Until then, do not represent the label as ground truth — the raw cosine is the source of truth.

  • Do not display a pseudo-percentage. An earlier iteration mapped cosine → (x+1)/2 * 100, which compresses useful signal and misleads users. The raw three-decimal cosine + the label is the display contract.

  • Verdict styling uses emoji prefix, not inline HTML <span>. gr.Markdown's HTML handling varies across Gradio versions; emoji is reliably rendered.
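The band table maps mechanically to a small helper along these lines (a sketch; the function name is illustrative, the thresholds and labels are the ones listed above):

```python
def style_label(cosine: float) -> tuple[str, str]:
    """Map a raw cosine score to (label, emoji) per the band table."""
    bands = [
        (0.75, "Strong style match", "🟢"),
        (0.65, "Good style match", "🟢"),
        (0.55, "Moderate style match", "🟡"),
        (0.45, "Weak style match", "🟠"),
    ]
    for threshold, label, emoji in bands:
        if cosine >= threshold:
            return label, emoji
    return "Minimal style match", "🔴"

# Emoji prefixes the verdict text; the raw cosine stays the source of truth.
label, emoji = style_label(0.78)
print(f"{emoji} {label} (0.780)")  # 🟢 Strong style match (0.780)
```

Keeping the thresholds in one list makes a future recalibration a one-line change per band.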

Deploying and observing the Space

The live Space is at olfronar/megastyle-comparison (https://huggingface.co/spaces/olfronar/megastyle-comparison). Local working copy is git-backed with origin pointing at the Space's git URL — normal git push origin main triggers a rebuild.

Publish a change

git add <files> && git commit -m "..." && git push origin main

For one-off blob pushes without a git clone:

hf upload olfronar/megastyle-comparison . --repo-type space --commit-message "..."

Set hardware (ZeroGPU, etc.)

Hardware is not set via README frontmatter. Use HfApi.request_space_hardware:

from huggingface_hub import HfApi
HfApi().request_space_hardware("olfronar/megastyle-comparison", hardware="zero-a10g")

Valid flavors include cpu-basic, cpu-upgrade, t4-small, a10g-small, a10g-large, a100-large, and zero-a10g (the legacy name still used for the ZeroGPU pool, which is currently backed by H200 hardware). You can also toggle via the Space's Settings → Hardware page; both paths write to the same field.

Observe build and run logs

HF exposes Server-Sent-Events streams for both phases. They require the user's HF token (read it from ~/.cache/huggingface/token after hf auth login). Capped-timeout curl works for point-in-time snapshots:

# Build phase (docker layers, pip install)
curl -s --max-time 20 \
  -H "Authorization: Bearer $(cat ~/.cache/huggingface/token)" \
  "https://huggingface.co/api/spaces/olfronar/megastyle-comparison/logs/build" | tail -80

# Run phase (Python stdout/stderr, Gradio startup)
curl -s --max-time 20 \
  -H "Authorization: Bearer $(cat ~/.cache/huggingface/token)" \
  "https://huggingface.co/api/spaces/olfronar/megastyle-comparison/logs/run" | tail -80

Each event line is JSON with data and timestamp fields. Because it's SSE, the stream is long-lived — always use --max-time to bound it.
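A snapshot captured by the curl commands above can be unpacked with a few lines of Python (a sketch; it assumes the two-field event shape just described and tolerates SSE `data:` prefixes and keep-alive noise):

```python
import json

def parse_log_events(raw: str):
    """Yield (timestamp, message) pairs from an SSE log snapshot."""
    for line in raw.splitlines():
        line = line.strip()
        if line.startswith("data:"):        # SSE frames prefix the JSON payload
            line = line[len("data:"):].strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue                        # skip keep-alives / non-JSON frames
        yield event.get("timestamp"), event.get("data")

sample = 'data: {"data": "Running on local URL:  http://0.0.0.0:7860", "timestamp": "2024-05-01T12:00:00Z"}'
for ts, msg in parse_log_events(sample):
    print(ts, msg)
```

Pipe the curl output through a script like this when grepping raw JSON gets unwieldy.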

Check high-level state

curl -s "https://huggingface.co/api/spaces/olfronar/megastyle-comparison" | \
  python3 -c "import sys, json; d=json.load(sys.stdin); r=d.get('runtime',{}); \
  print('stage:', r.get('stage'), 'hardware:', r.get('hardware'), 'sha:', d.get('sha','?')[:7])"

stage values: BUILDING → RUNNING (healthy) or RUNTIME_ERROR / BUILD_FAILED (check logs).

Common fix cycles

  • Runtime import / missing dep: edit requirements.txt or the affected import, commit, push, stream the run logs until RUNNING. Most fixes don't need a full rebuild — layer caching makes re-pushes with unchanged deps very fast.
  • Hardware change: one HfApi().request_space_hardware(...) call — no push, no rebuild.
  • Stuck build: use the Settings → Factory Reboot UI as a last resort; our Dockerfile is the HF default so factory-reboot is safe.

Working with the Hugging Face Hub

  • Model weights (megastyle_encoder.pth) should be pulled at runtime via huggingface_hub.hf_hub_download, not committed. The Space's build cache keeps it warm between restarts.
  • License of the checkpoint is MIT (per Gaojunyao/MegaStyle); SigLIP is Apache-2.0. The Space's own code should ship under a compatible license.
  • Paper citation (if surfaced in UI): gao2026megastyle bibtex is in the upstream repo README.

Conventions

  • Use SiglipImageProcessor, not AutoProcessor. Upstream's style_score.py uses AutoProcessor.from_pretrained(SIGLIP_ID), which loads both the image processor and the SigLIP tokenizer; the tokenizer requires sentencepiece, which isn't in the ZeroGPU base image and crashes at import. Vision-only inference has no use for the tokenizer. Load SiglipImageProcessor.from_pretrained(SIGLIP_ID) directly — the pixel_values output is identical.
  • Control the gradio version via sdk_version in README frontmatter, not requirements.txt. HF's Dockerfile injects gradio[oauth]==<sdk_version> into the pip install line alongside our requirements.txt. Putting a conflicting gradio pin in requirements.txt (e.g. gradio>=5.25.0) crashes the build with Cannot install gradio==<sdk_version> and gradio>=... because these package versions have conflicting dependencies. Bump sdk_version instead; requirements.txt should not name gradio at all.
  • sdk_version: 5.50.0 or newer is required. Gradio 5.9.1 (the version HF defaults to at Space creation) ships a gradio_client.utils.get_type that crashes on boolean JSON schemas (valid JSON-Schema shorthand for accept-anything). gr.Dataframe triggers this in /api/info, flooding run logs with TypeError: argument of type 'bool' is not iterable. Fixed in later 5.x. Also launch with show_api=False as a belt-and-suspenders so the endpoint isn't exposed at all — we don't need programmatic API access for a visual-only demo.
  • Match upstream preprocessing exactly. If you find yourself tempted to resize, center-crop, or change color-space conversion manually instead of going through the SigLIP image processor, stop — the metric will silently degrade.
  • Don't cast weights to fp16/bf16 on load unless explicitly needed. ZeroGPU hardware handles fp32 SigLIP fine; precision changes affect similarity scores measurably.
  • Device string: the upstream style_score.py probes for cuda → npu → cpu. On ZeroGPU it will always be cuda inside the decorated function; outside (e.g. cold start), keep it on CPU.