CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project goal

Build a Hugging Face Space (ZeroGPU, Gradio SDK) that lets users upload / pick images and get pairwise or one-vs-many style-similarity scores produced by MegaStyle-Encoder from the paper MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent Text-to-Image Style Mapping (arXiv:2604.08364, project page, upstream code).

The Space is a comparison/analysis tool, not a style-transfer demo — it should surface a similarity matrix / ranked grid / heatmap, not generate images. The style-transfer sibling model (megastyle_flux.safetensors) is explicitly out of scope unless the user asks for it — a ZeroGPU slot can load SigLIP but cannot sanely run a 12B FLUX model.

The model contract (critical — easy to get wrong)

MegaStyle-Encoder is not a standalone architecture. It is SigLIP with fine-tuned weights:

  • Backbone: google/siglip-so400m-patch14-384 loaded via transformers.SiglipVisionModel
  • Processor: transformers.AutoProcessor.from_pretrained("google/siglip-so400m-patch14-384")
  • Weights: megastyle_encoder.pth from Gaojunyao/MegaStyle (~857 MB). Loaded with torch.load(..., map_location="cpu"), then:
    • The checkpoint may be either a raw state_dict or {"model": state_dict} — handle both.
    • Apply with model.load_state_dict(state, strict=False) (non-strict is correct, upstream does this).
  • Embedding: model(pixel_values=...).pooler_output, then L2-normalize (emb / emb.norm(p=2, dim=-1, keepdim=True)).
  • Similarity: plain dot product of two normalized embeddings (= cosine similarity, range roughly [-1, 1]).

Reference implementation is style_score.py in the upstream repo — treat it as the authoritative contract for any inference code in this Space. Do not "improve" the preprocessing, pooling choice, or normalization without strong reason; the checkpoint was trained to produce a useful metric only under exactly this pipeline.
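The last two contract bullets reduce to simple vector math. A dependency-free sketch (plain Python lists stand in for torch tensors; function names are illustrative, not upstream's):

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length, like emb / emb.norm(p=2, dim=-1, keepdim=True)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def style_similarity(a, b):
    """Dot product of two L2-normalized embeddings = cosine similarity, range [-1, 1]."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

# Parallel directions score 1.0; orthogonal directions score 0.0.
print(round(style_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]), 3))  # 1.0
print(round(style_similarity([1.0, 0.0], [0.0, 1.0]), 3))            # 0.0
```

Note that because both embeddings are normalized first, the plain dot product is exactly cosine similarity; there is no separate division step at scoring time.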

Deployment target: ZeroGPU (but the workload is GPU-light)

This Space targets ZeroGPU (Hugging Face's serverless GPU pool, currently backed by H200) because it's the free GPU tier. The actual workload — one SigLIP-so400m forward pass over up to 9 images — is small enough to run on CPU in 15–30s, or on any CUDA device in under a second. We're not picking H200 for capability reasons; we take whatever ZeroGPU allocates.

Viable alternatives if ZeroGPU isn't available (no HF Pro, etc.):

  • CPU basic Space (free, no Pro): drop spaces import and @spaces.GPU decorator; everything else stays the same. Comparison takes 15–30s instead of <1s.
  • Dedicated T4/L4/A10G paid tier: no code change; just pick hardware in Space Settings.

ZeroGPU-specific constraints that shape the code today:

  • GPU-using functions must be decorated with @spaces.GPU(duration=...) and the spaces package must be imported before torch. GPU tensors/models cannot live at module scope — move .to("cuda") and the forward pass inside the decorated function. Guard MODEL.to(device) with a next(MODEL.parameters()).device != device check so repeat calls don't re-migrate.
  • The model is ~857 MB (megastyle_encoder.pth) + SigLIP ~1.8 GB. Load on CPU at module scope via hf_hub_download; .to(device) inside the decorated function on first invocation.
  • duration on @spaces.GPU is currently 30s (sized for 1 test image + up to 8 references). Over-estimating wastes the user's ZeroGPU daily quota. Bump only if batch grows.
  • ZeroGPU hardware is NOT set in frontmatter — it's configured in the Space's Settings → Hardware panel. The hardware: frontmatter key that exists for regular Spaces does not apply here. Don't reintroduce it.
  • Python: ZeroGPU supports only Python 3.10.13 and 3.12.12. Pin python_version: "3.10" in frontmatter.
  • PyTorch: ZeroGPU's supported wheel list is >=2.1.0. Keep the pin compatible.
  • sdk: gradio, sdk_version: 5.50.0 (not the 5.9.1 default HF sets at Space creation; see Conventions for why). Gradio 4+ is the only SDK currently supported by ZeroGPU.
  • Requirements live in requirements.txt at repo root: gradio, torch, transformers, Pillow, spaces, huggingface_hub. Pin transformers>=4.45 so SiglipVisionModel exists.
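The constraints above fold into a module layout roughly like this. This is a sketch, not the live app.py: it assumes the Space environment (spaces, torch, transformers, huggingface_hub installed) and mirrors the model-contract section; only the names taken from that section are real.

```python
import spaces  # must be imported before torch on ZeroGPU
import torch
from huggingface_hub import hf_hub_download
from transformers import SiglipVisionModel, SiglipImageProcessor

SIGLIP_ID = "google/siglip-so400m-patch14-384"

# Module scope: CPU only. GPU tensors/models must not live here.
PROCESSOR = SiglipImageProcessor.from_pretrained(SIGLIP_ID)
MODEL = SiglipVisionModel.from_pretrained(SIGLIP_ID)

ckpt_path = hf_hub_download("Gaojunyao/MegaStyle", "megastyle_encoder.pth")
state = torch.load(ckpt_path, map_location="cpu")
if isinstance(state, dict) and "model" in state:
    state = state["model"]          # checkpoint may wrap the raw state_dict
MODEL.load_state_dict(state, strict=False)  # non-strict is the upstream contract
MODEL.eval()

@spaces.GPU(duration=30)  # sized for 1 test image + up to 8 references
def embed(images):
    # Guard so repeat calls don't re-migrate the model to the GPU.
    if next(MODEL.parameters()).device.type != "cuda":
        MODEL.to("cuda")
    inputs = PROCESSOR(images=images, return_tensors="pt").to("cuda")
    with torch.no_grad():
        emb = MODEL(pixel_values=inputs.pixel_values).pooler_output
    return emb / emb.norm(p=2, dim=-1, keepdim=True)
```

On a CPU basic Space, the same layout works by dropping the spaces import and decorator and leaving the model where it loaded.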

Repo shape

Current layout (single-file app is appropriate at this size):

  • app.py — Gradio Blocks UI + @spaces.GPU-decorated inference. Single entry point.
  • requirements.txt — dependency pins.
  • README.md — HF Space frontmatter (sdk, python_version, license) + user-facing description. Doubles as the Space's landing page.
  • LICENSE — MIT. Matches the license: mit claim in README frontmatter and the upstream model's license.
  • .gitignore — standard Python + HF cache + Gradio flagged/ directory.
  • CLAUDE.md — this file.

Split app.py into a separate megastyle.py module only if inference logic grows past ~150 lines or needs unit tests without Gradio. Not needed today.

UI & scoring conventions (locked by review)

  • Up to 8 reference images. Clipped silently at compute time with a note in the result markdown.

  • Headline score is mean cosine similarity across per-reference scores. Per-reference breakdown is surfaced in a table so users can spot outliers dragging down the mean.

  • Label bands (heuristic, calibrated for SigLIP-family cosine ranges where unrelated images typically sit 0.4–0.6):

    Cosine      Label                 Emoji
    ≥ 0.75      Strong style match    🟢
    0.65–0.75   Good style match      🟢
    0.55–0.65   Moderate style match  🟡
    0.45–0.55   Weak style match      🟠
    < 0.45      Minimal style match   🔴

    If you ever have calibration data (e.g., per-style-pair cosine distributions from the MegaStyle dataset), retighten these. Until then, do not represent the label as ground truth — the raw cosine is the source of truth.

  • Do not display a pseudo-percentage. An earlier iteration mapped cosine → (x+1)/2 * 100, which compresses useful signal and misleads users. The raw three-decimal cosine + the label is the display contract.

  • Verdict styling uses emoji prefix, not inline HTML <span>. gr.Markdown's HTML handling varies across Gradio versions; emoji is reliably rendered.
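The band table maps mechanically to a small helper along these lines (a sketch; the function name is illustrative, the thresholds and labels are the ones listed above):

```python
def style_label(cosine: float) -> tuple[str, str]:
    """Map a raw cosine score to (label, emoji) per the band table."""
    bands = [
        (0.75, "Strong style match", "🟢"),
        (0.65, "Good style match", "🟢"),
        (0.55, "Moderate style match", "🟡"),
        (0.45, "Weak style match", "🟠"),
    ]
    for threshold, label, emoji in bands:
        if cosine >= threshold:
            return label, emoji
    return "Minimal style match", "🔴"

# Emoji prefixes the verdict text; the raw cosine stays the source of truth.
label, emoji = style_label(0.78)
print(f"{emoji} {label} (0.780)")  # 🟢 Strong style match (0.780)
```

Keeping the thresholds in one list makes a future recalibration a one-line change per band.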

Deploying and observing the Space

The live Space is at olfronar/megastyle-comparison (https://huggingface.co/spaces/olfronar/megastyle-comparison). Local working copy is git-backed with origin pointing at the Space's git URL — normal git push origin main triggers a rebuild.

Publish a change

git add <files> && git commit -m "..." && git push origin main

For one-off blob pushes without a git clone:

hf upload olfronar/megastyle-comparison . --repo-type space --commit-message "..."

Set hardware (ZeroGPU, etc.)

Hardware is not set via README frontmatter. Use HfApi.request_space_hardware:

from huggingface_hub import HfApi
HfApi().request_space_hardware("olfronar/megastyle-comparison", hardware="zero-a10g")

Valid flavors include cpu-basic, cpu-upgrade, t4-small, a10g-small, a10g-large, a100-large, and zero-a10g (the legacy name still used for the ZeroGPU pool, which is currently backed by H200 hardware). You can also toggle via the Space's Settings → Hardware page; both paths write to the same field.

Observe build and run logs

HF exposes Server-Sent-Events streams for both phases. They require the user's HF token (read it from ~/.cache/huggingface/token after hf auth login). Capped-timeout curl works for point-in-time snapshots:

# Build phase (docker layers, pip install)
curl -s --max-time 20 \
  -H "Authorization: Bearer $(cat ~/.cache/huggingface/token)" \
  "https://huggingface.co/api/spaces/olfronar/megastyle-comparison/logs/build" | tail -80

# Run phase (Python stdout/stderr, Gradio startup)
curl -s --max-time 20 \
  -H "Authorization: Bearer $(cat ~/.cache/huggingface/token)" \
  "https://huggingface.co/api/spaces/olfronar/megastyle-comparison/logs/run" | tail -80

Each event line is JSON with data and timestamp fields. Because it's SSE, the stream is long-lived — always use --max-time to bound it.
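A snapshot captured by the curl commands above can be unpacked with a few lines of Python (a sketch; it assumes the two-field event shape just described and tolerates SSE `data:` prefixes and keep-alive noise):

```python
import json

def parse_log_events(raw: str):
    """Yield (timestamp, message) pairs from an SSE log snapshot."""
    for line in raw.splitlines():
        line = line.strip()
        if line.startswith("data:"):        # SSE frames prefix the JSON payload
            line = line[len("data:"):].strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue                        # skip keep-alives / non-JSON frames
        yield event.get("timestamp"), event.get("data")

sample = 'data: {"data": "Running on local URL:  http://0.0.0.0:7860", "timestamp": "2024-05-01T12:00:00Z"}'
for ts, msg in parse_log_events(sample):
    print(ts, msg)
```

Pipe the curl output through a script like this when grepping raw JSON gets unwieldy.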

Check high-level state

curl -s "https://huggingface.co/api/spaces/olfronar/megastyle-comparison" | \
  python3 -c "import sys, json; d=json.load(sys.stdin); r=d.get('runtime',{}); \
  print('stage:', r.get('stage'), 'hardware:', r.get('hardware'), 'sha:', d.get('sha','?')[:7])"

stage values: BUILDING → RUNNING (healthy) or RUNTIME_ERROR / BUILD_FAILED (check logs).

Common fix cycles

  • Runtime import / missing dep: edit requirements.txt or the affected import, commit, push, stream the run logs until RUNNING. Most fixes don't need a full rebuild — layer caching makes re-pushes with unchanged deps very fast.
  • Hardware change: one HfApi().request_space_hardware(...) call — no push, no rebuild.
  • Stuck build: use the Settings → Factory Reboot UI as a last resort; our Dockerfile is the HF default so factory-reboot is safe.

Working with the Hugging Face Hub

  • Model weights (megastyle_encoder.pth) should be pulled at runtime via huggingface_hub.hf_hub_download, not committed. The Space's build cache keeps it warm between restarts.
  • License of the checkpoint is MIT (per Gaojunyao/MegaStyle); SigLIP is Apache-2.0. The Space's own code should ship under a compatible license.
  • Paper citation (if surfaced in UI): gao2026megastyle bibtex is in the upstream repo README.

Conventions

  • Use SiglipImageProcessor, not AutoProcessor. Upstream's style_score.py uses AutoProcessor.from_pretrained(SIGLIP_ID), which loads both the image processor and the SigLIP tokenizer; the tokenizer requires sentencepiece, which isn't in the ZeroGPU base image and crashes at import. Vision-only inference has no use for the tokenizer. Load SiglipImageProcessor.from_pretrained(SIGLIP_ID) directly — the pixel_values output is identical.
  • Control the gradio version via sdk_version in README frontmatter, not requirements.txt. HF's Dockerfile injects gradio[oauth]==<sdk_version> into the pip install line alongside our requirements.txt. Putting a conflicting gradio pin in requirements.txt (e.g. gradio>=5.25.0) crashes the build with Cannot install gradio==<sdk_version> and gradio>=... because these package versions have conflicting dependencies. Bump sdk_version instead; requirements.txt should not name gradio at all.
  • sdk_version: 5.50.0 or newer is required. Gradio 5.9.1 (the version HF defaults to at Space creation) ships a gradio_client.utils.get_type that crashes on boolean JSON schemas (valid JSON-Schema shorthand for accept-anything). gr.Dataframe triggers this in /api/info, flooding run logs with TypeError: argument of type 'bool' is not iterable. Fixed in later 5.x. Also launch with show_api=False as a belt-and-suspenders so the endpoint isn't exposed at all — we don't need programmatic API access for a visual-only demo.
  • Match upstream preprocessing exactly. If you find yourself tempted to resize, center-crop, or change color-space conversion manually instead of going through the SigLIP image processor, stop — the metric will silently degrade.
  • Don't cast weights to fp16/bf16 on load unless explicitly needed. ZeroGPU hardware handles fp32 SigLIP fine; precision changes affect similarity scores measurably.
  • Device string: the upstream style_score.py probes for cuda → npu → cpu. On ZeroGPU it will always be cuda inside the decorated function; outside (e.g. cold start), keep it on CPU.