# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project goal
Build a Hugging Face **Space** (ZeroGPU, Gradio SDK) that lets users upload / pick images and get
**pairwise or one-vs-many style-similarity scores** produced by **MegaStyle-Encoder** from the paper
*MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent Text-to-Image Style Mapping*
([arXiv:2604.08364](https://arxiv.org/abs/2604.08364), [project page](https://jeoyal.github.io/MegaStyle/),
[upstream code](https://github.com/Tencent/MegaStyle)).
The Space is a **comparison/analysis tool**, not a style-transfer demo — it should surface a
similarity matrix / ranked grid / heatmap, not generate images. The style-transfer sibling model
(`megastyle_flux.safetensors`) is explicitly **out of scope** unless the user asks for it — a ZeroGPU
slot can load SigLIP but cannot sanely run a 12B FLUX model.
## The model contract (critical — easy to get wrong)
MegaStyle-Encoder is **not** a standalone architecture. It is **SigLIP** with fine-tuned weights:
- **Backbone**: `google/siglip-so400m-patch14-384` loaded via `transformers.SiglipVisionModel`
- **Processor**: `transformers.SiglipImageProcessor.from_pretrained("google/siglip-so400m-patch14-384")` (not `AutoProcessor`; see Conventions)
- **Weights**: `megastyle_encoder.pth` from [Gaojunyao/MegaStyle](https://huggingface.co/Gaojunyao/MegaStyle)
(~857 MB). Loaded with `torch.load(..., map_location="cpu")`, then:
- The checkpoint may be either a raw `state_dict` or `{"model": state_dict}` — handle both.
- Apply with `model.load_state_dict(state, strict=False)` (non-strict is correct, upstream does this).
- **Embedding**: `model(pixel_values=...).pooler_output`, then **L2-normalize** (`emb / emb.norm(p=2, dim=-1, keepdim=True)`).
- **Similarity**: plain dot product of two normalized embeddings (= cosine similarity, range roughly [-1, 1]).
Reference implementation is `style_score.py` in the upstream repo — treat it as the authoritative
contract for any inference code in this Space. Do not "improve" the preprocessing, pooling choice, or
normalization without strong reason; the checkpoint was trained to produce a useful metric only under
exactly this pipeline.
## Deployment target: ZeroGPU (but the workload is GPU-light)
This Space targets **ZeroGPU** (Hugging Face's serverless GPU pool, currently backed by H200)
because it's the free GPU tier. The actual workload — one SigLIP-so400m forward pass over up
to 9 images — is small enough to run on CPU in 15–30s, or on any CUDA device in under a second.
We're not picking H200 for capability reasons; we take whatever ZeroGPU allocates.
**Viable alternatives** if ZeroGPU isn't available (no HF Pro, etc.):
- **CPU basic Space** (free, no Pro): drop `spaces` import and `@spaces.GPU` decorator; everything
else stays the same. Comparison takes 15–30s instead of <1s.
- **Dedicated T4/L4/A10G paid tier**: no code change; just pick hardware in Space Settings.
ZeroGPU-specific constraints that shape the code today:
- GPU-using functions **must** be decorated with `@spaces.GPU(duration=...)` and the `spaces`
package must be imported **before** `torch`. GPU tensors/models cannot live at module scope —
move `.to("cuda")` and the forward pass **inside** the decorated function. Guard `MODEL.to(device)`
with a `next(MODEL.parameters()).device != device` check so repeat calls don't re-migrate.
- The model is ~857 MB (`megastyle_encoder.pth`) + SigLIP ~1.8 GB. Load on CPU at module scope
via `hf_hub_download`; `.to(device)` inside the decorated function on first invocation.
- **`duration`** on `@spaces.GPU` is currently 30s (sized for 1 test image + up to 8 references).
Over-estimating wastes the user's ZeroGPU daily quota. Bump only if batch grows.
- **ZeroGPU hardware is NOT set in frontmatter** — it's configured in the Space's
*Settings → Hardware* panel. The `hardware:` frontmatter key that exists for regular Spaces
does not apply here. Don't reintroduce it.
- **Python**: ZeroGPU supports only Python **3.10.13** and **3.12.12**. Pin `python_version: "3.10"`
in frontmatter.
- **PyTorch**: ZeroGPU's supported wheel list is `>=2.1.0`. Keep the pin compatible.
- `sdk: gradio`, `sdk_version: 5.50.0` (Gradio 5.9.1, HF's creation-time default, has a known
crash; see Conventions). Gradio 4+ is the only SDK currently supported by ZeroGPU.
- Requirements live in `requirements.txt` at repo root: `gradio`, `torch`, `transformers`,
`Pillow`, `spaces`, `huggingface_hub`. Pin `transformers>=4.45` so `SiglipVisionModel` exists.
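The load-on-CPU / migrate-inside-the-decorated-function pattern can be sketched like this (`ensure_device` is an illustrative helper name, not a `spaces` API; the comparison uses `.type` so `cuda` vs `cuda:0` doesn't force a spurious re-migration):

```python
import torch
from torch import nn

def ensure_device(model: nn.Module, device: str) -> nn.Module:
    # Migrate only when needed so repeat @spaces.GPU calls don't re-copy
    # ~2.6 GB of weights. Compare device *type*, since "cuda" vs "cuda:0"
    # would otherwise never compare equal.
    if next(model.parameters()).device.type != torch.device(device).type:
        model.to(device)
    return model

# Usage inside app.py (sketch; requires the `spaces` package on ZeroGPU):
#
#   import spaces   # must be imported before torch
#   ...
#   @spaces.GPU(duration=30)
#   def compare(test_img, refs):
#       ensure_device(MODEL, "cuda")  # MODEL was loaded on CPU at module scope
#       ...
```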
## Repo shape
Current layout (single-file app is appropriate at this size):
- `app.py` — Gradio Blocks UI + `@spaces.GPU`-decorated inference. Single entry point.
- `requirements.txt` — dependency pins.
- `README.md` — HF Space frontmatter (sdk, python_version, license) + user-facing description.
Doubles as the Space's landing page.
- `LICENSE` — MIT. Matches the `license: mit` claim in README frontmatter and the upstream
model's license.
- `.gitignore` — standard Python + HF cache + Gradio `flagged/` directory.
- `CLAUDE.md` — this file.
Split `app.py` into a separate `megastyle.py` module only if inference logic grows past ~150 lines
or needs unit tests without Gradio. Not needed today.
## UI & scoring conventions (locked by review)
- **Up to 8 reference images.** Extras are clipped at compute time, with a note in the result markdown.
- **Headline score is mean cosine similarity** across per-reference scores. Per-reference
breakdown is surfaced in a table so users can spot outliers dragging down the mean.
- **Label bands** (heuristic, calibrated for SigLIP-family cosine ranges where unrelated images
typically sit 0.4–0.6):
| Cosine | Label | Emoji |
|--------|-------|-------|
| ≥ 0.75 | Strong style match | 🟢 |
| 0.65–0.75 | Good style match | 🟢 |
| 0.55–0.65 | Moderate style match | 🟡 |
| 0.45–0.55 | Weak style match | 🟠 |
| < 0.45 | Minimal style match | 🔴 |
If you ever have calibration data (e.g., per-style-pair cosine distributions from the MegaStyle
dataset), retighten these. Until then, do not represent the label as ground truth — the raw
cosine is the source of truth.
- **Do not display a pseudo-percentage.** An earlier iteration mapped cosine → `(x+1)/2 * 100`
which compresses useful signal and misleads users. The raw three-decimal cosine + the label is
the display contract.
- **Verdict styling uses emoji prefix, not inline HTML `<span>`.** `gr.Markdown`'s HTML handling
varies across Gradio versions; emoji is reliably rendered.
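The band table translates to a small helper (a sketch; the function name is illustrative, thresholds are the table's, lower bound inclusive):

```python
def style_label(cos: float) -> str:
    # Heuristic bands from the table above; lower bound of each band inclusive.
    if cos >= 0.75:
        return "🟢 Strong style match"
    if cos >= 0.65:
        return "🟢 Good style match"
    if cos >= 0.55:
        return "🟡 Moderate style match"
    if cos >= 0.45:
        return "🟠 Weak style match"
    return "🔴 Minimal style match"

# Display contract: raw three-decimal cosine plus the label, no percentage:
# f"{cos:.3f} {style_label(cos)}"
```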
## Deploying and observing the Space
The live Space is at **`olfronar/megastyle-comparison`**
(https://huggingface.co/spaces/olfronar/megastyle-comparison). Local working copy is git-backed
with `origin` pointing at the Space's git URL — normal `git push origin main` triggers a rebuild.
### Publish a change
```bash
git add <files> && git commit -m "..." && git push origin main
```
For one-off blob pushes without a git clone:
```bash
hf upload olfronar/megastyle-comparison . --repo-type space --commit-message "..."
```
### Set hardware (ZeroGPU, etc.)
Hardware is *not* set via README frontmatter. Use `HfApi.request_space_hardware`:
```python
from huggingface_hub import HfApi
HfApi().request_space_hardware("olfronar/megastyle-comparison", hardware="zero-a10g")
```
Valid flavors include `cpu-basic`, `cpu-upgrade`, `t4-small`, `a10g-small`, `a10g-large`,
`a100-large`, and `zero-a10g` (the legacy name still used for the ZeroGPU pool, which is
currently backed by H200 hardware). You can also toggle via the Space's *Settings → Hardware*
page; both paths write to the same field.
### Observe build and run logs
HF exposes Server-Sent-Events streams for both phases. They require the user's HF token (read
it from `~/.cache/huggingface/token` after `hf auth login`). Capped-timeout curl works for
point-in-time snapshots:
```bash
# Build phase (docker layers, pip install)
curl -s --max-time 20 \
-H "Authorization: Bearer $(cat ~/.cache/huggingface/token)" \
"https://huggingface.co/api/spaces/olfronar/megastyle-comparison/logs/build" | tail -80
# Run phase (Python stdout/stderr, Gradio startup)
curl -s --max-time 20 \
-H "Authorization: Bearer $(cat ~/.cache/huggingface/token)" \
"https://huggingface.co/api/spaces/olfronar/megastyle-comparison/logs/run" | tail -80
```
Each event line is JSON with `data` and `timestamp` fields. Because it's SSE, the stream is
long-lived — always use `--max-time` to bound it.
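A Python equivalent of the snapshot, assuming the event shape described above (`parse_sse_line` and `snapshot_run_logs` are illustrative names):

```python
import json
import pathlib
import urllib.request

SPACE = "olfronar/megastyle-comparison"

def parse_sse_line(raw: bytes):
    """Decode one SSE line; return the JSON event payload or None."""
    line = raw.decode("utf-8", "replace").strip()
    if not line.startswith("data:"):
        return None  # comments, blank keep-alives, etc.
    return json.loads(line[len("data:"):])

def snapshot_run_logs(limit: int = 80):
    # Token written by `hf auth login`, as noted above.
    token = pathlib.Path("~/.cache/huggingface/token").expanduser().read_text().strip()
    req = urllib.request.Request(
        f"https://huggingface.co/api/spaces/{SPACE}/logs/run",
        headers={"Authorization": f"Bearer {token}"},
    )
    events = []
    # The stream is long-lived SSE, so the timeout bounds the snapshot.
    with urllib.request.urlopen(req, timeout=20) as resp:
        try:
            for raw in resp:
                event = parse_sse_line(raw)
                if event is not None:
                    events.append(event)
                if len(events) >= limit:
                    break
        except TimeoutError:
            pass
    return events
```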
### Check high-level state
```bash
curl -s "https://huggingface.co/api/spaces/olfronar/megastyle-comparison" | \
python3 -c "import sys, json; d=json.load(sys.stdin); r=d.get('runtime',{}); \
print('stage:', r.get('stage'), 'hardware:', r.get('hardware'), 'sha:', d.get('sha','?')[:7])"
```
`stage` values: `BUILDING` → `RUNNING` (healthy) or `RUNTIME_ERROR` / `BUILD_FAILED` (check logs).
### Common fix cycles
- **Runtime import / missing dep**: edit `requirements.txt` or the affected import, commit, push,
stream the run logs until `RUNNING`. Most fixes don't need a full rebuild — layer caching makes
re-pushes with unchanged deps very fast.
- **Hardware change**: one `HfApi().request_space_hardware(...)` call — no push, no rebuild.
- **Stuck build**: use the *Settings → Factory Reboot* UI as a last resort; our Dockerfile is the
HF default so factory-reboot is safe.
## Working with the Hugging Face Hub
- Model weights (`megastyle_encoder.pth`) should be pulled at runtime via `huggingface_hub.hf_hub_download`,
not committed. The Space's build cache keeps it warm between restarts.
- License of the checkpoint is MIT (per `Gaojunyao/MegaStyle`); SigLIP is Apache-2.0. The Space's own
code should ship under a compatible license.
- Paper citation (if surfaced in UI): the `gao2026megastyle` BibTeX entry is in the upstream repo README.
## Conventions
- **Use `SiglipImageProcessor`, not `AutoProcessor`.** Upstream's `style_score.py` uses
`AutoProcessor.from_pretrained(SIGLIP_ID)` which loads both the image processor *and* the SigLIP
tokenizer; the tokenizer requires `sentencepiece`, which isn't in the ZeroGPU base image and
crashes at import. Vision-only inference has no use for the tokenizer. Load
`SiglipImageProcessor.from_pretrained(SIGLIP_ID)` directly — the `pixel_values` output is
identical.
- **Control the gradio version via `sdk_version` in README frontmatter, not `requirements.txt`.**
HF's Dockerfile injects `gradio[oauth]==<sdk_version>` into the pip install line alongside our
`requirements.txt`. Putting a conflicting `gradio` pin in `requirements.txt` (e.g.
`gradio>=5.25.0`) crashes the build with `Cannot install gradio==<sdk_version> and gradio>=...
because these package versions have conflicting dependencies`. Bump `sdk_version` instead;
`requirements.txt` should not name gradio at all.
- **`sdk_version: 5.50.0` or newer is required.** Gradio 5.9.1 (the version HF defaults to at
Space creation) ships a `gradio_client.utils.get_type` that crashes on boolean JSON schemas
(valid JSON-Schema shorthand for accept-anything). `gr.Dataframe` triggers this in `/api/info`,
flooding run logs with `TypeError: argument of type 'bool' is not iterable`. Fixed in later
5.x. Also launch with `show_api=False` as a belt-and-suspenders measure so the endpoint isn't
exposed at all — we don't need programmatic API access for a visual-only demo.
- **Match upstream preprocessing exactly.** If you find yourself tempted to resize, center-crop, or
change color-space conversion manually instead of going through the SigLIP image processor, stop
— the metric will silently degrade.
- **Don't cast weights to fp16/bf16 on load unless explicitly needed.** The ZeroGPU hardware handles
fp32 SigLIP fine; precision changes affect similarity scores measurably.
- **Device string**: the upstream `style_score.py` probes for `cuda` → `npu` → `cpu`. On ZeroGPU it
will always be `cuda` inside the decorated function; outside (e.g. cold start), keep it on CPU.
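A sketch of that probe order (vanilla PyTorch has no `torch.npu`, so the check is guarded; the function name is illustrative):

```python
import torch

def pick_device() -> str:
    # Probe order per upstream style_score.py: cuda -> npu -> cpu.
    if torch.cuda.is_available():
        return "cuda"
    npu = getattr(torch, "npu", None)  # present only with torch_npu installed
    if npu is not None and npu.is_available():
        return "npu"
    return "cpu"
```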