# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project goal

Build a Hugging Face **Space** (ZeroGPU, Gradio SDK) that lets users upload / pick images and get
**pairwise or one-vs-many style-similarity scores** produced by **MegaStyle-Encoder** from the paper
*MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent Text-to-Image Style Mapping*
([arXiv:2604.08364](https://arxiv.org/abs/2604.08364), [project page](https://jeoyal.github.io/MegaStyle/),
[upstream code](https://github.com/Tencent/MegaStyle)).

The Space is a **comparison/analysis tool**, not a style-transfer demo — it should surface a
similarity matrix / ranked grid / heatmap, not generate images. The style-transfer sibling model
(`megastyle_flux.safetensors`) is explicitly **out of scope** unless the user asks for it — a ZeroGPU
slot can load SigLIP but cannot sanely run a 12B FLUX model.

## The model contract (critical — easy to get wrong)

MegaStyle-Encoder is **not** a standalone architecture. It is **SigLIP** with fine-tuned weights:

- **Backbone**: `google/siglip-so400m-patch14-384` loaded via `transformers.SiglipVisionModel`
- **Processor**: upstream uses `transformers.AutoProcessor.from_pretrained("google/siglip-so400m-patch14-384")`;
  this Space loads `SiglipImageProcessor` directly instead (see Conventions below)
- **Weights**: `megastyle_encoder.pth` from [Gaojunyao/MegaStyle](https://huggingface.co/Gaojunyao/MegaStyle)
  (~857 MB). Loaded with `torch.load(..., map_location="cpu")`, then:
  - The checkpoint may be either a raw `state_dict` or `{"model": state_dict}` — handle both.
  - Apply with `model.load_state_dict(state, strict=False)` (non-strict is correct, upstream does this).
- **Embedding**: `model(pixel_values=...).pooler_output`, then **L2-normalize** (`emb / emb.norm(p=2, dim=-1, keepdim=True)`).
- **Similarity**: plain dot product of two normalized embeddings (= cosine similarity, range [-1, 1]).

Reference implementation is `style_score.py` in the upstream repo — treat it as the authoritative
contract for any inference code in this Space. Do not "improve" the preprocessing, pooling choice, or
normalization without strong reason; the checkpoint was trained to produce a useful metric only under
exactly this pipeline.
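A minimal sketch of that contract (illustrative helper names, not the Space's actual `app.py`; assumes `transformers` is installed and `weights_path` points at a local copy of `megastyle_encoder.pth`, e.g. from `hf_hub_download`):

```python
import torch


def unwrap_state_dict(ckpt):
    """Handle both checkpoint shapes: raw state_dict or {"model": state_dict}."""
    return ckpt["model"] if isinstance(ckpt, dict) and "model" in ckpt else ckpt


def l2_normalize(emb: torch.Tensor) -> torch.Tensor:
    """L2-normalize embeddings along the feature dimension."""
    return emb / emb.norm(p=2, dim=-1, keepdim=True)


def style_similarity(a: torch.Tensor, b: torch.Tensor) -> float:
    """Dot product of normalized embeddings = cosine similarity."""
    return float((l2_normalize(a) * l2_normalize(b)).sum(dim=-1))


def load_encoder(weights_path: str):
    """SigLIP backbone + MegaStyle weights, applied non-strictly as upstream does."""
    from transformers import SiglipVisionModel

    model = SiglipVisionModel.from_pretrained("google/siglip-so400m-patch14-384")
    state = torch.load(weights_path, map_location="cpu")
    model.load_state_dict(unwrap_state_dict(state), strict=False)
    return model.eval()
```

Embeddings come from `model(pixel_values=...).pooler_output`; everything downstream is just `l2_normalize` + dot product.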

## Deployment target: ZeroGPU (but the workload is GPU-light)

This Space targets **ZeroGPU** (Hugging Face's serverless GPU pool, currently backed by H200)
because it's the free GPU tier. The actual workload — one SigLIP-so400m forward pass over up
to 9 images — is small enough to run on CPU in 15–30s, or on any CUDA device in under a second.
We're not picking H200 for capability reasons; we take whatever ZeroGPU allocates.

**Viable alternatives** if ZeroGPU isn't available (no HF Pro, etc.):
- **CPU basic Space** (free, no Pro): drop `spaces` import and `@spaces.GPU` decorator; everything
  else stays the same. Comparison takes 15–30s instead of <1s.
- **Dedicated T4/L4/A10G paid tier**: no code change; just pick hardware in Space Settings.

ZeroGPU-specific constraints that shape the code today:

- GPU-using functions **must** be decorated with `@spaces.GPU(duration=...)` and the `spaces`
  package must be imported **before** `torch`. GPU tensors/models cannot live at module scope —
  move `.to("cuda")` and the forward pass **inside** the decorated function. Guard `MODEL.to(device)`
  with a `next(MODEL.parameters()).device != device` check so repeat calls don't re-migrate.
- The model is ~857 MB (`megastyle_encoder.pth`) + SigLIP ~1.8 GB. Load on CPU at module scope
  via `hf_hub_download`; `.to(device)` inside the decorated function on first invocation.
- **`duration`** on `@spaces.GPU` is currently 30s (sized for 1 test image + up to 8 references).
  Over-estimating wastes the user's ZeroGPU daily quota. Bump only if batch grows.
- **ZeroGPU hardware is NOT set in frontmatter** — it's configured in the Space's
  *Settings → Hardware* panel. The `hardware:` frontmatter key that exists for regular Spaces
  does not apply here. Don't reintroduce it.
- **Python**: ZeroGPU supports only Python **3.10.13** and **3.12.12**. Pin `python_version: "3.10"`
  in frontmatter.
- **PyTorch**: ZeroGPU's supported wheel list is `>=2.1.0`. Keep the pin compatible.
- `sdk: gradio`, `sdk_version: 5.50.0` or newer (see Conventions: 5.9.1, HF's creation-time
  default, crashes on boolean JSON schemas). Gradio 4+ is the only SDK currently supported by ZeroGPU.
- Requirements live in `requirements.txt` at repo root: `gradio`, `torch`, `transformers`,
  `Pillow`, `spaces`, `huggingface_hub`. Pin `transformers>=4.45` so `SiglipVisionModel` exists.
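The import-order, decorator, and device-guard rules above combine into a pattern like this (a sketch, not the Space's `app.py`; `score_images` is an illustrative name, and the `ImportError` fallback is an assumption that lets the same file run on a CPU-basic Space):

```python
# `spaces` must be imported before torch on ZeroGPU.
try:
    import spaces

    gpu = spaces.GPU
except ImportError:
    # CPU-basic fallback: a no-op decorator with the same call shape.
    def gpu(duration=60):
        def wrap(fn):
            return fn
        return wrap

import torch


@gpu(duration=30)  # sized for 1 test image + up to 8 references
def score_images(model, pixel_values):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    # Guard the migration so repeat calls don't re-copy ~2.7 GB of weights.
    if next(model.parameters()).device != torch.device(device):
        model.to(device)
    with torch.no_grad():
        emb = model(pixel_values=pixel_values.to(device)).pooler_output
    return emb / emb.norm(p=2, dim=-1, keepdim=True)
```

The model itself is still loaded on CPU at module scope; only the `.to(device)` and the forward pass live inside the decorated function.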

## Repo shape

Current layout (single-file app is appropriate at this size):

- `app.py` — Gradio Blocks UI + `@spaces.GPU`-decorated inference. Single entry point.
- `requirements.txt` — dependency pins.
- `README.md` — HF Space frontmatter (sdk, python_version, license) + user-facing description.
  Doubles as the Space's landing page.
- `LICENSE` — MIT. Matches the `license: mit` claim in README frontmatter and the upstream
  model's license.
- `.gitignore` — standard Python + HF cache + Gradio `flagged/` directory.
- `CLAUDE.md` — this file.

Split `app.py` into a separate `megastyle.py` module only if inference logic grows past ~150 lines
or needs unit tests without Gradio. Not needed today.

## UI & scoring conventions (locked by review)

- **Up to 8 reference images.** Extras are clipped at compute time without a hard error; a note in
  the result markdown says so.
- **Headline score is mean cosine similarity** across per-reference scores. Per-reference
  breakdown is surfaced in a table so users can spot outliers dragging down the mean.
- **Label bands** (heuristic, calibrated for SigLIP-family cosine ranges where unrelated images
  typically sit 0.4–0.6):

  | Cosine | Label | Emoji |
  |--------|-------|-------|
  | ≥ 0.75 | Strong style match | 🟢 |
  | 0.65–0.75 | Good style match | 🟢 |
  | 0.55–0.65 | Moderate style match | 🟡 |
  | 0.45–0.55 | Weak style match | 🟠 |
  | < 0.45 | Minimal style match | 🔴 |

  If you ever have calibration data (e.g., per-style-pair cosine distributions from the MegaStyle
  dataset), retighten these. Until then, do not represent the label as ground truth β€” the raw
  cosine is the source of truth.
- **Do not display a pseudo-percentage.** An earlier iteration mapped cosine → `(x+1)/2 * 100`
  which compresses useful signal and misleads users. The raw three-decimal cosine + the label is
  the display contract.
- **Verdict styling uses emoji prefix, not inline HTML `<span>`.** `gr.Markdown`'s HTML handling
  varies across Gradio versions; emoji is reliably rendered.
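The band table and display contract might be expressed as (a sketch; `BANDS` and `verdict` are illustrative names, thresholds copied from the table above):

```python
# Lower bound of each band, checked top-down; below 0.45 falls through to red.
BANDS = [
    (0.75, "🟢 Strong style match"),
    (0.65, "🟢 Good style match"),
    (0.55, "🟡 Moderate style match"),
    (0.45, "🟠 Weak style match"),
]


def verdict(cosine: float) -> str:
    """Emoji-prefixed label plus the raw three-decimal cosine (the display contract)."""
    for lower, label in BANDS:
        if cosine >= lower:
            return f"{label} ({cosine:.3f})"
    return f"🔴 Minimal style match ({cosine:.3f})"
```

Note the emoji lives in the label string itself, per the no-inline-HTML rule.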

## Deploying and observing the Space

The live Space is at **`olfronar/megastyle-comparison`**
(https://huggingface.co/spaces/olfronar/megastyle-comparison). Local working copy is git-backed
with `origin` pointing at the Space's git URL — normal `git push origin main` triggers a rebuild.

### Publish a change

```bash
git add <files> && git commit -m "..." && git push origin main
```

For one-off blob pushes without a git clone:

```bash
hf upload olfronar/megastyle-comparison . --repo-type space --commit-message "..."
```

### Set hardware (ZeroGPU, etc.)

Hardware is *not* set via README frontmatter. Use `HfApi.request_space_hardware`:

```python
from huggingface_hub import HfApi
HfApi().request_space_hardware("olfronar/megastyle-comparison", hardware="zero-a10g")
```

Valid flavors include `cpu-basic`, `cpu-upgrade`, `t4-small`, `a10g-small`, `a10g-large`,
`a100-large`, and `zero-a10g` (the legacy name still used for the ZeroGPU pool, which is
currently backed by H200 hardware). You can also toggle via the Space's *Settings → Hardware*
page; both paths write to the same field.

### Observe build and run logs

HF exposes Server-Sent-Events streams for both phases. They require the user's HF token (read
it from `~/.cache/huggingface/token` after `hf auth login`). Capped-timeout curl works for
point-in-time snapshots:

```bash
# Build phase (docker layers, pip install)
curl -s --max-time 20 \
  -H "Authorization: Bearer $(cat ~/.cache/huggingface/token)" \
  "https://huggingface.co/api/spaces/olfronar/megastyle-comparison/logs/build" | tail -80

# Run phase (Python stdout/stderr, Gradio startup)
curl -s --max-time 20 \
  -H "Authorization: Bearer $(cat ~/.cache/huggingface/token)" \
  "https://huggingface.co/api/spaces/olfronar/megastyle-comparison/logs/run" | tail -80
```

Each event line is JSON with `data` and `timestamp` fields. Because it's SSE, the stream is
long-lived — always use `--max-time` to bound it.
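For scripting against those streams, a captured line can be parsed like this (a sketch; it assumes each event arrives with the standard SSE `data:` prefix in front of the JSON payload described above):

```python
import json


def parse_log_event(line: str):
    """Return (timestamp, data) from one SSE log line, or None for keep-alives/blanks."""
    line = line.strip()
    if not line.startswith("data:"):
        return None
    payload = json.loads(line[len("data:"):].strip())
    return payload.get("timestamp"), payload.get("data")
```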

### Check high-level state

```bash
curl -s "https://huggingface.co/api/spaces/olfronar/megastyle-comparison" | \
  python3 -c "import sys, json; d=json.load(sys.stdin); r=d.get('runtime',{}); \
  print('stage:', r.get('stage'), 'hardware:', r.get('hardware'), 'sha:', d.get('sha','?')[:7])"
```

`stage` values: `BUILDING` → `RUNNING` (healthy) or `RUNTIME_ERROR` / `BUILD_FAILED` (check logs).

### Common fix cycles

- **Runtime import / missing dep**: edit `requirements.txt` or the affected import, commit, push,
  stream the run logs until `RUNNING`. Most fixes don't need a full rebuild — layer caching makes
  re-pushes with unchanged deps very fast.
- **Hardware change**: one `HfApi().request_space_hardware(...)` call — no push, no rebuild.
- **Stuck build**: use the *Settings → Factory Reboot* UI as a last resort; our Dockerfile is the
  HF default so factory-reboot is safe.

## Working with the Hugging Face Hub

- Model weights (`megastyle_encoder.pth`) should be pulled at runtime via `huggingface_hub.hf_hub_download`,
  not committed. The Space's build cache keeps it warm between restarts.
- License of the checkpoint is MIT (per `Gaojunyao/MegaStyle`); SigLIP is Apache-2.0. The Space's own
  code should ship under a compatible license.
- Paper citation (if surfaced in UI): `gao2026megastyle` bibtex is in the upstream repo README.

## Conventions

- **Use `SiglipImageProcessor`, not `AutoProcessor`.** Upstream's `style_score.py` uses
  `AutoProcessor.from_pretrained(SIGLIP_ID)` which loads both the image processor *and* the SigLIP
  tokenizer; the tokenizer requires `sentencepiece`, which isn't in the ZeroGPU base image and
  crashes at import. Vision-only inference has no use for the tokenizer. Load
  `SiglipImageProcessor.from_pretrained(SIGLIP_ID)` directly — the `pixel_values` output is
  identical.
- **Control the gradio version via `sdk_version` in README frontmatter, not `requirements.txt`.**
  HF's Dockerfile injects `gradio[oauth]==<sdk_version>` into the pip install line alongside our
  `requirements.txt`. Putting a conflicting `gradio` pin in `requirements.txt` (e.g.
  `gradio>=5.25.0`) crashes the build with `Cannot install gradio==<sdk_version> and gradio>=...
  because these package versions have conflicting dependencies`. Bump `sdk_version` instead;
  `requirements.txt` should not name gradio at all.
- **`sdk_version: 5.50.0` or newer is required.** Gradio 5.9.1 (the version HF defaults to at
  Space creation) ships a `gradio_client.utils.get_type` that crashes on boolean JSON schemas
  (valid JSON-Schema shorthand for accept-anything). `gr.Dataframe` triggers this in `/api/info`,
  flooding run logs with `TypeError: argument of type 'bool' is not iterable`. Fixed in later
  5.x. Also launch with `show_api=False` as a belt-and-suspenders so the endpoint isn't exposed
  at all — we don't need programmatic API access for a visual-only demo.
- **Match upstream preprocessing exactly.** If you find yourself tempted to resize, center-crop, or
  change color-space conversion manually instead of going through the SigLIP image processor, stop
  — the metric will silently degrade.
- **Don't cast weights to fp16/bf16 on load unless explicitly needed.** ZeroGPU hardware handles
  fp32 SigLIP fine; precision changes affect similarity scores measurably.
- **Device string**: the upstream `style_score.py` probes for `cuda` → `npu` → `cpu`. On ZeroGPU it
  will always be `cuda` inside the decorated function; outside (e.g. cold start), keep it on CPU.
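That probe order can be sketched as follows (a sketch mirroring upstream's order; the `hasattr` guard is an assumption, since `torch.npu` only exists on Ascend builds of PyTorch):

```python
import torch


def pick_device() -> str:
    """Probe cuda -> npu -> cpu, matching upstream style_score.py."""
    if torch.cuda.is_available():
        return "cuda"
    if hasattr(torch, "npu") and torch.npu.is_available():
        return "npu"
    return "cpu"
```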