# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project goal
Build a Hugging Face **Space** (ZeroGPU, Gradio SDK) that lets users upload / pick images and get
**pairwise or one-vs-many style-similarity scores** produced by **MegaStyle-Encoder** from the paper
*MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent Text-to-Image Style Mapping*
([arXiv:2604.08364](https://arxiv.org/abs/2604.08364), [project page](https://jeoyal.github.io/MegaStyle/),
[upstream code](https://github.com/Tencent/MegaStyle)).
The Space is a **comparison/analysis tool**, not a style-transfer demo — it should surface a
similarity matrix / ranked grid / heatmap, not generate images. The style-transfer sibling model
(`megastyle_flux.safetensors`) is explicitly **out of scope** unless the user asks for it — a ZeroGPU
slot can load SigLIP but cannot sanely run a 12B FLUX model.
## The model contract (critical — easy to get wrong)
MegaStyle-Encoder is **not** a standalone architecture. It is **SigLIP** with fine-tuned weights:
- **Backbone**: `google/siglip-so400m-patch14-384` loaded via `transformers.SiglipVisionModel`
- **Processor**: `transformers.SiglipImageProcessor.from_pretrained("google/siglip-so400m-patch14-384")` (not `AutoProcessor`; see Conventions)
- **Weights**: `megastyle_encoder.pth` from [Gaojunyao/MegaStyle](https://huggingface.co/Gaojunyao/MegaStyle)
(~857 MB). Loaded with `torch.load(..., map_location="cpu")`, then:
- The checkpoint may be either a raw `state_dict` or `{"model": state_dict}` — handle both.
- Apply with `model.load_state_dict(state, strict=False)` (non-strict is correct, upstream does this).
- **Embedding**: `model(pixel_values=...).pooler_output`, then **L2-normalize** (`emb / emb.norm(p=2, dim=-1, keepdim=True)`).
- **Similarity**: plain dot product of two normalized embeddings (= cosine similarity, range roughly [-1, 1]).
Reference implementation is `style_score.py` in the upstream repo — treat it as the authoritative
contract for any inference code in this Space. Do not "improve" the preprocessing, pooling choice, or
normalization without strong reason; the checkpoint was trained to produce a useful metric only under
exactly this pipeline.
## Deployment target: ZeroGPU (but the workload is GPU-light)
This Space targets **ZeroGPU** (Hugging Face's serverless GPU pool, currently backed by H200)
because it's the free GPU tier. The actual workload — one SigLIP-so400m forward pass over up
to 9 images — is small enough to run on CPU in 15–30s, or on any CUDA device in under a second.
We're not picking H200 for capability reasons; we take whatever ZeroGPU allocates.
**Viable alternatives** if ZeroGPU isn't available (no HF Pro, etc.):
- **CPU basic Space** (free, no Pro): drop `spaces` import and `@spaces.GPU` decorator; everything
else stays the same. Comparison takes 15–30s instead of <1s.
- **Dedicated T4/L4/A10G paid tier**: no code change; just pick hardware in Space Settings.
ZeroGPU-specific constraints that shape the code today:
- GPU-using functions **must** be decorated with `@spaces.GPU(duration=...)` and the `spaces`
package must be imported **before** `torch`. GPU tensors/models cannot live at module scope —
move `.to("cuda")` and the forward pass **inside** the decorated function. Guard `MODEL.to(device)`
with a `next(MODEL.parameters()).device != device` check so repeat calls don't re-migrate.
- The model is ~857 MB (`megastyle_encoder.pth`) + SigLIP ~1.8 GB. Load on CPU at module scope
via `hf_hub_download`; `.to(device)` inside the decorated function on first invocation.
- **`duration`** on `@spaces.GPU` is currently 30s (sized for 1 test image + up to 8 references).
Over-estimating wastes the user's ZeroGPU daily quota. Bump only if batch grows.
- **ZeroGPU hardware is NOT set in frontmatter** — it's configured in the Space's
*Settings → Hardware* panel. The `hardware:` frontmatter key that exists for regular Spaces
does not apply here. Don't reintroduce it.
- **Python**: ZeroGPU supports only Python **3.10.13** and **3.12.12**. Pin `python_version: "3.10"`
in frontmatter.
- **PyTorch**: ZeroGPU's supported wheel list is `>=2.1.0`. Keep the pin compatible.
- `sdk: gradio`, `sdk_version: 5.50.0` (Gradio 5.9.1, HF's creation-time default, has a known
crash; see Conventions). Gradio 4+ is the only SDK currently supported by ZeroGPU.
- Requirements live in `requirements.txt` at repo root: `gradio`, `torch`, `transformers`,
`Pillow`, `spaces`, `huggingface_hub`. Pin `transformers>=4.45` so `SiglipVisionModel` exists.
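The load-on-CPU / migrate-inside-the-decorated-function pattern can be sketched like this (`ensure_device` is an illustrative helper name, not a `spaces` API; the comparison uses `.type` so `cuda` vs `cuda:0` doesn't force a spurious re-migration):

```python
import torch
from torch import nn

def ensure_device(model: nn.Module, device: str) -> nn.Module:
    # Migrate only when needed so repeat @spaces.GPU calls don't re-copy
    # ~2.6 GB of weights. Compare device *type*, since "cuda" vs "cuda:0"
    # would otherwise never compare equal.
    if next(model.parameters()).device.type != torch.device(device).type:
        model.to(device)
    return model

# Usage inside app.py (sketch; requires the `spaces` package on ZeroGPU):
#
#   import spaces   # must be imported before torch
#   ...
#   @spaces.GPU(duration=30)
#   def compare(test_img, refs):
#       ensure_device(MODEL, "cuda")  # MODEL was loaded on CPU at module scope
#       ...
```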
## Repo shape
Current layout (single-file app is appropriate at this size):
- `app.py` — Gradio Blocks UI + `@spaces.GPU`-decorated inference. Single entry point.
- `requirements.txt` — dependency pins.
- `README.md` — HF Space frontmatter (sdk, python_version, license) + user-facing description.
Doubles as the Space's landing page.
- `LICENSE` — MIT. Matches the `license: mit` claim in README frontmatter and the upstream
model's license.
- `.gitignore` — standard Python + HF cache + Gradio `flagged/` directory.
- `CLAUDE.md` — this file.
Split `app.py` into a separate `megastyle.py` module only if inference logic grows past ~150 lines
or needs unit tests without Gradio. Not needed today.
## UI & scoring conventions (locked by review)
- **Up to 8 reference images.** Extras are clipped at compute time, with a note in the result markdown.
- **Headline score is mean cosine similarity** across per-reference scores. Per-reference
breakdown is surfaced in a table so users can spot outliers dragging down the mean.
- **Label bands** (heuristic, calibrated for SigLIP-family cosine ranges where unrelated images
typically sit 0.4–0.6):
| Cosine | Label | Emoji |
|--------|-------|-------|
| ≥ 0.75 | Strong style match | 🟢 |
| 0.65–0.75 | Good style match | 🟢 |
| 0.55–0.65 | Moderate style match | 🟡 |
| 0.45–0.55 | Weak style match | 🟠 |
| < 0.45 | Minimal style match | 🔴 |
If you ever have calibration data (e.g., per-style-pair cosine distributions from the MegaStyle
dataset), retighten these. Until then, do not represent the label as ground truth — the raw
cosine is the source of truth.
- **Do not display a pseudo-percentage.** An earlier iteration mapped cosine → `(x+1)/2 * 100`
which compresses useful signal and misleads users. The raw three-decimal cosine + the label is
the display contract.
- **Verdict styling uses emoji prefix, not inline HTML `<span>`.** `gr.Markdown`'s HTML handling
varies across Gradio versions; emoji is reliably rendered.
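The band table translates to a small helper (a sketch; the function name is illustrative, thresholds are the table's, lower bound inclusive):

```python
def style_label(cos: float) -> str:
    # Heuristic bands from the table above; lower bound of each band inclusive.
    if cos >= 0.75:
        return "🟢 Strong style match"
    if cos >= 0.65:
        return "🟢 Good style match"
    if cos >= 0.55:
        return "🟡 Moderate style match"
    if cos >= 0.45:
        return "🟠 Weak style match"
    return "🔴 Minimal style match"

# Display contract: raw three-decimal cosine plus the label, no percentage:
# f"{cos:.3f} {style_label(cos)}"
```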
## Deploying and observing the Space
The live Space is at **`olfronar/megastyle-comparison`**
(https://huggingface.co/spaces/olfronar/megastyle-comparison). Local working copy is git-backed
with `origin` pointing at the Space's git URL — normal `git push origin main` triggers a rebuild.
### Publish a change
```bash
git add <files> && git commit -m "..." && git push origin main
```
For one-off blob pushes without a git clone:
```bash
hf upload olfronar/megastyle-comparison . --repo-type space --commit-message "..."
```
### Set hardware (ZeroGPU, etc.)
Hardware is *not* set via README frontmatter. Use `HfApi.request_space_hardware`:
```python
from huggingface_hub import HfApi
HfApi().request_space_hardware("olfronar/megastyle-comparison", hardware="zero-a10g")
```
Valid flavors include `cpu-basic`, `cpu-upgrade`, `t4-small`, `a10g-small`, `a10g-large`,
`a100-large`, and `zero-a10g` (the legacy name still used for the ZeroGPU pool, which is
currently backed by H200 hardware). You can also toggle via the Space's *Settings → Hardware*
page; both paths write to the same field.
### Observe build and run logs
HF exposes Server-Sent-Events streams for both phases. They require the user's HF token (read
it from `~/.cache/huggingface/token` after `hf auth login`). Capped-timeout curl works for
point-in-time snapshots:
```bash
# Build phase (docker layers, pip install)
curl -s --max-time 20 \
-H "Authorization: Bearer $(cat ~/.cache/huggingface/token)" \
"https://huggingface.co/api/spaces/olfronar/megastyle-comparison/logs/build" | tail -80
# Run phase (Python stdout/stderr, Gradio startup)
curl -s --max-time 20 \
-H "Authorization: Bearer $(cat ~/.cache/huggingface/token)" \
"https://huggingface.co/api/spaces/olfronar/megastyle-comparison/logs/run" | tail -80
```
Each event line is JSON with `data` and `timestamp` fields. Because it's SSE, the stream is
long-lived — always use `--max-time` to bound it.
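A Python equivalent of the snapshot, assuming the event shape described above (`parse_sse_line` and `snapshot_run_logs` are illustrative names):

```python
import json
import pathlib
import urllib.request

SPACE = "olfronar/megastyle-comparison"

def parse_sse_line(raw: bytes):
    """Decode one SSE line; return the JSON event payload or None."""
    line = raw.decode("utf-8", "replace").strip()
    if not line.startswith("data:"):
        return None  # comments, blank keep-alives, etc.
    return json.loads(line[len("data:"):])

def snapshot_run_logs(limit: int = 80):
    # Token written by `hf auth login`, as noted above.
    token = pathlib.Path("~/.cache/huggingface/token").expanduser().read_text().strip()
    req = urllib.request.Request(
        f"https://huggingface.co/api/spaces/{SPACE}/logs/run",
        headers={"Authorization": f"Bearer {token}"},
    )
    events = []
    # The stream is long-lived SSE, so the timeout bounds the snapshot.
    with urllib.request.urlopen(req, timeout=20) as resp:
        try:
            for raw in resp:
                event = parse_sse_line(raw)
                if event is not None:
                    events.append(event)
                if len(events) >= limit:
                    break
        except TimeoutError:
            pass
    return events
```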
### Check high-level state
```bash
curl -s "https://huggingface.co/api/spaces/olfronar/megastyle-comparison" | \
python3 -c "import sys, json; d=json.load(sys.stdin); r=d.get('runtime',{}); \
print('stage:', r.get('stage'), 'hardware:', r.get('hardware'), 'sha:', d.get('sha','?')[:7])"
```
`stage` values: `BUILDING` → `RUNNING` (healthy) or `RUNTIME_ERROR` / `BUILD_FAILED` (check logs).
### Common fix cycles
- **Runtime import / missing dep**: edit `requirements.txt` or the affected import, commit, push,
stream the run logs until `RUNNING`. Most fixes don't need a full rebuild — layer caching makes
re-pushes with unchanged deps very fast.
- **Hardware change**: one `HfApi().request_space_hardware(...)` call — no push, no rebuild.
- **Stuck build**: use the *Settings → Factory Reboot* UI as a last resort; our Dockerfile is the
HF default so factory-reboot is safe.
## Working with the Hugging Face Hub
- Model weights (`megastyle_encoder.pth`) should be pulled at runtime via `huggingface_hub.hf_hub_download`,
not committed. The Space's build cache keeps it warm between restarts.
- License of the checkpoint is MIT (per `Gaojunyao/MegaStyle`); SigLIP is Apache-2.0. The Space's own
code should ship under a compatible license.
- Paper citation (if surfaced in UI): the `gao2026megastyle` BibTeX entry is in the upstream repo README.
## Conventions
- **Use `SiglipImageProcessor`, not `AutoProcessor`.** Upstream's `style_score.py` uses
`AutoProcessor.from_pretrained(SIGLIP_ID)` which loads both the image processor *and* the SigLIP
tokenizer; the tokenizer requires `sentencepiece`, which isn't in the ZeroGPU base image and
crashes at import. Vision-only inference has no use for the tokenizer. Load
`SiglipImageProcessor.from_pretrained(SIGLIP_ID)` directly — the `pixel_values` output is
identical.
- **Control the gradio version via `sdk_version` in README frontmatter, not `requirements.txt`.**
HF's Dockerfile injects `gradio[oauth]==<sdk_version>` into the pip install line alongside our
`requirements.txt`. Putting a conflicting `gradio` pin in `requirements.txt` (e.g.
`gradio>=5.25.0`) crashes the build with `Cannot install gradio==<sdk_version> and gradio>=...
because these package versions have conflicting dependencies`. Bump `sdk_version` instead;
`requirements.txt` should not name gradio at all.
- **`sdk_version: 5.50.0` or newer is required.** Gradio 5.9.1 (the version HF defaults to at
Space creation) ships a `gradio_client.utils.get_type` that crashes on boolean JSON schemas
(valid JSON-Schema shorthand for accept-anything). `gr.Dataframe` triggers this in `/api/info`,
flooding run logs with `TypeError: argument of type 'bool' is not iterable`. Fixed in later
5.x. Also launch with `show_api=False` as a belt-and-suspenders measure so the endpoint isn't
exposed at all — we don't need programmatic API access for a visual-only demo.
- **Match upstream preprocessing exactly.** If you find yourself tempted to resize, center-crop, or
change color-space conversion manually instead of going through the SigLIP image processor, stop
— the metric will silently degrade.
- **Don't cast weights to fp16/bf16 on load unless explicitly needed.** The ZeroGPU hardware handles
fp32 SigLIP fine; precision changes affect similarity scores measurably.
- **Device string**: the upstream `style_score.py` probes for `cuda` → `npu` → `cpu`. On ZeroGPU it
will always be `cuda` inside the decorated function; outside (e.g. cold start), keep it on CPU.
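A sketch of that probe order (vanilla PyTorch has no `torch.npu`, so the check is guarded; the function name is illustrative):

```python
import torch

def pick_device() -> str:
    # Probe order per upstream style_score.py: cuda -> npu -> cpu.
    if torch.cuda.is_available():
        return "cuda"
    npu = getattr(torch, "npu", None)  # present only with torch_npu installed
    if npu is not None and npu.is_available():
        return "npu"
    return "cpu"
```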