---
title: MegaStyle Image Style Comparison
emoji: 🎨
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 5.50.0
python_version: "3.10"
app_file: app.py
pinned: false
short_description: Compare image style similarity with MegaStyle-Encoder
license: mit
---
> **Deploying this Space:** select **ZeroGPU** hardware in the Space's *Settings → Hardware*
> panel after creating it. ZeroGPU is not configured via frontmatter.
# MegaStyle Image Style Comparison
Upload a **test image** and **1–8 reference images**, hit **Compare styles**, and get a
style-similarity score (0–100) plus a human-readable verdict. Powered by
[MegaStyle-Encoder](https://huggingface.co/Gaojunyao/MegaStyle), a SigLIP-based style encoder
trained on the 1.4M-image [MegaStyle dataset](https://huggingface.co/datasets/tencent/MegaStyle-1.4M)
with style-supervised contrastive learning; see the paper
[MegaStyle (arXiv:2604.08364)](https://arxiv.org/abs/2604.08364).
## How it works
1. Each image is embedded with MegaStyle-Encoder into a unit-length style vector.
2. Cosine similarity between the test vector and each reference vector gives a per-reference score.
3. The headline score is the **mean** of those per-reference scores, shown as a percentage for
   readability. A per-reference table is shown below for transparency.
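Steps 2–3 can be sketched as follows, assuming each image has already been embedded into a style vector (e.g. by MegaStyle-Encoder). `style_score` is a hypothetical helper for illustration, not part of the upstream code:

```python
import numpy as np

def style_score(test_vec: np.ndarray, ref_vecs: np.ndarray) -> tuple[float, np.ndarray]:
    """Mean cosine similarity between a test embedding and N reference embeddings.

    Both sides are L2-normalized first, so a plain dot product equals
    cosine similarity (MegaStyle-Encoder outputs are unit-length already,
    but re-normalizing is a cheap safeguard).
    """
    test = test_vec / np.linalg.norm(test_vec)
    refs = ref_vecs / np.linalg.norm(ref_vecs, axis=1, keepdims=True)
    per_ref = refs @ test            # one cosine score per reference image
    return float(per_ref.mean()), per_ref
```

The headline percentage shown in the UI is then just `100 * score`.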
## Verdict labels
The verdict is a heuristic bucketing of the cosine-similarity score:
| Score range | Label |
|-------------|-------|
| `≥ 0.75` | 🟢 Strong style match |
| `0.65 – 0.75` | 🟢 Good style match |
| `0.55 – 0.65` | 🟡 Moderate style match |
| `0.45 – 0.55` | 🟠 Weak style match |
| `< 0.45` | 🔴 Minimal style match |
These thresholds are not calibrated against ground-truth style labels; they are rule-of-thumb
bands tuned for the typical cosine-similarity range of SigLIP-family encoders (where even
unrelated images can sit around 0.4–0.6). Treat the raw number as the source of truth.
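A minimal sketch of this bucketing, using the thresholds from the table above (`verdict` is a hypothetical helper name, not taken from the Space's code):

```python
def verdict(score: float) -> str:
    """Map a cosine-similarity score to a human-readable verdict label.

    Boundary values fall into the higher band, matching the table above.
    """
    if score >= 0.75:
        return "🟢 Strong style match"
    if score >= 0.65:
        return "🟢 Good style match"
    if score >= 0.55:
        return "🟡 Moderate style match"
    if score >= 0.45:
        return "🟠 Weak style match"
    return "🔴 Minimal style match"
```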
## Credits
- Paper: Gao et al., *MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent
  Text-to-Image Style Mapping*, arXiv:2604.08364, 2026.
- Upstream code: [Tencent/MegaStyle](https://github.com/Tencent/MegaStyle)
- Model weights: [Gaojunyao/MegaStyle](https://huggingface.co/Gaojunyao/MegaStyle) (MIT)
- Backbone: [google/siglip-so400m-patch14-384](https://huggingface.co/google/siglip-so400m-patch14-384) (Apache-2.0)