---
title: MegaStyle Image Style Comparison
emoji: 🎨
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 5.50.0
python_version: "3.10"
app_file: app.py
pinned: false
short_description: Compare image style similarity with MegaStyle-Encoder
license: mit
---
> **Deploying this Space:** select **ZeroGPU** hardware in the Space's *Settings → Hardware*
> panel after creating it. ZeroGPU is not configured via frontmatter.
# MegaStyle Image Style Comparison
Upload a **test image** and **1–8 reference images**, hit **Compare styles**, and get a
style-similarity score (0–100) plus a human-readable verdict. Powered by
[MegaStyle-Encoder](https://huggingface.co/Gaojunyao/MegaStyle), a SigLIP-based style encoder
trained on the 1.4M-image [MegaStyle dataset](https://huggingface.co/datasets/tencent/MegaStyle-1.4M)
with style-supervised contrastive learning; see the paper
[MegaStyle (arXiv:2604.08364)](https://arxiv.org/abs/2604.08364).
## How it works
1. Each image is embedded with MegaStyle-Encoder into a unit-length style vector.
2. Cosine similarity between the test vector and each reference vector gives a per-reference score.
3. The headline score is the **mean** of those per-reference scores, shown as a percentage for
   readability. A per-reference table is shown below for transparency.
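Steps 2–3 can be sketched as follows, assuming each image has already been embedded into a style vector (e.g. by MegaStyle-Encoder). `style_score` is a hypothetical helper for illustration, not part of the upstream code:

```python
import numpy as np

def style_score(test_vec: np.ndarray, ref_vecs: np.ndarray) -> tuple[float, np.ndarray]:
    """Mean cosine similarity between a test embedding and N reference embeddings.

    Both sides are L2-normalized first, so a plain dot product equals
    cosine similarity (MegaStyle-Encoder outputs are unit-length already,
    but re-normalizing is a cheap safeguard).
    """
    test = test_vec / np.linalg.norm(test_vec)
    refs = ref_vecs / np.linalg.norm(ref_vecs, axis=1, keepdims=True)
    per_ref = refs @ test            # one cosine score per reference image
    return float(per_ref.mean()), per_ref
```

The headline percentage shown in the UI is then just `100 * score`.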
## Verdict labels
The verdict is a heuristic bucketing of the cosine-similarity score:
| Score range | Label |
|-------------|-------|
| `≥ 0.75` | 🟢 Strong style match |
| `0.65 – 0.75` | 🟢 Good style match |
| `0.55 – 0.65` | 🟡 Moderate style match |
| `0.45 – 0.55` | 🟠 Weak style match |
| `< 0.45` | 🔴 Minimal style match |
These thresholds are not calibrated against ground-truth style labels; they are rule-of-thumb
bands tuned for the typical cosine-similarity range of SigLIP-family encoders (where even
unrelated images can sit around 0.4–0.6). Treat the raw number as the source of truth.
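A minimal sketch of this bucketing, using the thresholds from the table above (`verdict` is a hypothetical helper name, not taken from the Space's code):

```python
def verdict(score: float) -> str:
    """Map a cosine-similarity score to a human-readable verdict label.

    Boundary values fall into the higher band, matching the table above.
    """
    if score >= 0.75:
        return "🟢 Strong style match"
    if score >= 0.65:
        return "🟢 Good style match"
    if score >= 0.55:
        return "🟡 Moderate style match"
    if score >= 0.45:
        return "🟠 Weak style match"
    return "🔴 Minimal style match"
```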
## Credits
- Paper: Gao et al., *MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent
  Text-to-Image Style Mapping*, arXiv:2604.08364, 2026.
- Upstream code: [Tencent/MegaStyle](https://github.com/Tencent/MegaStyle)
- Model weights: [Gaojunyao/MegaStyle](https://huggingface.co/Gaojunyao/MegaStyle) (MIT)
- Backbone: [google/siglip-so400m-patch14-384](https://huggingface.co/google/siglip-so400m-patch14-384) (Apache-2.0)