NearID — Identity Representation Learning via Near-identity Distractors
Paper: NearID: Identity Representation Learning via Near-identity Distractors (arXiv:2604.01973) · Code: github.com/Aleksandar/NearID
NearID produces identity-aware image embeddings that remain stable across background and context changes while correctly rejecting near-identity distractors (visually similar but different instances placed in the same context). It is designed for evaluating identity preservation in personalized image generation.
Architecture
| Property | Value |
|---|---|
| Base model | google/siglip2-so400m-patch14-384 |
| Backbone | SigLIP2 SO400M ViT/14 @ 384 px (frozen) |
| Pooling head | Multi-head Attention Pooling (MAP), initialised from SigLIP2 weights (trained) |
| Embedding dim | 1152 |
| Normalisation | L2 (built-in, config.normalize_embeddings=True) |
| Total parameters | ~428 M |
| Trainable parameters | ~15 M (head-only; backbone weights are frozen to preserve pretrained priors) |
| Input resolution | 384 × 384 |
| Format | safetensors (fp16) |
Quick Start
```python
from transformers import AutoModel, AutoImageProcessor
from PIL import Image

model = AutoModel.from_pretrained("Aleksandar/nearid-siglip2", trust_remote_code=True)
processor = AutoImageProcessor.from_pretrained("Aleksandar/nearid-siglip2")

inputs = processor(images=Image.open("photo.jpg"), return_tensors="pt")

# Full output (ModelOutput with image_embeds, last_hidden_state, pooler_output)
outputs = model(**inputs)
embedding = outputs.image_embeds  # [1, 1152], L2-normalised

# Tensor shortcut
embedding = model.get_image_features(**inputs)  # [1, 1152]
```
Note on image processor: the original training used the slow `SiglipImageProcessor`. The release defaults to `SiglipImageProcessorFast` for performance. To use the original slow processor, pass `use_fast=False` to `AutoImageProcessor.from_pretrained()`.
Pairwise Similarity
```python
emb_a = model.get_image_features(**processor(images=img_a, return_tensors="pt"))
emb_b = model.get_image_features(**processor(images=img_b, return_tensors="pt"))
similarity = (emb_a @ emb_b.T).item()  # cosine similarity (embeddings are L2-normalised)
```
Batch Inference
```python
images = [Image.open(p) for p in image_paths]
inputs = processor(images=images, return_tensors="pt")  # processor resizes all images to 384 × 384
embeddings = model.get_image_features(**inputs)  # [B, 1152]

# Pairwise similarity matrix
sim_matrix = embeddings @ embeddings.T
```
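Given the similarity matrix, nearest-neighbour retrieval is a top-k over each row with the self-match masked out. A minimal sketch, using random unit-normalised vectors as stand-ins for real model embeddings:

```python
import torch

# Synthetic stand-ins for model outputs: random vectors, L2-normalised the
# same way the model normalises its embeddings.
emb = torch.nn.functional.normalize(torch.randn(8, 1152), dim=-1)

sim_matrix = emb @ emb.T                        # [8, 8] cosine similarities
sim_matrix.fill_diagonal_(float("-inf"))        # mask self-matches
top_sim, top_idx = sim_matrix.topk(k=3, dim=1)  # 3 nearest neighbours per image
```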
Evaluation
Near-Identity Discrimination & Alignment (Table 1)
We evaluate on three complementary benchmarks: NearID (object-level near-identity discrimination), MTG (part-level discrimination + oracle alignment), and DreamBench++ (human-judgment alignment).
| Scoring Model | NearID SSR ↑ | NearID PA ↑ | MTG MO ↑ | MTG MOpair ↑ | MTG SSR ↑ | MTG PA ↑ | DB++ MH ↑ |
|---|---|---|---|---|---|---|---|
| CLIP ViT-L/14 | 10.31 | 20.92 | 0.239 | 0.484 | 0.0 | 0.0 | 0.493 |
| DINOv2 ViT-L/14 | 20.43 | 34.55 | 0.324 | 0.519 | 0.0 | 0.0 | 0.492 |
| SigLIP2 (backbone) | 30.74 | 48.81 | 0.180 | 0.366 | 0.0 | 0.0 | 0.516 |
| VSM | 32.13 | 46.70 | 0.394 | 0.445 | 7.0 | 24.5 | 0.190 |
| NearID (Ours) | 99.17 | 99.71 | 0.465 | 0.486 | 35.0 | 46.5 | 0.545 |
SSR and PA are averaged across seven inpainting settings (three excluded from training). MO/MOpair = metric-to-oracle correlation; MH = metric-to-human correlation (Fisher-z averaged).
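The Fisher-z averaging used for MH maps each per-method correlation through `atanh`, averages in z-space, and maps the mean back with `tanh`, which is the standard way to average correlation coefficients. A stdlib-only sketch (the correlation values below are illustrative, not from the paper):

```python
import math

def fisher_z_average(correlations):
    """Average correlation coefficients via the Fisher z-transform."""
    zs = [math.atanh(r) for r in correlations]
    return math.tanh(sum(zs) / len(zs))

# Illustrative per-method metric-to-human correlations
avg = fisher_z_average([0.50, 0.55, 0.58])
```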
DreamBench++ Per-Method Human Alignment (Table 2)
NearID improves over SigLIP2 on every personalization method tested, with Fisher-z averaged MH of 0.545 vs 0.516.
Training Details
Training Data
NearID was trained on the NearID dataset, which consists of multi-view positives per identity paired with near-identity distractors: different but semantically similar instances inpainted into the exact same background using an ensemble of generation pipelines (Flux, PowerPaint, SDXL, Qwen). Part-level training signal is provided by the MTG dataset.
Training Procedure
| Hyperparameter | Value |
|---|---|
| Tuning strategy | Head-only (backbone frozen) |
| Loss function | NearID loss (InfoNCE + near-identity distractor ranking, α = 0.5, τ = 0.07) |
| Optimiser | AdamW |
| Learning rate | 1e-4 |
| LR schedule | Cosine with 100 warmup steps |
| Batch size | 128 |
| Epochs | 11 |
| Precision | fp16 mixed precision |
| Hardware | 1 × NVIDIA A100 |
| Training time | ~6.5 hours |
| Framework | PyTorch + HuggingFace Accelerate |
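The loss row above names two components: InfoNCE and a near-identity distractor ranking term, combined with α = 0.5 at temperature τ = 0.07. The exact formulation is defined in the paper; the sketch below is one plausible reading for illustration, pairing in-batch InfoNCE with a margin ranking penalty that pushes anchor–positive similarity above anchor–distractor similarity (the `margin` value is an assumption, not a published hyperparameter):

```python
import torch
import torch.nn.functional as F

def nearid_loss_sketch(anchor, positive, distractor, alpha=0.5, tau=0.07, margin=0.2):
    """Illustrative NearID-style loss: InfoNCE + distractor ranking.

    anchor, positive, distractor: [B, D] L2-normalised embeddings.
    Hypothetical reconstruction; the released model's exact loss is in the paper.
    """
    # InfoNCE: each anchor must match its own positive against in-batch negatives.
    logits = anchor @ positive.T / tau          # [B, B] similarity logits
    targets = torch.arange(anchor.size(0))
    info_nce = F.cross_entropy(logits, targets)

    # Ranking: anchor-positive similarity should beat anchor-distractor by a margin.
    pos_sim = (anchor * positive).sum(dim=-1)
    dis_sim = (anchor * distractor).sum(dim=-1)
    ranking = F.relu(margin - (pos_sim - dis_sim)).mean()

    return info_nce + alpha * ranking

B, D = 4, 1152
a = F.normalize(torch.randn(B, D), dim=-1)
p = F.normalize(a + 0.1 * torch.randn(B, D), dim=-1)  # near-duplicate positives
d = F.normalize(torch.randn(B, D), dim=-1)            # unrelated distractors
loss = nearid_loss_sketch(a, p, d)
```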
Intended Uses
Primary use cases:
- Evaluating identity preservation in personalized image generation (e.g., scoring outputs of DreamBooth, Textual Inversion, IP-Adapter)
- Embedding extraction for identity-aware retrieval or clustering
- Benchmarking and research on near-identity discrimination
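For the clustering use case, one simple scheme is a greedy single pass that assigns each embedding to the first existing cluster whose centroid exceeds a cosine threshold (the threshold here is a hypothetical choice, not a value from the paper, and should be tuned on held-out data):

```python
import torch

def greedy_identity_clusters(embeddings, threshold=0.8):
    """Greedy single-pass clustering of L2-normalised embeddings.

    Each embedding joins the first cluster whose (re-normalised) centroid
    has cosine similarity >= threshold; otherwise it starts a new cluster.
    """
    centroids, assignments = [], []
    for emb in embeddings:
        best = None
        for i, c in enumerate(centroids):
            if torch.dot(emb, c) >= threshold:
                best = i
                break
        if best is None:
            centroids.append(emb.clone())
            assignments.append(len(centroids) - 1)
        else:
            # Update the running centroid and keep it on the unit sphere.
            centroids[best] = torch.nn.functional.normalize(
                centroids[best] + emb, dim=-1
            )
            assignments.append(best)
    return assignments

# Two well-separated identities on the unit sphere (toy 2-D example).
pts = torch.nn.functional.normalize(
    torch.tensor([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]]), dim=-1
)
labels = greedy_identity_clusters(pts, threshold=0.8)  # → [0, 0, 1, 1]
```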
Out-of-scope uses:
- This model is not a face recognition or person re-identification system
- Surveillance or tracking without consent
- Production biometric authentication (the model has not been audited for that purpose)
- Demographic classification or profiling
Limitations
- Domain: NearID was trained on synthetic inpaintings of common objects. Performance on domains not represented in the training set (e.g., highly specialised industrial parts, medical imagery) has not been evaluated.
- Resolution: The model expects 384 × 384 input. Performance may degrade on images significantly below this resolution or with heavy compression.
- Single-image scoring: The model scores individual images independently; it does not reason over video or image sequences.
- Generative models: The near-identity distractors were generated using specific inpainting pipelines. Novel generation artifacts from unseen pipelines may affect discrimination performance.
Citation
```bibtex
@article{cvejic2026nearid,
  title={NearID: Identity Representation Learning via Near-identity Distractors},
  author={Cvejic, Aleksandar and Abdal, Rameen and Eldesokey, Abdelrahman and Ghanem, Bernard and Wonka, Peter},
  journal={arXiv preprint arXiv:2604.01973},
  year={2026}
}
```
The full paper is available at https://arxiv.org/abs/2604.01973.