NearID — Identity Representation Learning via Near-identity Distractors

Paper: NearID: Identity Representation Learning via Near-identity Distractors (arXiv:2604.01973) · Code: github.com/Aleksandar/NearID

KAUST · Snap Research

NearID produces identity-aware image embeddings that remain stable across background and context changes while correctly rejecting near-identity distractors (visually similar but different instances placed in the same context). It is designed for evaluating identity preservation in personalized image generation.

Architecture

| Property | Value |
|---|---|
| Base model | google/siglip2-so400m-patch14-384 |
| Backbone | SigLIP2 SO400M ViT/14 @ 384 px (frozen) |
| Pooling head | Multi-head Attention Pooling (MAP), initialised from SigLIP2 weights (trained) |
| Embedding dim | 1152 |
| Normalisation | L2 (built-in, `config.normalize_embeddings=True`) |
| Total parameters | ~428 M |
| Trainable parameters | ~15 M (head-only; backbone weights are frozen to preserve pretrained priors) |
| Input resolution | 384 × 384 |
| Format | safetensors (fp16) |

Quick Start

```python
from transformers import AutoModel, AutoImageProcessor
from PIL import Image

model = AutoModel.from_pretrained("Aleksandar/nearid-siglip2", trust_remote_code=True)
processor = AutoImageProcessor.from_pretrained("Aleksandar/nearid-siglip2")

inputs = processor(images=Image.open("photo.jpg"), return_tensors="pt")

# Full output (ModelOutput with image_embeds, last_hidden_state, pooler_output)
outputs = model(**inputs)
embedding = outputs.image_embeds  # [1, 1152], L2-normalised

# Tensor shortcut
embedding = model.get_image_features(**inputs)  # [1, 1152]
```

Note on image processor: The original training used SiglipImageProcessor (slow). The release defaults to SiglipImageProcessorFast for performance. To use the original slow processor, pass use_fast=False to AutoImageProcessor.from_pretrained().

Pairwise Similarity

```python
import torch

# img_a, img_b: PIL images
emb_a = model.get_image_features(**processor(images=img_a, return_tensors="pt"))
emb_b = model.get_image_features(**processor(images=img_b, return_tensors="pt"))

similarity = (emb_a @ emb_b.T).item()  # cosine similarity (embeddings are normalised)
```
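Because the embeddings are unit-norm, the dot product above is a cosine similarity in [-1, 1]. A same-identity decision can then be sketched with a threshold; the 0.8 value below is a hypothetical placeholder, not a calibrated operating point from the paper:

```python
import torch
import torch.nn.functional as F

def same_identity(emb_a: torch.Tensor, emb_b: torch.Tensor,
                  threshold: float = 0.8) -> bool:
    """Decide whether two L2-normalised embeddings depict the same identity.

    The 0.8 threshold is purely illustrative; calibrate it on held-out
    positive/distractor pairs for your domain.
    """
    return (emb_a @ emb_b.T).item() >= threshold

# Toy check with synthetic unit vectors standing in for model embeddings.
a = F.normalize(torch.randn(1, 1152), dim=-1)
b = F.normalize(torch.randn(1, 1152), dim=-1)
assert same_identity(a, a)      # identical vectors: cosine similarity 1.0
assert not same_identity(a, b)  # random high-dim vectors are near-orthogonal
```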

Batch Inference

```python
images = [Image.open(p) for p in image_paths]
inputs = processor(images=images, return_tensors="pt")  # every image is resized to 384 × 384, so no padding is needed
embeddings = model.get_image_features(**inputs)  # [B, 1152]

# Pairwise similarity matrix
sim_matrix = embeddings @ embeddings.T
```
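Building on the similarity matrix, identity-aware retrieval reduces to a top-k lookup against a gallery. A minimal sketch, with random unit vectors standing in for real NearID embeddings:

```python
import torch
import torch.nn.functional as F

def top_k_matches(query: torch.Tensor, gallery: torch.Tensor, k: int = 5):
    """Return similarities and indices of the k most similar gallery items.

    Both inputs are assumed L2-normalised, as get_image_features returns,
    so the matrix product is a cosine-similarity matrix.
    """
    sims = query @ gallery.T           # [Q, N] cosine similarities
    values, indices = sims.topk(k, dim=-1)
    return values, indices

# Toy usage: the query is item 0 of the gallery, so it should rank first.
gallery = F.normalize(torch.randn(100, 1152), dim=-1)
vals, idx = top_k_matches(gallery[:1], gallery, k=3)
assert idx[0, 0].item() == 0
```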

Evaluation

Near-Identity Discrimination & Alignment (Table 1)

We evaluate on three complementary benchmarks: NearID (object-level near-identity discrimination), MTG (part-level discrimination + oracle alignment), and DreamBench++ (human-judgment alignment).

| Scoring Model | NearID SSR ↑ | NearID PA ↑ | MTG MO ↑ | MTG MOpair ↑ | MTG SSR ↑ | MTG PA ↑ | DB++ MH ↑ |
|---|---|---|---|---|---|---|---|
| CLIP ViT-L/14 | 10.31 | 20.92 | 0.239 | 0.484 | 0.0 | 0.0 | 0.493 |
| DINOv2 ViT-L/14 | 20.43 | 34.55 | 0.324 | 0.519 | 0.0 | 0.0 | 0.492 |
| SigLIP2 (backbone) | 30.74 | 48.81 | 0.180 | 0.366 | 0.0 | 0.0 | 0.516 |
| VSM | 32.13 | 46.70 | 0.394 | 0.445 | 7.0 | 24.5 | 0.190 |
| NearID (Ours) | 99.17 | 99.71 | 0.465 | 0.486 | 35.0 | 46.5 | 0.545 |

SSR and PA are averaged across seven inpainting settings (three of which were held out from training). MO/MOpair = metric-to-oracle correlation; MH = metric-to-human correlation (Fisher-z averaged).
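Fisher-z averaging, used for the MH numbers above, averages correlation coefficients in atanh space and maps the mean back with tanh:

```python
import math

def fisher_z_average(correlations):
    """Average correlation coefficients via Fisher's z-transform.

    Each r (with |r| < 1) is mapped to z = atanh(r), the z-values are
    averaged, and the mean is mapped back to r-space with tanh.
    """
    zs = [math.atanh(r) for r in correlations]
    return math.tanh(sum(zs) / len(zs))

# Averaging identical correlations returns the same value.
assert abs(fisher_z_average([0.545, 0.545]) - 0.545) < 1e-12
```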

DreamBench++ Per-Method Human Alignment (Table 2)

NearID improves over SigLIP2 on every personalization method tested, with Fisher-z averaged MH of 0.545 vs 0.516.

Training Details

Training Data

NearID was trained on the NearID dataset, which consists of multi-view positives per identity paired with near-identity distractors: different but semantically similar instances inpainted into the exact same background using an ensemble of generation pipelines (Flux, PowerPaint, SDXL, Qwen). Part-level training signal is provided by the MTG dataset.

Training Procedure

| Hyperparameter | Value |
|---|---|
| Tuning strategy | Head-only (backbone frozen) |
| Loss function | NearID loss (InfoNCE + near-identity distractor ranking, α = 0.5, τ = 0.07) |
| Optimiser | AdamW |
| Learning rate | 1e-4 |
| LR schedule | Cosine with 100 warmup steps |
| Batch size | 128 |
| Epochs | 11 |
| Precision | fp16 mixed precision |
| Hardware | 1 × NVIDIA A100 |
| Training time | ~6.5 hours |
| Framework | PyTorch + HuggingFace Accelerate |
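The loss combines InfoNCE with a near-identity distractor ranking term (α = 0.5, τ = 0.07); the exact ranking formulation is defined in the paper. As an illustrative sketch only, a margin-based version of that combination might look like:

```python
import torch
import torch.nn.functional as F

def nearid_loss_sketch(anchor, positive, distractor, negatives,
                       alpha: float = 0.5, tau: float = 0.07,
                       margin: float = 0.2):
    """Illustrative InfoNCE + distractor-ranking loss.

    α and τ match the model card; the margin-ranking form of the
    distractor term (and margin=0.2) are assumptions, not the paper's
    exact formulation. Embeddings are assumed L2-normalised,
    shapes [1, D] for anchor/positive/distractor and [N, D] for negatives.
    """
    # InfoNCE: the positive competes against in-batch negatives at temperature τ.
    pos_sim = (anchor @ positive.T) / tau            # [1, 1]
    neg_sim = (anchor @ negatives.T) / tau           # [1, N]
    logits = torch.cat([pos_sim, neg_sim], dim=-1)   # [1, 1 + N]
    info_nce = F.cross_entropy(logits, torch.zeros(1, dtype=torch.long))

    # Ranking term: the positive should beat the near-identity distractor
    # by at least `margin` in cosine similarity.
    ranking = F.relu(margin - (anchor @ positive.T) + (anchor @ distractor.T)).mean()

    return info_nce + alpha * ranking
```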

Intended Uses

Primary use cases:

  • Evaluating identity preservation in personalized image generation (e.g., scoring outputs of DreamBooth, Textual Inversion, IP-Adapter)
  • Embedding extraction for identity-aware retrieval or clustering
  • Benchmarking and research on near-identity discrimination
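For the scoring use case, identity preservation can be summarised as the mean cosine similarity between a generated image's embedding and the subject's reference embeddings. A sketch on stand-in tensors (in practice, both come from `model.get_image_features`):

```python
import torch
import torch.nn.functional as F

def identity_preservation_score(generated: torch.Tensor,
                                references: torch.Tensor) -> float:
    """Mean cosine similarity between one generated-image embedding
    ([1, D]) and a set of reference embeddings ([R, D]) of the subject.
    Inputs are assumed L2-normalised, as get_image_features returns.
    """
    return (generated @ references.T).mean().item()

# Stand-in tensors; replace with real embeddings in practice.
refs = F.normalize(torch.randn(4, 1152), dim=-1)
score = identity_preservation_score(refs[:1], refs)
assert -1.0 <= score <= 1.0
```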

Out-of-scope uses:

  • This model is not a face recognition or person re-identification system
  • Surveillance or tracking without consent
  • Production biometric authentication (the model has not been audited for that purpose)
  • Demographic classification or profiling

Limitations

  • Domain: NearID was trained on synthetic inpaintings of common objects. Performance on domains not represented in the training set (e.g., highly specialised industrial parts, medical imagery) has not been evaluated.
  • Resolution: The model expects 384 × 384 input. Performance may degrade on images significantly below this resolution or with heavy compression.
  • Single-image scoring: The model scores individual images independently; it does not reason over video or image sequences.
  • Generative models: The near-identity distractors were generated using specific inpainting pipelines. Novel generation artifacts from unseen pipelines may affect discrimination performance.

Citation

```bibtex
@article{cvejic2026nearid,
  title={NearID: Identity Representation Learning via Near-identity Distractors},
  author={Cvejic, Aleksandar and Abdal, Rameen and Eldesokey, Abdelrahman and Ghanem, Bernard and Wonka, Peter},
  journal={arXiv preprint arXiv:2604.01973},
  year={2026}
}
```

Full paper: https://arxiv.org/abs/2604.01973
