SigLIP to Arctic Projection Layer

A lightweight linear projection layer that maps SigLIP image embeddings (1152-dim) into the Snowflake Arctic-embed-m-v2.0 text embedding space (768-dim).

Model Details

  • Input: SigLIP-so400m-patch14-384 image embeddings (1152 dimensions)
  • Output: Arctic-embed-m-v2.0 compatible embeddings (768 dimensions)
  • Architecture: Single linear projection with bias (no hidden layers)
  • Parameters: ~885K
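
The parameter count follows directly from the two dimensions listed above. A minimal sketch, assuming the layer is a plain torch.nn.Linear with bias (as in the Usage snippet); the constant names are illustrative only:

```python
import torch

# Dimensions taken from the model details above.
SIGLIP_DIM, ARCTIC_DIM = 1152, 768

# A single affine map: no hidden layers, no activation.
projection = torch.nn.Linear(SIGLIP_DIM, ARCTIC_DIM, bias=True)

# Weight matrix: 768 x 1152 = 884,736 values; bias vector: 768 values.
num_params = sum(p.numel() for p in projection.parameters())
print(num_params)  # 885504, i.e. ~885K
```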

Training Data

Trained on ~690K image-caption pairs from DataComp-small, filtered for quality.

Performance

Metric                    Value
Image-to-Text Recall@1    77.44%
Image-to-Text Recall@5    91.6%
Text-to-Image Recall@1    84.86%
Text-to-Image Recall@5    95.4%
Mean Cosine Similarity    0.458
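
The card does not describe the evaluation protocol or the size of the evaluation set, so the following is only a sketch of how metrics of this kind are commonly computed. It assumes row i of the image embeddings is paired with row i of the text embeddings; recall_at_k and mean_paired_cosine are hypothetical helper names, and the toy tensors are stand-ins for real embeddings.

```python
import torch
import torch.nn.functional as F


def recall_at_k(query, gallery, k):
    """Fraction of queries whose paired gallery row (same index) is in the top-k by cosine similarity."""
    query = F.normalize(query, dim=-1)
    gallery = F.normalize(gallery, dim=-1)
    sims = query @ gallery.T                      # [n, n] cosine similarities
    topk = sims.topk(k, dim=-1).indices           # [n, k] best gallery indices per query
    targets = torch.arange(query.shape[0], device=query.device).unsqueeze(1)
    return (topk == targets).any(dim=-1).float().mean().item()


def mean_paired_cosine(a, b):
    """Average cosine similarity between row-aligned pairs."""
    return F.cosine_similarity(a, b, dim=-1).mean().item()


# Toy data: "image" embeddings are slightly noisy copies of the "text" embeddings.
torch.manual_seed(0)
text_embeds = torch.randn(8, 768)
image_embeds = text_embeds + 0.01 * torch.randn(8, 768)

i2t_r1 = recall_at_k(image_embeds, text_embeds, k=1)   # image-to-text
t2i_r5 = recall_at_k(text_embeds, image_embeds, k=5)   # text-to-image
paired_cos = mean_paired_cosine(image_embeds, text_embeds)
```

Recall@K depends strongly on the number of candidates in the gallery, so figures are only comparable when computed over the same evaluation set.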

Usage

import torch
from safetensors.torch import load_file

# Load the projection layer
state_dict = load_file("model.safetensors")
projection = torch.nn.Linear(1152, 768, bias=True)
projection.load_state_dict(state_dict)

# Project SigLIP embeddings to Arctic space
siglip_embeds = ...  # [batch, 1152]
arctic_compatible = projection(siglip_embeds)  # [batch, 768]
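
Once projected, image embeddings can be compared against Arctic text embeddings with cosine similarity. A minimal sketch under stated assumptions: the random tensors stand in for real SigLIP and Arctic embeddings, the layer here is freshly initialized rather than loaded from model.safetensors, and all variable names are illustrative.

```python
import torch
import torch.nn.functional as F

# Stand-in layer; in practice load the trained weights as shown above.
projection = torch.nn.Linear(1152, 768, bias=True)
projection.eval()

# Stand-ins for real embeddings (assumption: one vector per image / per caption).
siglip_embeds = torch.randn(4, 1152)        # 4 images
arctic_text_embeds = torch.randn(10, 768)   # 10 candidate captions

with torch.no_grad():
    # L2-normalize both sides so the dot product equals cosine similarity.
    image_in_arctic = F.normalize(projection(siglip_embeds), dim=-1)
    text_norm = F.normalize(arctic_text_embeds, dim=-1)
    scores = image_in_arctic @ text_norm.T  # [4, 10]
    best_caption = scores.argmax(dim=-1)    # index of the closest caption per image
```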

Training

Trained for 2 epochs using a contrastive loss with temperature 0.07.
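
The card states only the loss family and the temperature. The sketch below assumes a symmetric, CLIP-style InfoNCE objective over in-batch pairs, which may differ from what was actually used; contrastive_loss is a hypothetical helper, and the random tensors stand in for precomputed SigLIP and Arctic embeddings.

```python
import torch
import torch.nn.functional as F


def contrastive_loss(image_embeds, text_embeds, temperature=0.07):
    """Symmetric InfoNCE: row i of each input is the positive pair; other rows are negatives."""
    image_embeds = F.normalize(image_embeds, dim=-1)
    text_embeds = F.normalize(text_embeds, dim=-1)
    logits = image_embeds @ text_embeds.T / temperature   # [n, n]
    targets = torch.arange(logits.shape[0], device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)           # image -> text direction
    loss_t2i = F.cross_entropy(logits.T, targets)         # text -> image direction
    return (loss_i2t + loss_t2i) / 2


# One illustrative step: only the projection receives gradients.
torch.manual_seed(0)
projection = torch.nn.Linear(1152, 768, bias=True)
siglip_batch = torch.randn(16, 1152)   # stand-in image embeddings
arctic_batch = torch.randn(16, 768)    # stand-in caption embeddings

loss = contrastive_loss(projection(siglip_batch), arctic_batch)
loss.backward()
```

A lower temperature sharpens the softmax over in-batch candidates, so hard negatives contribute more to the gradient.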


Dataset used to train carsondial/christmas-siglip-arctic-projector