SigLIP to Arctic Projection Layer
A lightweight linear projection layer that maps SigLIP image embeddings (1152-dim) into the Snowflake Arctic-embed-m-v2.0 text embedding space (768-dim).
Model Details
- Input: SigLIP-so400m-patch14-384 image embeddings (1152 dimensions)
- Output: Arctic-embed-m-v2.0 compatible embeddings (768 dimensions)
- Architecture: Linear projection (no hidden layers)
- Parameters: ~885K
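The parameter count follows directly from the layer shape. As a quick arithmetic check, assuming the layer carries a bias term (as the usage snippet on this card does with `bias=True`):

```python
# Dimensions taken from the model details above.
in_dim, out_dim = 1152, 768

weight_params = in_dim * out_dim   # one weight per (input, output) pair
bias_params = out_dim              # one bias per output dimension
total_params = weight_params + bias_params

print(f"weights={weight_params:,} bias={bias_params:,} total={total_params:,}")
```

The total is 885,504, which matches the ~885K figure.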
Training Data
Trained on ~690K image-caption pairs from DataComp-small, filtered for quality.
Performance
| Metric | Value |
|---|---|
| Image-to-Text Recall@1 | 77.44% |
| Image-to-Text Recall@5 | 91.6% |
| Text-to-Image Recall@1 | 84.86% |
| Text-to-Image Recall@5 | 95.4% |
| Mean Cosine Similarity | 0.458 |
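The card does not describe the evaluation set or the exact protocol behind these numbers. As a rough sketch only, this is one common way to compute Recall@K and a mean paired cosine similarity, assuming that row `i` of the image matrix and row `i` of the text matrix form a matching pair. The helper names `recall_at_k` and `mean_paired_cosine` and the toy data are hypothetical and not part of this model's release.

```python
import torch
import torch.nn.functional as F


def recall_at_k(queries: torch.Tensor, gallery: torch.Tensor, k: int) -> float:
    """Fraction of queries whose same-index gallery item ranks in the top k."""
    q = F.normalize(queries, dim=-1)
    g = F.normalize(gallery, dim=-1)
    sims = q @ g.T                                   # [n, n] cosine similarities
    topk = sims.topk(k, dim=-1).indices              # [n, k] best gallery indices
    targets = torch.arange(q.shape[0], device=sims.device).unsqueeze(1)
    return (topk == targets).any(dim=-1).float().mean().item()


def mean_paired_cosine(a: torch.Tensor, b: torch.Tensor) -> float:
    """Average cosine similarity between matching rows of a and b."""
    return F.cosine_similarity(a, b, dim=-1).mean().item()


# Toy data standing in for projected image embeddings and Arctic text embeddings.
torch.manual_seed(0)
text_embeds = torch.randn(8, 768)
image_embeds = text_embeds + 0.1 * torch.randn(8, 768)  # noisy copies of the text rows

i2t_r1 = recall_at_k(image_embeds, text_embeds, k=1)  # image-to-text
t2i_r5 = recall_at_k(text_embeds, image_embeds, k=5)  # text-to-image
paired_cos = mean_paired_cosine(image_embeds, text_embeds)
```

Under this convention, image-to-text uses images as queries and captions as the gallery, and text-to-image swaps the two arguments.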
Usage
import torch
from safetensors.torch import load_file
# Load the projection layer
state_dict = load_file("model.safetensors")
projection = torch.nn.Linear(1152, 768, bias=True)
projection.load_state_dict(state_dict)
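# Project SigLIP embeddings to Arctic space
# Placeholder input; replace with real SigLIP image embeddings of shape [batch, 1152]
siglip_embeds = torch.randn(4, 1152)
with torch.no_grad():  # inference only, no gradients needed
    arctic_compatible = projection(siglip_embeds)  # [batch, 768]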
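Once projected, the image embeddings can be compared with Arctic text embeddings by cosine similarity. The sketch below illustrates that step with random placeholder tensors and an untrained stand-in layer; producing real text embeddings requires running the Arctic text model separately, which is not shown here.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Untrained stand-in; load the real weights as shown in the usage snippet above.
projection = torch.nn.Linear(1152, 768, bias=True)

siglip_embeds = torch.randn(2, 1152)  # placeholder: 2 image embeddings
text_embeds = torch.randn(5, 768)     # placeholder: Arctic embeddings of 5 captions

with torch.no_grad():
    image_vecs = F.normalize(projection(siglip_embeds), dim=-1)
    text_vecs = F.normalize(text_embeds, dim=-1)
    scores = image_vecs @ text_vecs.T      # [2, 5] cosine similarities
    best_caption = scores.argmax(dim=-1)   # index of the closest caption per image
```

L2-normalizing both sides first means the matrix product yields cosine similarities directly.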
Training
Trained using contrastive loss with temperature=0.07 for 2 epochs.
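The card gives only the loss family and the temperature. One plausible reading is a CLIP-style symmetric InfoNCE loss over in-batch negatives, sketched below. The symmetric form, the use of precomputed text embeddings as fixed targets, the batch size, and the function name `contrastive_loss` are all assumptions made for illustration, not details confirmed by the card.

```python
import torch
import torch.nn.functional as F


def contrastive_loss(image_vecs: torch.Tensor,
                     text_vecs: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE: row i of image_vecs should match row i of text_vecs."""
    image_vecs = F.normalize(image_vecs, dim=-1)
    text_vecs = F.normalize(text_vecs, dim=-1)
    logits = image_vecs @ text_vecs.T / temperature   # [batch, batch]
    targets = torch.arange(logits.shape[0], device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)       # images as queries
    loss_t2i = F.cross_entropy(logits.T, targets)     # captions as queries
    return 0.5 * (loss_i2t + loss_t2i)


torch.manual_seed(0)
projection = torch.nn.Linear(1152, 768, bias=True)

# Placeholder batch; real training would use precomputed SigLIP and Arctic embeddings.
siglip_embeds = torch.randn(16, 1152)
arctic_text_embeds = torch.randn(16, 768)

loss = contrastive_loss(projection(siglip_embeds), arctic_text_embeds)
loss.backward()  # gradients reach only the projection layer's weight and bias
```

Dividing the similarities by a small temperature such as 0.07 sharpens the softmax, so the loss concentrates on the hardest in-batch negatives.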