# SigLIP2 Giant 256px -- Pre-compiled for AWS Neuron (Inferentia2)

Pre-compiled SigLIP2 Giant vision encoder for AWS Inferentia2 / Trainium, traced with `torch_neuronx.trace()`.
## Model Details

| Property | Value |
|---|---|
| Base model | `google/siglip2-giant-opt-patch16-256` |
| Parameters | ~1.1B |
| Input | `(1, 3, 256, 256)` -- batch size 1, 256x256 RGB |
| Output | dict with `pooler_output` `(1, 1536)` and `last_hidden_state` `(1, 256, 1536)` |
| Compiled size | 1.88 GB |
| SDK | Neuron SDK 2.28 (torch-neuronx 2.9.0) |
| Compiler flags | `--auto-cast matmult --optlevel 3 --model-type unet-inference` |
## Compiler Flags Explained

| Flag | Purpose |
|---|---|
| `--auto-cast matmult` | Cast matrix multiplications to BF16 (~50% smaller model, 99.999% accuracy) |
| `--optlevel 3` | Maximum compiler optimization |
| `--model-type unet-inference` | Memory layout optimizations that reduce SRAM spill (+6% throughput) |

Note: the flag is `matmult` (two t's), not `matmul`.
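As a quick illustration of why casting matmuls to BF16 costs so little accuracy, here is a small experiment in plain PyTorch (not part of this repo, and independent of Neuron hardware):

```python
import torch

# Compare an FP32 matmul against the same matmul computed in BF16,
# mirroring what --auto-cast matmult does on-device.
torch.manual_seed(0)
a = torch.randn(512, 512)
b = torch.randn(512, 512)

full = a @ b
cast = (a.bfloat16() @ b.bfloat16()).float()

rel_err = ((full - cast).norm() / full.norm()).item()
print(f"relative error: {rel_err:.5f}")  # small, well under 1%
```

BF16 keeps FP32's exponent range but truncates the mantissa, so matmul-heavy vision encoders typically lose almost nothing while halving weight size.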
## Quick Start

```python
import torch
import torch_neuronx  # must be imported before torch.jit.load
from huggingface_hub import hf_hub_download

# Download the pre-compiled model
model_path = hf_hub_download(
    repo_id="jburtoft/siglip2-giant-256-neuron",
    filename="siglip2_giant_256_neuron.pt",
)

# Load the traced model
model = torch.jit.load(model_path)
model.eval()

# Inference
inp = torch.randn(1, 3, 256, 256)
with torch.no_grad():
    output = model(inp)

print(output["pooler_output"].shape)      # (1, 1536)
print(output["last_hidden_state"].shape)  # (1, 256, 1536)
```
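The model expects an already-preprocessed `(1, 3, 256, 256)` tensor. A minimal sketch of SigLIP-style preprocessing in plain PyTorch, assuming a bilinear resize to 256x256 and mean/std normalization of 0.5 (verify against the base model's `preprocessor_config.json` before relying on these values):

```python
import torch
import torch.nn.functional as F

def preprocess(image_uint8: torch.Tensor) -> torch.Tensor:
    """Convert an (H, W, 3) uint8 image to a (1, 3, 256, 256) model input.

    Assumes SigLIP-style preprocessing: scale to [0, 1], bilinear resize,
    then normalize with mean=0.5, std=0.5 per channel.
    """
    x = image_uint8.permute(2, 0, 1).float() / 255.0  # (3, H, W) in [0, 1]
    x = F.interpolate(x.unsqueeze(0), size=(256, 256),
                      mode="bilinear", align_corners=False)
    return (x - 0.5) / 0.5  # normalized to [-1, 1]

img = torch.randint(0, 256, (480, 640, 3), dtype=torch.uint8)
inp = preprocess(img)
print(inp.shape)  # torch.Size([1, 3, 256, 256])
```

In production you would normally use the base model's processor from `transformers` instead; the sketch above just shows the expected tensor shape and value range.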
## Performance (inf2.xlarge)

| Config | Throughput | Latency (avg) |
|---|---|---|
| Single core | 70.12 img/s | 14.26 ms |
| DP=2 (both cores) | 117.76 img/s | 16.98 ms |
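These numbers are internally consistent: with one request in flight at batch size 1, latency is roughly the inverse of throughput, and with DP=2 two requests are in flight, so throughput rises while per-request latency grows slightly. A quick arithmetic check:

```python
# Single core, batch size 1, one request in flight:
throughput = 70.12             # img/s (from the table above)
latency_ms = 1000.0 / throughput
print(f"{latency_ms:.2f} ms")  # ~14.26 ms, matching the table

# DP=2: two cores, ~2 requests in flight at once
dp2_throughput = 117.76        # img/s
concurrency = 2
dp2_latency_ms = concurrency * 1000.0 / dp2_throughput
print(f"{dp2_latency_ms:.2f} ms")  # ~16.98 ms
```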
## Accuracy

Cosine similarity vs. an FP32 CPU reference (20 samples):

- Pooler output: 0.999924--0.999963
- Hidden states: 0.999943--0.999970
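A check like this compares the Neuron outputs against an FP32 CPU run of the base model. A minimal cosine-similarity helper in plain PyTorch (the BF16 round-trip below is just a stand-in for a real Neuron output):

```python
import torch
import torch.nn.functional as F

def cosine_sim(a: torch.Tensor, b: torch.Tensor) -> float:
    """Cosine similarity between two flattened output tensors."""
    return F.cosine_similarity(a.flatten().float().unsqueeze(0),
                               b.flatten().float().unsqueeze(0)).item()

# Illustration: a BF16 round-trip standing in for the compiled model's output
ref = torch.randn(1, 1536)          # e.g. an FP32 pooler_output
approx = ref.to(torch.bfloat16).float()
sim = cosine_sim(ref, approx)
print(f"{sim:.6f}")  # very close to 1.0
```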
## Compile It Yourself

The pre-compiled model works on any inf2 or trn2 instance. To recompile (requires inf2.8xlarge or larger due to host RAM):
```python
import torch
import torch_neuronx
from transformers import SiglipVisionModel

model = SiglipVisionModel.from_pretrained("google/siglip2-giant-opt-patch16-256")
model.eval()

model_neuron = torch_neuronx.trace(
    model,
    torch.randn(1, 3, 256, 256),
    compiler_workdir="./compile_workdir",
    compiler_args=[
        "--auto-cast", "matmult",
        "--optlevel", "3",
        "--model-type", "unet-inference",
    ],
)

torch.jit.save(model_neuron, "siglip2_giant_256_neuron.pt")
```
Compilation takes ~15 minutes on inf2.8xlarge. The resulting `.pt` file is portable across all inf2 and trn2 instance sizes.

**inf2.xlarge users:** compilation OOMs on inf2.xlarge (only 4 vCPUs / ~16 GB host RAM). Use this pre-compiled model or compile on a larger instance.
## SDK Compatibility

This model was compiled with Neuron SDK 2.28 and should work with SDK 2.28 or later. If you encounter loading errors on a different SDK version, recompile using the instructions above.
## License

Apache 2.0 (same as the base model).