---
license: other
license_name: sapiens2-license
license_link: https://github.com/facebookresearch/sapiens2/blob/main/LICENSE.md
pipeline_tag: image-feature-extraction
library_name: transformers
base_model: facebook/sapiens2-pretrain-0.1b
tags:
- sapiens
- sapiens2
- vision-transformer
- human-centric
- feature-extraction
- onnx
- onnxruntime-web
---
# Sapiens2-0.1B — ONNX Export
ONNX export of [facebook/sapiens2-pretrain-0.1b](https://huggingface.co/facebook/sapiens2-pretrain-0.1b), a vision transformer pretrained on **1 billion human images**, packaged for browser inference via `onnxruntime-web`.
| File | Size | Use |
|---|---|---|
| `sapiens2_0.1b_int8.onnx` | 116 MB | Browser (recommended) |
| `sapiens2_0.1b_fp32.onnx` | 458 MB | Server-side / higher precision |
| `example_embeddings.js` | — | Drop-in browser ES module |
**Output:** a float32 tensor of shape `(batch, 768)`: one 768-dimensional CLS-token embedding per image.
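
If you run several images in one batch, the `(batch, 768)` output arrives as a single flat `Float32Array` plus a `dims` array. Here is a minimal sketch for slicing it into per-image vectors. The helper name `splitBatch` is hypothetical and not part of this repo, and it assumes only that the output behaves like an `ort.Tensor` with `data` and `dims` fields.

```javascript
// Hypothetical helper: split a flat [batch, dim] tensor into one vector per image.
// `tensor` is anything shaped like an ort.Tensor: { data: Float32Array, dims: [batch, dim] }.
function splitBatch(tensor) {
  const [batch, dim] = tensor.dims;
  if (tensor.data.length !== batch * dim) {
    throw new Error("tensor data length does not match dims");
  }
  const rows = [];
  for (let i = 0; i < batch; i++) {
    // subarray() returns a view on the same buffer, not a copy.
    // Use slice() instead if you need the vectors to outlive the tensor.
    rows.push(tensor.data.subarray(i * dim, (i + 1) * dim));
  }
  return rows;
}
```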
---
## What are embeddings?
The model encodes an image into a 768-dimensional vector that captures human-centric semantics — pose, body shape, clothing, and identity. Two images with similar people in similar poses will have embeddings close together in this space. Common uses:
- **Similarity search** — find the most similar person/pose in a collection
- **Clustering** — group images by pose, clothing, or activity
- **Classification** — train a lightweight head on top of frozen embeddings
- **Retrieval** — image → nearest-neighbor lookup in a vector database
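
The similarity search and retrieval uses above both come down to comparing embedding vectors. Here is a minimal sketch that uses cosine similarity with a brute-force scan. The function names `cosineSimilarity` and `topK` are hypothetical, and cosine similarity is an assumption: it is a common choice for embeddings, but the model card does not prescribe a metric. For large collections a vector database would replace the linear scan.

```javascript
// Cosine similarity between two embeddings of equal length (e.g. two 768-d vectors).
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("embedding lengths differ");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  // Treat an all-zero vector as having no similarity to anything.
  return denom === 0 ? 0 : dot / denom;
}

// Return the k gallery entries most similar to the query, best match first.
// `gallery` is a plain array of embeddings; `index` refers to positions in it.
function topK(query, gallery, k = 5) {
  return gallery
    .map((embedding, index) => ({ index, score: cosineSimilarity(query, embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```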
---
## Browser quick start
```bash
npm install onnxruntime-web
```
```js
import * as ort from "onnxruntime-web";
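// Point the WASM binaries at the CDN build. This URL is unpinned, so it serves
// the latest release; pin it to the onnxruntime-web version you installed if the
// JS and WASM builds get out of sync.
ort.env.wasm.wasmPaths = "https://cdn.jsdelivr.net/npm/onnxruntime-web/dist/";
const MODEL_URL =
  "https://huggingface.co/barakplasma/sapiens2-onnx/resolve/main/sapiens2_0.1b_int8.onnx";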
const H = 1024, W = 768;
const MEAN = [0.485, 0.456, 0.406];
const STD = [0.229, 0.224, 0.225];
// Load once; reuse for all images. ~1-2 s cold start.
export async function loadModel() {
  return ort.InferenceSession.create(MODEL_URL, {
    // Listed in order of preference: WebGPU ~1-3 s/img, WASM ~20-60 s/img
    executionProviders: ["webgpu", "wasm"],
    graphOptimizationLevel: "all",
  });
}
// Accepts any ,