---
license: other
license_name: sapiens2-license
license_link: https://github.com/facebookresearch/sapiens2/blob/main/LICENSE.md
pipeline_tag: image-feature-extraction
library_name: transformers
base_model: facebook/sapiens2-pretrain-0.1b
tags:
- sapiens
- sapiens2
- vision-transformer
- human-centric
- feature-extraction
- onnx
- onnxruntime-web
---

# Sapiens2-0.1B: ONNX Export

ONNX export of [facebook/sapiens2-pretrain-0.1b](https://huggingface.co/facebook/sapiens2-pretrain-0.1b), a vision transformer pretrained on **1 billion human images**, packaged for browser inference via `onnxruntime-web`.

| File | Size | Use |
|---|---|---|
| `sapiens2_0.1b_int8.onnx` | 116 MB | Browser (recommended) |
| `sapiens2_0.1b_fp32.onnx` | 458 MB | Server-side / higher precision |
| `example_embeddings.js` | - | Drop-in browser ES module |

**Output:** a `(batch, 768)` float32 vector per image (CLS token).

---

## What are embeddings?

The model encodes an image into a 768-dimensional vector that captures human-centric semantics: pose, body shape, clothing, and identity. Two images with similar people in similar poses will have embeddings close together in this space.

Common uses:

- **Similarity search**: find the most similar person/pose in a collection
- **Clustering**: group images by pose, clothing, or activity
- **Classification**: train a lightweight head on top of frozen embeddings
- **Retrieval**: image → nearest-neighbor lookup in a vector database

---

## Browser quick start

```bash
npm install onnxruntime-web
```

```js
import * as ort from "onnxruntime-web";

// Point WASM binaries at the CDN build
ort.env.wasm.wasmPaths = "https://cdn.jsdelivr.net/npm/onnxruntime-web/dist/";

const MODEL_URL =
  "https://huggingface.co/barakplasma/sapiens2-onnx/resolve/main/sapiens2_0.1b_int8.onnx";
const H = 1024, W = 768;
const MEAN = [0.485, 0.456, 0.406];
const STD = [0.229, 0.224, 0.225];

// Load once; reuse for all images. ~1-2 s cold start.
export async function loadModel() {
  return ort.InferenceSession.create(MODEL_URL, {
    executionProviders: ["webgpu", "wasm"], // WebGPU ~1-3 s/img, WASM ~20-60 s/img
    graphOptimizationLevel: "all",
  });
}

// Accepts any ,