---
license: other
license_name: sapiens2-license
license_link: https://github.com/facebookresearch/sapiens2/blob/main/LICENSE.md
pipeline_tag: image-feature-extraction
library_name: sapiens
tags:
- sapiens
- sapiens2
- vision-transformer
- human-centric
- pretrained-backbone
- feature-extraction
---

# Sapiens2-1B

Sapiens2 is a family of high-resolution vision transformers pretrained on **1 billion human images**, designed for human-centric tasks such as pose estimation, body-part segmentation, surface normals, and pointmaps.

This repository contains the **1B-parameter pretrained backbone**. It produces dense per-patch features suitable for fine-tuning downstream task heads.

- **Paper:** [arXiv:2604.21681](https://arxiv.org/pdf/2604.21681)
- **Project Page:** [rawalkhirodkar.github.io/sapiens2](https://rawalkhirodkar.github.io/sapiens2)
- **Code:** [github.com/facebookresearch/sapiens2](https://github.com/facebookresearch/sapiens2)

## Model Details

- **Developed by:** Meta
- **Model type:** Vision Transformer
- **License:** [Sapiens2 License](https://github.com/facebookresearch/sapiens2/blob/main/LICENSE.md)
- **Task:** pretrain
- **Format:** safetensors
- **File:** `sapiens2_1b_pretrain.safetensors`

## Quick Start

Install the [Sapiens2 repo](https://github.com/facebookresearch/sapiens2) from source (`pip install -e .`).

```python
import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
from sapiens.backbones.standalone.sapiens2 import Sapiens2

# Build the model and load the pretrained checkpoint
model = Sapiens2(arch="sapiens2_1b", img_size=(1024, 768), patch_size=16).eval().cuda()  # img_size is (H, W)
ckpt_path = hf_hub_download(repo_id="facebook/sapiens2-pretrain-1b", filename="sapiens2_1b_pretrain.safetensors")
model.load_state_dict(load_file(ckpt_path))

# Forward pass on a single image (RGB; ImageNet normalization recommended)
x = torch.randn(1, 3, 1024, 768).cuda()
with torch.no_grad():
    features = model(x)[0]  # dense backbone features: (B, num_tokens, embed_dim)
```
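
The snippet above feeds random noise. For real images, a minimal preprocessing sketch is shown below, assuming the standard ImageNet mean/std (an assumption; check the Sapiens2 repo's data pipeline for the exact values it uses):

```python
import torch

# Standard ImageNet statistics (an assumption; verify against the repo's pipeline)
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

def preprocess(img: torch.Tensor) -> torch.Tensor:
    """Normalize a (3, H, W) float RGB image in [0, 1] and add a batch dim."""
    return ((img - IMAGENET_MEAN) / IMAGENET_STD).unsqueeze(0)

batch = preprocess(torch.rand(3, 1024, 768))
print(tuple(batch.shape))  # (1, 3, 1024, 768)
```

Resize or crop inputs to a multiple of the patch size (e.g. 1024 × 768) before normalizing.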

## Model Card

| Field | Value |
|-------|-------|
| Architecture | Sapiens2 ViT (RoPE, GQA, SwiGLU, RMSNorm, QK-norm) |
| Parameters | 1.462 B |
| FLOPs | 4.715 T |
| Embedding dim | 1536 |
| Layers | 40 |
| Attention heads | 24 |
| Pretraining resolution | 1024 × 768 (H × W) |
| Patch size | 16 |
| Pretraining data | 1B human images |

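As a sanity check on the numbers above: with patch size 16 at the 1024 × 768 pretraining resolution, the backbone sees (1024/16) × (768/16) = 64 × 48 = 3072 patch tokens, so the dense features from the Quick Start have shape (B, 3072, 1536), assuming one token per patch and no extra class/register tokens:

```python
def feature_shape(batch: int, h: int, w: int, patch: int = 16, embed_dim: int = 1536) -> tuple:
    """Expected dense-feature shape, assuming one token per patch and no extra tokens."""
    assert h % patch == 0 and w % patch == 0, "input must be a multiple of the patch size"
    return (batch, (h // patch) * (w // patch), embed_dim)

print(feature_shape(1, 1024, 768))  # (1, 3072, 1536)
```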

### Sapiens2 Family

| Model | Params | FLOPs | Embed dim | Layers | Heads |
|-------|--------|-------|-----------|--------|-------|
| [Sapiens2-0.1B](https://huggingface.co/facebook/sapiens2-pretrain-0.1b) | 0.114 B | 0.342 T | 768 | 12 | 12 |
| [Sapiens2-0.4B](https://huggingface.co/facebook/sapiens2-pretrain-0.4b) | 0.398 B | 1.260 T | 1024 | 24 | 16 |
| [Sapiens2-0.8B](https://huggingface.co/facebook/sapiens2-pretrain-0.8b) | 0.818 B | 2.592 T | 1280 | 32 | 16 |
| **Sapiens2-1B** *(this model)* | 1.462 B | 4.715 T | 1536 | 40 | 24 |
| [Sapiens2-1B-4K](https://huggingface.co/facebook/sapiens2-pretrain-1b-4k) | 1.607 B | – | 1536 | 40 | 24 |
| [Sapiens2-5B](https://huggingface.co/facebook/sapiens2-pretrain-5b) | 5.071 B | 15.722 T | 2432 | 56 | 32 |

See the [Sapiens2 Collection](https://huggingface.co/collections/facebook/sapiens2) for all variants and downstream task checkpoints (pose, segmentation, normals, pointmaps).

## Intended Use

- Feature extraction for human-centric downstream tasks
- Initialization for fine-tuning task heads (pose, segmentation, normals, pointmaps)
- Research on human-centric vision

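For the second use case, a minimal linear-probe sketch over frozen backbone features is shown below. The shapes come from the model card (3072 patch tokens of dim 1536 at 1024 × 768 input); the head and class count are illustrative stand-ins, not part of the Sapiens2 API:

```python
import torch
import torch.nn as nn

# Hypothetical per-patch linear head over frozen backbone features
embed_dim, num_classes = 1536, 28  # num_classes is illustrative
head = nn.Linear(embed_dim, num_classes)

features = torch.randn(2, 3072, embed_dim)  # stand-in for frozen backbone output
logits = head(features)                     # nn.Linear acts on the last dim
print(tuple(logits.shape))  # (2, 3072, 28)
```

In practice you would replace the random `features` with the backbone output from the Quick Start and train only the head (or fine-tune end to end).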

## License

Released under the [Sapiens2 License](https://github.com/facebookresearch/sapiens2/blob/main/LICENSE.md).

## Citation

```bibtex
@article{khirodkarsapiens2,
  title={Sapiens2},
  author={Khirodkar, Rawal and Wen, He and Martinez, Julieta and Dong, Yuan and Su, Zhaoen and Saito, Shunsuke},
  journal={arXiv preprint arXiv:2604.21681},
  year={2026}
}
```