---
license: other
license_name: sapiens2-license
license_link: https://github.com/facebookresearch/sapiens2/blob/main/LICENSE.md
pipeline_tag: image-feature-extraction
library_name: sapiens
tags:
- sapiens
- sapiens2
- vision-transformer
- human-centric
- pretrained-backbone
- feature-extraction
---
# Sapiens2-5B
Sapiens2 is a family of high-resolution vision transformers pretrained on **1 billion human images**, designed for human-centric tasks such as pose estimation, body-part segmentation, surface normals, and pointmaps.
This repository contains the **5B parameter pretrained backbone**. It produces dense per-patch features suitable for fine-tuning downstream task heads.
- 📄 **Paper:** [arXiv:2604.21681](https://arxiv.org/pdf/2604.21681)
- 🌐 **Project Page:** [rawalkhirodkar.github.io/sapiens2](https://rawalkhirodkar.github.io/sapiens2)
- 💻 **Code:** [github.com/facebookresearch/sapiens2](https://github.com/facebookresearch/sapiens2)
## Model Details
- **Developed by:** Meta
- **Model type:** Vision Transformer
- **License:** [Sapiens2 License](https://github.com/facebookresearch/sapiens2/blob/main/LICENSE.md)
- **Task:** pretrain
- **Format:** safetensors
- **File:** `sapiens2_5b_pretrain.safetensors`
## Quick Start
Clone the [Sapiens2 repo](https://github.com/facebookresearch/sapiens2) and install it in editable mode (`pip install -e .` from the repo root).
```python
import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
from sapiens.backbones.standalone.sapiens2 import Sapiens2
# Build the model and load the pretrained checkpoint
model = Sapiens2(arch="sapiens2_5b", img_size=(1024, 768), patch_size=16).eval().cuda() # img_size is (H, W)
ckpt_path = hf_hub_download(repo_id="facebook/sapiens2-pretrain-5b", filename="sapiens2_5b_pretrain.safetensors")
model.load_state_dict(load_file(ckpt_path))
# Forward pass on a single image (RGB; ImageNet normalization recommended)
x = torch.randn(1, 3, 1024, 768).cuda()
with torch.no_grad():
    features = model(x)[0]  # dense backbone features: (B, num_tokens, embed_dim)
```
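The snippet above uses a random tensor; for a real image, the comment recommends ImageNet normalization. A minimal preprocessing sketch, assuming the standard ImageNet mean/std statistics typical for ViT backbones (verify against the repo's transform config before use):

```python
import torch

# Assumption: standard ImageNet normalization statistics; confirm
# against the Sapiens2 repo's preprocessing configuration.
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

def preprocess(img: torch.Tensor) -> torch.Tensor:
    """Normalize an RGB image tensor in [0, 1], shape (3, H, W), and add a batch dim."""
    return ((img - IMAGENET_MEAN) / IMAGENET_STD).unsqueeze(0)

batch = preprocess(torch.rand(3, 1024, 768))  # shape (1, 3, 1024, 768)
```

The resulting `batch` can be passed to the model in place of the `torch.randn` placeholder above (after moving it to the same device with `.cuda()`).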
## Model Card
| Field | Value |
|-------|-------|
| Architecture | Sapiens2 ViT (RoPE, GQA, SwiGLU, RMSNorm, QK-norm) |
| Parameters | 5.071 B |
| FLOPs | 15.722 T |
| Embedding dim | 2432 |
| Layers | 56 |
| Attention heads | 32 |
| Pretraining resolution | 1024 × 768 (H × W) |
| Patch size | 16 |
| Pretraining data | 1B human images |
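As a sanity check, the token count of the dense features follows directly from the table: a 1024 × 768 input with 16 × 16 patches yields a 64 × 48 patch grid. Any additional class or register tokens are not counted here and should be verified against the repo:

```python
# Token-count arithmetic for the 5B backbone at pretraining resolution,
# using the values from the model card table above.
H, W, patch_size = 1024, 768, 16
num_tokens = (H // patch_size) * (W // patch_size)
embed_dim = 2432

# Expected dense feature shape per image: (num_tokens, embed_dim)
print(num_tokens, embed_dim)  # 3072 2432
```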
### Sapiens2 Family
| Model | Params | FLOPs | Embed dim | Layers | Heads |
|-------|--------|-------|-----------|--------|-------|
| [Sapiens2-0.1B](https://huggingface.co/facebook/sapiens2-pretrain-0.1b) | 0.114 B | 0.342 T | 768 | 12 | 12 |
| [Sapiens2-0.4B](https://huggingface.co/facebook/sapiens2-pretrain-0.4b) | 0.398 B | 1.260 T | 1024 | 24 | 16 |
| [Sapiens2-0.8B](https://huggingface.co/facebook/sapiens2-pretrain-0.8b) | 0.818 B | 2.592 T | 1280 | 32 | 16 |
| [Sapiens2-1B](https://huggingface.co/facebook/sapiens2-pretrain-1b) | 1.462 B | 4.715 T | 1536 | 40 | 24 |
| [Sapiens2-1B-4K](https://huggingface.co/facebook/sapiens2-pretrain-1b-4k) | 1.607 B | – | 1536 | 40 | 24 |
| **Sapiens2-5B** *(this repo)* | 5.071 B | 15.722 T | 2432 | 56 | 32 |
See the [Sapiens2 Collection](https://huggingface.co/collections/facebook/sapiens2) for all variants and downstream task checkpoints (pose, segmentation, normals, pointmaps).
## Intended Use
- Feature extraction for human-centric downstream tasks
- Initialization for fine-tuning task heads (pose, segmentation, normals, pointmap)
- Research on human-centric vision
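For the fine-tuning use case, a minimal sketch of a task head attached to the backbone's dense features. The linear probe, mean-pooling, and class count are illustrative assumptions, not part of the Sapiens2 release; real task heads (pose, segmentation, etc.) are provided in the repo:

```python
import torch
import torch.nn as nn

class LinearProbe(nn.Module):
    """Hypothetical linear-probe head over frozen backbone features.

    embed_dim=2432 matches the 5B model card; num_classes is illustrative.
    """

    def __init__(self, embed_dim: int = 2432, num_classes: int = 10):
        super().__init__()
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, num_tokens, embed_dim) -> mean-pool tokens, then classify
        return self.head(feats.mean(dim=1))

probe = LinearProbe()
logits = probe(torch.randn(2, 3072, 2432))  # shape (2, 10)
```

In practice, `feats` would be the output of the pretrained backbone from the Quick Start section, with the backbone kept frozen or fine-tuned end-to-end depending on the task.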
## License
Released under the [Sapiens2 License](https://github.com/facebookresearch/sapiens2/blob/main/LICENSE.md).
## Citation
```bibtex
@article{khirodkarsapiens2,
title={Sapiens2},
author={Khirodkar, Rawal and Wen, He and Martinez, Julieta and Dong, Yuan and Su, Zhaoen and Saito, Shunsuke},
journal={arXiv preprint arXiv:2604.21681},
year={2026}
}
```