license: other
license_name: sapiens2-license
license_link: https://github.com/facebookresearch/sapiens2/blob/main/LICENSE.md
library_name: sapiens
tags:
  - sapiens
  - sapiens2
  - human-centric
  - vision-transformer

Sapiens2

Sapiens2 is a family of high-resolution vision transformers pretrained on 1 billion human images and designed for human-centric tasks such as pose estimation, body-part segmentation, surface normals, and pointmaps.

This is the index repository: each variant lives in its own model repo (linked below).

Pretrained Backbones

| Model | Params | Repository |
|---|---|---|
| Sapiens2-0.1B | 0.114 B | facebook/sapiens2-pretrain-0.1b |
| Sapiens2-0.4B | 0.398 B | facebook/sapiens2-pretrain-0.4b |
| Sapiens2-0.8B | 0.818 B | facebook/sapiens2-pretrain-0.8b |
| Sapiens2-1B | 1.462 B | facebook/sapiens2-pretrain-1b |
| Sapiens2-1B (4K) | 1.607 B | facebook/sapiens2-pretrain-1b-4k |
| Sapiens2-5B | 5.071 B | facebook/sapiens2-pretrain-5b |
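
Since each backbone lives in its own repository, fetching one comes down to mapping a model name to its repo ID. The snippet below is a minimal sketch of that lookup, not an official loader: the helper names `backbone_repo` and `download_backbone` are hypothetical, and the download step assumes the `huggingface_hub` package is installed and that your account has access to the repository. Loading the downloaded weights into a model is handled by the `sapiens` library and is not shown here.

```python
# Repository IDs and parameter counts (in billions), copied from the table above.
PRETRAINED_BACKBONES = {
    "Sapiens2-0.1B": ("facebook/sapiens2-pretrain-0.1b", 0.114),
    "Sapiens2-0.4B": ("facebook/sapiens2-pretrain-0.4b", 0.398),
    "Sapiens2-0.8B": ("facebook/sapiens2-pretrain-0.8b", 0.818),
    "Sapiens2-1B": ("facebook/sapiens2-pretrain-1b", 1.462),
    "Sapiens2-1B (4K)": ("facebook/sapiens2-pretrain-1b-4k", 1.607),
    "Sapiens2-5B": ("facebook/sapiens2-pretrain-5b", 5.071),
}


def backbone_repo(name: str) -> str:
    """Return the Hub repository ID for a backbone name from the table."""
    try:
        return PRETRAINED_BACKBONES[name][0]
    except KeyError:
        known = ", ".join(PRETRAINED_BACKBONES)
        raise ValueError(f"unknown backbone {name!r}; choose from: {known}") from None


def download_backbone(name: str) -> str:
    """Download all files of the backbone's repo and return the local directory.

    Assumption: huggingface_hub is installed and the repo is accessible.
    """
    # Imported lazily so the lookup helper works without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=backbone_repo(name))


if __name__ == "__main__":
    # Only resolves the repo ID; call download_backbone(...) to fetch files.
    print(backbone_repo("Sapiens2-0.4B"))
```

Keeping the download in a separate function means the name lookup can be checked offline, and the network call happens only when it is explicitly requested.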

Task Checkpoints

Pose Estimation

| Model | Repository |
|---|---|
| Sapiens2-0.4B | facebook/sapiens2-pose-0.4b |
| Sapiens2-0.8B | facebook/sapiens2-pose-0.8b |
| Sapiens2-1B | facebook/sapiens2-pose-1b |
| Sapiens2-5B | facebook/sapiens2-pose-5b |

Body-Part Segmentation

| Model | Repository |
|---|---|
| Sapiens2-0.4B | facebook/sapiens2-seg-0.4b |
| Sapiens2-0.8B | facebook/sapiens2-seg-0.8b |
| Sapiens2-1B | facebook/sapiens2-seg-1b |
| Sapiens2-5B | facebook/sapiens2-seg-5b |

Surface Normal Estimation

Pointmap Estimation

License

Released under the Sapiens2 License: https://github.com/facebookresearch/sapiens2/blob/main/LICENSE.md

Citation

@article{khirodkarsapiens2,
  title={Sapiens2},
  author={Khirodkar, Rawal and Wen, He and Martinez, Julieta and Dong, Yuan and Su, Zhaoen and Saito, Shunsuke},
  journal={arXiv preprint arXiv:2604.21681},
  year={2026}
}