---
license: other
license_name: sapiens2-license
license_link: https://github.com/facebookresearch/sapiens2/blob/main/LICENSE.md
pipeline_tag: keypoint-detection
library_name: sapiens
base_model: facebook/sapiens2-pretrain-0.4b
tags:
- sapiens
- sapiens2
- human-centric
- pose
---

# Sapiens2-0.4B-Pose

Top-down 308-keypoint pose estimation, including detailed face (274 keypoints), hand, and foot keypoints. Predictions follow the [Sociopticon keypoint format](https://github.com/facebookresearch/sapiens2/blob/main/sapiens/pose/configs/_base_/keypoints308.py).

This repository contains the **0.4B Pose Estimation** checkpoint, finetuned from the [Sapiens2-0.4B pretrained backbone](https://huggingface.co/facebook/sapiens2-pretrain-0.4b).

Pose estimation is top-down: it requires bounding boxes from a person detector. We use [RTMDet](https://github.com/open-mmlab/mmdetection/tree/main/configs/rtmdet).

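Because the model is top-down, each detected person box must be turned into a crop matching the 1024 × 768 (H × W) input before inference. The sketch below shows one common approach in top-down pipelines. It expands the box to the model's 3:4 (W:H) aspect ratio, pads it, and maps crop coordinates back to the image. The `CropBox`, `box_to_crop`, and `crop_to_image` names and the 1.25 padding factor are hypothetical, not taken from the Sapiens2 code. The pixel resampling itself (for example, an affine warp) is omitted.

```python
from dataclasses import dataclass

# Model input size from the Model Card: 1024 x 768 (H x W).
INPUT_H, INPUT_W = 1024, 768


@dataclass
class CropBox:
    """Crop region in image coordinates (hypothetical helper type)."""
    cx: float  # center x
    cy: float  # center y
    w: float   # width
    h: float   # height


def box_to_crop(x1, y1, x2, y2, padding=1.25):
    """Expand a detector box (x1, y1, x2, y2) to the model's input aspect ratio.

    One side is grown (never shrunk) so that w / h == INPUT_W / INPUT_H,
    then both sides are scaled by `padding` (1.25 is an assumed value).
    """
    aspect = INPUT_W / INPUT_H  # 0.75
    w, h = x2 - x1, y2 - y1
    if w > aspect * h:
        h = w / aspect  # box too wide: grow the height
    else:
        w = h * aspect  # box too tall: grow the width
    return CropBox((x1 + x2) / 2, (y1 + y2) / 2, w * padding, h * padding)


def crop_to_image(crop, u, v):
    """Map a point (u, v) in input-crop pixels back to image coordinates."""
    x = crop.cx + (u / INPUT_W - 0.5) * crop.w
    y = crop.cy + (v / INPUT_H - 0.5) * crop.h
    return x, y


# Example: a tall 200 x 400 box is widened to 300 x 400, then padded.
crop = box_to_crop(100, 50, 300, 450)
print(crop)  # CropBox(cx=200.0, cy=250.0, w=375.0, h=500.0)
```

Growing the shorter side, rather than stretching the box, keeps the person's proportions intact when the crop is resized to the fixed input shape.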
- **Paper:** [arXiv:2604.21681](https://arxiv.org/pdf/2604.21681)
- **Project Page:** [rawalkhirodkar.github.io/sapiens2](https://rawalkhirodkar.github.io/sapiens2)
- **Code:** [github.com/facebookresearch/sapiens2](https://github.com/facebookresearch/sapiens2)

## Model Details

- **Developed by:** Meta
- **Model type:** Vision Transformer
- **License:** [Sapiens2 License](https://github.com/facebookresearch/sapiens2/blob/main/LICENSE.md)
- **Task:** Pose estimation (308 keypoints)
- **Base model:** [facebook/sapiens2-pretrain-0.4b](https://huggingface.co/facebook/sapiens2-pretrain-0.4b)
- **Format:** safetensors
- **File:** `sapiens2_0.4b_pose.safetensors`

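The checkpoint is a single safetensors file, and its header can be inspected without loading any weights. In the safetensors format, a file starts with an 8-byte little-endian header length. That is followed by a JSON header that maps each tensor name to its dtype, shape, and byte offsets. The standard-library sketch below reads that header and sums element counts. The function names are hypothetical. The total covers every tensor stored in the file (backbone and head), so it need not equal the backbone-only figure in the Model Card table.

```python
import json
import math
import struct


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file as a dict (no weights loaded)."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned 64-bit little-endian header length.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # Optional free-form metadata lives under this reserved key; drop it.
    header.pop("__metadata__", None)
    return header


def count_parameters(path):
    """Sum the number of elements over all tensors listed in the header."""
    header = read_safetensors_header(path)
    # math.prod([]) == 1, which is correct for scalar (0-d) tensors.
    return sum(math.prod(info["shape"]) for info in header.values())
```

For example, `count_parameters("sapiens2_0.4b_pose.safetensors")` returns the total element count of the downloaded checkpoint.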
## Quick Start

Install the [Sapiens2 repo](https://github.com/facebookresearch/sapiens2) (`pip install -e .` from the repository root), download the checkpoint, and run the demo:

```bash
# 1. Download the checkpoint to $SAPIENS_CHECKPOINT_ROOT/pose/
#    (~/sapiens2_host is an example location; adjust as needed)
export SAPIENS_CHECKPOINT_ROOT=~/sapiens2_host
hf download facebook/sapiens2-pose-0.4b sapiens2_0.4b_pose.safetensors \
  --local-dir "$SAPIENS_CHECKPOINT_ROOT/pose"

# 2. Run the demo (edit INPUT, OUTPUT, and MODEL_NAME inside the script)
#    $SAPIENS_ROOT is the path to your clone of the sapiens2 repo
cd "$SAPIENS_ROOT/sapiens/pose"
./scripts/demo/keypoints308.sh
```

See the [Pose Estimation guide](https://github.com/facebookresearch/sapiens2/blob/main/docs/POSE.md) for details on inputs, outputs, and visualization options.

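Top-down pose heads commonly predict one heatmap per keypoint, and each map is decoded by taking its peak. The sketch below illustrates that decoding for an assumed NumPy array of shape `(308, h, w)` covering the input crop. The heatmap layout, the plain argmax decoding, and the `decode_heatmaps` name are illustrative assumptions. The Sapiens2 code may produce or decode its outputs differently, and the Pose Estimation guide describes the actual output format.

```python
import numpy as np

NUM_KEYPOINTS = 308  # keypoints in the 308-keypoint format


def decode_heatmaps(heatmaps, input_hw=(1024, 768)):
    """Decode (K, h, w) heatmaps into (K, 2) xy coordinates and (K,) scores.

    Coordinates are rescaled from heatmap pixels to input-crop pixels,
    assuming the heatmap spans the full input crop (an assumption).
    """
    k, h, w = heatmaps.shape
    flat = heatmaps.reshape(k, -1)
    idx = flat.argmax(axis=1)    # index of the peak for each keypoint
    scores = flat.max(axis=1)    # peak value used as a confidence score
    ys, xs = np.unravel_index(idx, (h, w))
    scale = np.array([input_hw[1] / w, input_hw[0] / h])  # (x, y) scale factors
    coords = np.stack([xs, ys], axis=1).astype(np.float64) * scale
    return coords, scores
```

The resulting crop-pixel coordinates still have to be mapped back through the person box to obtain image coordinates.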
## Model Card

| Field | Value |
|-------|-------|
| Architecture | Sapiens2 ViT backbone + pose estimation head |
| Backbone parameters | 0.398 B |
| Backbone FLOPs | 1.260 T |
| Embedding dim | 1024 |
| Layers | 24 |
| Attention heads | 16 |
| Inference resolution | 1024 × 768 (H × W) |
| Patch size | 16 |

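The inference resolution and patch size in the table determine how many patch tokens the backbone attends over. This count covers patch tokens only; any extra class or register tokens the backbone may add are not included.

```python
# Values taken from the Model Card table.
INPUT_H, INPUT_W = 1024, 768  # inference resolution (H x W)
PATCH_SIZE = 16

grid_h, grid_w = INPUT_H // PATCH_SIZE, INPUT_W // PATCH_SIZE
num_patches = grid_h * grid_w

print(f"{grid_h} x {grid_w} patch grid = {num_patches} patch tokens")
# prints: 64 x 48 patch grid = 3072 patch tokens
```

Self-attention cost grows with the square of the token count, so raising the input resolution increases compute much faster than the resolution itself.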
### Sapiens2-Pose Family

| Model | Backbone params | Backbone FLOPs | Embed dim | Layers | Heads |
|-------|-----------------|----------------|-----------|--------|-------|
| **Sapiens2-0.4B** *(this)* | 0.398 B | 1.260 T | 1024 | 24 | 16 |
| [Sapiens2-0.8B](https://huggingface.co/facebook/sapiens2-pose-0.8b) | 0.818 B | 2.592 T | 1280 | 32 | 16 |
| [Sapiens2-1B](https://huggingface.co/facebook/sapiens2-pose-1b) | 1.462 B | 4.715 T | 1536 | 40 | 24 |
| [Sapiens2-5B](https://huggingface.co/facebook/sapiens2-pose-5b) | 5.071 B | 15.722 T | 2432 | 56 | 32 |

See the [Sapiens2 Collection](https://huggingface.co/collections/facebook/sapiens2) for all variants and other downstream task checkpoints.

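To compare variants at a glance, the FLOPs in the Sapiens2-Pose Family table can be expressed relative to this 0.4B checkpoint. Each model's per-head attention dimension is its embed dim divided by its number of heads. The numbers below are copied from the table, and the `FAMILY` dictionary is just an illustrative container.

```python
# (backbone params in B, backbone FLOPs in T, embed dim, heads), from the table.
FAMILY = {
    "0.4B": (0.398, 1.260, 1024, 16),
    "0.8B": (0.818, 2.592, 1280, 16),
    "1B": (1.462, 4.715, 1536, 24),
    "5B": (5.071, 15.722, 2432, 32),
}

base_flops = FAMILY["0.4B"][1]
for name, (params, flops, dim, heads) in FAMILY.items():
    rel = flops / base_flops  # compute cost relative to the 0.4B model
    print(f"{name:>4}: {params:.3f} B params, "
          f"{rel:5.2f}x the 0.4B FLOPs, head dim {dim / heads:g}")
```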
## Intended Use

- Pose estimation on human-centric imagery
- Research on human-centric vision

## License

Released under the [Sapiens2 License](https://github.com/facebookresearch/sapiens2/blob/main/LICENSE.md).

## Citation

```bibtex
@article{khirodkarsapiens2,
  title={Sapiens2},
  author={Khirodkar, Rawal and Wen, He and Martinez, Julieta and Dong, Yuan and Su, Zhaoen and Saito, Shunsuke},
  journal={arXiv preprint arXiv:2604.21681},
  year={2026}
}
```