---
license: other
license_name: sapiens2-license
license_link: https://github.com/facebookresearch/sapiens2/blob/main/LICENSE.md
pipeline_tag: keypoint-detection
library_name: sapiens
base_model: facebook/sapiens2-pretrain-0.4b
tags:
- sapiens
- sapiens2
- human-centric
- pose
---
# Sapiens2-0.4B-Pose
This model performs top-down pose estimation with 308 keypoints, including detailed face (274 keypoints), hand, and foot keypoints. Predictions follow the [Sociopticon keypoint format](https://github.com/facebookresearch/sapiens2/blob/main/sapiens/pose/configs/_base_/keypoints308.py).
This repository contains the **0.4B Pose Estimation** checkpoint, finetuned from the [Sapiens2-0.4B pretrained backbone](https://huggingface.co/facebook/sapiens2-pretrain-0.4b).
Pose estimation is top-down: it requires person bounding boxes from a detector. We use [RTMDet](https://github.com/open-mmlab/mmdetection/tree/main/configs/rtmdet).
- 📄 **Paper:** [arXiv:2604.21681](https://arxiv.org/pdf/2604.21681)
- 🌐 **Project Page:** [rawalkhirodkar.github.io/sapiens2](https://rawalkhirodkar.github.io/sapiens2)
- 💻 **Code:** [github.com/facebookresearch/sapiens2](https://github.com/facebookresearch/sapiens2)
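Because the model is top-down, each detected person box has to be turned into a crop that matches the network input shape before inference. The snippet below is a minimal sketch of that generic step, not the preprocessing used in the Sapiens2 repo: the function name `expand_box_to_aspect` and the `padding` factor of 1.25 are assumptions chosen for illustration, and only the 768:1024 (W:H) aspect ratio comes from this card.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels

# Width / height of the inference resolution listed on this card (1024 x 768, H x W).
MODEL_ASPECT = 768 / 1024


def expand_box_to_aspect(box: Box, aspect: float = MODEL_ASPECT, padding: float = 1.25) -> Box:
    """Grow a detector box around its center until it has the given W/H aspect.

    `padding` is a hypothetical margin factor; set it to 1.0 to disable.
    The result may extend past the image border, so the caller must pad or clamp.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    w, h = x2 - x1, y2 - y1

    # Only ever enlarge one side, so the person is never cropped.
    if w > aspect * h:
        h = w / aspect
    else:
        w = h * aspect

    w, h = w * padding, h * padding
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```

The expanded box would then be cropped from the image and resized to 768 × 1024 (W × H) before being passed to the model.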
## Model Details
- **Developed by:** Meta
- **Model type:** Vision Transformer
- **License:** [Sapiens2 License](https://github.com/facebookresearch/sapiens2/blob/main/LICENSE.md)
- **Task:** Pose estimation
- **Base model:** [facebook/sapiens2-pretrain-0.4b](https://huggingface.co/facebook/sapiens2-pretrain-0.4b)
- **Format:** safetensors
- **File:** `sapiens2_0.4b_pose.safetensors`
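Since the checkpoint ships as safetensors, its tensor names and shapes can be inspected without loading any weights. A safetensors file starts with an 8-byte little-endian header length followed by a JSON header. The sketch below reads that header with the standard library only; the helper names `read_safetensors_header` and `count_parameters` are illustrative and are not part of the Sapiens2 code.

```python
import json
import struct
from math import prod
from typing import Any, Dict


def read_safetensors_header(path: str) -> Dict[str, Any]:
    """Return the tensor table of a safetensors file without reading tensor data."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned 64-bit little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return header


def count_parameters(header: Dict[str, Any]) -> int:
    """Sum the element counts of all tensors listed in the header."""
    return sum(prod(entry["shape"]) for entry in header.values())
```

Calling `count_parameters(read_safetensors_header("sapiens2_0.4b_pose.safetensors"))` gives the total element count of the file. That total presumably covers the pose head as well as the backbone, so it need not equal the backbone figure quoted on this card.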
## Quick Start
Install the [Sapiens2 repo](https://github.com/facebookresearch/sapiens2) (`pip install -e .`), download the checkpoint, and run the demo:
```bash
# 1. Download the checkpoint to $SAPIENS_CHECKPOINT_ROOT/pose/
#    (this example assumes SAPIENS_CHECKPOINT_ROOT=~/sapiens2_host)
hf download facebook/sapiens2-pose-0.4b sapiens2_0.4b_pose.safetensors \
    --local-dir ~/sapiens2_host/pose

# 2. Run the demo (edit INPUT, OUTPUT, and MODEL_NAME inside the script)
cd "$SAPIENS_ROOT/sapiens/pose"
./scripts/demo/keypoints308.sh
```
```
See the [Pose Estimation guide](https://github.com/facebookresearch/sapiens2/blob/main/docs/POSE.md) for details on inputs, outputs, and visualization options.
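The download step can also be done from Python. The sketch below uses `hf_hub_download` from the `huggingface_hub` package with the repository ID and filename given on this card. The default root `~/sapiens2_host` and the helper names `checkpoint_dir` and `download_checkpoint` are assumptions for illustration; point `root` at whatever your `SAPIENS_CHECKPOINT_ROOT` is.

```python
from pathlib import Path

REPO_ID = "facebook/sapiens2-pose-0.4b"
FILENAME = "sapiens2_0.4b_pose.safetensors"


def checkpoint_dir(root: str) -> Path:
    """Directory that holds pose checkpoints under the given checkpoint root."""
    return Path(root).expanduser() / "pose"


def download_checkpoint(root: str = "~/sapiens2_host") -> Path:
    """Fetch the pose checkpoint into <root>/pose and return the local file path."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    target = checkpoint_dir(root)
    target.mkdir(parents=True, exist_ok=True)
    local_path = hf_hub_download(repo_id=REPO_ID, filename=FILENAME, local_dir=target)
    return Path(local_path)
```

Calling `download_checkpoint()` should leave the file at `~/sapiens2_host/pose/sapiens2_0.4b_pose.safetensors`, the same location the shell command above writes to.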
## Model Card
| Field | Value |
|-------|-------|
| Architecture | Sapiens2 ViT backbone + pose estimation head |
| Backbone parameters | 0.398 B |
| Backbone FLOPs | 1.260 T |
| Embedding dim | 1024 |
| Layers | 24 |
| Attention heads | 16 |
| Inference resolution | 1024 × 768 (H × W) |
| Patch size | 16 |
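The resolution, patch size, and head count in the table imply a patch grid and a per-head dimension. The arithmetic below is a small sketch of that derivation. It counts image patches only; whether the backbone adds extra tokens (for example a class token) is not stated on this card and is not assumed here. The helper name `patch_grid` is illustrative.

```python
from typing import Tuple


def patch_grid(height: int, width: int, patch: int) -> Tuple[int, int]:
    """Number of patch rows and columns for a non-overlapping square patch."""
    if height % patch or width % patch:
        raise ValueError("resolution must be divisible by the patch size")
    return height // patch, width // patch


rows, cols = patch_grid(height=1024, width=768, patch=16)
num_patches = rows * cols   # 64 * 48 = 3072 image patches
head_dim = 1024 // 16       # embedding dim / attention heads = 64

print(rows, cols, num_patches, head_dim)  # prints: 64 48 3072 64
```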
### Sapiens2-Pose Family
| Model | Params | FLOPs | Embed dim | Layers | Heads |
|-------|--------|-------|-----------|--------|-------|
| **Sapiens2-0.4B** *(this)* | 0.398 B | 1.260 T | 1024 | 24 | 16 |
| [Sapiens2-0.8B](https://huggingface.co/facebook/sapiens2-pose-0.8b) | 0.818 B | 2.592 T | 1280 | 32 | 16 |
| [Sapiens2-1B](https://huggingface.co/facebook/sapiens2-pose-1b) | 1.462 B | 4.715 T | 1536 | 40 | 24 |
| [Sapiens2-5B](https://huggingface.co/facebook/sapiens2-pose-5b) | 5.071 B | 15.722 T | 2432 | 56 | 32 |
See the [Sapiens2 Collection](https://huggingface.co/collections/facebook/sapiens2) for all variants and other downstream task checkpoints.
## Intended Use
- Pose estimation on human-centric imagery
- Research on human-centric vision
## License
Released under the [Sapiens2 License](https://github.com/facebookresearch/sapiens2/blob/main/LICENSE.md).
## Citation
```bibtex
@article{khirodkarsapiens2,
title={Sapiens2},
author={Khirodkar, Rawal and Wen, He and Martinez, Julieta and Dong, Yuan and Su, Zhaoen and Saito, Shunsuke},
journal={arXiv preprint arXiv:2604.21681},
year={2026}
}
```