---
license: other
license_name: sapiens2-license
license_link: https://github.com/facebookresearch/sapiens2/blob/main/LICENSE.md
pipeline_tag: keypoint-detection
library_name: sapiens
base_model: facebook/sapiens2-pretrain-1b
tags:
- sapiens
- sapiens2
- human-centric
- pose
---

# Sapiens2-1B-Pose

308-keypoint top-down pose estimation, including detailed face (274 keypoints), hand, and foot keypoints. Predictions follow the [Sociopticon keypoint format](https://github.com/facebookresearch/sapiens2/blob/main/sapiens/pose/configs/_base_/keypoints308.py).

This repository contains the **1B Pose Estimation** checkpoint, fine-tuned from the [Sapiens2-1B pretrained backbone](https://huggingface.co/facebook/sapiens2-pretrain-1b).

Pose estimation is top-down: it requires person bounding boxes from a detector. We use [RTMDet](https://github.com/open-mmlab/mmdetection/tree/main/configs/rtmdet).
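
In a top-down pipeline, each detected box is usually expanded to the model's input aspect ratio before the crop is resized to 1024 × 768 (H × W). As an illustration, this assumes the convention common in MMPose-style top-down pipelines; this card does not state whether Sapiens2 uses exactly this rule or what extra box padding it applies. Under that convention, a box of width $w$ and height $h$ keeps its center and only grows until $W/H = 768/1024 = 3/4$:

$$
(w', h') =
\begin{cases}
\left(w,\ \tfrac{4}{3}\,w\right) & \text{if } w/h > 3/4,\\[4pt]
\left(\tfrac{3}{4}\,h,\ h\right) & \text{otherwise.}
\end{cases}
$$

For example, a 300 × 500 (W × H) box has $w/h = 0.6 < 3/4$, so it becomes 375 × 500 before being resized to the model input.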

- 📄 **Paper:** [OpenReview (ICLR 2026)](https://openreview.net/pdf?id=IVAlYCqdvW)
- 🌐 **Project Page:** [rawalkhirodkar.github.io/sapiens2](https://rawalkhirodkar.github.io/sapiens2)
- 💻 **Code:** [github.com/facebookresearch/sapiens2](https://github.com/facebookresearch/sapiens2)

## Model Details

- **Developed by:** Meta
- **Model type:** Vision Transformer + pose estimation head
- **License:** [Sapiens2 License](https://github.com/facebookresearch/sapiens2/blob/main/LICENSE.md)
- **Task:** Pose estimation (keypoint detection)
- **Base model:** [facebook/sapiens2-pretrain-1b](https://huggingface.co/facebook/sapiens2-pretrain-1b)
- **Format:** safetensors
- **File:** `sapiens2_1b_pose.safetensors`

## Quick Start

Install the [Sapiens2 repo](https://github.com/facebookresearch/sapiens2) (`pip install -e .`), download the checkpoint, and run the demo:

```bash
# 0. Set the repo and checkpoint roots (adjust both paths to your setup)
export SAPIENS_ROOT=/path/to/sapiens2
export SAPIENS_CHECKPOINT_ROOT=~/sapiens2_host

# 1. Download the checkpoint to $SAPIENS_CHECKPOINT_ROOT/pose/
hf download facebook/sapiens2-pose-1b sapiens2_1b_pose.safetensors \
  --local-dir "$SAPIENS_CHECKPOINT_ROOT/pose"

# 2. Run the demo (edit INPUT, OUTPUT, and MODEL_NAME inside the script)
cd "$SAPIENS_ROOT/sapiens/pose"
./scripts/demo/keypoints308.sh
```
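
Before launching the demo, it can help to confirm that the checkpoint landed where you expect. The helper below is a hypothetical sketch and is not part of the Sapiens2 repo. It assumes the checkpoint root is `$SAPIENS_CHECKPOINT_ROOT`, falling back to `~/sapiens2_host` (the directory used in the download command), and it checks only that the file exists and is non-empty:

```bash
# Hypothetical helper (not part of the Sapiens2 repo): check that the 1B pose
# checkpoint exists and is non-empty under the checkpoint root.
check_pose_ckpt() {
  # Assumed layout: <root>/pose/sapiens2_1b_pose.safetensors
  local root="${SAPIENS_CHECKPOINT_ROOT:-$HOME/sapiens2_host}"
  local ckpt="$root/pose/sapiens2_1b_pose.safetensors"
  if [ -s "$ckpt" ]; then
    echo "ok: $ckpt"
  else
    echo "missing or empty: $ckpt" >&2
    return 1
  fi
}
```

Usage: `check_pose_ckpt && ./scripts/demo/keypoints308.sh`. This catches a missing or zero-byte file, but not a truncated download.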

See the [Pose Estimation guide](https://github.com/facebookresearch/sapiens2/blob/main/docs/POSE.md) for details on inputs, outputs, and visualization options.

## Model Card

| Field | Value |
|-------|-------|
| Architecture | Sapiens2 ViT backbone + Pose Estimation head |
| Backbone parameters | 1.462 B |
| Backbone FLOPs | 4.715 T |
| Embedding dim | 1536 |
| Layers | 40 |
| Attention heads | 24 |
| Inference resolution | 1024 × 768 (H × W) |
| Patch size | 16 |
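
As a quick check on these numbers, assume a standard non-overlapping 16 × 16 patch embedding and ignore any extra tokens, such as a class token, which this card does not mention. Under those assumptions, each 1024 × 768 person crop becomes

$$
\frac{1024}{16} \times \frac{768}{16} = 64 \times 48 = 3072 \text{ patch tokens},
$$

so the backbone operates on a 64 × 48 grid of 1536-dimensional tokens.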

### Sapiens2-Pose Family

| Model | Backbone params | Backbone FLOPs | Embed dim | Layers | Heads |
|-------|-----------------|----------------|-----------|--------|-------|
| [Sapiens2-0.4B](https://huggingface.co/facebook/sapiens2-pose-0.4b) | 0.398 B | 1.260 T | 1024 | 24 | 16 |
| [Sapiens2-0.8B](https://huggingface.co/facebook/sapiens2-pose-0.8b) | 0.818 B | 2.592 T | 1280 | 32 | 16 |
| **Sapiens2-1B** *(this model)* | 1.462 B | 4.715 T | 1536 | 40 | 24 |
| [Sapiens2-5B](https://huggingface.co/facebook/sapiens2-pose-5b) | 5.071 B | 15.722 T | 2432 | 56 | 32 |

See the [Sapiens2 Collection](https://huggingface.co/collections/facebook/sapiens2) for all variants and other downstream task checkpoints.

## Intended Use

- Pose estimation on human-centric imagery
- Research on human-centric vision

## License

Released under the [Sapiens2 License](https://github.com/facebookresearch/sapiens2/blob/main/LICENSE.md).

## Citation

```bibtex
@inproceedings{khirodkar2026sapiens2,
  title={Sapiens2},
  author={Khirodkar, Rawal and Wen, He and Martinez, Julieta and Dong, Yuan and Zhaoen, Su and Saito, Shunsuke},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2026}
}
```