	---
	license: other
	license_name: sapiens2-license
	license_link: https://github.com/facebookresearch/sapiens2/blob/main/LICENSE.md
	pipeline_tag: keypoint-detection
	library_name: sapiens
	base_model: facebook/sapiens2-pretrain-0.4b
	tags:
	- sapiens
	- sapiens2
	- human-centric
	- pose
	---

	# Sapiens2-0.4B-Pose

	This model performs top-down pose estimation with 308 keypoints, including detailed face (274 keypoints), hand, and foot keypoints. Predictions follow the [Sociopticon keypoint format](https://github.com/facebookresearch/sapiens2/blob/main/sapiens/pose/configs/_base_/keypoints308.py).

	This repository contains the 0.4B Pose Estimation checkpoint, finetuned from the [Sapiens2-0.4B pretrained backbone](https://huggingface.co/facebook/sapiens2-pretrain-0.4b).

	Pose estimation is top-down: the model requires person bounding boxes from a separate detector. We use [RTMDet](https://github.com/open-mmlab/mmdetection/tree/main/configs/rtmdet).
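
To make the top-down step more concrete, here is a minimal sketch of how a detector box could be turned into a crop for the pose model and how predicted keypoints could be mapped back to image coordinates. It assumes the box is grown around its center to the 3:4 (W:H) aspect ratio of the 1024 × 768 inference resolution. `expand_box_to_aspect`, `crop_to_image_coords`, and the padding factor are hypothetical illustrations, not the preprocessing used in the Sapiens2 repo.

```python
def expand_box_to_aspect(box, aspect_w_over_h=768 / 1024, padding=1.25):
    """Grow an (x1, y1, x2, y2) box around its center to a target W/H ratio.

    Hypothetical helper: the padding factor is an illustrative assumption.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    w, h = x2 - x1, y2 - y1
    # Enlarge the shorter side so the ratio matches without cutting the person.
    if w > h * aspect_w_over_h:
        h = w / aspect_w_over_h
    else:
        w = h * aspect_w_over_h
    w, h = w * padding, h * padding
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def crop_to_image_coords(kpt_xy, crop_box, crop_size=(768, 1024)):
    """Map a keypoint from crop pixels (crop_size is W, H) back to the image."""
    x1, y1, x2, y2 = crop_box
    crop_w, crop_h = crop_size
    x, y = kpt_xy
    return (x1 + x * (x2 - x1) / crop_w, y1 + y * (y2 - y1) / crop_h)
```

For example, `expand_box_to_aspect((100, 100, 200, 400), padding=1.0)` widens the box to `(37.5, 100.0, 262.5, 400.0)`, and the center of the resulting 768 × 1024 crop maps back to `(150.0, 250.0)` in the original image.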

	- 📄 Paper: [arXiv:2604.21681](https://arxiv.org/pdf/2604.21681)
	- 🌐 Project Page: [rawalkhirodkar.github.io/sapiens2](https://rawalkhirodkar.github.io/sapiens2)
	- 💻 Code: [github.com/facebookresearch/sapiens2](https://github.com/facebookresearch/sapiens2)

	## Model Details

	- Developed by: Meta
	- Model type: Vision Transformer
	- License: [Sapiens2 License](https://github.com/facebookresearch/sapiens2/blob/main/LICENSE.md)
	- Task: pose estimation (308 keypoints)
	- Base model: [facebook/sapiens2-pretrain-0.4b](https://huggingface.co/facebook/sapiens2-pretrain-0.4b)
	- Format: safetensors
	- File: `sapiens2_0.4b_pose.safetensors`

	## Quick Start

	Install the [Sapiens2 repo](https://github.com/facebookresearch/sapiens2) (`pip install -e .`), download the checkpoint, and run the demo:

	```bash
	# 1. Download the checkpoint to $SAPIENS_CHECKPOINT_ROOT/pose/
	#    (this example assumes SAPIENS_CHECKPOINT_ROOT=~/sapiens2_host)
	hf download facebook/sapiens2-pose-0.4b sapiens2_0.4b_pose.safetensors \
	    --local-dir ~/sapiens2_host/pose

	# 2. Run the demo (edit INPUT, OUTPUT, and MODEL_NAME inside the script)
	cd "$SAPIENS_ROOT/sapiens/pose"
	./scripts/demo/keypoints308.sh
	```

	See the [Pose Estimation guide](https://github.com/facebookresearch/sapiens2/blob/main/docs/POSE.md) for details on inputs, outputs, and visualization options.
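
Before running the demo, it can be useful to confirm that the download is intact. The sketch below reads only the header of the `.safetensors` file, using the format's layout (an 8-byte little-endian length followed by a JSON header), and counts the parameters it lists. It uses the Python standard library only; the helper names are hypothetical and not part of the Sapiens2 repo, and the path assumes the download directory from the command above.

```python
import json
import os
import struct
from math import prod


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without loading tensor data."""
    with open(path, "rb") as f:
        # The file starts with an unsigned 64-bit little-endian header length.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional string map, not a tensor entry.
    header.pop("__metadata__", None)
    return header


def count_parameters(header):
    """Sum the element counts of all tensors listed in the header."""
    return sum(prod(entry["shape"]) for entry in header.values())


if __name__ == "__main__":
    # Assumed location; adjust if you downloaded to another directory.
    ckpt = os.path.expanduser("~/sapiens2_host/pose/sapiens2_0.4b_pose.safetensors")
    if os.path.exists(ckpt):
        header = read_safetensors_header(ckpt)
        print(f"{len(header)} tensors, {count_parameters(header) / 1e9:.3f} B parameters")
```

The total covers every tensor in the file, including the pose head, so it need not match the backbone-only parameter count reported in this card.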

	## Model Card

	| Field | Value |
	|-------|-------|
	| Architecture | Sapiens2 ViT backbone + Pose Estimation head |
	| Backbone parameters | 0.398 B |
	| Backbone FLOPs | 1.260 T |
	| Embedding dim | 1024 |
	| Layers | 24 |
	| Attention heads | 16 |
	| Inference resolution | 1024 × 768 (H × W) |
	| Patch size | 16 |
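
As a worked example of what the resolution and patch size in the table imply, the sketch below computes the patch grid for a 1024 × 768 input. It assumes non-overlapping square patches and ignores any extra special tokens, neither of which the table states; `patch_grid` is a hypothetical helper.

```python
def patch_grid(height, width, patch_size):
    """Return (rows, cols, tokens) for non-overlapping square patches."""
    if height % patch_size or width % patch_size:
        raise ValueError("resolution must be divisible by the patch size")
    rows, cols = height // patch_size, width // patch_size
    return rows, cols, rows * cols


rows, cols, tokens = patch_grid(1024, 768, 16)
print(rows, cols, tokens)  # 64 48 3072
```

Under these assumptions the backbone processes a 64 × 48 grid, that is 3072 patch tokens per person crop.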

	### Sapiens2-Pose Family

	| Model | Params | FLOPs | Embed dim | Layers | Heads |
	|-------|--------|-------|-----------|--------|-------|
	| Sapiens2-0.4B (this) | 0.398 B | 1.260 T | 1024 | 24 | 16 |
	| [Sapiens2-0.8B](https://huggingface.co/facebook/sapiens2-pose-0.8b) | 0.818 B | 2.592 T | 1280 | 32 | 16 |
	| [Sapiens2-1B](https://huggingface.co/facebook/sapiens2-pose-1b) | 1.462 B | 4.715 T | 1536 | 40 | 24 |
	| [Sapiens2-5B](https://huggingface.co/facebook/sapiens2-pose-5b) | 5.071 B | 15.722 T | 2432 | 56 | 32 |
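
To help compare the variants, the following sketch derives two quantities from the table: the per-head dimension and the compute cost relative to the 0.4B model. It assumes standard multi-head attention, where the per-head dimension is the embedding dimension divided by the number of heads; the card does not state this explicitly. The numbers are copied from the table and `summarize` is a hypothetical helper.

```python
FAMILY = {
    # name: (params in B, FLOPs in T, embed dim, layers, heads), from the table
    "Sapiens2-0.4B": (0.398, 1.260, 1024, 24, 16),
    "Sapiens2-0.8B": (0.818, 2.592, 1280, 32, 16),
    "Sapiens2-1B": (1.462, 4.715, 1536, 40, 24),
    "Sapiens2-5B": (5.071, 15.722, 2432, 56, 32),
}


def summarize(family, baseline="Sapiens2-0.4B"):
    """Return per-head dimension and FLOPs relative to the baseline model."""
    base_flops = family[baseline][1]
    summary = {}
    for name, (_params, flops, dim, _layers, heads) in family.items():
        summary[name] = {
            "head_dim": dim // heads,  # assumes embed dim is split evenly across heads
            "flops_vs_baseline": round(flops / base_flops, 2),
        }
    return summary


for name, row in summarize(FAMILY).items():
    print(f"{name}: head_dim={row['head_dim']}, FLOPs x{row['flops_vs_baseline']}")
```

Under that assumption the per-head dimensions are 64, 80, 64, and 76, and the 5B model costs roughly 12.5 times the FLOPs of the 0.4B model.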

	See the [Sapiens2 Collection](https://huggingface.co/collections/facebook/sapiens2) for all variants and other downstream task checkpoints.

	## Intended Use

	- Pose Estimation on human-centric imagery
	- Research on human-centric vision

	## License

	Released under the [Sapiens2 License](https://github.com/facebookresearch/sapiens2/blob/main/LICENSE.md).

	## Citation

	```bibtex
	@article{khirodkarsapiens2,
	  title={Sapiens2},
	  author={Khirodkar, Rawal and Wen, He and Martinez, Julieta and Dong, Yuan and Su, Zhaoen and Saito, Shunsuke},
	  journal={arXiv preprint arXiv:2604.21681},
	  year={2026}
	}
	```