---
license: other
license_name: sapiens2-license
license_link: https://github.com/facebookresearch/sapiens2/blob/main/LICENSE.md
pipeline_tag: depth-estimation
library_name: sapiens
base_model: facebook/sapiens2-pretrain-1b
tags:
- sapiens
- sapiens2
- human-centric
- normal
---

# Sapiens2-1B-Surface

Per-pixel surface-normal estimation (3-channel unit vectors in camera frame).

This repository contains the **1B Surface Normal Estimation** checkpoint, finetuned from the [Sapiens2-1B pretrained backbone](https://huggingface.co/facebook/sapiens2-pretrain-1b).

- 📄 **Paper:** [arXiv:2604.21681](https://arxiv.org/pdf/2604.21681)
- 🌐 **Project Page:** [rawalkhirodkar.github.io/sapiens2](https://rawalkhirodkar.github.io/sapiens2)
- 💻 **Code:** [github.com/facebookresearch/sapiens2](https://github.com/facebookresearch/sapiens2)

## Model Details

- **Developed by:** Meta
- **Model type:** Vision Transformer
- **License:** [Sapiens2 License](https://github.com/facebookresearch/sapiens2/blob/main/LICENSE.md)
- **Task:** normal
- **Base model:** [facebook/sapiens2-pretrain-1b](https://huggingface.co/facebook/sapiens2-pretrain-1b)
- **Format:** safetensors
- **File:** `sapiens2_1b_normal.safetensors`

## Quick Start

Install the [Sapiens2 repo](https://github.com/facebookresearch/sapiens2) (`pip install -e .`), download the checkpoint, and run the demo:

```bash
# 1. Download the checkpoint to $SAPIENS_CHECKPOINT_ROOT/normal/
hf download facebook/sapiens2-normal-1b sapiens2_1b_normal.safetensors \
    --local-dir ~/sapiens2_host/normal

# 2. Run the demo (edit INPUT, OUTPUT, and MODEL_NAME inside the script)
cd $SAPIENS_ROOT/sapiens/dense
./scripts/demo/normal.sh
```

See the [Surface Normal Estimation guide](https://github.com/facebookresearch/sapiens2/blob/main/docs/NORMAL.md) for details on inputs, outputs, and visualization options.

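Because the model predicts a 3-channel unit vector per pixel, a common way to inspect such output is to map each component from [-1, 1] to an 8-bit color channel. The sketch below illustrates that general convention for a single pixel. It is not taken from the Sapiens2 demo script: the function names `normalize` and `normal_to_rgb` are hypothetical, and the axis-to-color assignment is an assumption. In practice the same arithmetic would usually be vectorized over the whole image with an array library.

```python
import math


def normalize(n, eps=1e-8):
    """Rescale a 3-vector to unit length; eps guards against division by zero."""
    length = math.sqrt(sum(c * c for c in n))
    scale = 1.0 / max(length, eps)
    return tuple(c * scale for c in n)


def normal_to_rgb(n):
    """Map a normal with components in [-1, 1] to 8-bit RGB in [0, 255].

    Assumed convention (not confirmed by the Sapiens2 docs): x -> R, y -> G, z -> B.
    """
    return tuple(round((c + 1.0) * 0.5 * 255.0) for c in normalize(n))


# Hypothetical stand-in for one row of predicted normals.
row = [(0.0, 0.0, 1.0), (0.0, 0.0, -2.0)]
rgb_row = [normal_to_rgb(n) for n in row]
print(rgb_row)
```

Re-normalizing before the color mapping keeps the visualization stable even if a prediction drifts slightly from unit length, for example after resizing the output map.
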
## Model Card

| Field | Value |
|-------|-------|
| Architecture | Sapiens2 ViT backbone + Surface Normal Estimation head |
| Backbone parameters | 1.462 B |
| Backbone FLOPs | 4.715 T |
| Embedding dim | 1536 |
| Layers | 40 |
| Attention heads | 24 |
| Inference resolution | 1024 × 768 (H × W) |
| Patch size | 16 |

### Sapiens2-Surface Family

| Model | Params | FLOPs | Embed dim | Layers | Heads |
|-------|--------|-------|-----------|--------|-------|
| [Sapiens2-0.4B](https://huggingface.co/facebook/sapiens2-normal-0.4b) | 0.398 B | 1.260 T | 1024 | 24 | 16 |
| [Sapiens2-0.8B](https://huggingface.co/facebook/sapiens2-normal-0.8b) | 0.818 B | 2.592 T | 1280 | 32 | 16 |
| **Sapiens2-1B** *(this)* | 1.462 B | 4.715 T | 1536 | 40 | 24 |
| [Sapiens2-1B-4K](https://huggingface.co/facebook/sapiens2-pretrain-1b-4k) | 1.607 B | — | 1536 | 40 | 24 |
| [Sapiens2-5B](https://huggingface.co/facebook/sapiens2-normal-5b) | 5.071 B | 15.722 T | 2432 | 56 | 32 |

See the [Sapiens2 Collection](https://huggingface.co/collections/facebook/sapiens2) for all variants and other downstream task checkpoints.

## Intended Use

- Surface Normal Estimation on human-centric imagery
- Research on human-centric vision

## License

Released under the [Sapiens2 License](https://github.com/facebookresearch/sapiens2/blob/main/LICENSE.md).

## Citation

```bibtex
@article{khirodkarsapiens2,
  title={Sapiens2},
  author={Khirodkar, Rawal and Wen, He and Martinez, Julieta and Dong, Yuan and Su, Zhaoen and Saito, Shunsuke},
  journal={arXiv preprint arXiv:2604.21681},
  year={2026}
}
```