---
license: other
license_name: sapiens2-license
license_link: https://github.com/facebookresearch/sapiens2/blob/main/LICENSE.md
pipeline_tag: depth-estimation
library_name: sapiens
base_model: facebook/sapiens2-pretrain-5b
tags:
  - sapiens
  - sapiens2
  - human-centric
  - normal
---

# Sapiens2-5B-Surface

Sapiens2-5B-Surface predicts a surface normal for every pixel: a 3-channel unit vector expressed in the camera frame.

This repository contains the **5B Surface Normal Estimation** checkpoint, fine-tuned from the [Sapiens2-5B pretrained backbone](https://huggingface.co/facebook/sapiens2-pretrain-5b).

- 📄 **Paper:** [arXiv:2604.21681](https://arxiv.org/pdf/2604.21681)
- 🌐 **Project Page:** [rawalkhirodkar.github.io/sapiens2](https://rawalkhirodkar.github.io/sapiens2)
- 💻 **Code:** [github.com/facebookresearch/sapiens2](https://github.com/facebookresearch/sapiens2)

## Model Details

- **Developed by:** Meta
- **Model type:** Vision Transformer
- **License:** [Sapiens2 License](https://github.com/facebookresearch/sapiens2/blob/main/LICENSE.md)
- **Task:** Surface normal estimation (`normal`)
- **Base model:** [facebook/sapiens2-pretrain-5b](https://huggingface.co/facebook/sapiens2-pretrain-5b)
- **Format:** safetensors
- **File:** `sapiens2_5b_normal.safetensors`

## Quick Start

Install the [Sapiens2 repo](https://github.com/facebookresearch/sapiens2) (`pip install -e .`), download the checkpoint, and run the demo:

```bash
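# 1. Download the checkpoint into the normal/ subfolder of your checkpoint root.
#    This example uses ~/sapiens2_host; if $SAPIENS_CHECKPOINT_ROOT points elsewhere,
#    set --local-dir to $SAPIENS_CHECKPOINT_ROOT/normal instead.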
hf download facebook/sapiens2-normal-5b sapiens2_5b_normal.safetensors \
    --local-dir ~/sapiens2_host/normal

# 2. Run the demo (edit INPUT, OUTPUT, and MODEL_NAME inside the script)
cd "$SAPIENS_ROOT/sapiens/dense"
./scripts/demo/normal.sh
```

See the [Surface Normal Estimation guide](https://github.com/facebookresearch/sapiens2/blob/main/docs/NORMAL.md) for details on inputs, outputs, and visualization options.
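
To make the output format concrete, here is a minimal sketch of handling a per-pixel normal map as an `(H, W, 3)` NumPy array. It is not the repository's own post-processing: the helper names `normalize_normals` and `normals_to_rgb` are hypothetical, the random array only stands in for a real prediction, and the `[-1, 1]` to `[0, 255]` color mapping is one common visualization convention that the demo script may or may not share.

```python
import numpy as np


def normalize_normals(normals: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Rescale an (H, W, 3) array so each pixel's vector has unit length."""
    norm = np.linalg.norm(normals, axis=-1, keepdims=True)
    return normals / np.maximum(norm, eps)


def normals_to_rgb(normals: np.ndarray) -> np.ndarray:
    """Map unit normals with components in [-1, 1] to uint8 RGB in [0, 255]."""
    unit = normalize_normals(normals)
    return np.round((unit + 1.0) * 0.5 * 255.0).astype(np.uint8)


# Hypothetical stand-in for a prediction at the card's 1024 x 768 (H x W) resolution.
rng = np.random.default_rng(0)
pred = rng.normal(size=(1024, 768, 3)).astype(np.float32)

rgb = normals_to_rgb(pred)
print(rgb.shape, rgb.dtype)
```

Re-normalizing before visualization is a defensive step: it keeps the color mapping within range even if a raw prediction is only approximately unit length.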

## Model Card

| Field | Value |
|-------|-------|
| Architecture | Sapiens2 ViT backbone + Surface Normal Estimation head |
| Backbone parameters | 5.071 B |
| Backbone FLOPs | 15.722 T |
| Embedding dim | 2432 |
| Layers | 56 |
| Attention heads | 32 |
| Inference resolution | 1024 × 768 (H × W) |
| Patch size | 16 |

### Sapiens2-Surface Family

| Model | Params | FLOPs | Embed dim | Layers | Heads |
|-------|--------|-------|-----------|--------|-------|
| [Sapiens2-0.4B](https://huggingface.co/facebook/sapiens2-normal-0.4b) | 0.398 B | 1.260 T | 1024 | 24 | 16 |
| [Sapiens2-0.8B](https://huggingface.co/facebook/sapiens2-normal-0.8b) | 0.818 B | 2.592 T | 1280 | 32 | 16 |
| [Sapiens2-1B](https://huggingface.co/facebook/sapiens2-normal-1b) | 1.462 B | 4.715 T | 1536 | 40 | 24 |
| **Sapiens2-5B** *(this)* | 5.071 B | 15.722 T | 2432 | 56 | 32 |

See the [Sapiens2 Collection](https://huggingface.co/collections/facebook/sapiens2) for all variants and other downstream task checkpoints.
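
When choosing a variant, it can help to compare backbone size and compute relative to the smallest model. The following sketch uses only the parameter and FLOP figures from the family table; the `family` dictionary and the choice of Sapiens2-0.4B as the baseline are illustrative.

```python
# (backbone params in billions, backbone FLOPs in tera-FLOPs), copied from the family table.
family = {
    "Sapiens2-0.4B": (0.398, 1.260),
    "Sapiens2-0.8B": (0.818, 2.592),
    "Sapiens2-1B": (1.462, 4.715),
    "Sapiens2-5B": (5.071, 15.722),
}

base_params, base_flops = family["Sapiens2-0.4B"]

# Ratio of each variant's size and compute to the 0.4B baseline.
ratios = {
    name: (params / base_params, flops / base_flops)
    for name, (params, flops) in family.items()
}

for name, (param_ratio, flop_ratio) in ratios.items():
    print(f"{name}: {param_ratio:.1f}x params, {flop_ratio:.1f}x FLOPs")
```

By these figures, the 5B backbone has between 12 and 13 times the parameters and FLOPs of the 0.4B backbone.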

## Intended Use

- Surface Normal Estimation on human-centric imagery
- Research on human-centric vision

## License

Released under the [Sapiens2 License](https://github.com/facebookresearch/sapiens2/blob/main/LICENSE.md).

## Citation

```bibtex
@article{khirodkarsapiens2,
  title={Sapiens2},
  author={Khirodkar, Rawal and Wen, He and Martinez, Julieta and Dong, Yuan and Su, Zhaoen and Saito, Shunsuke},
  journal={arXiv preprint arXiv:2604.21681},
  year={2026}
}
```