README: verified 15/15 times, document 5B chain OOM
README.md
CHANGED
@@ -18,12 +18,16 @@ Meta's `facebook/sapiens2-*` running on free HF CPU. 15 variants exposed: seg, n

| Task | Notes | 0.4b | 0.8b | 1b | 5b (INT8 ONNX) |
|---|---|---|---|---|---|
- | seg | DOME 29-class body parts |
- | normal | per-pixel surface normals |
- | pointmap | per-pixel XYZ in meters |
- | pose | DETR detect, 308 keypoints |
+ | seg | DOME 29-class body parts | 57 s | 74 s | 208 s | 189 s |
+ | normal | per-pixel surface normals | 72 s | 84 s | 206 s | 359 s |
+ | pointmap | per-pixel XYZ in meters | 78 s | 99 s | 274 s | 386 s |
+ | pose | DETR detect, 308 keypoints | 47 s | 68 s | 232 s | not shipped |

- […]
+ Verified 15/15 via Gradio API on 2026-05-12. Times include first-call downloads.
+
+ 0.4b through 1b run as fp32 PyTorch. 5B runs as INT8 ONNX (5 to 6 GB on disk; fp32 5B would need ~20 GB RAM, more than the free tier provides). Dense 0.4b/0.8b share an LRU(2) cache. Loading any 1B variant hard-clears all model caches (dense + pose + ORT) since 16 GB cpu-basic cannot fit two 1B-class models simultaneously. Pose has its own slot and DETR (`facebook/detr-resnet-50`) is sticky-loaded once.
+
+ **5B chain limitation:** calling a 5B variant right after another 5B variant on the same Space instance OOMs. ONNX Runtime's C++ session shutdown is not synchronous with the Python `_ORT_SESSIONS.clear()` call, so loading the next 5B session before the previous one's worker threads exit peaks RAM above 16 GB. If you need to benchmark multiple 5B variants, factory-restart the Space (Settings → Factory restart) between calls, or run one variant per cold Space.

The model fixes a 1024×768 input tensor (NCHW with H=1024, W=768, a portrait canvas in Meta's convention). Any input is aspect-preserve resized then padded to that.
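
The resize-then-pad step in the line above can be illustrated with a small geometry helper. This is only a sketch: the README does not say where the padding goes or how fractional sizes are rounded, so centered padding and `round()` are assumptions here, and `letterbox_geometry` and `Letterbox` are hypothetical names that do not come from the Space's code.

```python
from typing import NamedTuple

# Fixed canvas from the README: H=1024, W=768 (portrait).
TARGET_H, TARGET_W = 1024, 768


class Letterbox(NamedTuple):
    new_w: int       # width of the image after aspect-preserving resize
    new_h: int       # height of the image after aspect-preserving resize
    pad_left: int
    pad_top: int
    pad_right: int
    pad_bottom: int


def letterbox_geometry(src_w: int, src_h: int,
                       target_w: int = TARGET_W,
                       target_h: int = TARGET_H) -> Letterbox:
    """Compute the resized size and padding needed to fill the target canvas."""
    if src_w <= 0 or src_h <= 0:
        raise ValueError("source dimensions must be positive")
    # One scale factor for both axes keeps the aspect ratio intact.
    scale = min(target_w / src_w, target_h / src_h)
    new_w = min(target_w, max(1, round(src_w * scale)))
    new_h = min(target_h, max(1, round(src_h * scale)))
    pad_w, pad_h = target_w - new_w, target_h - new_h
    # Assumption: padding is split evenly, with any odd pixel on the right/bottom.
    pad_left, pad_top = pad_w // 2, pad_h // 2
    return Letterbox(new_w, new_h, pad_left, pad_top,
                     pad_w - pad_left, pad_h - pad_top)
```

For a 1920×1080 landscape frame the limiting axis is the width, so the image shrinks to 768×432 and the remaining 592 rows are padding. The same offsets are what you would subtract to map predictions back onto the original image.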
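
The caching rules described in the added README text (an LRU(2) cache for the smaller dense models, plus a hard clear before any 1B load) might look roughly like the sketch below. It is not the Space's implementation: `ModelCache`, `loader`, and `big_sizes` are hypothetical names, the pose slot, DETR, and ORT sessions are left out, and the README does not say what happens when a small model is loaded while a 1B model is resident, so ordinary LRU eviction is assumed for that case.

```python
import gc
from collections import OrderedDict
from typing import Any, Callable, FrozenSet, List, Tuple

Key = Tuple[str, str]  # (task, size), e.g. ("seg", "0.4b")


class ModelCache:
    """LRU cache with a hard-clear rule for large models (illustrative only)."""

    def __init__(self, loader: Callable[[str, str], Any], capacity: int = 2,
                 big_sizes: FrozenSet[str] = frozenset({"1b"})) -> None:
        self._loader = loader          # hypothetical: builds a model for (task, size)
        self._capacity = capacity
        self._big_sizes = big_sizes
        self._models: "OrderedDict[Key, Any]" = OrderedDict()

    def get(self, task: str, size: str) -> Any:
        key = (task, size)
        if key in self._models:
            self._models.move_to_end(key)  # mark as most recently used
            return self._models[key]
        if size in self._big_sizes:
            # Hard clear: drop everything before loading a 1B-class model.
            self._models.clear()
            gc.collect()
        while len(self._models) >= self._capacity:
            self._models.popitem(last=False)  # evict least recently used
        model = self._loader(task, size)
        self._models[key] = model
        return model

    def resident(self) -> List[Key]:
        """Keys currently cached, least recently used first."""
        return list(self._models)
```

Note that clearing a Python dict only releases the Python references. As the 5B note above explains, native resources such as ONNX Runtime sessions may be freed later, which is why a sketch like this would not by itself avoid the chained-5B OOM.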