Space: WeReCooking/sapiens2-cpu
Nekochu committed 12 days ago

Commit cb4d7d1 (1 parent: 86547e5)

README: verified 15/15 times, document 5B chain OOM

Files changed (1): README.md

@@ -18,12 +18,16 @@ Meta's `facebook/sapiens2-*` running on free HF CPU. 15 variants exposed: seg, n
 | Task | Notes | 0.4b | 0.8b | 1b | 5b (INT8 ONNX) |
 |---|---|---|---|---|---|
-| seg | DOME 29-class body parts | 54 s | 79 s | 142 s | downloads once, then runs |
-| normal | per-pixel surface normals | 50 s | 121 s | 150 s | downloads once, then runs |
-| pointmap | per-pixel XYZ in meters | 76 s | 145 s | 173 s | downloads once, then runs |
-| pose | DETR detect, 308 keypoints | 72 s | 103 s | 174 s | not shipped |
-0.4b through 1b run as fp32 PyTorch. 5B runs as INT8 ONNX (5 to 6 GB on disk; fp32 5B would need ~20 GB RAM, more than the free tier provides). Dense tasks share an LRU(2) cache; pose has its own slot and DETR is loaded once. First call per variant downloads then loads (~30-90 s warmup).
+| seg | DOME 29-class body parts | 57 s | 74 s | 208 s | 189 s |
+| normal | per-pixel surface normals | 72 s | 84 s | 206 s | 359 s |
+| pointmap | per-pixel XYZ in meters | 78 s | 99 s | 274 s | 386 s |
+| pose | DETR detect, 308 keypoints | 47 s | 68 s | 232 s | not shipped |
+Verified 15/15 via Gradio API on 2026-05-12. Times include first-call downloads.
+0.4b through 1b run as fp32 PyTorch. 5B runs as INT8 ONNX (5 to 6 GB on disk; fp32 5B would need ~20 GB RAM, more than the free tier provides). Dense 0.4b/0.8b share an LRU(2) cache. Loading any 1B variant hard-clears all model caches (dense + pose + ORT) since 16 GB cpu-basic cannot fit two 1B-class models simultaneously. Pose has its own slot and DETR (`facebook/detr-resnet-50`) is sticky-loaded once.
+**5B chain limitation:** calling a 5B variant right after another 5B variant on the same Space instance OOMs. ONNX Runtime's C++ session shutdown is not synchronous with the Python `_ORT_SESSIONS.clear()` call, so loading the next 5B session before the previous one's worker threads exit peaks RAM above 16 GB. If you need to benchmark multiple 5B variants, factory-restart the Space (Settings → Factory restart) between calls, or run one variant per cold Space.
 The model fixes a 1024×768 input tensor (NCHW with H=1024, W=768, a portrait canvas in Meta's convention). Any input is aspect-preserve resized then padded to that.
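
The "aspect-preserve resized then padded" step in the context line above can be illustrated with a little arithmetic. This is only a sketch: the function name `letterbox_geometry` is hypothetical, and centering the padding is an assumption, since the README does not say where the padding goes.

```python
def letterbox_geometry(src_h, src_w, dst_h=1024, dst_w=768):
    """Compute resize and padding amounts for an aspect-preserving fit.

    dst_h/dst_w default to the 1024x768 (H x W) canvas named in the README.
    Centered padding is an assumption made for this sketch.
    """
    # The smaller ratio guarantees the resized image fits inside the canvas.
    scale = min(dst_h / src_h, dst_w / src_w)
    new_h = min(dst_h, max(1, round(src_h * scale)))
    new_w = min(dst_w, max(1, round(src_w * scale)))

    # Whatever space is left over becomes padding, split between both sides.
    pad_h = dst_h - new_h
    pad_w = dst_w - new_w
    top, left = pad_h // 2, pad_w // 2
    return {
        "scale": scale,
        "resized": (new_h, new_w),
        "pad": (top, pad_h - top, left, pad_w - left),  # top, bottom, left, right
    }


if __name__ == "__main__":
    # A 1920x1080 landscape frame (H=1080, W=1920) on the portrait canvas.
    print(letterbox_geometry(1080, 1920))
```

A landscape input is limited by the 768-pixel width, so most of the portrait canvas ends up as padding. Portrait inputs therefore use the model's resolution more efficiently.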
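
The cache policy described in the added README paragraph (an LRU(2) cache for dense 0.4b/0.8b models, with a hard clear whenever a 1B variant loads) might look roughly like the following. This is a minimal sketch, not the Space's actual code: `ModelCache`, `loader`, and `big_sizes` are hypothetical names, and the pose slot and ONNX Runtime sessions are left out.

```python
from collections import OrderedDict


class ModelCache:
    """Toy LRU cache that is emptied before any 'big' model is loaded."""

    def __init__(self, loader, capacity=2, big_sizes=("1b",)):
        self._loader = loader          # hypothetical callable: (task, size) -> model
        self._capacity = capacity      # LRU(2) per the README
        self._big = set(big_sizes)     # sizes that trigger a hard clear
        self._models = OrderedDict()   # oldest entry first, newest last

    def get(self, task, size):
        key = (task, size)
        if key in self._models:
            # Cache hit: mark as most recently used.
            self._models.move_to_end(key)
            return self._models[key]
        if size in self._big:
            # Free everything first so two large models never coexist.
            self._models.clear()
        model = self._loader(task, size)
        self._models[key] = model
        # Evict least recently used entries beyond capacity.
        while len(self._models) > self._capacity:
            self._models.popitem(last=False)
        return model

    def keys(self):
        return list(self._models)


if __name__ == "__main__":
    cache = ModelCache(loader=lambda task, size: f"{task}-{size}")
    cache.get("seg", "0.4b")
    cache.get("normal", "0.8b")
    cache.get("seg", "1b")   # hard clear, then load
    print(cache.keys())
```

The sketch clears before loading rather than after, so peak memory never includes both the old entries and the new 1B model. The 5B limitation in the README is a different problem: there the release itself is asynchronous, so clearing a Python dict does not guarantee the memory is already free.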