Nekochu committed on
Commit cb4d7d1 · 1 Parent(s): 86547e5

README: verified 15/15 times, document 5B chain OOM

  1. README.md +9 -5
README.md CHANGED
@@ -18,12 +18,16 @@ Meta's `facebook/sapiens2-*` running on free HF CPU. 15 variants exposed: seg, n
 
  | Task | Notes | 0.4b | 0.8b | 1b | 5b (INT8 ONNX) |
  |---|---|---|---|---|---|
- | seg | DOME 29-class body parts | 54 s | 79 s | 142 s | downloads once, then runs |
- | normal | per-pixel surface normals | 50 s | 121 s | 150 s | downloads once, then runs |
- | pointmap | per-pixel XYZ in meters | 76 s | 145 s | 173 s | downloads once, then runs |
- | pose | DETR detect, 308 keypoints | 72 s | 103 s | 174 s | not shipped |
+ | seg | DOME 29-class body parts | 57 s | 74 s | 208 s | 189 s |
+ | normal | per-pixel surface normals | 72 s | 84 s | 206 s | 359 s |
+ | pointmap | per-pixel XYZ in meters | 78 s | 99 s | 274 s | 386 s |
+ | pose | DETR detect, 308 keypoints | 47 s | 68 s | 232 s | not shipped |
 
- 0.4b through 1b run as fp32 PyTorch. 5B runs as INT8 ONNX (5 to 6 GB on disk; fp32 5B would need ~20 GB RAM, more than the free tier provides). Dense tasks share an LRU(2) cache; pose has its own slot and DETR is loaded once. First call per variant downloads then loads (~30-90 s warmup).
+ Verified 15/15 via Gradio API on 2026-05-12. Times include first-call downloads.
+
+ 0.4b through 1b run as fp32 PyTorch. 5B runs as INT8 ONNX (5 to 6 GB on disk; fp32 5B would need ~20 GB RAM, more than the free tier provides). Dense 0.4b/0.8b share an LRU(2) cache. Loading any 1B variant hard-clears all model caches (dense + pose + ORT) since 16 GB cpu-basic cannot fit two 1B-class models simultaneously. Pose has its own slot and DETR (`facebook/detr-resnet-50`) is sticky-loaded once.
+
+ **5B chain limitation:** calling a 5B variant right after another 5B variant on the same Space instance OOMs. ONNX Runtime's C++ session shutdown is not synchronous with the Python `_ORT_SESSIONS.clear()` call, so loading the next 5B session before the previous one's worker threads exit peaks RAM above 16 GB. If you need to benchmark multiple 5B variants, factory-restart the Space (Settings → Factory restart) between calls, or run one variant per cold Space.
 
  The model fixes a 1024×768 input tensor (NCHW with H=1024, W=768, a portrait canvas in Meta's convention). Any input is aspect-preserve resized then padded to that.
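
The last context line of the hunk describes an aspect-preserving resize followed by padding to the fixed H=1024, W=768 canvas. A minimal sketch of that geometry is given here for illustration. The function name `letterbox_geometry`, the `Letterbox` tuple, and the choice to center the image inside the padding are assumptions; the README does not say where the padding goes or which interpolation the Space uses.

```python
from typing import NamedTuple


class Letterbox(NamedTuple):
    """Resized image size plus the padding needed to reach the target canvas."""
    new_w: int
    new_h: int
    pad_left: int
    pad_top: int
    pad_right: int
    pad_bottom: int


def letterbox_geometry(src_w: int, src_h: int,
                       dst_w: int = 768, dst_h: int = 1024) -> Letterbox:
    """Compute an aspect-preserving fit of (src_w, src_h) into (dst_w, dst_h).

    Hypothetical helper: centered padding is an assumption, not something
    the README states.
    """
    if src_w <= 0 or src_h <= 0:
        raise ValueError("source dimensions must be positive")

    # One scale factor for both axes keeps the aspect ratio intact.
    scale = min(dst_w / src_w, dst_h / src_h)

    # Clamp so rounding can never exceed the canvas or collapse to zero.
    new_w = min(dst_w, max(1, round(src_w * scale)))
    new_h = min(dst_h, max(1, round(src_h * scale)))

    # Whatever space is left over becomes padding, split as evenly as possible.
    pad_w = dst_w - new_w
    pad_h = dst_h - new_h
    pad_left = pad_w // 2
    pad_top = pad_h // 2
    return Letterbox(new_w, new_h,
                     pad_left, pad_top,
                     pad_w - pad_left, pad_h - pad_top)
```

For a 1920×1080 landscape frame the limiting axis is the width, so the image shrinks to 768×432 and the remaining 592 rows are padding. Model outputs would then need cropping back to the unpadded region before being resized to the original resolution.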
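
The caching policy in the added paragraph (an LRU(2) cache for the small dense variants, plus a hard clear whenever a 1B variant loads) could look roughly like the sketch below. It is not the Space's code: `DenseModelCache`, its method names, and the `loader` callback are hypothetical, and the pose slot, the sticky DETR detector, and the ORT session table are left out.

```python
import gc
from collections import OrderedDict
from typing import Any, Callable, List, Tuple

Key = Tuple[str, str]  # (task, size), e.g. ("seg", "0.4b")


class DenseModelCache:
    """LRU cache with a hard-clear rule for memory-heavy model sizes."""

    def __init__(self, capacity: int = 2,
                 exclusive_sizes: Tuple[str, ...] = ("1b",)) -> None:
        self._capacity = capacity
        self._exclusive = set(exclusive_sizes)
        self._models: "OrderedDict[Key, Any]" = OrderedDict()

    def keys(self) -> List[Key]:
        """Resident models, least recently used first."""
        return list(self._models)

    def get(self, task: str, size: str, loader: Callable[[], Any]) -> Any:
        key = (task, size)
        if key in self._models:
            # Cache hit: mark as most recently used.
            self._models.move_to_end(key)
            return self._models[key]

        if size in self._exclusive:
            # Assumed behavior: drop everything before loading a 1B-class
            # model, then ask the garbage collector to release memory.
            self._models.clear()
            gc.collect()
        else:
            # Ordinary LRU eviction until there is room for one more entry.
            while len(self._models) >= self._capacity:
                self._models.popitem(last=False)

        model = loader()
        self._models[key] = model
        return model
```

Note that dropping Python references only helps when the underlying memory is released promptly. The 5B limitation described in the diff is the case where it is not, since the ONNX Runtime session teardown lags behind the Python-side clear.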