flyingbertman committed on
Commit
1f1d878
·
verified ·
1 Parent(s): 70b4dd9

Update README.md

  1. README.md +88 -0
README.md CHANGED
@@ -1,3 +1,91 @@
---
license: apache-2.0
library_name: onnx
tags:
- image-segmentation
- salient-object-detection
- background-removal
- u2net
- onnx
base_model: xuebinqin/U-2-Net
pipeline_tag: image-segmentation
language:
- en
---

# U²-Net: Salient Object Segmentation (ONNX)

ONNX checkpoints of [xuebinqin/U-2-Net](https://github.com/xuebinqin/U-2-Net), a nested U-structure network for salient-object detection. The network is trained to separate the "main subject" of an image from the background. Pair the output mask with `image_cutout()` for background removal, or with `apply_colormap()` to visualize saliency.

These files were not converted locally: they are the official ONNX checkpoints, republished by [danielgatis/rembg](https://github.com/danielgatis/rembg) in a convenient release.

Credit: Xuebin Qin, Zichen Zhang, Chenyang Huang, Masood Dehghan, Osmar R. Zaiane, Martin Jagersand, University of Alberta (*"U²-Net: Going Deeper with Nested U-Structure for Salient Object Detection"*, Pattern Recognition 2020).

## What this repo contains

| File | Params | Size | Use |
|---|---|---|---|
| `u2netp.onnx` | ~1.1M | ~4.7 MB | **Recommended default.** Lightweight variant with narrower layers; CPU/mobile/edge-friendly |
| `u2net.onnx` | ~44M | ~170 MB | Full network; sharper edges on hair, fur, lace, thin structures |

Both files share the same input/output tensor signature, so inference code is identical and you can swap variants without rewriting anything.
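
One way to check the shared-signature claim on your own copies is to compare what ONNX Runtime reports for each file. The sketch below is a minimal, hedged example: `describe_io` and `same_signature` are hypothetical helper names (not part of this repo or of onnxruntime), and only shapes and dtypes are compared, on the assumption that tensor names may differ between exports.

```python
def describe_io(session):
    """Summarize a session's inputs and outputs as (name, shape, dtype) tuples.

    `session` is expected to behave like onnxruntime.InferenceSession, i.e.
    expose get_inputs() / get_outputs() returning objects that carry
    .name, .shape and .type attributes.
    """
    inputs = [(arg.name, tuple(arg.shape), arg.type) for arg in session.get_inputs()]
    outputs = [(arg.name, tuple(arg.shape), arg.type) for arg in session.get_outputs()]
    return inputs, outputs


def same_signature(session_a, session_b):
    """Return True if both sessions agree on tensor shapes and dtypes, in order.

    Names are deliberately ignored (assumption: they can differ between exports).
    """
    def strip_names(entries):
        return [(shape, dtype) for _name, shape, dtype in entries]

    in_a, out_a = describe_io(session_a)
    in_b, out_b = describe_io(session_b)
    return (strip_names(in_a) == strip_names(in_b)
            and strip_names(out_a) == strip_names(out_b))
```

With the real files, and after `import onnxruntime as ort`, this could be called as `same_signature(ort.InferenceSession("u2netp.onnx"), ort.InferenceSession("u2net.onnx"))`.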

## Input / output

| Property | Spec |
|---|---|
| Input name | `input.1` (verify in Netron) |
| Input shape | `[1, 3, 320, 320]` (NCHW) |
| Input dtype | float32 |
| Input color order | **RGB** |
| Preprocessing | Resize to 320×320, scale to `[0,1]`, normalize with ImageNet stats: `mean=[0.485, 0.456, 0.406]`, `std=[0.229, 0.224, 0.225]` |
| Outputs | 7 tensors, `d0`..`d6`. **`d0` is the final fused mask.** `d1`..`d6` are side outputs from progressively coarser decoder stages, each upsampled to the output shape below; they serve as intermediate supervision during training, so ignore them at inference. |
| Output shape (per map) | `[1, 1, 320, 320]` |
| Output meaning | Per-pixel saliency in `[0, 1]`; higher means more likely to be the subject. Threshold (typically ~0.5) for a binary mask, or use raw values as a soft alpha. |
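
The two ways of consuming the saliency map described in the last row can be sketched as follows. This is a minimal example that assumes `saliency` is a NumPy array already scaled to `[0, 1]`; `to_binary_mask` and `to_soft_alpha` are hypothetical helper names, and the 0.5 default is simply the typical value mentioned in the table.

```python
import numpy as np


def to_binary_mask(saliency, threshold=0.5):
    """Hard mask: 255 where saliency >= threshold, else 0 (uint8)."""
    saliency = np.asarray(saliency, dtype=np.float32)
    return np.where(saliency >= threshold, 255, 0).astype(np.uint8)


def to_soft_alpha(saliency):
    """Soft alpha: map [0, 1] saliency to 0..255 without thresholding (uint8)."""
    saliency = np.clip(np.asarray(saliency, dtype=np.float32), 0.0, 1.0)
    return np.round(saliency * 255.0).astype(np.uint8)
```

A hard mask gives crisp cutout borders, while the soft alpha keeps partial transparency around fine structures such as hair.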

## How to use

```python
import numpy as np
import onnxruntime as ort
from PIL import Image

sess = ort.InferenceSession("u2netp.onnx")  # or "u2net.onnx" (same signature)

# Remember the original size so the mask can be resized back at the end
orig = Image.open("photo.jpg").convert("RGB")
W, H = orig.size

# Preprocess: resize, scale to [0, 1], normalize with ImageNet stats, HWC -> NCHW
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
img = orig.resize((320, 320), Image.BILINEAR)
arr = np.asarray(img, dtype=np.float32) / 255.0
arr = (arr - mean) / std
arr = np.ascontiguousarray(arr.transpose(2, 0, 1)[None, ...], dtype=np.float32)

# Inference: outputs is a list of 7 tensors; d0 is index 0
outputs = sess.run(None, {sess.get_inputs()[0].name: arr})
d0 = outputs[0][0, 0]  # 320x320 saliency

# Min-max normalize so the map spans the full [0, 1] range
d0 = (d0 - d0.min()) / (d0.max() - d0.min() + 1e-8)

# Resize mask back to original image dimensions
mask = Image.fromarray((d0 * 255).astype(np.uint8)).resize((W, H), Image.BILINEAR)
```

For background removal, apply `mask` as the alpha channel of the original RGB image to get an RGBA cutout.
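
The alpha-channel step can be sketched with NumPy alone. This is a minimal example, not the `image_cutout()` helper mentioned earlier: `make_rgba_cutout` is a hypothetical name, and it assumes `rgb` is an `(H, W, 3)` uint8 array and `alpha` an `(H, W)` uint8 mask already resized to the same height and width.

```python
import numpy as np


def make_rgba_cutout(rgb, alpha):
    """Stack an (H, W, 3) uint8 image and an (H, W) uint8 mask into (H, W, 4) RGBA."""
    rgb = np.asarray(rgb, dtype=np.uint8)
    alpha = np.asarray(alpha, dtype=np.uint8)
    if rgb.ndim != 3 or rgb.shape[2] != 3:
        raise ValueError("rgb must have shape (H, W, 3)")
    if alpha.shape != rgb.shape[:2]:
        raise ValueError("alpha must have shape (H, W) matching rgb")
    # np.dstack promotes the 2-D mask to (H, W, 1) before concatenating on the last axis
    return np.dstack([rgb, alpha])
```

With the variables from the snippet above, `Image.fromarray(make_rgba_cutout(np.asarray(orig), np.asarray(mask))).save("cutout.png")` would write the result; PNG is suggested because JPEG has no alpha channel.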

## Which one should I use?

- **`u2netp`** is the right default: 4.7 MB on disk, roughly 30 ms per image on CPU (hardware-dependent), and mask quality good enough for most background-removal and saliency-mapping use cases. It loads almost instantly.
- **`u2net`** earns its disk and latency cost on **fine-edge** subjects: hair, fur, lace, complex foliage, transparent objects. If the lite variant's edges look "blocky" on your inputs, the full model is the upgrade.
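
Latency figures like the one above depend heavily on hardware and runtime settings, so it can be worth measuring on the target machine. The sketch below is a generic timing helper, not a benchmark used by this repo: `time_call` is a hypothetical name, and `run_once` stands for any zero-argument callable, such as a lambda wrapping `sess.run(...)`.

```python
import statistics
import time


def time_call(run_once, warmup=3, runs=20):
    """Return (median_ms, min_ms) over `runs` timed calls, after `warmup` untimed calls."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    for _ in range(warmup):
        run_once()  # warm-up calls are executed but not timed
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples_ms), min(samples_ms)
```

Running it once per file with the same preprocessed input gives a like-for-like comparison of the two variants on your own CPU.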

For interactive segmentation (clicks, boxes, prompts), use [MobileSAM](https://huggingface.co/Heliosoph/sam-onnx) instead; U²-Net is automatic and non-interactive.

## Excluded variant

The original [xuebinqin/U-2-Net](https://github.com/xuebinqin/U-2-Net) repo also ships a third checkpoint called `u2net_portrait` (line-drawing portrait sketches). It is **deliberately not bundled here**: it was trained on the APDrawing dataset, which carries non-commercial restrictions that would taint the otherwise-clean Apache-2.0 status of this bundle. If you need it, grab it directly from the upstream repo and read the dataset terms first.

## License

**Apache-2.0**, the same as the upstream [xuebinqin/U-2-Net](https://github.com/xuebinqin/U-2-Net) repo. A `LICENSE` file is included. The danielgatis/rembg release just bundles the original weights; no relicensing occurred.