Updated styling and HFViewer embedding
README.md CHANGED
```diff
@@ -30,6 +30,16 @@ Deployable INT8-quantized version of [`apple/mobilevit-small`](https://huggingfa
 optimized with [embedl-deploy](https://github.com/embedl/embedl-deploy)
 for low-latency NVIDIA TensorRT inference on edge GPUs.
 
+## Upstream Model
+
+<a href="https://hfviewer.com/apple/mobilevit-small?utm_source=huggingface&utm_medium=embedded_model_card&utm_campaign=apple__mobilevit-small_card" target="_blank" rel="noopener">
+  <img
+    src="https://hfviewer.com/api/card.svg?source=apple%2Fmobilevit-small&v=20260501clipcard"
+    alt="Open apple/mobilevit-small in hfviewer"
+    width="100%"
+  />
+</a>
+
 ## Highlights
 
 - **Mixed-precision INT8/FP16 quantization** with hardware-aware
@@ -39,8 +49,9 @@ for low-latency NVIDIA TensorRT inference on edge GPUs.
   semantics.
 - **Validated accuracy** within 3.30 pp of the FP32
   baseline on ImageNet (see Accuracy table below).
-- **
-  (see Performance table
+- **Matches the latency of `trtexec --best`** on supported NVIDIA
+  hardware while preserving INT8 accuracy (see Performance table
+  below).
 - Includes both **ONNX** (for TensorRT) and **PT2**
   (`torch.export`-loadable) artifacts plus runnable inference scripts.
 
@@ -62,16 +73,14 @@ python infer_pt2.py --image path/to/image.jpg # pure PyTorch via torch.export
 | `embedl_mobilevit_small_int8.pt2` | INT8-quantized `torch.export` ExportedProgram. |
 | `infer_trt.py` | Build a TRT engine from the ONNX and run sample inference. |
 | `infer_pt2.py` | Load the `.pt2` with `torch.export.load` and run sample inference. |
-| `latency_comparison.png` | Latency comparison across precisions and devices. |
 
 ## Performance
 
 Latency measured with TensorRT + `trtexec`, GPU compute time only
 (`--noDataTransfers`), CUDA Graph + Spin Wait enabled, clocks locked
-(`nvpmodel -m 0 && jetson_clocks` on Jetson).
-`latency_comparison.png` for a visual summary.
+(`nvpmodel -m 0 && jetson_clocks` on Jetson).
 
-
+<img src="https://huggingface.co/datasets/embedl/documentation-images/resolve/main/mobilevit-small-quantized/mobilevit-small-quantized__orin-mountain-view.svg" alt="MobileViT-Small benchmark on NVIDIA Jetson AGX Orin">
 
 ### NVIDIA Jetson AGX Orin
 
```
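
For the pure-PyTorch path, the file table in the last hunk describes `infer_pt2.py` as loading the `.pt2` with `torch.export.load`. Below is a minimal sketch of that flow, not the repository's script: the 1×3×256×256 input shape and the 1000-class output are assumptions carried over from the upstream `apple/mobilevit-small` configuration, and the real script additionally decodes and normalizes an image.

```python
# Minimal sketch, not the shipped infer_pt2.py: load the INT8
# ExportedProgram and run one forward pass on a dummy input.
# Assumed: 1x3x256x256 float input and an ImageNet-1k head,
# per the upstream apple/mobilevit-small configuration.
import torch

ep = torch.export.load("embedl_mobilevit_small_int8.pt2")
model = ep.module()  # callable module recovered from the ExportedProgram

dummy = torch.randn(1, 3, 256, 256)  # stand-in for a preprocessed image
with torch.no_grad():
    logits = model(dummy)

print(logits.shape)           # e.g. torch.Size([1, 1000])
print(logits.argmax(dim=-1))  # predicted class index
```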
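On the TensorRT side, `infer_trt.py` is described as building an engine from the ONNX. A hedged sketch of that step with the standard TensorRT (8.x-style) Python API follows; the ONNX file name is an assumption inferred from the `.pt2` naming. Because the exported graph already carries Q/DQ quantization nodes, enabling the INT8 and FP16 builder flags is enough and no calibrator is needed. The methodology quoted under Performance corresponds to benchmarking the same ONNX with `trtexec` plus `--noDataTransfers --useCudaGraph --useSpinWait`.

```python
# Minimal sketch, not the shipped infer_trt.py: parse the quantized ONNX
# and build a serialized TensorRT engine. The file name
# "embedl_mobilevit_small_int8.onnx" is an assumption based on the .pt2 naming.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)  # explicit-batch network
)
parser = trt.OnnxParser(network, logger)

with open("embedl_mobilevit_small_int8.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit("ONNX parse failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)  # honor the Q/DQ scales baked into the graph
config.set_flag(trt.BuilderFlag.FP16)  # allow FP16 for layers left unquantized

engine_bytes = builder.build_serialized_network(network, config)
assert engine_bytes is not None, "engine build failed"
with open("mobilevit_small_int8.engine", "wb") as f:
    f.write(engine_bytes)
```

The resulting `.engine` file can then be deserialized with `trt.Runtime` for inference, which is presumably what the script's "run sample inference" half covers.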