dann-od committed (verified)
Commit ed1810b · Parent(s): d31c7d4

Updated styling and HFViewer embedding

Files changed (1)
  1. README.md +15 -6
README.md CHANGED
@@ -30,6 +30,16 @@ Deployable INT8-quantized version of [`apple/mobilevit-small`](https://huggingfa
 optimized with [embedl-deploy](https://github.com/embedl/embedl-deploy)
 for low-latency NVIDIA TensorRT inference on edge GPUs.
 
+## Upstream Model
+
+<a href="https://hfviewer.com/apple/mobilevit-small?utm_source=huggingface&amp;utm_medium=embedded_model_card&amp;utm_campaign=apple__mobilevit-small_card" target="_blank" rel="noopener">
+  <img
+    src="https://hfviewer.com/api/card.svg?source=apple%2Fmobilevit-small&amp;v=20260501clipcard"
+    alt="Open apple/mobilevit-small in hfviewer"
+    width="100%"
+  />
+</a>
+
 ## Highlights
 
 - **Mixed-precision INT8/FP16 quantization** with hardware-aware
@@ -39,8 +49,9 @@ for low-latency NVIDIA TensorRT inference on edge GPUs.
   semantics.
 - **Validated accuracy** within 3.30 pp of the FP32
   baseline on ImageNet (see Accuracy table below).
-- **Faster than `trtexec --best`** on supported NVIDIA hardware
-  (see Performance table below).
+- **Matches the latency of `trtexec --best`** on supported NVIDIA
+  hardware while preserving INT8 accuracy (see Performance table
+  below).
 - Includes both **ONNX** (for TensorRT) and **PT2**
   (`torch.export`-loadable) artifacts plus runnable inference scripts.
 
@@ -62,16 +73,14 @@ python infer_pt2.py --image path/to/image.jpg  # pure PyTorch via torch.export
 | `embedl_mobilevit_small_int8.pt2` | INT8-quantized `torch.export` ExportedProgram. |
 | `infer_trt.py` | Build a TRT engine from the ONNX and run sample inference. |
 | `infer_pt2.py` | Load the `.pt2` with `torch.export.load` and run sample inference. |
-| `latency_comparison.png` | Latency comparison across precisions and devices. |
 
 ## Performance
 
 Latency measured with TensorRT + `trtexec`, GPU compute time only
 (`--noDataTransfers`), CUDA Graph + Spin Wait enabled, clocks locked
-(`nvpmodel -m 0 && jetson_clocks` on Jetson). See
-`latency_comparison.png` for a visual summary.
+(`nvpmodel -m 0 && jetson_clocks` on Jetson).
 
-![Latency comparison across precisions](latency_comparison.png)
+<img src="https://huggingface.co/datasets/embedl/documentation-images/resolve/main/mobilevit-small-quantized/mobilevit-small-quantized__orin-mountain-view.svg" alt="MobileViT-Small benchmark on NVIDIA Jetson AGX Orin">
 
 ### NVIDIA Jetson AGX Orin
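The Performance section's measurement setup (GPU compute time only, CUDA Graph + Spin Wait) maps directly onto `trtexec` flags. A minimal sketch of assembling that invocation, assuming a hypothetical ONNX filename (`model.onnx`) since the card's ONNX artifact name is not shown in this diff:

```python
import shutil
import subprocess

def build_trtexec_cmd(onnx_path: str, best: bool = False) -> list:
    """Build a trtexec command matching the card's measurement setup:
    GPU compute time only, CUDA Graph and Spin Wait enabled."""
    cmd = [
        "trtexec",
        f"--onnx={onnx_path}",
        "--noDataTransfers",  # exclude host<->device copies from timing
        "--useCudaGraph",     # capture/replay inference as a CUDA graph
        "--useSpinWait",      # spin-wait instead of yielding between runs
    ]
    if best:
        cmd.append("--best")  # let TensorRT choose the fastest precision
    else:
        cmd.extend(["--int8", "--fp16"])  # allow the mixed INT8/FP16 path
    return cmd

cmd = build_trtexec_cmd("model.onnx")  # hypothetical filename
print(" ".join(cmd))
if shutil.which("trtexec") is not None:  # only run where TensorRT exists
    subprocess.run(cmd)
```

On Jetson, the card also locks clocks first (`nvpmodel -m 0 && jetson_clocks`) so the spin-wait timing is not confounded by DVFS.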
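The accuracy claim is stated in percentage points (pp): an absolute difference between top-1 percentages, not a relative error. A small sketch of that arithmetic, with hypothetical accuracy numbers (the card only bounds the drop at 3.30 pp):

```python
def accuracy_drop_pp(fp32_top1: float, quant_top1: float) -> float:
    """Accuracy drop in percentage points: the absolute difference
    between two top-1 percentages (not a relative error)."""
    return fp32_top1 - quant_top1

# Hypothetical illustration: a 78.3% FP32 baseline and a 75.2% INT8
# model differ by 3.1 pp, inside the card's stated 3.30 pp budget.
drop = accuracy_drop_pp(78.3, 75.2)
assert drop <= 3.30
```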