---
license: other
license_name: embedl-models-community-licence-1.0
license_link: https://github.com/embedl/embedl-models/blob/main/LICENSE
base_model:
  - apple/mobilevit-small
quantized_from:
  - apple/mobilevit-small
tags:
  - image-classification
  - quantization
  - onnx
  - tensorrt
  - edge
  - embedl
gated: true
extra_gated_heading: "Access Embedl Mobilevit Small"
extra_gated_description: "To access this model, please review and accept the terms below. Your contact information is collected solely to manage access and, with your explicit consent, to notify you about updated or new optimized models from Embedl."
extra_gated_button_content: "Agree and request access"
extra_gated_prompt: "By requesting access you agree to the Embedl Models Community Licence and the upstream Mobilevit Small License"
extra_gated_fields:
  Company: text
  I agree to the Embedl Models Community Licence and upstream Mobilevit Small License: checkbox
  I consent to being contacted by Embedl about products and services (optional): checkbox
---
<!-- embedl-banner:start -->
<style>
.embedl-btn-primary { transition: background 160ms ease, box-shadow 160ms ease; }
.embedl-btn-primary:hover { background: #4FDCE4 !important; box-shadow: 0 8px 22px rgba(45,212,221,0.45) !important; }
.embedl-btn-secondary { transition: background 160ms ease; }
.embedl-btn-secondary:hover { background: rgba(45,212,221,0.15) !important; }
.embedl-headline { font-size: clamp(11px, 2.15vw, 15px) !important; }
.embedl-btn-primary, .embedl-btn-secondary {
  font-size: clamp(11px, 1.65vw, 13px) !important;
  padding: clamp(6px, 1.1vw, 9px) clamp(10px, 1.6vw, 14px) !important;
}
</style>
<div style="background:radial-gradient(600px 220px at 0% 50%,rgba(45,212,221,0.22) 0%,rgba(45,212,221,0) 60%),radial-gradient(400px 180px at 100% 100%,rgba(45,212,221,0.10) 0%,rgba(45,212,221,0) 55%),linear-gradient(135deg,#0B1626 0%,#142338 100%);border:1px solid rgba(45,212,221,0.28);border-radius:12px;padding:22px 24px;margin:0 0 24px 0;color:#F2F6FA;box-shadow:0 4px 16px rgba(11,22,38,0.18);overflow:hidden;box-sizing:border-box;max-width:100%;">
  <table style="width:100%;border-collapse:collapse;border:0;background:transparent;">
    <tr style="background:transparent;">
      <td style="vertical-align:middle;border:0;padding:0;background:transparent;">
        <div style="display:inline-block;font-size:10px;letter-spacing:0.08em;text-transform:uppercase;font-weight:700;color:#2DD4DD;background:rgba(45,212,221,0.15);border:1px solid rgba(45,212,221,0.35);padding:4px 10px;border-radius:999px;margin-bottom:10px;white-space:nowrap;">Optimized by Embedl</div>
        <div class="embedl-headline" style="font-size:15px;font-weight:700;line-height:1.35;color:#F2F6FA;margin-bottom:4px;">Need to <span style="color:#2DD4DD;white-space:nowrap;">fine-tune</span>, hit <span style="color:#2DD4DD;white-space:nowrap;">performance targets</span>, or deploy on <span style="color:#2DD4DD;white-space:nowrap;">specific hardware</span>?</div>
        <div style="font-size:13px;color:#9BA7B5;">We've got you covered.</div>
      </td>
      <td width="1%" style="vertical-align:middle;border:0;padding:0 0 0 18px;white-space:nowrap;text-align:right;background:transparent;">
        <a href="https://www.embedl.com/models" class="embedl-btn-secondary" style="display:inline-block;font-size:13px;font-weight:600;padding:9px 14px;border-radius:6px;border:1px solid #2DD4DD;color:#2DD4DD;text-decoration:none;margin-right:8px;">Learn more</a>
        <a href="https://www.embedl.com/contact" class="embedl-btn-primary" style="display:inline-block;font-size:13px;font-weight:600;padding:9px 14px;border-radius:6px;border:1px solid #2DD4DD;background:#2DD4DD;color:#0B1626;text-decoration:none;box-shadow:0 6px 18px rgba(45,212,221,0.28);">Get in touch →</a>
      </td>
    </tr>
  </table>
</div>
<!-- embedl-banner:end -->

# Embedl Mobilevit Small (Quantized for TensorRT)

Deployable INT8-quantized version of [`apple/mobilevit-small`](https://huggingface.co/apple/mobilevit-small),
optimized with [embedl-deploy](https://github.com/embedl/embedl-deploy)
for low-latency NVIDIA TensorRT inference on edge GPUs.

## Upstream Model

<a href="https://hfviewer.com/apple/mobilevit-small?utm_source=huggingface&amp;utm_medium=embedded_model_card&amp;utm_campaign=apple__mobilevit-small_card" target="_blank" rel="noopener">
  <img
    src="https://hfviewer.com/api/card.svg?source=apple%2Fmobilevit-small&amp;v=20260501clipcard"
    alt="Open apple/mobilevit-small in hfviewer"
    width="100%"
  />
</a>

## Highlights

- **Mixed-precision INT8/FP16 quantization** with hardware-aware
  optimizations from [embedl-deploy](https://github.com/embedl/embedl-deploy).
- **Drop-in replacement** for `apple/mobilevit-small` in TensorRT pipelines —
  same input shape (256×256), same output
  semantics.
- **Validated accuracy** within about 3.3 pp Top-1 (1.8 pp Top-5) of the
  FP32 baseline on ImageNet (see Accuracy table below).
- **Quantization-aware training (QAT)** further recovers accuracy
  lost in INT8 conversion by fine-tuning the model with simulated
  quantization in the forward pass.
- **Matches the latency of `trtexec --best`** on supported NVIDIA
  hardware while preserving INT8 accuracy (see Performance table
  below).
- Includes both **ONNX** (for TensorRT) and **PT2**
  (`torch.export`-loadable) artifacts plus runnable inference scripts.
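Simulated quantization is the forward-pass operation that QAT trains through, and it matches the Q/DQ node pair stored in the ONNX file: a quantize-then-dequantize round trip. The plain-Python sketch below illustrates it under stated assumptions. It uses per-tensor symmetric INT8 (zero point 0, as TensorRT expects) and a scale derived from the maximum absolute value. Rounding (half to even) and saturation to [-128, 127] follow ONNX `QuantizeLinear`. The helper names are hypothetical and not part of embedl-deploy. Gradient handling during training (commonly a straight-through estimator) is not shown.

```python
"""Hypothetical sketch of INT8 fake quantization (quantize -> dequantize)."""

QMIN, QMAX = -128, 127  # ONNX int8 saturation range


def symmetric_scale(values, qmax=127):
    """Per-tensor symmetric scale: map the largest |value| onto qmax."""
    amax = max(abs(v) for v in values)
    return amax / qmax if amax > 0 else 1.0


def quantize(x, scale, zero_point=0):
    """Like ONNX QuantizeLinear: round half to even, then saturate."""
    q = round(x / scale) + zero_point  # Python's round() rounds half to even
    return max(QMIN, min(QMAX, q))


def dequantize(q, scale, zero_point=0):
    """Like ONNX DequantizeLinear: map the integer back to a float."""
    return (q - zero_point) * scale


def fake_quant(values, scale, zero_point=0):
    """Simulated quantization: float outputs that carry INT8 rounding error."""
    return [dequantize(quantize(v, scale, zero_point), scale, zero_point) for v in values]


if __name__ == "__main__":
    activations = [0.02, -0.5, 0.731, 1.27]
    s = symmetric_scale(activations)  # 1.27 / 127, i.e. about 0.01
    print(fake_quant(activations, s))
```

Values that are not multiples of the scale come back slightly shifted, and values beyond the representable range are clipped. QAT lets the weights adapt to exactly these errors.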

## Quick Start

```bash
pip install huggingface_hub onnxruntime-gpu pillow numpy
export HF_TOKEN="your_hf_access_token"   # gated repo: request access on the model page first
python -c "from huggingface_hub import snapshot_download; snapshot_download('embedl/mobilevit-small-quantized', local_dir='.')"
python infer_trt.py --image path/to/image.jpg   # TensorRT
# or
python infer_pt2.py --image path/to/image.jpg   # pure PyTorch via torch.export (requires torch)
```

## Files

| File | Purpose |
|---|---|
| `embedl_mobilevit_small_int8.onnx` | INT8-quantized ONNX with Q/DQ nodes — feed to TensorRT. |
| `embedl_mobilevit_small_int8.pt2` | INT8-quantized `torch.export` ExportedProgram. |
| `infer_trt.py` | Build a TRT engine from the ONNX and run sample inference. |
| `infer_pt2.py` | Load the `.pt2` with `torch.export.load` and run sample inference. |
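Loading the `.pt2` artifact uses the standard `torch.export` API. `torch.export.load` returns an `ExportedProgram`, and `.module()` turns it into a callable module. The sketch below is a minimal smoke test, not a copy of `infer_pt2.py`. It makes three assumptions:

- The model takes a single NCHW float32 input of shape (1, 3, 256, 256).
- Random data is good enough to exercise the graph, so no image is preprocessed.
- The output is either a logits tensor or a tuple whose first element is the logits.

If the exported program relies on custom operators, additional imports may be needed. `infer_pt2.py` remains the reference for the exact preprocessing and loading steps. The helper name `load_and_run` is hypothetical.

```python
"""Minimal smoke test for the .pt2 artifact (assumptions noted inline)."""
from pathlib import Path

PT2_PATH = Path("embedl_mobilevit_small_int8.pt2")


def load_and_run(path=PT2_PATH, batch_size=1):
    """Load the ExportedProgram and return top-1 class indices for random input."""
    import torch  # imported lazily so this file can be imported without PyTorch

    program = torch.export.load(str(path))  # -> torch.export.ExportedProgram
    model = program.module()  # callable torch.nn.Module

    # Assumption: one NCHW float32 input of shape (N, 3, 256, 256). Exported
    # programs often have a static batch size, so keep batch_size=1 unless the
    # export used dynamic shapes. Random values only exercise the graph; real
    # inputs need the preprocessing implemented in infer_pt2.py.
    x = torch.rand(batch_size, 3, 256, 256)
    with torch.no_grad():
        out = model(x)

    # Assumption: output is a logits tensor, or a tuple/list with logits first.
    logits = out[0] if isinstance(out, (tuple, list)) else out
    return logits.argmax(dim=-1)


if __name__ == "__main__" and PT2_PATH.exists():
    print(load_and_run())
```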

## Performance

Latency is measured with TensorRT's `trtexec` and covers GPU compute time
only (`--noDataTransfers`), with CUDA Graphs and spin-wait enabled and
clocks locked (`nvpmodel -m 0 && jetson_clocks` on Jetson).
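The settings above correspond to the `trtexec` flags `--noDataTransfers`, `--useCudaGraph` and `--useSpinWait`, and the sketch below shows how a comparable measurement could be reproduced. Clock locking is done separately before benchmarking, typically with `sudo`. This card does not state which ONNX file and which precision flags produced each row of the table below. The `["--best"]` choice is therefore an assumption, and the helper name `trtexec_cmd` is hypothetical.

```python
"""Sketch: build (and optionally run) a trtexec latency benchmark command."""
import shlex
import shutil
import subprocess
from pathlib import Path

# Measurement settings described in this card, as trtexec flags.
COMMON_FLAGS = ["--noDataTransfers", "--useCudaGraph", "--useSpinWait"]


def trtexec_cmd(onnx_path, precision_flags):
    """Return the argv list for benchmarking `onnx_path` with trtexec."""
    return ["trtexec", f"--onnx={onnx_path}", *precision_flags, *COMMON_FLAGS]


if __name__ == "__main__":
    onnx = Path("embedl_mobilevit_small_int8.onnx")
    cmd = trtexec_cmd(onnx, ["--best"])  # precision flags: an assumption
    print(shlex.join(cmd))
    # Only run when trtexec is on PATH and the model file is present.
    if shutil.which("trtexec") and onnx.exists():
        subprocess.run(cmd, check=True)
```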

<img src="https://huggingface.co/datasets/embedl/documentation-images/resolve/main/mobilevit-small-quantized/mobilevit-small-quantized__orin-mountain-view.svg" alt="MobileViT-Small benchmark on NVIDIA Jetson AGX Orin">

### NVIDIA Jetson AGX Orin

| Configuration | Mean Latency | Speedup vs FP16 |
|---|---|---|
| TensorRT FP16 | 1.28 ms | 1.00x |
| TensorRT --best (unconstrained) | 1.09 ms | 1.17x |
| **Embedl Deploy INT8** | **1.09 ms** | **1.17x** |
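The "Speedup vs FP16" column is the FP16 mean latency divided by each configuration's mean latency. Recomputing it from the rounded table values reproduces the reported figures. This is a quick check, not Embedl's benchmarking code.

```python
"""Recompute the 'Speedup vs FP16' column from the mean latencies in the table."""

MEAN_LATENCY_MS = {
    "TensorRT FP16": 1.28,
    "TensorRT --best (unconstrained)": 1.09,
    "Embedl Deploy INT8": 1.09,
}
BASELINE = "TensorRT FP16"


def speedup(baseline_ms, latency_ms):
    """Values above 1.0 mean the configuration is faster than the baseline."""
    return baseline_ms / latency_ms


if __name__ == "__main__":
    base = MEAN_LATENCY_MS[BASELINE]
    for name, ms in MEAN_LATENCY_MS.items():
        print(f"{name}: {speedup(base, ms):.2f}x")
```

The latencies are rounded to 0.01 ms, so a ratio recomputed this way can differ in the last digit from one computed on unrounded measurements.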


## Accuracy

Evaluated on the ImageNet validation split. The INT8 model scores about
3.3 pp lower in Top-1 and 1.8 pp lower in Top-5 than our FP32 baseline.

| Model | Top-1 | Top-5 |
|---|---|---|
| `apple/mobilevit-small` FP32 (ours) | 78.14% | 94.08% |
| **Embedl Mobilevit Small INT8** | **74.83%** | **92.28%** |
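Top-1 and Top-5 follow the usual definition: a sample counts as correct if its true label is among the k highest-scoring classes. The plain-Python sketch below illustrates the metric and recomputes the FP32-to-INT8 gap from the table in percentage points. It is not the evaluation harness used for these numbers, and the helper names are hypothetical.

```python
"""Illustrative Top-k accuracy and FP32 -> INT8 accuracy gap (percentage points)."""


def topk_hit(logits, label, k):
    """True if `label` is among the k highest logits (ties keep the lower index first)."""
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return label in ranked[:k]


def topk_accuracy(batch_logits, labels, k):
    """Percentage of samples whose label is in the top-k predictions."""
    hits = sum(topk_hit(row, y, k) for row, y in zip(batch_logits, labels))
    return 100.0 * hits / len(labels)


# Values from the Accuracy table above.
FP32 = {"top1": 78.14, "top5": 94.08}
INT8 = {"top1": 74.83, "top5": 92.28}

# Accuracy drop in percentage points, rounded to the table's two decimals.
GAP_PP = {metric: round(FP32[metric] - INT8[metric], 2) for metric in FP32}

if __name__ == "__main__":
    print(GAP_PP)  # {'top1': 3.31, 'top5': 1.8}
```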

## Creating Your Own Optimized Models

This artifact was produced with
[embedl-deploy](https://github.com/embedl/embedl-deploy),
Embedl's open-source PyTorch → TensorRT deployment library. You can
apply the same workflow to your own models — see
[the documentation](https://github.com/embedl/embedl-deploy#readme)
for installation and usage.

## License

| Component | License |
|---|---|
| Optimized model artifacts (this repo) | [Embedl Models Community Licence v1.0](https://github.com/embedl/embedl-models/blob/main/LICENSE) — no redistribution as a hosted service |
| Upstream architecture and weights | [Mobilevit Small License](https://huggingface.co/apple/mobilevit-small) |

## Contact

We offer engineering support for on-prem/edge deployments and partner
co-marketing opportunities. Reach out at
[contact@embedl.com](mailto:contact@embedl.com), or open an issue on
[GitHub](https://github.com/embedl/embedl-deploy).

<!-- embedl-discord-banner:start -->
<style>
.embedl-discord-btn { transition: background 160ms ease, box-shadow 160ms ease; }
.embedl-discord-btn:hover { background: #6C77F5 !important; box-shadow: 0 8px 22px rgba(88,101,242,0.55) !important; }
</style>
<div style="background:radial-gradient(600px 220px at 0% 50%,rgba(88,101,242,0.22) 0%,rgba(88,101,242,0) 60%),radial-gradient(400px 180px at 100% 100%,rgba(88,101,242,0.10) 0%,rgba(88,101,242,0) 55%),linear-gradient(135deg,#0B1626 0%,#142338 100%);border:1px solid rgba(88,101,242,0.35);border-radius:12px;padding:22px 24px;margin:24px 0 0 0;color:#F2F6FA;box-shadow:0 4px 16px rgba(11,22,38,0.18);overflow:hidden;box-sizing:border-box;max-width:100%;">
  <table style="width:100%;border-collapse:collapse;border:0;background:transparent;">
    <tr style="background:transparent;">
      <td style="vertical-align:middle;border:0;padding:0;background:transparent;">
        <div style="display:inline-block;font-size:10px;letter-spacing:0.08em;text-transform:uppercase;font-weight:700;color:#A5B4FC;background:rgba(88,101,242,0.18);border:1px solid rgba(88,101,242,0.45);padding:4px 10px;border-radius:999px;margin-bottom:10px;white-space:nowrap;">Community &amp; support</div>
        <div style="font-size:15px;font-weight:700;line-height:1.35;color:#F2F6FA;margin-bottom:4px;">Need help with this model? Chat with the Embedl team and other engineers on <span style="color:#A5B4FC;white-space:nowrap;">Discord</span>.</div>
        <div style="font-size:13px;color:#9BA7B5;">Quantization gotchas, hardware questions, fine-tuning tips — bring them all.</div>
      </td>
      <td width="1%" style="vertical-align:middle;border:0;padding:0 0 0 18px;white-space:nowrap;text-align:right;background:transparent;">
        <a href="https://discord.gg/MTbMWdKqE" class="embedl-discord-btn" style="display:inline-block;font-size:13px;font-weight:600;padding:9px 14px;border-radius:6px;border:1px solid #5865F2;background:#5865F2;color:#FFFFFF;text-decoration:none;box-shadow:0 6px 18px rgba(88,101,242,0.35);">Join our Discord →</a>
      </td>
    </tr>
  </table>
</div>
<!-- embedl-discord-banner:end -->