---
license: other
license_name: embedl-models-community-licence-1.0
license_link: https://github.com/embedl/embedl-models/blob/main/LICENSE
base_model:
- apple/mobilevit-small
quantized_from:
- apple/mobilevit-small
tags:
- image-classification
- quantization
- onnx
- tensorrt
- edge
- embedl
gated: true
extra_gated_heading: "Access Embedl Mobilevit Small"
extra_gated_description: "To access this model, please review and accept the terms below. Your contact information is collected solely to manage access and, with your explicit consent, to notify you about updated or new optimized models from Embedl."
extra_gated_button_content: "Agree and request access"
extra_gated_prompt: "By requesting access you agree to the Embedl Models Community Licence and the upstream Mobilevit Small License"
extra_gated_fields:
  Company: text
  I agree to the Embedl Models Community Licence and upstream Mobilevit Small License: checkbox
  I consent to being contacted by Embedl about products and services (optional): checkbox
---
# Embedl Mobilevit Small (Quantized for TensorRT)

Deployable INT8-quantized version of [`apple/mobilevit-small`](https://huggingface.co/apple/mobilevit-small),
optimized with [embedl-deploy](https://github.com/embedl/embedl-deploy)
for low-latency NVIDIA TensorRT inference on edge GPUs.
## Highlights

- **Mixed-precision INT8/FP16 quantization** with hardware-aware
  optimizations from [embedl-deploy](https://github.com/embedl/embedl-deploy).
- **Drop-in replacement** for `apple/mobilevit-small` in TensorRT pipelines:
  same input shape (256×256), same output semantics.
- **Validated accuracy** within 3.30 pp (Top-1) of the FP32
  baseline on ImageNet (see Accuracy table below).
- **Matches `trtexec --best` latency** on NVIDIA Jetson AGX Orin
  (1.09 ms mean, 1.17x faster than TensorRT FP16; see Performance table below).
- Includes both **ONNX** (for TensorRT) and **PT2**
  (`torch.export`-loadable) artifacts plus runnable inference scripts.
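The "same input shape" claim implies a 256×256 input tensor. Below is a minimal preprocessing sketch. It assumes the quantized model expects the same layout as the upstream model: a 1×3×256×256 float32 NCHW tensor scaled to [0, 1] with no mean/std normalization. The upstream `MobileViTImageProcessor` defaults to flipping RGB to BGR (`do_flip_channel_order`), so the sketch does the same by default. The function names are hypothetical, and `infer_trt.py` remains the reference for the exact preprocessing.

```python
import numpy as np

INPUT_SIZE = 256  # the 256x256 input resolution stated above


def center_crop(image: np.ndarray, size: int = INPUT_SIZE) -> np.ndarray:
    """Center-crop an HxWxC array to size x size; resize beforehand if it is smaller."""
    h, w = image.shape[:2]
    if h < size or w < size:
        raise ValueError(f"image is {h}x{w}; resize so both sides are >= {size} first")
    top, left = (h - size) // 2, (w - size) // 2
    return image[top:top + size, left:left + size]


def to_model_input(image_rgb: np.ndarray, flip_to_bgr: bool = True) -> np.ndarray:
    """HxWx3 uint8 RGB image -> 1x3x256x256 float32 NCHW tensor in [0, 1].

    Assumption: same preprocessing as the upstream apple/mobilevit-small
    processor (rescale by 1/255, no mean/std normalization, RGB->BGR flip).
    """
    x = center_crop(image_rgb).astype(np.float32) / 255.0  # rescale to [0, 1]
    if flip_to_bgr:
        x = x[..., ::-1]  # RGB -> BGR, mirroring the upstream processor default
    x = np.transpose(x, (2, 0, 1))  # HWC -> CHW
    return np.ascontiguousarray(x[np.newaxis])  # add batch dimension -> NCHW
```

With Pillow, resize the image first so both sides are at least 256 pixels. Then call `to_model_input(np.asarray(img.convert("RGB")))`.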
## Quick Start

```bash
# The repo is gated: accept the terms on the model page, then authenticate
# (e.g. `huggingface-cli login` or set HF_TOKEN) before downloading.
pip install huggingface_hub onnxruntime-gpu pillow numpy
python -c "from huggingface_hub import snapshot_download; snapshot_download('embedl/mobilevit-small-quantized', local_dir='.')"
python infer_trt.py --image path/to/image.jpg   # TensorRT (requires a TensorRT installation)
# or
python infer_pt2.py --image path/to/image.jpg   # pure PyTorch via torch.export (requires torch)
```
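To turn the model's output into predictions, you could use a sketch like the one below. It assumes the output is a (1, 1000) array of raw ImageNet-1k logits, as with the upstream classifier (the "same output semantics" claim above). Class names could then come from the upstream model's `id2label` mapping in its `config.json`. The helper names are hypothetical.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # avoid overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def top_k(logits: np.ndarray, k: int = 5) -> list[tuple[int, float]]:
    """Return (class_index, probability) pairs for one image's logits, best first."""
    probs = softmax(np.asarray(logits, dtype=np.float64).reshape(-1))
    idx = np.argsort(probs)[::-1][:k]  # indices sorted by descending probability
    return [(int(i), float(probs[i])) for i in idx]
```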
## Files

| File | Purpose |
|---|---|
| `embedl_mobilevit_small_int8.onnx` | INT8-quantized ONNX with Q/DQ nodes; feed this to TensorRT. |
| `embedl_mobilevit_small_int8.pt2` | INT8-quantized `torch.export` ExportedProgram. |
| `infer_trt.py` | Build a TRT engine from the ONNX and run sample inference. |
| `infer_pt2.py` | Load the `.pt2` with `torch.export.load` and run sample inference. |
| `latency_comparison.png` | Latency comparison across precisions and devices. |
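The Q/DQ nodes in the ONNX file are ONNX `QuantizeLinear`/`DequantizeLinear` operators. The numpy sketch below mimics their per-tensor INT8 arithmetic: round half to even, saturate, then rescale. It uses a zero point of 0, i.e. symmetric quantization, which is what TensorRT supports. The max-based `symmetric_scale` calibration is an illustrative assumption, not necessarily how embedl-deploy chooses its scales.

```python
import numpy as np

INT8_MIN, INT8_MAX = -128, 127


def quantize_linear(x, scale, zero_point=0):
    """Like ONNX QuantizeLinear (int8): saturate(round_half_even(x / scale) + zero_point)."""
    q = np.rint(np.asarray(x, dtype=np.float32) / scale) + zero_point  # rint rounds half to even
    return np.clip(q, INT8_MIN, INT8_MAX).astype(np.int8)


def dequantize_linear(q, scale, zero_point=0):
    """Like ONNX DequantizeLinear: (q - zero_point) * scale."""
    return (q.astype(np.float32) - zero_point) * scale


def symmetric_scale(x):
    """Per-tensor symmetric scale mapping max |x| to 127 (simple max calibration)."""
    amax = float(np.max(np.abs(x)))
    return amax / INT8_MAX if amax > 0 else 1.0
```

A Q/DQ pair therefore acts as "fake quantization": values are snapped to a grid of step `scale`, with a round-trip error of at most `scale / 2` inside the representable range.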
## Performance

Latency is measured with TensorRT's `trtexec` and reports GPU compute time only (`--noDataTransfers`), with CUDA Graphs (`--useCudaGraph`) and spin-wait (`--useSpinWait`) enabled and clocks locked (`nvpmodel -m 0 && jetson_clocks` on Jetson). See `latency_comparison.png` for a visual summary.

![Latency comparison](latency_comparison.png)
### NVIDIA Jetson AGX Orin

| Configuration | Mean Latency | Speedup vs FP16 |
|---|---|---|
| TensorRT FP16 | 1.28 ms | 1.00x |
| TensorRT --best (unconstrained) | 1.09 ms | 1.17x |
| **Embedl Deploy INT8** | **1.09 ms** | **1.17x** |
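Each "Speedup vs FP16" value is the FP16 mean latency divided by that row's mean latency; for example, 1.28 ms / 1.09 ms ≈ 1.17x. The small helper below reproduces the column from the table values. The helper name is illustrative.

```python
# Mean latencies (ms) copied from the Jetson AGX Orin table above.
LATENCY_MS = {
    "TensorRT FP16": 1.28,
    "TensorRT --best (unconstrained)": 1.09,
    "Embedl Deploy INT8": 1.09,
}


def speedup_vs(baseline: str, latencies: dict[str, float]) -> dict[str, float]:
    """Speedup of each configuration relative to `baseline` (higher is faster)."""
    base = latencies[baseline]
    return {name: base / ms for name, ms in latencies.items()}


for name, s in speedup_vs("TensorRT FP16", LATENCY_MS).items():
    print(f"{name}: {s:.2f}x")  # prints 1.00x, 1.17x, 1.17x
```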
## Accuracy

Evaluated on the ImageNet validation split. Relative to our FP32 baseline, the quantized model loses about 3.3 pp of Top-1 and 1.8 pp of Top-5 accuracy.

| Model | Top-1 | Top-5 |
|---|---|---|
| `apple/mobilevit-small` FP32 (ours) | 78.14% | 94.08% |
| **Embedl Mobilevit Small INT8** | **74.83%** | **92.28%** |
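Top-1 and Top-5 are the fractions of validation images whose ground-truth class is, respectively, the highest-scoring class or among the five highest-scoring classes. Below is a minimal numpy sketch of that metric. It assumes logits of shape (N, 1000) and integer labels of shape (N,), and it is not Embedl's evaluation script.

```python
import numpy as np


def topk_accuracy(logits: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Fraction of samples whose true label is among the k highest logits."""
    # argpartition places the k largest entries of each row last, without a full sort.
    topk = np.argpartition(logits, -k, axis=1)[:, -k:]
    hits = (topk == labels[:, None]).any(axis=1)  # True where the label is in the top k
    return float(hits.mean())
```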
## Creating Your Own Optimized Models

This artifact was produced with
[embedl-deploy](https://github.com/embedl/embedl-deploy),
Embedl's open-source PyTorch-to-TensorRT deployment library. You can
apply the same workflow to your own models; see
[the documentation](https://github.com/embedl/embedl-deploy#readme)
for installation and usage.
## License

| Component | License |
|---|---|
| Optimized model artifacts (this repo) | [Embedl Models Community Licence v1.0](https://github.com/embedl/embedl-models/blob/main/LICENSE) (no redistribution as a hosted service) |
| Upstream architecture and weights | [Mobilevit Small License](https://huggingface.co/apple/mobilevit-small) |
## Contact

We offer engineering support for on-prem/edge deployments and partner
co-marketing opportunities. Reach out at
[contact@embedl.com](mailto:contact@embedl.com), or open an issue on
[GitHub](https://github.com/embedl/embedl-deploy).