license: other
license_name: embedl-models-community-licence-1.0
license_link: https://github.com/embedl/embedl-models/blob/main/LICENSE
base_model:
  - apple/mobilevit-small
quantized_from:
  - apple/mobilevit-small
tags:
  - image-classification
  - quantization
  - onnx
  - tensorrt
  - edge
  - embedl
gated: true
extra_gated_heading: Access Embedl Mobilevit Small
extra_gated_description: >-
  To access this model, please review and accept the terms below. Your contact
  information is collected solely to manage access and, with your explicit
  consent, to notify you about updated or new optimized models from Embedl.
extra_gated_button_content: Agree and request access
extra_gated_prompt: >-
  By requesting access you agree to the Embedl Models Community Licence and the
  upstream Mobilevit Small License.
extra_gated_fields:
  Company: text
  I agree to the Embedl Models Community Licence and upstream Mobilevit Small License: checkbox
  I consent to being contacted by Embedl about products and services (optional): checkbox

Embedl Mobilevit Small (Quantized for TensorRT)

Deployable INT8-quantized version of apple/mobilevit-small, optimized with embedl-deploy for low-latency NVIDIA TensorRT inference on edge GPUs.

Highlights

Mixed-precision INT8/FP16 quantization with hardware-aware optimizations from embedl-deploy.
Drop-in replacement for apple/mobilevit-small in TensorRT pipelines — same input shape (256×256), same output semantics.
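Validated on ImageNet: Top-1 accuracy is 3.31 pp below the FP32 baseline and Top-5 accuracy is 1.80 pp below it (see Accuracy table below).
On par with trtexec --best in the latency measurements reported here, and 1.17x faster than TensorRT FP16 on NVIDIA Jetson AGX Orin (see Performance table below).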
Includes both ONNX (for TensorRT) and PT2 (torch.export-loadable) artifacts plus runnable inference scripts.

Quick Start

pip install huggingface_hub onnxruntime-gpu pillow numpy
python -c "from huggingface_hub import snapshot_download; snapshot_download('embedl/mobilevit-small-quantized', local_dir='.')"
python infer_trt.py --image path/to/image.jpg   # TensorRT
# or
python infer_pt2.py --image path/to/image.jpg   # pure PyTorch via torch.export
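
The inference scripts are not reproduced in this card. As a minimal sketch of the post-processing step, assuming the model returns one vector of ImageNet class logits per image (the output semantics of apple/mobilevit-small), the logits can be turned into top-k probabilities as shown below. The helper name `top_k` and the toy logits are illustrative and are not taken from `infer_trt.py` or `infer_pt2.py`.

```python
import math


def top_k(logits, k=5):
    """Return the k highest-scoring (class_index, probability) pairs."""
    # Subtract the largest logit before exponentiating for numerical stability.
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by probability, highest first.
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]


# Toy 4-class logits standing in for the model's 1000-class output (hypothetical values).
for idx, prob in top_k([0.1, 2.0, -1.0, 0.5], k=2):
    print(f"class {idx}: {prob:.3f}")
```

Mapping a class index to a human-readable label needs an ImageNet label file, which this sketch leaves out.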

Files

File	Purpose
`embedl_mobilevit_small_int8.onnx`	INT8-quantized ONNX with Q/DQ nodes — feed to TensorRT.
`embedl_mobilevit_small_int8.pt2`	INT8-quantized `torch.export` ExportedProgram.
`infer_trt.py`	Build a TRT engine from the ONNX and run sample inference.
`infer_pt2.py`	Load the `.pt2` with `torch.export.load` and run sample inference.
`latency_comparison.png`	Latency comparison across precisions and devices.

Performance

Latency measured with TensorRT + trtexec, GPU compute time only (--noDataTransfers), CUDA Graph + Spin Wait enabled, clocks locked (nvpmodel -m 0 && jetson_clocks on Jetson). See latency_comparison.png for a visual summary.
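
The measurement conditions above map onto standard trtexec flags. The sketch below only assembles such a command line; it is an assumption about how a comparable run could be set up, not the exact invocation used for the numbers in this card. The helper name `trtexec_command` is hypothetical, and the FP16 and --best baselines would need an ONNX export of the upstream model, which is not part of this repo.

```python
import shlex


def trtexec_command(onnx_path, precision="int8"):
    """Assemble a trtexec argv list for a GPU-compute-only latency run."""
    precision_flags = {
        "fp16": ["--fp16"],
        "best": ["--best"],
        # A Q/DQ ONNX model needs INT8 enabled; FP16 covers the remaining layers.
        "int8": ["--int8", "--fp16"],
    }
    return [
        "trtexec",
        f"--onnx={onnx_path}",
        *precision_flags[precision],
        "--noDataTransfers",  # leave host/device copies out of the timing
        "--useCudaGraph",     # launch inference through a captured CUDA graph
        "--useSpinWait",      # busy-wait for GPU completion instead of blocking
    ]


print(shlex.join(trtexec_command("embedl_mobilevit_small_int8.onnx")))
```

On Jetson, clocks would still need to be locked separately (nvpmodel -m 0 && jetson_clocks) before running the printed command.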

NVIDIA Jetson AGX Orin

Configuration	Mean Latency	Speedup vs FP16
TensorRT FP16	1.28 ms	1.00x
TensorRT --best (unconstrained)	1.09 ms	1.17x
Embedl Deploy INT8	1.09 ms	1.17x
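
The Speedup column is consistent with dividing the FP16 latency by each configuration's latency. A short sketch that recomputes it from the table values (the function name `speedup` is illustrative):

```python
def speedup(baseline_ms, candidate_ms):
    """Speedup of a configuration relative to a baseline latency."""
    return baseline_ms / candidate_ms


# Mean latencies on Jetson AGX Orin, copied from the table above.
latencies_ms = {
    "TensorRT FP16": 1.28,
    "TensorRT --best (unconstrained)": 1.09,
    "Embedl Deploy INT8": 1.09,
}

baseline = latencies_ms["TensorRT FP16"]
for name, latency in latencies_ms.items():
    print(f"{name}: {latency:.2f} ms, {speedup(baseline, latency):.2f}x")
```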

Accuracy

Evaluated on the ImageNet validation split. Relative to the FP32 baseline, the quantized model loses 3.31 pp of Top-1 accuracy and 1.80 pp of Top-5 accuracy.

Model	Top-1	Top-5
`apple/mobilevit-small` FP32 (ours)	78.14%	94.08%
Embedl Mobilevit Small INT8	74.83%	92.28%
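
The accuracy gap can be read off the table directly. As a small sketch, the drop in percentage points is the FP32 figure minus the INT8 figure (the function name `drop_pp` is illustrative):

```python
def drop_pp(baseline_pct, quantized_pct):
    """Accuracy drop in percentage points, rounded to two decimals."""
    return round(baseline_pct - quantized_pct, 2)


# Accuracy figures copied from the table above.
fp32 = {"top1": 78.14, "top5": 94.08}
int8 = {"top1": 74.83, "top5": 92.28}

for metric in ("top1", "top5"):
    print(f"{metric}: {drop_pp(fp32[metric], int8[metric]):.2f} pp drop")
```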

Creating Your Own Optimized Models

This artifact was produced with embedl-deploy, Embedl's open-source PyTorch → TensorRT deployment library. You can apply the same workflow to your own models — see the documentation for installation and usage.

License

Component	License
Optimized model artifacts (this repo)	Embedl Models Community Licence v1.0 — no redistribution as a hosted service
Upstream architecture and weights	Mobilevit Small License

Contact

We offer engineering support for on-prem/edge deployments and partner co-marketing opportunities. Reach out at contact@embedl.com, or open an issue on GitHub.