Video-MAE: Optimized for Qualcomm Devices
Video-MAE (Video Masked Autoencoder) is a video classification network built on a ViT (Vision Transformer) backbone.
This model is based on the implementation of Video-MAE found here. This repository contains pre-exported model files optimized for Qualcomm® devices. You can use the Qualcomm® AI Hub Models library to export with custom configurations. More details on model performance across various devices can be found here.
Qualcomm AI Hub Models uses Qualcomm AI Hub Workbench to compile, profile, and evaluate this model. Sign up to run these models on a hosted Qualcomm® device.
Getting Started
There are two ways to deploy this model on your device:
Option 1: Download Pre-Exported Models
Below are pre-exported model assets ready for deployment.
| Runtime | Precision | Chipset | SDK Versions | Download |
|---|---|---|---|---|
| ONNX | float | Universal | QAIRT 2.42, ONNX Runtime 1.24.3 | Download |
| ONNX | w8a16 | Universal | QAIRT 2.42, ONNX Runtime 1.24.3 | Download |
| QNN_DLC | float | Universal | QAIRT 2.43 | Download |
| TFLITE | float | Universal | QAIRT 2.43, TFLite 2.19.1 | Download |
For more device-specific assets and performance metrics, visit Video-MAE on Qualcomm® AI Hub.
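As a rough illustration of how a deployment script might consume the table above, the sketch below encodes its four rows as data and looks up the SDK versions for a given runtime and precision. The `Asset` type, the `ASSETS` list, and the `find_asset` helper are hypothetical names used only for this example and are not part of any Qualcomm library. Download links are not reproduced.

```python
from typing import NamedTuple, Optional


class Asset(NamedTuple):
    runtime: str
    precision: str
    chipset: str
    sdk_versions: str


# Rows copied from the pre-exported assets table above.
ASSETS = [
    Asset("ONNX", "float", "Universal", "QAIRT 2.42, ONNX Runtime 1.24.3"),
    Asset("ONNX", "w8a16", "Universal", "QAIRT 2.42, ONNX Runtime 1.24.3"),
    Asset("QNN_DLC", "float", "Universal", "QAIRT 2.43"),
    Asset("TFLITE", "float", "Universal", "QAIRT 2.43, TFLite 2.19.1"),
]


def find_asset(runtime: str, precision: str) -> Optional[Asset]:
    """Return the matching table row, or None if the combination is not listed."""
    for asset in ASSETS:
        if asset.runtime == runtime.upper() and asset.precision == precision.lower():
            return asset
    return None


if __name__ == "__main__":
    print(find_asset("onnx", "w8a16"))
    print(find_asset("tflite", "w8a16"))  # this combination is not in the table
```

A lookup like this makes it easy to fail early when a requested runtime and precision pair has no pre-exported asset, in which case Option 2 below is the route to take.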
Option 2: Export with Custom Configurations
Use the Qualcomm® AI Hub Models Python library to compile and export the model with your own:
- Custom weights (e.g., fine-tuned checkpoints)
- Custom input shapes
- Target device and runtime configurations
This option is ideal if you need to customize the model beyond the default configuration provided here.
See our repository for Video-MAE on GitHub for usage instructions.
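For orientation only, the sketch below assembles (but does not run) an export command of the kind used with the Qualcomm® AI Hub Models library. The module path `qai_hub_models.models.video_mae.export` and the `--target-runtime` and `--device` flags are assumptions based on the library's usual pattern, and the device name is a placeholder. Check the Video-MAE instructions on GitHub for the exact names before relying on them.

```python
import shlex

# ASSUMPTION: module path and flag names follow the usual Qualcomm AI Hub Models
# pattern. Verify them against the Video-MAE usage instructions on GitHub.
EXPORT_MODULE = "qai_hub_models.models.video_mae.export"


def build_export_command(target_runtime: str, device: str) -> list[str]:
    """Assemble (without executing) an export command for one runtime/device pair."""
    return [
        "python", "-m", EXPORT_MODULE,
        "--target-runtime", target_runtime,
        "--device", device,
    ]


if __name__ == "__main__":
    # "YOUR DEVICE NAME" is a placeholder, not a real device identifier.
    command = build_export_command("tflite", "YOUR DEVICE NAME")
    print(shlex.join(command))
```

Building the command as a list keeps device names with spaces intact. `shlex.join` is used only to print a copy-pasteable form.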
Model Details
Model Type: Video classification
Model Stats:
- Model checkpoint: Kinetics-400
- Input resolution: 224x224
- Number of parameters: 87.7M
- Model size (float): 335 MB
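The float model size follows from the parameter count: 87.7M parameters at 4 bytes each come to roughly 335 MB when "MB" is read as mebibytes (an inference from the numbers, not something the stats state). The short check below reproduces that arithmetic. The 8-bit figure is only a back-of-the-envelope estimate for a w8a16 variant, assuming every weight is stored in one byte and ignoring quantization parameters and any layers kept at higher precision.

```python
NUM_PARAMS = 87.7e6        # "Number of parameters" from Model Stats above
BYTES_PER_MIB = 1024 ** 2


def weights_size_mib(num_params: float, bytes_per_weight: float) -> float:
    """Approximate storage for the weights alone, in MiB."""
    return num_params * bytes_per_weight / BYTES_PER_MIB


float32_mib = weights_size_mib(NUM_PARAMS, 4)  # 32-bit float weights
int8_mib = weights_size_mib(NUM_PARAMS, 1)     # ASSUMPTION: all weights stored as 8-bit

if __name__ == "__main__":
    print(f"float32 weights: {float32_mib:.1f} MiB")
    print(f"8-bit weights (estimate): {int8_mib:.1f} MiB")
```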
Performance Summary
| Model | Runtime | Precision | Chipset | Inference Time | Peak Memory Range | Primary Compute Unit |
|---|---|---|---|---|---|---|
| Video-MAE | ONNX | float | Snapdragon® 8 Elite Gen 5 Mobile | 456.524 ms | 9 - 1085 MB | NPU |
| Video-MAE | ONNX | float | Snapdragon® X2 Elite | 449.272 ms | 187 - 187 MB | NPU |
| Video-MAE | ONNX | float | Snapdragon® X Elite | 586.794 ms | 187 - 187 MB | NPU |
| Video-MAE | ONNX | float | Snapdragon® 8 Gen 3 Mobile | 368.651 ms | 1 - 1260 MB | NPU |
| Video-MAE | ONNX | float | Qualcomm® QCS8550 (Proxy) | 566.739 ms | 0 - 219 MB | NPU |
| Video-MAE | ONNX | float | Qualcomm® QCS9075 | 702.93 ms | 9 - 21 MB | NPU |
| Video-MAE | ONNX | float | Snapdragon® 8 Elite For Galaxy Mobile | 390.742 ms | 1 - 1006 MB | NPU |
| Video-MAE | ONNX | w8a16 | Snapdragon® 8 Elite Gen 5 Mobile | 320.662 ms | 5 - 1071 MB | NPU |
| Video-MAE | ONNX | w8a16 | Snapdragon® X2 Elite | 320.484 ms | 100 - 100 MB | NPU |
| Video-MAE | ONNX | w8a16 | Snapdragon® X Elite | 477.996 ms | 98 - 98 MB | NPU |
| Video-MAE | ONNX | w8a16 | Snapdragon® 8 Gen 3 Mobile | 461.219 ms | 5 - 1266 MB | NPU |
| Video-MAE | ONNX | w8a16 | Qualcomm® QCS6490 | 8924.347 ms | 337 - 353 MB | CPU |
| Video-MAE | ONNX | w8a16 | Qualcomm® QCS8550 (Proxy) | 465.061 ms | 0 - 128 MB | NPU |
| Video-MAE | ONNX | w8a16 | Qualcomm® QCS9075 | 1393.064 ms | 0 - 7 MB | NPU |
| Video-MAE | ONNX | w8a16 | Qualcomm® QCM6690 | 4490.407 ms | 344 - 355 MB | CPU |
| Video-MAE | ONNX | w8a16 | Snapdragon® 8 Elite For Galaxy Mobile | 276.519 ms | 1 - 1046 MB | NPU |
| Video-MAE | ONNX | w8a16 | Snapdragon® 7 Gen 4 Mobile | 4388.447 ms | 313 - 326 MB | CPU |
| Video-MAE | QNN_DLC | float | Snapdragon® 8 Elite Gen 5 Mobile | 330.03 ms | 9 - 963 MB | NPU |
| Video-MAE | QNN_DLC | float | Snapdragon® X2 Elite | 296.63 ms | 9 - 9 MB | NPU |
| Video-MAE | QNN_DLC | float | Snapdragon® X Elite | 472.162 ms | 9 - 9 MB | NPU |
| Video-MAE | QNN_DLC | float | Snapdragon® 8 Gen 3 Mobile | 382.065 ms | 9 - 1167 MB | NPU |
| Video-MAE | QNN_DLC | float | Qualcomm® QCS8275 (Proxy) | 1090.226 ms | 0 - 947 MB | NPU |
| Video-MAE | QNN_DLC | float | Qualcomm® QCS8550 (Proxy) | 450.843 ms | 9 - 11 MB | NPU |
| Video-MAE | QNN_DLC | float | Qualcomm® SA8775P | 496.616 ms | 0 - 972 MB | NPU |
| Video-MAE | QNN_DLC | float | Qualcomm® QCS9075 | 514.726 ms | 11 - 22 MB | NPU |
| Video-MAE | QNN_DLC | float | Qualcomm® QCS8450 (Proxy) | 580.84 ms | 8 - 1064 MB | NPU |
| Video-MAE | QNN_DLC | float | Qualcomm® SA7255P | 1090.226 ms | 0 - 947 MB | NPU |
| Video-MAE | QNN_DLC | float | Qualcomm® SA8295P | 568.854 ms | 0 - 858 MB | NPU |
| Video-MAE | QNN_DLC | float | Snapdragon® 8 Elite For Galaxy Mobile | 260.148 ms | 9 - 969 MB | NPU |
| Video-MAE | TFLITE | float | Snapdragon® 8 Elite Gen 5 Mobile | 51.847 ms | 0 - 934 MB | NPU |
| Video-MAE | TFLITE | float | Snapdragon® 8 Gen 3 Mobile | 92.345 ms | 0 - 1132 MB | NPU |
| Video-MAE | TFLITE | float | Qualcomm® QCS8275 (Proxy) | 421.709 ms | 0 - 923 MB | NPU |
| Video-MAE | TFLITE | float | Qualcomm® QCS8550 (Proxy) | 127.723 ms | 0 - 237 MB | NPU |
| Video-MAE | TFLITE | float | Qualcomm® SA8775P | 152.797 ms | 0 - 924 MB | NPU |
| Video-MAE | TFLITE | float | Qualcomm® QCS9075 | 157.616 ms | 0 - 206 MB | NPU |
| Video-MAE | TFLITE | float | Qualcomm® QCS8450 (Proxy) | 262.805 ms | 0 - 1068 MB | NPU |
| Video-MAE | TFLITE | float | Qualcomm® SA7255P | 421.709 ms | 0 - 923 MB | NPU |
| Video-MAE | TFLITE | float | Qualcomm® SA8295P | 202.006 ms | 0 - 867 MB | NPU |
| Video-MAE | TFLITE | float | Snapdragon® 8 Elite For Galaxy Mobile | 69.86 ms | 0 - 930 MB | NPU |
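To compare runtimes on one chipset, latency can be converted to an approximate throughput and a relative speedup. The sketch below does this for the four Snapdragon® 8 Elite Gen 5 Mobile rows in the table above. The helper names are hypothetical, and the throughput figure assumes back-to-back inferences with no pre- or post-processing overhead, so treat it as an optimistic bound.

```python
# Latencies (ms) copied from the Snapdragon 8 Elite Gen 5 Mobile rows above.
LATENCY_MS = {
    ("ONNX", "float"): 456.524,
    ("ONNX", "w8a16"): 320.662,
    ("QNN_DLC", "float"): 330.03,
    ("TFLITE", "float"): 51.847,
}


def throughput_per_second(latency_ms: float) -> float:
    """Inferences per second, assuming sequential runs with no other overhead."""
    return 1000.0 / latency_ms


def rank_by_latency(latencies):
    """Return ((runtime, precision), latency_ms) pairs sorted fastest first."""
    return sorted(latencies.items(), key=lambda item: item[1])


if __name__ == "__main__":
    ranked = rank_by_latency(LATENCY_MS)
    slowest_ms = ranked[-1][1]
    for (runtime, precision), ms in ranked:
        print(
            f"{runtime:8s} {precision:6s} {ms:8.3f} ms  "
            f"{throughput_per_second(ms):6.2f} inf/s  "
            f"{slowest_ms / ms:5.2f}x vs slowest"
        )
```

The same pattern applies to any other chipset in the table by swapping in its rows.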
License
- The license for the original implementation of Video-MAE can be found here.
References
- Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
- Source Model Implementation
Community
- Join our AI Hub Slack community to collaborate, post questions and learn more about on-device AI.
- For questions or feedback, please reach out to us.
