--- license: apache-2.0 library_name: onnx tags: - face-detection - face-landmark - mediapipe - blazeface - onnx base_model: google/mediapipe pipeline_tag: object-detection --- # MediaPipe Face — Detection + 6-point Landmarks (ONNX) Commercial-clean face pipeline bundling both precision variants of Google's MediaPipe Face Detection + Face Landmark models. Apache-2.0 throughout the entire chain: Google MediaPipe → zmurez/MediaPipePyTorch port → Qualcomm AI Hub ONNX export. Two-stage pipeline: 1. **Face detector** (BlazeFace-derived) — finds face bounding boxes 2. **Face landmark detector** — for each detected face, returns 6 keypoints (left eye, right eye, nose tip, mouth, left eye tragion, right eye tragion) Re-hosted under Heliosoph as a single bundled repo for convenience — float + int8 variants live in separate subfolders. Credit: Google MediaPipe team (original models), Zak Murez (PyTorch port), Qualcomm AI Hub (ONNX export). ## What this repo contains ``` float/ # fp32 — recommended default face_detector.onnx # 78 KB graph face_detector.data # 517 KB external weights face_landmark_detector.onnx # 58 KB graph face_landmark_detector.data # 2.4 MB external weights metadata.json int8/ # W8A8 — quantized, smaller/faster face_detector.onnx face_detector.data face_landmark_detector.onnx face_landmark_detector.data metadata.json LICENSE README.md ``` **Important: external weights pattern.** Each `.onnx` file is paired with a `.data` file holding the actual tensor weights. Both files must be in the same directory at load time — ONNX Runtime resolves the `.data` file by relative path from the `.onnx`. ## How to use ```python import onnxruntime as ort import numpy as np # Stage 1: detect faces detector = ort.InferenceSession("float/face_detector.onnx") # Input: 128×128 RGB, normalized to [-1, 1] detections = detector.run(None, {"image": preprocessed_128x128_image}) # Stage 2: landmark each detected face landmarker = ort.InferenceSession("float/face_landmark_detector.onnx") # Input: 192×192 RGB crop around each detected face landmarks = landmarker.run(None, {"image": face_crop_192x192}) ``` Reference preprocessing + decoding: [zmurez/MediaPipePyTorch](https://github.com/zmurez/MediaPipePyTorch) has the canonical Python implementation. ## float vs int8 — which to pick | Variant | Size | Best for | |---|---|---| | `float/` (this default) | ~3 MB | GPU, max accuracy. Recommended general default. | | `int8/` | ~1.5 MB | CPU, NPU (OpenVINO), mobile. Some accuracy loss on small/distant faces; near-identical on close portraits. | Catalog entries: `mediapipe-face` (float) and `mediapipe-face-int8` (int8). Both reference this single repo with different `include` patterns. ## Why MediaPipe over alternatives - **vs InsightFace SCRFD** — SCRFD's released weights are non-commercial-research-only (WIDER FACE dataset terms). MediaPipe was trained by Google on commercial-friendly data and released under permissive terms. - **vs YuNet** — YuNet is technically also encumbered by WIDER FACE; upstream just doesn't surface that. MediaPipe is unambiguous. - **vs YOLOv8-Face** — Ultralytics AGPL-3.0. MediaPipe is Apache-2.0. If you need higher accuracy on small faces in dense scenes (crowd photos, surveillance angles), MediaPipe will underperform RetinaFace-class detectors. For the common case (close portraits, video conferencing, photo tagging), MediaPipe is the right default. ## License **Apache-2.0** — same as upstream (Google MediaPipe). `LICENSE` file included; chain of attribution is documented above.