Perch V2 — Optimized TFLite Models for Raspberry Pi

Optimized variants of Google's Perch V2 bird vocalization classifier for edge deployment on Raspberry Pi and ARM64 devices.

Three model variants converted directly from the official Google SavedModel, each targeting a different performance/quality trade-off.

Models

| Model | Size | Inference (RPi 5) | Embedding cosine | Top-1 agree | Top-5 agree | Best for |
|---|---|---|---|---|---|---|
| perch_v2_original.tflite | 409 MB | 435 ms | baseline | baseline | baseline | Reference / high-RAM devices |
| perch_v2_fp16.tflite | 205 MB | 384 ms | 0.9999 | 100% | 99% | RPi 5 (recommended) |
| perch_v2_dynint8.tflite | 105 MB | 299 ms | 0.9927 | 93% | 90% | RPi 4 / low-RAM devices |

Benchmarked on a Raspberry Pi 5 (8 GB, Cortex-A76 @ 2.4 GHz), 20 real bird recordings from 20 species, 5 runs each, 4 threads.

Quick Start

Choose your model

  • RPi 5 (4-8 GB): Use perch_v2_fp16.tflite — near-perfect accuracy, 2x smaller than original
  • RPi 4 (2-4 GB): Use perch_v2_dynint8.tflite — 4x smaller, 31% faster, very good accuracy
  • Desktop / reference: Use perch_v2_original.tflite — exact Google baseline

Usage

# Works with ai-edge-litert, tflite-runtime, or tensorflow
from ai_edge_litert.interpreter import Interpreter
import numpy as np

model_path = "perch_v2_fp16.tflite"  # or dynint8, or original
interpreter = Interpreter(model_path=model_path, num_threads=4)
interpreter.allocate_tensors()

inp = interpreter.get_input_details()
out = interpreter.get_output_details()

# Input: 5 seconds of audio at 32 kHz
audio = np.zeros((1, 160000), dtype=np.float32)  # replace with real audio
interpreter.set_tensor(inp[0]["index"], audio)
interpreter.invoke()

# Get species logits (14,795 classes)
logits = interpreter.get_tensor(out[3]["index"])[0]
top_species = np.argsort(logits)[-5:][::-1]
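To turn the raw logits into per-species confidence scores, you can apply a softmax and look up names in the bundled labels.txt. The helper below is a sketch; the one-class-name-per-line format of labels.txt is an assumption, so check the file before relying on it:

```python
import numpy as np

def top_k_species(logits, labels, k=5):
    """Softmax the logits and return the k highest-scoring (label, prob) pairs."""
    z = logits - logits.max()                # shift for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    idx = np.argsort(probs)[-k:][::-1]       # indices of the k largest, best first
    return [(labels[i], float(probs[i])) for i in idx]

# labels = open("labels.txt").read().splitlines()  # assumed format: one name per line
# print(top_k_species(logits, labels))
```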

Download a single model

from huggingface_hub import hf_hub_download

# Download only the model you need
model_path = hf_hub_download(
    "ernensbjorn/perch-v2-int8-tflite",
    "perch_v2_fp16.tflite"
)

Model Details

Architecture

  • Backbone: EfficientNet-B3 (~12M params for embeddings)
  • Classification head: 91M params (101.8M total)
  • Input: 5.0 seconds @ 32,000 Hz = 160,000 float32 samples
  • Outputs:
    • Index 0: Spatial embeddings (16 x 4 x 1536)
    • Index 1: Temporal features
    • Index 2: 1536-dim global embedding
    • Index 3: 14,795 species logits (use this for classification)
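Assuming the output tensors appear in the index order listed above (worth confirming against the shape field of get_output_details() on your build), the global embedding and the species logits can be pulled out like this:

```python
def embedding_and_logits(interpreter):
    """Return (1536-dim embedding, species logits) after interpreter.invoke().

    Assumes the documented output ordering: index 2 is the global embedding,
    index 3 the logits. Verify via get_output_details() if your converter
    version differs.
    """
    out = interpreter.get_output_details()
    embedding = interpreter.get_tensor(out[2]["index"])[0]  # shape (1536,)
    logits = interpreter.get_tensor(out[3]["index"])[0]     # shape (14795,)
    return embedding, logits
```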

Species Coverage

10,340 bird species + frogs, insects, mammals (14,795 total classes).

Use the included labels.txt for class names and bird_indices.json to filter bird-only species.
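A minimal sketch of bird-only filtering, assuming bird_indices.json contains a JSON list of integer class indices (check the file's actual structure): mask every non-bird class to -inf so argmax/argsort only surface birds.

```python
import json
import numpy as np

def birds_only(logits, bird_indices):
    """Return a copy of logits with all non-bird classes masked to -inf."""
    masked = np.full_like(logits, -np.inf)
    masked[bird_indices] = logits[bird_indices]
    return masked

# bird_indices = json.load(open("bird_indices.json"))  # assumed: list of ints
# top_bird = np.argmax(birds_only(logits, bird_indices))
```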

Quantization Methods

| Variant | Method | What's quantized | File size reduction |
|---|---|---|---|
| original | None (float32 baseline) | Nothing | 1x |
| fp16 | TFLite float16 quantization | Weights stored as float16, dequantized at runtime | 2x smaller |
| dynint8 | TFLite dynamic range quantization | Weights quantized to int8, activations remain float32 | 4x smaller |

All variants were converted directly from the official Google Perch V2 SavedModel using tf.lite.TFLiteConverter with appropriate optimization flags. No binary patching or post-hoc manipulation.
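The recipe is the standard TFLiteConverter flow; a sketch under the stated assumptions (paths are placeholders, and the SavedModel must be fetched separately — this is not the exact conversion script used for these files):

```python
def convert(saved_model_dir, out_path, mode="fp16"):
    """Convert a SavedModel to a quantized TFLite flatbuffer.

    mode="fp16"    -> float16 weight quantization (~2x smaller)
    mode="dynint8" -> dynamic-range int8 weight quantization (~4x smaller)
    """
    import tensorflow as tf  # imported lazily so the module loads without TF

    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    if mode == "fp16":
        converter.optimizations = [tf.lite.Optimize.DEFAULT]
        converter.target_spec.supported_types = [tf.float16]
    elif mode == "dynint8":
        # Optimize.DEFAULT without a representative dataset yields
        # dynamic range quantization (int8 weights, float32 activations).
        converter.optimizations = [tf.lite.Optimize.DEFAULT]
    with open(out_path, "wb") as f:
        f.write(converter.convert())

# convert("perch_v2_savedmodel/", "perch_v2_fp16.tflite", mode="fp16")
```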

Detailed Benchmarks

Raspberry Pi 5 (8 GB, Cortex-A76 @ 2.4 GHz, 4 threads)

| Model | Size | p50 latency | p95 latency | Embedding cosine (mean) | Embedding cosine (min) | Top-1 | Top-5 |
|---|---|---|---|---|---|---|---|
| original | 409 MB | 435 ms | 534 ms | baseline | baseline | baseline | baseline |
| fp16 | 205 MB | 384 ms | 477 ms | 0.999994 | 0.999991 | 100% | 99% |
| dynint8 | 105 MB | 299 ms | 405 ms | 0.992748 | 0.972732 | 93% | 90% |

  • Embedding cosine: Cosine similarity of the 1536-dim embedding vector vs the float32 baseline. Values > 0.99 indicate negligible quality loss for downstream tasks.
  • Top-1/Top-5 agreement: How often the quantized model's top predicted species matches the original's prediction.
  • Test data: 20 real field recordings from 20 species (European Robin, Eurasian Curlew, Redwing, Eurasian Teal, Water Rail, etc.)
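These metrics are easy to reproduce on your own recordings. A sketch, assuming you already have the reference and quantized models' outputs per clip (the top-k convention below — quantized top-1 within reference top-k — is one reasonable reading of the agreement definition above):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def topk_agree(logits_ref, logits_q, k=1):
    """True if the quantized model's top class is in the reference top-k."""
    top_q = int(np.argmax(logits_q))
    ref_topk = np.argsort(logits_ref)[-k:]
    return top_q in ref_topk
```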

Raspberry Pi 4 Estimates

The RPi 4 (Cortex-A72 @ 1.8 GHz) is roughly 2-3x slower than the RPi 5. Expected latencies:

| Model | Estimated p50 | RAM needed |
|---|---|---|
| original | ~1000-1300 ms | ~500 MB |
| fp16 | ~900-1150 ms | ~300 MB |
| dynint8 | ~700-900 ms | ~150 MB |

For RPi 4 with 2 GB RAM, dynint8 is strongly recommended.
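To see where your own device lands, a minimal latency probe (percentiles computed the same way as in the tables above; the warmup run and run count are choices, not part of the published benchmark setup):

```python
import time
import numpy as np

def measure_latency(run_once, n_runs=5, warmup=1):
    """Time repeated invocations and return (p50, p95) in milliseconds."""
    for _ in range(warmup):
        run_once()                           # discard cold-start runs
    times = []
    for _ in range(n_runs):
        t0 = time.perf_counter()
        run_once()
        times.append((time.perf_counter() - t0) * 1000.0)
    return np.percentile(times, 50), np.percentile(times, 95)

# p50, p95 = measure_latency(interpreter.invoke)  # after set_tensor(...)
```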

Origin

Converted from the official Google Perch V2 SavedModel (hosted by Google researcher cgeorgiaw on HuggingFace).

Created as part of the Birdash project — an open-source bird detection dashboard and engine for Raspberry Pi.

License

Apache 2.0 (same as the original Perch V2 model by Google)

Citation

If you use these models, please cite the original Perch V2 work:

@article{ghani2023global,
  title={Global birdsong embeddings enable superior transfer learning for bioacoustic classification},
  author={Ghani, Burooj and Denton, Tom and Kahl, Stefan and Klinck, Holger},
  journal={Scientific Reports},
  year={2023}
}