BirdVision — EfficientNet-V2-S Bird Species Classifier

Fine-tuned EfficientNet-V2-S for bird species classification across 237 North American species (Northeast / Long Island focus).

Part of the BirdVision project — real-time bird species identification from video using a Raspberry Pi 5 + Hailo-8 AI accelerator.

Model details


Base model	EfficientNet-V2-S (ImageNet-1K pretrained)
Input	224×224 RGB, ImageNet normalization
Output	237-class logits (pre-softmax scores; apply softmax to get probabilities)
Training data	iNaturalist research-grade observations, New York state
Training images	~94,800 photos across 237 species
Val top-1 accuracy	80.7%
Val top-5 accuracy	94.0%
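
For readers unfamiliar with the two validation metrics above, the following is a minimal sketch of how top-1 and top-5 accuracy are conventionally computed. The function name `top_k_accuracy` and the toy scores are illustrative assumptions, not the evaluation code used for this model.

```python
import numpy as np

def top_k_accuracy(logits: np.ndarray, targets: np.ndarray, k: int = 1) -> float:
    """Fraction of rows whose true class is among the k highest-scoring classes."""
    # Indices of the k largest scores in each row (their order within the k is irrelevant).
    top_k = np.argpartition(logits, -k, axis=1)[:, -k:]
    # A row counts as a hit if any of its k indices equals the true label.
    hits = (top_k == targets[:, None]).any(axis=1)
    return float(hits.mean())

# Toy example: 3 samples, 4 classes (hypothetical scores, not model output).
scores = np.array([
    [0.1, 2.0, 0.3, 0.0],  # true class 1 has the highest score
    [1.5, 0.2, 1.0, 0.1],  # true class 2 has the second-highest score
    [0.0, 0.1, 0.2, 3.0],  # true class 0 has the lowest score
])
labels = np.array([1, 2, 0])

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
print(f"top-1: {top1:.3f}  top-2: {top2:.3f}")
```

With 237 classes, the same function with `k=5` gives the top-5 figure: a prediction counts as correct when the true species appears anywhere in the five highest-scoring classes.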

Training

Two-phase fine-tune on an NVIDIA RTX 3080 Ti:

Phase 1 (5 epochs, head only): frozen backbone, LR=1e-3
Phase 2 (15 epochs, full): all layers unfrozen, LR=5e-5, cosine annealing

Augmentation: random resized crop, horizontal flip, rotation ±20°, color jitter.
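
The learning-rate schedule implied by the two phases can be sketched in plain Python. This is an illustration under stated assumptions, not the actual training script: it assumes a constant LR in phase 1, per-epoch stepping, a cosine period equal to the 15 phase-2 epochs, and a minimum LR of 0. The helper name `cosine_lr` is hypothetical.

```python
import math

def cosine_lr(epoch: int, total_epochs: int, base_lr: float, min_lr: float = 0.0) -> float:
    """Cosine-annealed learning rate for a 0-based epoch index."""
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Values taken from the phase descriptions above.
PHASE1_EPOCHS, PHASE1_LR = 5, 1e-3    # head only, backbone frozen
PHASE2_EPOCHS, PHASE2_LR = 15, 5e-5   # all layers unfrozen

# Assumption: phase 1 keeps a constant LR; phase 2 decays along a half cosine.
schedule = [PHASE1_LR] * PHASE1_EPOCHS
schedule += [cosine_lr(e, PHASE2_EPOCHS, PHASE2_LR) for e in range(PHASE2_EPOCHS)]

for epoch, lr in enumerate(schedule, start=1):
    phase = 1 if epoch <= PHASE1_EPOCHS else 2
    print(f"epoch {epoch:2d}  phase {phase}  lr {lr:.2e}")
```

The much smaller phase-2 rate reflects the usual fine-tuning practice of updating pretrained backbone weights more cautiously than a freshly initialized head.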

Usage

import json
import numpy as np
import onnxruntime as ort
from PIL import Image
from huggingface_hub import hf_hub_download

# Load model and labels
onnx_path = hf_hub_download("k10z/birdvision-efficientnet-s", "efficientnet_s_birds.onnx")
labels_path = hf_hub_download("k10z/birdvision-efficientnet-s", "species_labels.json")

session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
# Read the label list with a context manager so the file handle is closed
with open(labels_path, encoding="utf-8") as f:
    species = json.load(f)

# Preprocess image (224×224, ImageNet normalization)
def preprocess(image_path):
    img = Image.open(image_path).convert("RGB").resize((224, 224))
    arr = np.array(img, dtype=np.float32) / 255.0
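    # float32 constants keep the result float32; float64 defaults would promote
    # arr to float64, which a float32 ONNX input rejects
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std  = np.array([0.229, 0.224, 0.225], dtype=np.float32)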
    arr = (arr - mean) / std
    return arr.transpose(2, 0, 1)[None]  # NCHW

# Run inference
logits = session.run(None, {"input": preprocess("bird.jpg")})[0][0]
top5 = np.argsort(logits)[::-1][:5]
for i in top5:
    print(f"{species[i]:40s} {logits[i]:.3f}")
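
The snippet above prints raw scores. If probabilities are preferred, a softmax can be applied first. The sketch below assumes the ONNX output is unnormalized logits, as the variable name in the usage code suggests; `softmax`, `top_k`, and the demo labels are hypothetical names used only for illustration.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)  # subtract the max to avoid overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum()

def top_k(logits, labels, k=5):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(np.asarray(logits, dtype=np.float64))
    order = np.argsort(probs)[::-1][:k]
    return [(labels[int(i)], float(probs[i])) for i in order]

# Hypothetical three-class example; in practice pass the 237 logits and the species list.
demo_labels = ["species_a", "species_b", "species_c"]
demo_logits = np.array([2.0, 0.5, -1.0])

for name, prob in top_k(demo_logits, demo_labels, k=2):
    print(f"{name:12s} {prob:.3f}")
```

Softmax preserves the ranking, so the top-5 species are the same either way; only the displayed scores change.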

Species list

237 species — Northeast North America focus (Long Island / NY area). See species_labels.json for the full list.

Hailo-8 HEF (Raspberry Pi 5)

A compiled efficientnet_s_birds.hef for the Hailo-8 AI accelerator is included in this repo.

Benchmark on Raspberry Pi 5 (HailoRT 4.23.0):

22.3 FPS hardware throughput
43.7 ms hardware latency
4 contexts, 8 clusters

License

Model weights derived from iNaturalist training data licensed CC BY-NC 4.0 — non-commercial use only.

Reference

EfficientNetV2: Smaller Models and Faster Training (arXiv:2104.00298, published Apr 1, 2021)