BirdVision — EfficientNet-V2-S Bird Species Classifier

Fine-tuned EfficientNet-V2-S for bird species classification across 237 North American species (Northeast / Long Island focus).

Part of the BirdVision project — real-time bird species identification from video using a Raspberry Pi 5 + Hailo-8 AI accelerator.

Model details

  • Base model: EfficientNet-V2-S (ImageNet-1K pretrained)
  • Input: 224×224 RGB, ImageNet normalization
  • Output: 237-class logits (pre-softmax scores)
  • Training data: iNaturalist research-grade observations, New York State
  • Training images: ~94,800 photos across 237 species
  • Validation top-1 accuracy: 80.7%
  • Validation top-5 accuracy: 94.0%
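Top-1 and top-5 are standard top-k accuracies: a prediction counts as correct if the true species is among the k highest-scoring classes. The card does not describe how the validation split was built. The following is a minimal sketch of the metric itself, assuming an (N, 237) array of logits and N integer labels. `topk_accuracy` is a hypothetical helper, not part of this repo.

```python
import numpy as np

def topk_accuracy(logits: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes.

    logits: (N, C) array of per-class scores; labels: (N,) integer class indices.
    """
    # Column indices of the k largest scores in each row (order within the k is irrelevant).
    topk = np.argpartition(logits, -k, axis=1)[:, -k:]
    # A row is a hit if any of its top-k indices equals the true label.
    hits = (topk == labels[:, None]).any(axis=1)
    return float(hits.mean())

if __name__ == "__main__":
    # Toy batch: 3 samples, 4 classes (made-up numbers, not model outputs).
    logits = np.array([
        [2.0, 0.5, 0.1, -1.0],  # true class 0 ranks 1st
        [0.2, 0.1, 1.5, 0.9],   # true class 3 ranks 2nd
        [0.3, 0.4, 0.2, 0.1],   # true class 3 ranks last
    ])
    labels = np.array([0, 3, 3])
    print("top-1:", topk_accuracy(logits, labels, 1))
    print("top-2:", topk_accuracy(logits, labels, 2))
```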

Training

Two-phase fine-tuning on an NVIDIA RTX 3080 Ti:

  • Phase 1 (5 epochs, head only): frozen backbone, LR=1e-3
  • Phase 2 (15 epochs, full): all layers unfrozen, LR=5e-5, cosine annealing
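The schedule above can be written out per epoch. The following is a sketch under stated assumptions: the LR is stepped once per epoch, phase 2 anneals toward a minimum of 0, and the cosine formula is the closed form documented for PyTorch's `CosineAnnealingLR` with `T_max` equal to the phase-2 length. `learning_rate`, `trainable` and `ETA_MIN` are hypothetical names, not taken from the BirdVision training code.

```python
import math

# Epoch counts and LRs come from the card; ETA_MIN = 0 and per-epoch stepping are assumptions.
PHASE1_EPOCHS, PHASE1_LR = 5, 1e-3
PHASE2_EPOCHS, PHASE2_LR = 15, 5e-5
ETA_MIN = 0.0

def learning_rate(epoch: int) -> float:
    """LR for a 0-indexed epoch under the two-phase schedule (hypothetical helper)."""
    if epoch < PHASE1_EPOCHS:
        # Phase 1: backbone frozen, only the classifier head trains, constant LR.
        return PHASE1_LR
    t = epoch - PHASE1_EPOCHS
    if t >= PHASE2_EPOCHS:
        raise ValueError("epoch is beyond the 20-epoch schedule")
    # Phase 2: cosine annealing from PHASE2_LR toward ETA_MIN over PHASE2_EPOCHS epochs.
    return ETA_MIN + 0.5 * (PHASE2_LR - ETA_MIN) * (1 + math.cos(math.pi * t / PHASE2_EPOCHS))

def trainable(epoch: int) -> str:
    """Which parameters are unfrozen in a given epoch."""
    return "head" if epoch < PHASE1_EPOCHS else "all"

if __name__ == "__main__":
    for epoch in range(PHASE1_EPOCHS + PHASE2_EPOCHS):
        print(f"epoch {epoch:2d}  train={trainable(epoch):4s}  lr={learning_rate(epoch):.2e}")
```

The jump from 1e-3 to 5e-5 at the phase boundary is deliberate in this kind of schedule. Once the pretrained backbone is unfrozen, a much smaller LR limits how far its ImageNet features drift.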

Augmentation: random resized crop, horizontal flip, rotation ±20°, color jitter.
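The training pipeline itself is not published with the card. The following Pillow-only sketch shows what a single random draw of these four augmentations could look like. The crop scale (0.08 to 1.0) and log-uniform aspect ratio (3/4 to 4/3) mirror torchvision's `RandomResizedCrop` defaults. The ±20% jitter strengths are assumptions, not the values used for this model. `augment` is a hypothetical helper.

```python
import math
import random
from typing import Optional

from PIL import Image, ImageEnhance, ImageOps

def augment(img: Image.Image, size: int = 224, rng: Optional[random.Random] = None) -> Image.Image:
    """Apply one random crop/flip/rotate/jitter draw (hypothetical helper)."""
    rng = rng or random.Random()
    w, h = img.size
    # Random resized crop: sample area fraction and aspect ratio, retry if the box does not fit.
    for _ in range(10):
        area = w * h * rng.uniform(0.08, 1.0)                             # assumed scale range
        aspect = math.exp(rng.uniform(math.log(3 / 4), math.log(4 / 3)))  # assumed ratio range
        cw = int(round(math.sqrt(area * aspect)))
        ch = int(round(math.sqrt(area / aspect)))
        if 0 < cw <= w and 0 < ch <= h:
            left, top = rng.randint(0, w - cw), rng.randint(0, h - ch)
            img = img.crop((left, top, left + cw, top + ch))
            break
    # Simplification: if no box fit after 10 tries, the whole image is resized.
    img = img.resize((size, size))
    # Horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        img = ImageOps.mirror(img)
    # Rotation by a uniform angle in [-20, 20] degrees; corners are filled with black.
    img = img.rotate(rng.uniform(-20.0, 20.0))
    # Colour jitter: brightness, contrast and saturation each scaled by a factor in [0.8, 1.2] (assumed).
    for enhancer in (ImageEnhance.Brightness, ImageEnhance.Contrast, ImageEnhance.Color):
        img = enhancer(img).enhance(rng.uniform(0.8, 1.2))
    return img
```

Passing a seeded `random.Random` makes a draw reproducible, which helps when visually checking that crops still contain the bird.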

Usage

import json
import numpy as np
import onnxruntime as ort
from PIL import Image
from huggingface_hub import hf_hub_download

# Load model and labels
onnx_path = hf_hub_download("k10z/birdvision-efficientnet-s", "efficientnet_s_birds.onnx")
labels_path = hf_hub_download("k10z/birdvision-efficientnet-s", "species_labels.json")

session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
with open(labels_path) as f:
    species = json.load(f)

# Preprocess image (224×224, ImageNet normalization)
def preprocess(image_path):
    img = Image.open(image_path).convert("RGB").resize((224, 224))
    arr = np.array(img, dtype=np.float32) / 255.0
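    # float32 constants keep the tensor float32 (NumPy would otherwise upcast to float64,
    # which ONNX Runtime rejects for a float32 model input)
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std  = np.array([0.229, 0.224, 0.225], dtype=np.float32)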
    arr = (arr - mean) / std
    return arr.transpose(2, 0, 1)[None]  # NCHW

# Run inference
input_name = session.get_inputs()[0].name  # read the input name instead of hard-coding it
logits = session.run(None, {input_name: preprocess("bird.jpg")})[0][0]
top5 = np.argsort(logits)[::-1][:5]
for i in top5:
    print(f"{species[i]:40s} {logits[i]:.3f}")
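The loop above prints raw scores, which are not probabilities. If the exported model outputs pre-softmax logits, as the variable name suggests (check this against your own outputs), a numerically stable softmax converts them into per-species confidences that sum to 1. `softmax` and `top_k` are illustrative helpers, not part of this repo.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Convert a 1-D vector of logits into probabilities."""
    # Subtracting the max avoids overflow in exp() and leaves the result unchanged.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def top_k(probs: np.ndarray, k: int = 5) -> list:
    """Return (class_index, probability) pairs for the k most likely classes."""
    idx = np.argsort(probs)[::-1][:k]
    return [(int(i), float(probs[i])) for i in idx]

if __name__ == "__main__":
    # Made-up logits for illustration only.
    demo = np.array([2.0, 1.0, 0.1, -1.5])
    for i, p in top_k(softmax(demo), k=3):
        print(f"class {i}: {p:.1%}")
```

With the variables from the usage example, `for i, p in top_k(softmax(logits)): print(f"{species[i]:40s} {p:.1%}")` lists the same five species in the same order, because softmax is monotonic. Each line then shows a percentage instead of a raw score.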

Species list

237 species — Northeast North America focus (Long Island / NY area). See species_labels.json for the full list.

Hailo-8 HEF (Raspberry Pi 5)

A compiled efficientnet_s_birds.hef for the Hailo-8 AI accelerator is included in this repo.

Benchmark on Raspberry Pi 5 (HailoRT 4.23.0):

  • 22.3 FPS hardware throughput
  • 43.7 ms hardware latency
  • 4 contexts, 8 clusters
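The two headline numbers are related. The short calculation below turns them into a per-frame time budget and, via Little's law (average items in flight = throughput × latency), into the average number of frames the accelerator holds at once. The 30 fps camera rate is a hypothetical example, not a BirdVision setting.

```python
# Benchmark figures from the card (Raspberry Pi 5, HailoRT 4.23.0).
FPS = 22.3          # hardware throughput, frames per second
LATENCY_MS = 43.7   # hardware latency, milliseconds

# Average time between completed frames at the measured throughput.
frame_interval_ms = 1000.0 / FPS

# Little's law: average frames in flight = throughput x latency (in seconds).
frames_in_flight = FPS * LATENCY_MS / 1000.0

# Share of a hypothetical 30 fps camera stream that could be classified frame by frame.
CAMERA_FPS = 30.0
fraction_classified = min(1.0, FPS / CAMERA_FPS)

print(f"frame interval:   {frame_interval_ms:.1f} ms")
print(f"frames in flight: {frames_in_flight:.2f}")
print(f"share of a {CAMERA_FPS:.0f} fps stream: {fraction_classified:.0%}")
```

This works out to about 44.8 ms between frames and about 0.97 frames in flight. Throughput is therefore close to 1 / latency: in this measurement the accelerator effectively processes one frame at a time, so a stream faster than ~22 fps would need frame skipping.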

License

Model weights derived from iNaturalist training data licensed CC BY-NC 4.0 — non-commercial use only.
