Paper: EfficientNetV2: Smaller Models and Faster Training (arXiv 2104.00298)
Fine-tuned EfficientNet-V2-S for bird species classification across 237 North American species (Northeast / Long Island focus).
Part of the BirdVision project — real-time bird species identification from video using a Raspberry Pi 5 + Hailo-8 AI accelerator.
| Property | Value |
|---|---|
| Base model | EfficientNet-V2-S (ImageNet-1K pretrained) |
| Input | 224×224 RGB, ImageNet normalization |
| Output | 237-class logits (one raw score per species; apply softmax for probabilities) |
| Training data | iNaturalist research-grade observations, New York state |
| Training images | ~94,800 photos across 237 species |
| Val top-1 accuracy | 80.7% |
| Val top-5 accuracy | 94.0% |
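The top-1 and top-5 figures are standard top-k accuracies on the validation split: a prediction counts as correct if the true species is among the k highest-scoring classes. The sketch below assumes the validation logits have been collected into an `(N, 237)` NumPy array with integer labels. `topk_accuracy` is a hypothetical helper for illustration and is not part of this repo.

```python
import numpy as np


def topk_accuracy(logits: np.ndarray, labels: np.ndarray, k: int = 1) -> float:
    """Fraction of rows whose true label is among the k highest-scoring classes.

    logits: (N, C) array of class scores; labels: (N,) array of int class indices.
    """
    # Indices of the k largest scores per row (order within the top k is irrelevant).
    topk = np.argpartition(logits, -k, axis=1)[:, -k:]
    hits = (topk == labels[:, None]).any(axis=1)
    return float(hits.mean())


# Toy example: 3 samples, 4 classes.
logits = np.array([
    [2.0, 1.0, 0.1, 0.0],  # true class 0 is ranked 1st
    [0.2, 0.1, 3.0, 0.5],  # true class 3 is ranked 2nd
    [1.0, 2.0, 0.5, 3.0],  # true class 0 is ranked 3rd
])
labels = np.array([0, 3, 0])
print(topk_accuracy(logits, labels, k=1))  # 1 of 3 correct
print(topk_accuracy(logits, labels, k=2))  # 2 of 3 correct
```

`np.argpartition` avoids a full sort, which matters little for 237 classes but keeps the helper cheap on large validation sets.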
Two-phase fine-tune on an NVIDIA RTX 3080 Ti:
Augmentation: random resized crop, horizontal flip, rotation ±20°, color jitter.
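The augmentation list names the transforms but not their parameters. Below is a minimal Pillow sketch of one pipeline with those four steps. The crop-area range (0.08 to 1.0) and aspect-ratio range (3/4 to 4/3) are borrowed from torchvision's `RandomResizedCrop` defaults, and the jitter strength of 0.2 is an assumption. None of these are the settings actually used to train this model. Hue jitter is omitted. `augment` and `random_resized_crop` are hypothetical helper names.

```python
import math
import random

from PIL import Image, ImageEnhance, ImageOps

SIZE = 224                 # output resolution, matching the model input
SCALE = (0.08, 1.0)        # ASSUMED crop-area range (torchvision default)
RATIO = (3 / 4, 4 / 3)     # ASSUMED aspect-ratio range (torchvision default)
MAX_ROTATION = 20          # degrees, from the model card
JITTER = 0.2               # ASSUMED brightness/contrast/saturation strength


def random_resized_crop(img: Image.Image, rng: random.Random) -> Image.Image:
    """Crop a patch of random area and aspect ratio, then resize it to SIZE x SIZE."""
    w, h = img.size
    log_ratio = (math.log(RATIO[0]), math.log(RATIO[1]))
    for _ in range(10):
        target_area = w * h * rng.uniform(*SCALE)
        aspect = math.exp(rng.uniform(*log_ratio))
        cw = int(round(math.sqrt(target_area * aspect)))
        ch = int(round(math.sqrt(target_area / aspect)))
        if 0 < cw <= w and 0 < ch <= h:
            left = rng.randint(0, w - cw)
            top = rng.randint(0, h - ch)
            return img.resize((SIZE, SIZE), Image.BILINEAR,
                              box=(left, top, left + cw, top + ch))
    # Fallback: centre crop of the largest square.
    side = min(w, h)
    left, top = (w - side) // 2, (h - side) // 2
    return img.resize((SIZE, SIZE), Image.BILINEAR,
                      box=(left, top, left + side, top + side))


def augment(img: Image.Image, rng: random.Random) -> Image.Image:
    """Random resized crop, horizontal flip, rotation of up to 20 degrees, colour jitter."""
    img = random_resized_crop(img.convert("RGB"), rng)
    if rng.random() < 0.5:
        img = ImageOps.mirror(img)  # horizontal flip
    # Rotation keeps the canvas size; exposed corners are filled with black.
    img = img.rotate(rng.uniform(-MAX_ROTATION, MAX_ROTATION), resample=Image.BILINEAR)
    # ImageEnhance.Color adjusts saturation; a factor of 1.0 leaves the image unchanged.
    for enhancer in (ImageEnhance.Brightness, ImageEnhance.Contrast, ImageEnhance.Color):
        img = enhancer(img).enhance(rng.uniform(1 - JITTER, 1 + JITTER))
    return img


rng = random.Random(0)
sample = Image.new("RGB", (640, 480), (120, 160, 90))
out = augment(sample, rng)
print(out.size, out.mode)  # (224, 224) RGB
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible per worker, which is convenient when debugging a data loader.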
```python
import json

import numpy as np
import onnxruntime as ort
from huggingface_hub import hf_hub_download
from PIL import Image

# Load model and labels
REPO_ID = "k10z/birdvision-efficientnet-s"
onnx_path = hf_hub_download(REPO_ID, "efficientnet_s_birds.onnx")
labels_path = hf_hub_download(REPO_ID, "species_labels.json")

session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name  # read the input name rather than hard-coding it
with open(labels_path) as f:
    species = json.load(f)  # class index -> species name

# Preprocess image (224×224, ImageNet normalization)
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(image_path):
    img = Image.open(image_path).convert("RGB").resize((224, 224))
    arr = np.asarray(img, dtype=np.float32) / 255.0
    arr = (arr - MEAN) / STD  # float32 constants keep the array float32 (float64 is rejected)
    return arr.transpose(2, 0, 1)[None]  # HWC -> NCHW, batch of 1

# Run inference
logits = session.run(None, {input_name: preprocess("bird.jpg")})[0][0]
top5 = np.argsort(logits)[::-1][:5]
for i in top5:
    print(f"{species[i]:40s} {logits[i]:.3f}")
```
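The inference loop prints raw logits, which are not probabilities and are not directly comparable across images. If a confidence score is wanted, a numerically stable softmax can be applied to the 237-element output first. This is a minimal sketch; `softmax` is a hypothetical helper and does not ship with the model.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Convert a 1-D vector of logits into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtracting the max avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum()


# Toy 3-class example; with the real model, pass the 237-element `logits` vector.
probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs.round(3))  # [0.659 0.242 0.099]
```

Softmax preserves the ranking of the logits, so the top-5 indices do not change; only the printed scores become probabilities.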
237 species with a Northeast North America focus (Long Island / NY area).
See `species_labels.json` for the full list.
A compiled `efficientnet_s_birds.hef` for the Hailo-8 AI accelerator is included in this repo.
Benchmark on Raspberry Pi 5 (HailoRT 4.23.0):
Model weights derived from iNaturalist training data licensed CC BY-NC 4.0 — non-commercial use only.