
# Hierarchical Grocery Detector (hYOLO V4)

A hierarchical object detection model for fine-grained grocery product recognition on retail shelf images. Built for the NorgesGruppen grocery dataset as part of NM i AI 2026 (Norway's National AI Championship).

## Model Description

Standard object detectors assign a single flat class label per detection. On a grocery shelf with hundreds of visually similar products, this fails — the model has no way to express that a misidentified Nescafé variant is a less severe error than misidentifying it as a completely different brand.

This model uses a 4-level hierarchical classification head (hYOLO V4) on top of a YOLOv8x backbone. Each detected product is simultaneously classified at four levels of specificity:

| Level | Description | Classes |
|-------|-------------|---------|
| L0 | Section (e.g. hot drinks, breakfast) | 4 |
| L1 | Format / type (e.g. filter coffee, capsules) | 30 |
| L2 | Brand (e.g. Nescafé, Friele, Evergood) | 149 |
| L3 | Specific product / SKU | 323 |

The hierarchy enforces consistency: a product predicted as `nescafe__kapsler` at L3 must also be predicted as Nescafé at L2, capsules at L1, and hot drinks (varmedrikker) at L0. A parent-consistency penalty during training penalises violations of this rule.
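The consistency rule amounts to an ancestor lookup: every leaf class determines its L0–L2 ancestors. A minimal sketch, with a made-up index table standing in for `hierarchy.json` (whose actual schema may differ):

```python
# Illustrative leaf → ancestor table; the indices are made up for the example.
LEAF_ANCESTORS = {
    42: (0, 3, 17),  # e.g. an L3 SKU under hot drinks / capsules / Nescafé
}

def is_consistent(pred_l0: int, pred_l1: int, pred_l2: int, pred_l3: int) -> bool:
    """True if the four per-level predictions agree with the taxonomy."""
    return (pred_l0, pred_l1, pred_l2) == LEAF_ANCESTORS[pred_l3]
```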

## Architecture

  • Backbone: YOLOv8x pretrained on SKU-110K (generic shelf product detection)
  • Neck: Standard YOLOv8 PANet
  • Detection head: Standard YOLOv8 bbox regression head (DFL + CIoU)
  • Classification head: hYOLO V4 hierarchical head
    • Per-level Conv branches with bidirectional cross-level information flow
    • Each level sees the previous level's prediction as context
    • 4 independent classification outputs (L0–L3)
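As an illustration only (not the actual hYOLO V4 code; channel sizes and the exact cross-level wiring are assumptions), a per-level head where each level's branch is conditioned on the previous level's logits might look like:

```python
import torch
import torch.nn as nn

class HierHeadSketch(nn.Module):
    """Illustrative 4-level classification head: each level's branch sees the
    previous level's logits as extra context channels, so coarse predictions
    inform fine ones (and gradients flow back up through the concatenation)."""

    def __init__(self, in_ch=256, level_sizes=(4, 30, 149, 323)):
        super().__init__()
        self.branches = nn.ModuleList()
        prev = 0
        for n_cls in level_sizes:
            self.branches.append(nn.Sequential(
                nn.Conv2d(in_ch + prev, in_ch, 3, padding=1),
                nn.SiLU(),
                nn.Conv2d(in_ch, n_cls, 1),
            ))
            prev = n_cls  # the next level is conditioned on this level's logits

    def forward(self, feat):
        outs, ctx = [], feat
        for branch in self.branches:
            logits = branch(ctx)
            outs.append(logits)
            # feed this level's logits to the next level as context channels
            ctx = torch.cat([feat, logits], dim=1)
        return outs  # [L0, L1, L2, L3] per-level logit maps
```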

## Training

Pretrained weights: SKU-110K backbone (generic retail shelf detection)

Fine-tuning dataset: NorgesGruppen grocery images

  • 248 training images across 323 leaf product categories
  • 4-level product taxonomy built from NorgesGruppen's product catalogue

Loss function:

  • Per-level BCE with label smoothing (ε=0.1)
  • Inverse-frequency pos_weight for long-tail class balancing
  • Parent-consistency penalty (α=25) penalises hierarchy violations
  • Level loss weights scaled to compensate for bidirectional gradient flow
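A simplified, illustrative sketch of this loss on classification logits (the real implementation operates inside the YOLO loss on assigned anchors, and the argmax-based consistency penalty below is for clarity only):

```python
import torch
import torch.nn.functional as F

def hier_loss_sketch(level_logits, level_targets, pos_weights,
                     parent_of, alpha=25.0, eps=0.1):
    """level_logits / level_targets: per-level (N, C_l) logits and class
    indices; pos_weights: per-level (C_l,) inverse-frequency weights;
    parent_of[l]: tensor mapping a level-l class to its level-(l-1) parent
    (schema assumed for illustration)."""
    loss = torch.zeros(())
    for logits, tgt, pw in zip(level_logits, level_targets, pos_weights):
        one_hot = F.one_hot(tgt, logits.shape[1]).float()
        smooth = one_hot * (1 - eps) + eps / logits.shape[1]  # label smoothing
        loss = loss + F.binary_cross_entropy_with_logits(logits, smooth,
                                                         pos_weight=pw)
    # Parent-consistency penalty, shown on argmax predictions for clarity
    # (non-differentiable as written; the real penalty uses probabilities).
    for l in range(1, len(level_logits)):
        child = level_logits[l].argmax(dim=1)
        parent_pred = level_logits[l - 1].argmax(dim=1)
        mismatch = (parent_of[l][child] != parent_pred).float().mean()
        loss = loss + alpha * mismatch
    return loss
```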

Training procedure:

  • Phase 1 (50 epochs): frozen backbone, classification head only
  • Phase 2 (250 epochs): full fine-tune with gradual backbone unfreeze
  • AdamW optimiser, cosine LR schedule
  • EMA weights tracked; best checkpoint saved by F1Hier
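The two-phase schedule can be sketched as follows (a toy model stands in for the real network; layer split, learning rates, and the single unfreeze step are assumptions, since the actual code unfreezes the backbone gradually):

```python
import torch
import torch.nn as nn

# Toy stand-in: index 0 plays the backbone, index 1 the classification head.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.Conv2d(8, 323, 1))
backbone, head = model[0], model[1]

# Phase 1: freeze the backbone, train the head only.
for p in backbone.parameters():
    p.requires_grad = False
opt = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=50)

# Phase 2: unfreeze the backbone and continue with a fresh optimiser
# (gradual unfreezing approximated as a single switch here).
for p in backbone.parameters():
    p.requires_grad = True
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=250)
```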

Training hardware: NVIDIA A100 (Google Colab Pro+)

## Performance

Evaluated on a held-out validation set using hierarchical F1 (Kiritchenko et al. 2006):

| Metric | Value |
|--------|-------|
| F1Hier | 0.9426 |
| L0 accuracy (section) | 0.972 |
| L1 accuracy (format) | 0.949 |
| L2 accuracy (brand) | 0.926 |
| L3 accuracy (product) | 0.884 |

F1Hier rewards partial credit for correct ancestor predictions — getting the brand right when the specific SKU is wrong scores better than a completely wrong prediction.
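Concretely, hierarchical precision and recall compare the ancestor sets of the predicted and true labels. A sketch of the metric, assuming a helper table that maps each label to itself plus all of its ancestors:

```python
def hier_f1(preds, golds, ancestors):
    """Hierarchical F1 (Kiritchenko et al. 2006). preds / golds are leaf
    labels; ancestors maps a label to the set containing itself and all of
    its ancestors (table schema assumed for illustration)."""
    inter = pred_total = gold_total = 0
    for p, g in zip(preds, golds):
        ap, ag = ancestors[p], ancestors[g]
        inter += len(ap & ag)       # shared nodes earn partial credit
        pred_total += len(ap)
        gold_total += len(ag)
    hp, hr = inter / pred_total, inter / gold_total
    return 2 * hp * hr / (hp + hr)
```

A wrong SKU under the right brand shares three of four hierarchy nodes with the truth, so it scores 0.75 rather than 0.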

## License

MIT

## Model Details

  • Language: Norwegian
  • Task: object detection with hierarchical classification
  • Metrics: F1Hier (val) 0.9426; L3 leaf accuracy (val) 0.884

## ONNX Inference (recommended)

```python
import json

import cv2
import numpy as np
import onnxruntime as ort

# Load session
sess = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# Load L3 → COCO category mapping
with open("l3_to_coco.json") as f:
    l3_to_coco = {int(k): int(v) for k, v in json.load(f).items()}

# Preprocess: letterbox to 1024×1024, scale to [0, 1], HWC → NCHW, FP16
# (top-left padding shown; match the padding convention used at export time)
img = cv2.imread("shelf.jpg")
scale = 1024 / max(img.shape[:2])
resized = cv2.resize(img, (round(img.shape[1] * scale), round(img.shape[0] * scale)))
canvas = np.full((1024, 1024, 3), 114, dtype=np.uint8)  # grey letterbox padding
canvas[: resized.shape[0], : resized.shape[1]] = resized
blob = (canvas[:, :, ::-1].transpose(2, 0, 1)[None] / 255.0).astype(np.float16)

# Inference — output shape: (1, num_anchors, 4 + 323)
output = sess.run(None, {"images": blob})[0]

# output[:, :, :4]  = xyxy boxes in pixel space
# output[:, :, 4:]  = sigmoid leaf class scores (323 classes)
```
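Turning the raw output into detections is then a matter of thresholding, picking the best leaf class per anchor, and mapping it through `l3_to_coco`. A sketch (the confidence threshold is an assumption, and NMS is omitted):

```python
import numpy as np

def decode(output, l3_to_coco, conf_thres=0.25):
    """Turn (1, num_anchors, 4 + C) model output into a list of
    (xyxy box, COCO category id, score) tuples. NMS is left out."""
    preds = output[0]                       # (num_anchors, 4 + C)
    boxes, scores = preds[:, :4], preds[:, 4:]
    leaf = scores.argmax(axis=1)            # best L3 class per anchor
    conf = scores.max(axis=1)
    keep = conf >= conf_thres
    return [(boxes[i], l3_to_coco[int(leaf[i])], float(conf[i]))
            for i in np.flatnonzero(keep)]
```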

## PyTorch Inference

```python
import torch

from hyolo_finetune import HierDetect, Hierarchy, _get_head_inputs

ckpt      = torch.load("best.pt", map_location="cpu", weights_only=False)
model     = ckpt["model"].float().eval()
hierarchy = Hierarchy("hierarchy.json")

# image_tensor: a preprocessed (1, 3, 1024, 1024) float tensor,
# prepared as in the ONNX example above
with torch.no_grad():
    feats              = _get_head_inputs(model, image_tensor)
    box_preds, lvl_cls = model.model[-1].forward_hier(feats)
    # lvl_cls[3] = leaf-level (L3) predictions
    # lvl_cls[2] = brand-level (L2) predictions
```

## Files

| File | Description |
|------|-------------|
| `best.pt` | PyTorch checkpoint (EMA weights, FP16) |
| `model.onnx` | Exported ONNX model (FP16, opset 18) |
| `hierarchy.json` | 4-level product taxonomy |
| `l3_to_coco.json` | L3 index → COCO category ID mapping |
| `hyolo_finetune.py` | Model definition and training code |

## Limitations

  • Trained on Norwegian grocery products — performance on non-Norwegian retailers will be limited
  • 1 out of 323 leaf classes has no training examples (FRIELE FROKOST PRESSKANNE 250G) and cannot be predicted
  • Performance degrades on heavily occluded products and non-shelf contexts
  • Optimised for shelf-level photography; very close-up or overhead angles may reduce accuracy

## Citation

If you use this model, please cite:

```bibtex
@misc{hyolo-v4-grocery-2026,
  title  = {Hierarchical Grocery Detector (hYOLO V4)},
  author = {Ryan Marinelli},
  year   = {2026},
  url    = {https://huggingface.co/zrmarine/hierarchical-grocery-detector}
}
```

## Relationship to hYOLO Paper

The hierarchical classification head in this model is inspired by Tsenkova et al. (arXiv 2510.23278), which proposed hYOLO for hierarchical classification on top of YOLOv8. This implementation extends that work in several ways:

  • Full object detection: the original paper focuses on classification of pre-cropped images. This model adds a complete YOLOv8 detection pipeline — anchor-free bbox regression, DFL loss, Task-Aligned Assigner, and IoU-matched validation — so it localises and classifies products simultaneously from raw shelf images.
  • Bidirectional gradient flow: cross-level information flows both up and down the hierarchy, allowing fine-grained SKU features to refine coarser brand and section predictions.
  • Hierarchy-aware training improvements: per-level BCE loss weighting to compensate for cascaded gradients, inverse-frequency pos_weights for long-tail class balancing, label smoothing, and a parent-consistency penalty with reward.
  • EMA tracking and IoU-matched validation metrics throughout training.

### Citation

If building on the hYOLO architecture, please cite the original paper:

```bibtex
@misc{tsenkova2025hyolo,
  title         = {hYOLO Model: Enhancing Object Classification with Hierarchical Context in YOLOv8},
  author        = {Veska Tsenkova and Peter Stanchev and Daniel Petrov and Deyan Lazarov},
  year          = {2025},
  eprint        = {2510.23278},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url           = {https://arxiv.org/abs/2510.23278}
}
```

## Acknowledgements

  • hYOLO architecture (extended): Tsenkova et al., "hYOLO Model: Enhancing Object Classification with Hierarchical Context in YOLOv8" (arXiv 2510.23278, 2025)
  • YOLOv8 architecture by Ultralytics
  • SKU-110K pretraining dataset: Goldman et al., "Precise Detection in Densely Packed Scenes" (CVPR 2019)
  • Hierarchical F1 metric: Kiritchenko et al., "Functional Annotation of Genes Using Hierarchical Text Categorization" (2006)
  • NM i AI 2026 organised by Astar Technologies