Upload README.md with huggingface_hub
README.md
CHANGED
@@ -1,9 +1,71 @@
| 1 |
---
|
| 2 |
license: mit
|
| 3 |
---
|
| 4 |
|
| 5 |
-
# NM i AI 2026
|
| 6 |
|
| 7 |
-
|
| 8 |
|
| 9 |
-
|
---
license: mit
tags:
- object-detection
- yolov8
- grocery
- retail
- onnx
datasets:
- custom
pipeline_tag: object-detection
---

# NM i AI 2026: NorgesGruppen Object Detection

Multi-class YOLOv8x detector for 356 grocery product categories on store shelf images.

## Performance

| Method | Leaderboard Score |
|--------|-------------------|
| Multi-scale TTA (640+960+1280 + flip) | **0.9230** |
| Single inference | 0.8922 |

Competition scoring:

## Model Details

- **Architecture:** YOLOv8x (68.5M parameters)
- **Classes:** 356 grocery product categories
- **Training data:** 248 shelf images, 22,731 COCO annotations
- **Training resolution:** 1280px
- **Export format:** ONNX (dynamic input, 262 MB)
- **Inference:** Multi-scale test-time augmentation (TTA) at 640/960/1280px with horizontal flip, fused with Weighted Boxes Fusion (WBF)

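Before predictions from several scales and flips can be fused, they have to share one coordinate frame. The following is a minimal sketch of that step, not the submission's actual code. It assumes each image is stretched to a square `input_size` without letterbox padding, and the helper name `to_normalized` is hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in input pixels


def to_normalized(boxes: List[Box], input_size: int, flipped: bool) -> List[Box]:
    """Map boxes from one TTA pass to [0, 1] coordinates of the unflipped image.

    Assumption of this sketch: the image was stretched to
    input_size x input_size, so dividing by input_size normalizes both axes.
    """
    out = []
    for x1, y1, x2, y2 in boxes:
        nx1, ny1 = x1 / input_size, y1 / input_size
        nx2, ny2 = x2 / input_size, y2 / input_size
        if flipped:
            # A horizontal flip mirrors x, so left and right edges swap.
            nx1, nx2 = 1.0 - nx2, 1.0 - nx1
        out.append((nx1, ny1, nx2, ny2))
    return out


# The same object seen in two passes: 640px unflipped and 1280px flipped.
pass_a = to_normalized([(64.0, 128.0, 320.0, 640.0)], input_size=640, flipped=False)
pass_b = to_normalized([(640.0, 256.0, 1152.0, 1280.0)], input_size=1280, flipped=True)
```

Both passes end up describing the same normalized box, which is the form that `weighted_boxes_fusion` from the ensemble-boxes package expects as input.
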
## Training

- Pretrained on COCO (YOLOv8x), fine-tuned on competition data
- Optimizer: AdamW (lr=0.01, weight_decay=0.0005, cosine LR schedule)
- Augmentation: mosaic, mixup (0.2), copy-paste (0.15), perspective, rotation (±15°)
- 300 epochs at 1280px, batch=2 on NVIDIA A100 40GB
- Model soup: averaged the weights of checkpoints from epochs 240-290 for better generalization

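The model soup step can be illustrated with a small sketch. It assumes a uniform (unweighted) average, which the README does not confirm, and it uses plain dictionaries of float lists as a stand-in for real framework state dicts. The name `uniform_soup` is hypothetical.

```python
from typing import Dict, List

StateDict = Dict[str, List[float]]  # stand-in for a framework state dict


def uniform_soup(checkpoints: List[StateDict]) -> StateDict:
    """Average every parameter element-wise across checkpoints."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    soup: StateDict = {}
    for key in checkpoints[0]:
        # zip(*...) groups the i-th element of this parameter from each checkpoint.
        columns = zip(*(ckpt[key] for ckpt in checkpoints))
        soup[key] = [sum(col) / len(checkpoints) for col in columns]
    return soup


# Three toy "checkpoints" with identical parameter names and shapes.
ckpts = [
    {"conv.weight": [1.0, 2.0], "conv.bias": [0.0]},
    {"conv.weight": [3.0, 4.0], "conv.bias": [1.0]},
    {"conv.weight": [5.0, 6.0], "conv.bias": [2.0]},
]
soup = uniform_soup(ckpts)
```

With real checkpoints the same loop would run over tensors instead of lists; all checkpoints must come from the same architecture so the parameter names and shapes line up.
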
## Submission Contents

contains:
- — YOLOv8x model soup, dynamic input (262 MB)
- — YOLO class → COCO category_id mapping
- — Multi-scale TTA inference pipeline

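The class mapping matters because YOLO predicts contiguous class indices, while COCO-style results refer to `category_id` values from the annotation file. The sketch below shows one way such a mapping could be applied when writing a result record. The mapping values, the helper name `to_coco_result`, and the in-memory dictionary format are all assumptions for illustration; only the result fields (`image_id`, `category_id`, `bbox` as `[x, y, width, height]`, `score`) follow the standard COCO detection results format.

```python
from typing import Dict, Tuple


def to_coco_result(
    image_id: int,
    box_xyxy: Tuple[float, float, float, float],
    score: float,
    yolo_class: int,
    class_to_category: Dict[int, int],
) -> dict:
    """Convert one detection to a COCO results entry."""
    x1, y1, x2, y2 = box_xyxy
    return {
        "image_id": image_id,
        # Translate the contiguous YOLO index to the dataset's category id.
        "category_id": class_to_category[yolo_class],
        # COCO boxes are [top-left x, top-left y, width, height].
        "bbox": [x1, y1, x2 - x1, y2 - y1],
        "score": score,
    }


# Hypothetical mapping: YOLO class 0 -> category 17, class 1 -> category 42.
class_to_category = {0: 17, 1: 42}
result = to_coco_result(7, (10.0, 20.0, 110.0, 220.0), 0.9, 1, class_to_category)
```
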
## Usage

## Sandbox Environment

- GPU: NVIDIA L4, 24 GB VRAM
- Runtime: ~113s for test set (300s timeout)
- Dependencies: onnxruntime-gpu, opencv, numpy, ensemble-boxes

## Key Learnings

1. A single multi-class YOLO model (detection and classification in one step) substantially outperformed a two-stage pipeline (detector + kNN classifier)
2. Multi-scale TTA improved the leaderboard score by about 0.031 (0.8922 → 0.9230) by better detecting small products
3. Model soup (weight averaging) improves generalization
4. A higher validation mAP does NOT predict a better leaderboard score when training on all data
5. Dynamic ONNX export is required for multi-scale inference

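The last point can be made concrete by counting predictions per input size. The sketch below assumes the standard YOLOv8 detection head (strides 8, 16 and 32) and the usual Ultralytics export layout of `(batch, 4 + num_classes, num_predictions)`. Neither is stated in this README, so treat both as assumptions.

```python
from typing import Tuple

STRIDES = (8, 16, 32)  # assumed: standard YOLOv8 detection strides
NUM_CLASSES = 356      # from the model card


def output_shape(input_size: int, batch: int = 1) -> Tuple[int, int, int]:
    """Expected raw output shape for a square input of the given size."""
    if input_size % max(STRIDES) != 0:
        raise ValueError("input size must be a multiple of the largest stride")
    # One prediction per grid cell at each of the three feature-map scales.
    num_preds = sum((input_size // s) ** 2 for s in STRIDES)
    return (batch, 4 + NUM_CLASSES, num_preds)


shapes = {size: output_shape(size) for size in (640, 960, 1280)}
```

Under these assumptions the three TTA scales yield 8400, 18900 and 33600 predictions respectively, so a graph exported with a fixed input shape could serve only one of them.
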
## License

MIT