---
license: mit
pipeline_tag: object-detection
library_name: ultralytics
datasets:
  - LogoDet-3K
tags:
  - yolo
  - yolo11
  - logo-detection
  - logodet-3k
  - brandspotter
  - brand-detection
  - sports-broadcasting
  - dooh
  - computer-vision
---

# BrandSpotter: Logo Detection and Brand Identification for Sports Broadcasting

BrandSpotter is a multi-stage computer vision pipeline built to detect and
identify brand logos in broadcast video, with applications in sponsor visibility
measurement and digital out-of-home (DOOH) advertising analytics.

**Problem:** Broadcasters and sponsors need to quantify how often, and how
clearly, brand logos appear on screen during live sports. This requires
detecting logos at broadcast speed, classifying them by brand, and handling
real-world challenges like motion blur, partial occlusion, camera angle
variation, and lighting washout.

**Approach:** A three-stage pipeline:

1. **YOLO11m** for single-class logo region detection (this repo)
2. **ResNet50** for brand classification with open-set rejection (coming soon)
3. **Frame-level analytics** for dwell-time measurement and visibility scoring
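
The analytics stage is not published yet, so the following is only a minimal sketch of how dwell time could be derived from per-frame brand labels. The function name `dwell_time_seconds`, its input layout, and the brand names are hypothetical and are not taken from the BrandSpotter code.

```python
from collections import defaultdict


def dwell_time_seconds(frame_brands, fps):
    """Return seconds on screen per brand.

    frame_brands: one collection of brand names per video frame
                  (hypothetical input, e.g. classifier output per frame).
    fps: frames per second of the source video.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    frames_seen = defaultdict(int)
    for brands in frame_brands:
        for brand in set(brands):  # count each brand at most once per frame
            frames_seen[brand] += 1
    return {brand: count / fps for brand, count in frames_seen.items()}


# Four frames of a 2 fps clip with made-up brand names
frames = [{"acme"}, {"acme", "globex"}, set(), {"acme"}]
dwell = dwell_time_seconds(frames, fps=2.0)
print(dwell)
```

Counting frames rather than detections keeps a brand that appears twice in one frame from being double-counted.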

Source code: [github.com/daa2618/brandspotter](https://github.com/daa2618/brandspotter)

## Models

### YOLO11m: Logo Detection (`yolo/`)

Fine-tuned YOLO11m for single-class logo detection on
[LogoDet-3K](https://github.com/Wangjing1551/LogoDet-3K-Dataset).

| Metric         | Value     |
|----------------|-----------|
| mAP@0.5        | **0.894** |
| mAP@0.5:0.95   | **0.639** |
| Precision      | 0.829     |
| Recall         | 0.863     |
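
For readers new to these metrics: mAP@0.5 counts a predicted box as correct when its intersection over union (IoU) with a ground-truth box is at least 0.5, and mAP@0.5:0.95 averages over IoU thresholds from 0.50 to 0.95 in steps of 0.05. A minimal IoU sketch with illustrative boxes (not taken from the evaluation set):

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


pred = (10, 10, 110, 60)    # illustrative predicted box
truth = (20, 10, 120, 60)   # illustrative ground-truth box
overlap = iou(pred, truth)
print(f"IoU = {overlap:.3f}")
```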

**Training configuration:**

- Base model: `yolo11m.pt` (COCO-pretrained)
- Epochs: 50 (best checkpoint at epoch 47)
- Image size: 640x640
- Optimizer: AdamW (auto-selected)
- Learning rate: 0.001
- Batch size: auto
- Hardware: Google Colab T4 GPU (~2 hours)
- Dataset: LogoDet-3K (158,652 images, 3,000 classes collapsed to a single "logo" class)
- Augmentation: mosaic, RandAugment, erasing (0.4), horizontal flip (0.5)
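
As a rough guide, the settings above might map onto an Ultralytics training call as sketched below. This mapping is an assumption: `yolo/args.yaml` is the authoritative record, the `data_yaml` path is hypothetical, and `single_cls=True` is only one possible way to collapse the classes (the labels may equally have been remapped beforehand). The learning rate is left out because the card describes the optimizer as auto-selected.

```python
# Assumed mapping of the listed settings onto Ultralytics train() arguments.
TRAIN_ARGS = {
    "epochs": 50,
    "imgsz": 640,
    "optimizer": "auto",           # the card reports AdamW was auto-selected
    "batch": -1,                   # -1 asks Ultralytics to choose a batch size
    "erasing": 0.4,
    "fliplr": 0.5,
    "auto_augment": "randaugment",
    "single_cls": True,            # assumption: treat every class as "logo"
}


def train(data_yaml):
    """Fine-tune yolo11m.pt; data_yaml is a hypothetical dataset config path."""
    # Imported lazily so TRAIN_ARGS can be inspected without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO("yolo11m.pt")
    return model.train(data=data_yaml, **TRAIN_ARGS)
```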

**Design rationale:** A single-class detector is meant to maximise recall across
all logo types and leaves brand-specific classification to the downstream
ResNet stage. Because the detector only learns what a logo region looks like,
it can generalise to unseen brands without retraining.
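
One way the hand-off between the two stages could look: each detected box is padded slightly and clipped to the image before the crop is passed to the classifier. The helper `padded_crop_box` and the 10% padding are illustrative assumptions, not part of the published pipeline.

```python
def padded_crop_box(box, image_size, pad=0.1):
    """Expand an (x1, y1, x2, y2) box by `pad` of its size and clip to the image."""
    x1, y1, x2, y2 = box
    width, height = image_size
    dx = (x2 - x1) * pad
    dy = (y2 - y1) * pad
    return (
        max(0, int(round(x1 - dx))),
        max(0, int(round(y1 - dy))),
        min(width, int(round(x2 + dx))),
        min(height, int(round(y2 + dy))),
    )


# A detection near the right edge of a 320x240 frame
crop = padded_crop_box((100, 50, 300, 150), image_size=(320, 240))
print(crop)
```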

### ResNet50: Brand Classification (`resnet/`)

*Coming soon.* A fine-tuned ResNet50 classifier trained on a curated
brand dictionary with open-set rejection for unknown or novel logos.
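
The rejection method has not been published. For orientation only, a common baseline for open-set rejection is to threshold the maximum softmax probability, sketched here with hypothetical labels and an arbitrary threshold of 0.5; the released classifier may work differently.

```python
import math


def classify_with_rejection(logits, labels, threshold=0.5):
    """Return the top label, or "unknown" if its softmax probability is below threshold."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]  # subtract max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best] if probs[best] >= threshold else "unknown"


labels = ["acme", "globex", "initech"]  # hypothetical brand dictionary
confident = classify_with_rejection([4.0, 0.5, 0.2], labels)
uncertain = classify_with_rejection([1.0, 0.9, 1.1], labels)
print(confident, uncertain)
```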

## Quick Start

```python
from ultralytics import YOLO

# Load directly from HuggingFace
model = YOLO("hf://vectorized-dev/brandspotter/yolo/best.pt")

# Run inference
results = model("path/to/image.jpg")
results[0].show()
```

## Repository Contents

```
yolo/
  best.pt       # Trained weights (best checkpoint, ~39 MB)
  args.yaml     # Full training arguments
  results.csv   # Per-epoch training metrics
```
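
A small sketch for inspecting `results.csv`, assuming it follows the usual Ultralytics layout with an `epoch` column and a `metrics/mAP50-95(B)` column (both column names are assumptions here, and the sample rows are made up). Ranking by a single metric only approximates how the best checkpoint is chosen during training.

```python
import csv
import io


def best_epoch(csv_file, column="metrics/mAP50-95(B)"):
    """Return (epoch, value) for the row with the highest value in `column`."""
    reader = csv.DictReader(csv_file, skipinitialspace=True)
    best = None
    for row in reader:
        # Some Ultralytics versions pad column names and values with spaces.
        row = {key.strip(): value.strip() for key, value in row.items()}
        score = float(row[column])
        if best is None or score > best[1]:
            best = (int(float(row["epoch"])), score)
    return best


# Tiny stand-in with made-up numbers; for the real file use
# open("yolo/results.csv", newline="") instead.
sample = io.StringIO(
    "epoch,metrics/mAP50(B),metrics/mAP50-95(B)\n"
    "1,0.50,0.30\n"
    "2,0.70,0.45\n"
    "3,0.68,0.44\n"
)
result = best_epoch(sample)
print(result)
```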

## Roadmap

- [ ] ResNet50 brand classifier weights and evaluation
- [ ] Open-set rejection threshold calibration
- [ ] End-to-end inference script (detect + classify + dwell-time)
- [ ] Sample results on sports broadcast footage
- [ ] Dataset card for curated brand dictionary

## Dataset

[LogoDet-3K](https://github.com/Wangjing1551/LogoDet-3K-Dataset) (Wang et al.,
ACM TOMM 2022). 158,652 images across 3,000 logo classes. The detection model
treats all logos as a single class for region proposal; brand identification is
handled downstream.
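
The class collapse can be illustrated as a label conversion. The sketch below assumes Pascal VOC-style XML annotations and writes every object as class `0` in YOLO format; the function name and the sample annotation are hypothetical, and the actual preprocessing used for this model may differ.

```python
import xml.etree.ElementTree as ET


def voc_to_single_class_yolo(xml_text):
    """Convert one VOC-style annotation to YOLO lines, mapping every object to class 0."""
    root = ET.fromstring(xml_text)
    width = float(root.findtext("size/width"))
    height = float(root.findtext("size/height"))
    lines = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        x1, y1, x2, y2 = (
            float(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        # YOLO format: class x_center y_center width height, normalised to [0, 1]
        lines.append(
            f"0 {(x1 + x2) / 2 / width:.6f} {(y1 + y2) / 2 / height:.6f} "
            f"{(x2 - x1) / width:.6f} {(y2 - y1) / height:.6f}"
        )
    return lines


# Made-up annotation: the brand name is discarded, only the box is kept
example = """<annotation>
  <size><width>200</width><height>100</height></size>
  <object><name>acme</name>
    <bndbox><xmin>50</xmin><ymin>20</ymin><xmax>150</xmax><ymax>80</ymax></bndbox>
  </object>
</annotation>"""
yolo_lines = voc_to_single_class_yolo(example)
print(yolo_lines)
```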

## Citation

```bibtex
@article{wang2022logodet3k,
  title={LogoDet-3K: A Large-scale Image Dataset for Logo Detection},
  author={Wang, Jing and Min, Weiqing and Hou, Sujuan and Ma, Shengnan and Zheng, Yuanjie and Jiang, Shuqiang},
  journal={ACM Transactions on Multimedia Computing, Communications, and Applications},
  volume={18},
  number={3},
  year={2022},
  publisher={ACM}
}
```

## License

MIT