🚗 Vehicle Detection using Ensemble YOLO + Weighted Boxes Fusion

M.Tech Thesis Project — Optimized vehicle detection using a Self-Adaptive 3-Tier WBF ensemble of YOLO11m + YOLO26m.

📊 Dataset

Source: Vehicle Dataset for YOLO (Kaggle)

| Property | Value |
|---|---|
| Total Images | 3,000 |
| Classes | 6 (balanced ~500 each) |
| Format | YOLO `.txt` (ready to use) |
| Original Split | 2,100 train / 900 valid |
| Pipeline Split | 60% train / 20% val / 20% test |
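
The 60/20/20 pipeline split can be sketched as a seeded shuffle over the pooled train and valid images. This is only an illustration: the function name `resplit`, the seed, and the ID format are assumptions, and the README does not say whether `kaggle_notebook.py` splits at random or stratifies per class.

```python
import random


def resplit(image_ids, ratios=(0.6, 0.2, 0.2), seed=42):
    """Shuffle pooled image IDs and cut them into train/val/test lists.

    Hypothetical helper: a plain random split, not a per-class stratified one.
    """
    ids = sorted(image_ids)            # sort first so the result depends only on the seed
    random.Random(seed).shuffle(ids)   # private RNG, leaves the global random state alone
    n_train = round(len(ids) * ratios[0])
    n_val = round(len(ids) * ratios[1])
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],  # whatever remains goes to the test set
    }


# Placeholder IDs standing in for the 2,100 train + 900 valid images.
pooled = [f"img_{i:04d}" for i in range(3000)]
splits = resplit(pooled)
print({name: len(items) for name, items in splits.items()})
```

With 3,000 pooled images this yields 1,800 / 600 / 600. The image and its `.txt` label file would have to be moved together, which the sketch leaves out.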

Classes

| ID | Class | Description |
|---|---|---|
| 0 | Car | Standard passenger car |
| 1 | Threewheel | Three-wheeler / tuk-tuk / auto-rickshaw |
| 2 | Bus | |
| 3 | Truck | |
| 4 | Motorbike | |
| 5 | Van | |

📝 This dataset is balanced (~500 images per class) — no minority class augmentation needed (unlike DAWN).

📁 Repository Contents

| File | Description |
|---|---|
| `kaggle_notebook.py` | Complete Kaggle-ready pipeline: paste into a notebook and run (~4-6 hrs) |

🏗️ Architecture

Models

YOLO11m (20M params) — Ultralytics YOLO11 medium, COCO pretrained
YOLO26m (20.4M params) — Ultralytics YOLO26 medium, COCO pretrained

Self-Adaptive 3-Tier WBF Ensemble

Input Image
    ├──→ YOLO11m ──→ Detections₁
    ├──→ YOLO11m_ft ──→ Detections₂  
    └──→ YOLO26m_ft ──→ Detections₃
                          │
                    ┌─────┴─────┐
                    │  Tier 1   │  Per-class F1-based weights (static, from val set)
                    │  Tier 2   │  Per-image confidence modulation (dynamic, α=0.1)
                    │  Tier 3   │  Log-dampened count normalization
                    └─────┬─────┘
                          │
                    Weighted Boxes Fusion (iou=0.55, conf_type=box_and_model_avg)
                          │
                    Fused Detections
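
The fusion step at the bottom of the diagram can be sketched in plain Python. This is a simplified rendering of Weighted Boxes Fusion as described by Solovyev et al.: boxes are visited in descending score order, attached to the best-overlapping fused box of the same class, and each cluster is replaced by a confidence-weighted average. It is not the `ensemble_boxes` implementation and it does not reproduce the `box_and_model_avg` confidence mode. The names `iou`, `fuse` and `simple_wbf` are illustrative, and scores are assumed to be strictly positive.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fuse(members):
    """Confidence-weighted mean of the coordinates, plain mean of the scores."""
    total = sum(score for _, score in members)
    box = tuple(sum(b[k] * s for b, s in members) / total for k in range(4))
    return box, total / len(members)


def simple_wbf(detections, n_models, iou_thr=0.55):
    """detections: list of (box, score, label) pooled from all models."""
    clusters = []  # each entry holds: label, members, fused box, fused score
    for box, score, label in sorted(detections, key=lambda d: -d[1]):
        best, best_iou = None, iou_thr
        for c in clusters:
            if c["label"] == label:
                overlap = iou(box, c["box"])
                if overlap > best_iou:
                    best, best_iou = c, overlap
        if best is None:               # no match above the threshold: open a new cluster
            best = {"label": label, "members": []}
            clusters.append(best)
        best["members"].append((box, score))
        best["box"], best["score"] = fuse(best["members"])
    # Down-weight fused boxes that fewer than n_models detections supported.
    return [
        (c["box"], c["score"] * min(len(c["members"]), n_models) / n_models, c["label"])
        for c in clusters
    ]
```

For example, two overlapping class-0 boxes with scores 0.9 and 0.6 from two models fuse into one box with score 0.75, while a box reported by only one of the two models keeps half of its score.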

Tier 1: $w^{base}_{i,j} = \frac{F1_{i,j}}{\sum_{k} F1_{k,j}}$

Tier 2: $\phi_i(I) = 1 + \alpha \cdot (\bar{s}_i - 0.5)$

Tier 3: $\hat{s}_d = \frac{s_d \cdot w^{base}_{i,j} \cdot \phi_i(I)}{\log_2(\max(n_{i,j}, 2))}$
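
The three tiers can be combined into a short score-rescaling sketch. The symbols are not defined in this README, so the reading used here is an assumption: $i$ indexes models, $j$ indexes classes, $\bar{s}_i$ is the mean confidence of model $i$ on image $I$, $s_d$ is the raw score of detection $d$, and $n_{i,j}$ is the number of class-$j$ detections model $i$ produced on that image. The function names and the fallback values for empty inputs are likewise illustrative.

```python
import math

ALPHA = 0.1  # Tier 2 modulation strength, as given in the diagram


def tier1_weights(f1):
    """f1[i][j] = validation F1 of model i on class j -> weights normalized per class."""
    n_models, n_classes = len(f1), len(f1[0])
    weights = [[0.0] * n_classes for _ in range(n_models)]
    for j in range(n_classes):
        total = sum(f1[k][j] for k in range(n_models))
        for i in range(n_models):
            # Assumed fallback: uniform weights if every model has F1 = 0 on class j.
            weights[i][j] = f1[i][j] / total if total > 0 else 1.0 / n_models
    return weights


def tier2_factor(scores, alpha=ALPHA):
    """phi = 1 + alpha * (mean score - 0.5); assumed to be 1.0 when there are no detections."""
    if not scores:
        return 1.0
    return 1.0 + alpha * (sum(scores) / len(scores) - 0.5)


def adapt_scores(detections, base_w, alpha=ALPHA):
    """Rescale (class_id, score) pairs from ONE model on ONE image.

    base_w[j] is that model's Tier 1 weight for class j.
    """
    phi = tier2_factor([s for _, s in detections], alpha)
    counts = {}
    for cls, _ in detections:
        counts[cls] = counts.get(cls, 0) + 1
    return [
        # Tier 3: divide by log2(max(n, 2)), so one or two detections are left undamped.
        (cls, s * base_w[cls] * phi / math.log2(max(counts[cls], 2)))
        for cls, s in detections
    ]
```

As a worked case with made-up numbers: if two models have F1 of 0.9 and 0.6 on a class, the first gets a base weight of 0.6. A mean confidence of 0.8 gives $\phi = 1 + 0.1 \cdot 0.3 = 1.03$, so a detection scored 0.9 becomes $0.9 \cdot 0.6 \cdot 1.03 \approx 0.556$ when the model reports at most two boxes of that class.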

🔧 Pipeline Phases

| Phase | Description | Est. Time |
|---|---|---|
| 1 | Download from Kaggle + re-split (60/20/20) | ~5 min |
| 2 | HP Search (4 trials × 15 epochs × 2 models) | ~45 min |
| 3 | Train YOLO11m (100 epochs) + fine-tune (30 epochs) | ~90 min |
| 4 | Train YOLO26m (100 epochs) + fine-tune (30 epochs) | ~90 min |
| 5 | WBF Ensemble calibration | ~15 min |
| 6 | Full evaluation (individual + ensemble) | ~15 min |
| 7 | Save all results | ~1 min |

🚀 Quick Start (Kaggle)

# 1. Create Kaggle Notebook → GPU T4 x2
# 2. Paste kaggle_notebook.py contents into a cell
# 3. Run → results in /kaggle/working/results/

Output structure:

/kaggle/working/results/
├── all_results.json          # Complete metrics
├── ensemble_config.json      # WBF weights (loadable)
├── hp_search/
│   ├── yolo11m.json          # HP search results
│   └── yolo26m.json
└── weights/
    ├── yolo11m_ft_best.pt    # Fine-tuned YOLO11m
    ├── yolo11m_best.pt       # Base YOLO11m
    └── yolo26m_ft_best.pt    # Fine-tuned YOLO26m

🐛 Bug Fixes (from DAWN pipeline)

All fixes from the previous DAWN pipeline are pre-applied:

| Bug | Fix |
|---|---|
| `NoneType` HP search crash | Null-check + bounded HP + cache clearing |
| `_thread.lock` pickle error | JSON save/load (no YOLO serialization) |
| Ensemble metric drop | `log2` normalization, no max-renorm, `α=0.1` |
| No test split | Auto-creates 60/20/20 from combined train+valid |
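
The JSON fix for the `_thread.lock` pickle error can be illustrated with a small round trip: only plain floats and lists are written, so no model object (and none of the locks it may hold) is ever serialized. The schema below is hypothetical. The actual layout of `ensemble_config.json` is not shown in this README, and the helper names and weight values are made up for the example.

```python
import json
import os
import tempfile


def save_ensemble_config(path, base_weights, alpha, iou_thr):
    """Store ensemble settings as plain JSON (hypothetical schema)."""
    config = {
        "alpha": float(alpha),
        "iou_thr": float(iou_thr),
        # model name -> per-class weights, converted to built-in floats
        "base_weights": {name: [float(w) for w in ws] for name, ws in base_weights.items()},
    }
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(config, fh, indent=2)


def load_ensemble_config(path):
    """Read the settings back as a dict of built-in types."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


# Round trip through a temporary file; all values are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = os.path.join(tmp, "ensemble_config.json")
    save_ensemble_config(
        cfg_path,
        {"yolo11m_ft": [0.5, 0.6], "yolo26m_ft": [0.5, 0.4]},
        alpha=0.1,
        iou_thr=0.55,
    )
    restored = load_ensemble_config(cfg_path)
```

The models themselves would then be reloaded from the `.pt` files under `weights/` rather than from the config.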

📖 Citation

@misc{vehicle-yolo-wbf-2026,
  title={Optimized Vehicle Detection using Ensemble YOLO and Weighted Boxes Fusion},
  author={AmeenAktharT},
  year={2026},
  note={YOLO11m + YOLO26m with Self-Adaptive 3-Tier WBF Ensemble}
}

@article{solovyev2021wbf,
  title={Weighted boxes fusion: Ensembling boxes from different object detection models},
  author={Solovyev, Roman and Wang, Weimin and Gabruseva, Tatiana},
  journal={Image and Vision Computing},
  year={2021}
}
