🧾 Model Card: CivicAi-YOLO11m-v1
🧠 Model Overview
PotholeNet-YOLO11m-v1 is an object detection model fine-tuned from the Ultralytics YOLO11m architecture to detect potholes, road damage, and garbage in street-level imagery. YOLO11m includes the C2PSA (Cross-Stage Partial with Spatial Attention) block, which is intended to help the model localize irregularly shaped urban defects such as potholes.
Trained on a large-scale, curated civic infrastructure dataset of 23,000+ street-level images from Indian urban environments, this model is designed to power real-time civic issue detection systems, enabling automated reporting and faster municipal response.
It serves as the Detection Layer (Layer 1) of the Aamchi City AI Civic System, an end-to-end intelligent dashboard for urban infrastructure monitoring.
Training Details
| Parameter | Value |
|---|---|
| Base Model | yolo11m.pt (COCO pretrained) |
| Architecture | YOLO11m (C3k2 + C2PSA Spatial Attention) |
| Framework | Ultralytics 8.x (YOLO11 requires 8.3.0 or later) |
| Training Hardware | Kaggle, 2× NVIDIA T4 (dual GPU) |
| Epochs | 50 |
| Input Resolution | 768×768 |
| Batch Size | Auto (batch=-1) |
| Optimizer | AdamW |
| Learning Rate | lr0=0.001, cosine decay to lr0 × lrf with lrf=0.01 (final LR 1e-5) |
| Warmup | 3 epochs |
| Weight Decay | 0.0005 |
| AMP | Enabled (FP16 mixed precision) |
| Early Stopping | patience=10 (did not trigger; the model was still improving) |
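The learning-rate row can be made concrete with a small sketch. It assumes the Ultralytics convention that `lrf` is a fraction of `lr0` rather than an absolute rate, and it ignores the 3 warmup epochs; `cosine_lr` is a hypothetical helper name, not part of any library.

```python
import math

LR0 = 0.001   # initial learning rate from the table
LRF = 0.01    # final LR as a fraction of LR0 (assumed Ultralytics convention)
EPOCHS = 50


def cosine_lr(epoch: int, lr0: float = LR0, lrf: float = LRF, epochs: int = EPOCHS) -> float:
    """Cosine decay from lr0 at epoch 0 down to lr0 * lrf at the last epoch (warmup ignored)."""
    factor = lrf + (1.0 - lrf) * (1.0 + math.cos(math.pi * epoch / epochs)) / 2.0
    return lr0 * factor


if __name__ == "__main__":
    for e in (0, 10, 25, 40, 50):
        print(f"epoch {e:2d}: lr = {cosine_lr(e):.6f}")
```

Under these assumptions the rate starts at 1e-3 and ends at 1e-5, which is why the `lrf` value should not be read as the final learning rate itself.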
Loss Weights
| Loss | Weight |
|---|---|
| Box Loss | 7.5 |
| Classification Loss | 1.0 |
| DFL Loss | 1.5 |
Augmentation Pipeline
| Augmentation | Value |
|---|---|
| Mosaic | 1.0 |
| MixUp | 0.15 |
| Copy-Paste | 0.1 |
| HSV (H/S/V) | 0.015 / 0.7 / 0.4 |
| Rotation | ±10° |
| Scale | 0.5 |
| Shear | 2.0 |
| Horizontal Flip | 0.5 |
| Erasing | 0.3 |
| Label Smoothing | 0.05 |
| Close Mosaic | Last 8 epochs |
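For readers who want to reproduce a similar run, the three tables above could be expressed as Ultralytics training arguments roughly as follows. This is a sketch, not the author's actual training script: the mapping from table rows to argument names is an interpretation, `civic.yaml` is a hypothetical dataset config path, and GPU selection is left to the environment.

```python
# Training arguments assembled from the Training Details, Loss Weights,
# and Augmentation Pipeline tables (argument names are an interpretation).
TRAIN_ARGS = dict(
    epochs=50,
    imgsz=768,
    batch=-1,              # auto batch size
    optimizer="AdamW",
    lr0=0.001,
    lrf=0.01,              # final LR fraction
    cos_lr=True,           # cosine decay
    warmup_epochs=3,
    weight_decay=0.0005,
    amp=True,
    patience=10,
    # Loss weights
    box=7.5,
    cls=1.0,
    dfl=1.5,
    # Augmentation
    mosaic=1.0,
    mixup=0.15,
    copy_paste=0.1,
    hsv_h=0.015,
    hsv_s=0.7,
    hsv_v=0.4,
    degrees=10,
    scale=0.5,
    shear=2.0,
    fliplr=0.5,
    erasing=0.3,
    label_smoothing=0.05,  # newer Ultralytics releases may ignore or reject this key; drop it if so
    close_mosaic=8,
)


def train(data_yaml: str = "civic.yaml"):
    """Fine-tune yolo11m.pt with the settings above (data_yaml is a hypothetical path)."""
    from ultralytics import YOLO  # imported lazily so the config can be inspected without the package

    model = YOLO("yolo11m.pt")
    return model.train(data=data_yaml, **TRAIN_ARGS)
```

Calling `train()` requires the `ultralytics` package and a dataset YAML that lists the three classes.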
Dataset Description
The model was trained on a curated subset of 23,179 street-level images collected from Indian urban environments. The dataset underwent extensive preprocessing:
- Perceptual Hash (pHash) Deduplication: Removed near-duplicate images using Hamming distance ≤ 4
- Corrupt Image Removal: Verified all images via PIL
- Intelligent Negative Sampling: Trimmed empty-label (background) images to 2,000 hard negatives
- Stratified Split: 80% Train / 15% Val / 5% Test, stratified by dominant class
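The deduplication step can be illustrated with a minimal sketch. It assumes each image already has a 64-bit perceptual hash stored as an integer (for example, computed with a pHash library), and it uses a simple greedy pass that keeps the first image of each near-duplicate group. The function names are hypothetical, and the author's actual pipeline may differ.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def deduplicate(hashes: dict[str, int], max_distance: int = 4) -> list[str]:
    """Keep an image only if it is more than max_distance bits away from every kept image.

    Greedy and O(n^2); fine for a sketch, slow for tens of thousands of images.
    """
    kept: list[str] = []
    for name, h in hashes.items():
        if all(hamming(h, hashes[k]) > max_distance for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    demo = {"a.jpg": 0b0, "b.jpg": 0b1111, "c.jpg": 0b11111}
    # b.jpg is 4 bits from a.jpg (dropped); c.jpg is 5 bits away (kept)
    print(deduplicate(demo))
```

For a dataset of this size, a production version would likely index the hashes (for example with a BK-tree) instead of comparing every pair.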
Label Classes
| Class ID | Class Name | Description |
|---|---|---|
| 🔴 0 | Pothole | Road surface cavities and depressions |
| 🟡 1 | Road Damage | Cracks, surface wear, and structural deterioration |
| 🟢 2 | Garbage | Street-level waste and debris accumulation |
Priority: Pothole (primary) > Garbage > Road Damage
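The card does not define "dominant class" for the stratified split. One plausible reading, sketched below, is that an image's dominant class is the highest-priority class present in its labels, with empty-label images forming their own background group. `dominant_class` and `stratified_split` are hypothetical helpers, and this interpretation is an assumption rather than the author's confirmed method.

```python
import random
from collections import defaultdict

PRIORITY = [0, 2, 1]  # pothole > garbage > road damage, per the priority line above


def dominant_class(class_ids: list[int]) -> int:
    """Highest-priority class present in an image; -1 for background (no labels)."""
    present = set(class_ids)
    for cls in PRIORITY:
        if cls in present:
            return cls
    return -1


def stratified_split(labels: dict[str, list[int]], ratios=(0.80, 0.15, 0.05), seed: int = 0):
    """Split image names into train/val/test so each dominant-class group keeps the same ratios."""
    groups = defaultdict(list)
    for name, ids in labels.items():
        groups[dominant_class(ids)].append(name)

    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for names in groups.values():
        names.sort()
        rng.shuffle(names)
        n_train = round(len(names) * ratios[0])
        n_val = round(len(names) * ratios[1])
        splits["train"] += names[:n_train]
        splits["val"] += names[n_train:n_train + n_val]
        splits["test"] += names[n_train + n_val:]  # remainder goes to test
    return splits
```

Splitting within each group, rather than over the whole dataset, keeps rare classes represented in the smaller validation and test sets.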
🎯 Evaluation Metrics
| Metric | Score |
|---|---|
| mAP50 | 0.86 |
| mAP50-95 | Not reported |
| Parameters | ~20M |
| Model Size | ~39 MB |
| Inference Speed | Real-time on GPU |
⚡ Early stopping did not trigger within the 50 epochs, which suggests that further training could yield additional performance gains.
Example Usage
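The metrics in the table could be recomputed, including the unreported mAP50-95, with the Ultralytics validation API. The sketch below assumes a dataset YAML at the hypothetical path `civic.yaml` that defines a `test` split. It shows how such numbers can be obtained, not how the reported score was produced.

```python
def evaluate(weights: str = "best.pt", data_yaml: str = "civic.yaml", imgsz: int = 768) -> dict:
    """Run Ultralytics validation on the test split and return the two headline mAP values."""
    from ultralytics import YOLO  # imported lazily; requires the ultralytics package

    metrics = YOLO(weights).val(data=data_yaml, imgsz=imgsz, split="test")
    return {
        "mAP50": metrics.box.map50,   # mAP at IoU 0.50
        "mAP50-95": metrics.box.map,  # mAP averaged over IoU 0.50 to 0.95
    }
```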
Python (Ultralytics)
from ultralytics import YOLO

# Load the fine-tuned weights
model = YOLO("best.pt")

# Run inference at the training resolution
results = model("street_image.jpg", imgsz=768, conf=0.25)

# Display the annotated image
results[0].show()

# Map class IDs to names (matches the Label Classes table)
class_names = {0: "pothole", 1: "road_damage", 2: "garbage"}

# Access detections
for box in results[0].boxes:
    cls = int(box.cls)
    conf = float(box.conf)
    xyxy = box.xyxy[0].tolist()  # [x1, y1, x2, y2] in pixels
    print(f"{class_names[cls]}: {conf:.2f} at {xyxy}")
With Test-Time Augmentation (TTA)
# TTA can raise mAP (roughly +1-3%) at the cost of slower inference
results = model("street_image.jpg", imgsz=768, conf=0.25, augment=True)
Filter Pothole-Only Detections
# Keep only class 0 (pothole) using a boolean mask over the predicted class IDs
results = model("street_image.jpg", imgsz=768, conf=0.25)
boxes = results[0].boxes
pothole_mask = boxes.cls == 0
pothole_boxes = boxes[pothole_mask]
print(f"Found {len(pothole_boxes)} potholes")
🧩 Intended Use
- Real-time pothole detection from dashcam, mobile phone, or street-view imagery
- Automated civic issue reporting: GPS-tagged detection for municipal dashboards
- Infrastructure health monitoring: Severity scoring and trend analysis for road maintenance
- Smart city integration: Layer 1 detection input for AI-driven civic action systems
- Mobile deployment: Exportable to ONNX for edge inference on mobile devices
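The ONNX export mentioned in the last item can be done through the Ultralytics export API. The sketch below assumes the weights file is named `best.pt`, as in the usage examples; `export_onnx` is a hypothetical wrapper name.

```python
def export_onnx(weights: str = "best.pt", imgsz: int = 768) -> str:
    """Export the model to ONNX at the training resolution and return the output path."""
    from ultralytics import YOLO  # imported lazily; requires the ultralytics package

    model = YOLO(weights)
    return model.export(format="onnx", imgsz=imgsz)
```

The exported file can then be loaded by an ONNX-compatible runtime on the target device. Actual on-device speed depends on the hardware and is not characterized in this card.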
⚠️ Limitations
- The model is optimized for Indian urban road conditions; performance may degrade on highways, rural roads, or non-Indian geographies.
- The road damage class overlaps visually with potholes, which may cause occasional misclassification between the two.
- Performance is best on daytime, clear-weather imagery; low-light and rain-occluded scenes may reduce accuracy.
- The model was trained for 50 epochs without triggering early stopping, suggesting the checkpoint may not be fully converged and further fine-tuning could improve results.
- Small potholes (smaller than about 32 px at 768 px input resolution) may be missed in wide-angle shots.
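To judge whether a pothole falls under the 32 px threshold in the last item, it helps to compute its size after the frame is resized for inference. The sketch below assumes the usual aspect-preserving resize in which the longer image side is scaled to the input size; `size_after_resize` is a hypothetical helper.

```python
def size_after_resize(side_px: float, img_w: int, img_h: int, target: int = 768) -> float:
    """Size of an object side (in pixels) after the longer image side is scaled to `target`."""
    return side_px * target / max(img_w, img_h)


if __name__ == "__main__":
    # An 80 px wide pothole in a 1920x1080 dashcam frame shrinks to 32 px at 768 px input
    print(size_after_resize(80, 1920, 1080))
    # A 60 px pothole in the same frame drops to 24 px, below the threshold
    print(size_after_resize(60, 1920, 1080))
```

Under this assumption, objects narrower than about 80 px in a 1920×1080 frame land below the threshold, so cropping or tiling wide-angle frames before inference may help.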
🧑‍💻 Developer
| Field | Details |
|---|---|
| Author | Vansh Momaya |
| Institution | D. J. Sanghvi College of Engineering |
| Focus Area | Computer Vision, Object Detection, AI for Civic Infrastructure |
| Email | vanshmomaya9@gmail.com |
Citation
If you use PotholeNet-YOLO11m-v1 in your research or project, please cite it as:
@online{momaya2026potholenet,
author = {Vansh Momaya},
title = {PotholeNet-YOLO11m-v1: Real-Time Pothole and Civic Issue Detection for Indian Urban Roads},
year = {2026},
version = {v1},
url = {https://huggingface.co/Vansh180/PotholeNet-YOLO11m-v1},
institution = {D. J. Sanghvi College of Engineering},
note = {Fine-tuned YOLO11m model for detecting potholes, road damage, and garbage in Indian street imagery},
license = {MIT}
}
Acknowledgements
- Ultralytics YOLO11: Base architecture and training framework
- Kaggle: Training infrastructure (dual T4 GPUs)
- Aamchi City, Datahack 4: Hackathon context and dataset
Built for the Aamchi City AI Civic System (Datahack 4, PS2 Core ML)