# YOLOv11s Skin Lesion Detection (ISIC 2018)
Fine-tuned YOLOv11s on ISIC 2018 Task 3 for dermoscopic skin lesion detection and classification, with Grad-CAM++ explainability analysis.
## Model Details

- Architecture: YOLOv11s (9.4M parameters, 21.6 GFLOPs)
- Dataset: ISIC 2018 Task 3 (10,015 dermoscopy images)
- Classes: 7 skin conditions
- Training: 80 epochs, AdamW + cosine LR schedule, Tesla T4 GPU
- Input size: 640×640
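The settings above map onto ultralytics-style training arguments roughly as follows. This is a sketch, not the exact configuration used; `isic2018.yaml` is a placeholder dataset file, and the warmup/mixup values mirror the Methodology section:

```yaml
# Sketch of an ultralytics training config matching the listed settings.
model: yolo11s.pt      # YOLOv11s pretrained weights
data: isic2018.yaml    # placeholder dataset config
epochs: 80
imgsz: 640
optimizer: AdamW
cos_lr: true           # cosine LR schedule
warmup_epochs: 3
mixup: 0.05
```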
## Files

| File | Description |
|---|---|
| `best_v2.pt` | Best model: 80 epochs, cos_lr (recommended) |
| `best.pt` | v1 baseline: 50 epochs |
| `results_v2.png` | Training curves (v2) |
## Classes
| ID | Class | Full Name | Train Samples |
|---|---|---|---|
| 0 | MEL | Melanoma | 1113 |
| 1 | NV | Melanocytic Nevus | 6705 |
| 2 | BCC | Basal Cell Carcinoma | 514 |
| 3 | AKIEC | Actinic Keratosis / Intraepithelial Carcinoma | 327 |
| 4 | BKL | Benign Keratosis | 1099 |
| 5 | DF | Dermatofibroma | 115 |
| 6 | VASC | Vascular Lesion | 142 |
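When decoding raw class IDs in predictions, the mapping from the table can be kept as a plain Python dict. This is a convenience sketch, not something shipped with the model; the names simply mirror the table above:

```python
# ID-to-class mapping for ISIC 2018 Task 3 (abbreviation, full name).
ISIC_CLASSES = {
    0: ("MEL", "Melanoma"),
    1: ("NV", "Melanocytic Nevus"),
    2: ("BCC", "Basal Cell Carcinoma"),
    3: ("AKIEC", "Actinic Keratosis / Intraepithelial Carcinoma"),
    4: ("BKL", "Benign Keratosis"),
    5: ("DF", "Dermatofibroma"),
    6: ("VASC", "Vascular Lesion"),
}

def class_name(class_id: int) -> str:
    """Human-readable label for a predicted class ID."""
    abbrev, full = ISIC_CLASSES[class_id]
    return f"{abbrev} ({full})"

print(class_name(0))  # MEL (Melanoma)
```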
## Results: v1 vs v2
| Metric | v1 (50 epochs) | v2 (80 epochs + cos_lr) |
|---|---|---|
| mAP@0.5 | 0.551 | 0.603 |
| mAP@0.5:0.95 | 0.473 | 0.526 |
| Precision | 0.486 | 0.541 |
| Recall | 0.585 | 0.595 |
### Per-class AP@0.5 (v2)
| Class | AP@0.5 | Change vs v1 |
|---|---|---|
| MEL | 0.546 | +2.4% |
| NV | 0.956 | +0.7% |
| BCC | 0.556 | +2.5% |
| AKIEC | 0.441 | +14.8% |
| BKL | 0.569 | +0.6% |
| DF | 0.200 | ~flat |
| VASC | 0.850 | +5.9% |
## Explainability: Grad-CAM++ Analysis
Grad-CAM++ was applied to the backbone (C3k2 layer 8) to visualize which regions the model attends to during inference.
Two learned attention strategies were identified:

1. **Border ring detection.** On well-defined lesions, the model consistently focuses on the lesion perimeter rather than the center. This aligns with clinical dermoscopy criteria, where border irregularity is a primary diagnostic indicator, and was learned without explicit supervision.
2. **Multi-focal pigment tracking.** On irregular lesions, the model distributes attention across multiple pigment-dense sub-regions simultaneously, mirroring how dermatologists assess pigment distribution patterns.
**Key finding:** These clinically meaningful attention patterns emerged from detection training alone: no segmentation masks, no border annotations, no explicit feature supervision. The model discovered dermoscopy-relevant features autonomously.
**False-negative behavior:** In low-confidence cases, Grad-CAM++ shows that the backbone correctly localizes the lesion, but detection confidence falls below threshold. This is a known limitation of single-stage detectors on small lesions: the backbone sees the lesion, but the detection head doesn't commit.
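The gradient²-based weighting used in this analysis can be illustrated with a self-contained NumPy sketch. The shapes and the single-score setup are illustrative assumptions; the actual analysis hooks the C3k2 layer-8 activations of the trained backbone:

```python
import numpy as np

def grad_cam_pp(activations: np.ndarray, grads: np.ndarray) -> np.ndarray:
    """Grad-CAM++ heatmap from one layer's activations and gradients.

    activations, grads: (C, H, W) arrays for a single image and score.
    """
    g2, g3 = grads ** 2, grads ** 3
    # alpha_ij^k = g^2 / (2*g^2 + sum_ab A^k_ab * g^3): the Grad-CAM++
    # gradient^2 weighting term, computed per channel and location.
    denom = 2.0 * g2 + activations.sum(axis=(1, 2), keepdims=True) * g3
    alpha = g2 / np.where(denom != 0.0, denom, 1e-8)
    # Channel weights: spatial sum of alpha * ReLU(gradient).
    weights = (alpha * np.maximum(grads, 0.0)).sum(axis=(1, 2))
    # Weighted channel sum, ReLU, then normalize to [0, 1].
    cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0.0)
    return cam / (cam.max() + 1e-8)

heatmap = grad_cam_pp(np.random.rand(16, 20, 20), np.random.rand(16, 20, 20))
```

In practice, a library such as pytorch-grad-cam wires up the forward/backward hooks automatically; the sketch only shows the weighting itself.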
## Usage
```python
from huggingface_hub import hf_hub_download
from ultralytics import YOLO

# Download the recommended checkpoint from the Hub, then load it.
weights = hf_hub_download("raj5517/yolov11s-skin-lesion-isic2018", "best_v2.pt")
model = YOLO(weights)
results = model("dermoscopy_image.jpg")
results[0].show()
```
## Methodology

- Bounding boxes: saliency-based lesion localization (HSV saturation + dark-pixel detection via Otsu threshold)
- Class imbalance: inverse-frequency class weights (DF: 12.4×, VASC: 10.1×)
- Augmentation: HSV jitter, rotation ±15°, horizontal/vertical flip, mosaic, mixup=0.05
- LR schedule: cosine annealing (3 warmup epochs, then cosine decay)
- Explainability: Grad-CAM++ with gradient² weighting on backbone layer 8
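The DF 12.4× and VASC 10.1× factors above are consistent with standard inverse-frequency weighting, w_c = N / (K · n_c), over the training counts from the Classes table. A minimal sketch reproducing them:

```python
# Inverse-frequency class weights: w_c = N / (K * n_c).
# With a perfectly balanced dataset (n_c = N / K) every weight would be 1;
# rare classes get proportionally larger weights.
counts = {
    "MEL": 1113, "NV": 6705, "BCC": 514, "AKIEC": 327,
    "BKL": 1099, "DF": 115, "VASC": 142,
}
total = sum(counts.values())  # 10015 training images
k = len(counts)               # 7 classes
weights = {c: total / (k * n) for c, n in counts.items()}
print(round(weights["DF"], 1), round(weights["VASC"], 1))  # 12.4 10.1
```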
## Limitations

- DF (23 validation samples) and AKIEC (65 validation samples) are data-starved; performance is bounded by dataset size, not model capacity
- Trained on dermoscopy images only; not validated on clinical photography
- Saliency-based bounding boxes provide approximate localization, not ground-truth segmentation