# YOLOv11s Skin Lesion Detection (ISIC 2018)
Fine-tuned YOLOv11s on ISIC 2018 Task 3 for dermoscopic skin lesion detection and classification, with Grad-CAM++ explainability analysis.
## Model Details

- Architecture: YOLOv11s (9.4M parameters, 21.6 GFLOPs)
- Dataset: ISIC 2018 Task 3 (10,015 dermoscopy images)
- Classes: 7 skin conditions
- Training: 80 epochs, AdamW + cosine LR schedule, Tesla T4 GPU
- Input size: 640×640
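The settings above map onto ultralytics-style training arguments roughly as follows. This is a sketch, not the exact configuration used; `isic2018.yaml` is a placeholder dataset file, and the warmup/mixup values mirror the Methodology section:

```yaml
# Sketch of an ultralytics training config matching the listed settings.
model: yolo11s.pt      # YOLOv11s pretrained weights
data: isic2018.yaml    # placeholder dataset config
epochs: 80
imgsz: 640
optimizer: AdamW
cos_lr: true           # cosine LR schedule
warmup_epochs: 3
mixup: 0.05
```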
## Files

| File | Description |
|---|---|
| `best_v2.pt` | Best model: 80 epochs, cos_lr (recommended) |
| `best.pt` | v1 baseline: 50 epochs |
| `results_v2.png` | Training curves (v2) |
## Classes
| ID | Class | Full Name | Train Samples |
|---|---|---|---|
| 0 | MEL | Melanoma | 1113 |
| 1 | NV | Melanocytic Nevus | 6705 |
| 2 | BCC | Basal Cell Carcinoma | 514 |
| 3 | AKIEC | Actinic Keratosis / Intraepithelial Carcinoma | 327 |
| 4 | BKL | Benign Keratosis | 1099 |
| 5 | DF | Dermatofibroma | 115 |
| 6 | VASC | Vascular Lesion | 142 |
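When decoding raw class IDs in predictions, the mapping from the table can be kept as a plain Python dict. This is a convenience sketch, not something shipped with the model; the names simply mirror the table above:

```python
# ID-to-class mapping for ISIC 2018 Task 3 (abbreviation, full name).
ISIC_CLASSES = {
    0: ("MEL", "Melanoma"),
    1: ("NV", "Melanocytic Nevus"),
    2: ("BCC", "Basal Cell Carcinoma"),
    3: ("AKIEC", "Actinic Keratosis / Intraepithelial Carcinoma"),
    4: ("BKL", "Benign Keratosis"),
    5: ("DF", "Dermatofibroma"),
    6: ("VASC", "Vascular Lesion"),
}

def class_name(class_id: int) -> str:
    """Human-readable label for a predicted class ID."""
    abbrev, full = ISIC_CLASSES[class_id]
    return f"{abbrev} ({full})"

print(class_name(0))  # MEL (Melanoma)
```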
## Results: v1 vs v2
| Metric | v1 (50 epochs) | v2 (80 epochs + cos_lr) |
|---|---|---|
| mAP@0.5 | 0.551 | 0.603 |
| mAP@0.5:0.95 | 0.473 | 0.526 |
| Precision | 0.486 | 0.541 |
| Recall | 0.585 | 0.595 |
### Per-class AP@0.5 (v2)
| Class | AP@0.5 | Change vs v1 |
|---|---|---|
| MEL | 0.546 | +2.4% |
| NV | 0.956 | +0.7% |
| BCC | 0.556 | +2.5% |
| AKIEC | 0.441 | +14.8% |
| BKL | 0.569 | +0.6% |
| DF | 0.200 | ~flat |
| VASC | 0.850 | +5.9% |
## Explainability: Grad-CAM++ Analysis
Grad-CAM++ was applied to the backbone (C3k2 layer 8) to visualize which regions the model attends to during inference.
Two learned attention strategies were identified:

1. **Border ring detection.** On well-defined lesions, the model consistently focuses on the lesion perimeter rather than the center. This aligns with clinical dermoscopy criteria, where border irregularity is a primary diagnostic indicator, and was learned without explicit supervision.
2. **Multi-focal pigment tracking.** On irregular lesions, the model distributes attention across multiple pigment-dense sub-regions simultaneously, mirroring how dermatologists assess pigment distribution patterns.
**Key finding:** These clinically meaningful attention patterns emerged from detection training alone: no segmentation masks, no border annotations, no explicit feature supervision. The model discovered dermoscopy-relevant features autonomously.
**False-negative behavior:** In low-confidence cases, Grad-CAM++ shows that the backbone correctly localizes the lesion, but detection confidence falls below threshold. This is a known limitation of single-stage detectors on small lesions: the backbone sees the lesion, but the detection head doesn't commit.
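The gradient²-based weighting used in this analysis can be illustrated with a self-contained NumPy sketch. The shapes and the single-score setup are illustrative assumptions; the actual analysis hooks the C3k2 layer-8 activations of the trained backbone:

```python
import numpy as np

def grad_cam_pp(activations: np.ndarray, grads: np.ndarray) -> np.ndarray:
    """Grad-CAM++ heatmap from one layer's activations and gradients.

    activations, grads: (C, H, W) arrays for a single image and score.
    """
    g2, g3 = grads ** 2, grads ** 3
    # alpha_ij^k = g^2 / (2*g^2 + sum_ab A^k_ab * g^3): the Grad-CAM++
    # gradient^2 weighting term, computed per channel and location.
    denom = 2.0 * g2 + activations.sum(axis=(1, 2), keepdims=True) * g3
    alpha = g2 / np.where(denom != 0.0, denom, 1e-8)
    # Channel weights: spatial sum of alpha * ReLU(gradient).
    weights = (alpha * np.maximum(grads, 0.0)).sum(axis=(1, 2))
    # Weighted channel sum, ReLU, then normalize to [0, 1].
    cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0.0)
    return cam / (cam.max() + 1e-8)

heatmap = grad_cam_pp(np.random.rand(16, 20, 20), np.random.rand(16, 20, 20))
```

In practice, a library such as pytorch-grad-cam wires up the forward/backward hooks automatically; the sketch only shows the weighting itself.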
## Usage
```python
from huggingface_hub import hf_hub_download
from ultralytics import YOLO

# Download the recommended checkpoint from the Hub, then load it.
weights = hf_hub_download("raj5517/yolov11s-skin-lesion-isic2018", "best_v2.pt")
model = YOLO(weights)
results = model("dermoscopy_image.jpg")
results[0].show()
```
## Methodology

- Bounding boxes: saliency-based lesion localization (HSV saturation + dark-pixel detection via Otsu threshold)
- Class imbalance: inverse-frequency class weights (DF: 12.4×, VASC: 10.1×)
- Augmentation: HSV jitter, rotation ±15°, horizontal/vertical flip, mosaic, mixup=0.05
- LR schedule: cosine annealing (3 warmup epochs, then cosine decay)
- Explainability: Grad-CAM++ with gradient² weighting on backbone layer 8
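The DF 12.4× and VASC 10.1× factors above are consistent with standard inverse-frequency weighting, w_c = N / (K · n_c), over the training counts from the Classes table. A minimal sketch reproducing them:

```python
# Inverse-frequency class weights: w_c = N / (K * n_c).
# With a perfectly balanced dataset (n_c = N / K) every weight would be 1;
# rare classes get proportionally larger weights.
counts = {
    "MEL": 1113, "NV": 6705, "BCC": 514, "AKIEC": 327,
    "BKL": 1099, "DF": 115, "VASC": 142,
}
total = sum(counts.values())  # 10015 training images
k = len(counts)               # 7 classes
weights = {c: total / (k * n) for c, n in counts.items()}
print(round(weights["DF"], 1), round(weights["VASC"], 1))  # 12.4 10.1
```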
## Limitations

- DF (23 validation samples) and AKIEC (65 validation samples) are data-starved; performance is bounded by dataset size, not model capacity
- Trained on dermoscopy images only; not validated on clinical photography
- Saliency-based bounding boxes provide approximate localization, not ground-truth segmentation