U-Net: Gastrointestinal Polyp Segmentation (Kvasir-SEG)
Architecture
Standard U-Net with 4 encoder levels, a bottleneck, and 4 decoder levels.
Input (B, 3, 256, 256)
→ enc1: double_conv → (B, 64, 256, 256) [skip]
→ pool → enc2: double_conv → (B, 128, 128, 128) [skip]
→ pool → enc3: double_conv → (B, 256, 64, 64) [skip]
→ pool → enc4: double_conv → (B, 512, 32, 32) [skip]
→ pool → bottleneck: double_conv → (B, 1024, 16, 16)
→ up + concat(skip) → dec4 → (B, 512, 32, 32)
→ up + concat(skip) → dec3 → (B, 256, 64, 64)
→ up + concat(skip) → dec2 → (B, 128, 128, 128)
→ up + concat(skip) → dec1 → (B, 64, 256, 256)
→ Conv2d(1) → (B, 1, 256, 256) [raw logits]
- Double conv block: Conv2d β BatchNorm β ReLU β Conv2d β BatchNorm β ReLU
- Upsampling: learned ConvTranspose2d (2×2, stride 2)
- Skip connections: encoder feature maps concatenated to decoder at each level
- Output: raw logits → apply sigmoid for probabilities, threshold at 0.5 for a binary mask
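The encoder and decoder building blocks described above can be sketched in PyTorch as follows (the names double_conv and UpBlock are illustrative, not taken from the original code):

```python
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # Conv2d -> BatchNorm -> ReLU, applied twice, as in the block description above
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class UpBlock(nn.Module):
    """Learned 2x2 stride-2 transposed-conv upsample, concat the encoder skip,
    then a double conv — one decoder level of the U-Net above."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        # After concat: out_ch (upsampled) + out_ch (skip) = in_ch channels
        self.conv = double_conv(in_ch, out_ch)

    def forward(self, x, skip):
        x = self.up(x)
        return self.conv(torch.cat([x, skip], dim=1))
```

For example, dec4 is `UpBlock(1024, 512)`: it takes the (B, 1024, 16, 16) bottleneck plus the (B, 512, 32, 32) enc4 skip and produces (B, 512, 32, 32).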
Loss Function
Combined BCE + Dice loss to handle class imbalance (polyps occupy a small fraction of pixels):
total_loss = BCE_with_logits(logits, labels) + dice_loss(logits, labels)
- BCE ensures per-pixel accuracy across the full image.
- Dice loss directly optimises the overlap ratio between prediction and ground truth, penalising missed polyp regions even when they represent a tiny fraction of pixels.
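A minimal sketch of this combined loss in PyTorch (the eps smoothing term, which keeps the Dice ratio defined for empty masks, is an assumption not stated above):

```python
import torch
import torch.nn.functional as F

def dice_loss(logits, labels, eps=1.0):
    # Soft Dice loss: 1 - (2*|P∩G| + eps) / (|P| + |G| + eps),
    # computed per image on sigmoid probabilities, then averaged over the batch.
    probs = torch.sigmoid(logits).flatten(1)
    labels = labels.flatten(1)
    inter = (probs * labels).sum(dim=1)
    union = probs.sum(dim=1) + labels.sum(dim=1)
    return 1.0 - ((2.0 * inter + eps) / (union + eps)).mean()

def total_loss(logits, labels):
    # BCE (per-pixel accuracy) + Dice (region overlap), as described above
    return F.binary_cross_entropy_with_logits(logits, labels) + dice_loss(logits, labels)
```

Using BCE-with-logits rather than plain BCE keeps the loss numerically stable, since the model outputs raw logits.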
Dataset
Angelou0516/kvasir-seg: 800 train / 100 validation images of gastrointestinal polyps with binary segmentation masks. Images resized to 256×256.
Training
- Epochs: 20
- Batch size: 8
- Optimiser: AdamW (HuggingFace Trainer default)
- Framework: PyTorch + HuggingFace Transformers Trainer
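Under the hood the Trainer runs a standard optimisation loop; as an illustration, a minimal plain-PyTorch equivalent might look like the sketch below. The model, loss function, and DataLoader are assumed, and the 5e-5 learning rate is the HuggingFace Trainer default rather than a value stated above.

```python
import torch
from torch.optim import AdamW

def train(model, loader, total_loss_fn, epochs=20, lr=5e-5, device="cpu"):
    # AdamW at the HF Trainer default learning rate; the batch size (8)
    # is set on the DataLoader, and epochs matches the setting above.
    model.to(device).train()
    opt = AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):
        for images, masks in loader:
            opt.zero_grad()
            logits = model(images.to(device))
            loss = total_loss_fn(logits, masks.to(device))
            loss.backward()
            opt.step()
    return model
```

The Trainer additionally handles evaluation, checkpointing, and learning-rate scheduling; this sketch shows only the core update step.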