U-Net for Gastrointestinal Polyp Segmentation

Architecture

Standard U-Net with:

  • Encoder: 4 levels (64, 128, 256, 512 channels), each with two Conv3x3+BN+ReLU blocks followed by MaxPool2x2
  • Bottleneck: 1024 channels at 16x16 spatial resolution (a 256x256 input halved by four 2x2 poolings)
  • Decoder: 4 levels mirroring the encoder, using ConvTranspose2d for learned upsampling + skip connections via concatenation
  • Output: 1x1 Conv producing a single-channel binary mask

Input: 3x256x256 RGB image -> Output: 1x256x256 segmentation mask
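
The layout described above can be sketched in PyTorch as follows. This is a minimal illustration, not the released training code: the class names DoubleConv and UNet, the use of padding=1 (needed so a 256x256 input yields a 256x256 output), and returning raw logits rather than sigmoid probabilities are all assumptions.

```python
import torch
import torch.nn as nn


class DoubleConv(nn.Sequential):
    """Two Conv3x3 + BatchNorm + ReLU blocks (padding=1 is an assumption)."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )


class UNet(nn.Module):
    """Hypothetical U-Net: 4 encoder levels, a bottleneck, 4 decoder levels."""

    def __init__(self, in_ch=3, out_ch=1, widths=(64, 128, 256, 512)):
        super().__init__()
        self.pool = nn.MaxPool2d(2)
        self.encoders = nn.ModuleList()
        prev = in_ch
        for w in widths:
            self.encoders.append(DoubleConv(prev, w))
            prev = w
        self.bottleneck = DoubleConv(prev, prev * 2)  # 512 -> 1024 channels
        prev *= 2
        self.upsamplers = nn.ModuleList()
        self.decoders = nn.ModuleList()
        for w in reversed(widths):
            # Learned 2x upsampling that also halves the channel count
            self.upsamplers.append(
                nn.ConvTranspose2d(prev, w, kernel_size=2, stride=2)
            )
            # Input is the skip (w channels) concatenated with the upsampled map (w)
            self.decoders.append(DoubleConv(2 * w, w))
            prev = w
        self.head = nn.Conv2d(prev, out_ch, kernel_size=1)

    def forward(self, x):
        skips = []
        for enc in self.encoders:
            x = enc(x)
            skips.append(x)      # saved before pooling for the skip connection
            x = self.pool(x)
        x = self.bottleneck(x)
        for up, dec, skip in zip(self.upsamplers, self.decoders, reversed(skips)):
            x = up(x)
            x = dec(torch.cat([skip, x], dim=1))
        return self.head(x)      # raw logits; apply sigmoid + threshold for a mask
```

Passing a tensor of shape (1, 3, 256, 256) through UNet() returns a tensor of shape (1, 1, 256, 256). Under these assumptions, a binary mask would be obtained with something like torch.sigmoid(logits) > 0.5.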

Loss Function

BCE + Dice Loss: Binary Cross-Entropy provides smooth per-pixel gradients, while Dice Loss directly optimizes mask overlap and is less sensitive to class imbalance (polyps are typically small relative to the background).
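
One common way to combine the two terms is sketched below. The function name bce_dice_loss, the equal weighting of the two terms, the smoothing constant, and the assumption that the model outputs raw logits are illustrative choices; the card does not specify them.

```python
import torch
import torch.nn.functional as F


def bce_dice_loss(logits, target, smooth=1.0):
    """BCE plus soft Dice loss for (N, 1, H, W) logits and 0/1 float targets.

    Equal weighting and smooth=1.0 are assumptions, not values from the card.
    """
    # Per-pixel term, computed on logits for numerical stability
    bce = F.binary_cross_entropy_with_logits(logits, target)

    # Overlap term, computed per image on predicted probabilities
    probs = torch.sigmoid(logits)
    dims = (1, 2, 3)
    intersection = (probs * target).sum(dim=dims)
    denom = probs.sum(dim=dims) + target.sum(dim=dims)
    dice = (2 * intersection + smooth) / (denom + smooth)

    return bce + (1 - dice).mean()
```

The smoothing constant keeps the Dice term defined when an image contains no polyp pixels and the prediction is also empty.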

Training

  • 20 epochs, batch size 8, Adam optimizer with learning rate 1e-4
  • Trained on the Kvasir-SEG dataset (gastrointestinal polyp segmentation)
  • Best checkpoint selected by validation loss
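
The settings above could be wired together roughly as follows. This is a sketch under stated assumptions: the function name fit, the checkpoint filename, and the choice to save only the state_dict are hypothetical, and the loaders are expected to yield (images, masks) batches (batch size 8 would be set when building them).

```python
import torch


def fit(model, train_loader, val_loader, criterion, epochs=20, lr=1e-4,
        ckpt_path="best_unet.pt", device="cpu"):
    """Train with Adam and keep the weights with the lowest validation loss."""
    model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    best_val = float("inf")

    for epoch in range(epochs):
        model.train()
        for images, masks in train_loader:
            images, masks = images.to(device), masks.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), masks)
            loss.backward()
            optimizer.step()

        # Validation pass: sample-weighted mean loss, no gradient tracking
        model.eval()
        total, count = 0.0, 0
        with torch.no_grad():
            for images, masks in val_loader:
                images, masks = images.to(device), masks.to(device)
                total += criterion(model(images), masks).item() * images.size(0)
                count += images.size(0)
        val_loss = total / count

        # Overwrite the checkpoint only when validation loss improves
        if val_loss < best_val:
            best_val = val_loss
            torch.save(model.state_dict(), ckpt_path)

    return best_val
```

Selecting on validation loss rather than on the final epoch guards against keeping weights from a point where the model has started to overfit.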

Parameters

~31M parameters
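
The figure can be sanity-checked with a short back-of-the-envelope count. The helper names below are hypothetical, and the count assumes every convolution has a bias term, every BatchNorm has a learned scale and shift, and the transposed convolutions use 2x2 kernels; none of these details are stated in the card.

```python
def conv_params(c_in, c_out, k):
    """Weights plus one bias per output channel for a k x k (transposed) conv."""
    return c_in * c_out * k * k + c_out


def double_conv_params(c_in, c_out):
    """Two Conv3x3 layers, each followed by BatchNorm (scale + shift per channel).

    BatchNorm running statistics are buffers, not parameters, so they are excluded.
    """
    bn = 2 * c_out
    return conv_params(c_in, c_out, 3) + bn + conv_params(c_out, c_out, 3) + bn


def unet_params(in_ch=3, out_ch=1, widths=(64, 128, 256, 512)):
    total, prev = 0, in_ch
    for w in widths:                                # encoder levels
        total += double_conv_params(prev, w)
        prev = w
    total += double_conv_params(prev, 2 * prev)     # bottleneck, 512 -> 1024
    prev *= 2
    for w in reversed(widths):                      # decoder levels
        total += conv_params(prev, w, 2)            # 2x2 transposed conv
        total += double_conv_params(2 * w, w)       # after skip concatenation
        prev = w
    total += conv_params(prev, out_ch, 1)           # 1x1 output conv
    return total


print(f"{unet_params():,}")
```

Under these assumptions the total comes out at roughly 31 million, with the bottleneck and the deepest decoder level accounting for most of it.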

