U-Net for Gastrointestinal Polyp Segmentation
Architecture
Standard U-Net with:
- Encoder: 4 levels (64, 128, 256, 512 channels), each with two Conv3x3+BN+ReLU blocks followed by MaxPool2x2
- Bottleneck: 1024 channels at 16x16 spatial resolution
- Decoder: 4 levels mirroring the encoder (512, 256, 128, 64 channels); each level upsamples with ConvTranspose2d (learned upsampling) and concatenates the matching encoder feature map as a skip connection
- Output: 1x1 Conv mapping the final 64 decoder channels to a single-channel binary mask prediction
Input: 3x256x256 RGB image -> Output: 1x256x256 segmentation mask
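The layout above can be sketched in PyTorch roughly as follows. This is a minimal illustration, not the project's actual code: the class names `DoubleConv` and `UNet`, the `widths` argument, `padding=1` (needed for the output to stay 256x256), `bias=False` on the 3x3 convolutions, and the 2x2 stride-2 transposed convolutions are all assumptions. The sketch also assumes the network returns raw logits, with the sigmoid applied in the loss or at inference.

```python
import torch
import torch.nn as nn


class DoubleConv(nn.Module):
    """Two Conv3x3 + BatchNorm + ReLU blocks (padding=1 is an assumption)."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)


class UNet(nn.Module):
    """Hypothetical U-Net: 4 encoder levels, bottleneck, 4 decoder levels."""

    def __init__(self, in_ch=3, out_ch=1, widths=(64, 128, 256, 512)):
        super().__init__()
        self.pool = nn.MaxPool2d(2)
        self.encoders = nn.ModuleList()
        ch = in_ch
        for w in widths:
            self.encoders.append(DoubleConv(ch, w))
            ch = w
        # Bottleneck doubles the channel count (512 -> 1024).
        self.bottleneck = DoubleConv(ch, 2 * ch)
        ch = 2 * ch
        self.ups = nn.ModuleList()
        self.decoders = nn.ModuleList()
        for w in reversed(widths):
            # Learned 2x upsampling, then a double conv on [skip, upsampled].
            self.ups.append(nn.ConvTranspose2d(ch, w, kernel_size=2, stride=2))
            self.decoders.append(DoubleConv(2 * w, w))
            ch = w
        self.head = nn.Conv2d(ch, out_ch, kernel_size=1)

    def forward(self, x):
        skips = []
        for enc in self.encoders:
            x = enc(x)
            skips.append(x)      # saved before pooling for the skip connection
            x = self.pool(x)
        x = self.bottleneck(x)   # 1024 channels at 16x16 for a 256x256 input
        for up, dec, skip in zip(self.ups, self.decoders, reversed(skips)):
            x = up(x)
            x = dec(torch.cat([skip, x], dim=1))
        return self.head(x)      # raw logits, shape (N, 1, H, W)
```

Under these assumptions, `UNet()(torch.zeros(1, 3, 256, 256))` returns a tensor of shape `(1, 1, 256, 256)`. Input height and width need to be divisible by 16 so that the skip connections line up after four pooling steps.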
Loss Function
BCE + Dice Loss: Binary Cross-Entropy provides smooth per-pixel gradients, while Dice Loss directly optimizes mask overlap and is less sensitive to class imbalance (polyps are typically small relative to the background).
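One common way to combine the two terms is sketched below. The function name `bce_dice_loss`, the equal weighting of the two terms, the smoothing constant `smooth=1.0`, and the choice to take raw logits as input are assumptions for illustration; the source does not specify them.

```python
import torch
import torch.nn.functional as F


def bce_dice_loss(logits, target, smooth=1.0):
    """BCE on logits plus soft Dice loss, summed with equal weight (assumed)."""
    # Per-pixel term: numerically stable BCE computed directly from logits.
    bce = F.binary_cross_entropy_with_logits(logits, target)

    # Overlap term: soft Dice computed per sample on sigmoid probabilities.
    probs = torch.sigmoid(logits)
    dims = (1, 2, 3)  # reduce over channel, height, width; keep the batch axis
    intersection = (probs * target).sum(dim=dims)
    denom = probs.sum(dim=dims) + target.sum(dim=dims)
    dice = (2.0 * intersection + smooth) / (denom + smooth)

    return bce + (1.0 - dice).mean()
```

Here `logits` and `target` are float tensors of shape `(N, 1, H, W)`, with `target` holding 0/1 mask values. The smoothing constant keeps the Dice ratio defined when both the prediction and the mask are empty.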
Training
- 20 epochs, batch size 8, Adam optimizer with learning rate 1e-4
- Trained on the Kvasir-SEG dataset (gastrointestinal polyp segmentation)
- Best checkpoint selected by validation loss
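The training setup above could look roughly like the loop below. It is a sketch under assumptions: the function name `fit` and its arguments are hypothetical, `loss_fn` is any callable taking `(logits, masks)`, the loaders yield `(images, masks)` batches, and device placement and saving to disk are left out for brevity.

```python
import copy

import torch


def fit(model, train_loader, val_loader, loss_fn, epochs=20, lr=1e-4):
    """Train with Adam and keep the weights with the lowest validation loss."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    best_val, best_state = float("inf"), None

    for epoch in range(epochs):
        model.train()
        for images, masks in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(images), masks)
            loss.backward()
            optimizer.step()

        # Validation pass: average the loss over all validation samples.
        model.eval()
        total, count = 0.0, 0
        with torch.no_grad():
            for images, masks in val_loader:
                total += loss_fn(model(images), masks).item() * images.size(0)
                count += images.size(0)
        val_loss = total / max(count, 1)

        # Keep a copy of the best weights seen so far.
        if val_loss < best_val:
            best_val = val_loss
            best_state = copy.deepcopy(model.state_dict())

    return best_val, best_state
```

The returned `best_state` can be restored with `model.load_state_dict(best_state)` or written to disk with `torch.save`.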
Parameters
~31M parameters
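The figure can be checked by hand from the layer shapes. The arithmetic below is a sketch that assumes bias-free 3x3 convolutions followed by BatchNorm (two learnable vectors per layer), 2x2 transposed convolutions with bias, and a 1x1 output convolution with bias. The helper names are hypothetical, and a different bias convention shifts the total by only a few thousand parameters.

```python
def double_conv_params(in_ch, out_ch):
    # Two bias-free 3x3 convs, each followed by BatchNorm (weight + bias).
    conv1 = 3 * 3 * in_ch * out_ch
    conv2 = 3 * 3 * out_ch * out_ch
    batch_norms = 2 * (2 * out_ch)
    return conv1 + conv2 + batch_norms


def up_conv_params(in_ch, out_ch):
    # 2x2 transposed conv with bias.
    return 2 * 2 * in_ch * out_ch + out_ch


def unet_params(in_ch=3, out_ch=1, widths=(64, 128, 256, 512)):
    total, ch = 0, in_ch
    for w in widths:                          # encoder levels
        total += double_conv_params(ch, w)
        ch = w
    total += double_conv_params(ch, 2 * ch)   # bottleneck: 512 -> 1024
    ch = 2 * ch
    for w in reversed(widths):                # decoder levels
        total += up_conv_params(ch, w)
        total += double_conv_params(2 * w, w)  # input is [skip, upsampled]
        ch = w
    total += ch * out_ch + out_ch             # 1x1 output conv with bias
    return total


print(f"{unet_params():,}")  # prints 31,037,633
```

Almost half of the total (about 14.2M) sits in the 1024-channel bottleneck alone.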