U-Net: Gastrointestinal Polyp Segmentation (Kvasir-SEG)
Architecture
Standard U-Net with 4 encoder levels, a bottleneck, and 4 decoder levels.
Input (B, 3, 256, 256)
→ enc1: double_conv → (B, 64, 256, 256) [skip]
→ pool → enc2: double_conv → (B, 128, 128, 128) [skip]
→ pool → enc3: double_conv → (B, 256, 64, 64) [skip]
→ pool → enc4: double_conv → (B, 512, 32, 32) [skip]
→ pool → bottleneck: double_conv → (B, 1024, 16, 16)
→ up + concat(skip) → dec4 → (B, 512, 32, 32)
→ up + concat(skip) → dec3 → (B, 256, 64, 64)
→ up + concat(skip) → dec2 → (B, 128, 128, 128)
→ up + concat(skip) → dec1 → (B, 64, 256, 256)
→ Conv2d(1) → (B, 1, 256, 256) [raw logits]
- Double conv block: Conv2d β BatchNorm β ReLU β Conv2d β BatchNorm β ReLU
- Upsampling: learned ConvTranspose2d (2×2, stride 2)
- Skip connections: encoder feature maps concatenated to decoder at each level
- Output: raw logits → apply sigmoid for probabilities, threshold at 0.5 for a binary mask
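The encoder and decoder building blocks described above can be sketched in PyTorch as follows (the names double_conv and UpBlock are illustrative, not taken from the original code):

```python
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # Conv2d -> BatchNorm -> ReLU, applied twice, as in the block description above
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class UpBlock(nn.Module):
    """Learned 2x2 stride-2 transposed-conv upsample, concat the encoder skip,
    then a double conv — one decoder level of the U-Net above."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        # After concat: out_ch (upsampled) + out_ch (skip) = in_ch channels
        self.conv = double_conv(in_ch, out_ch)

    def forward(self, x, skip):
        x = self.up(x)
        return self.conv(torch.cat([x, skip], dim=1))
```

For example, dec4 is `UpBlock(1024, 512)`: it takes the (B, 1024, 16, 16) bottleneck plus the (B, 512, 32, 32) enc4 skip and produces (B, 512, 32, 32).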
Loss Function
Combined BCE + Dice loss to handle class imbalance (polyps occupy a small fraction of pixels):
total_loss = BCE_with_logits(logits, labels) + dice_loss(logits, labels)
- BCE ensures per-pixel accuracy across the full image.
- Dice loss directly optimises the overlap ratio between prediction and ground truth, penalising missed polyp regions even when they represent a tiny fraction of pixels.
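A minimal sketch of this combined loss in PyTorch (the eps smoothing term, which keeps the Dice ratio defined for empty masks, is an assumption not stated above):

```python
import torch
import torch.nn.functional as F

def dice_loss(logits, labels, eps=1.0):
    # Soft Dice loss: 1 - (2*|P∩G| + eps) / (|P| + |G| + eps),
    # computed per image on sigmoid probabilities, then averaged over the batch.
    probs = torch.sigmoid(logits).flatten(1)
    labels = labels.flatten(1)
    inter = (probs * labels).sum(dim=1)
    union = probs.sum(dim=1) + labels.sum(dim=1)
    return 1.0 - ((2.0 * inter + eps) / (union + eps)).mean()

def total_loss(logits, labels):
    # BCE (per-pixel accuracy) + Dice (region overlap), as described above
    return F.binary_cross_entropy_with_logits(logits, labels) + dice_loss(logits, labels)
```

Using BCE-with-logits rather than plain BCE keeps the loss numerically stable, since the model outputs raw logits.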
Dataset
Angelou0516/kvasir-seg: 800 train / 100 validation images of gastrointestinal polyps with binary segmentation masks. Images resized to 256×256.
Training
- Epochs: 20
- Batch size: 8
- Optimiser: AdamW (HuggingFace Trainer default)
- Framework: PyTorch + HuggingFace Transformers Trainer
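Under the hood the Trainer runs a standard optimisation loop; as an illustration, a minimal plain-PyTorch equivalent might look like the sketch below. The model, loss function, and DataLoader are assumed, and the 5e-5 learning rate is the HuggingFace Trainer default rather than a value stated above.

```python
import torch
from torch.optim import AdamW

def train(model, loader, total_loss_fn, epochs=20, lr=5e-5, device="cpu"):
    # AdamW at the HF Trainer default learning rate; the batch size (8)
    # is set on the DataLoader, and epochs matches the setting above.
    model.to(device).train()
    opt = AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):
        for images, masks in loader:
            opt.zero_grad()
            logits = model(images.to(device))
            loss = total_loss_fn(logits, masks.to(device))
            loss.backward()
            opt.step()
    return model
```

The Trainer additionally handles evaluation, checkpointing, and learning-rate scheduling; this sketch shows only the core update step.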