Attention U-Net ConvNeXt — Polyp Segmentation (Best Test Dice)

Binary polyp segmentation model trained on Kvasir-SEG. Highest Dice score on the test set (0.9411) among all 24 architecture × backbone combinations evaluated in the UNet-A benchmark sweep.

Model Description

| Property | Value |
|---|---|
| Architecture | Attention U-Net (gate-based skip connections) |
| Backbone | ConvNeXt-Tiny (ImageNet pre-trained via timm) |
| Input size | 256 × 256 × 3 |
| Output | 256 × 256 × 1 logit map (sigmoid → binary mask) |
| Parameters | ~34.8 M (checkpoint ≈ 133 MB) |
| Loss | BCEDice (α = 0.5) |

Architecture Details

Attention U-Net (Oktay et al., 2018) augments the standard encoder-decoder with attention gates on every skip connection. The gate computes a spatial attention coefficient from the decoder query and the encoder key, suppressing activations in irrelevant background regions and focusing the model on lesion boundaries.
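
The additive attention gate can be sketched as a small PyTorch module. This is an illustrative sketch of the mechanism from the paper, not this repository's exact implementation; the module name, channel arguments, and the shared intermediate dimension are assumptions:

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Additive attention gate (Oktay et al., 2018) — illustrative sketch.

    g: gating signal from the decoder (coarser, semantically richer)
    x: skip-connection features from the encoder
    """
    def __init__(self, g_ch, x_ch, inter_ch):
        super().__init__()
        self.w_g = nn.Conv2d(g_ch, inter_ch, kernel_size=1)
        self.w_x = nn.Conv2d(x_ch, inter_ch, kernel_size=1)
        self.psi = nn.Sequential(
            nn.Conv2d(inter_ch, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, g, x):
        # Project both inputs to a shared channel dimension, add, then
        # squash to a per-pixel attention coefficient in (0, 1).
        a = self.psi(torch.relu(self.w_g(g) + self.w_x(x)))
        return x * a  # suppress activations in irrelevant regions

# Toy shapes; in the real decoder, g is upsampled to match x first.
gate = AttentionGate(g_ch=256, x_ch=128, inter_ch=64)
g = torch.randn(1, 256, 32, 32)
x = torch.randn(1, 128, 32, 32)
out = gate(g, x)
print(out.shape)  # torch.Size([1, 128, 32, 32])
```

The gated output replaces the raw encoder features in the skip concatenation, so background activations reach the decoder already down-weighted.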

The ConvNeXt-Tiny backbone (pre-trained on ImageNet-1k) provides five resolution levels of feature maps. ConvNeXt's depthwise convolution design gives excellent feature quality with relatively low memory usage, making it well-suited for high-resolution segmentation.

Test Set Results

Evaluated on the fixed 53-image test partition of Kvasir-SEG (50 % of the original validation split, seed 42):

| Metric | Value |
|---|---|
| Dice | 0.9411 |
| IoU | 0.8889 |
| F1 | 0.9411 |
| Precision | 0.9618 |
| Recall | 0.9213 |
| Accuracy | 0.9803 |
| Loss | 0.0876 |
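
For reference, Dice and IoU are computed from the same overlap counts, related by IoU = Dice / (2 − Dice) — which is consistent with the values above (0.9411 / 1.0589 ≈ 0.8888). A minimal sketch of both metrics for binary masks (not necessarily the benchmark's exact implementation):

```python
import torch

def dice_iou(pred, target, eps=1e-7):
    """Dice and IoU for binary masks — illustrative sketch."""
    pred = pred.float().flatten()
    target = target.float().flatten()
    inter = (pred * target).sum()
    total = pred.sum() + target.sum()
    dice = (2 * inter + eps) / (total + eps)
    iou = (inter + eps) / (total - inter + eps)
    return dice.item(), iou.item()

pred = torch.tensor([[1, 1, 0], [0, 1, 0]])
target = torch.tensor([[1, 0, 0], [0, 1, 0]])
d, i = dice_iou(pred, target)
print(round(d, 3), round(i, 3))  # 0.8 0.667
```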

Sweep Leaderboard (all 24 models)

| Rank | Model | Test Dice | Test IoU |
|---|---|---|---|
| 1 | attention_unet_convnext (this model) | 0.9411 | 0.8888 |
| 2 | unet3plus_convnext | 0.9395 | 0.8859 |
| 3 | unet_convnext | 0.9383 | 0.8838 |
| 4 | resunet_efficientnet | 0.9338 | 0.8759 |
| 5 | unet3plus_efficientnet | 0.9335 | 0.8753 |

Training Configuration

  • Optimiser: AdamW
  • Learning rate: 1e-3 (cosine decay, 5 % warmup)
  • Weight decay: 1e-4
  • Batch size: 64
  • Epochs: 50
  • FP16: enabled (A100 GPU)
  • Loss: BCEDice
  • Dataset: Kvasir-SEG augmented (4,800 train / 100 val / 100 test)
  • Augmentation: random H/V flips, ±30° rotation, brightness/contrast/saturation ±20 %
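
The BCEDice objective with α = 0.5 can be sketched as an α-weighted sum of binary cross-entropy and soft-Dice loss; the exact weighting convention used in the sweep may differ:

```python
import torch
import torch.nn.functional as F

def bce_dice_loss(logits, target, alpha=0.5, eps=1e-7):
    """Sketch: loss = alpha * BCE + (1 - alpha) * (1 - soft Dice)."""
    bce = F.binary_cross_entropy_with_logits(logits, target)
    prob = logits.sigmoid()
    inter = (prob * target).sum()
    dice = (2 * inter + eps) / (prob.sum() + target.sum() + eps)
    return alpha * bce + (1 - alpha) * (1 - dice)

logits = torch.randn(2, 1, 256, 256)
target = (torch.rand(2, 1, 256, 256) > 0.5).float()
loss = bce_dice_loss(logits, target)
print(loss.item())
```

Pairing BCE (stable per-pixel gradients) with Dice (robust to foreground/background imbalance) is a common choice for lesion segmentation, where polyps occupy a small fraction of the frame.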

Note: This model was not subject to Optuna HPO. It is the raw sweep checkpoint. For the HPO-tuned variant see andreribeiro87/unet3plus-efficientnet-kvasir-seg.

How to Use

This model uses a custom PyTorch architecture. The model code is included in the repository.

Installation

pip install torch torchvision timm transformers

Inference

import torch
from transformers import AutoModel
from torchvision.transforms import functional as TF
from PIL import Image

# Load model — downloads weights + code automatically
model = AutoModel.from_pretrained(
    "andreribeiro87/attention-unet-convnext-kvasir-seg",
    trust_remote_code=True,
)
model.eval()

# Preprocess
image = Image.open("your_colonoscopy_image.jpg").convert("RGB")
x = TF.to_tensor(TF.resize(image, [256, 256])).unsqueeze(0)  # (1, 3, 256, 256)

# Predict
with torch.no_grad():
    outputs = model(pixel_values=x)
    mask = (outputs["logits"].sigmoid() > 0.5).squeeze()  # bool (256, 256)

pred_mask = TF.to_pil_image(mask.float())
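
Because the model predicts at a fixed 256 × 256, the mask usually needs to be resized back to the source resolution before overlaying it on the original frame. A sketch using nearest-neighbour interpolation to keep the mask binary (the mask and image size below are stand-ins):

```python
import torch
import torch.nn.functional as F

mask = torch.rand(256, 256) > 0.5      # stand-in for the predicted bool mask
orig_h, orig_w = 720, 1280             # stand-in original image size

resized = F.interpolate(
    mask.float()[None, None],          # (1, 1, 256, 256)
    size=(orig_h, orig_w),
    mode="nearest",                    # preserves hard 0/1 values
).squeeze().bool()
print(resized.shape)  # torch.Size([720, 1280])
```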

Citation

If you use this model or dataset, please cite the original Kvasir-SEG paper:

@inproceedings{jha2020kvasir,
  title     = {Kvasir-SEG: A Segmented Polyp Dataset},
  author    = {Jha, Debesh and Smedsrud, Pia H and Riegler, Michael A and Halvorsen, P{\aa}l
               and de Lange, Thomas and Johansen, Dag and Johansen, H{\aa}vard D},
  booktitle = {MultiMedia Modeling (MMM)},
  year      = {2020}
}
@article{oktay2018attention,
  title   = {Attention U-Net: Learning Where to Look for the Pancreas},
  author  = {Oktay, Ozan and Schlemper, Jo and Folgoc, Loic Le and Lee, Matthew and Heinrich,
             Mattias and Misawa, Kazunari and Mori, Kensaku and McDonagh, Steven and
             Hammerla, Nils Y and Kainz, Bernhard and others},
  journal = {arXiv preprint arXiv:1804.03999},
  year    = {2018}
}

Limitations

  • Trained and evaluated exclusively on Kvasir-SEG (single-centre, single-modality). Performance may degrade on other colonoscopy datasets or imaging conditions.
  • Binary segmentation only; does not distinguish between polyp types or severity.
  • Input resolution is fixed at 256 × 256; very small polyps may not be fully captured.
  • Not validated for clinical use. This is a research model.