Attention U-Net ConvNeXt — Polyp Segmentation (Best Test Dice)

Binary polyp segmentation model trained on Kvasir-SEG. Highest Dice score on the test set (0.9411) among all 24 architecture × backbone combinations evaluated in the UNet-A benchmark sweep.

Model Description

| Property | Value |
|---|---|
| Architecture | Attention U-Net (gate-based skip connections) |
| Backbone | ConvNeXt-Tiny (ImageNet pre-trained via timm) |
| Input size | 256 × 256 × 3 |
| Output | 256 × 256 × 1 logit map (sigmoid → binary mask) |
| Parameters | ~34.8 M (checkpoint ≈ 133 MB) |
| Loss | BCEDice (α = 0.5) |

Architecture Details

Attention U-Net (Oktay et al., 2018) augments the standard encoder-decoder with attention gates on every skip connection. The gate computes a spatial attention coefficient from the decoder query and the encoder key, suppressing activations in irrelevant background regions and focusing the model on lesion boundaries.
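
The additive attention gate can be sketched as a small PyTorch module. This is an illustrative sketch of the mechanism from the paper, not this repository's exact implementation; the module name, channel arguments, and the shared intermediate dimension are assumptions:

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Additive attention gate (Oktay et al., 2018) — illustrative sketch.

    g: gating signal from the decoder (coarser, semantically richer)
    x: skip-connection features from the encoder
    """
    def __init__(self, g_ch, x_ch, inter_ch):
        super().__init__()
        self.w_g = nn.Conv2d(g_ch, inter_ch, kernel_size=1)
        self.w_x = nn.Conv2d(x_ch, inter_ch, kernel_size=1)
        self.psi = nn.Sequential(
            nn.Conv2d(inter_ch, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, g, x):
        # Project both inputs to a shared channel dimension, add, then
        # squash to a per-pixel attention coefficient in (0, 1).
        a = self.psi(torch.relu(self.w_g(g) + self.w_x(x)))
        return x * a  # suppress activations in irrelevant regions

# Toy shapes; in the real decoder, g is upsampled to match x first.
gate = AttentionGate(g_ch=256, x_ch=128, inter_ch=64)
g = torch.randn(1, 256, 32, 32)
x = torch.randn(1, 128, 32, 32)
out = gate(g, x)
print(out.shape)  # torch.Size([1, 128, 32, 32])
```

The gated output replaces the raw encoder features in the skip concatenation, so background activations reach the decoder already down-weighted.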

The ConvNeXt-Tiny backbone (pre-trained on ImageNet-1k) provides five resolution levels of feature maps. ConvNeXt's depthwise convolution design gives excellent feature quality with relatively low memory usage, making it well-suited for high-resolution segmentation.

Test Set Results

Evaluated on the fixed 53-image test partition of Kvasir-SEG (50 % of the original validation split, seed 42):

| Metric | Value |
|---|---|
| Dice | 0.9411 |
| IoU | 0.8889 |
| F1 | 0.9411 |
| Precision | 0.9618 |
| Recall | 0.9213 |
| Accuracy | 0.9803 |
| Loss | 0.0876 |
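
For reference, Dice and IoU are computed from the same overlap counts, related by IoU = Dice / (2 − Dice) — which is consistent with the values above (0.9411 / 1.0589 ≈ 0.8888). A minimal sketch of both metrics for binary masks (not necessarily the benchmark's exact implementation):

```python
import torch

def dice_iou(pred, target, eps=1e-7):
    """Dice and IoU for binary masks — illustrative sketch."""
    pred = pred.float().flatten()
    target = target.float().flatten()
    inter = (pred * target).sum()
    total = pred.sum() + target.sum()
    dice = (2 * inter + eps) / (total + eps)
    iou = (inter + eps) / (total - inter + eps)
    return dice.item(), iou.item()

pred = torch.tensor([[1, 1, 0], [0, 1, 0]])
target = torch.tensor([[1, 0, 0], [0, 1, 0]])
d, i = dice_iou(pred, target)
print(round(d, 3), round(i, 3))  # 0.8 0.667
```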

Sweep Leaderboard (all 24 models)

| Rank | Model | Test Dice | Test IoU |
|---|---|---|---|
| 1 | attention_unet_convnext (this model) | 0.9411 | 0.8888 |
| 2 | unet3plus_convnext | 0.9395 | 0.8859 |
| 3 | unet_convnext | 0.9383 | 0.8838 |
| 4 | resunet_efficientnet | 0.9338 | 0.8759 |
| 5 | unet3plus_efficientnet | 0.9335 | 0.8753 |

Training Configuration

  • Optimiser: AdamW
  • Learning rate: 1e-3 (cosine decay, 5 % warmup)
  • Weight decay: 1e-4
  • Batch size: 64
  • Epochs: 50
  • FP16: enabled (A100 GPU)
  • Loss: BCEDice
  • Dataset: Kvasir-SEG augmented (4,800 train / 100 val / 100 test)
  • Augmentation: random H/V flips, ±30° rotation, brightness/contrast/saturation ±20 %
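
The BCEDice objective with α = 0.5 can be sketched as an α-weighted sum of binary cross-entropy and soft-Dice loss; the exact weighting convention used in the sweep may differ:

```python
import torch
import torch.nn.functional as F

def bce_dice_loss(logits, target, alpha=0.5, eps=1e-7):
    """Sketch: loss = alpha * BCE + (1 - alpha) * (1 - soft Dice)."""
    bce = F.binary_cross_entropy_with_logits(logits, target)
    prob = logits.sigmoid()
    inter = (prob * target).sum()
    dice = (2 * inter + eps) / (prob.sum() + target.sum() + eps)
    return alpha * bce + (1 - alpha) * (1 - dice)

logits = torch.randn(2, 1, 256, 256)
target = (torch.rand(2, 1, 256, 256) > 0.5).float()
loss = bce_dice_loss(logits, target)
print(loss.item())
```

Pairing BCE (stable per-pixel gradients) with Dice (robust to foreground/background imbalance) is a common choice for lesion segmentation, where polyps occupy a small fraction of the frame.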

Note: This model was not subject to Optuna HPO. It is the raw sweep checkpoint. For the HPO-tuned variant see andreribeiro87/unet3plus-efficientnet-kvasir-seg.

How to Use

This model uses a custom PyTorch architecture. The model code is included in the repository.

Installation

pip install torch torchvision timm transformers

Inference

import torch
from transformers import AutoModel
from torchvision.transforms import functional as TF
from PIL import Image

# Load model — downloads weights + code automatically
model = AutoModel.from_pretrained(
    "andreribeiro87/attention-unet-convnext-kvasir-seg",
    trust_remote_code=True,
)
model.eval()

# Preprocess
image = Image.open("your_colonoscopy_image.jpg").convert("RGB")
x = TF.to_tensor(TF.resize(image, [256, 256])).unsqueeze(0)  # (1, 3, 256, 256)

# Predict
with torch.no_grad():
    outputs = model(pixel_values=x)
    mask = (outputs["logits"].sigmoid() > 0.5).squeeze()  # bool (256, 256)

pred_mask = TF.to_pil_image(mask.float())
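
Because the model predicts at a fixed 256 × 256, the mask usually needs to be resized back to the source resolution before overlaying it on the original frame. A sketch using nearest-neighbour interpolation to keep the mask binary (the mask and image size below are stand-ins):

```python
import torch
import torch.nn.functional as F

mask = torch.rand(256, 256) > 0.5      # stand-in for the predicted bool mask
orig_h, orig_w = 720, 1280             # stand-in original image size

resized = F.interpolate(
    mask.float()[None, None],          # (1, 1, 256, 256)
    size=(orig_h, orig_w),
    mode="nearest",                    # preserves hard 0/1 values
).squeeze().bool()
print(resized.shape)  # torch.Size([720, 1280])
```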

Citation

If you use this model or dataset, please cite the original Kvasir-SEG paper:

@inproceedings{jha2020kvasir,
  title     = {Kvasir-SEG: A Segmented Polyp Dataset},
  author    = {Jha, Debesh and Smedsrud, Pia H and Riegler, Michael A and Halvorsen, P{\aa}l
               and de Lange, Thomas and Johansen, Dag and Johansen, H{\aa}vard D},
  booktitle = {MultiMedia Modeling (MMM)},
  year      = {2020}
}
@article{oktay2018attention,
  title   = {Attention U-Net: Learning Where to Look for the Pancreas},
  author  = {Oktay, Ozan and Schlemper, Jo and Folgoc, Loic Le and Lee, Matthew and Heinrich,
             Mattias and Misawa, Kazunari and Mori, Kensaku and McDonagh, Steven and
             Hammerla, Nils Y and Kainz, Bernhard and others},
  journal = {arXiv preprint arXiv:1804.03999},
  year    = {2018}
}

Limitations

  • Trained and evaluated exclusively on Kvasir-SEG (single-centre, single-modality). Performance may degrade on other colonoscopy datasets or imaging conditions.
  • Binary segmentation only; does not distinguish between polyp types or severity.
  • Input resolution is fixed at 256 × 256; very small polyps may not be fully captured.
  • Not validated for clinical use. This is a research model.