# Attention U-Net ConvNeXt — Polyp Segmentation (Best Test Dice)
Binary polyp segmentation model trained on Kvasir-SEG. Highest Dice score on the test set (0.9411) among all 24 architecture × backbone combinations evaluated in the UNet-A benchmark sweep.
## Model Description
| Property | Value |
|---|---|
| Architecture | Attention U-Net (gate-based skip connections) |
| Backbone | ConvNeXt-Tiny (ImageNet pre-trained via timm) |
| Input size | 256 × 256 × 3 |
| Output | 256 × 256 × 1 logit map (sigmoid → binary mask) |
| Checkpoint size | ~133 MB |
| Loss | BCEDice (α = 0.5) |
## Architecture Details
Attention U-Net (Oktay et al., 2018) augments the standard encoder-decoder with attention gates on every skip connection. Each gate computes a spatial attention coefficient from a decoder-side gating signal and the encoder skip features, suppressing activations in irrelevant background regions and focusing the model on lesion boundaries.
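The additive gate can be sketched as follows. This is an illustrative module in the spirit of Oktay et al., not the repository's actual implementation; channel sizes are arbitrary, and for brevity both inputs are assumed to share a spatial resolution (the original upsamples a coarser gating signal first).

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Additive attention gate sketch (after Oktay et al., 2018).

    g: gating signal from the decoder (semantically rich)
    x: skip-connection features from the encoder
    Returns x scaled by a learned spatial attention map in [0, 1].
    """
    def __init__(self, g_channels: int, x_channels: int, inter_channels: int):
        super().__init__()
        self.w_g = nn.Conv2d(g_channels, inter_channels, kernel_size=1)
        self.w_x = nn.Conv2d(x_channels, inter_channels, kernel_size=1)
        self.psi = nn.Sequential(
            nn.Conv2d(inter_channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, g: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        # Project both inputs to a shared embedding, add, then squash to a
        # single-channel attention coefficient per spatial location.
        a = self.psi(torch.relu(self.w_g(g) + self.w_x(x)))  # (N, 1, H, W)
        return x * a  # background regions are attenuated toward zero

# Shape check: gate a 64-channel skip map with a 64-channel decoder query.
gate = AttentionGate(g_channels=64, x_channels=64, inter_channels=32)
g = torch.randn(1, 64, 32, 32)
x = torch.randn(1, 64, 32, 32)
out = gate(g, x)
print(out.shape)
```

Because the gate outputs a sigmoid-valued map, the skip features pass through scaled between 0 and their original magnitude, which is what lets the decoder ignore irrelevant background.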
The ConvNeXt-Tiny backbone (pre-trained on ImageNet-1k) provides five resolution levels of feature maps. ConvNeXt's depthwise convolution design gives excellent feature quality with relatively low memory usage, making it well-suited for high-resolution segmentation.
## Test Set Results
Evaluated on the fixed 53-image test partition of Kvasir-SEG (50 % of the original validation split, seed 42):
| Metric | Value |
|---|---|
| Dice | 0.9411 |
| IoU | 0.8889 |
| F1 | 0.9411 |
| Precision | 0.9618 |
| Recall | 0.9213 |
| Accuracy | 0.9803 |
| Loss | 0.0876 |
## Sweep Leaderboard (all 24 models)
| Rank | Model | Test Dice | Test IoU |
|---|---|---|---|
| 1 | attention_unet_convnext (this model) | 0.9411 | 0.8888 |
| 2 | unet3plus_convnext | 0.9395 | 0.8859 |
| 3 | unet_convnext | 0.9383 | 0.8838 |
| 4 | resunet_efficientnet | 0.9338 | 0.8759 |
| 5 | unet3plus_efficientnet | 0.9335 | 0.8753 |
## Training Configuration
- Optimiser: AdamW
- Learning rate: 1e-3 (cosine decay, 5 % warmup)
- Weight decay: 1e-4
- Batch size: 64
- Epochs: 50
- FP16: enabled (A100 GPU)
- Loss: BCEDice
- Dataset: Kvasir-SEG augmented (4,800 train / 100 val / 100 test)
- Augmentation: random H/V flips, ±30° rotation, brightness/contrast/saturation ±20 %
Note: This model was not subject to Optuna HPO. It is the raw sweep checkpoint. For the HPO-tuned variant see andreribeiro87/unet3plus-efficientnet-kvasir-seg.
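A hedged sketch of what a BCE + Dice compound loss with α = 0.5 typically looks like; the function name, smoothing term, and exact weighting convention are assumptions, not the sweep's verified code:

```python
import torch
import torch.nn.functional as F

def bce_dice_loss(logits, targets, alpha=0.5, eps=1e-6):
    """Weighted sum of binary cross-entropy and soft Dice loss.

    logits:  raw model outputs, shape (N, 1, H, W)
    targets: binary ground-truth masks, same shape, float in {0, 1}
    """
    bce = F.binary_cross_entropy_with_logits(logits, targets)

    # Soft Dice on sigmoid probabilities, computed per sample then averaged.
    probs = torch.sigmoid(logits)
    inter = (probs * targets).sum(dim=(1, 2, 3))
    denom = probs.sum(dim=(1, 2, 3)) + targets.sum(dim=(1, 2, 3))
    dice = (2 * inter + eps) / (denom + eps)
    dice_loss = 1 - dice.mean()

    return alpha * bce + (1 - alpha) * dice_loss

logits = torch.randn(2, 1, 256, 256)
targets = (torch.rand(2, 1, 256, 256) > 0.5).float()
loss = bce_dice_loss(logits, targets)
print(loss.item())
```

The BCE term gives well-behaved per-pixel gradients early in training, while the Dice term directly optimises the overlap metric reported above.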
## How to Use
This model uses a custom PyTorch architecture. The model code is included in the repository.
### Installation

```bash
pip install torch torchvision timm transformers
```
### Inference

```python
import torch
from transformers import AutoModel
from torchvision.transforms import functional as TF
from PIL import Image

# Load model — downloads weights + code automatically
model = AutoModel.from_pretrained(
    "andreribeiro87/attention-unet-convnext-kvasir-seg",
    trust_remote_code=True,
)
model.eval()

# Preprocess
image = Image.open("your_colonoscopy_image.jpg").convert("RGB")
x = TF.to_tensor(TF.resize(image, [256, 256])).unsqueeze(0)  # (1, 3, 256, 256)

# Predict
with torch.no_grad():
    outputs = model(pixel_values=x)
mask = (outputs["logits"].sigmoid() > 0.5).squeeze()  # bool (256, 256)
pred_mask = TF.to_pil_image(mask.float())
```
## Citation
If you use this model or dataset, please cite the original Kvasir-SEG paper:
```bibtex
@inproceedings{jha2020kvasir,
  title     = {Kvasir-SEG: A Segmented Polyp Dataset},
  author    = {Jha, Debesh and Smedsrud, Pia H and Riegler, Michael A and Halvorsen, P{\aa}l
               and de Lange, Thomas and Johansen, Dag and Johansen, H{\aa}vard D},
  booktitle = {MultiMedia Modeling (MMM)},
  year      = {2020}
}

@article{oktay2018attention,
  title   = {Attention U-Net: Learning Where to Look for the Pancreas},
  author  = {Oktay, Ozan and Schlemper, Jo and Folgoc, Loic Le and Lee, Matthew and Heinrich,
             Mattias and Misawa, Kazunari and Mori, Kensaku and McDonagh, Steven and
             Hammerla, Nils Y and Kainz, Bernhard and others},
  journal = {arXiv preprint arXiv:1804.03999},
  year    = {2018}
}
```
## Limitations
- Trained and evaluated exclusively on Kvasir-SEG (single-centre, single-modality). Performance may degrade on other colonoscopy datasets or imaging conditions.
- Binary segmentation only; does not distinguish between polyp types or severity.
- Input resolution is fixed at 256 × 256; very small polyps may not be fully captured.
- Not validated for clinical use. This is a research model.