# UNet Residual ResNet34

A U-Net architecture for polyp segmentation from colonoscopy images, combining:

- ResNet34 pretrained encoder (ImageNet weights, early layers frozen during training)
- Residual blocks with Squeeze-and-Excitation (SE) attention in the decoder
- ASPP (Atrous Spatial Pyramid Pooling) bottleneck for multi-scale context

Inspired by the ResUNet++ architecture.

## Model description

The encoder is a ResNet34 backbone that extracts multi-scale features at 5 resolutions (H, H/2, H/4, H/8, H/16) with channel counts [64, 64, 128, 256, 512]. Channel adapters project backbone features to the decoder's expected dimensions before skip connections.
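The channel adapters can be sketched as per-scale 1×1 convolutions that project each backbone feature map to the width the decoder expects. The decoder widths below are illustrative assumptions, not the repository's exact values:

```python
import torch
import torch.nn as nn

# Backbone channel counts from the description above (H, H/2, H/4, H/8, H/16).
backbone_channels = [64, 64, 128, 256, 512]
# Assumed decoder widths -- placeholders for illustration only.
decoder_channels = [64, 128, 256, 512, 512]

# One 1x1 convolution per scale, projecting backbone features to decoder width.
adapters = nn.ModuleList(
    nn.Conv2d(c_in, c_out, kernel_size=1)
    for c_in, c_out in zip(backbone_channels, decoder_channels)
)

# Example: adapt the H/4 feature map (128 -> 256 channels, spatial size kept).
feat = torch.randn(1, 128, 64, 64)
adapted = adapters[2](feat)
print(adapted.shape)  # torch.Size([1, 256, 64, 64])
```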

The bottleneck applies a ResidualSEBlock followed by ASPP (rates [1, 6, 12, 18]) and a projection convolution, giving the model a wide receptive field without losing spatial detail.
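A minimal ASPP sketch with the rates given above: one atrous branch per rate (a plain 1×1 convolution for rate 1), concatenated and projected back down with a 1×1 convolution. Normalization placement and the absence of a global-pooling branch are assumptions; the repository's version may differ:

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Sketch of Atrous Spatial Pyramid Pooling with rates [1, 6, 12, 18]."""

    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList()
        for r in rates:
            # Rate 1 degenerates to a 1x1 conv; others are dilated 3x3 convs
            # with padding == dilation so spatial size is preserved.
            conv = (nn.Conv2d(in_ch, out_ch, 1) if r == 1
                    else nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r))
            self.branches.append(nn.Sequential(
                conv, nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True)))
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x):
        # Concatenate all parallel branches, then fuse with a 1x1 projection.
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

# H/16 bottleneck features for a 256x256 input.
out = ASPP(512, 256)(torch.randn(1, 512, 16, 16))
print(out.shape)  # torch.Size([1, 256, 16, 16])
```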

The decoder uses four ResidualSEBlock stages with transposed convolutions for upsampling and concatenated skip connections from the encoder. A final 1×1 convolution produces the binary segmentation logit map.
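A ResidualSEBlock of this kind can be sketched roughly as below; the layer ordering, normalization choices, and the SE reduction ratio of 16 are assumptions, and the repository's implementation may differ in detail:

```python
import torch
import torch.nn as nn

class ResidualSEBlock(nn.Module):
    """Illustrative residual block with Squeeze-and-Excitation attention."""

    def __init__(self, in_ch, out_ch, reduction=16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
        )
        # Project the identity path when channel counts differ.
        self.skip = nn.Conv2d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()
        # Squeeze-and-Excitation: global pool -> bottleneck MLP -> channel gates.
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(out_ch, out_ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch // reduction, out_ch, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        y = self.body(x)
        y = y * self.se(y)  # reweight channels by learned attention gates
        return torch.relu(y + self.skip(x))

out = ResidualSEBlock(256, 128)(torch.randn(1, 256, 32, 32))
print(out.shape)  # torch.Size([1, 128, 32, 32])
```

In the decoder, each such block would follow a ConvTranspose2d upsampling step and consume the concatenation of the upsampled features with the matching encoder skip connection.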

| Component      | Detail                              |
|----------------|-------------------------------------|
| Input size     | 256×256×3                           |
| Output size    | 256×256×1 (logits)                  |
| Encoder        | ResNet34 (pretrained, ImageNet)     |
| Bottleneck     | ResidualSEBlock + ASPP              |
| Decoder blocks | ResidualSEBlock + ConvTranspose2d   |
| Parameters     | ~28M                                |

## Training

Trained on the Kvasir-SEG dataset with 5× random augmentation (geometric flips/rotations plus brightness/contrast/saturation jitter), yielding roughly 5,400 training samples.

| Hyperparameter | Value                  |
|----------------|------------------------|
| Epochs         | 8                      |
| Batch size     | 128                    |
| Optimizer      | AdamW                  |
| Learning rate  | 1e-3 (linear decay)    |
| Weight decay   | 0.01                   |
| Loss           | 0.5 × BCE + 0.5 × Dice |
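The combined objective can be written as a short function. This is a generic sketch of 0.5 × BCE + 0.5 × Dice on logits; the `eps` smoothing term and the exact Dice formulation are assumptions:

```python
import torch
import torch.nn.functional as F

def bce_dice_loss(logits, targets, eps=1e-6):
    """0.5 * BCE + 0.5 * (1 - Dice), computed on raw logits."""
    bce = F.binary_cross_entropy_with_logits(logits, targets)
    probs = torch.sigmoid(logits)
    inter = (probs * targets).sum()
    # Soft Dice coefficient with eps smoothing to avoid division by zero.
    dice = (2 * inter + eps) / (probs.sum() + targets.sum() + eps)
    return 0.5 * bce + 0.5 * (1 - dice)

logits = torch.randn(2, 1, 256, 256)
targets = (torch.rand(2, 1, 256, 256) > 0.5).float()
loss = bce_dice_loss(logits, targets)
```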

### Training metrics

| Epoch | Train loss | Val loss | Val Dice | Val IoU |
|------:|-----------:|---------:|---------:|--------:|
| 1     | 0.4281     | 0.2964   | 0.8118   | 0.6832  |
| 2     | 0.2200     | 0.4746   | 0.6293   | 0.4591  |
| 3     | 0.1249     | 0.1628   | 0.8586   | 0.7523  |
| 4     | 0.0819     | 0.1597   | 0.8556   | 0.7477  |
| 5     | 0.0735     | 0.1330   | 0.8815   | 0.7881  |
| 6     | 0.0577     | 0.1296   | 0.8873   | 0.7975  |
| 7     | 0.0502     | 0.1217   | 0.8919   | 0.8049  |
| 8     | 0.0399     | 0.1127   | 0.8991   | 0.8167  |
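For reference, the Dice and IoU columns can be computed from binary masks as follows. This is a generic sketch, not necessarily the repository's evaluation code:

```python
import torch

def dice_iou(pred_mask, true_mask, eps=1e-6):
    """Dice and IoU between two binary masks."""
    pred, true = pred_mask.float(), true_mask.float()
    inter = (pred * true).sum()
    total = pred.sum() + true.sum()          # |A| + |B|
    dice = (2 * inter + eps) / (total + eps)
    iou = (inter + eps) / (total - inter + eps)  # |A∩B| / |A∪B|
    return dice.item(), iou.item()

pred = torch.tensor([[1, 1, 0, 0]])
true = torch.tensor([[1, 0, 0, 0]])
d, i = dice_iou(pred, true)  # d ≈ 0.667, i ≈ 0.5
```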

## Usage

```python
import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# These files are also available in the repository
from models.backbones import ResNet34Backbone
from models.residual_unet import ResidualUNet

# Load model
backbone = ResNet34Backbone(pretrained=False)
model = ResidualUNet(in_channels=3, num_classes=1, backbone=backbone)

weights_path = hf_hub_download(
    repo_id="RGarrido03/unet-residual-resnet34",
    filename="model.safetensors",
)
model.load_state_dict(load_file(weights_path), strict=False)
model.eval()

# Inference
from torchvision.transforms import Compose, Resize, ToTensor
from PIL import Image

transform = Compose([Resize((256, 256)), ToTensor()])
image = transform(Image.open("your_image.jpg").convert("RGB")).unsqueeze(0)

with torch.no_grad():
    logits = model(image)          # (1, 1, 256, 256)
    mask = (logits.sigmoid() > 0.5).squeeze()  # binary mask
```
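To overlay the prediction on the original image, the 256×256 probability map can be resized back to the source resolution before thresholding. A generic post-processing sketch (the tensor below is a stand-in for the model output, and the image size is a placeholder):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(1, 1, 256, 256)  # stand-in for the model's output
orig_h, orig_w = 720, 1280            # stand-in for the source image size

# Interpolate probabilities (not the hard mask) so edges stay smooth,
# then threshold at the original resolution.
probs = F.interpolate(torch.sigmoid(logits), size=(orig_h, orig_w),
                      mode="bilinear", align_corners=False)
full_mask = (probs > 0.5).squeeze(0).squeeze(0)
print(full_mask.shape)  # torch.Size([720, 1280])
```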

## Citation

If you use this model, please also cite the Kvasir-SEG dataset:

```bibtex
@inproceedings{Jha2020,
  title     = {Kvasir-{SEG}: A Segmented Polyp Dataset},
  author    = {Jha, Debesh and Smedsrud, Pia H. and Riegler, Michael A. and
               Halvorsen, P{\aa}l and de Lange, Thomas and Johansen, Dag and
               Johansen, H{\aa}vard D.},
  booktitle = {MultiMedia Modeling},
  year      = {2020},
}
```