# UNet Residual ResNet34

A U-Net architecture for polyp segmentation from colonoscopy images, combining:

- ResNet34 pretrained encoder (ImageNet weights, early layers frozen during training)
- Residual blocks with Squeeze-and-Excitation (SE) attention in the decoder
- ASPP (Atrous Spatial Pyramid Pooling) bottleneck for multi-scale context

Inspired by the ResUNet++ architecture.

## Model description

The encoder is a ResNet34 backbone that extracts multi-scale features at 5 resolutions (H, H/2, H/4, H/8, H/16) with channel counts [64, 64, 128, 256, 512]. Channel adapters project backbone features to the decoder's expected dimensions before skip connections.
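The channel adapters can be sketched as per-scale 1×1 convolutions that project each backbone feature map to the width the decoder expects. The decoder widths below are illustrative assumptions, not the repository's exact values:

```python
import torch
import torch.nn as nn

# Backbone channel counts from the description above (H, H/2, H/4, H/8, H/16).
backbone_channels = [64, 64, 128, 256, 512]
# Assumed decoder widths -- placeholders for illustration only.
decoder_channels = [64, 128, 256, 512, 512]

# One 1x1 convolution per scale, projecting backbone features to decoder width.
adapters = nn.ModuleList(
    nn.Conv2d(c_in, c_out, kernel_size=1)
    for c_in, c_out in zip(backbone_channels, decoder_channels)
)

# Example: adapt the H/4 feature map (128 -> 256 channels, spatial size kept).
feat = torch.randn(1, 128, 64, 64)
adapted = adapters[2](feat)
print(adapted.shape)  # torch.Size([1, 256, 64, 64])
```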

The bottleneck applies a ResidualSEBlock followed by ASPP (rates [1, 6, 12, 18]) and a projection convolution, giving the model a wide receptive field without losing spatial detail.
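A minimal ASPP sketch with the rates given above: one atrous branch per rate (a plain 1×1 convolution for rate 1), concatenated and projected back down with a 1×1 convolution. Normalization placement and the absence of a global-pooling branch are assumptions; the repository's version may differ:

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Sketch of Atrous Spatial Pyramid Pooling with rates [1, 6, 12, 18]."""

    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList()
        for r in rates:
            # Rate 1 degenerates to a 1x1 conv; others are dilated 3x3 convs
            # with padding == dilation so spatial size is preserved.
            conv = (nn.Conv2d(in_ch, out_ch, 1) if r == 1
                    else nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r))
            self.branches.append(nn.Sequential(
                conv, nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True)))
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x):
        # Concatenate all parallel branches, then fuse with a 1x1 projection.
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

# H/16 bottleneck features for a 256x256 input.
out = ASPP(512, 256)(torch.randn(1, 512, 16, 16))
print(out.shape)  # torch.Size([1, 256, 16, 16])
```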

The decoder uses four ResidualSEBlock stages with transposed convolutions for upsampling and concatenated skip connections from the encoder. A final 1×1 convolution produces the binary segmentation logit map.
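A ResidualSEBlock of this kind can be sketched roughly as below; the layer ordering, normalization choices, and the SE reduction ratio of 16 are assumptions, and the repository's implementation may differ in detail:

```python
import torch
import torch.nn as nn

class ResidualSEBlock(nn.Module):
    """Illustrative residual block with Squeeze-and-Excitation attention."""

    def __init__(self, in_ch, out_ch, reduction=16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
        )
        # Project the identity path when channel counts differ.
        self.skip = nn.Conv2d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()
        # Squeeze-and-Excitation: global pool -> bottleneck MLP -> channel gates.
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(out_ch, out_ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch // reduction, out_ch, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        y = self.body(x)
        y = y * self.se(y)  # reweight channels by learned attention gates
        return torch.relu(y + self.skip(x))

out = ResidualSEBlock(256, 128)(torch.randn(1, 256, 32, 32))
print(out.shape)  # torch.Size([1, 128, 32, 32])
```

In the decoder, each such block would follow a ConvTranspose2d upsampling step and consume the concatenation of the upsampled features with the matching encoder skip connection.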

| Component      | Detail                              |
|----------------|-------------------------------------|
| Input size     | 256×256×3                           |
| Output size    | 256×256×1 (logits)                  |
| Encoder        | ResNet34 (pretrained, ImageNet)     |
| Bottleneck     | ResidualSEBlock + ASPP              |
| Decoder blocks | ResidualSEBlock + ConvTranspose2d   |
| Parameters     | ~28M                                |

## Training

Trained on the Kvasir-SEG dataset with 5× random augmentation (geometric flips/rotations plus brightness/contrast/saturation jitter), yielding roughly 5,400 training samples.

| Hyperparameter | Value                  |
|----------------|------------------------|
| Epochs         | 8                      |
| Batch size     | 128                    |
| Optimizer      | AdamW                  |
| Learning rate  | 1e-3 (linear decay)    |
| Weight decay   | 0.01                   |
| Loss           | 0.5 × BCE + 0.5 × Dice |
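The combined objective can be written as a short function. This is a generic sketch of 0.5 × BCE + 0.5 × Dice on logits; the `eps` smoothing term and the exact Dice formulation are assumptions:

```python
import torch
import torch.nn.functional as F

def bce_dice_loss(logits, targets, eps=1e-6):
    """0.5 * BCE + 0.5 * (1 - Dice), computed on raw logits."""
    bce = F.binary_cross_entropy_with_logits(logits, targets)
    probs = torch.sigmoid(logits)
    inter = (probs * targets).sum()
    # Soft Dice coefficient with eps smoothing to avoid division by zero.
    dice = (2 * inter + eps) / (probs.sum() + targets.sum() + eps)
    return 0.5 * bce + 0.5 * (1 - dice)

logits = torch.randn(2, 1, 256, 256)
targets = (torch.rand(2, 1, 256, 256) > 0.5).float()
loss = bce_dice_loss(logits, targets)
```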

### Training metrics

| Epoch | Train loss | Val loss | Val Dice | Val IoU |
|------:|-----------:|---------:|---------:|--------:|
| 1     | 0.4281     | 0.2964   | 0.8118   | 0.6832  |
| 2     | 0.2200     | 0.4746   | 0.6293   | 0.4591  |
| 3     | 0.1249     | 0.1628   | 0.8586   | 0.7523  |
| 4     | 0.0819     | 0.1597   | 0.8556   | 0.7477  |
| 5     | 0.0735     | 0.1330   | 0.8815   | 0.7881  |
| 6     | 0.0577     | 0.1296   | 0.8873   | 0.7975  |
| 7     | 0.0502     | 0.1217   | 0.8919   | 0.8049  |
| 8     | 0.0399     | 0.1127   | 0.8991   | 0.8167  |
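For reference, the Dice and IoU columns can be computed from binary masks as follows. This is a generic sketch, not necessarily the repository's evaluation code:

```python
import torch

def dice_iou(pred_mask, true_mask, eps=1e-6):
    """Dice and IoU between two binary masks."""
    pred, true = pred_mask.float(), true_mask.float()
    inter = (pred * true).sum()
    total = pred.sum() + true.sum()          # |A| + |B|
    dice = (2 * inter + eps) / (total + eps)
    iou = (inter + eps) / (total - inter + eps)  # |A∩B| / |A∪B|
    return dice.item(), iou.item()

pred = torch.tensor([[1, 1, 0, 0]])
true = torch.tensor([[1, 0, 0, 0]])
d, i = dice_iou(pred, true)  # d ≈ 0.667, i ≈ 0.5
```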

## Usage

```python
import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# These files are also available in the repository
from models.backbones import ResNet34Backbone
from models.residual_unet import ResidualUNet

# Load model
backbone = ResNet34Backbone(pretrained=False)
model = ResidualUNet(in_channels=3, num_classes=1, backbone=backbone)

weights_path = hf_hub_download(
    repo_id="RGarrido03/unet-residual-resnet34",
    filename="model.safetensors",
)
model.load_state_dict(load_file(weights_path), strict=False)
model.eval()

# Inference
from torchvision.transforms import Compose, Resize, ToTensor
from PIL import Image

transform = Compose([Resize((256, 256)), ToTensor()])
image = transform(Image.open("your_image.jpg").convert("RGB")).unsqueeze(0)

with torch.no_grad():
    logits = model(image)          # (1, 1, 256, 256)
    mask = (logits.sigmoid() > 0.5).squeeze()  # binary mask
```
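To overlay the prediction on the original image, the 256×256 probability map can be resized back to the source resolution before thresholding. A generic post-processing sketch (the tensor below is a stand-in for the model output, and the image size is a placeholder):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(1, 1, 256, 256)  # stand-in for the model's output
orig_h, orig_w = 720, 1280            # stand-in for the source image size

# Interpolate probabilities (not the hard mask) so edges stay smooth,
# then threshold at the original resolution.
probs = F.interpolate(torch.sigmoid(logits), size=(orig_h, orig_w),
                      mode="bilinear", align_corners=False)
full_mask = (probs > 0.5).squeeze(0).squeeze(0)
print(full_mask.shape)  # torch.Size([720, 1280])
```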

## Citation

If you use this model, please also cite the Kvasir-SEG dataset:

```bibtex
@inproceedings{Jha2020,
  title     = {Kvasir-{SEG}: A Segmented Polyp Dataset},
  author    = {Jha, Debesh and Smedsrud, Pia H. and Riegler, Michael A. and
               Halvorsen, P{\aa}l and de Lange, Thomas and Johansen, Dag and
               Johansen, H{\aa}vard D.},
  booktitle = {MultiMedia Modeling},
  year      = {2020},
}
```