# UNet Residual ResNet34
A U-Net architecture for polyp segmentation from colonoscopy images, combining:
- ResNet34 pretrained encoder (ImageNet weights, early layers frozen during training)
- Residual blocks with Squeeze-and-Excitation (SE) attention in the decoder
- ASPP (Atrous Spatial Pyramid Pooling) bottleneck for multi-scale context
Inspired by the ResUNet++ architecture.
## Model description
The encoder is a ResNet34 backbone that extracts multi-scale features at 5 resolutions (H, H/2, H/4, H/8, H/16) with channel counts [64, 64, 128, 256, 512]. Channel adapters project backbone features to the decoder's expected dimensions before skip connections.
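A channel adapter of the kind described above is typically just a 1×1 convolution projecting the backbone's channel count to the decoder's. The sketch below is illustrative (the repository's actual adapter may omit the norm/activation or differ in detail):

```python
import torch
import torch.nn as nn

def channel_adapter(in_ch: int, out_ch: int) -> nn.Module:
    """1x1 conv that reprojects a backbone feature map to the channel
    width the decoder expects before skip concatenation. Illustrative,
    not the repository's actual module."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

skip = torch.randn(1, 256, 32, 32)        # e.g. an H/8 encoder feature map
adapted = channel_adapter(256, 128)(skip) # -> (1, 128, 32, 32)
print(tuple(adapted.shape))
```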
The bottleneck applies a ResidualSEBlock followed by ASPP (rates [1, 6, 12, 18]) and a
projection convolution, giving the model a wide receptive field without losing spatial detail.
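The ASPP bottleneck can be sketched as parallel dilated 3×3 convolutions at the listed rates, concatenated and fused by a 1×1 projection. This is a minimal version under assumed layer names; the repository's implementation may add image-level pooling or other refinements:

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Atrous Spatial Pyramid Pooling: parallel dilated convs widen the
    receptive field at several scales without reducing resolution."""
    def __init__(self, in_ch: int, out_ch: int, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                # rate 1 is an ordinary 3x3 conv; padding == dilation
                # keeps the spatial size unchanged
                nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for r in rates
        )
        # 1x1 projection fuses the concatenated branch outputs
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

x = torch.randn(1, 512, 16, 16)  # H/16 bottleneck features for 256x256 input
y = ASPP(512, 256)(x)
print(tuple(y.shape))  # (1, 256, 16, 16)
```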
The decoder uses four ResidualSEBlock stages with transposed convolutions for upsampling
and concatenated skip connections from the encoder. A final 1×1 convolution produces the
binary segmentation logit map.
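A `ResidualSEBlock` as described above combines a two-conv residual block with Squeeze-and-Excitation channel gating. The sketch below is one plausible arrangement; the repository's block may order the SE gate and shortcut differently:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: global-average-pool to per-channel stats,
    then a small bottleneck MLP gates each channel."""
    def __init__(self, ch: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(ch, ch // reduction), nn.ReLU(inplace=True),
            nn.Linear(ch // reduction, ch), nn.Sigmoid(),
        )

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))    # (N, C) channel weights
        return x * w[:, :, None, None]

class ResidualSEBlock(nn.Module):
    """Two 3x3 convs with SE applied before the residual add; a 1x1 conv
    matches the shortcut's channels when they differ."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.se = SEBlock(out_ch)
        self.skip = nn.Conv2d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x):
        return torch.relu(self.se(self.body(x)) + self.skip(x))

y = ResidualSEBlock(128, 64)(torch.randn(2, 128, 32, 32))
print(tuple(y.shape))  # (2, 64, 32, 32)
```

In the decoder, each such block would follow a `ConvTranspose2d` upsample and the concatenation of the matching encoder skip.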
| Component | Detail |
|---|---|
| Input size | 256×256×3 |
| Output size | 256×256×1 (logits) |
| Encoder | ResNet34 (pretrained, ImageNet) |
| Bottleneck | ResidualSEBlock + ASPP |
| Decoder blocks | ResidualSEBlock + ConvTranspose2d |
| Parameters | ~28M |
## Training
Trained on the Kvasir-SEG dataset with 5× random augmentation (geometric flips/rotations plus brightness/contrast/saturation jitter), yielding ~5,400 training samples.
| Hyperparameter | Value |
|---|---|
| Epochs | 8 |
| Batch size | 128 |
| Optimizer | AdamW |
| Learning rate | 1e-3 (linear decay) |
| Weight decay | 0.01 |
| Loss | 0.5 × BCE + 0.5 × Dice |
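The combined loss from the table above can be sketched as an equal-weight sum of binary cross-entropy on the logits and a soft Dice loss on the sigmoid probabilities; the smoothing constant and reduction details are assumptions:

```python
import torch
import torch.nn.functional as F

def bce_dice_loss(logits, targets, eps: float = 1e-6):
    """0.5 * BCE + 0.5 * soft Dice. A sketch; the repository's exact
    smoothing and reduction may differ."""
    bce = F.binary_cross_entropy_with_logits(logits, targets)
    probs = torch.sigmoid(logits)
    inter = (probs * targets).sum(dim=(1, 2, 3))
    union = probs.sum(dim=(1, 2, 3)) + targets.sum(dim=(1, 2, 3))
    dice = (2 * inter + eps) / (union + eps)   # per-sample soft Dice
    return 0.5 * bce + 0.5 * (1 - dice).mean()

logits = torch.randn(4, 1, 256, 256)
targets = (torch.rand(4, 1, 256, 256) > 0.5).float()
loss = bce_dice_loss(logits, targets)
print(loss.item())
```

The Dice term directly optimizes region overlap, which counteracts BCE's bias toward the (dominant) background class.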
### Training metrics
| Epoch | Train loss | Val loss | Val Dice | Val IoU |
|---|---|---|---|---|
| 1 | 0.4281 | 0.2964 | 0.8118 | 0.6832 |
| 2 | 0.2200 | 0.4746 | 0.6293 | 0.4591 |
| 3 | 0.1249 | 0.1628 | 0.8586 | 0.7523 |
| 4 | 0.0819 | 0.1597 | 0.8556 | 0.7477 |
| 5 | 0.0735 | 0.1330 | 0.8815 | 0.7881 |
| 6 | 0.0577 | 0.1296 | 0.8873 | 0.7975 |
| 7 | 0.0502 | 0.1217 | 0.8919 | 0.8049 |
| 8 | 0.0399 | 0.1127 | 0.8991 | 0.8167 |
## Usage
```python
import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# These modules are also available in the repository
from models.backbones import ResNet34Backbone
from models.residual_unet import ResidualUNet

# Load model
backbone = ResNet34Backbone(pretrained=False)
model = ResidualUNet(in_channels=3, num_classes=1, backbone=backbone)
weights_path = hf_hub_download(
    repo_id="RGarrido03/unet-residual-resnet34",
    filename="model.safetensors",
)
model.load_state_dict(load_file(weights_path), strict=False)
model.eval()

# Inference
from torchvision.transforms import Compose, Resize, ToTensor
from PIL import Image

transform = Compose([Resize((256, 256)), ToTensor()])
image = transform(Image.open("your_image.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    logits = model(image)  # (1, 1, 256, 256)
mask = (logits.sigmoid() > 0.5).squeeze()  # binary mask
```
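Since the model predicts at 256×256, a mask meant for overlay on the source image should be upsampled back to the original resolution before thresholding. A sketch, with illustrative sizes:

```python
import torch
import torch.nn.functional as F

probs = torch.rand(1, 1, 256, 256)  # stands in for sigmoid(logits)
orig_h, orig_w = 480, 640           # illustrative source-image size

# Interpolate the probability map, then threshold; thresholding first
# and interpolating the hard mask would produce jagged edges.
probs_full = F.interpolate(probs, size=(orig_h, orig_w),
                           mode="bilinear", align_corners=False)
mask_full = (probs_full > 0.5).squeeze()  # (480, 640) boolean mask
print(tuple(mask_full.shape))
```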
## Citation
If you use this model, please also cite the Kvasir-SEG dataset:
```bibtex
@inproceedings{Jha2020,
  title     = {Kvasir-{SEG}: A Segmented Polyp Dataset},
  author    = {Jha, Debesh and Smedsrud, Pia H. and Riegler, Michael A. and
               Halvorsen, P{\aa}l and de Lange, Thomas and Johansen, Dag and
               Johansen, H{\aa}vard D.},
  booktitle = {MultiMedia Modeling},
  year      = {2020},
}
```
## Evaluation results

- Dice score on Kvasir-SEG (augmented): 0.899 (self-reported)
- IoU score on Kvasir-SEG (augmented): 0.817 (self-reported)