---
library_name: transformers
license: mit
tags:
- image-segmentation
- semantic-segmentation
- segformer
- facade
- cmp
- vision
pipeline_tag: image-segmentation
datasets:
- Xpitfire/cmp_facade
metrics:
- mean_iou
---
# SegFormer-B0 Fine-Tuned on the CMP Facade Dataset

Custom semantic segmentation model for facade parsing: wall, window, door, and balcony detection on rectified building facades.

## Model Details
- Architecture: SegFormer-B0 (NVIDIA, ADE20K-pretrained)
- Parameters: ~3.7M
- Task: Semantic Segmentation
- Input Size: 512×512
- Classes: 6 unified facade classes
## Class Mapping

| ID | Class | Description |
|---|---|---|
| 0 | background | Sky, ground, non-facade regions |
| 1 | facade_wall | Main wall surface + moldings, cornices, pillars, sills, deco |
| 2 | window | Windows + blinds |
| 3 | door | Doors + shopfronts |
| 4 | balcony | Balconies |
| 5 | vegetation_occluder | Vegetation (trained as background, since CMP lacks this class) |
## Training
- Dataset: CMP Facade Database — 378 train, 114 test rectified facade images
- Original Classes: 12 (facade, molding, cornice, pillar, window, door, sill, blind, balcony, shop, deco, background)
- Mapping: 12 CMP classes → 6 unified classes (see mapping above)
- Epochs: ~53 (best at epoch 38, mean IoU 0.4856)
- Optimizer: AdamW, lr=6e-5
- Batch Size: 4 per device (effective batch = 8 with grad accumulation)
- Hardware: Tesla T4 GPU
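The 12-to-6 remapping above can be implemented as a lookup table. A minimal sketch, assuming the CMP label ids follow the class order listed under "Original Classes"; these specific id assignments are illustrative and should be verified against the actual dataset annotations before use:

```python
import numpy as np

# Hypothetical CMP label ids (inferred from the class order above, not
# verified against the dataset files) mapped to the 6 unified classes.
CMP_TO_UNIFIED = {
    0: 0,   # background -> background
    1: 1,   # facade     -> facade_wall
    2: 1,   # molding    -> facade_wall
    3: 1,   # cornice    -> facade_wall
    4: 1,   # pillar     -> facade_wall
    5: 2,   # window     -> window
    6: 3,   # door       -> door
    7: 1,   # sill       -> facade_wall
    8: 2,   # blind      -> window
    9: 4,   # balcony    -> balcony
    10: 3,  # shop       -> door
    11: 1,  # deco       -> facade_wall
}


def remap_mask(mask: np.ndarray) -> np.ndarray:
    """Vectorized remap of a CMP label mask to the 6 unified classes."""
    lut = np.zeros(len(CMP_TO_UNIFIED), dtype=np.uint8)
    for cmp_id, unified_id in CMP_TO_UNIFIED.items():
        lut[cmp_id] = unified_id
    return lut[mask]
```

The lookup-table form keeps the remap a single indexing operation per mask, which matters when preprocessing the full training set.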
## Best Validation Metrics
| Metric | Value |
|---|---|
| Mean IoU | 0.4856 |
| Facade Wall IoU | 0.867 |
| Window IoU | 0.410 |
| Door IoU | 0.460 |
| Balcony IoU | 0.230 |
| Background IoU | 0.467 |
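The per-class IoU values above can be reproduced from prediction/target masks via a confusion matrix. A standalone NumPy sketch (the training itself presumably used the `evaluate` library's `mean_iou`; this is an illustrative equivalent, not the exact evaluation code):

```python
import numpy as np


def per_class_iou(pred: np.ndarray, target: np.ndarray, num_classes: int = 6):
    """Per-class IoU and its mean from integer class-id masks.

    Classes absent from both prediction and target yield NaN and are
    excluded from the mean.
    """
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    # Rows index target classes, columns index predicted classes.
    np.add.at(conf, (target.ravel(), pred.ravel()), 1)
    inter = np.diag(conf).astype(float)
    union = conf.sum(axis=0) + conf.sum(axis=1) - np.diag(conf)
    iou = np.where(union > 0, inter / np.maximum(union, 1), np.nan)
    return iou, float(np.nanmean(iou))
```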
## Usage

```python
import torch
import torch.nn as nn
from PIL import Image
from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation

# Load model and processor
processor = SegformerImageProcessor.from_pretrained("Marco333/segformer-b0-facade-cmp")
model = SegformerForSemanticSegmentation.from_pretrained("Marco333/segformer-b0-facade-cmp")

# Load image
image = Image.open("facade.jpg").convert("RGB")

# Inference
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits

# Upsample to the original size (PIL's size is (W, H), hence the reversal)
upsampled = nn.functional.interpolate(
    logits, size=image.size[::-1], mode="bilinear", align_corners=False
)
pred_seg = upsampled.argmax(dim=1)[0].cpu().numpy()
```
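To inspect `pred_seg` visually, the class ids can be mapped to colors. A minimal sketch; the palette below is arbitrary, not part of the model:

```python
import numpy as np

# Arbitrary RGB palette, one triple per unified class id (0-5).
PALETTE = np.array([
    [0, 0, 0],        # 0 background
    [180, 120, 60],   # 1 facade_wall
    [70, 130, 180],   # 2 window
    [200, 50, 50],    # 3 door
    [120, 200, 120],  # 4 balcony
    [40, 160, 40],    # 5 vegetation_occluder
], dtype=np.uint8)


def colorize(pred_seg: np.ndarray) -> np.ndarray:
    """Turn an (H, W) class-id mask into an (H, W, 3) RGB image."""
    return PALETTE[pred_seg]
```

The result can be blended over the input, e.g. `Image.blend(image, Image.fromarray(colorize(pred_seg)), 0.5)`.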
## Intended Use
- Primary: Second-pass segmentation of rectified facades (after homography rectification)
- Secondary: First-pass facade detection on raw street photos (with expected lower accuracy due to lack of unrectified training data)
## Pipeline Role

This model is designed for use in a 2-pass facade segmentation pipeline:

1. **Pass 1:** Segment the raw street photo to find the facade wall region
2. Rectify the facade via homography
3. **Pass 2:** Re-run this model on the rectified crop to parse windows, doors, and balconies cleanly
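Between the two passes, the facade region to rectify can be derived from the pass-1 mask. A minimal sketch of one piece of that glue code, locating the axis-aligned extent of the `facade_wall` class (id 1) before any homography estimation; the helper name and the overall approach are illustrative, not part of this repository:

```python
import numpy as np


def facade_bbox(mask: np.ndarray, wall_id: int = 1):
    """Return the (top, left, bottom, right) bounding box of facade_wall
    pixels in a pass-1 mask, or None if the class is absent."""
    ys, xs = np.nonzero(mask == wall_id)
    if ys.size == 0:
        return None
    return int(ys.min()), int(xs.min()), int(ys.max()) + 1, int(xs.max()) + 1
```

A real pipeline would follow this with corner estimation and a perspective warp (e.g. OpenCV's `cv2.getPerspectiveTransform` / `cv2.warpPerspective`) before running pass 2.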
## Limitations

- Trained only on rectified facade images from CMP; performance on perspective-distorted street photos will be degraded.
- No vegetation data in the training set: the `vegetation_occluder` class is effectively predicted as background.
- Small dataset (378 training images), so the performance ceiling is moderate.
## Citation

Please cite this model if you use it:

```bibtex
@misc{corbetta_segformer_facade_cmp_2026,
  author       = {Marco Corbetta},
  title        = {segformer-b0-facade-cmp: SegFormer-B0 fine-tuned on CMP Facade},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Marco333/segformer-b0-facade-cmp}}
}
```

CMP Dataset:

```bibtex
@INPROCEEDINGS{Tylecek13,
  author    = {Radim Tyle{\v c}ek and Radim {\v S}{\' a}ra},
  title     = {Spatial Pattern Templates for Recognition of Objects with Regular Structure},
  booktitle = {Proc. GCPR},
  year      = {2013},
}
```

SegFormer:

```bibtex
@article{xie2021segformer,
  title   = {SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers},
  author  = {Xie, Enze and Wang, Wenhai and Yu, Zhiding and Anandkumar, Anima and Alvarez, Jose M and Luo, Ping},
  journal = {arXiv preprint arXiv:2105.15203},
  year    = {2021}
}
```