---
library_name: transformers
license: mit
tags:
- image-segmentation
- semantic-segmentation
- segformer
- facade
- cmp
- vision
pipeline_tag: image-segmentation
datasets:
- Xpitfire/cmp_facade
metrics:
- mean_iou
---
# SegFormer-B0 Fine-Tuned on CMP Facade Dataset
A SegFormer-B0 semantic segmentation model fine-tuned for facade parsing: detecting walls, windows, doors, and balconies on rectified building facades.
## Model Details
- **Architecture**: SegFormer-B0 (NVIDIA, ADE20K-pretrained)
- **Parameters**: ~3.7M
- **Task**: Semantic Segmentation
- **Input Size**: 512×512
- **Classes**: 6 unified facade classes
## Class Mapping
| ID | Class | Description |
|----|-------|-------------|
| 0 | `background` | Sky, ground, non-facade regions |
| 1 | `facade_wall` | Main wall surface + moldings, cornices, pillars, sills, deco |
| 2 | `window` | Windows + blinds |
| 3 | `door` | Doors + shopfronts |
| 4 | `balcony` | Balconies |
| 5 | `vegetation_occluder` | Vegetation (trained as background since CMP lacks this class) |
## Training
- **Dataset**: [CMP Facade Database](https://huggingface.co/datasets/Xpitfire/cmp_facade) — 378 train, 114 test rectified facade images
- **Original Classes**: 12 (facade, molding, cornice, pillar, window, door, sill, blind, balcony, shop, deco, background)
- **Mapping**: 12 CMP classes → 6 unified classes (see mapping above)
- **Epochs**: ~53 (best at epoch 38, mean IoU 0.4856)
- **Optimizer**: AdamW, lr=6e-5
- **Batch Size**: 4 per device (effective batch = 8 with grad accumulation)
- **Hardware**: Tesla T4 GPU
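The 12 → 6 remapping above can be applied to CMP annotation masks with a simple lookup table. A minimal sketch follows; note that `CMP_CLASS_ORDER` is an *assumed* index order for illustration — check the dataset's actual annotation encoding before reusing it verbatim.

```python
import numpy as np

# Unified class IDs from the mapping table (background=0 ... balcony=4).
CMP_TO_UNIFIED = {
    "background": 0,
    "facade": 1, "molding": 1, "cornice": 1, "pillar": 1, "sill": 1, "deco": 1,
    "window": 2, "blind": 2,
    "door": 3, "shop": 3,
    "balcony": 4,
}

# Hypothetical CMP index order — verify against the dataset before use.
CMP_CLASS_ORDER = [
    "background", "facade", "window", "door", "cornice", "sill",
    "balcony", "blind", "deco", "molding", "pillar", "shop",
]

# Lookup table: CMP index -> unified index.
lut = np.array([CMP_TO_UNIFIED[name] for name in CMP_CLASS_ORDER], dtype=np.uint8)

def remap_mask(cmp_mask: np.ndarray) -> np.ndarray:
    """Vectorized remap of a CMP annotation mask to the 6-class scheme."""
    return lut[cmp_mask]
```

The lookup-table approach remaps an entire mask in one indexing operation, which is much faster than per-pixel dictionary lookups.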
## Best Validation Metrics
| Metric | Value |
|--------|-------|
| Mean IoU | 0.4856 |
| Facade Wall IoU | 0.867 |
| Window IoU | 0.410 |
| Door IoU | 0.460 |
| Balcony IoU | 0.230 |
| Background IoU | 0.467 |
## Usage
```python
from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation
from PIL import Image
import torch
import torch.nn as nn

# Load processor and model
processor = SegformerImageProcessor.from_pretrained("Marco333/segformer-b0-facade-cmp")
model = SegformerForSemanticSegmentation.from_pretrained("Marco333/segformer-b0-facade-cmp")

# Load image
image = Image.open("facade.jpg").convert("RGB")

# Inference
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits  # shape: (batch, num_classes, H/4, W/4)

# Upsample logits to the original image size (PIL's .size is (W, H))
upsampled = nn.functional.interpolate(
    logits, size=image.size[::-1], mode="bilinear", align_corners=False
)
pred_seg = upsampled.argmax(dim=1)[0].cpu().numpy()
```
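To inspect the result, `pred_seg` can be colorized with a per-class palette and blended over the input image. A minimal sketch — the palette colors here are illustrative, not part of the released model:

```python
import numpy as np
from PIL import Image

# Illustrative palette: one RGB color per unified class ID (0-5).
PALETTE = np.array([
    [0, 0, 0],        # 0 background
    [180, 120, 60],   # 1 facade_wall
    [0, 120, 255],    # 2 window
    [255, 0, 0],      # 3 door
    [0, 200, 0],      # 4 balcony
    [120, 255, 120],  # 5 vegetation_occluder
], dtype=np.uint8)

def colorize(pred_seg: np.ndarray, image: Image.Image, alpha: float = 0.5) -> Image.Image:
    """Blend a class-index mask over the input image as a colored overlay."""
    color_mask = Image.fromarray(PALETTE[pred_seg])
    return Image.blend(image.convert("RGB"), color_mask, alpha)
```

For example, `colorize(pred_seg, image).save("overlay.png")` writes a half-transparent overlay for quick qualitative checks.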
## Intended Use
- **Primary**: Second-pass segmentation of rectified facades (after homography rectification)
- **Secondary**: First-pass facade detection on raw street photos (accuracy is expected to be lower, since the training set contains no unrectified images)
## Pipeline Role
This model is designed for use in a 2-pass facade segmentation pipeline:
1. Pass 1: Segment raw street photo → find facade wall region
2. Rectify facade via homography
3. Pass 2: Re-run this model on rectified crop → parse windows, doors, balconies cleanly
## Limitations
- Trained only on **rectified** facade images from CMP. Performance on perspective-distorted street photos will be degraded.
- No vegetation data in the training set — regions belonging to the `vegetation_occluder` class are predicted as background.
- Small dataset (378 images) — performance ceiling is moderate.
## Citation
Please cite this model if you use it:
```bibtex
@misc{corbetta_segformer_facade_cmp_2026,
author = {Marco Corbetta},
title = {segformer-b0-facade-cmp: SegFormer-B0 fine-tuned on CMP Facade},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/Marco333/segformer-b0-facade-cmp}}
}
```
CMP Dataset:
```bibtex
@INPROCEEDINGS{Tylecek13,
author = {Radim Tyle{\v c}ek and Radim {\v S}{\' a}ra},
title = {Spatial Pattern Templates for Recognition of Objects with Regular Structure},
booktitle = {Proc. GCPR},
year = {2013},
}
```
SegFormer:
```bibtex
@article{xie2021segformer,
title={SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers},
author={Xie, Enze and Wang, Wenhai and Yu, Zhiding and Anandkumar, Anima and Alvarez, Jose M and Luo, Ping},
journal={arXiv preprint arXiv:2105.15203},
year={2021}
}
```