---
library_name: transformers
license: mit
tags:
- image-segmentation
- semantic-segmentation
- segformer
- facade
- cmp
- vision
pipeline_tag: image-segmentation
datasets:
- Xpitfire/cmp_facade
metrics:
- mean_iou
---
# SegFormer-B0 Fine-Tuned on CMP Facade Dataset
A custom semantic segmentation model for facade parsing: it labels wall, window, door, and balcony pixels on rectified building facades.
## Model Details
- **Architecture**: SegFormer-B0 (NVIDIA, ADE20K-pretrained)
- **Parameters**: ~3.7M
- **Task**: Semantic Segmentation
- **Input Size**: 512×512
- **Classes**: 6 unified facade classes
## Class Mapping
| ID | Class | Description |
|----|-------|-------------|
| 0 | `background` | Sky, ground, non-facade regions |
| 1 | `facade_wall` | Main wall surface + moldings, cornices, pillars, sills, deco |
| 2 | `window` | Windows + blinds |
| 3 | `door` | Doors + shopfronts |
| 4 | `balcony` | Balconies |
| 5 | `vegetation_occluder` | Vegetation (trained as background since CMP lacks this class) |
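
The table can be read as a lookup from CMP class name to unified ID. The sketch below is illustrative and is not the training script: `CMP_CLASSES`, `build_lut`, and `remap_mask` are hypothetical names, and the integer order of the CMP labels is an assumption that should be checked against the dataset's own label definition before use.

```python
import numpy as np

# Unified class IDs, as in the table above.
UNIFIED = {
    "background": 0,
    "facade_wall": 1,
    "window": 2,
    "door": 3,
    "balcony": 4,
    "vegetation_occluder": 5,  # no CMP label maps here
}

# CMP class name -> unified class name, following the table's descriptions.
CMP_TO_UNIFIED = {
    "background": "background",
    "facade": "facade_wall",
    "molding": "facade_wall",
    "cornice": "facade_wall",
    "pillar": "facade_wall",
    "sill": "facade_wall",
    "deco": "facade_wall",
    "window": "window",
    "blind": "window",
    "door": "door",
    "shop": "door",
    "balcony": "balcony",
}

# ASSUMPTION: integer order of the CMP labels in the raw masks (hypothetical).
CMP_CLASSES = ["background", "facade", "window", "door", "cornice", "sill",
               "balcony", "blind", "deco", "molding", "pillar", "shop"]


def build_lut(cmp_classes=CMP_CLASSES):
    """Return an array where lut[cmp_id] is the unified class ID."""
    return np.array(
        [UNIFIED[CMP_TO_UNIFIED[name]] for name in cmp_classes], dtype=np.uint8
    )


def remap_mask(cmp_mask, lut=None):
    """Remap a 2-D array of CMP label IDs to unified class IDs."""
    lut = build_lut() if lut is None else lut
    return lut[np.asarray(cmp_mask)]
```

Indexing the lookup array with the whole mask remaps every pixel in one vectorized step, which keeps the remapping cheap enough to run inside a data-loading transform.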
## Training
- **Dataset**: [CMP Facade Database](https://huggingface.co/datasets/Xpitfire/cmp_facade) — 378 train, 114 test rectified facade images
- **Original Classes**: 12 (facade, molding, cornice, pillar, window, door, sill, blind, balcony, shop, deco, background)
- **Mapping**: 12 CMP classes → 6 unified classes (see mapping above)
- **Epochs**: ~53 (best at epoch 38, mean IoU 0.4856)
- **Optimizer**: AdamW, lr=6e-5
- **Batch Size**: 4 per device (effective batch size 8 with gradient accumulation)
- **Hardware**: Tesla T4 GPU
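
As a rough sanity check on the settings above, the sketch below derives the effective batch size and the number of optimizer steps per epoch. `TrainConfig` is a hypothetical container, not part of any library. The accumulation factor of 2 and the single-GPU setup are assumptions inferred from "4 per device" and "effective batch = 8"; the card does not state them explicitly.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    learning_rate: float = 6e-5      # from the list above (AdamW)
    per_device_batch_size: int = 4   # from the list above
    grad_accum_steps: int = 2        # ASSUMPTION: 8 / 4 on a single GPU
    num_devices: int = 1             # ASSUMPTION: one Tesla T4
    num_train_images: int = 378      # CMP train split

    @property
    def effective_batch_size(self) -> int:
        return self.per_device_batch_size * self.grad_accum_steps * self.num_devices

    @property
    def optimizer_steps_per_epoch(self) -> int:
        # Rounds up, i.e. assumes the last partial batch is kept.
        return math.ceil(self.num_train_images / self.effective_batch_size)


cfg = TrainConfig()
print(cfg.effective_batch_size, cfg.optimizer_steps_per_epoch)
```

Under these assumptions an epoch is about 48 optimizer steps; a training framework that drops the last partial batch would report a slightly lower number.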
## Best Validation Metrics
| Metric | Value |
|--------|-------|
| Mean IoU | 0.4856 |
| Facade Wall IoU | 0.867 |
| Window IoU | 0.410 |
| Door IoU | 0.460 |
| Balcony IoU | 0.230 |
| Background IoU | 0.467 |
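
For reference, per-class IoU and mean IoU can be computed from a confusion matrix as sketched below. This is a generic NumPy illustration with hypothetical helper names, not the evaluation code used for the numbers above; in particular, whether classes absent from the evaluation set were excluded from the mean is not stated in this card, and the sketch simply skips them.

```python
import numpy as np


def confusion_matrix(pred, target, num_classes):
    """Rows are ground-truth classes, columns are predicted classes."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    idx = target * num_classes + pred
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def per_class_iou(cm):
    """IoU_c = TP_c / (TP_c + FP_c + FN_c); NaN if the class never occurs."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    union = tp + fp + fn
    return np.where(union > 0, tp / np.maximum(union, 1), np.nan)


def mean_iou(cm):
    """Average IoU over the classes that occur (NaN entries are skipped)."""
    return float(np.nanmean(per_class_iou(cm)))
```

Accumulating one confusion matrix over the whole evaluation set and computing IoU once at the end avoids the bias of averaging per-image IoU values.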
## Usage
```python
from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation
from PIL import Image
import torch.nn as nn
import torch
# Load model
processor = SegformerImageProcessor.from_pretrained("Marco333/segformer-b0-facade-cmp")
model = SegformerForSemanticSegmentation.from_pretrained("Marco333/segformer-b0-facade-cmp")
# Load image
image = Image.open("facade.jpg").convert("RGB")
# Inference
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits
# Upsample to original size
upsampled = nn.functional.interpolate(
    logits, size=image.size[::-1], mode="bilinear", align_corners=False
)
pred_seg = upsampled.argmax(dim=1)[0].cpu().numpy()
```
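
The `pred_seg` array produced above holds one class ID per pixel. A minimal post-processing sketch follows, assuming the six class IDs from the class mapping; the palette colors and the helper names `colorize` and `class_fractions` are arbitrary, hypothetical choices.

```python
import numpy as np

# Hypothetical RGB palette, one row per class ID (colors are arbitrary).
PALETTE = np.array([
    [0, 0, 0],        # 0 background
    [128, 128, 128],  # 1 facade_wall
    [0, 128, 255],    # 2 window
    [255, 128, 0],    # 3 door
    [255, 0, 255],    # 4 balcony
    [0, 200, 0],      # 5 vegetation_occluder
], dtype=np.uint8)


def colorize(seg):
    """Turn an (H, W) array of class IDs into an (H, W, 3) RGB image."""
    return PALETTE[np.asarray(seg)]


def class_fractions(seg, num_classes=6):
    """Fraction of pixels assigned to each class ID."""
    counts = np.bincount(np.asarray(seg).ravel(), minlength=num_classes)
    return counts / counts.sum()
```

The colored mask can then be saved with `Image.fromarray(colorize(pred_seg)).save("facade_mask.png")`, and `class_fractions(pred_seg)[2]` gives the share of the image predicted as window.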
## Intended Use
- **Primary**: Second-pass segmentation of rectified facades (after homography rectification)
- **Secondary**: First-pass facade detection on raw street photos (lower accuracy is expected because the training data contains no unrectified images)
## Pipeline Role
This model is designed for use in a 2-pass facade segmentation pipeline:
1. Pass 1: Segment raw street photo → find facade wall region
2. Rectify facade via homography
3. Pass 2: Re-run this model on rectified crop → parse windows, doors, balconies cleanly
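
The three steps can be wired together roughly as follows. This is a sketch under stated assumptions, not the author's pipeline: `segment` and `rectify` are hypothetical callables supplied by the caller, and an axis-aligned bounding box stands in for the facade region. An actual rectification would estimate four facade corners and warp with a homography, for example with OpenCV's `cv2.getPerspectiveTransform` and `cv2.warpPerspective`.

```python
import numpy as np

FACADE_CLASSES = (1, 2, 3, 4)  # facade_wall, window, door, balcony


def facade_bbox(seg, classes=FACADE_CLASSES):
    """Bounding box (top, left, bottom, right) of facade pixels, or None."""
    mask = np.isin(seg, classes)
    if not mask.any():
        return None
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return int(rows[0]), int(cols[0]), int(rows[-1]) + 1, int(cols[-1]) + 1


def two_pass(image, segment, rectify):
    """Run pass 1, rectify the facade region, then run pass 2.

    `segment(image)` returns an (H, W) array of class IDs and
    `rectify(image, bbox)` returns the rectified facade image.
    Both are hypothetical callables supplied by the caller.
    """
    first = segment(image)            # Pass 1: raw street photo
    bbox = facade_bbox(first)
    if bbox is None:
        return None                   # no facade found in pass 1
    rectified = rectify(image, bbox)  # homography rectification step
    return segment(rectified)         # Pass 2: rectified crop
```

Passing the model and the rectifier in as callables keeps the control flow testable without loading weights, and lets the same model serve both passes as described above.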
## Limitations
- Trained only on **rectified** facade images from CMP. Performance on perspective-distorted street photos will be degraded.
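- No vegetation data in the training set: vegetation was trained as background, so the model is expected to predict `background` for vegetation rather than `vegetation_occluder`.
- Small dataset (378 training images), which limits achievable accuracy; the best mean IoU reached was 0.4856.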
## Citation
Please cite this model if you use it:
```bibtex
@misc{corbetta_segformer_facade_cmp_2026,
author = {Marco Corbetta},
title = {segformer-b0-facade-cmp: SegFormer-B0 fine-tuned on CMP Facade},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/Marco333/segformer-b0-facade-cmp}}
}
```
CMP Dataset:
```bibtex
@INPROCEEDINGS{Tylecek13,
author = {Radim Tyle{\v c}ek and Radim {\v S}{\' a}ra},
title = {Spatial Pattern Templates for Recognition of Objects with Regular Structure},
booktitle = {Proc. GCPR},
year = {2013},
}
```
SegFormer:
```bibtex
@article{xie2021segformer,
title={SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers},
author={Xie, Enze and Wang, Wenhai and Yu, Zhiding and Anandkumar, Anima and Alvarez, Jose M and Luo, Ping},
journal={arXiv preprint arXiv:2105.15203},
year={2021}
}
```