---
library_name: transformers
license: mit
tags:
  - image-segmentation
  - semantic-segmentation
  - segformer
  - facade
  - cmp
  - vision
pipeline_tag: image-segmentation
datasets:
  - Xpitfire/cmp_facade
metrics:
  - mean_iou
---

# SegFormer-B0 Fine-Tuned on CMP Facade Dataset

A SegFormer-B0 semantic segmentation model fine-tuned for facade parsing: wall, window, door, and balcony detection on rectified building facades.

## Model Details

- **Architecture**: SegFormer-B0 (NVIDIA, ADE20K-pretrained)
- **Parameters**: ~3.7M
- **Task**: Semantic Segmentation
- **Input Size**: 512×512
- **Classes**: 6 unified facade classes

## Class Mapping

| ID | Class | Description |
|----|-------|-------------|
| 0 | `background` | Sky, ground, non-facade regions |
| 1 | `facade_wall` | Main wall surface + moldings, cornices, pillars, sills, deco |
| 2 | `window` | Windows + blinds |
| 3 | `door` | Doors + shopfronts |
| 4 | `balcony` | Balconies |
| 5 | `vegetation_occluder` | Vegetation (trained as background since CMP lacks this class) |

## Training

- **Dataset**: [CMP Facade Database](https://huggingface.co/datasets/Xpitfire/cmp_facade) — 378 train, 114 test rectified facade images
- **Original Classes**: 12 (facade, molding, cornice, pillar, window, door, sill, blind, balcony, shop, deco, background)
- **Mapping**: 12 CMP classes → 6 unified classes (see mapping above)
- **Epochs**: ~53 (best at epoch 38, mean IoU 0.4856)
- **Optimizer**: AdamW, lr=6e-5
- **Batch Size**: 4 per device (effective batch = 8 with grad accumulation)
- **Hardware**: Tesla T4 GPU
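
The 12 → 6 remapping can be applied as a NumPy lookup table. A minimal sketch, assuming the conventional CMP annotation IDs (1 = background through 12 = shop; verify these against the dataset's label definitions before use):

```python
import numpy as np

# Assumed CMP annotation IDs (verify against the dataset's label list):
# 1=background, 2=facade, 3=window, 4=door, 5=cornice, 6=sill,
# 7=balcony, 8=blind, 9=deco, 10=molding, 11=pillar, 12=shop
CMP_TO_UNIFIED = {
    1: 0,                                   # background
    2: 1, 5: 1, 6: 1, 9: 1, 10: 1, 11: 1,  # facade_wall (+ cornice, sill, deco, molding, pillar)
    3: 2, 8: 2,                             # window (+ blind)
    4: 3, 12: 3,                            # door (+ shop)
    7: 4,                                   # balcony
}

def remap_mask(cmp_mask: np.ndarray) -> np.ndarray:
    """Remap a CMP annotation mask (values 1..12) to the 6 unified classes."""
    lut = np.zeros(13, dtype=np.int64)
    for cmp_id, unified_id in CMP_TO_UNIFIED.items():
        lut[cmp_id] = unified_id
    return lut[cmp_mask]  # vectorized lookup, same shape as input
```

Note that class 5 (`vegetation_occluder`) is absent from the table: CMP has no vegetation label, so no pixel is ever mapped to it during training.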

## Best Validation Metrics

| Metric | Value |
|--------|-------|
| Mean IoU | 0.4856 |
| Facade Wall IoU | 0.867 |
| Window IoU | 0.410 |
| Door IoU | 0.460 |
| Balcony IoU | 0.230 |
| Background IoU | 0.467 |

## Usage

```python
from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation
from PIL import Image
import torch.nn as nn
import torch

# Load model
processor = SegformerImageProcessor.from_pretrained("Marco333/segformer-b0-facade-cmp")
model = SegformerForSemanticSegmentation.from_pretrained("Marco333/segformer-b0-facade-cmp")

# Load image
image = Image.open("facade.jpg").convert("RGB")

# Inference
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

# Upsample to original size
upsampled = nn.functional.interpolate(
    logits, size=image.size[::-1], mode="bilinear", align_corners=False
)
pred_seg = upsampled.argmax(dim=1)[0].cpu().numpy()
```
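
To visualize `pred_seg`, the class IDs can be mapped to colors with a simple palette lookup. A minimal sketch with a hypothetical color scheme (the palette below is illustrative and not part of the model):

```python
import numpy as np

# Hypothetical palette: one RGB triple per unified class ID 0..5
PALETTE = np.array([
    [0, 0, 0],        # 0 background
    [200, 200, 200],  # 1 facade_wall
    [0, 0, 255],      # 2 window
    [255, 0, 0],      # 3 door
    [255, 165, 0],    # 4 balcony
    [0, 128, 0],      # 5 vegetation_occluder
], dtype=np.uint8)

def colorize(pred_seg: np.ndarray) -> np.ndarray:
    """Turn an (H, W) class-ID mask into an (H, W, 3) RGB image array."""
    return PALETTE[pred_seg]
```

The result can be wrapped with `Image.fromarray(colorize(pred_seg))` and blended over the input image for inspection.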

## Intended Use

- **Primary**: Second-pass segmentation of rectified facades (after homography rectification)
- **Secondary**: First-pass facade detection on raw street photos (lower accuracy expected, since the training data contains no unrectified images)

## Pipeline Role

This model is designed for use in a 2-pass facade segmentation pipeline:
1. Pass 1: Segment raw street photo → find facade wall region
2. Rectify facade via homography
3. Pass 2: Re-run this model on rectified crop → parse windows, doors, balconies cleanly
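
The rectification step (2) maps the four corners of the detected facade region to an axis-aligned rectangle via a homography. A minimal NumPy sketch of the direct linear transform (DLT) for four point correspondences; in a real pipeline, `cv2.getPerspectiveTransform` plus `cv2.warpPerspective` would estimate the homography and warp the image (this assumes OpenCV is available, which the card does not specify):

```python
import numpy as np

def homography_from_points(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Estimate the 3x3 homography mapping src -> dst (both (4, 2) arrays)
    via the direct linear transform (DLT)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(rows)
    # For an exact 4-point fit, H is the null space of A: the right
    # singular vector belonging to the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_homography(H: np.ndarray, pt) -> np.ndarray:
    """Map a single (x, y) point through H, with perspective division."""
    p = H @ np.array([pt[0], pt[1], 1.0])
    return p[:2] / p[2]
```

Warping the facade crop with this homography before Pass 2 puts the image into the rectified geometry the model was trained on.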

## Limitations

- Trained only on **rectified** facade images from CMP. Performance on perspective-distorted street photos will be degraded.
- The training set contains no vegetation annotations, so the `vegetation_occluder` class is never predicted; vegetated regions are segmented as background.
- Small dataset (378 images) — performance ceiling is moderate.

## Citation

Please cite this model if you use it:

```bibtex
@misc{corbetta_segformer_facade_cmp_2026,
  author       = {Marco Corbetta},
  title        = {segformer-b0-facade-cmp: SegFormer-B0 fine-tuned on CMP Facade},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Marco333/segformer-b0-facade-cmp}}
}
```

CMP Dataset:
```bibtex
@INPROCEEDINGS{Tylecek13,
  author = {Radim Tyle{\v c}ek and Radim {\v S}{\' a}ra},
  title = {Spatial Pattern Templates for Recognition of Objects with Regular Structure},
  booktitle = {Proc. GCPR},
  year = {2013},
}
```

SegFormer:
```bibtex
@article{xie2021segformer,
  title={SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers},
  author={Xie, Enze and Wang, Wenhai and Yu, Zhiding and Anandkumar, Anima and Alvarez, Jose M and Luo, Ping},
  journal={arXiv preprint arXiv:2105.15203},
  year={2021}
}
```