---
library_name: transformers
license: mit
tags:
  - image-segmentation
  - semantic-segmentation
  - segformer
  - facade
  - cmp
  - vision
pipeline_tag: image-segmentation
datasets:
  - Xpitfire/cmp_facade
metrics:
  - mean_iou
---

# SegFormer-B0 Fine-Tuned on CMP Facade Dataset

A SegFormer-B0 semantic segmentation model fine-tuned for facade parsing: wall, window, door, and balcony detection on rectified building facades.

## Model Details

- **Architecture**: SegFormer-B0 (NVIDIA, ADE20K-pretrained)
- **Parameters**: ~3.7M
- **Task**: Semantic Segmentation
- **Input Size**: 512×512
- **Classes**: 6 unified facade classes

## Class Mapping

| ID | Class | Description |
|----|-------|-------------|
| 0 | `background` | Sky, ground, non-facade regions |
| 1 | `facade_wall` | Main wall surface + moldings, cornices, pillars, sills, deco |
| 2 | `window` | Windows + blinds |
| 3 | `door` | Doors + shopfronts |
| 4 | `balcony` | Balconies |
| 5 | `vegetation_occluder` | Vegetation (trained as background since CMP lacks this class) |

## Training

- **Dataset**: [CMP Facade Database](https://huggingface.co/datasets/Xpitfire/cmp_facade) — 378 train, 114 test rectified facade images
- **Original Classes**: 12 (facade, molding, cornice, pillar, window, door, sill, blind, balcony, shop, deco, background)
- **Mapping**: 12 CMP classes → 6 unified classes (see mapping above)
- **Epochs**: ~53 (best at epoch 38, mean IoU 0.4856)
- **Optimizer**: AdamW, lr=6e-5
- **Batch Size**: 4 per device (effective batch = 8 with grad accumulation)
- **Hardware**: Tesla T4 GPU
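
The 12 → 6 remapping can be applied as a NumPy lookup table. A minimal sketch, assuming the conventional CMP annotation IDs (1 = background through 12 = shop; verify these against the dataset's label definitions before use):

```python
import numpy as np

# Assumed CMP annotation IDs (verify against the dataset's label list):
# 1=background, 2=facade, 3=window, 4=door, 5=cornice, 6=sill,
# 7=balcony, 8=blind, 9=deco, 10=molding, 11=pillar, 12=shop
CMP_TO_UNIFIED = {
    1: 0,                                   # background
    2: 1, 5: 1, 6: 1, 9: 1, 10: 1, 11: 1,  # facade_wall (+ cornice, sill, deco, molding, pillar)
    3: 2, 8: 2,                             # window (+ blind)
    4: 3, 12: 3,                            # door (+ shop)
    7: 4,                                   # balcony
}

def remap_mask(cmp_mask: np.ndarray) -> np.ndarray:
    """Remap a CMP annotation mask (values 1..12) to the 6 unified classes."""
    lut = np.zeros(13, dtype=np.int64)
    for cmp_id, unified_id in CMP_TO_UNIFIED.items():
        lut[cmp_id] = unified_id
    return lut[cmp_mask]  # vectorized lookup, same shape as input
```

Note that class 5 (`vegetation_occluder`) is absent from the table: CMP has no vegetation label, so no pixel is ever mapped to it during training.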

## Best Validation Metrics

| Metric | Value |
|--------|-------|
| Mean IoU | 0.4856 |
| Facade Wall IoU | 0.867 |
| Window IoU | 0.410 |
| Door IoU | 0.460 |
| Balcony IoU | 0.230 |
| Background IoU | 0.467 |

## Usage

```python
from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation
from PIL import Image
import torch.nn as nn
import torch

# Load model
processor = SegformerImageProcessor.from_pretrained("Marco333/segformer-b0-facade-cmp")
model = SegformerForSemanticSegmentation.from_pretrained("Marco333/segformer-b0-facade-cmp")

# Load image
image = Image.open("facade.jpg").convert("RGB")

# Inference
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

# Upsample to original size
upsampled = nn.functional.interpolate(
    logits, size=image.size[::-1], mode="bilinear", align_corners=False
)
pred_seg = upsampled.argmax(dim=1)[0].cpu().numpy()
```
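
To visualize `pred_seg`, the class IDs can be mapped to colors with a simple palette lookup. A minimal sketch with a hypothetical color scheme (the palette below is illustrative and not part of the model):

```python
import numpy as np

# Hypothetical palette: one RGB triple per unified class ID 0..5
PALETTE = np.array([
    [0, 0, 0],        # 0 background
    [200, 200, 200],  # 1 facade_wall
    [0, 0, 255],      # 2 window
    [255, 0, 0],      # 3 door
    [255, 165, 0],    # 4 balcony
    [0, 128, 0],      # 5 vegetation_occluder
], dtype=np.uint8)

def colorize(pred_seg: np.ndarray) -> np.ndarray:
    """Turn an (H, W) class-ID mask into an (H, W, 3) RGB image array."""
    return PALETTE[pred_seg]
```

The result can be wrapped with `Image.fromarray(colorize(pred_seg))` and blended over the input image for inspection.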

## Intended Use

- **Primary**: Second-pass segmentation of rectified facades (after homography rectification)
- **Secondary**: First-pass facade detection on raw street photos (lower accuracy expected, since the training data contains no unrectified images)

## Pipeline Role

This model is designed for use in a 2-pass facade segmentation pipeline:
1. Pass 1: Segment raw street photo → find facade wall region
2. Rectify facade via homography
3. Pass 2: Re-run this model on rectified crop → parse windows, doors, balconies cleanly
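
The rectification step (2) maps the four corners of the detected facade region to an axis-aligned rectangle via a homography. A minimal NumPy sketch of the direct linear transform (DLT) for four point correspondences; in a real pipeline, `cv2.getPerspectiveTransform` plus `cv2.warpPerspective` would estimate the homography and warp the image (this assumes OpenCV is available, which the card does not specify):

```python
import numpy as np

def homography_from_points(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Estimate the 3x3 homography mapping src -> dst (both (4, 2) arrays)
    via the direct linear transform (DLT)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(rows)
    # For an exact 4-point fit, H is the null space of A: the right
    # singular vector belonging to the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_homography(H: np.ndarray, pt) -> np.ndarray:
    """Map a single (x, y) point through H, with perspective division."""
    p = H @ np.array([pt[0], pt[1], 1.0])
    return p[:2] / p[2]
```

Warping the facade crop with this homography before Pass 2 puts the image into the rectified geometry the model was trained on.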

## Limitations

- Trained only on **rectified** facade images from CMP. Performance on perspective-distorted street photos will be degraded.
- The training set contains no vegetation annotations, so the `vegetation_occluder` class is never predicted; vegetated regions are segmented as background.
- Small dataset (378 images) — performance ceiling is moderate.

## Citation

Please cite this model if you use it:

```bibtex
@misc{corbetta_segformer_facade_cmp_2026,
  author       = {Marco Corbetta},
  title        = {segformer-b0-facade-cmp: SegFormer-B0 fine-tuned on CMP Facade},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Marco333/segformer-b0-facade-cmp}}
}
```

CMP Dataset:
```bibtex
@INPROCEEDINGS{Tylecek13,
  author = {Radim Tyle{\v c}ek and Radim {\v S}{\' a}ra},
  title = {Spatial Pattern Templates for Recognition of Objects with Regular Structure},
  booktitle = {Proc. GCPR},
  year = {2013},
}
```

SegFormer:
```bibtex
@article{xie2021segformer,
  title={SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers},
  author={Xie, Enze and Wang, Wenhai and Yu, Zhiding and Anandkumar, Anima and Alvarez, Jose M and Luo, Ping},
  journal={arXiv preprint arXiv:2105.15203},
  year={2021}
}
```