| --- |
| library_name: transformers |
| license: mit |
| tags: |
| - image-segmentation |
| - semantic-segmentation |
| - segformer |
| - facade |
| - cmp |
| - vision |
| pipeline_tag: image-segmentation |
| datasets: |
| - Xpitfire/cmp_facade |
| metrics: |
| - mean_iou |
| --- |
| |
| # SegFormer-B0 Fine-Tuned on CMP Facade Dataset |
|
|
| A semantic segmentation model for facade parsing. It labels wall, window, door, and balcony regions on rectified building facades. |
|
|
| ## Model Details |
|
|
| - **Architecture**: SegFormer-B0 (NVIDIA, ADE20K-pretrained) |
| - **Parameters**: ~3.7M |
| - **Task**: Semantic Segmentation |
| - **Input Size**: 512×512 |
| - **Classes**: 6 unified facade classes |
|
|
| ## Class Mapping |
|
|
| | ID | Class | Description | |
| |----|-------|-------------| |
| | 0 | `background` | Sky, ground, non-facade regions | |
| | 1 | `facade_wall` | Main wall surface + moldings, cornices, pillars, sills, deco | |
| | 2 | `window` | Windows + blinds | |
| | 3 | `door` | Doors + shopfronts | |
| | 4 | `balcony` | Balconies | |
| | 5 | `vegetation_occluder` | Vegetation (trained as background since CMP lacks this class) | |
|
|
| ## Training |
|
|
| - **Dataset**: [CMP Facade Database](https://huggingface.co/datasets/Xpitfire/cmp_facade) — 378 train, 114 test rectified facade images |
| - **Original Classes**: 12 (facade, molding, cornice, pillar, window, door, sill, blind, balcony, shop, deco, background) |
| - **Mapping**: 12 CMP classes → 6 unified classes (see mapping above) |
| - **Epochs**: ~53 (best at epoch 38, mean IoU 0.4856) |
| - **Optimizer**: AdamW, lr=6e-5 |
| - **Batch Size**: 4 per device (effective batch size 8 with gradient accumulation) |
| - **Hardware**: Tesla T4 GPU |
|
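The 12-to-6 class merge described above can be sketched as a lookup table. This is a minimal illustration, not the author's training code: the name-to-name mapping follows the Class Mapping table, but `CMP_LABEL_ORDER` (the integer ID assigned to each CMP class) and the helper names `build_lut` and `remap_mask` are assumptions. Check the dataset's own label definitions before relying on the ID order.

```python
import numpy as np

# Unified class IDs, as listed in the Class Mapping table.
UNIFIED_IDS = {
    "background": 0,
    "facade_wall": 1,
    "window": 2,
    "door": 3,
    "balcony": 4,
    "vegetation_occluder": 5,  # no CMP class maps here
}

# CMP class name -> unified class name, following the table descriptions.
CMP_TO_UNIFIED = {
    "background": "background",
    "facade": "facade_wall",
    "molding": "facade_wall",
    "cornice": "facade_wall",
    "pillar": "facade_wall",
    "sill": "facade_wall",
    "deco": "facade_wall",
    "window": "window",
    "blind": "window",
    "door": "door",
    "shop": "door",
    "balcony": "balcony",
}

# ASSUMPTION: hypothetical 0-based ID order (the order the classes are listed
# in this card). The real dataset may use a different order or 1-based IDs.
CMP_LABEL_ORDER = [
    "facade", "molding", "cornice", "pillar", "window", "door",
    "sill", "blind", "balcony", "shop", "deco", "background",
]


def build_lut(label_order):
    """Return an array where lut[cmp_id] is the unified class ID."""
    return np.array(
        [UNIFIED_IDS[CMP_TO_UNIFIED[name]] for name in label_order],
        dtype=np.uint8,
    )


def remap_mask(cmp_mask, lut):
    """Map an integer array of CMP class IDs to unified class IDs."""
    return lut[np.asarray(cmp_mask)]
```

Indexing the lookup table with the whole mask remaps every pixel in one vectorized step, which avoids a Python loop over classes.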
|
| ## Best Validation Metrics |
|
|
| | Metric | Value | |
| |--------|-------| |
| | Mean IoU | 0.4856 | |
| | Facade Wall IoU | 0.867 | |
| | Window IoU | 0.410 | |
| | Door IoU | 0.460 | |
| | Balcony IoU | 0.230 | |
| | Background IoU | 0.467 | |
|
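For reference, per-class IoU and mean IoU can be computed from a confusion matrix as in the sketch below. It illustrates the metric definition only; the evaluation code behind the numbers in the table is not given here, and details such as ignored labels or how absent classes are averaged may differ. The function names are illustrative.

```python
import numpy as np


def per_class_iou(pred, target, num_classes):
    """Return per-class IoU; NaN for classes absent from both arrays."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    # Confusion matrix: rows are ground truth, columns are predictions.
    cm = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)
    intersection = np.diag(cm).astype(float)
    union = cm.sum(axis=0) + cm.sum(axis=1) - intersection
    iou = np.full(num_classes, np.nan)
    # Divide only where the class occurs in the prediction or the ground truth.
    np.divide(intersection, union, out=iou, where=union > 0)
    return iou


def mean_iou(pred, target, num_classes):
    """Average IoU over the classes that occur at least once."""
    return float(np.nanmean(per_class_iou(pred, target, num_classes)))
```

Classes that appear in neither the prediction nor the ground truth are excluded from the average here. That choice matters for this model, because `vegetation_occluder` has no training or evaluation labels.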
|
| ## Usage |
|
|
| ```python |
| from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation |
| from PIL import Image |
| import torch.nn as nn |
| import torch |
| |
| # Load model |
| processor = SegformerImageProcessor.from_pretrained("Marco333/segformer-b0-facade-cmp") |
| model = SegformerForSemanticSegmentation.from_pretrained("Marco333/segformer-b0-facade-cmp") |
| |
| # Load image |
| image = Image.open("facade.jpg").convert("RGB") |
| |
| # Inference (no gradients needed) |
| inputs = processor(images=image, return_tensors="pt") |
| with torch.no_grad(): |
|     outputs = model(**inputs) |
| logits = outputs.logits  # (batch, num_classes, H/4, W/4) |
| |
| # Upsample to original size; PIL's image.size is (width, height), so reverse it |
| upsampled = nn.functional.interpolate( |
|     logits, size=image.size[::-1], mode="bilinear", align_corners=False |
| ) |
| # Per-pixel class IDs as a (height, width) NumPy array |
| pred_seg = upsampled.argmax(dim=1)[0].cpu().numpy() |
| ``` |
|
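Once `pred_seg` is available, a small post-processing step can turn it into a color mask and per-class area fractions. The sketch below assumes `pred_seg` is a 2-D integer array of class IDs as produced above. The palette colors and the helper names `colorize` and `class_fractions` are made up for illustration and are not part of the model repository.

```python
import numpy as np

CLASS_NAMES = [
    "background", "facade_wall", "window",
    "door", "balcony", "vegetation_occluder",
]

# ASSUMPTION: arbitrary RGB colors chosen for this example only.
PALETTE = np.array(
    [
        [0, 0, 0],        # background
        [170, 170, 170],  # facade_wall
        [0, 120, 255],    # window
        [255, 140, 0],    # door
        [200, 0, 200],    # balcony
        [0, 160, 0],      # vegetation_occluder
    ],
    dtype=np.uint8,
)


def colorize(pred_seg):
    """Return an (H, W, 3) uint8 RGB image, one color per class ID."""
    return PALETTE[np.asarray(pred_seg)]


def class_fractions(pred_seg):
    """Return the fraction of pixels assigned to each class name."""
    pred_seg = np.asarray(pred_seg)
    counts = np.bincount(pred_seg.ravel(), minlength=len(CLASS_NAMES))
    return {
        name: float(count) / pred_seg.size
        for name, count in zip(CLASS_NAMES, counts)
    }
```

The color mask can be saved with Pillow, for example `Image.fromarray(colorize(pred_seg)).save("mask.png")`.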
|
| ## Intended Use |
|
|
| - **Primary**: Second-pass segmentation of rectified facades (after homography rectification) |
| - **Secondary**: First-pass facade detection on raw street photos (accuracy is expected to be lower because the training data contains no unrectified images) |
|
|
| ## Pipeline Role |
|
|
| This model is designed for use in a two-pass facade segmentation pipeline: |
| 1. Pass 1: Segment the raw street photo → find the facade wall region |
| 2. Rectify the facade via homography |
| 3. Pass 2: Re-run this model on the rectified crop → parse windows, doors, and balconies cleanly |
|
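The geometric steps between the two passes can be sketched with NumPy alone. This card does not say how the facade region or the homography is obtained, so everything below is an assumption for illustration: `facade_bbox` takes the bounding box of the `facade_wall` pixels from pass 1, and `estimate_homography` fits a homography to four or more hypothetical corner correspondences with the direct linear transform. A real pipeline would more likely use OpenCV's `cv2.findHomography` and `cv2.warpPerspective`.

```python
import numpy as np

FACADE_WALL_ID = 1  # from the Class Mapping table


def facade_bbox(pred_seg, class_id=FACADE_WALL_ID):
    """Return (x_min, y_min, x_max, y_max) of class_id pixels, or None."""
    ys, xs = np.nonzero(np.asarray(pred_seg) == class_id)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


def estimate_homography(src, dst):
    """Fit H (3x3) so that dst ~ H @ src, from >= 4 (x, y) point pairs."""
    rows = []
    for (x, y), (u, v) in zip(np.asarray(src, float), np.asarray(dst, float)):
        # Each correspondence contributes two linear constraints on H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # assumes a non-degenerate configuration


def apply_homography(H, points):
    """Map (x, y) points through H using homogeneous coordinates."""
    pts = np.asarray(points, float)
    mapped = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return mapped[:, :2] / mapped[:, 2:3]
```

With the four facade corners in the street photo as `src` and the corners of an upright rectangle as `dst`, the resulting matrix describes the warp that produces the rectified crop for pass 2.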
|
| ## Limitations |
|
|
| - Trained only on **rectified** facade images from CMP. Performance on perspective-distorted street photos will be degraded. |
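| - The training set contains no vegetation labels, so vegetation is expected to be predicted as `background` rather than `vegetation_occluder`. |
| - Small dataset (378 training images): the best validation mean IoU is 0.4856, with balcony (0.230) and window (0.410) the weakest classes. |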
|
|
| ## Citation |
|
|
| Please cite this model if you use it: |
|
|
| ```bibtex |
| @misc{corbetta_segformer_facade_cmp_2026, |
| author = {Marco Corbetta}, |
| title = {segformer-b0-facade-cmp: SegFormer-B0 fine-tuned on CMP Facade}, |
| year = {2026}, |
| publisher = {Hugging Face}, |
| howpublished = {\url{https://huggingface.co/Marco333/segformer-b0-facade-cmp}} |
| } |
| ``` |
|
|
| CMP Dataset: |
| ```bibtex |
| @INPROCEEDINGS{Tylecek13, |
| author = {Radim Tyle{\v c}ek and Radim {\v S}{\' a}ra}, |
| title = {Spatial Pattern Templates for Recognition of Objects with Regular Structure}, |
| booktitle = {Proc. GCPR}, |
| year = {2013}, |
| } |
| ``` |
|
|
| SegFormer: |
| ```bibtex |
| @article{xie2021segformer, |
| title={SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers}, |
| author={Xie, Enze and Wang, Wenhai and Yu, Zhiding and Anandkumar, Anima and Alvarez, Jose M and Luo, Ping}, |
| journal={arXiv preprint arXiv:2105.15203}, |
| year={2021} |
| } |
| ``` |
|
|