---
library_name: transformers
license: mit
tags:
- image-segmentation
- semantic-segmentation
- segformer
- facade
- cmp
- vision
pipeline_tag: image-segmentation
datasets:
- Xpitfire/cmp_facade
metrics:
- mean_iou
---

# SegFormer-B0 Fine-Tuned on CMP Facade Dataset

Custom semantic segmentation model for facade parsing: wall, window, door, and balcony detection on rectified building facades.

## Model Details

- **Architecture**: SegFormer-B0 (NVIDIA, ADE20K-pretrained)
- **Parameters**: ~3.7M
- **Task**: Semantic Segmentation
- **Input Size**: 512×512
- **Classes**: 6 unified facade classes

## Class Mapping

| ID | Class | Description |
|----|-------|-------------|
| 0 | `background` | Sky, ground, non-facade regions |
| 1 | `facade_wall` | Main wall surface + moldings, cornices, pillars, sills, deco |
| 2 | `window` | Windows + blinds |
| 3 | `door` | Doors + shopfronts |
| 4 | `balcony` | Balconies |
| 5 | `vegetation_occluder` | Vegetation (effectively trained as background, since CMP lacks this class) |

## Training

- **Dataset**: [CMP Facade Database](https://huggingface.co/datasets/Xpitfire/cmp_facade) — 378 train, 114 test rectified facade images
- **Original Classes**: 12 (facade, molding, cornice, pillar, window, door, sill, blind, balcony, shop, deco, background)
- **Mapping**: 12 CMP classes → 6 unified classes (see mapping above)
- **Epochs**: ~53 (best checkpoint at epoch 38, mean IoU 0.4856)
- **Optimizer**: AdamW, lr=6e-5
- **Batch Size**: 4 per device (effective batch = 8 with gradient accumulation)
- **Hardware**: Tesla T4 GPU

## Best Validation Metrics

| Metric | Value |
|--------|-------|
| Mean IoU | 0.4856 |
| Facade Wall IoU | 0.867 |
| Window IoU | 0.410 |
| Door IoU | 0.460 |
| Balcony IoU | 0.230 |
| Background IoU | 0.467 |

## Usage

```python
import torch
import torch.nn as nn
from PIL import Image
from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation

# Load model and processor
processor = SegformerImageProcessor.from_pretrained("Marco333/segformer-b0-facade-cmp")
model = SegformerForSemanticSegmentation.from_pretrained("Marco333/segformer-b0-facade-cmp")

# Load image
image = Image.open("facade.jpg").convert("RGB")

# Inference
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits

# Upsample logits to the original size (PIL's .size is (W, H), hence [::-1])
upsampled = nn.functional.interpolate(
    logits,
    size=image.size[::-1],
    mode="bilinear",
    align_corners=False,
)
pred_seg = upsampled.argmax(dim=1)[0].cpu().numpy()
```

## Intended Use

- **Primary**: Second-pass segmentation of rectified facades (after homography rectification)
- **Secondary**: First-pass facade detection on raw street photos (expect lower accuracy, since the training set contains no unrectified images)

## Pipeline Role

This model is designed for use in a two-pass facade segmentation pipeline:

1. Pass 1: segment the raw street photo and locate the facade wall region
2. Rectify the facade via homography
3. Pass 2: re-run this model on the rectified crop to parse windows, doors, and balconies cleanly

## Limitations

- Trained only on **rectified** facade images from CMP. Performance on perspective-distorted street photos will be degraded.
- No vegetation data in the training set — the `vegetation_occluder` class is never learned, so vegetation is predicted as background.
- Small dataset (378 images), so the performance ceiling is moderate.

## Citation

Please cite this model if you use it:

```bibtex
@misc{corbetta_segformer_facade_cmp_2026,
  author = {Marco Corbetta},
  title = {segformer-b0-facade-cmp: SegFormer-B0 fine-tuned on CMP Facade},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Marco333/segformer-b0-facade-cmp}}
}
```

CMP Dataset:

```bibtex
@INPROCEEDINGS{Tylecek13,
  author = {Radim Tyle{\v c}ek and Radim {\v S}{\' a}ra},
  title = {Spatial Pattern Templates for Recognition of Objects with Regular Structure},
  booktitle = {Proc. GCPR},
  year = {2013},
}
```

SegFormer:

```bibtex
@article{xie2021segformer,
  title={SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers},
  author={Xie, Enze and Wang, Wenhai and Yu, Zhiding and Anandkumar, Anima and Alvarez, Jose M and Luo, Ping},
  journal={arXiv preprint arXiv:2105.15203},
  year={2021}
}
```
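## Appendix: Helper Sketches

The 12 → 6 class mapping from the Training section can be applied to raw CMP label masks with a lookup table. The groupings below come from the Class Mapping table; the assumption that CMP's raw annotation IDs follow the order listed under "Original Classes" (0-indexed) is mine, so adjust `CMP_CLASSES` to match the actual dataset encoding.

```python
import numpy as np

# Assumption: CMP's 12 raw classes indexed 0-11 in the order listed
# under "Original Classes" above. Verify against the actual dataset.
CMP_CLASSES = ["facade", "molding", "cornice", "pillar", "window", "door",
               "sill", "blind", "balcony", "shop", "deco", "background"]
UNIFIED = {"background": 0, "facade_wall": 1, "window": 2,
           "door": 3, "balcony": 4, "vegetation_occluder": 5}

# 12 -> 6 grouping from the Class Mapping table.
CMP_TO_UNIFIED = {
    "facade": "facade_wall", "molding": "facade_wall", "cornice": "facade_wall",
    "pillar": "facade_wall", "sill": "facade_wall", "deco": "facade_wall",
    "window": "window", "blind": "window",
    "door": "door", "shop": "door",
    "balcony": "balcony", "background": "background",
}

# Lookup table: index with a raw label mask to remap every pixel at once.
lut = np.array([UNIFIED[CMP_TO_UNIFIED[c]] for c in CMP_CLASSES], dtype=np.uint8)

raw_mask = np.array([[0, 4], [7, 11]], dtype=np.uint8)  # facade, window, blind, background
unified_mask = lut[raw_mask]  # -> [[1, 2], [2, 0]]
```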
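The per-class IoU figures in the Best Validation Metrics table can be reproduced from predicted and ground-truth masks with a plain intersection-over-union. The sketch below is a minimal NumPy version (classes absent from both masks are skipped via NaN), not the exact evaluation script used for the table.

```python
import numpy as np

def per_class_iou(pred, gt, num_classes=6):
    """IoU per class between two integer label masks of the same shape."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        # NaN marks classes absent from both masks; nanmean skips them.
        ious.append(inter / union if union else float("nan"))
    return np.array(ious)

pred = np.array([[1, 1], [2, 0]])
gt   = np.array([[1, 2], [2, 0]])
ious = per_class_iou(pred, gt)      # [1.0, 0.5, 0.5, nan, nan, nan]
mean_iou = np.nanmean(ious)         # 2/3
```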
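To inspect `pred_seg` from the Usage snippet visually, a class-colored overlay is convenient. The palette below is arbitrary (it is not shipped with the model) and the `overlay` helper is a hypothetical name, not part of the repository.

```python
import numpy as np
from PIL import Image

# Illustrative palette: one RGB color per unified class ID (colors arbitrary).
PALETTE = np.array([
    [0, 0, 0],        # 0 background
    [180, 120, 60],   # 1 facade_wall
    [0, 120, 255],    # 2 window
    [255, 200, 0],    # 3 door
    [200, 0, 200],    # 4 balcony
    [0, 180, 0],      # 5 vegetation_occluder
], dtype=np.uint8)

def overlay(image, pred_seg, alpha=0.5):
    """Blend a color-coded segmentation map over the input PIL image."""
    color = PALETTE[pred_seg].astype(np.float32)          # (H, W, 3)
    base = np.asarray(image.convert("RGB"), dtype=np.float32)
    blended = (1 - alpha) * base + alpha * color
    return Image.fromarray(blended.astype(np.uint8))

# Toy demo: gray 4x4 image, every pixel predicted as facade_wall.
demo = overlay(Image.new("RGB", (4, 4), (100, 100, 100)),
               np.ones((4, 4), dtype=np.int64))
```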
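For the two-pass pipeline described under Pipeline Role, Pass 1 needs to turn the predicted mask into a facade region before rectification. A minimal bounding-box helper under that assumption (`facade_bbox` is illustrative; homography estimation itself is out of scope here):

```python
import numpy as np

def facade_bbox(mask, facade_id=1):
    """Tight (left, top, right, bottom) box around facade_wall pixels."""
    ys, xs = np.nonzero(mask == facade_id)
    if xs.size == 0:
        return None  # no facade detected in this image
    return int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1

# Toy Pass-1 mask: facade_wall occupies rows 2-4, cols 3-6.
mask = np.zeros((10, 10), dtype=np.int64)
mask[2:5, 3:7] = 1
box = facade_bbox(mask)  # -> (3, 2, 7, 5); crop with image.crop(box)
```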