---
library_name: transformers
license: mit
tags:
- image-segmentation
- semantic-segmentation
- segformer
- facade
- cmp
- vision
pipeline_tag: image-segmentation
datasets:
- Xpitfire/cmp_facade
metrics:
- mean_iou
---

# SegFormer-B0 Fine-Tuned on CMP Facade Dataset

A SegFormer-B0 semantic segmentation model fine-tuned for facade parsing: it labels wall, window, door, and balcony pixels in rectified (fronto-parallel) building facade images.

## Model Details

- Architecture: SegFormer-B0 (NVIDIA), initialized from ADE20K-pretrained weights
- Parameters: ~3.7M
- Task: Semantic Segmentation
- Input Size: 512×512
- Classes: 6 unified facade classes (class 5, `vegetation_occluder`, has no training labels)

## Class Mapping

| ID | Class | Description |
|----|-------|-------------|
| 0 | `background` | Sky, ground, and other non-facade regions |
| 1 | `facade_wall` | Main wall surface, plus moldings, cornices, pillars, sills, and decoration |
| 2 | `window` | Windows and blinds |
| 3 | `door` | Doors and shopfronts |
| 4 | `balcony` | Balconies |
| 5 | `vegetation_occluder` | Vegetation; no CMP class maps here, so vegetation is learned as background |
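For downstream code it can help to keep the table above as a lookup. The sketch below assumes the checkpoint's `config.id2label` uses the same IDs as this table (check its `config.json` to confirm); `class_fractions` is a hypothetical helper that summarizes a predicted mask such as `pred_seg` from the Usage section.

```python
import numpy as np

# Class IDs as listed in the Class Mapping table.
# Assumption: the checkpoint's config.id2label uses the same IDs.
ID2LABEL = {
    0: "background",
    1: "facade_wall",
    2: "window",
    3: "door",
    4: "balcony",
    5: "vegetation_occluder",
}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def class_fractions(pred_seg: np.ndarray) -> dict[str, float]:
    """Hypothetical helper: fraction of pixels assigned to each class.

    pred_seg is an (H, W) integer array of class IDs, e.g. the argmax output
    of the model. Classes that never occur get a fraction of 0.0.
    """
    counts = np.bincount(pred_seg.ravel(), minlength=len(ID2LABEL))
    total = pred_seg.size
    return {ID2LABEL[i]: float(counts[i] / total) for i in range(len(ID2LABEL))}


if __name__ == "__main__":
    # Toy 2x4 mask: 4 wall pixels, 2 window pixels, 1 door, 1 background
    mask = np.array([[1, 1, 2, 2],
                     [1, 1, 3, 0]])
    print(class_fractions(mask))
```

Because `vegetation_occluder` has no training labels, its fraction is expected to stay at (or near) zero on real predictions.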

## Training

- Dataset: [CMP Facade Database](https://huggingface.co/datasets/Xpitfire/cmp_facade), 378 training and 114 test images of rectified facades
- Original Classes: 12 (facade, molding, cornice, pillar, window, door, sill, blind, balcony, shop, deco, background)
- Mapping: 12 CMP classes → 6 unified classes (see Class Mapping)
- Epochs: ~53 (best checkpoint at epoch 38, mean IoU 0.4856)
- Optimizer: AdamW, lr=6e-5
- Batch Size: 4 per device with gradient accumulation (effective batch size 8)
- Hardware: Tesla T4 GPU
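The 12 → 6 class mapping can be applied to CMP label masks with a lookup table. This is a minimal sketch, not the author's training code: the numeric CMP label IDs in `CMP_ID2NAME` are a hypothetical placeholder that just follows the order of the class list above, so replace them with the IDs used by the dataset you actually load.

```python
import numpy as np

# Unified IDs from the Class Mapping table.
UNIFIED = {"background": 0, "facade_wall": 1, "window": 2,
           "door": 3, "balcony": 4, "vegetation_occluder": 5}

# CMP class name -> unified class name, following the Class Mapping table.
CMP_TO_UNIFIED = {
    "background": "background",
    "facade": "facade_wall", "molding": "facade_wall", "cornice": "facade_wall",
    "pillar": "facade_wall", "sill": "facade_wall", "deco": "facade_wall",
    "window": "window", "blind": "window",
    "door": "door", "shop": "door",
    "balcony": "balcony",
}

# HYPOTHETICAL CMP label IDs (order of the list in this card); verify
# against the dataset's own label definitions before use.
CMP_ID2NAME = dict(enumerate([
    "facade", "molding", "cornice", "pillar", "window", "door",
    "sill", "blind", "balcony", "shop", "deco", "background",
]))


def build_lut(cmp_id2name: dict[int, str]) -> np.ndarray:
    """Lookup table where lut[cmp_id] is the unified class ID."""
    # IDs missing from cmp_id2name stay 0, i.e. background
    lut = np.zeros(max(cmp_id2name) + 1, dtype=np.uint8)
    for cmp_id, name in cmp_id2name.items():
        lut[cmp_id] = UNIFIED[CMP_TO_UNIFIED[name]]
    return lut


def remap_mask(cmp_mask: np.ndarray, lut: np.ndarray) -> np.ndarray:
    """Convert an (H, W) CMP label mask to unified IDs via fancy indexing."""
    return lut[cmp_mask]


if __name__ == "__main__":
    lut = build_lut(CMP_ID2NAME)
    # Under the hypothetical IDs: facade, molding, window / blind, shop, background
    cmp_mask = np.array([[0, 1, 4], [7, 9, 11]])
    print(remap_mask(cmp_mask, lut))
```

No CMP class maps to unified ID 5, which is why `vegetation_occluder` never receives training pixels.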

## Best Validation Metrics

| Metric | Value |
|--------|-------|
| Mean IoU | 0.4856 |
| Facade Wall IoU | 0.867 |
| Window IoU | 0.410 |
| Door IoU | 0.460 |
| Balcony IoU | 0.230 |
| Background IoU | 0.467 |
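Per-class IoU is the overlap between predicted and ground-truth pixels of a class divided by their union, and mean IoU averages it over classes. The sketch below computes both from a confusion matrix. Assumption: classes with an empty union (such as `vegetation_occluder`, which never appears in CMP) are skipped via `np.nanmean`, similar in spirit to the `evaluate` library's `mean_iou` metric; the card does not state exactly how its numbers were averaged.

```python
import numpy as np


def confusion_matrix(pred: np.ndarray, target: np.ndarray, num_classes: int) -> np.ndarray:
    """cm[t, p] counts pixels with ground-truth class t predicted as class p."""
    idx = target.ravel().astype(np.int64) * num_classes + pred.ravel().astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)


def iou_per_class(cm: np.ndarray) -> np.ndarray:
    """IoU_c = TP_c / (TP_c + FP_c + FN_c); NaN where the class never occurs."""
    tp = np.diag(cm).astype(np.float64)
    fp = cm.sum(axis=0) - tp  # predicted as c, but truly another class
    fn = cm.sum(axis=1) - tp  # truly c, but predicted as another class
    union = tp + fp + fn
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(union > 0, tp / union, np.nan)


if __name__ == "__main__":
    target = np.array([[1, 1, 2, 2], [1, 1, 3, 0]])
    pred = np.array([[1, 1, 2, 1], [1, 1, 0, 0]])
    iou = iou_per_class(confusion_matrix(pred, target, num_classes=6))
    print(iou)              # classes 4 and 5 are NaN (absent from both)
    print(np.nanmean(iou))  # mean IoU over classes that occur
```

Accumulating one confusion matrix over the whole evaluation set before computing IoU, rather than averaging per-image IoUs, keeps small or empty images from skewing the result.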

## Usage

```python
import torch
import torch.nn as nn
from PIL import Image
from transformers import SegformerForSemanticSegmentation, SegformerImageProcessor

# Load the preprocessor (resize + normalization) and the fine-tuned model
processor = SegformerImageProcessor.from_pretrained("Marco333/segformer-b0-facade-cmp")
model = SegformerForSemanticSegmentation.from_pretrained("Marco333/segformer-b0-facade-cmp")

# Load an image (ideally a rectified facade)
image = Image.open("facade.jpg").convert("RGB")

# Inference: SegFormer outputs logits at 1/4 of the input resolution
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits  # shape: (batch, num_labels, H/4, W/4)

# Upsample logits to the original size; PIL's image.size is (width, height)
upsampled = nn.functional.interpolate(
    logits, size=image.size[::-1], mode="bilinear", align_corners=False
)
pred_seg = upsampled.argmax(dim=1)[0].cpu().numpy()  # (H, W) array of class IDs
```
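To inspect `pred_seg` visually, the class IDs can be mapped to colors and blended with the input image. The palette below is an arbitrary choice for illustration, not one shipped with the model, and `colorize` and `overlay` are hypothetical helpers.

```python
import numpy as np

# Arbitrary RGB palette, one row per class ID (0-5) from the Class Mapping table.
PALETTE = np.array([
    [0, 0, 0],        # 0 background
    [200, 200, 200],  # 1 facade_wall
    [0, 120, 255],    # 2 window
    [255, 80, 0],     # 3 door
    [255, 220, 0],    # 4 balcony
    [0, 180, 0],      # 5 vegetation_occluder
], dtype=np.uint8)


def colorize(pred_seg: np.ndarray) -> np.ndarray:
    """Map an (H, W) array of class IDs to an (H, W, 3) uint8 RGB image."""
    return PALETTE[pred_seg]


def overlay(image_rgb: np.ndarray, pred_seg: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Alpha-blend the colorized mask over an (H, W, 3) uint8 image of the same size."""
    base = image_rgb.astype(np.float32)
    colors = colorize(pred_seg).astype(np.float32)
    blended = (1 - alpha) * base + alpha * colors
    return blended.round().astype(np.uint8)
```

With the Usage snippet, `overlay(np.array(image), pred_seg)` returns an array that `PIL.Image.fromarray` can turn back into an image for saving.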

## Intended Use

- Primary: Second-pass segmentation of rectified facades (after homography rectification)
- Secondary: First-pass facade detection on raw street photos; expect lower accuracy here, because the training data contains no unrectified images

## Pipeline Role

This model is designed for a two-pass facade segmentation pipeline:

1. Pass 1: Segment the raw street photo and locate the facade wall region.
2. Rectify the facade via a homography.
3. Pass 2: Re-run this model on the rectified crop to parse windows, doors, and balconies.
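Step 2 needs four corner correspondences between the facade in the photo and a target rectangle. The card does not say how those corners are obtained, so the sketch below takes them as input (hypothetically, from a quadrilateral fitted around the pass-1 `facade_wall` region) and estimates the homography with the standard direct linear transform in NumPy, then warps by inverse mapping with nearest-neighbor sampling. In practice OpenCV's `cv2.getPerspectiveTransform` and `cv2.warpPerspective` do the same job; this is only a dependency-free illustration, and all function names here are hypothetical.

```python
import numpy as np


def homography_from_4_points(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Solve for H (with H[2, 2] = 1) such that dst ~ H @ [x, y, 1] for 4 point pairs.

    src, dst: (4, 2) arrays of (x, y) pixel coordinates, no three collinear.
    """
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    h = np.linalg.solve(np.array(rows, dtype=np.float64), np.array(rhs, dtype=np.float64))
    return np.append(h, 1.0).reshape(3, 3)


def apply_homography(H: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Map (N, 2) points through H, dividing by the homogeneous coordinate."""
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]


def warp_nearest(image: np.ndarray, H: np.ndarray, out_w: int, out_h: int) -> np.ndarray:
    """Rectify by inverse mapping: each output pixel samples the source at H^-1 (nearest neighbor)."""
    ys, xs = np.mgrid[0:out_h, 0:out_w]
    grid = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(np.float64)
    src = np.rint(apply_homography(np.linalg.inv(H), grid)).astype(int)
    inside = ((src[:, 0] >= 0) & (src[:, 0] < image.shape[1])
              & (src[:, 1] >= 0) & (src[:, 1] < image.shape[0]))
    out = np.zeros((out_h * out_w,) + image.shape[2:], dtype=image.dtype)
    out[inside] = image[src[inside, 1], src[inside, 0]]  # pixels outside the source stay 0
    return out.reshape((out_h, out_w) + image.shape[2:])


if __name__ == "__main__":
    # Hypothetical facade corners in the photo (TL, TR, BR, BL), mapped to a 512x512 rectangle
    corners = np.array([[120, 80], [900, 140], [880, 700], [100, 760]], dtype=np.float64)
    target = np.array([[0, 0], [511, 0], [511, 511], [0, 511]], dtype=np.float64)
    H = homography_from_4_points(corners, target)
    print(np.round(apply_homography(H, corners), 6))  # approximately equal to target
```

A 512×512 target rectangle matches the model's input size, though the crop's aspect ratio is then lost; choosing the rectangle from the facade's estimated real proportions is an alternative.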

## Limitations

- Trained only on rectified facade images from CMP, so performance on perspective-distorted street photos will be degraded.
- No vegetation in the training labels: the `vegetation_occluder` class is never learned, and vegetation will be predicted as background.
- Small training set (378 images), which limits achievable accuracy (best mean IoU 0.4856).

## Citation

Please cite this model if you use it:

```bibtex
@misc{corbetta_segformer_facade_cmp_2026,
  author = {Marco Corbetta},
  title = {segformer-b0-facade-cmp: SegFormer-B0 fine-tuned on CMP Facade},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Marco333/segformer-b0-facade-cmp}}
}
```

CMP Dataset:
```bibtex
@INPROCEEDINGS{Tylecek13,
  author = {Radim Tyle{\v c}ek and Radim {\v S}{\' a}ra},
  title = {Spatial Pattern Templates for Recognition of Objects with Regular Structure},
  booktitle = {Proc. GCPR},
  year = {2013},
}
```

SegFormer:
```bibtex
@article{xie2021segformer,
  title = {SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers},
  author = {Xie, Enze and Wang, Wenhai and Yu, Zhiding and Anandkumar, Anima and Alvarez, Jose M and Luo, Ping},
  journal = {arXiv preprint arXiv:2105.15203},
  year = {2021}
}
```