Texture2LoD3: Enabling LoD3 Building Reconstruction With Panoramic Images
Paper: arXiv:2504.05249
```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("Marco333/segformer-b0-facade-mixed", dtype="auto")
```

Status: Training script ready. Run `train.py` to produce this model. See How to Train below.
A SegFormer-B0 model trained on mixed rectified and unrectified facade data for a 2-pass pipeline:
| Property | Value |
|---|---|
| Architecture | SegFormer-B0 (Mix Transformer encoder + all-MLP decoder) |
| Parameters | ~3.7 M |
| Input | RGB image, any resolution (resized to 512 × 512) |
| Output | 6-class pixel mask |
| Format | SafeTensors |
| Base model | Marco333/segformer-b0-facade-cmp |
| ID | Class | Function | Pass |
|---|---|---|---|
| 0 | background | Sky, ground, non-facade regions | Both |
| 1 | facade_wall | Main wall surface (merged: facade, molding, cornice, pillar, sill, deco) | Both |
| 2 | window | Windows + blinds + shopfronts | Both |
| 3 | door | Doors + shopfronts | Both |
| 4 | balcony | Balconies | Both |
| 5 | vegetation_occluder | Trees, plants occluding facade | Both |
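The merges in the table amount to a label lookup. A minimal numpy sketch, with the caveat that the raw CMP label order below is an assumption (check the dataset's own metadata), and that vegetation_occluder has no CMP source class, since it comes from the ADE20K side of the mix:

```python
import numpy as np

# Assumed raw-label order for CMP Facade; verify against the dataset metadata.
CMP_RAW = ["background", "facade", "window", "cornice", "sill", "balcony",
           "blind", "door", "molding", "deco", "pillar", "shop"]

CMP_TO_MIXED = {
    "background": 0,
    "facade": 1, "molding": 1, "cornice": 1, "pillar": 1, "sill": 1, "deco": 1,
    "window": 2, "blind": 2,
    "door": 3, "shop": 3,  # shopfronts folded into door here (one reading of the table)
    "balcony": 4,
}

# Lookup table lets the whole mask be remapped in one indexing op.
LUT = np.array([CMP_TO_MIXED[name] for name in CMP_RAW], dtype=np.uint8)

def remap(mask: np.ndarray) -> np.ndarray:
    """Vectorized remap of a raw CMP label mask to the 6-class taxonomy."""
    return LUT[mask]
```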
```python
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation

processor = SegformerImageProcessor.from_pretrained("Marco333/segformer-b0-facade-mixed")
model = SegformerForSemanticSegmentation.from_pretrained("Marco333/segformer-b0-facade-mixed")

# Pass 1: raw street photo
image = Image.open("street_photo.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
# PIL size is (W, H); interpolate wants (H, W), hence the [::-1]
mask = F.interpolate(outputs.logits, size=image.size[::-1],
                     mode="bilinear", align_corners=False).argmax(dim=1)[0]

# Find biggest facade_wall blob (class 1), compute homography, rectify...
# Then Pass 2 on rectified crop:
rectified = Image.open("rectified_facade.jpg").convert("RGB")
inputs2 = processor(images=rectified, return_tensors="pt")
with torch.no_grad():
    outputs2 = model(**inputs2)
mask2 = F.interpolate(outputs2.logits, size=rectified.size[::-1],
                      mode="bilinear", align_corners=False).argmax(dim=1)[0]
```
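The blob-and-homography step between the two passes is left as comments in the snippet above. A minimal numpy/scipy sketch of those two pieces (helper names are mine; a real pipeline would likely use OpenCV for warping):

```python
import numpy as np
from scipy import ndimage

def largest_blob(mask: np.ndarray, cls: int = 1) -> np.ndarray:
    """Boolean mask of the largest connected component of class `cls`."""
    labeled, n = ndimage.label(mask == cls)
    if n == 0:
        return np.zeros(mask.shape, dtype=bool)
    sizes = np.bincount(labeled.ravel())[1:]  # component sizes, skipping background
    return labeled == (np.argmax(sizes) + 1)

def homography(src, dst):
    """3x3 perspective transform mapping 4 src corners to 4 dst corners (DLT)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = vt[-1].reshape(3, 3)  # null-space vector is the homography up to scale
    return H / H[2, 2]
```

`cv2.warpPerspective` (or PIL's `Image.transform`) would then apply H to produce the rectified crop for Pass 2.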
This repo contains the training script. Run it on any GPU (T4 or better):

```bash
pip install transformers datasets evaluate accelerate torch torchvision Pillow numpy
python train.py
```
What the script does: fine-tunes the base checkpoint (`segformer-b0-facade-cmp`, 48.56% mIoU) on the mixed CMP + ADE20K data below.

Training time: ~4-6 h on a T4 GPU.
| Dataset | Type | Images | Geometry | Classes (raw) | Classes (remapped) |
|---|---|---|---|---|---|
| CMP Facade | Primary | ~492 | Rectified | 12 | 6 (background, wall, window, door, balcony, ignore) |
| ADE20K scene_parse_150 | Augmentation | ~5K filtered | Unrectified (perspective) | 150 | 6 (same taxonomy) |
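The ADE20K side has to be filtered (facade-bearing scenes only) and remapped from 150 classes down to the shared 6-class taxonomy. A hypothetical sketch; the class names chosen here and the filter rule are my assumptions, not values read from train.py:

```python
# Hypothetical name-based filter/remap for the ADE20K side; the real script
# may select scenes and map label ids differently. Treat this as a sketch.
ADE_NAME_TO_MIXED = {
    "building": 1, "house": 1,         # -> facade_wall
    "windowpane": 2,                   # -> window
    "door": 3,                         # -> door
    "tree": 5, "plant": 5, "palm": 5,  # -> vegetation_occluder
}

def keep_scene(names_present: set) -> bool:
    """Keep an ADE20K image only if a facade-bearing class is visible."""
    return bool(names_present & {"building", "house"})
```

Everything not in the mapping would fall back to class 0 (background), matching the "same taxonomy" column above.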
Perspective augmentation (RandomPerspective, distortion=0.3, p=0.4) closes the geometric domain gap. Literature confirms the gap: Texture2LoD3 measured a ~10 pp IoU drop for SegFormer on unrectified versus rectified facades; perspective augmentation during training is the practical fix.
| Parameter | Value |
|---|---|
| Base checkpoint | Marco333/segformer-b0-facade-cmp |
| Optimizer | AdamW |
| Learning rate | 6 × 10⁻⁵ |
| LR schedule | Polynomial decay |
| Warmup | 10% of steps |
| Weight decay | 0.01 |
| Effective batch size | 8 (4 per device × 2 grad accum) |
| Resolution | 512 × 512 |
| Precision | FP16 |
| Epochs | 80 |
| Augmentation | ColorJitter + RandomPerspective (p=0.4, distortion=0.3) |
| Selection metric | Highest mean IoU on validation |
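The selection metric (mean IoU over the validation set) is straightforward to compute from integer masks. A minimal numpy sketch consistent with the 6-class setup above:

```python
import numpy as np

def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int = 6) -> float:
    """Per-class intersection-over-union, averaged over classes that occur."""
    ious = []
    for c in range(num_classes):
        p, g = pred == c, gt == c
        union = np.logical_or(p, g).sum()
        if union == 0:
            continue  # class absent from both masks: skip rather than score it
        ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious))
```

How absent classes are treated (skipped here) varies between implementations and can shift the reported number, so comparisons against the 48.56% baseline should use the same convention.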
| Capability | CMP-only (baseline) | Mixed (this model) |
|---|---|---|
| Rectified facades | ✅ 48.6% mIoU | ✅ Likely 55-70% (more data + transfer) |
| Unrectified street photos | ❌ Untrained | ✅ Trained on ADE20K perspective scenes |
| Perspective robustness | ~10pp IoU drop | Gap closed via augmentation |
CMP Facade:
```bibtex
@inproceedings{Tylecek13,
  author    = {Radim Tyle{\v c}ek and Radim {\v S}{\' a}ra},
  title     = {Spatial Pattern Templates for Recognition of Objects with Regular Structure},
  booktitle = {Proc. GCPR},
  year      = {2013},
}
```
ADE20K:
```bibtex
@inproceedings{zhou2017scene,
  title     = {Scene Parsing through ADE20K Dataset},
  author    = {Zhou, Bolei and Zhao, Hang and Puig, Xavier and Fidler, Sanja and Barriuso, Adela and Torralba, Antonio},
  booktitle = {Proc. CVPR},
  year      = {2017},
}
```
SegFormer:
```bibtex
@article{xie2021segformer,
  title   = {SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers},
  author  = {Xie, Enze and Wang, Wenhai and Yu, Zhiding and Anandkumar, Anima and Alvarez, Jose M and Luo, Ping},
  journal = {arXiv preprint arXiv:2105.15203},
  year    = {2021},
}
```
Alternatively, use the high-level pipeline helper:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-segmentation", model="Marco333/segformer-b0-facade-mixed")
```