---
license: creativeml-openrail-m
language:
- en
tags:
- controlnet
- stable-diffusion
- urban-design
pipeline_tag: image-to-image
---
Stepwise Generative Urban Design
ControlNet-based diffusion models for automatic urban design generation, conditioned on site constraints and text descriptions.
Paper: Human-guided urban form generation using multimodal diffusion models, Building and Environment, 2026
Full paper; arXiv; Code & documentation: GitHub
Models
Six checkpoints covering two cities × three pipeline steps:
| Checkpoint | City | Step |
|---|---|---|
| `checkpoints_step1_nyc` | New York City | Site constraints → Land use + road network |
| `checkpoints_step1_chi` | Chicago | Site constraints → Land use + road network |
| `checkpoints_step2_nyc` | New York City | Land use + roads → Building footprint layout |
| `checkpoints_step2_chi` | Chicago | Land use + roads → Building footprint layout |
| `checkpoints_step3_nyc` | New York City | Building footprints → Satellite image |
| `checkpoints_step3_chi` | Chicago | Building footprints → Satellite image |
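The naming scheme in the table is regular enough to generate programmatically. The sketch below is only an illustration of how the three steps chain for one city; the helper names (`checkpoint_name`, `pipeline_plan`) are hypothetical and not part of the released code, and the city codes and step descriptions are copied from the table.

```python
from itertools import product

# City codes and step descriptions, copied from the checkpoint table.
CITIES = {"nyc": "New York City", "chi": "Chicago"}
STEPS = {
    1: ("Site constraints", "Land use + road network"),
    2: ("Land use + roads", "Building footprint layout"),
    3: ("Building footprints", "Satellite image"),
}


def checkpoint_name(city: str, step: int) -> str:
    """Return the checkpoint name for a city code and pipeline step."""
    if city not in CITIES:
        raise ValueError(f"unknown city code: {city!r}")
    if step not in STEPS:
        raise ValueError(f"unknown step: {step!r}")
    return f"checkpoints_step{step}_{city}"


def pipeline_plan(city: str) -> list[tuple[str, str, str]]:
    """List (checkpoint, input, output) for the three steps, in order."""
    return [(checkpoint_name(city, s), *STEPS[s]) for s in sorted(STEPS)]


# All six names, in the same order as the table (step-major, then city).
ALL_CHECKPOINTS = [checkpoint_name(c, s) for s, c in product(STEPS, CITIES)]

if __name__ == "__main__":
    for name, src, dst in pipeline_plan("nyc"):
        print(f"{name}: {src} -> {dst}")
```

Each step's output is the next step's conditioning input, so a full run for one city uses that city's three checkpoints in order.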
All models are fine-tuned from runwayml/stable-diffusion-v1-5 with ControlNet. Checkpoints are stored in FP16 and are about 2.9 GB each.
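At roughly 2.9 GB per checkpoint, it can be worth downloading only the city you need. The following is a minimal sketch under stated assumptions: `fetch_city` is a hypothetical helper, `repo_id` is a placeholder you must supply, and the glob patterns assume that file or folder names in the repository begin with the checkpoint names listed above, which this card does not confirm. `snapshot_download` and its `allow_patterns` argument come from the `huggingface_hub` library.

```python
CHECKPOINT_GB = 2.9  # approximate size of one FP16 checkpoint, from this card
NUM_STEPS = 3        # pipeline steps per city


def download_size_gb(num_checkpoints: int) -> float:
    """Approximate download size in GB for a number of checkpoints."""
    return round(num_checkpoints * CHECKPOINT_GB, 1)


def city_patterns(city: str) -> list[str]:
    """Glob patterns for one city's checkpoints (naming layout is assumed)."""
    return [f"checkpoints_step{s}_{city}*" for s in range(1, NUM_STEPS + 1)]


def fetch_city(repo_id: str, city: str, local_dir: str) -> str:
    """Download one city's checkpoints; repo_id is a placeholder argument."""
    # Imported here so the size helpers work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        allow_patterns=city_patterns(city),
        local_dir=local_dir,
    )


if __name__ == "__main__":
    print("one city :", download_size_gb(NUM_STEPS), "GB")
    print("all six  :", download_size_gb(2 * NUM_STEPS), "GB")
```

Under these figures, one city comes to about 8.7 GB and all six checkpoints to about 17.4 GB.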
Citation
@article{he2025human,
  title   = {Human-guided urban form generation using multimodal diffusion models},
  author  = {He, Mingyi and Liang, Yuebing and Wang, Shenhao and Zheng, Yunhan
             and Wang, Qingyi and Zhuang, Dingyi and Tian, Li and Zhao, Jinhua},
  journal = {Building and Environment},
  pages   = {113892},
  year    = {2025},
  doi     = {10.1016/j.buildenv.2025.113892}
}

@article{he2025generative,
  title   = {Generative {AI} for urban design: a stepwise approach integrating
             human expertise with multimodal diffusion models},
  author  = {He, Mingyi and Liang, Yuebing and Wang, Shenhao and Zheng, Yunhan
             and Wang, Qingyi and Zhuang, Dingyi and Tian, Li and Zhao, Jinhua},
  journal = {arXiv preprint arXiv:2505.24260},
  year    = {2025}
}