| --- |
| license: creativeml-openrail-m |
| language: |
| - en |
| tags: |
| - controlnet |
| - stable-diffusion |
| - urban-design |
| pipeline_tag: image-to-image |
| --- |
| |
| # Stepwise Generative Urban Design |
|
|
| ControlNet-based diffusion models for automatic urban design generation, conditioned on site constraints and text descriptions. |
|
|
| **Paper**: *Human-guided urban form generation using multimodal diffusion models*, Building and Environment, 2026 |
|
|
| [Full paper](https://doi.org/10.1016/j.buildenv.2025.113892); [arXiv](https://arxiv.org/abs/2505.24260) |
| **Code & documentation**: [GitHub](https://github.com/Hemy17/Stepwise_GenerativeUrbanDesign) |
|
|
| ## Models |
|
|
| Six checkpoints covering two cities × three pipeline steps: |
|
|
| | Checkpoint | City | Step | |
| |------------|------|------| |
| `checkpoints_step1_nyc` | New York City | Site constraints → Land use + road network |
| `checkpoints_step1_chi` | Chicago | Site constraints → Land use + road network |
| `checkpoints_step2_nyc` | New York City | Land use + roads → Building footprint layout |
| `checkpoints_step2_chi` | Chicago | Land use + roads → Building footprint layout |
| `checkpoints_step3_nyc` | New York City | Building footprints → Satellite image |
| `checkpoints_step3_chi` | Chicago | Building footprints → Satellite image |
|
|
| Fine-tuned from [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5) + ControlNet. Checkpoints are FP16, ~2.9 GB each. |
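
The naming scheme in the table above can be sketched in a few lines of Python. This is only an illustration of how the six checkpoint names combine city and step. The helpers `checkpoint_name` and `pipeline_for` are hypothetical and not part of the repository, and the sketch does not load any weights; see the GitHub documentation for actual inference instructions.

```python
from itertools import product

# City codes as they appear in the checkpoint names above.
CITIES = ("nyc", "chi")  # New York City, Chicago

# Step number -> (conditioning input, generated output), taken from the table.
STEPS = {
    1: ("site constraints", "land use + road network"),
    2: ("land use + roads", "building footprint layout"),
    3: ("building footprints", "satellite image"),
}


def checkpoint_name(step: int, city: str) -> str:
    """Build a checkpoint name such as 'checkpoints_step1_nyc' (hypothetical helper)."""
    if step not in STEPS:
        raise ValueError(f"unknown step: {step!r}")
    if city not in CITIES:
        raise ValueError(f"unknown city: {city!r}")
    return f"checkpoints_step{step}_{city}"


def pipeline_for(city: str) -> list[str]:
    """Return the three checkpoint names for one city, in pipeline order."""
    return [checkpoint_name(step, city) for step in sorted(STEPS)]


# All six names: three steps for each of the two cities.
ALL_CHECKPOINTS = [checkpoint_name(s, c) for s, c in product(sorted(STEPS), CITIES)]

if __name__ == "__main__":
    for name in pipeline_for("nyc"):
        print(name)
```

Calling `pipeline_for("chi")` gives the Chicago checkpoints in the same step order, which mirrors how the output of one step serves as the conditioning input of the next.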
|
|
| ## Citation |
|
|
| ```bibtex |
| @article{he2025human, |
| title = {Human-guided urban form generation using multimodal diffusion models}, |
| author = {He, Mingyi and Liang, Yuebing and Wang, Shenhao and Zheng, Yunhan |
| and Wang, Qingyi and Zhuang, Dingyi and Tian, Li and Zhao, Jinhua}, |
| journal = {Building and Environment}, |
| pages = {113892}, |
| year = {2025}, |
| doi = {10.1016/j.buildenv.2025.113892} |
| } |
| |
| @article{he2025generative, |
| title = {Generative {AI} for urban design: a stepwise approach integrating |
| human expertise with multimodal diffusion models}, |
| author = {He, Mingyi and Liang, Yuebing and Wang, Shenhao and Zheng, Yunhan |
| and Wang, Qingyi and Zhuang, Dingyi and Tian, Li and Zhao, Jinhua}, |
| journal = {arXiv preprint arXiv:2505.24260}, |
| year = {2025} |
| } |
| ``` |
|
|