Urban Expansion Detector — Qwen2.5-VL-72B
Fine-tuned Qwen2.5-VL-72B-Instruct on AMD MI300X for satellite imagery urban expansion detection. Analyzes Sentinel-2 overhead tiles, identifies built-up areas with normalized bounding boxes, estimates urban coverage fraction, and generates plain-language corridor reports for transit infrastructure monitoring.
Demonstrated on the Delhi-Meerut RRTS corridor — India's first operational regional rapid transit system.
Model Description
- Developed by: MohitML10
- Model type: Vision-Language Model (LoRA fine-tune)
- Language: English
- License: Apache 2.0
- Finetuned from: Qwen/Qwen2.5-VL-72B-Instruct
Model Sources
- Demo: HuggingFace Space
- Dataset: NuTonic/sat-bbox-metadata-sft-v1
Direct Use
Upload any overhead satellite image (Sentinel-2 or equivalent). The model returns:
- Built area fraction as a decimal (0.0 to 1.0)
- Bounding box coordinates for detected urban clusters in [x1, y1, x2, y2] normalized format
- Plain-language spatial analysis describing urban development patterns
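Since the model returns these values inside free-form text, downstream code has to extract them. A minimal, illustrative parser is sketched below; the exact phrasing of responses is not guaranteed, so `parse_response` and its regexes are assumptions, not part of the released pipeline.

```python
import re

def parse_response(text: str):
    """Pull normalized bounding boxes and a built-area fraction out of the
    model's free-text reply. Illustrative only: the exact response phrasing
    varies, so treat this as a starting point rather than a robust parser."""
    # Boxes like [0.10, 0.20, 0.45, 0.60]
    boxes = [
        [float(v) for v in m]
        for m in re.findall(
            r"\[\s*(0?\.\d+|[01])\s*,\s*(0?\.\d+|[01])\s*,"
            r"\s*(0?\.\d+|[01])\s*,\s*(0?\.\d+|[01])\s*\]",
            text,
        )
    ]
    # First decimal following the words "built area fraction"
    frac_match = re.search(
        r"built\s+area\s+fraction[^0-9]*([01]?\.\d+)", text, re.IGNORECASE
    )
    fraction = float(frac_match.group(1)) if frac_match else None
    return boxes, fraction

sample = (
    "Built area fraction: 0.34. Urban clusters: "
    "[0.10, 0.20, 0.45, 0.60] and [0.55, 0.15, 0.90, 0.50]."
)
boxes, frac = parse_response(sample)
```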
Downstream Use
Multi-tile corridor analysis — feed sequential tiles along a transit route and the model synthesizes a corridor-level urban development summary with PDF export. Intended for urban planners, policy researchers, and transit infrastructure teams.
Out-of-Scope Use
- High-resolution aerial imagery (trained on Sentinel-2 resolution)
- Non-overhead ground-level photography
- Precise cadastral or property-level boundary detection
How to Get Started with the Model
```python
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from peft import PeftModel
import torch
from PIL import Image

# Load the base model in bfloat16 and attach the LoRA adapter
base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-72B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "MohitML10/urban-expansion-detector-72b-v3")
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-72B-Instruct")

image = Image.open("your_satellite_tile.png").convert("RGB")
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "Analyze this satellite image for urban expansion. "
                                 "Provide bounding boxes [x1,y1,x2,y2] normalized 0-1 "
                                 "for each urban cluster and estimate the built area fraction."},
    ],
}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the prompt
print(processor.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
Training Data
8,000 curated examples from NuTonic/sat-bbox-metadata-sft-v1 by Joseph Pollack, filtered to high built-fraction tiles (built_fraction >= 0.15). Dataset contains Sentinel-2 satellite imagery paired with TiM-style land cover analytics JSON and expert geospatial analysis text.
Includes 437 India-specific tiles covering urbanizing regions across the subcontinent.
Training Hyperparameters
- Training regime: bfloat16
- LoRA rank: 16
- LoRA alpha: 32
- Target modules: q_proj, v_proj
- LoRA dropout: 0.05
- Trainable parameters: 32,768,000 (0.0446% of total)
- Total parameters: 73,443,545,344
- Learning rate: 2e-4
- Optimizer: AdamW
- Steps: 3,000
- Gradient clipping: 1.0
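The reported trainable-parameter count can be sanity-checked from the LoRA configuration. The backbone dimensions below (80 decoder layers, hidden size 8192, a 1024-dim `v_proj` output from grouped-query attention) are assumptions based on the Qwen2.5-72B language-model architecture, not stated on this card:

```python
# Back-of-envelope check of the reported LoRA trainable-parameter count.
# Assumed backbone dimensions (Qwen2.5-72B language model): 80 decoder
# layers, hidden size 8192, grouped-query attention with a 1024-dim v_proj.
layers = 80
hidden = 8192
kv_dim = 1024  # 8 KV heads x 128 head dim
rank = 16

# A LoRA adapter on a (d_in -> d_out) linear layer adds r * (d_in + d_out) weights.
q_proj = rank * (hidden + hidden)   # 8192 -> 8192
v_proj = rank * (hidden + kv_dim)   # 8192 -> 1024
trainable = layers * (q_proj + v_proj)

total = 73_443_545_344
pct = round(trainable / total * 100, 4)
print(trainable, pct)  # 32768000 0.0446
```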
Speeds, Sizes, Times
- Hardware: AMD MI300X (192GB HBM3) via AMD Developer Cloud
- Framework: ROCm 6.2, PyTorch 2.5.1+rocm6.2
- Inference time: ~30 seconds per tile on MI300X
- Adapter size: 131MB
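The 131 MB adapter size is consistent with the 32,768,000 trainable parameters being stored in float32 (4 bytes each), a common default when saving LoRA weights:

```python
# Adapter size check: trainable LoRA parameters saved as float32.
params = 32_768_000
size_mb = params * 4 / 1e6  # 4 bytes per float32 weight
print(size_mb)  # 131.072
```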
Results
Qualitative evaluation on Delhi-Meerut RRTS corridor tiles:
| Station | Built Fraction | Clusters Detected |
|---|---|---|
| Meerut South | 0.34 | 2 |
| Muradnagar | 0.36 | 2 |
| Sarai Kale Khan | 0.36 | 1 |
Model correctly identifies urban cluster concentration patterns and produces coherent corridor-level synthesis describing transit-induced urbanization gradients.
Environmental Impact
- Hardware Type: AMD MI300X
- Cloud Provider: AMD Developer Cloud (DigitalOcean)
- Compute Region: US East
- Hours used: ~29 hours total (training + inference testing)
Full Pipeline Capability
- Input: one or more Sentinel-2 satellite tiles
- Output: annotated images with bounding boxes drawn over detected urban clusters, built area fraction per tile, plain-language spatial analysis, and a multi-page PDF corridor report synthesizing findings across all tiles.
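The corridor-level aggregation step can be sketched as below. `TileResult` and `corridor_summary` are illustrative names, not the shipped pipeline; the sample values come from the Results table on this card:

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class TileResult:
    station: str
    built_fraction: float
    clusters: int

def corridor_summary(tiles: list[TileResult]) -> dict:
    """Aggregate per-tile detections into corridor-level statistics."""
    return {
        "tiles": len(tiles),
        "mean_built_fraction": round(mean(t.built_fraction for t in tiles), 4),
        "total_clusters": sum(t.clusters for t in tiles),
        "densest": max(tiles, key=lambda t: t.built_fraction).station,
    }

# Per-tile figures from the Results table on this card
tiles = [
    TileResult("Meerut South", 0.34, 2),
    TileResult("Muradnagar", 0.36, 2),
    TileResult("Sarai Kale Khan", 0.36, 1),
]
summary = corridor_summary(tiles)
```

A real pipeline would feed each tile through the model and the response parser before aggregation; this sketch only covers the synthesis step.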
Model Architecture
LoRA adapters applied to q_proj and v_proj layers of Qwen2.5-VL-72B-Instruct. Base model handles multimodal vision-language understanding; adapters steer output toward geospatial analytical format with normalized bounding box coordinates and built fraction estimates.
Compute Infrastructure
AMD MI300X — 192GB HBM3 unified memory. Required for loading 72B parameter model at bfloat16 (144GB) with headroom for LoRA adapter states and activations.
Citation
```bibtex
@misc{urban-expansion-detector-2026,
  author    = {MohitML10},
  title     = {Urban Expansion Detector: Fine-tuned Qwen2.5-VL-72B for Satellite Urban Expansion Detection},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/MohitML10/urban-expansion-detector-72b-v3}
}
```
Dataset citation:
```bibtex
@dataset{nutonic-sat-bbox-2024,
  author    = {Pollack, Joseph},
  title     = {sat-bbox-metadata-sft-v1},
  year      = {2024},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/datasets/NuTonic/sat-bbox-metadata-sft-v1}
}
```