Urban Expansion Detector — Qwen2.5-VL-72B

Fine-tuned Qwen2.5-VL-72B-Instruct on AMD MI300X for satellite imagery urban expansion detection. Analyzes Sentinel-2 overhead tiles, identifies built-up areas with normalized bounding boxes, estimates urban coverage fraction, and generates plain-language corridor reports for transit infrastructure monitoring.

Demonstrated on the Delhi-Meerut RRTS corridor — India's first operational regional rapid transit system.

Model Description

  • Developed by: MohitML10
  • Model type: Vision-Language Model (LoRA fine-tune)
  • Language: English
  • License: Apache 2.0
  • Finetuned from: Qwen/Qwen2.5-VL-72B-Instruct

Direct Use

Upload any overhead satellite image (Sentinel-2 or equivalent). The model returns:

  • Built area fraction as a decimal (0.0 to 1.0)
  • Bounding box coordinates for detected urban clusters in [x1, y1, x2, y2] normalized format
  • Plain-language spatial analysis describing urban development patterns
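Since the model returns these values inside free text, a light post-processing step can recover them. A minimal sketch, assuming the response contains boxes in the bracketed `[x1, y1, x2, y2]` form and the phrase "built area fraction" near the decimal (the exact response wording is illustrative, not guaranteed):

```python
import re

def parse_response(text):
    """Extract normalized bounding boxes and the built area fraction
    from a model response (illustrative format, not guaranteed)."""
    boxes = [
        [float(v) for v in m.groups()]
        for m in re.finditer(
            r"\[\s*([\d.]+)\s*,\s*([\d.]+)\s*,\s*([\d.]+)\s*,\s*([\d.]+)\s*\]", text
        )
    ]
    frac_match = re.search(r"built area fraction[^\d]*([01]?\.\d+)", text, re.IGNORECASE)
    fraction = float(frac_match.group(1)) if frac_match else None
    return boxes, fraction

sample = "Urban cluster at [0.12, 0.30, 0.45, 0.62]. Built area fraction: 0.34."
boxes, fraction = parse_response(sample)
```

Parsing defensively (returning `None` when no fraction is found) keeps downstream corridor aggregation robust to occasional off-format generations.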

Downstream Use

Multi-tile corridor analysis — feed sequential tiles along a transit route and the model synthesizes a corridor-level urban development summary with PDF export. Intended for urban planners, policy researchers, and transit infrastructure teams.
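The corridor-level synthesis starts from per-tile outputs. A minimal sketch of the aggregation step, using plain dicts as stand-ins for per-tile model results (values taken from the Results section below):

```python
# Illustrative stand-ins for per-tile model outputs along a corridor.
tile_results = [
    {"station": "Meerut South",    "built_fraction": 0.34, "clusters": 2},
    {"station": "Muradnagar",      "built_fraction": 0.36, "clusters": 2},
    {"station": "Sarai Kale Khan", "built_fraction": 0.36, "clusters": 1},
]

def corridor_summary(results):
    """Aggregate per-tile outputs into corridor-level statistics."""
    fractions = [r["built_fraction"] for r in results]
    return {
        "tiles": len(results),
        "mean_built_fraction": round(sum(fractions) / len(fractions), 3),
        "max_built_fraction": max(fractions),
        "total_clusters": sum(r["clusters"] for r in results),
    }

summary = corridor_summary(tile_results)
```

In the full pipeline these statistics would be fed back to the model as context for the plain-language corridor report before PDF export.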

Out-of-Scope Use

  • High-resolution aerial imagery (trained on Sentinel-2 resolution)
  • Non-overhead ground-level photography
  • Precise cadastral or property-level boundary detection

How to Get Started with the Model

from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from peft import PeftModel
import torch
from PIL import Image

# Load the frozen 72B base model in bfloat16, sharded across available devices
base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-72B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
# Attach the LoRA adapter weights on top of the base model
model = PeftModel.from_pretrained(base_model, "MohitML10/urban-expansion-detector-72b-v3")
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-72B-Instruct")

# Build a single-turn multimodal chat message
image = Image.open("your_satellite_tile.png").convert("RGB")
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "Analyze this satellite image for urban expansion. Provide bounding boxes [x1,y1,x2,y2] normalized 0-1 for each urban cluster and estimate the built area fraction."}
    ]
}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt
print(processor.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Training Data

8,000 curated examples from NuTonic/sat-bbox-metadata-sft-v1 by Joseph Pollack, filtered to high built-fraction tiles (built_fraction >= 0.15). The dataset pairs Sentinel-2 satellite imagery with TiM-style land-cover analytics JSON and expert geospatial analysis text.

437 India-specific tiles are included, covering urbanizing regions across the subcontinent.
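The curation criterion is a single predicate over the built-fraction column. A minimal sketch using plain dicts as stand-ins for dataset rows (with the `datasets` library, the same predicate would be passed to `Dataset.filter`):

```python
BUILT_FRACTION_MIN = 0.15  # threshold used to curate the training set

rows = [  # illustrative stand-ins for dataset records
    {"tile_id": "a", "built_fraction": 0.02},
    {"tile_id": "b", "built_fraction": 0.15},
    {"tile_id": "c", "built_fraction": 0.41},
]

def keep(row):
    """Keep only high built-fraction tiles (built_fraction >= 0.15)."""
    return row["built_fraction"] >= BUILT_FRACTION_MIN

curated = [r for r in rows if keep(r)]
```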

Training Hyperparameters

  • Training regime: bfloat16
  • LoRA rank: 16
  • LoRA alpha: 32
  • Target modules: q_proj, v_proj
  • LoRA dropout: 0.05
  • Trainable parameters: 32,768,000 (0.0446% of total)
  • Total parameters: 73,443,545,344
  • Learning rate: 2e-4
  • Optimizer: AdamW
  • Steps: 3,000
  • Gradient clipping: 1.0
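The trainable-parameter count above can be reproduced with back-of-envelope arithmetic, assuming the adapters sit on the 80 language-model layers of the base model, with hidden size 8192 and a GQA value-projection width of 1024 (these architecture figures are assumptions, not stated in this card):

```python
r = 16            # LoRA rank
hidden = 8192     # assumed model hidden size
kv_dim = 1024     # assumed GQA value-projection output width
layers = 80       # assumed number of language-model layers

# Each LoRA pair adds r * (d_in + d_out) parameters per adapted matrix.
q_proj = r * (hidden + hidden)   # 8192 -> 8192
v_proj = r * (hidden + kv_dim)   # 8192 -> 1024
trainable = layers * (q_proj + v_proj)

total = 73_443_545_344
pct = 100 * trainable / total    # fraction of total parameters trained
```

The result matches the 32,768,000 trainable parameters (0.0446% of total) reported above, which is consistent with the adapters targeting only the language-model q_proj and v_proj matrices.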

Speeds, Sizes, Times

  • Hardware: AMD MI300X (192GB HBM3) via AMD Developer Cloud
  • Framework: ROCm 6.2, PyTorch 2.5.1+rocm6.2
  • Inference time: ~30 seconds per tile on MI300X
  • Adapter size: 131MB

Results

Qualitative evaluation on Delhi-Meerut RRTS corridor tiles:

Station          Built Fraction  Clusters Detected
Meerut South     0.34            2
Muradnagar       0.36            2
Sarai Kale Khan  0.36            1
Model correctly identifies urban cluster concentration patterns and produces coherent corridor-level synthesis describing transit-induced urbanization gradients.

Environmental Impact

  • Hardware Type: AMD MI300X
  • Cloud Provider: AMD Developer Cloud (DigitalOcean)
  • Compute Region: US East
  • Hours used: ~29 hours total (training + inference testing)

Full Pipeline Capability

Input: one or more Sentinel-2 satellite tiles.
Output: annotated images with bounding boxes drawn over detected urban clusters, built area fraction per tile, plain-language spatial analysis, and a multi-page PDF corridor report synthesizing findings across all tiles.
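Drawing the annotations requires mapping the normalized boxes back onto the tile. A minimal sketch of that conversion (the drawing itself would use e.g. PIL's ImageDraw; the 512x512 tile size is illustrative):

```python
def to_pixels(box, width, height):
    """Map a normalized [x1, y1, x2, y2] box onto a tile of the given size."""
    x1, y1, x2, y2 = box
    return [round(x1 * width), round(y1 * height),
            round(x2 * width), round(y2 * height)]

# Example normalized box on an illustrative 512x512 tile.
px = to_pixels([0.12, 0.30, 0.45, 0.62], 512, 512)
```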

Model Architecture

LoRA adapters applied to q_proj and v_proj layers of Qwen2.5-VL-72B-Instruct. Base model handles multimodal vision-language understanding; adapters steer output toward geospatial analytical format with normalized bounding box coordinates and built fraction estimates.

Compute Infrastructure

AMD MI300X — 192GB HBM3 unified memory. Required for loading 72B parameter model at bfloat16 (144GB) with headroom for LoRA adapter states and activations.
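The headroom claim is simple arithmetic: bfloat16 stores two bytes per parameter. A back-of-envelope check using the nominal 72B parameter count:

```python
params = 72e9            # nominal parameter count
bytes_per_param = 2      # bfloat16
weights_gb = params * bytes_per_param / 1e9   # weight memory in GB

hbm_gb = 192             # MI300X HBM3 capacity
headroom_gb = hbm_gb - weights_gb             # left for adapters + activations
```

The roughly 48GB remaining is what accommodates the LoRA adapter states, KV cache, and activations during training and inference.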

Citation

@misc{urban-expansion-detector-2026,
  author = {MohitML10},
  title = {Urban Expansion Detector: Fine-tuned Qwen2.5-VL-72B for Satellite Urban Expansion Detection},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/MohitML10/urban-expansion-detector-72b-v3}
}

Dataset citation:

@dataset{nutonic-sat-bbox-2024,
  author = {Pollack, Joseph},
  title = {sat-bbox-metadata-sft-v1},
  publisher = {HuggingFace},
  url = {https://huggingface.co/datasets/NuTonic/sat-bbox-metadata-sft-v1}
}