TerraMind-Flood: DEM-Enhanced Flood Detection with Physics-Aware Learning

#15

by riskakuswati - opened Jan 27

edited Jan 27

The Challenge: Flood Mapping at Scale

Flooding affects 1.5 billion people annually and causes over $40 billion in damages worldwide. Rapid, accurate flood extent mapping is critical for emergency response, yet traditional approaches struggle with cloud cover, limited labeled data, and poor generalization across geographic regions.

Our question: Can we leverage geospatial foundation models to create robust flood detection that works globally with minimal task-specific data?

Our Solution: TerraMind-Flood

We present TerraMind-Flood, a flood detection system that extends TerraMind's multimodal capabilities with elevation-aware reasoning. Our approach integrates Digital Elevation Model (DEM) information through cross-attention fusion (the current implementation uses a NIR-SWIR proxy in place of a true DEM), enabling the model to exploit the fact that water flows downhill, a fundamental physical constraint often ignored by purely data-driven methods.

Architecture Highlights

Frozen TerraMind Backbone (87.3M parameters): Preserves the rich geospatial representations learned during pre-training while enabling efficient adaptation.
Cross-Attention DEM Fusion: Rather than simple concatenation, we use cross-attention to let optical features query elevation information, learning spatially-varying relationships between terrain and flood susceptibility.
ControlNet-Style Adapter: Zero-initialized convolutions ensure DEM conditioning has no impact at initialization, allowing gradual learning of elevation-flood relationships without disrupting pre-trained representations.
Physics-Aware Loss: We incorporate a gradient consistency term encouraging flood predictions to align with downhill water flow patterns derived from DEM slope analysis.
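
To make the fusion and adapter ideas above concrete, here is a minimal NumPy sketch of single-head cross-attention in which optical tokens query DEM tokens, with a zero-initialized output projection standing in for a zero-initialized 1x1 convolution. It is an illustration under assumptions, not the code from the notebook: the class name `DemCrossAttention`, the token counts, the embedding size, and the single-head layout are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class DemCrossAttention:
    """Single-head cross-attention: optical tokens query DEM tokens (hypothetical sketch)."""

    def __init__(self, dim, rng):
        scale = 1.0 / np.sqrt(dim)
        self.w_q = rng.normal(0.0, scale, (dim, dim))
        self.w_k = rng.normal(0.0, scale, (dim, dim))
        self.w_v = rng.normal(0.0, scale, (dim, dim))
        # Zero-initialized output projection (per-token analogue of a
        # zero-initialized 1x1 convolution): the DEM branch adds nothing at first.
        self.w_out = np.zeros((dim, dim))

    def __call__(self, optical, dem):
        # optical: (n_tokens, dim) features from the frozen backbone
        # dem:     (m_tokens, dim) embedded elevation patches
        q = optical @ self.w_q
        k = dem @ self.w_k
        v = dem @ self.w_v
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # shape (n, m)
        return optical + (attn @ v) @ self.w_out  # residual connection

rng = np.random.default_rng(0)
optical = rng.normal(size=(196, 64))  # assumed: 14x14 patch tokens, 64-dim
dem = rng.normal(size=(196, 64))
fusion = DemCrossAttention(dim=64, rng=rng)
fused = fusion(optical, dem)

# At initialization the fused features equal the backbone features exactly.
identity_at_init = np.allclose(fused, optical)

# Once training moves w_out away from zero, the DEM starts to influence the output.
fusion.w_out = rng.normal(0.0, 0.01, (64, 64))
changed_after_update = not np.allclose(fusion(optical, dem), optical)
```

Because the output projection starts at zero, the first forward pass reproduces the frozen backbone's features, which is the property the ControlNet-style adapter relies on.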

Training on Real Flood Events

We train and evaluate on Sen1Floods11 (Bonafilia et al., CVPR Workshops 2020), a benchmark dataset whose hand-labeled subset contains 446 chips from 11 flood events spanning six continents. Unlike synthetic or proxy-based flood data, Sen1Floods11 captures actual flood extent during real disaster events.

Geographic Split: We use a strict train/validation split by flood event, so no country or region appears in both sets:

Train: Bolivia, Ghana, India, Nigeria, Pakistan, Paraguay, Somalia, Spain (237 samples)
Validation: Mekong, Sri Lanka, USA (110 samples)

This prevents data leakage and tests true geographic generalization—a critical requirement for operational deployment.
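
A split like the one above can be built from chip file names alone. The sketch below assumes names of the form `Region_ChipID_Layer.tif` with the region as the first underscore-separated field, and assumes Sri Lanka is spelled `Sri-Lanka` in file names. The chip IDs in the usage lines are made up for illustration.

```python
TRAIN_REGIONS = {"Bolivia", "Ghana", "India", "Nigeria",
                 "Pakistan", "Paraguay", "Somalia", "Spain"}
VAL_REGIONS = {"Mekong", "Sri-Lanka", "USA"}  # assumed file-name spelling

def region_of(chip_name):
    # Assumes names like "Bolivia_103757_S2Hand.tif": region, chip id, layer.
    return chip_name.split("_")[0]

def split_by_region(chip_names):
    """Assign every chip to train or val based only on its region."""
    splits = {"train": [], "val": []}
    for name in chip_names:
        region = region_of(name)
        if region in TRAIN_REGIONS:
            splits["train"].append(name)
        elif region in VAL_REGIONS:
            splits["val"].append(name)
        else:
            # Fail loudly instead of silently dropping or misassigning a chip.
            raise ValueError(f"Unknown region {region!r} in {name!r}")
    return splits

# Hypothetical chip names, for illustration only.
chips = ["Bolivia_103757_S2Hand.tif",
         "USA_741073_S2Hand.tif",
         "Mekong_16233_S2Hand.tif"]
splits = split_by_region(chips)
no_overlap = TRAIN_REGIONS.isdisjoint(VAL_REGIONS)
```

Splitting on the region field, rather than shuffling chips, is what keeps neighboring tiles from the same flood event out of both sets.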

Results

Metric	Value	Description
IoU	58.3%	Intersection over Union for flood class
F1 Score	70.1%	Harmonic mean of precision and recall
POD	88.2%	Probability of Detection (recall)
FAR	39.2%	False Alarm Ratio

Our model correctly detects 88% of actual flooded areas (POD), while 39% of the area it flags as flooded is a false alarm (FAR), which corresponds to a precision of about 61%. The high POD is particularly important for disaster response, where missing flooded regions has severe consequences.
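
For reference, the four metrics in the table can all be computed from pixel counts of true positives, false positives, and false negatives for the flood class. The sketch below uses the standard definitions, with FAR taken as FP / (TP + FP) so that precision = 1 - FAR. The counts are invented for illustration and are not the evaluation results.

```python
def flood_metrics(tp, fp, fn):
    """Flood-class metrics from pixel counts (standard definitions)."""
    return {
        "iou": tp / (tp + fp + fn),          # intersection over union
        "f1": 2 * tp / (2 * tp + fp + fn),   # harmonic mean of precision and recall
        "pod": tp / (tp + fn),               # probability of detection (recall)
        "far": fp / (tp + fp),               # false alarm ratio = 1 - precision
        "precision": tp / (tp + fp),
    }

# Illustrative counts only.
m = flood_metrics(tp=80, fp=20, fn=20)

# On pooled counts, IoU and F1 are tied together by IoU = F1 / (2 - F1).
iou_from_f1 = m["f1"] / (2 - m["f1"])
```

The identity between IoU and F1 holds exactly only when both are computed from the same pooled counts. If metrics are instead averaged per chip, the reported values need not satisfy it.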

Training Curves

Figure 1: Training loss and IoU metrics over 76 epochs. Early stopping triggered at epoch 76 with best validation IoU of 44.6%.

Key Improvements Over Baseline

Training with class-weighted loss (9x weight for flood pixels) and strong augmentation (rotation, flip, brightness/contrast) improved IoU from 5% to 58%, more than a tenfold improvement. This demonstrates that careful attention to class imbalance and data augmentation is essential in rare-event detection tasks.
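
One common way to apply a 9x flood weight is a weighted binary cross-entropy on the logits. The NumPy sketch below is an assumption about the form of the loss, not a copy of the notebook's code: the function name, the normalization by the sum of weights, and the use of -1 as the nodata label are all illustrative choices.

```python
import numpy as np

def weighted_bce(logits, labels, flood_weight=9.0, ignore_value=-1):
    """Class-weighted binary cross-entropy that skips nodata pixels."""
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels)
    valid = labels != ignore_value            # drop nodata pixels (assumed label -1)
    z = logits[valid]
    y = labels[valid].astype(np.float64)
    # Numerically stable BCE with logits: max(z, 0) - z*y + log(1 + exp(-|z|))
    per_pixel = np.maximum(z, 0.0) - z * y + np.log1p(np.exp(-np.abs(z)))
    weights = np.where(y == 1, flood_weight, 1.0)
    return float((weights * per_pixel).sum() / weights.sum())

# With zero logits every valid pixel costs log(2), regardless of weighting.
neutral = weighted_bce([0.0, 0.0, 0.0], [1, 0, -1])

# A confidently missed flood pixel costs far more than a confident false alarm.
miss = weighted_bce([-3.0, -3.0], [1, 0])        # flood pixel predicted dry
false_alarm = weighted_bce([3.0, 3.0], [1, 0])   # dry pixel predicted flooded
```

The asymmetry between `miss` and `false_alarm` is what pushes the model away from the trivial "no flood everywhere" solution, at the cost of more false alarms.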

Validation Predictions

Figure 2: Flood predictions on held-out validation countries (Mekong, Sri Lanka, USA). Columns show: RGB input, DEM proxy, ground truth (blue=flood, gray=nodata), model prediction, and overlay visualization.

Scientific Foundation

Our work builds on recent advances in geospatial foundation models. As Zhu et al. describe in their comprehensive framework for Earth foundation models (Communications Earth & Environment, 2025), the ideal Earth FM should possess physical consistency, incorporating principles like conservation and causality to improve transferability and transparency. Our physics-aware loss directly addresses this by encoding the physical constraint that water accumulates in low-elevation areas.
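
As an illustration of how such a constraint can be written down, the sketch below penalizes neighboring pixel pairs in which the predicted flood probability rises in the uphill direction. This is one plausible form of a gradient consistency term and may differ from the one in the repository. The function name and the 4-neighbor finite-difference formulation are assumptions.

```python
import numpy as np

def downhill_consistency_penalty(prob, dem):
    """Mean penalty over adjacent pixel pairs where flood probability increases uphill."""
    prob = np.asarray(prob, dtype=np.float64)
    dem = np.asarray(dem, dtype=np.float64)
    total, count = 0.0, 0
    for axis in (0, 1):
        d_elev = np.diff(dem, axis=axis)    # elevation[i+1] - elevation[i]
        d_prob = np.diff(prob, axis=axis)   # prob[i+1] - prob[i]
        # sign(d_elev) * d_prob > 0 means probability grows toward higher ground.
        violation = np.maximum(np.sign(d_elev) * d_prob, 0.0)
        total += violation.sum()
        count += violation.size
    return total / count

# Terrain rising from left to right.
dem = np.array([[0.0, 1.0, 2.0],
                [0.0, 1.0, 2.0]])
plausible = np.array([[0.9, 0.5, 0.1],
                      [0.9, 0.5, 0.1]])    # wetter on low ground
implausible = plausible[:, ::-1]           # wetter on high ground

penalty_plausible = downhill_consistency_penalty(plausible, dem)
penalty_implausible = downhill_consistency_penalty(implausible, dem)
```

In training, a term like this would typically be added to the segmentation loss with a small weight, since real flood extents only approximately follow local elevation ordering.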

TerraMind's multimodal architecture (Jakubik et al., 2025) provides an ideal backbone for this task, with its ability to process Sentinel-2 imagery and generate semantically meaningful representations across diverse geographic contexts. By freezing the backbone and training only the DEM fusion modules (~9.6M trainable parameters, 10% of total), we achieve efficient adaptation while preserving generalization.

Real-World Impact

Disaster Response: Near-real-time flood mapping enables emergency responders to prioritize rescue operations and resource allocation.

Climate Adaptation: As flood frequency increases with climate change, scalable monitoring tools become essential for infrastructure planning and risk assessment.

SDG Alignment:

SDG 11 (Sustainable Cities): Enabling communities to monitor and respond to flood hazards
SDG 13 (Climate Action): Supporting climate adaptation through improved environmental monitoring

What We Learned

Class imbalance matters: Floods typically cover only 10-15% of imagery. Without explicit handling, models predict "no flood" everywhere.
Geographic generalization is hard: Models trained on one region often fail elsewhere. Country-based validation splits reveal true generalization capability.
DEM integration helps: Cross-attention fusion outperforms simple concatenation by learning spatially-varying elevation-flood relationships.
Foundation models accelerate development: With the TerraMind backbone frozen, training the DEM fusion modules took only ~76 epochs on a single GPU, rather than training a full model from scratch.

Reproducibility

All code, trained weights, and evaluation scripts are available on GitHub (https://github.com/R1-AK/terramind-flood):

Notebook: TerraMind_Flood_Full_Implementation.ipynb
Dataset: Auto-downloads from Google Cloud Storage
Model: terramind_flood_sen1floods11_best.pth

The implementation runs end-to-end on Google Colab with a T4 GPU, making it accessible for researchers and practitioners without specialized infrastructure.

Future Directions

Real DEM Integration: Replace NIR-SWIR proxy with Copernicus DEM or SRTM data
Temporal Modeling: Incorporate pre-flood imagery for change detection
SAR Fusion: Add Sentinel-1 for cloud-penetrating flood detection
Uncertainty Quantification: Enable confidence-aware predictions for operational deployment

Contact:
Riska Kuswati, Geospatial Researcher (riska.kuswati@monash.edu)

Acknowledgments: This work uses the TerraMind foundation model and Sen1Floods11 dataset. We thank the IBM-ESA team for developing TerraMind and making it openly available for research.
