title: MicroClimate-X
emoji: 🌧️
colorFrom: blue
colorTo: green
sdk: docker
app_port: 8000
pinned: false
license: mit
short_description: Hybrid microclimate risk for complex terrain (FYP demo)
MicroClimate-X
Intelligent Meteorological Analysis System for Complex Terrain
Live demo: https://huggingface.co/spaces/W1nd5pac/microclimate-x
(Deployed as a Hugging Face Space using the Docker SDK. See `docs/DEPLOY_HF.md` for the deployment recipe.)
A Final Year Project at Universiti Kebangsaan Malaysia (UKM), Faculty of Information Science & Technology.
For thesis supervisors: reading path
| Step | Document | What it shows |
|---|---|---|
| 1. Dataset | docs/dataset.md | Source · schema · Y derivation · train/test split |
| 2. Model | models/MODEL_CARD.md | Intended use · metrics · limitations · ethics |
| 3. Evaluation | figures/ + figures/evaluation_summary.json | 6 publication figures, all reproducible via `make evaluate` |
| 4. Architecture | docs/architecture.md + docs/thresholds.md | Hybrid engine, every threshold cited |
| 5. Pipeline order | docs/pipeline_order.md | Explicit "dataset → model → app" sequence |
| 6. Meeting brief | docs/supervisor_meeting_brief.md | Detailed bilingual EN/ZH script |
| 7. Cheat sheet | docs/MEETING_CHEAT_SHEET.md · HTML | Open on screen during the meeting: tab order · demo script · Q&A · checklist |
1. Problem Statement
Traditional weather forecasting relies on macro-scale grids (20 km × 20 km) that fail catastrophically in complex terrain. A single forecast cell may cover a mountain peak, a valley floor, and a windward slope, all of which have vastly different microclimates.
2. Solution: The Hybrid Engine
MicroClimate-X uses a dual-engine hybrid architecture combining a Machine Learning predictor with a topographic Rule-Based Expert System.
     ┌─────────────────────────────────────────────────┐
     │ User clicks a coordinate on the map (lat, lon)  │
     └────────────────────────┬────────────────────────┘
                              │
     ┌────────────────────────▼────────────────────────┐
     │   Open-Meteo (weather) + Open Topo Data (DEM)   │
     └────────────────────────┬────────────────────────┘
                              │
           ┌──────────────────┴──────────────────┐
           │                                     │
┌──────────▼──────────┐             ┌────────────▼───────────┐
│ Engine A            │             │ Engine B               │
│ Random Forest       │ probability │ Topographic Rules      │
│ (in-distribution    ├────────────►│ + Veto Triggers        │
│ rain probability)   │             │ (safety-critical)      │
└─────────────────────┘             └────────────┬───────────┘
                                                 │
                                    ┌────────────▼───────────┐
                                    │ Risk Score 0-100       │
                                    │ + Bilingual Advice     │
                                    │ + XAI Inference Log    │
                                    └────────────────────────┘
Why Hybrid?
Pure ML can fail catastrophically out-of-distribution. Example: feed Mount Everest coordinates → ML predicts 0% rain → returns "Safe", ignoring -30°C, hypoxia, and gale-force winds.
Engine B's Veto mechanism provides bounded safety guarantees by overriding the ML score when physical thresholds are breached. This follows the Neuro-Symbolic AI paradigm (Garcez & Lamb, 2020).
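The override described above can be illustrated with a minimal sketch. The threshold values, the `Conditions` fields, and the helper names below are hypothetical and chosen only for illustration; the project's real thresholds and citations are in `backend/config.py` and `docs/thresholds.md`. The sketch also assumes that any veto forces the score to 100.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; the real values and their
# citations live in backend/config.py and docs/thresholds.md.
VETO_TEMP_C = -20.0        # assumed extreme-cold cutoff
VETO_ELEVATION_M = 5500.0  # assumed extreme-altitude cutoff
VETO_GUST_KMH = 90.0       # assumed gale-force gust cutoff


@dataclass
class Conditions:
    temperature_c: float
    elevation_m: float
    wind_gust_kmh: float


def collect_vetoes(c: Conditions) -> list[str]:
    """Return the name of every physical threshold that is breached."""
    vetoes = []
    if c.temperature_c <= VETO_TEMP_C:
        vetoes.append("extreme_cold")
    if c.elevation_m >= VETO_ELEVATION_M:
        vetoes.append("extreme_altitude")
    if c.wind_gust_kmh >= VETO_GUST_KMH:
        vetoes.append("gale_force_wind")
    return vetoes


def final_risk(ml_rain_probability: float, c: Conditions) -> tuple[int, list[str]]:
    """Any veto forces the maximum score, regardless of the ML output."""
    vetoes = collect_vetoes(c)
    if vetoes:
        return 100, vetoes
    return round(ml_rain_probability * 100), vetoes


# The ML engine says 0 % rain, but the physical conditions trigger vetoes.
everest = Conditions(temperature_c=-30.0, elevation_m=8849.0, wind_gust_kmh=120.0)
print(final_risk(0.0, everest))
```

The point of the design is that the veto path never consults the ML probability, so an out-of-distribution input cannot talk the system into a "Safe" verdict.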
Engine B internals: one-to-one with D5 proposal §3.7 / P4
The rule engine is decomposed exactly along the lines of the thesis proposal so every line of code maps to a section number:
| Proposal step | Code | Output |
|---|---|---|
| P4.1 Load Dynamic Risk Rules | `backend/config.py` | All thresholds, weights, and the R1-R4 decision table, each annotated with its academic citation |
| P4.2 Fetch User Context | `?activity=hiker\|driver\|construction\|general` | Activity is plumbed into the request flow |
| P4.3 Evaluate Environmental Risks | Four `score_*_risk()` functions in `rule_engine.py` | Rainfall / Fog / Wind-gust / Thunderstorm sub-scores (each 0-100) |
| §3.7.2 Table 4.2 Decision Table | `apply_decision_table_3_7_2()` | Which of R1-R4 fired (hidden rain / no amplification / heavy downpour / standard rain) |
| Veto cascade | `_collect_veto_triggers()` | Life-safety overrides (Mt-Everest type), capped at 100 |
| P4.4 Activity weighting | `apply_activity_weighting()` | (activity × hazard) weight matrix |
| P4.5 Composite score | Same | 0.80 · max(weighted) + 0.20 · mean(rest), so the dominant hazard wins |
| P4.6 Actionable advice | `_normal_advice()` / `_veto_advice()` | Bilingual EN/ZH paragraph that names the dominant hazard |
The four hazard categories are surfaced in the UI as four mini-gauges; the four R1-R4 indicators light up beside the score card whenever a rule fires.
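The P4.5 composite formula from the table can be written out as a short sketch. The function name `composite_score` and the example sub-scores are hypothetical; only the 0.80 / 0.20 blend comes from the table above, and the handling of a single sub-score (no "rest") is an assumption.

```python
def composite_score(weighted: dict[str, float]) -> tuple[float, str]:
    """Blend sub-scores as 0.80 * max + 0.20 * mean of the remaining ones."""
    dominant = max(weighted, key=weighted.get)
    rest = [v for k, v in weighted.items() if k != dominant]
    # Assumption: with no remaining sub-scores, the mean term contributes 0.
    rest_mean = sum(rest) / len(rest) if rest else 0.0
    score = 0.80 * weighted[dominant] + 0.20 * rest_mean
    return min(score, 100.0), dominant


# Hypothetical activity-weighted sub-scores (each 0-100).
sub_scores = {"rainfall": 90.0, "fog": 30.0, "wind_gust": 20.0, "thunderstorm": 10.0}
score, dominant = composite_score(sub_scores)
print(round(score, 1), dominant)  # 76.0 rainfall
```

Here the dominant hazard contributes 72 of the 76 points, which is what "dominant hazard wins" means in practice: three low sub-scores cannot dilute one severe one.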
3. Tech Stack
| Layer | Technology |
|---|---|
| Frontend | Vue 3 (CDN) + Tailwind CSS + Leaflet.js + ECharts |
| Backend | Python 3.10+, FastAPI, Uvicorn |
| ML | Scikit-Learn (Random Forest), Pandas, NumPy |
| Storage | SQLite 3 (WAL mode, risk-adaptive TTL cache) |
| External | Open-Meteo Historical Archive (ERA5), Open Topo Data (SRTM DEM) |
4. Dataset
- Source: Open-Meteo Historical Weather API (ERA5 reanalysis)
- Region: Malaysian mountain areas (Genting Highlands, Cameron Highlands, Fraser's Hill, Klang Valley, Mount Kinabalu region)
- Time Range: 2020-01-01 to 2023-12-31 (hourly resolution, 5 sites × ~35 000 hours each)
- Features (X): `elevation_m`, `temperature_c`, `humidity_pct`, `wind_speed_kmh`, `wind_direction_deg`, `surface_pressure_hpa`
- Target (Y): `is_rain_event`, binary: 1 if `precipitation(t+1h) > 0.1 mm`, else 0 (per WMO trace-precipitation definition)
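The target definition above amounts to shifting the precipitation series back by one hour and thresholding it. The following is a minimal sketch in plain Python; `derive_labels` is a hypothetical helper, not the code in `scripts/2_preprocess.py`.

```python
RAIN_THRESHOLD_MM = 0.1  # threshold quoted in the dataset description


def derive_labels(precip_mm: list[float]) -> list[int]:
    """Label hour t with 1 if precipitation at t+1 exceeds the threshold.

    The final hour has no t+1 observation, so it receives no label and
    the result is one element shorter than the input.
    """
    return [int(nxt > RAIN_THRESHOLD_MM) for nxt in precip_mm[1:]]


hourly_precip = [0.0, 0.0, 0.4, 1.2, 0.1, 0.0]
print(derive_labels(hourly_precip))  # [0, 1, 1, 0, 0]
```

Note that exactly 0.1 mm does not count as a rain event, because the comparison is strict.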
5. Quick Start
git clone https://github.com/KyoukoLi/microclimate-x.git
cd microclimate-x
# Fast path: everything via the Makefile
make install-dev # 1. create venv + install runtime + dev deps
make synth # 2. generate synthetic dataset (offline)
# ...or skip `make synth` and run `python scripts/1_download_dataset.py`
# to fetch real ERA5 data when network is available.
make preprocess # 3. feature engineering + Y derivation
make train # 4. RF training + time-based CV
make evaluate # 5. ROC / PR / calibration / threshold-sweep figures
make run # 6. uvicorn dev server on http://localhost:8000
# Then open frontend/index.html (or browse to http://localhost:8000/app/)
Docker one-liner
docker compose up --build
# API lives on http://localhost:8000 · frontend on http://localhost:8000/app/
Test it
make test # 70 tests, ~12 s
make lint # ruff: zero errors expected
Training results on real ERA5 data
Trained on 175 315 hourly samples from the Open-Meteo Historical Archive
(ECMWF ERA5 reanalysis) covering five Malaysian mountain sites,
2020-01-01 to 2023-12-31. Time-based split: the last 20 % per site is held out
(n = 35 063 test samples). See models/MODEL_CARD.md
for the full evaluation card and figures/ for publication-ready plots.
| Metric | Value | Source |
|---|---|---|
| Test ROC AUC | 0.871 | figures/01_roc_curve.png |
| Test PR Average Precision | 0.750 | figures/02_pr_curve.png |
| Brier score (calibration) | 0.138 | figures/03_calibration_curve.png |
| Best F2 @ τ = 0.20 | 0.778 | figures/04_threshold_sweep.png |
| Recall (at chosen τ = 0.20) | 0.934 (safety-critical recall) | |
| Class balance | 29.2 % positive (Malaysian mountain climatology) | |
We deliberately operate at τ = 0.20, not the default 0.50, because in safety-critical settings a missed rain event (false negative) on a windward slope is far worse than a false positive. The F2 score treats recall as twice as important as precision (β = 2, which enters the formula as β² = 4) and is the principled metric for this regime.
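To make the threshold choice concrete, here is a small sketch of an F2 threshold sweep in plain Python. The labels and probabilities are toy values made up for illustration, and `sweep` is a hypothetical helper; it does not reproduce the project's `make evaluate` pipeline or its reported numbers.

```python
def fbeta(precision: float, recall: float, beta: float = 2.0) -> float:
    """F-beta: (1 + beta^2) * P * R / (beta^2 * P + R)."""
    denom = beta**2 * precision + recall
    return (1 + beta**2) * precision * recall / denom if denom else 0.0


def sweep(y_true, y_prob, thresholds):
    """Return (threshold, precision, recall, F2) for each candidate threshold."""
    rows = []
    for t in thresholds:
        pred = [p >= t for p in y_prob]
        tp = sum(1 for y, p in zip(y_true, pred) if y and p)
        fp = sum(1 for y, p in zip(y_true, pred) if not y and p)
        fn = sum(1 for y, p in zip(y_true, pred) if y and not p)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        rows.append((t, precision, recall, fbeta(precision, recall)))
    return rows


# Toy labels and predicted probabilities, made up for illustration.
y_true = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.6, 0.25, 0.3, 0.1, 0.05, 0.45, 0.35, 0.15, 0.02]

rows = sweep(y_true, y_prob, [0.2, 0.5])
best = max(rows, key=lambda r: r[3])
for t, p, r, f2 in rows:
    print(f"tau={t:.2f}  precision={p:.3f}  recall={r:.3f}  F2={f2:.3f}")
```

On this toy data the lower threshold catches every rain event at the cost of two false alarms, and F2 rewards that trade; the default 0.50 keeps precision perfect but misses half the events.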
5-fold time-series CV on the training fold gives AUC ranging from 0.828 to 0.908 (mean ≈ 0.858), which suggests the model is not over-fitting a single temporal slice.
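For readers unfamiliar with time-series cross-validation, the sketch below shows one common expanding-window scheme in which every training window ends before its test window begins. It is an assumption that the project's folds look like this; the exact splitter is defined in `scripts/3_train_model.py`, and `time_series_folds` is a hypothetical name.

```python
def time_series_folds(n_samples: int, n_splits: int = 5):
    """Yield (train_indices, test_indices) with training always before test.

    Each fold trains on everything up to the test window, so the training
    set grows from fold to fold and no future data leaks into it.
    """
    test_size = n_samples // (n_splits + 1)
    first_test_start = n_samples - n_splits * test_size
    for start in range(first_test_start, n_samples, test_size):
        yield range(0, start), range(start, start + test_size)


folds = list(time_series_folds(n_samples=12, n_splits=5))
for train, test in folds:
    print(f"train 0..{train[-1]}  test {test[0]}..{test[-1]}")
```

A spread of per-fold scores, such as the AUC range quoted above, then indicates how sensitive the model is to which period it is evaluated on.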
Feature importance: what the model actually learned
| Rank | Feature | Importance | Interpretation |
|---|---|---|---|
| 1 | `precipitation_lag_1h` | 37.1 % | Rain autocorrelation: the well-documented "rain begets rain" persistence signal in short-term nowcasting (Wilson et al., 2010). |
| 2-3 | `hour_cos`, `hour_sin` | 18.6 % | Diurnal convective cycle: Malaysian mountain rainfall peaks in late afternoon. |
| 4 | `pressure_change_3h` | 4.7 % | Falling pressure precedes incoming storms, the classical synoptic-scale precursor. |
| 5-6 | `wind_v`, `dew_point_c` | 8.1 % | Moisture transport + saturation potential. |
| 7-14 | Other meteorological features (X) | 22 % | T, humidity, cloud cover, wind, dew-point depression, pressure. |
| 15-17 | `month_*`, `elevation_m` | 4 % | Low because the time-of-day and lag features already absorb most of the seasonal/static signal. |
| 18 | `cape_jkg` | 0.0 % | ⚠️ ERA5 archive CAPE values for these coordinates are predominantly zero, a known coverage gap. The Veto-rule engine still uses CAPE thresholds directly from the live Open-Meteo forecast at inference time. |
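The `hour_sin` / `hour_cos` and `wind_v` features in the table are cyclic encodings. A minimal sketch of how such features are typically derived is shown below; the function names are hypothetical, and the wind sign convention (direction is where the wind blows from) is an assumption rather than something confirmed by the preprocessing script.

```python
import math


def encode_hour(hour: int) -> tuple[float, float]:
    """Map hour-of-day onto the unit circle so 23:00 sits next to 00:00."""
    angle = 2 * math.pi * hour / 24
    return math.sin(angle), math.cos(angle)


def wind_components(speed_kmh: float, direction_deg: float) -> tuple[float, float]:
    """Split wind into eastward (u) and northward (v) components.

    Assumes the meteorological convention: direction_deg is the bearing
    the wind blows FROM, measured clockwise from north.
    """
    rad = math.radians(direction_deg)
    u = -speed_kmh * math.sin(rad)
    v = -speed_kmh * math.cos(rad)
    return u, v


# 23:00 and 00:00 are one hour apart and end up close together.
print(math.dist(encode_hour(23), encode_hour(0)))
# 0 degrees and 360 degrees describe the same wind and give the same components.
print(wind_components(10.0, 0.0), wind_components(10.0, 360.0))
```

With raw values, hour 23 and hour 0 (or 0° and 360°) would look maximally different to the model; on the circle they are neighbours.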
Why F2 instead of accuracy?
Accuracy is misleading on imbalanced safety-critical classification. A model that predicts "no rain" 100 % of the time achieves about 70 % accuracy here (the share of non-rain hours) while being completely useless. F2 treats recall as twice as important as precision, which is appropriate for a hiker-safety app where missing a real rain event (false negative) is far worse than a false alarm (false positive).
See models/training_report.json for the full 5-fold CV report.
6. Project Structure
microclimate-x/
├── backend/
│   ├── main.py # FastAPI app + lifespan
│   ├── ml_engine.py # Loads RF model, predict_proba
│   ├── rule_engine.py # Veto rules + risk scoring + bilingual advice
│   ├── terrain.py # DEM-based Valley/Slope/Flat classification
│   ├── cache.py # SQLite WAL cache, risk-adaptive TTL
│   ├── schemas.py # Pydantic request/response models
│   └── config.py # Thresholds + academic citations
├── scripts/
│   ├── 1_download_dataset.py # Open-Meteo + Open-Topo-Data (real ERA5)
│   ├── 1b_synth_dataset.py # physically-plausible offline fallback
│   ├── 2_preprocess.py
│   └── 3_train_model.py
├── frontend/
│   └── index.html # Single-file Vue3 SPA
├── docs/
│   ├── architecture.md
│   └── thresholds.md # Veto thresholds with academic citations
├── tests/
│   └── test_rule_engine.py
├── data/ # raw/processed CSVs (gitignored)
├── models/ # trained .pkl artifacts (gitignored)
└── requirements.txt
7. Key Design Decisions
| Decision | Rationale |
|---|---|
| Random Forest over SVM / Deep Learning | Handles non-linear weather-terrain interactions; outputs interpretable feature importance; no GPU needed; robust on tabular data |
| Binary classification (`is_rain_event`) | One-hour-ahead nowcasting matches the use case (hikers' immediate decisions) |
| Time-based train/test split | A random split would leak temporal correlation and inflate the metrics |
| Class-weight balanced | Rain is the minority class (29.2 % of samples in this dataset) |
| Wind direction as u/v components | Raw degrees treat 0° and 360° as far apart, which is mathematically incorrect |
| Risk-adaptive cache TTL | High-risk scenarios refresh faster (60 s) than safe ones (600 s) |
| SQLite WAL mode | Allows concurrent reads during writes, which matters for FastAPI's async request handling |
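The last two rows of the table can be sketched together using the standard-library `sqlite3` module. The table schema, the function names, and the `HIGH_RISK_CUTOFF` value are hypothetical; only the 60 s / 600 s TTLs and the use of WAL mode come from the table, and this is not the code in `backend/cache.py`.

```python
import os
import sqlite3
import tempfile
import time

HIGH_RISK_TTL_S = 60    # from the design table: high-risk entries refresh faster
LOW_RISK_TTL_S = 600    # from the design table: safe entries are kept longer
HIGH_RISK_CUTOFF = 70   # hypothetical score at which an entry counts as high-risk


def open_cache(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # WAL lets readers proceed while a write is in progress.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache ("
        "key TEXT PRIMARY KEY, risk INTEGER, payload TEXT, expires_at REAL)"
    )
    return conn


def put(conn, key, risk, payload, now=None):
    """Store a result with a TTL that depends on its risk score."""
    now = time.time() if now is None else now
    ttl = HIGH_RISK_TTL_S if risk >= HIGH_RISK_CUTOFF else LOW_RISK_TTL_S
    conn.execute(
        "INSERT OR REPLACE INTO cache VALUES (?, ?, ?, ?)",
        (key, risk, payload, now + ttl),
    )
    conn.commit()


def get(conn, key, now=None):
    """Return the cached payload, or None if it is missing or expired."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT payload FROM cache WHERE key = ? AND expires_at > ?", (key, now)
    ).fetchone()
    return row[0] if row else None


db_path = os.path.join(tempfile.mkdtemp(), "cache.db")
conn = open_cache(db_path)
put(conn, "site-a", risk=85, payload='{"score": 85}', now=0)
put(conn, "site-b", risk=10, payload='{"score": 10}', now=0)
print(get(conn, "site-a", now=120))  # None: the 60 s high-risk TTL has passed
print(get(conn, "site-b", now=120))  # {"score": 10}: still within the 600 s TTL
```

The explicit `now` parameter is only there to make expiry easy to demonstrate and test; a real cache would rely on the wall clock.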
8. Academic References
- Bhuiyan, M. A. E., et al. (2020). Improving satellite-based precipitation estimates over complex terrain using machine learning algorithms. Journal of Hydrology.
- Maclean, I. M., et al. (2018). Microclima: An R package for modelling meso- and microclimate. Methods in Ecology and Evolution.
- Garcez, A. d., & Lamb, L. C. (2020). Neurosymbolic AI: The 3rd Wave. arXiv:2012.05876.
- Luks, A. M., et al. (2019). Wilderness Medical Society Practice Guidelines for the Prevention and Treatment of Acute Altitude Illness.
- Vandal, T., et al. (2017). DeepSD: Generating high-resolution climate change projections through single image super-resolution. KDD.
See docs/thresholds.md for the full citation table per Veto threshold.
9. Roadmap
- Frontend dashboard with XAI inference log
- SQLite caching with WAL + risk-adaptive TTL
- Terrain detection engine (Valley / Slope / Flat)
- Rule-based Veto + 0-100 scoring engine (19/19 unit tests passing)
- Bilingual (EN/ZH) advice generation
- Dataset download script (Open-Meteo + Open Topo Data) + offline synthetic fallback
- Preprocessing pipeline (feature engineering + label `is_rain_event`)
- Random Forest training with time-based CV, trained on real ERA5 data, test AUC = 0.871
- Model comparison (RFC vs LogReg vs XGBoost), thesis Chapter 5
- Hindcast validation against real Malaysian flood events
- PWA offline mode for low-network mountain use
10. License
MIT; see LICENSE.
Developed by L.ZH @ Universiti Kebangsaan Malaysia (UKM) for the Final Year Project (FYP).