---
title: MicroClimate-X
emoji: 🌧️
colorFrom: blue
colorTo: green
sdk: docker
app_port: 8000
pinned: false
license: mit
short_description: Hybrid microclimate risk for complex terrain (FYP demo)
---

# MicroClimate-X

> Intelligent Meteorological Analysis System for Complex Terrain

> **Live demo**:
> (Deployed as a Hugging Face Space, Docker SDK. See [`docs/DEPLOY_HF.md`](docs/DEPLOY_HF.md) for the deployment recipe.)

![CI](https://github.com/KyoukoLi/microclimate-x/actions/workflows/ci.yml/badge.svg)
![Python](https://img.shields.io/badge/Python-3.9%20%7C%203.11%20%7C%203.12-blue)
![FastAPI](https://img.shields.io/badge/FastAPI-0.110%2B-009688)
![Vue3](https://img.shields.io/badge/Vue.js-3-4FC08D)
![ML](https://img.shields.io/badge/ML-RandomForest-orange)
![Coverage](https://img.shields.io/badge/coverage-97%25-brightgreen)
![Tests](https://img.shields.io/badge/tests-70%20passing-success)
![Docker](https://img.shields.io/badge/Docker-multi--stage-2496ED?logo=docker&logoColor=white)
![License](https://img.shields.io/badge/License-MIT-green)

A Final Year Project at **Universiti Kebangsaan Malaysia (UKM)**, Faculty of Information Science & Technology.

### For thesis supervisors

| Step | Document | What it shows |
|---|---|---|
| 1. Dataset | [`docs/dataset.md`](docs/dataset.md) | Source · schema · **Y derivation** · train/test split |
| 2. Model | [`models/MODEL_CARD.md`](models/MODEL_CARD.md) | Intended use · metrics · limitations · ethics |
| 3. Evaluation | [`figures/`](figures/) + [`figures/evaluation_summary.json`](figures/evaluation_summary.json) | 6 publication figures, all reproducible via `make evaluate` |
| 4. Architecture | [`docs/architecture.md`](docs/architecture.md) + [`docs/thresholds.md`](docs/thresholds.md) | Hybrid engine, every threshold cited |
| 5. Pipeline order | [`docs/pipeline_order.md`](docs/pipeline_order.md) | Explicit "dataset → model → app" sequence |
| 6. Meeting brief | [`docs/supervisor_meeting_brief.md`](docs/supervisor_meeting_brief.md) | Detailed bilingual EN/ZH script |
| 7. **Cheat sheet** | [`docs/MEETING_CHEAT_SHEET.md`](docs/MEETING_CHEAT_SHEET.md) · [HTML](docs/MEETING_CHEAT_SHEET.html) | **Open on screen during the meeting**: tab-order · demo script · Q&A · checklist |

---

## 1. Problem Statement

Traditional weather forecasting relies on **macro-scale grids (20 km × 20 km)** that fail catastrophically in complex terrain. A single forecast cell may cover a mountain peak, a valley floor, and a windward slope, all of which have vastly different microclimates.

## 2. Solution: The Hybrid Engine

MicroClimate-X uses a **dual-engine hybrid architecture** combining a Machine Learning predictor with a topographic Rule-Based Expert System.

```
         ┌──────────────────────────────────────────────────┐
         │  User clicks a coordinate on the map (lat, lon)  │
         └────────────────────┬─────────────────────────────┘
                              │
         ┌────────────────────▼─────────────────────────────┐
         │   Open-Meteo (weather) + Open Topo Data (DEM)    │
         └────────────────────┬─────────────────────────────┘
                              │
           ┌──────────────────┴───────────────────┐
           │                                      │
┌──────────▼──────────┐              ┌────────────▼───────────┐
│      Engine A       │              │        Engine B        │
│    Random Forest    │ probability  │   Topographic Rules    │
│  (in-distribution   ├─────────────►│    + Veto Triggers     │
│  rain probability)  │              │   (safety-critical)    │
└─────────────────────┘              └────────────┬───────────┘
                                                  │
                                     ┌────────────▼───────────┐
                                     │    Risk Score 0-100    │
                                     │   + Bilingual Advice   │
                                     │  + XAI Inference Log   │
                                     └────────────────────────┘
```

### Why Hybrid?

Pure ML can fail catastrophically out of distribution. Example: feed Mount Everest coordinates → ML predicts 0 % rain → returns "Safe", ignoring -30 °C, hypoxia, and gale-force winds.
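
The Everest case can be made concrete with a small sketch of an override. This is only an illustration under loud assumptions: the `Conditions` fields, the `hybrid_risk` helper, and every limit in `HYPOTHETICAL_LIMITS` are invented placeholders rather than the project's thresholds, and mapping the rain probability straight to a 0-100 score is a simplification of the real scoring.

```python
from dataclasses import dataclass


@dataclass
class Conditions:
    temperature_c: float
    wind_gust_kmh: float
    elevation_m: float


# Placeholder limits for illustration only; NOT the project's thresholds.
HYPOTHETICAL_LIMITS = {
    "extreme_cold": lambda c: c.temperature_c <= -20.0,
    "gale_force_wind": lambda c: c.wind_gust_kmh >= 75.0,
    "extreme_altitude": lambda c: c.elevation_m >= 5500.0,
}


def hybrid_risk(ml_rain_probability: float, c: Conditions):
    """Return (risk score 0-100, names of the veto triggers that fired)."""
    triggers = [name for name, breached in HYPOTHETICAL_LIMITS.items() if breached(c)]
    if triggers:
        # Veto: a breached physical limit overrides the ML score entirely.
        return 100.0, triggers
    # Simplification: without a veto, scale the ML probability to 0-100.
    return 100.0 * ml_rain_probability, triggers


# The ML engine sees "0 % rain", but all three physical limits are breached.
everest = Conditions(temperature_c=-30.0, wind_gust_kmh=120.0, elevation_m=8849.0)
print(hybrid_risk(0.0, everest))
```

With the veto in place, a 0 % rain probability can no longer produce a "Safe" result for such a location, because the score no longer depends on the ML output once any limit is breached.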

**Engine B's Veto mechanism** provides bounded safety guarantees by overriding the ML score when physical thresholds are breached. This follows the **Neuro-Symbolic AI** paradigm (Garcez & Lamb, 2020).

### Engine B internals: one-to-one with D5 proposal §3.7 / P4

The rule engine is decomposed exactly along the lines of the thesis proposal, so each piece of code maps to a proposal section number:

| Proposal step | Code | Output |
|---|---|---|
| **P4.1** Load Dynamic Risk Rules | `backend/config.py` | All thresholds, weights, and the R1-R4 decision table, each annotated with its academic citation |
| **P4.2** Fetch User Context | `?activity=hiker\|driver\|construction\|general` | Activity is plumbed into the request flow |
| **P4.3** Evaluate Environmental Risks | Four `score_*_risk()` functions in `rule_engine.py` | Rainfall / Fog / Wind-gust / Thunderstorm sub-scores (each 0-100) |
| **§3.7.2 Table 4.2** Decision Table | `apply_decision_table_3_7_2()` | Which of R1-R4 fired (hidden rain / no amplification / heavy downpour / standard rain) |
| Veto cascade | `_collect_veto_triggers()` | Life-safety overrides (Mt-Everest type), capped at 100 |
| **P4.4** Activity weighting | `apply_activity_weighting()` | (activity × hazard) weight matrix |
| **P4.5** Composite score | Same as P4.4 | `0.80 · max(weighted) + 0.20 · mean(rest)`, so the dominant hazard wins |
| **P4.6** Actionable advice | `_normal_advice()` / `_veto_advice()` | Bilingual EN/ZH paragraph that names the dominant hazard |

The four hazard categories are surfaced in the UI as four mini-gauges; the four R1-R4 indicators light up beside the score card whenever a rule fires.

## 3. Tech Stack

| Layer | Technology |
|---|---|
| Frontend | Vue 3 (CDN) + Tailwind CSS + Leaflet.js + ECharts |
| Backend | Python 3.10+, FastAPI, Uvicorn |
| ML | Scikit-Learn (Random Forest), Pandas, NumPy |
| Storage | SQLite 3 (WAL mode, risk-adaptive TTL cache) |
| External | Open-Meteo Historical Archive (ERA5), Open Topo Data (SRTM DEM) |

## 4. Dataset

- **Source**: [Open-Meteo Historical Weather API](https://open-meteo.com/en/docs/historical-weather-api) (ERA5 reanalysis)
- **Region**: Malaysian mountain areas (Genting Highlands, Cameron Highlands, Fraser's Hill, Klang Valley, Mount Kinabalu region)
- **Time Range**: 2020-01-01 to 2023-12-31 (hourly resolution, 5 sites × ~35 000 hours each)
- **Features (X)**: `elevation_m`, `temperature_c`, `humidity_pct`, `wind_speed_kmh`, `wind_direction_deg`, `surface_pressure_hpa`
- **Target (Y)**: `is_rain_event`, binary: 1 if `precipitation(t+1h) > 0.1 mm`, else 0 (per WMO trace-precipitation definition)

## 5. Quick Start

```bash
git clone https://github.com/KyoukoLi/microclimate-x.git
cd microclimate-x

# Fast path: everything via the Makefile
make install-dev   # 1. create venv + install runtime + dev deps
make synth         # 2. generate synthetic dataset (offline)
                   #    ...or skip `make synth` and run `python scripts/1_download_dataset.py`
                   #    to fetch real ERA5 data when network is available.
make preprocess    # 3. feature engineering + Y derivation
make train         # 4. RF training + time-based CV
make evaluate      # 5. ROC / PR / calibration / threshold-sweep figures
make run           # 6. uvicorn dev server on http://localhost:8000

# Then open frontend/index.html (or browse to http://localhost:8000/app/)
```

### Docker one-liner

```bash
docker compose up --build
# API lives on http://localhost:8000 · frontend on http://localhost:8000/app/
```

### Test it

```bash
make test   # 70 tests, ~12 s
make lint   # ruff, zero errors expected
```

### Training results on real ERA5 data

Trained on **175 315 hourly samples** from the Open-Meteo Historical Archive (ECMWF ERA5 reanalysis) covering five Malaysian mountain sites, 2020-01-01 → 2023-12-31. Time-based split: the last 20 % per site is held out (n = 35 063 test samples). See [`models/MODEL_CARD.md`](models/MODEL_CARD.md) for the full evaluation card and `figures/` for publication-ready plots.

| Metric | Value | Source / note |
|---|---|---|
| Test ROC AUC | **0.871** | `figures/01_roc_curve.png` |
| Test PR Average Precision | **0.750** | `figures/02_pr_curve.png` |
| Brier score (calibration) | **0.138** | `figures/03_calibration_curve.png` |
| Best F2 @ τ = 0.20 | **0.778** | `figures/04_threshold_sweep.png` |
| Recall (at chosen τ = 0.20) | **0.934** | Safety-critical recall |
| Class balance | 29.2 % positive | Malaysian mountain climatology |

We deliberately operate at **τ = 0.20**, not the default 0.50, because in safety-critical settings a missed rain event (false negative) on a windward slope is dramatically worse than a false positive. The F2 score treats recall as twice as important as precision (β = 2, which enters the F-beta formula as β² = 4), which makes it the principled metric for this regime.

**5-fold time-series CV** on the training fold gives AUC ranging 0.828-0.908 (mean ≈ 0.858), confirming the model is not over-fitting a single temporal slice.
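
The threshold choice described above can be sketched as a small sweep that scores each candidate τ by F2. This is a minimal pure-Python illustration, not the project's `make evaluate` code: the helper names `fbeta` and `sweep_thresholds` and the toy labels and probabilities are assumptions made up for the example.

```python
def fbeta(precision: float, recall: float, beta: float = 2.0) -> float:
    """F-beta score; with beta = 2, recall counts twice as much as precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def sweep_thresholds(y_true, y_prob, thresholds, beta=2.0):
    """Return (best_threshold, best_score) over the candidate thresholds."""
    best = (None, -1.0)
    for tau in thresholds:
        pairs = list(zip(y_true, y_prob))
        tp = sum(1 for y, p in pairs if p >= tau and y == 1)
        fp = sum(1 for y, p in pairs if p >= tau and y == 0)
        fn = sum(1 for y, p in pairs if p < tau and y == 1)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        score = fbeta(precision, recall, beta)
        if score > best[1]:
            best = (tau, score)
    return best


# Toy labels and predicted probabilities, invented for illustration only.
y_true = [1, 0, 1, 0, 0, 1, 0, 1]
y_prob = [0.90, 0.30, 0.25, 0.10, 0.55, 0.60, 0.05, 0.22]

best_tau, best_f2 = sweep_thresholds(y_true, y_prob, [0.2, 0.5, 0.8])
print(best_tau, round(best_f2, 3))  # prints: 0.2 0.909
```

On this toy data the lowest threshold wins because it recovers every positive at the cost of two false alarms, which is the trade-off F2 is designed to reward.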

#### Feature importance: what the model actually learned

| Rank | Feature | Importance | Interpretation |
|---|---|---|---|
| 1 | `precipitation_lag_1h` | 37.1 % | Rain autocorrelation: the well-documented "rain begets rain" persistence signal in short-term nowcasting (Wilson et al., 2010). |
| 2-3 | `hour_cos`, `hour_sin` | 18.6 % | Diurnal convective cycle: Malaysian mountain rainfall peaks in late afternoon. |
| 4 | `pressure_change_3h` | 4.7 % | Falling pressure precedes incoming storms, the classical synoptic-scale precursor. |
| 5-6 | `wind_v`, `dew_point_c` | 8.1 % | Moisture transport + saturation potential. |
| 7-14 | other meteorological X | 22 % | T, humidity, cloud cover, wind, dew-point depression, pressure. |
| 15-17 | `month_*`, `elevation_m` | 4 % | Low because the time-of-day and lag features already absorb most of the seasonal/static signal. |
| 18 | `cape_jkg` | **0.0 %** | ⚠️ ERA5 archive CAPE values for these coordinates are predominantly zero, a known coverage gap. The Veto-rule engine still uses CAPE thresholds directly from the live Open-Meteo forecast at inference time. |

#### Why F2 instead of accuracy?

Accuracy is misleading on imbalanced safety-critical classification. With 29.2 % positives, a model that predicts "no rain" 100 % of the time achieves **70.8 % accuracy** here while being completely useless. F2 weights recall twice as heavily as precision, which is correct for a hiker-safety app where missing a real rain event (false negative) is far worse than a false alarm (false positive).

See `models/training_report.json` for the full 5-fold CV report.

## 6. Project Structure

```
microclimate-x/
├── backend/
│   ├── main.py                 # FastAPI app + lifespan
│   ├── ml_engine.py            # Loads RF model, predict_proba
│   ├── rule_engine.py          # Veto rules + risk scoring + bilingual advice
│   ├── terrain.py              # DEM-based Valley/Slope/Flat classification
│   ├── cache.py                # SQLite WAL cache, risk-adaptive TTL
│   ├── schemas.py              # Pydantic request/response models
│   └── config.py               # Thresholds + academic citations
├── scripts/
│   ├── 1_download_dataset.py   # Open-Meteo + Open-Topo-Data (real ERA5)
│   ├── 1b_synth_dataset.py     # physically-plausible offline fallback
│   ├── 2_preprocess.py
│   └── 3_train_model.py
├── frontend/
│   └── index.html              # Single-file Vue3 SPA
├── docs/
│   ├── architecture.md
│   └── thresholds.md           # Veto thresholds with academic citations
├── tests/
│   └── test_rule_engine.py
├── data/                       # raw/processed CSVs (gitignored)
├── models/                     # trained .pkl artifacts (gitignored)
└── requirements.txt
```

## 7. Key Design Decisions

| Decision | Rationale |
|---|---|
| **Random Forest over SVM / Deep Learning** | Handles non-linear weather-terrain interactions; outputs interpretable feature importance; no GPU needed; robust on tabular data |
| **Binary classification (`is_rain_event`)** | One-hour-ahead nowcasting matches the use case (hikers' immediate decisions) |
| **Time-based train/test split** | A random split would leak temporal correlation → inflated metrics |
| **Class-weight balanced** | Rain is the minority class (~29 % of hourly samples at these sites) |
| **Wind direction as u/v components** | Raw degrees treat 0° and 360° as far apart even though they are the same direction |
| **Risk-adaptive cache TTL** | High-risk scenarios refresh faster (60 s) than safe ones (600 s) |
| **SQLite WAL mode** | Allows concurrent reads during writes, which is critical for FastAPI async |

## 8. Academic References

1. **Bhuiyan, M. A. E., et al.** (2020). *Improving satellite-based precipitation estimates over complex terrain using machine learning algorithms*. **Journal of Hydrology**.
2. **Maclean, I. M., et al.** (2018). *Microclima: An R package for modelling meso- and microclimate*. **Methods in Ecology and Evolution**.
3. **Garcez, A. d'A., & Lamb, L. C.** (2020). *Neurosymbolic AI: The 3rd Wave*. arXiv:2012.05876.
4. **Luks, A. M., et al.** (2019). *Wilderness Medical Society Practice Guidelines for the Prevention and Treatment of Acute Altitude Illness*.
5. **Vandal, T., et al.** (2017). *DeepSD: Generating high-resolution climate change projections through single image super-resolution*. **KDD**.

See `docs/thresholds.md` for the full citation table per Veto threshold.

## 9. Roadmap

- [x] Frontend dashboard with XAI inference log
- [x] SQLite caching with WAL + risk-adaptive TTL
- [x] Terrain detection engine (Valley / Slope / Flat)
- [x] Rule-based Veto + 0-100 scoring engine (19/19 unit tests passing)
- [x] Bilingual (EN/ZH) advice generation
- [x] Dataset download script (Open-Meteo + Open Topo Data) + offline synthetic fallback
- [x] Preprocessing pipeline (feature engineering + label `is_rain_event`)
- [x] Random Forest training with time-based CV: **trained on real ERA5 data, test AUC = 0.871**
- [ ] Model comparison (RFC vs LogReg vs XGBoost), thesis Chapter 5
- [ ] Hindcast validation against real Malaysian flood events
- [ ] PWA offline mode for low-network mountain use

## 10. License

MIT, see `LICENSE`.

---

*Developed by L.ZH @ Universiti Kebangsaan Malaysia (UKM) for the Final Year Project (FYP).*