microclimate-x / models /MODEL_CARD.md
W1nd5pac's picture
Deploy 2026-05-20T06:52:08Z β€” 11e81c5
4eefabb verified
# Model Card β€” MicroClimate-X Rain Predictor (Random Forest v1.0)
> Following the *Model Card* methodology of Mitchell et al. (2019).
> Authored: 2026-05-11 Β· UKM Final Year Project Β· KyoukoLi
---
## 1. Model Details
| Field | Value |
|---|---|
| **Model name** | MicroClimate-X RF Rain Predictor |
| **Version** | 1.0.0 |
| **Architecture** | `sklearn.ensemble.RandomForestClassifier` |
| **Hyper-parameters** | `n_estimators=200, max_depth=None, class_weight='balanced', n_jobs=-1, random_state=42` |
| **Features (n=18)** | `elevation_m`, `temperature_c`, `humidity_pct`, `wind_speed_kmh`, `wind_direction_deg`, `pressure_hpa`, `dew_point_c`, `cloud_cover_pct`, `cape_jkg`, `visibility_m`, `wind_u`, `wind_v`, `hour_sin`, `hour_cos`, `month_sin`, `month_cos`, `dew_point_depression`, `pressure_change_3h`, `precipitation_lag_1h` |
| **Target** | `is_rain_event` ∈ {0, 1} β€” defined as `precipitation(t+1h) > 0.1 mm` |
| **Output** | `predict_proba(...)[:, 1]` β€” calibrated probability of rain in the next hour |
| **Author / Contact** | Li Zhenyue (`KyoukoLi`), Faculty of Information Science & Technology, UKM |
| **Licence** | MIT (see `LICENSE`) |
---
## 2. Intended Use
* **Primary use case**: terrain-aware rain-risk decision support inside the MicroClimate-X *hybrid* pipeline. The RF probability is one input among many β€” the topographic Rule Engine has *final authority* (Veto cascade + R1-R4 decision table).
* **Intended users**: hikers, drivers, construction crews, and other outdoor decision makers in complex terrain (initially Malaysian mountain regions).
* **Out-of-scope uses**:
* Lightning forecasting (CAPE β†’ thunderstorm risk is handled by the rule engine sub-scorer, not by this model).
* Multi-hour quantitative precipitation forecasting.
* Aviation, marine, or any life-critical use without the Rule Engine veto layer in the loop.
---
## 3. Training Data
| Field | Value |
|---|---|
| **Source** | ECMWF ERA5 Reanalysis (via Open-Meteo Historical Archive API) |
| **Spatial coverage** | 5 mountain sites in West Malaysia (Genting, Cameron, Brinchang, Korbu, Kinabalu) |
| **Temporal coverage** | 2019-01-01 β†’ 2024-12-31 (5 years, hourly) |
| **Total rows** | 175 315 |
| **Class balance** | 29.2 % positive (rain-event), 70.8 % negative |
| **Train / test split** | Time-based; 80 % oldest β†’ train; 20 % newest β†’ test. **No random shuffling** β€” would leak temporal autocorrelation. |
| **Synthetic fallback** | `scripts/1b_synth_dataset.py` generates a physically-plausible synthetic replacement when the Open-Meteo API is unreachable. The synthetic data set has the same schema and is sufficient for end-to-end pipeline verification but should **not** be used to ship a production model. |
---
## 4. Evaluation β€” Held-out 20 % temporal test set (n = 35 063)
Numbers below come from `figures/evaluation_summary.json`, reproducible via `make evaluate`.
### 4.1 Discrimination
| Metric | Value |
|---|---|
| ROC AUC | **0.871** |
| PR Average Precision | **0.750** |
| Test-set base rate | 0.292 |
### 4.2 Calibration
| Metric | Value |
|---|---|
| Brier score | **0.138** (lower is better; 0 is perfect, 0.25 is random) |
The reliability diagram (`figures/03_calibration_curve.png`) shows the predicted probability tracks the empirical frequency closely; no post-hoc calibration (Platt / isotonic) was deemed necessary.
### 4.3 Operating point β€” safety-critical threshold
| Threshold Ο„ | F1 | F2 | Precision | Recall |
|---|---|---|---|---|
| 0.50 (default) | 0.696 | 0.694 | 0.700 | 0.692 |
| **0.20 (chosen)** | 0.621 | **0.778** | 0.466 | **0.934** |
We adopt **Ο„ = 0.20** because the application is **safety-critical**: a missed rain event (false negative) on a windward slope can cascade into orographic flash flooding. F2 weights recall 4Γ— higher than precision and is the appropriate metric for this regime (Sasaki, 2007).
### 4.4 Confusion matrix at Ο„ = 0.20
| | Pred = 0 | Pred = 1 |
|---|---|---|
| **True = 0** | 13 877 (TN) | 10 950 (FP) |
| **True = 1** | 679 (FN) | 9 557 (TP) |
Recall = 9 557 / (9 557 + 679) = **93.4 %** β€” the operationally important metric for "do not let people walk into a storm".
### 4.5 Top feature importances
1. `precipitation_lag_1h` β€” recent rain is by far the strongest signal (rain begets rain).
2. `hour_cos` / `hour_sin` β€” diurnal cycle (afternoon convective storms in tropical climates).
3. `pressure_change_3h` β€” falling pressure is a classical storm precursor.
4. `wind_v` β€” meridional wind component, relevant for monsoon-driven precipitation.
5. `dew_point_c` / `dew_point_depression` / `temperature_c` β€” moisture saturation indicators.
---
## 5. Quantitative Limitations
* **Geographic generalisation** β€” the model has only seen West Malaysian mountains. Hindcast validation in other tropical mountainous regions is a planned thesis Chapter 5 contribution; until then, the Rule Engine Veto cascade is the only safety net for out-of-distribution coordinates (e.g. Himalayas).
* **Convective forecasting** β€” the model uses *current-hour* features to predict *next-hour* rain. Forecasting horizon > 1 h would degrade accuracy substantially.
* **Class imbalance** β€” addressed via `class_weight='balanced'` and the F2-optimal threshold, but precision at Ο„ = 0.20 is moderate (47 %). False positives are tolerable because they only inflate the *rainfall sub-score*; the composite-score formula combines this with three other hazards.
* **Calibration drift** β€” Brier = 0.138 in 2024 hold-out. Calibration should be re-checked annually as climate signals shift.
---
## 6. Ethical / Safety Considerations
* **Decision-support only.** The system is explicitly **not** a substitute for official meteorological forecasts; the disclaimer is shown in every UI footer.
* **Hidden risk surfaced, not hidden.** The R1 decision-table rule deliberately raises an alarm when *macro* model probability is low but local terrain inputs suggest hidden orographic rain β€” this is the OPPOSITE of the harmful failure mode where ML over-confidently says "safe".
* **Mt-Everest test (worst-case OOD).** When fed coordinates the model has never seen, the RF returns ~0 % rain probability β€” and the Rule Engine then immediately vetoes on `altitude_hypoxia + extreme_cold + gale_wind`. See `tests/test_rule_engine.py::test_mt_everest_veto_hypoxia`.
---
## 7. Reproducibility
```bash
# Full pipeline from scratch β€” works offline via the synthetic dataset.
make install-dev
make synth # OR: download real data via scripts/1_download_dataset.py
make preprocess
make train
make evaluate # writes figures/*.png + figures/evaluation_summary.json
```
The seed is fixed (`random_state=42`) and figures are written to `figures/` so the thesis can pull them in directly.
---
## 8. Citation
If you reference this model in academic work, please cite:
> Li Zhenyue (KyoukoLi). *MicroClimate-X: A Hybrid Microclimate Risk Engine for Complex Terrain*. Bachelor's Thesis, Universiti Kebangsaan Malaysia, Faculty of Information Science & Technology, 2026. GitHub: <https://github.com/KyoukoLi/microclimate-x>