File size: 7,169 Bytes
4eefabb
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
# Model Card β€” MicroClimate-X Rain Predictor (Random Forest v1.0)

> Following the *Model Card* methodology of Mitchell et al. (2019).
> Authored: 2026-05-11 Β· UKM Final Year Project Β· KyoukoLi

---

## 1. Model Details

| Field | Value |
|---|---|
| **Model name** | MicroClimate-X RF Rain Predictor |
| **Version** | 1.0.0 |
| **Architecture** | `sklearn.ensemble.RandomForestClassifier` |
| **Hyper-parameters** | `n_estimators=200, max_depth=None, class_weight='balanced', n_jobs=-1, random_state=42` |
| **Features (n=18)** | `elevation_m`, `temperature_c`, `humidity_pct`, `wind_speed_kmh`, `wind_direction_deg`, `pressure_hpa`, `dew_point_c`, `cloud_cover_pct`, `cape_jkg`, `visibility_m`, `wind_u`, `wind_v`, `hour_sin`, `hour_cos`, `month_sin`, `month_cos`, `dew_point_depression`, `pressure_change_3h`, `precipitation_lag_1h` |
| **Target** | `is_rain_event` ∈ {0, 1} β€” defined as `precipitation(t+1h) > 0.1 mm` |
| **Output** | `predict_proba(...)[:, 1]` β€” calibrated probability of rain in the next hour |
| **Author / Contact** | Li Zhenyue (`KyoukoLi`), Faculty of Information Science & Technology, UKM |
| **Licence** | MIT (see `LICENSE`) |

---

## 2. Intended Use

* **Primary use case**: terrain-aware rain-risk decision support inside the MicroClimate-X *hybrid* pipeline. The RF probability is one input among many β€” the topographic Rule Engine has *final authority* (Veto cascade + R1-R4 decision table).
* **Intended users**: hikers, drivers, construction crews, and other outdoor decision makers in complex terrain (initially Malaysian mountain regions).
* **Out-of-scope uses**:
  * Lightning forecasting (CAPE β†’ thunderstorm risk is handled by the rule engine sub-scorer, not by this model).
  * Multi-hour quantitative precipitation forecasting.
  * Aviation, marine, or any life-critical use without the Rule Engine veto layer in the loop.

---

## 3. Training Data

| Field | Value |
|---|---|
| **Source** | ECMWF ERA5 Reanalysis (via Open-Meteo Historical Archive API) |
| **Spatial coverage** | 5 mountain sites in West Malaysia (Genting, Cameron, Brinchang, Korbu, Kinabalu) |
| **Temporal coverage** | 2019-01-01 β†’ 2024-12-31 (5 years, hourly) |
| **Total rows** | 175 315 |
| **Class balance** | 29.2 % positive (rain-event), 70.8 % negative |
| **Train / test split** | Time-based; 80 % oldest β†’ train; 20 % newest β†’ test. **No random shuffling** β€” would leak temporal autocorrelation. |
| **Synthetic fallback** | `scripts/1b_synth_dataset.py` generates a physically-plausible synthetic replacement when the Open-Meteo API is unreachable. The synthetic data set has the same schema and is sufficient for end-to-end pipeline verification but should **not** be used to ship a production model. |

---

## 4. Evaluation β€” Held-out 20 % temporal test set (n = 35 063)

Numbers below come from `figures/evaluation_summary.json`, reproducible via `make evaluate`.

### 4.1 Discrimination

| Metric | Value |
|---|---|
| ROC AUC | **0.871** |
| PR Average Precision | **0.750** |
| Test-set base rate | 0.292 |

### 4.2 Calibration

| Metric | Value |
|---|---|
| Brier score | **0.138** (lower is better; 0 is perfect, 0.25 is random) |

The reliability diagram (`figures/03_calibration_curve.png`) shows the predicted probability tracks the empirical frequency closely; no post-hoc calibration (Platt / isotonic) was deemed necessary.

### 4.3 Operating point β€” safety-critical threshold

| Threshold Ο„ | F1 | F2 | Precision | Recall |
|---|---|---|---|---|
| 0.50 (default) | 0.696 | 0.694 | 0.700 | 0.692 |
| **0.20 (chosen)** | 0.621 | **0.778** | 0.466 | **0.934** |

We adopt **Ο„ = 0.20** because the application is **safety-critical**: a missed rain event (false negative) on a windward slope can cascade into orographic flash flooding. F2 weights recall 4Γ— higher than precision and is the appropriate metric for this regime (Sasaki, 2007).

### 4.4 Confusion matrix at Ο„ = 0.20

|              | Pred = 0 | Pred = 1 |
|---|---|---|
| **True = 0** | 13 877 (TN) | 10 950 (FP) |
| **True = 1** | 679 (FN)    | 9 557 (TP) |

Recall = 9 557 / (9 557 + 679) = **93.4 %** β€” the operationally important metric for "do not let people walk into a storm".

### 4.5 Top feature importances

1. `precipitation_lag_1h` β€” recent rain is by far the strongest signal (rain begets rain).
2. `hour_cos` / `hour_sin` β€” diurnal cycle (afternoon convective storms in tropical climates).
3. `pressure_change_3h` β€” falling pressure is a classical storm precursor.
4. `wind_v` β€” meridional wind component, relevant for monsoon-driven precipitation.
5. `dew_point_c` / `dew_point_depression` / `temperature_c` β€” moisture saturation indicators.

---

## 5. Quantitative Limitations

* **Geographic generalisation** β€” the model has only seen West Malaysian mountains. Hindcast validation in other tropical mountainous regions is a planned thesis Chapter 5 contribution; until then, the Rule Engine Veto cascade is the only safety net for out-of-distribution coordinates (e.g. Himalayas).
* **Convective forecasting** β€” the model uses *current-hour* features to predict *next-hour* rain. Forecasting horizon > 1 h would degrade accuracy substantially.
* **Class imbalance** β€” addressed via `class_weight='balanced'` and the F2-optimal threshold, but precision at Ο„ = 0.20 is moderate (47 %). False positives are tolerable because they only inflate the *rainfall sub-score*; the composite-score formula combines this with three other hazards.
* **Calibration drift** β€” Brier = 0.138 in 2024 hold-out. Calibration should be re-checked annually as climate signals shift.

---

## 6. Ethical / Safety Considerations

* **Decision-support only.** The system is explicitly **not** a substitute for official meteorological forecasts; the disclaimer is shown in every UI footer.
* **Hidden risk surfaced, not hidden.** The R1 decision-table rule deliberately raises an alarm when *macro* model probability is low but local terrain inputs suggest hidden orographic rain β€” this is the OPPOSITE of the harmful failure mode where ML over-confidently says "safe".
* **Mt-Everest test (worst-case OOD).** When fed coordinates the model has never seen, the RF returns ~0 % rain probability β€” and the Rule Engine then immediately vetoes on `altitude_hypoxia + extreme_cold + gale_wind`. See `tests/test_rule_engine.py::test_mt_everest_veto_hypoxia`.

---

## 7. Reproducibility

```bash
# Full pipeline from scratch β€” works offline via the synthetic dataset.
make install-dev
make synth          # OR: download real data via scripts/1_download_dataset.py
make preprocess
make train
make evaluate       # writes figures/*.png + figures/evaluation_summary.json
```

The seed is fixed (`random_state=42`) and figures are written to `figures/` so the thesis can pull them in directly.

---

## 8. Citation

If you reference this model in academic work, please cite:

> Li Zhenyue (KyoukoLi). *MicroClimate-X: A Hybrid Microclimate Risk Engine for Complex Terrain*. Bachelor's Thesis, Universiti Kebangsaan Malaysia, Faculty of Information Science & Technology, 2026. GitHub: <https://github.com/KyoukoLi/microclimate-x>