# Project pipeline order — "App is the last"
# 项目流程顺序 —— "App 放在最后"

> Direct response to supervisor feedback 4/15: "First identify a dataset.
> And then train the model. And then predict it. Once everything is
> finished, you can develop the app. App is the last."
>
> 4/15 导师反馈直接回应：先 dataset，再 model，再 predict，最后才是 app。

---

## Current state (May 2026) / 当前状态（2026 年 5 月）

```
┌──────────────────────────────────────────────────────────────────────┐
│ STEP 1 — DATASET                                            ✅ DONE  │
│ ────────────────────────────────────────────                         │
│ Source       : Open-Meteo Historical Archive (ECMWF ERA5)            │
│ Coverage     : 5 Malaysian mountain sites, 5 years hourly            │
│ Rows         : 175 315                                               │
│ Target Y     : is_rain_event ∈ {0, 1}  (next-hour rain > 0.1 mm)     │
│ Code         : scripts/{1_download, 1b_synth, 2_preprocess}.py        │
│ Documentation: docs/dataset.md                                       │
└──────────────────────────────────────────────────────────────────────┘
                                  │
                                  ▼
┌──────────────────────────────────────────────────────────────────────┐
│ STEP 2 — MODEL TRAINING                                     ✅ DONE  │
│ ────────────────────────────────────────────                         │
│ Algorithm    : Random Forest, class_weight='balanced'                │
│ Split        : Time-based, last 20% chronological holdout            │
│ CV           : 5-fold TimeSeriesSplit on training portion            │
│ Test results : ROC AUC 0.871 · PR AP 0.750 · Brier 0.138             │
│ Operating pt : τ = 0.20  →  F2 = 0.778, Recall = 0.934               │
│ Code         : scripts/3_train_model.py                              │
│ Documentation: models/MODEL_CARD.md                                  │
└──────────────────────────────────────────────────────────────────────┘
                                  │
                                  ▼
┌──────────────────────────────────────────────────────────────────────┐
│ STEP 3 — MODEL EVALUATION                                   ✅ DONE  │
│ ────────────────────────────────────────────                         │
│ Figures      : 6 publication-quality PNGs in figures/                │
│   01_roc_curve.png         · ROC + AUC                               │
│   02_pr_curve.png          · Precision-Recall + AP                   │
│   03_calibration_curve.png · Reliability + Brier                     │
│   04_threshold_sweep.png   · F1/F2/Precision/Recall vs threshold     │
│   05_feature_importance.png· Top-20 features                         │
│   06_confusion_matrix.png  · CM at F2-optimal threshold              │
│ Summary      : figures/evaluation_summary.json                       │
│ Code         : scripts/4_evaluate_model.py                           │
└──────────────────────────────────────────────────────────────────────┘
                                  │
                                  ▼
┌──────────────────────────────────────────────────────────────────────┐
│ STEP 4 — RULE ENGINE (D5 proposal §3.7 P4.1-P4.6)           ✅ DONE  │
│ ────────────────────────────────────────────                         │
│ P4.1 Load dynamic risk rules  → backend/config.py                    │
│ P4.2 Fetch user context        → ?activity= query parameter          │
│ P4.3 Evaluate environmental    → 4 score_*_risk() functions          │
│         risks (rainfall, fog, wind gust, thunderstorm)               │
│ §3.7.2  Decision table R1-R4   → apply_decision_table_3_7_2()        │
│ Veto cascade                   → _collect_veto_triggers()            │
│ P4.4 Activity weighting        → apply_activity_weighting()          │
│ P4.5 Composite risk score      → dominant-hazard + secondary         │
│ P4.6 Actionable advice         → _normal_advice / _veto_advice       │
│ Code         : backend/rule_engine.py                                │
│ Documentation: docs/architecture.md, docs/thresholds.md              │
└──────────────────────────────────────────────────────────────────────┘
                                  │
                                  ▼
┌──────────────────────────────────────────────────────────────────────┐
│ STEP 5 — APP (LAST, as instructed)                          ✅ DONE  │
│ ────────────────────────────────────────────                         │
│ Backend     : FastAPI + uvicorn — wraps trained model from Step 2    │
│                + rule engine from Step 4                             │
│ Frontend    : Vue 3 SPA — bilingual EN/ZH, 4 mini-gauges,            │
│                R1-R4 indicators, demo scenarios, error toasts        │
│ Container   : Multi-stage Dockerfile + docker-compose.yml            │
│ Tests       : 70 tests, 97% backend coverage                         │
│ CI          : .github/workflows/ci.yml                               │
└──────────────────────────────────────────────────────────────────────┘
                                  │
                                  ▼
┌──────────────────────────────────────────────────────────────────────┐
│ STEP 6 — EVALUATION FOR THESIS CHAPTER 5                    🔄 PLAN  │
│ ────────────────────────────────────────────                         │
│ 6a · Hindcast validation against NaDMA flood / landslide archives    │
│ 6b · Small user study with mountain hikers (1-month panel)           │
│ 6c · Comparative ablation: RF only vs Rule only vs Hybrid            │
│ 6d · Threshold sensitivity analysis (τ ∈ {0.10, 0.15, 0.20, 0.25})   │
└──────────────────────────────────────────────────────────────────────┘
```

## Reading order for the supervisor / 给导师过的阅读顺序

When walking the supervisor through the project, **strictly follow Steps 1 → 5**:

| # | Open this | Spend |
|---|---|---|
| 1 | `docs/dataset.md` §4 schema, §5 Y derivation | 60 s |
| 2 | `figures/01_roc_curve.png` + `figures/03_calibration_curve.png` | 30 s |
| 3 | `figures/04_threshold_sweep.png` + `figures/05_feature_importance.png` | 60 s |
| 4 | `docs/architecture.md` §"Engine B internals" — show P4.1→P4.6 mapping | 60 s |
| 5 | `frontend/index.html` running locally — demo with the Genting & Everest scenarios | 60-90 s |

Total ≈ 5 minutes before any Q&A. App is opened **last** as agreed.

按这个顺序给导师过，**严格按 1→5**，整体大概 5 分钟过完再进入 Q&A。**app 一定放最后开**，跟导师上次说的完全一致。