Supervisor Meeting Cheat Sheet 导师开会一页通 — MicroClimate-X 答辩准备

📅 2026-05-11 🎓 UKM FYP 🏛️ KyoukoLi/microclimate-x ✅ CI passing · 97% coverage · 70 tests

How to use this cheat sheet · 怎么用这份小抄

Keep this open on screen during the meeting. Don't read it aloud; glance at the relevant section when needed. Every key sentence is given in both English and Chinese, so you can answer in whichever language the supervisor is using at that moment.

开会时打开在屏幕上做兜底。不要照念——需要时扫一眼对应小节。所有关键句子都给了中英对照，老师用什么语言你就用什么语言。

0 · Before the meeting (10 min before) 会前 10 分钟准备

Run these in a terminal, in order. Do not skip any.
在终端按顺序执行，一条都不能少：

cd ~/Projects/microclimate-x

# 1. Pull latest + verify clean working tree
git pull && git status        # should print "working tree clean"

# 2. Start the backend (leave running)
make run                      # uvicorn boots on http://localhost:8000

# 3. In a NEW terminal: verify API is alive + model is loaded
curl -s http://localhost:8000/api/health | python3 -m json.tool
# expect: "status": "ok", "ml_loaded": true
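If you want a scripted yes/no instead of reading the JSON by eye, here is a minimal standard-library sketch. It assumes only what the comment above says: the health endpoint returns a JSON object with `status` and `ml_loaded` fields. The function names are hypothetical and are not part of the repo.

```python
import json
import urllib.request


def health_ok(payload) -> bool:
    """True only if the API reports itself up AND the trained model is loaded."""
    return (
        isinstance(payload, dict)
        and payload.get("status") == "ok"
        and payload.get("ml_loaded") is True
    )


def check_backend(url: str = "http://localhost:8000/api/health", timeout: float = 3.0) -> bool:
    """Fetch the health endpoint; any network or JSON error counts as 'not ready'."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return health_ok(json.load(resp))
    except (OSError, ValueError):
        # URLError, refused connections and timeouts are OSError subclasses;
        # malformed JSON raises json.JSONDecodeError, a ValueError subclass.
        return False


if __name__ == "__main__":
    print("READY" if check_backend() else "NOT READY: check the `make run` terminal")
```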

Browser tabs — open in this exact order / 浏览器按顺序开标签页

#	URL	Purpose
1	`file:///…/docs/MEETING_CHEAT_SHEET.html`	This cheat sheet (safety net)
2	`github.com/KyoukoLi/microclimate-x`	Green CI badge
3	`docs/dataset.md`	For Concerns #1, #2 + #3
4	`figures/01_roc_curve.png`	Concern #4 — ML metrics
5	`figures/03_calibration_curve.png`	Calibration
6	`figures/04_threshold_sweep.png`	F2 threshold
7	`figures/05_feature_importance.png`	What model learned
8	`docs/architecture.md`	Rule engine deep-dive
9	`http://localhost:8000/app/`	THE APP: SHOW IT LAST
10	`models/MODEL_CARD.md`	Q&A backup

🚨 Tab 9 must be the LAST tab you switch to. If you accidentally show the app first, the supervisor will instantly remember last meeting's complaint ("app is last") and you lose credibility before you've said a word.
🚨 标签 9（app）一定要最后才切过去展示。不小心先展示 app，老师会立刻想起上次 "app is last" 的批评，还没开口就掉分。

1 · Opening (30 seconds) 开场 30 秒

"Sir, since our last meeting I have addressed every point of your feedback. May I walk you through them in the correct order — dataset first, then model, then app — as you instructed?"

"老师，按您上次反馈，我已经把每一条都改了。我按您要求的顺序——先 dataset，再 model，最后才是 app——给您过一遍可以吗？"

Why this works · 为什么有效: it directly quotes his words back to him. Watch him relax immediately.
直接复述了他自己的话——看着他立刻放松。

2 · Concern #1 — "Y is missing" 反馈一 · Y 列缺失

"Y is missing. I don't have the output variable. If you don't have target, you cannot train a machine learning model."

On screen → Tab 3 (`docs/dataset.md`) → §5 Target label derivation

df['is_rain_event'] = (df['precipitation'].shift(-1) > 0.1).astype(int)

"Sir, you were right — the raw Open-Meteo CSV has no Y column. I have engineered the target explicitly. The variable is `is_rain_event`: 1 if precipitation in the next hour exceeds 0.1 mm, else 0."	"老师您说得对，原始 CSV 没有 Y 列。我现在显式构造了目标变量 `is_rain_event`——下一小时降雨量 > 0.1 mm 则为 1，否则为 0。"
"Three things: (1) `.shift(-1)` uses future rain as label — features at hour t predict outcome at t+1h, so no temporal data leakage."	"三个要点：(1) `.shift(-1)` 表示用下一小时的降雨作标签，特征是 t 时刻、预测 t+1 小时——无时间泄漏。"
(2) "0.1 mm is not an arbitrary choice: it is the WMO trace limit, so trace amounts below it are never labelled as rain."	(2) "0.1 mm 这个阈值不是我随便定的：它是 WMO 的微量降水界限，低于它的微量降水不会被标成降雨。"
(3) "It is binary classification, not regression, because the downstream user decision is binary — go or no-go."	(3) "是二分类不是回归，因为下游用户决策本身就是二元的——去 / 不去。"
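Two questions he might raise about the one-liner above. First, if the CSV stacks all five sites in one table, a bare `.shift(-1)` pairs the last hour of one site with the first hour of the next. Second, the final row's NaN quietly becomes label 0. Below is a minimal sketch of a per-site version. It assumes hypothetical `site` and `time` columns and is not necessarily what `docs/dataset.md` does.

```python
import pandas as pd


def add_rain_label(df: pd.DataFrame, threshold_mm: float = 0.1) -> pd.DataFrame:
    """is_rain_event = 1 if the NEXT hour at the SAME site has precipitation > threshold_mm.

    Assumes hypothetical 'site' and 'time' columns alongside 'precipitation'.
    """
    out = df.sort_values(["site", "time"]).copy()
    # Shift inside each site, so one site's last hour never borrows the next site's first hour.
    out["next_hour_precip"] = out.groupby("site")["precipitation"].shift(-1)
    # The final hour of each site has no future value: drop it rather than silently label it 0.
    out = out.dropna(subset=["next_hour_precip"]).copy()
    out["is_rain_event"] = (out["next_hour_precip"] > threshold_mm).astype(int)
    return out.drop(columns="next_hour_precip")


toy = pd.DataFrame({
    "site": ["A", "A", "A", "B", "B"],
    "time": pd.to_datetime(["2024-01-01 00:00", "2024-01-01 01:00", "2024-01-01 02:00",
                            "2024-01-01 00:00", "2024-01-01 01:00"]),
    "precipitation": [0.0, 0.5, 0.0, 2.0, 0.0],
})
labelled = add_rain_label(toy)
print(labelled["is_rain_event"].tolist())  # [1, 0, 0]
```

On the same toy table, the bare one-liner labels site A's last hour as 1, because it reads site B's 2.0 mm.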

3 · Concern #2 — "Features don't match Excel" 反馈二 · 文档特征和 CSV 列名对不上

"The features that you presented here, not... not mentioned in the Excel. So, it must be matched."

On screen → stay on Tab 3 → scroll up to §4 Schema

"Sir, that was also fair. I have rewritten the dataset specification so the documentation lists exactly the same column names as the CSV. One-to-one mapping in §4."	"老师，这条您也说得对。我已经重写了数据集文档——文档列出的就是 CSV 里的真实列名，一一对应，就在第 4 节。"
"Every row is one CSV column. The 'role' column says whether it is a feature (X), the target (Y), or metadata."	"表里每一行就是 CSV 一列，role 列写明它是 feature（X）、target（Y）还是 metadata。"
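A check like the one below could run in CI to stop §4 and the CSV from drifting apart again. It is a sketch, not something the repo is known to contain. It assumes you can extract the documented column names from `docs/dataset.md` into a list; parsing the markdown table is left out. Apart from `precipitation` and `is_rain_event`, the example column names are made up.

```python
import csv


def read_csv_header(path: str) -> list[str]:
    """Return the column names from the first row of a CSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        return next(csv.reader(f))


def schema_diff(csv_columns: list[str], documented_columns: list[str]) -> dict[str, list[str]]:
    """Compare CSV columns with the documented schema; both lists empty means one-to-one."""
    in_csv, in_docs = set(csv_columns), set(documented_columns)
    return {
        "undocumented": sorted(in_csv - in_docs),      # in the CSV, missing from the docs
        "missing_from_csv": sorted(in_docs - in_csv),  # in the docs, not in the CSV
    }


diff = schema_diff(["time", "precipitation", "is_rain_event"], ["time", "precipitation"])
print(diff)  # {'undocumented': ['is_rain_event'], 'missing_from_csv': []}
```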

4 · Concern #3 — "Study the data source" 反馈三 · 研究数据源本身

"Please study the link. What is the purpose of the dataset? What is design for? What is the output variable?"

On screen → stay on Tab 3 → scroll up to §1-3

"I read Open-Meteo's documentation carefully. The dataset is the ERA5 reanalysis archive — ECMWF's gold-standard hourly reanalysis."	"我把 Open-Meteo 文档仔细读了。我用的是 ERA5 再分析数据，ECMWF 出的金标准同化产品。"
"It is not a forecast — it is a physically-consistent reconstruction of past weather. ECMWF themselves use ERA5 to validate other forecast models. That makes it the right dataset for ML training: reliable ground-truth labels."	"它不是预报，是对过去天气的物理一致重建。ECMWF 自己拿 ERA5 去校验别的预报模型——所以训练 ML 是合适的，标签是可靠的 ground truth。"
Spatial: 5 Malaysian mountain sites — Genting, Cameron, Fraser's Hill, Klang Valley, Kinabalu — elevations 100 m to 1865 m, terrain from valley to slope.	空间：5 个马来西亚山地点位——云顶、金马仑、福隆港、巴生谷、神山——海拔 100 m – 1865 m，地形从山谷到山坡。
Temporal: 5 years, hourly, 175 315 rows total.	时间：5 年，每小时一行，总共 175 315 行。
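The supervisor may do this arithmetic in his head, so make sure the headline numbers agree before you quote them. The sketch uses only figures from this sheet, plus the assumption that the five sites contribute equal row counts.

```python
HOURS_PER_YEAR = 365.25 * 24  # 8766 hours, averaging in leap years

total_rows = 175_315
n_sites = 5

rows_per_site = total_rows / n_sites           # assumes equal-sized sites
years_per_site = rows_per_site / HOURS_PER_YEAR
test_rows = round(total_rows * 0.20)           # size of a 20 % chronological hold-out

print(f"{rows_per_site:.0f} rows/site ≈ {years_per_site:.2f} years; hold-out = {test_rows} rows")
# 35063 rows/site ≈ 4.00 years; hold-out = 35063 rows
# A full 5 years at 5 sites would be about 5 * 8766 * 5 ≈ 219 150 rows.
```

With equal-sized sites this works out to about 4 years per site, not 5. The 35 063-row hold-out is exactly 20 % of the total. Confirm the year range in `docs/dataset.md` so you can answer either way.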

5 · Concern #4 — "App is the last" 反馈四 · App 最后做（最重要！）

"First identify a dataset. And then train the model. And then predict it. Once everything is finished, you can develop the app. App is the last."

🚨 This is the most important section. Pace yourself — 2-3 min total. Don't open the app until the end.
🚨 这是最重要的一节。控制节奏，总共 2-3 分钟。不要提前打开 app。

→ Tab 4 (figures/01_roc_curve.png)

"Step 2, model training. Test ROC AUC is 0.871 on 35 063 held-out hourly samples. The hold-out is the last 20 % chronologically, not a random sample: random splits leak temporal autocorrelation (neighbouring hours land in both train and test) and can noticeably inflate the score."

"第二步，模型训练。测试集 35 063 行，ROC AUC = 0.871。划分用按时间排序的最后 20%，不是随机抽样：随机划分会泄漏时间自相关（相邻小时同时落在训练集和测试集），可能明显高估分数。"
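Below is a minimal sketch of a chronological 80/20 hold-out. It assumes a hypothetical `time` column. It cuts on a timestamp rather than a raw row index, so rows from different sites that share the same hour stay on the same side of the split. It illustrates the idea and is not the repo's training code.

```python
import pandas as pd


def chronological_split(df: pd.DataFrame, time_col: str = "time", test_frac: float = 0.2):
    """Return (train, test) where every test timestamp is later than every train timestamp."""
    if not 0 < test_frac < 1:
        raise ValueError("test_frac must be strictly between 0 and 1")
    ordered = df.sort_values(time_col, kind="stable").reset_index(drop=True)
    cut = int(round(len(ordered) * (1 - test_frac)))
    cutoff = ordered[time_col].iloc[cut]  # first timestamp that belongs to the test set
    train = ordered[ordered[time_col] < cutoff]
    test = ordered[ordered[time_col] >= cutoff]
    return train, test


# Two sites x 5 hours: the last hour (for both sites) becomes the 20 % hold-out.
hours = pd.Timestamp("2024-01-01") + pd.to_timedelta(list(range(5)), unit="h")
toy_split = pd.DataFrame({"site": ["A"] * 5 + ["B"] * 5, "time": list(hours) * 2})
train, test = chronological_split(toy_split)
```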

→ Tab 5 (figures/03_calibration_curve.png)

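"Brier score 0.138 (mean squared error of the predicted probabilities; lower is better). The calibration curve is the evidence for calibration: when the model says 70 %, the observed rain rate is close to 70 %. So no post-hoc recalibration (Platt scaling or isotonic regression) is needed."

"Brier 分数 = 0.138（预测概率的均方误差，越低越好）。校准看的是校准曲线：模型说 70% 时，实际降雨频率接近 70%。所以不需要事后校准（Platt scaling 或 isotonic regression）。"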
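If he asks how to read 0.138, here is a NumPy sketch of both quantities. The Brier score is the mean squared error of the predicted probabilities, and it blends calibration with discrimination. The reliability table is the number-level version of a calibration curve: a calibrated model has an observed event rate close to the mean predicted probability in every bin. The no-skill reference is the score of always predicting the base rate r, which is r(1 − r). The 10 equal-width bins are an assumption and may not match how `figures/03_calibration_curve.png` was made.

```python
import numpy as np


def brier_score(y_true, p_pred) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(p_pred, dtype=float)
    return float(np.mean((p - y) ** 2))


def no_skill_brier(y_true) -> float:
    """Brier score of always predicting the base rate r, which equals r * (1 - r)."""
    r = float(np.mean(np.asarray(y_true, dtype=float)))
    return r * (1.0 - r)


def reliability_table(y_true, p_pred, n_bins: int = 10):
    """Per bin: (mean predicted prob, observed event rate, count). Empty bins are skipped."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(p_pred, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_idx = np.digitize(p, edges[1:-1])  # interior edges give indices 0 .. n_bins-1
    rows = []
    for b in range(n_bins):
        mask = bin_idx == b
        if mask.any():
            rows.append((float(p[mask].mean()), float(y[mask].mean()), int(mask.sum())))
    return rows
```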

→ Tab 6 (figures/04_threshold_sweep.png)

"I optimised for F2 score, not F1. This is safety-critical: a missed rain event on a windward slope can cause flash flooding, so false negatives are far worse than false positives. F2 (β = 2) treats recall as twice as important as precision. Optimal τ = 0.20, F2 = 0.778, recall 93.4 %."

"我用 F2 分数而不是 F1：这是安全关键场景，迎风坡上漏报一次降雨可能导致山洪，漏报比误报严重得多。F2（β = 2）把召回的重要性设为精度的 2 倍。最优阈值 τ = 0.20，F2 = 0.778，召回率 93.4%。"
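Here is a worked check of the F2 numbers, in case he asks about precision. The formula is F_β = (1 + β²)·P·R / (β²·P + R). With β = 2, the β² = 4 factor inside the formula is what makes recall count twice as much as precision. Solving for P from the two reported figures (F2 = 0.778, recall = 0.934) gives the implied precision at τ = 0.20. Both inputs are rounded, so treat the result as approximate. The threshold sweep is a generic sketch, not the repo's code.

```python
import numpy as np


def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """F-beta score; beta = 2 treats recall as twice as important as precision."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def implied_precision(f: float, recall: float, beta: float = 2.0) -> float:
    """Solve F = (1 + b2) P R / (b2 P + R) for P, given F and R."""
    b2 = beta ** 2
    return f * recall / ((1 + b2) * recall - b2 * f)


def best_threshold(y_true, p_pred, thresholds, beta: float = 2.0):
    """Return (tau, F_beta) that maximises F_beta when predicting rain for p >= tau."""
    y = np.asarray(y_true).astype(bool)
    p = np.asarray(p_pred, dtype=float)
    best = (None, -1.0)
    for tau in thresholds:
        pred = p >= tau
        tp = int(np.sum(pred & y))
        fp = int(np.sum(pred & ~y))
        fn = int(np.sum(~pred & y))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        score = f_beta(prec, rec, beta)
        if score > best[1]:
            best = (float(tau), score)
    return best


print(round(implied_precision(0.778, 0.934), 3))  # 0.466
```

Lowering τ trades precision for recall, and F2 rewards that trade more than F1 would. That is why the sweep lands at a low threshold such as 0.20.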

→ Tab 7 (figures/05_feature_importance.png)

"Top 3 features: previous-hour rain, time-of-day cyclic encoding, 3-hour pressure tendency. These match the meteorology literature — autocorrelation, diurnal cycle, storm precursor. The model learned physically meaningful signal."

"最重要的 3 个特征：上一小时降水、时间周期编码、3 小时气压变化——跟气象文献吻合：自相关、日变化、风暴前兆。模型学到的是物理上有意义的信号。"
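If he asks what those three features look like concretely, here is one plausible way to build them in pandas. The column names (`site`, `time`, `pressure_msl`) and the exact definitions are assumptions for illustration, and the repo's feature code may differ. The sine/cosine pair places hour 23 and hour 0 next to each other, which a plain 0-23 integer would not do.

```python
import numpy as np
import pandas as pd


def add_top_features(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch of the three top features, computed per site so values never cross site boundaries."""
    out = df.sort_values(["site", "time"]).copy()
    by_site = out.groupby("site")
    # 1) Previous-hour rain: persistence / autocorrelation of precipitation.
    out["precip_lag1"] = by_site["precipitation"].shift(1)
    # 2) 3-hour pressure tendency (hPa): falling pressure is a classic storm precursor.
    out["pressure_tendency_3h"] = out["pressure_msl"] - by_site["pressure_msl"].shift(3)
    # 3) Cyclic hour-of-day encoding for the diurnal cycle.
    hour = out["time"].dt.hour
    out["hour_sin"] = np.sin(2 * np.pi * hour / 24)
    out["hour_cos"] = np.cos(2 * np.pi * hour / 24)
    return out
```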

→ Tab 9 (http://localhost:8000/app/) — FINALLY the app

"Now, Step 3, the app. FastAPI + Vue using the trained model from Step 2 — not a separate model, not a placeholder. Click any coordinate, the system returns the probability and four hazard sub-scores per proposal §3.7."

"现在第三步，app。FastAPI + Vue 调用刚才第二步训好的模型——不是另一个模型、不是占位符。点地图任意一点，系统返回概率和四个分项灾害评分（按开题 §3.7）。"

🇲🇾 Demo A — Genting Highlands (in-distribution)

Click 🇲🇾 Genting Highlands · slope in the scenario dropdown (top right)
Wait ~1 second for the loading spinner
Point to the risk gauge (the main number)
Point to the 4 mini-gauges below (rainfall / fog / wind / thunderstorm)

"Genting is a slope site at 1865 m. The model gives a moderate rain probability, the rule engine detects orographic lift on the windward side, and the composite reflects both. The 4 mini-gauges decompose risk by hazard type, so the user knows whether to worry about rain, fog, wind, or thunder specifically."

"云顶 1865 m 山坡。模型给出中等降雨概率，规则引擎检测到迎风坡地形抬升，最终评分综合两者。4 个 mini-gauge 把风险按类型拆解——用户清楚该担心降雨、雾、风、还是雷暴。"

🏔️ Demo B — Mt Everest (OUT-OF-DISTRIBUTION STRESS TEST)

Click 🏔️ Mt Everest · 8 848 m (OOD) in the dropdown
Wait for the result
Point to the Veto triggers section (red box)

"This is the critical test. The model was trained only on Malaysian mountains; it has never seen anything above 2000 m. A pure ML system could give a low probability here and falsely return 'safe'. A hiker could die."	"这是关键测试。模型只在马来西亚山地训练过，从未见过 2000 m 以上的地点。纯 ML 系统可能给出低概率，然后错误地返回「安全」，登山者可能因此遇难。"
"But the hybrid architecture intervenes: the Veto cascade fires three overrides — altitude > 3500 m triggers hypoxia veto, temperature ≤ −5 °C triggers frostbite veto, wind ≥ 40 km/h triggers gale veto. Composite is forced to 100 = Danger, regardless of the ML output. This is exactly the OOD safety net the rule engine provides."	"但混合架构介入了：Veto 级联触发了三个否决——海拔 > 3500 m（缺氧）、温度 ≤ −5°C（冻伤）、风速 ≥ 40 km/h（大风）。无论 ML 输出什么，综合评分被强制设为 100 = Danger。这就是规则引擎对 OOD 输入的安全网作用。"
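The three overrides above can be written as a small pure function. The thresholds and comparison directions come from this sheet: altitude > 3500 m, temperature ≤ −5 °C, wind ≥ 40 km/h, and any veto forces the composite to 100. The function and field names are hypothetical, and this is not the code in the repo's rule engine. The Everest inputs are illustrative values, not observations.

```python
from dataclasses import dataclass


@dataclass
class Conditions:
    altitude_m: float
    temperature_c: float
    wind_kmh: float


def veto_cascade(c: Conditions) -> list[str]:
    """Return every veto that fires; none of them looks at the ML output."""
    vetoes = []
    if c.altitude_m > 3500:
        vetoes.append("hypoxia")
    if c.temperature_c <= -5:
        vetoes.append("frostbite")
    if c.wind_kmh >= 40:
        vetoes.append("gale")
    return vetoes


def final_composite(blended_score: float, c: Conditions) -> float:
    """Any veto overrides the blended ML + rule score and forces 100 (= Danger)."""
    return 100.0 if veto_cascade(c) else blended_score


everest = Conditions(altitude_m=8848, temperature_c=-25, wind_kmh=60)  # illustrative values
print(veto_cascade(everest), final_composite(12.0, everest))
# ['hypoxia', 'frostbite', 'gale'] 100.0
```

The function checks all three rules instead of stopping at the first one, so the red box can list every trigger that fired.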

🎯 The Everest demo is your strongest defensive argument. Pre-tested in tests/test_rule_engine.py::test_mt_everest_veto_hypoxia.
🎯 珠峰演示是你最强的辩护点。有单元测试覆盖（test_mt_everest_veto_hypoxia）。

6 · Concern #5 — "Regression or classification?" 反馈五 · 回归还是分类

"I don't think this is a classification problem because there is no class label. So I think this is a regression problem."

"Sir, when you first looked at the raw CSV, no class label existed — regression seemed the only option. I considered both. I chose binary classification for three reasons:"	"老师，您当时看 CSV 没有 class label，看上去像 regression。我两个都考虑过，最后选了二分类，三个理由："
(1) "Downstream decision is binary — go outside or don't. Regressing mm of rain would still need a threshold to convert to go/no-go — I would have to pick the threshold anyway."	(1) "下游决策本身就是二元——出门 vs 不出门。即使回归预测毫米数，最后也要拿阈值转成 go/no-go——那个阈值反正要选。"
(2) "With a classifier I can choose the decision threshold that maximises F2, the right metric when recall matters most for safety. That is the τ = 0.20 from the threshold sweep."	(2) "做分类可以直接选出让 F2 最大的决策阈值；安全关键场景下召回比精度更重要，F2 正是为此设计的。就是阈值扫描里的 τ = 0.20。"
(3) "But I still expose the raw probability in the API response — any downstream component that needs a continuous score (e.g. the rule engine's rainfall sub-scorer) can still use it. Best of both worlds."	(3) "但 API 还是把原始概率暴露出来——下游需要连续分数的组件（例如规则引擎的降雨子评分器）照样能用。两全其美。"

7 · Anticipated Q&A 老师可能追问

Q1 — "Why Random Forest and not deep learning / LSTM?" / 为什么不是深度学习？

"Three reasons. (1) Interpretability: feature importance shows which inputs the model relies on, and I can check them against meteorology (Tab 7). That matters for a safety-critical system; a neural net is much harder to inspect."	"三个理由：(1) 可解释性：feature importance 能说明模型依赖哪些输入，我可以拿它跟气象学对照（标签 7）。安全关键应用需要这一点，神经网络则难以检查。"
(2) "Data efficiency: on about 175 K tabular rows, a tree ensemble is a strong baseline. An LSTM typically needs more data and tuning before it beats one."	(2) "数据效率：约 17 万行表格数据下，树集成模型已经是很强的基线；LSTM 通常需要更多数据和调参才能超过它。"
(3) "Inference latency: RF inference on one query is sub-millisecond on CPU, and the FastAPI + cache path relies on that. An LSTM would add sequence preparation and heavier per-request inference."	(3) "推理延迟：RF 单次推理在 CPU 上 < 1 ms，FastAPI + 缓存架构依赖这一点；LSTM 需要准备序列输入，单次推理也更重。"

Q2 — "How do you handle out-of-distribution input?" / 分布外输入怎么处理？

→ Just show the Mt Everest demo from §5. That IS the answer. Don't theorise — let the system speak.
→ 直接展示第 5 节的珠峰 demo。那就是答案。不要讲理论——让系统说话。

Q3 — "What is the rule engine's contribution? Could you just use ML alone?" / 规则引擎的贡献？只用 ML 不行吗？

"Pure ML is statistical: it learns averages. But in complex mountain terrain, orographic lift can strongly amplify precipitation over short distances (Roe 2005, Annual Review of Earth and Planetary Sciences)."	"纯 ML 是统计性的，学的是平均值。但在复杂山地，地形抬升可以在很短距离内显著放大降水（Roe 2005, Annual Review of Earth and Planetary Sciences）。"
"R1 in our decision table captures exactly this: when macro rain probability is low but wind impinges on a windward slope with falling pressure, a hidden rain risk emerges. ML alone would say 'safe'; the rule engine fires R1 and warns."	"决策表 R1 抓的就是这点：宏观降雨概率低，但风正对迎风坡且气压下降时，存在隐藏的降雨风险。单靠 ML 会说「安全」；规则引擎触发 R1 并发出警告。"
"This is the Neuro-Symbolic AI paradigm — learn what is learnable, hand-code what is physical."	"这就是 Neuro-Symbolic AI 范式——能学的让 ML 学，物理规律手工编码。"

Q4 — "Cross-validation? Overfitting check?" / 交叉验证？过拟合？

"Yes, Sir. Time-series 5-fold CV on the training portion — not random K-fold (would leak temporal info)."	"做了老师，时间序列 5 折交叉验证——不是随机 K 折（会泄漏时间信息）。"
"Fold AUCs range from 0.828 to 0.908, mean ≈ 0.858, close to the held-out test AUC of 0.871. That suggests the model is not overfitted to any single temporal slice."	"各折 AUC 在 0.828 到 0.908 之间，均值约 0.858，跟独立测试集 AUC 0.871 很接近，说明模型没有对某个时间段过拟合。"
"All in `models/training_report.json` and the model card."	"全部在 `models/training_report.json` 和 model card 里。"
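For time-series CV, scikit-learn's `TimeSeriesSplit` is the standard tool; whether the repo uses it is an assumption. The sketch shows only the property you are claiming: every validation fold comes strictly after its training rows. It assumes the rows are already sorted by time.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

n_rows = 20                      # stand-in for the time-sorted training portion
X = np.arange(n_rows).reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=5)
for fold, (train_idx, val_idx) in enumerate(tscv.split(X), start=1):
    # Expanding window: train on everything before the fold, validate on the next block.
    assert train_idx.max() < val_idx.min()
    print(f"fold {fold}: train rows 0-{train_idx.max()}, validate rows {val_idx.min()}-{val_idx.max()}")
```

Random K-fold would mix later hours into the training folds, which is the leakage mentioned above.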

Q5 — "Real-world validation plan?" / 真实世界怎么验证？

"Chapter 5: two-pronged. (1) Hindcast validation — replay against publicly documented Malaysian floods/landslides from NaDMA archives; check if system would have produced Warning/Danger at the right time."	"Chapter 5 两条腿走路：(1) 历史事件回放——用 NaDMA 公开的马来西亚洪水/滑坡事件，看系统在事件发生时是否会给出 Warning 或 Danger。"
(2) "User study: a small panel of mountain hikers compares the system's recommendations with their own field judgement over one month. Both are established evaluation methods in operational meteorology."	(2) "用户研究：找一小批登山者，一个月内对比系统建议和他们自己的实地判断。两种方法都是业务气象学中常用的评估方式。"

Q6 — "Risk levels Safe/Caution/Warning/Danger?" / 四个等级怎么定？

"Thresholds are 30 / 55 / 80 on the 0-100 composite. They are calibrated so that the mean output over the training data falls in mid-Caution, which lets the system use its full dynamic range. Each level maps to a different recommended action in the bilingual advice."

"阈值 0-100 综合分上的 30 / 55 / 80。校准依据：训练集平均输出正好落在 Caution 区间中部——系统能用满整个动态范围。每个等级对应不同的双语建议行动。"
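Here are the 30 / 55 / 80 cut-offs as a lookup, in case he asks which side of a boundary a score falls on. Treating each threshold as the lower bound of the next level (so exactly 30 is Caution) is an assumption; check it against the repo before quoting it.

```python
import bisect

LEVELS = ["Safe", "Caution", "Warning", "Danger"]
THRESHOLDS = [30, 55, 80]  # assumed lower bounds of Caution, Warning, Danger on the 0-100 composite


def risk_level(composite: float) -> str:
    """Map a 0-100 composite score to a risk level (each threshold assumed to start the next level)."""
    if not 0 <= composite <= 100:
        raise ValueError(f"composite must be in [0, 100], got {composite}")
    return LEVELS[bisect.bisect_right(THRESHOLDS, composite)]
```

Mid-Caution is about (30 + 55) / 2 = 42.5. That is where the training-set mean output should land.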

Q7 — "What if model or API fails in production?" / 生产环境挂了怎么办？

"Three layers of graceful degradation. (1) Model load fails → physics-motivated heuristic. (2) Internal exception → typed ErrorResponse JSON. (3) The rule engine's Veto cascade runs independently of ML: even if ML returns garbage, the safety thresholds still fire."	"三层降级：(1) 模型加载失败→物理启发式。(2) 内部异常→类型化的 ErrorResponse JSON。(3) 规则引擎 Veto 级联独立于 ML：即使 ML 返回异常值，安全阈值仍会触发。"
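Layer (1) is the easiest one to show in code. The sketch below shows the pattern only. `loader`, `predict` and the humidity-based fallback are hypothetical placeholders, not the repo's physics-motivated heuristic. Layer (2) would typically be a web-framework exception handler that returns the typed error model. Layer (3) is the veto logic, which never calls the model.

```python
import logging
from typing import Any, Callable, Optional

log = logging.getLogger("microclimate")


def fallback_rain_probability(features: dict) -> float:
    """HYPOTHETICAL stand-in heuristic: map relative humidity 60-100 % onto 0-1."""
    rh = float(features.get("relative_humidity", 0.0))
    return max(0.0, min(1.0, (rh - 60.0) / 40.0))


def load_model_or_none(loader: Callable[[], Any]) -> Optional[Any]:
    """Try to load the trained model; on any failure, log it and return None."""
    try:
        return loader()
    except Exception:
        log.exception("Model load failed; serving heuristic predictions instead")
        return None


def rain_probability(model: Optional[Any], features: dict,
                     predict: Callable[[Any, dict], float]) -> float:
    """Use the trained model when it loaded, otherwise the heuristic."""
    if model is None:
        return fallback_rain_probability(features)
    return predict(model, features)
```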

8 · Closing (30 seconds) 收尾 30 秒

"Sir, to summarise: I have addressed every point of your feedback. The missing Y is now derived. Documentation matches the data. Model is trained and evaluated before the app. Choice of classification over regression is justified by the safety-critical nature of the application."	"老师，总结一下：您每条反馈我都已经回应——Y 已经构造好、文档跟数据完全对齐、模型在 app 之前就训好并评估过、分类而不是回归是因为应用本身就是安全关键。"
"Code is on GitHub at `KyoukoLi/microclimate-x`, CI passing, 97 % test coverage, published model card. May I have your guidance on the next priorities for Chapter 5?"	"代码在 GitHub `KyoukoLi/microclimate-x`，CI 全过、测试覆盖率 97%、有完整的 model card。请问 Chapter 5 接下来您建议我重点做哪部分？"

9 · Psychological reminders 心理建设 · 老师真正在意什么

Did you LISTEN to him? / 你听进去他的话了吗？
He asked "Do you understand my English?" multiple times. Reassure him by quoting his exact words back ("as you instructed: dataset first, then model, then app").
他反复问 "Understand my English?" 用复述他原话让他放心。

Do you understand basic ML? / 你懂 ML 基础吗？
He explained X/Y, rows/columns, "if-then is the target" — patiently, like a tutor. Don't open with hybrid / neuro-symbolic / TPI / CAPE. Start with: dataset, target, feature, train, predict. Earn the right to use fancy vocabulary by first speaking his language.
不要上来就抛 hybrid、neuro-symbolic、TPI、CAPE。先用他的词汇：dataset、target、feature、train、predict。先证明你懂基础再升级。

Did you follow his process? / 你按他的流程做了吗？
"App is the last" — he said it three times. The visual order in which you open tabs IS the answer. No app until the very end.
"app is the last" 他说了三次。你打开标签页的顺序就是答案。绝对不要提前打开 app。

Defensive lines if you get stuck / 答不出来时的兜底话术

Situation	EN	ZH
Don't know answer	"That is a good question, Sir. I haven't fully worked out the answer yet — may I prepare a written response by next meeting?"	"老师这是个好问题，我还没完全想清楚——能否下次开会前给您一份书面回复？"
He challenges a threshold	"Sir, the threshold is documented in `docs/thresholds.md` with the academic citation. Let me open it."	"老师，这个阈值的学术引用在 `docs/thresholds.md` 里，我打开给您看。"
"This doesn't match what I expected"	"Yes Sir — that is exactly what I want to confirm with you. Could you describe what you expected, so I can align?"	"老师这正是我想跟您确认的点——能否说说您预期的样子？我好对齐。"

10 · Backup plan / 设备出问题的备份方案

Problem	Fallback	中文
WiFi down	Synthetic dataset works offline — `make synth` already ran	合成数据集已经跑过，本地能演
`make run` fails	Show GitHub repo with green CI badge — same artefacts visible there	直接给 GitHub repo 看 CI 绿勾，artefact 一样能看
Demo doesn't load	Use cached responses — recent results in `cache.sqlite3`	用缓存的结果——最近查询都在 `cache.sqlite3` 里
Browser crashes	Open this cheat sheet on your phone — every key number is here	手机打开这份 cheat sheet——所有关键数字都在

11 · Pre-flight checklist (60 seconds before) 起飞前最后 60 秒自检

☐ Laptop ≥ 80 % battery, charger in bag / 笔记本电池 ≥ 80%，充电器在包里

☐ make run is running in a terminal (don't close it!) / make run 在另一个终端跑着（不要关！）

☐ /api/health returns ml_loaded: true / /api/health 返回 ml_loaded: true

☐ All 10 browser tabs open in the correct order (switch to the app tab LAST) / 10 个标签页按顺序开好（最后才切到 app）

☐ This cheat sheet open on screen — but NOT to be read word-for-word / 这份 cheat sheet 开着，但不要照念

☐ Phone on silent / 手机静音

☐ Deep breath. You have done the work. / 深呼吸。你已经做完了所有该做的工作。
