# Supervisor Meeting Brief — bilingual script # 导师开会双语逐字稿 > Single-page meeting brief addressing every point of feedback from the > 4/15 supervisor session. Bring this document open on screen during the > meeting and walk through it in order. > > 一页式开会简报,逐条回应 4/15 导师 review 的所有反馈。开会时直接打开 > 此页,按顺序走一遍即可。 --- ## Opening 30 seconds / 开场 30 秒 | English (say this) | 中文(口头要点) | |---|---| | "Sir, since our last meeting I have addressed every point of your feedback. May I walk you through them in the correct order — dataset first, then model, then app — as you instructed?" | "老师,按您上次反馈,我已经把每一条都改了。我按您要求的顺序——**先 dataset,再 model,最后才是 app**——给您过一遍可以吗?" | **Why this opening works**: it explicitly *names* the supervisor's #1 process complaint ("app is last"). He'll relax immediately because he can see you listened. --- ## Concern #1 — Y target was missing ## 反馈一 · 缺少目标列 Y **His original words**: "Y is missing. I don't have the output variable. If you don't have target, you cannot train a machine learning model." | English (say this) | 中文(口头要点) | |---|---| | "Sir, you were right — the raw Open-Meteo CSV has no Y column. I have engineered the target explicitly. The variable is called `is_rain_event` and it is defined as 1 if the precipitation in the **next hour** is greater than 0.1 mm, else 0. The code is one line in `scripts/2_preprocess.py`." | "老师您说得对,原始 Open-Meteo CSV 确实没有 Y 列。我现在已经显式构造了目标变量,叫做 **`is_rain_event`**,定义是:**下一小时降雨量 > 0.1 mm 则为 1,否则为 0**。代码就一行,写在 `scripts/2_preprocess.py`。" | | [Show this code on screen:] `df['is_rain_event'] = (df['precipitation'].shift(-1) > 0.1).astype(int)` | (把这一行代码投出来给老师看) | | "Three things to notice: `.shift(-1)` means I use **future** rain as the label — features at hour t predict outcome at t+1h, so there is no temporal leakage. The 0.1 mm threshold matches the **WMO definition** of trace precipitation, not an arbitrary choice. And it is binary classification, not regression, because the downstream decision is binary." | "三个要点:(1) `.shift(-1)` 表示用**下一小时**的降雨作为标签,特征是 t 时刻、预测的是 t+1 小时——没有时间泄漏。(2) 0.1 mm 这个阈值不是我随便定的,对应 **WMO 微量降水标准**。(3) 是二分类不是回归,因为下游用户决策本身就是二元的(去 / 不去)。" | **Artefact to show**: `docs/dataset.md` §5 (Target label derivation) — has all three points written out. --- ## Concern #2 — features in the document did not match the Excel ## 反馈二 · 文档里的特征跟 CSV 列名对不上 **His original words**: "The features that you presented here, not... not mentioned in the Excel. So, it must be matched." | English (say this) | 中文(口头要点) | |---|---| | "Sir, that was also a fair point. I have rewritten the dataset specification so the documentation lists exactly the **same column names** that appear in the CSV. There is a one-to-one mapping in `docs/dataset.md` §4." | "老师,这条您也说对了。我已经把数据集文档完全重写,文档里列出的就是 CSV 里的**真实列名**,一一对应。在 `docs/dataset.md` 第 4 节。" | | [Open dataset.md §4 schema table] "Every row in this table is one column in the actual CSV. The role column says whether it is a feature (X), the target (Y), or just metadata." | (打开 dataset.md §4 列结构表)"表里每一行就是 CSV 里的一列,role 列写明了它是 feature(X)、target(Y)还是 metadata。" | **Artefact to show**: `docs/dataset.md` §4 — single canonical schema table. --- ## Concern #3 — study the data source ## 反馈三 · 研究数据源本身 **His original words**: "Please study the link. What is the purpose of the dataset? What is design for? What is the output variable?" | English (say this) | 中文(口头要点) | |---|---| | "I read the Open-Meteo API documentation carefully. The dataset I use is the **ERA5 reanalysis archive**, which is ECMWF's gold-standard hourly reanalysis — they use it to validate other forecast models. It is *not* a forecast, it is a physically-consistent reconstruction of past weather, which is why it is the right dataset for training: the labels are reliable ground truth." | "我把 Open-Meteo 文档仔细读了。我用的是 **ERA5 再分析数据**,是 ECMWF 出的同化产品,气象学界用它当作**真值**去校验别的预报模型。它**不是**预报,而是对过去天气的物理一致的重建。所以用来训练 ML 是合适的——标签是可靠的 ground truth。" | | "Spatial coverage: 5 Malaysian mountain sites — Genting, Cameron, Fraser's Hill, Klang Valley, Kinabalu — chosen to span elevations from 100 m to 1865 m and terrain types from valley to slope." | "空间覆盖 5 个马来西亚山地点位——云顶、金马仑、福隆港、巴生谷、神山——海拔从 100 m 到 1865 m,地形从山谷到山坡都有。" | | "Temporal coverage: 5 years, hourly, 175 315 rows in total." | "时间范围 5 年,每小时一行,总共 175 315 行。" | **Artefact to show**: `docs/dataset.md` §1-3, or open the Open-Meteo documentation page itself if he wants the original source. --- ## Concern #4 — process order was wrong: app should be last ## 反馈四 · 流程顺序错了,app 应该最后做 **His original words**: "First, identify a dataset. Identify a dataset. And then train the model. And then predict it. First. Once everything is finished... okay, you can develop the app. App is the last." | English (say this) | 中文(口头要点) | |---|---| | "Yes Sir, I followed your process. The current state is: Step 1 dataset is identified and documented. Step 2 the model is trained — let me show you the results before I open the app." | "好的老师,我严格按您的流程做的。当前状态是:**第一步 dataset 已确认并文档化**;**第二步模型已训练完毕**——在打开 app 之前,先给您看训练结果。" | | [Open `figures/01_roc_curve.png`] "Test ROC AUC is 0.871 on 35 063 held-out hourly samples. The hold-out is the **last 20 % chronologically**, not a random split — random splits leak temporal autocorrelation and would inflate accuracy unrealistically." | (打开 ROC 图)"测试集 35 063 行,ROC AUC = **0.871**。划分用的是**按时间排序的最后 20%**,不是随机划分——随机划分会泄漏时间自相关,把准确率虚高 5-15 个百分点。" | | [Open `figures/03_calibration_curve.png`] "Brier score is 0.138, which means the predicted probabilities are well-calibrated — when the model says 70 % chance of rain, the actual rate is close to 70 %." | (打开 calibration 图)"Brier 分数 = 0.138,说明预测概率**校准良好**——模型说 70% 下雨概率时,实际频率接近 70%。" | | [Open `figures/04_threshold_sweep.png`] "I optimised the decision threshold for **F2 score**, not F1, because in this safety-critical application a missed rain event on a windward slope can lead to flash flooding — false negatives are much worse than false positives. F2 weights recall four times more than precision. The optimal threshold is τ = 0.20, giving F2 = 0.778 and **93.4 % recall**." | (打开阈值扫描图)"我用 **F2 分数**而不是 F1 来选最优阈值——因为这是安全关键应用,**漏报**比误报严重得多(在迎风坡漏掉一次降雨可能引发山洪)。F2 把召回率的权重设为精度的 4 倍,最优阈值是 τ = 0.20,F2 = 0.778,**召回率 93.4%**。" | | [Open `figures/05_feature_importance.png`] "Top-3 features the model relies on: previous hour's rain, time-of-day cyclic encoding, and 3-hour pressure tendency. These match the meteorological literature — autocorrelation, diurnal cycle, and storm precursor." | (打开特征重要性图)"模型最看重的 3 个特征:上一小时降水、时间周期编码、3 小时气压变化。这跟气象文献吻合——自相关、日变化、风暴前兆。" | | **[NOW open the app]** "Step 3, the app. This is FastAPI + Vue using the trained model. When I click a coordinate, the system returns the probability and the four hazard sub-scores per the proposal §3.7." | **(这时才打开 app)**"第三步,app。这是 FastAPI + Vue 调用上面训好的模型。我点地图任意一点,系统返回概率和四个分项灾害评分(按开题 §3.7)。" | **Why this order matters**: he literally said "App is the last" three times. Showing dataset → ROC → calibration → threshold → importance → THEN the app is exactly the order he asked for. Each chart takes 20-30 seconds to explain; total before opening the app ≈ 2-3 minutes. --- ## Concern #5 — regression or classification? ## 反馈五 · 回归还是分类? **His original words**: "I don't think this is a classification problem because there is no class label. So I think this is a regression problem." | English (say this) | 中文(口头要点) | |---|---| | "Sir, when you first looked at the raw CSV, there was no class label, so regression looked like the only option. I considered both. I chose **binary classification** for three reasons:" | "老师,您当时看原始 CSV 的时候确实没有 class label,所以看上去像 regression。我两个都考虑过,最后选了**二分类**,三个理由:" | | **(1)** "The downstream decision is binary — go outside or don't. Regressing on mm of rain would still need a threshold to convert to a go/no-go output, so I would have to pick the threshold anyway." | **(1)** "下游决策本身就是二元的——出门 vs 不出门。即使做回归预测降雨毫米数,最后也要拿一个阈值转成 go/no-go,**那个阈值反正要选**。" | | **(2)** "Classification lets me optimise **F2 score**, which is the right metric for a safety-critical setting where recall matters more than precision. I cannot directly optimise F2 on a regression target." | **(2)** "做分类才能直接优化 **F2 分数**——安全关键场景下召回比精度更重要,**这个指标只在分类任务下有意义**。" | | **(3)** "But I still expose the **raw probability** in the API response, so any downstream component that needs a continuous score (e.g. the rule engine's rainfall sub-scorer) can still use it. So I keep the best of both worlds." | **(3)** "但 API 还是把**原始概率**暴露出来了,下游需要连续分数的组件(比如规则引擎的降雨子评分器)照样能用。**两全其美**。" | --- ## Likely follow-up questions / 老师可能追问的问题 ### Q1 — "Why Random Forest and not deep learning / LSTM?" ### Q1 ——为什么选 Random Forest 而不是深度学习 / LSTM? | English | 中文 | |---|---| | "Three reasons. First, **interpretability**: feature importance lets me defend why the model predicts what it predicts — essential for a safety-critical application. A neural net is a black box. Second, **data efficiency**: with 175 K samples, Random Forest reaches state-of-the-art performance; LSTM would need an order of magnitude more data to outperform it. Third, **inference latency**: RF inference is sub-millisecond, which the FastAPI + cache architecture depends on. LSTM would be at least 10× slower and require GPU at inference time." | "三个理由:(1) **可解释性**——feature importance 让我能为每个预测**辩护**,安全关键应用必须有这一点,神经网络是黑盒。(2) **数据效率**——17 万样本下 RF 已经达到 SOTA,LSTM 需要至少 10 倍数据才能超过它。(3) **推理延迟**——RF 推理 < 1 ms,FastAPI + 缓存架构依赖这一点;LSTM 至少慢 10 倍且推理时需要 GPU。" | ### Q2 — "How do you handle out-of-distribution input (e.g. Mt Everest)?" ### Q2 ——分布外输入怎么处理(比如珠峰)? | English | 中文 | |---|---| | "This is exactly what the **hybrid architecture** is for, Sir. The Random Forest only saw Malaysian mountains, so on Everest it returns a low probability. But the rule engine's Veto cascade catches three independent failures — altitude > 3500 m triggers hypoxia veto, temperature ≤ -5 °C triggers frostbite veto, and wind ≥ 40 km/h triggers gale veto. The composite output goes to Danger regardless of the ML probability. There is a unit test for exactly this scenario — `test_mt_everest_veto_hypoxia` in `tests/test_rule_engine.py`." | "老师,这正是我做**混合架构**的原因。RF 只见过马来西亚的山,所以在珠峰上会返回很低的概率。但**规则引擎的 Veto 级联**会捕获三个独立的失败:海拔 > 3500 m 触发缺氧 Veto,温度 ≤ -5°C 触发冻伤 Veto,风速 ≥ 40 km/h 触发大风 Veto。无论 ML 给什么概率,输出都被强制设为 Danger。我专门为这个场景写了单元测试 `test_mt_everest_veto_hypoxia`。" | ### Q3 — "What is the contribution of the topographic rule engine? Could you just use the ML model alone?" ### Q3 ——地形规则引擎的贡献是什么?只用 ML 不行吗? | English | 中文 | |---|---| | "ML alone is statistical — it learns averages. But terrain in complex mountainous regions amplifies precipitation locally by orders of magnitude (Roe, 2005, *Annual Review of Earth & Planetary Sciences*). The decision-table R1 in proposal §3.7.2 captures exactly this: when macro rain probability is low but the wind impinges on a windward slope with falling pressure, hidden rain risk emerges. The ML model would say 'safe' here; the rule engine fires R1 and warns the user. This is the **Neuro-Symbolic AI** paradigm — learn what is learnable, hand-code what is physical." | "纯 ML 是统计性的——它学的是平均值。但复杂山地的地形会把降水**局部放大几个数量级**(Roe 2005, Annual Review of Earth & Planetary Sciences)。开题 §3.7.2 的决策表 R1 抓住的正是这一点:宏观降雨概率低、但风正对迎风坡且气压在下降时——存在**隐藏的降雨风险**。ML 在这种情况下会说"安全";规则引擎会触发 R1 警告用户。这是 **Neuro-Symbolic AI** 范式——能学的让 ML 学,物理规律手工编码。" | ### Q4 — "Did you do cross-validation? Did you check for overfitting?" ### Q4 ——做过交叉验证吗?检查过过拟合吗? | English | 中文 | |---|---| | "Yes Sir, **time-series cross-validation** with 5 folds on the training portion — not random K-fold, which would leak temporal information. The fold AUCs range from 0.828 to 0.908, mean ≈ 0.858, which is very close to the held-out test AUC of 0.871. This consistency confirms the model is not overfitting to a single temporal slice. All fold metrics are in `models/training_report.json` and the model card." | "做了,老师。**时间序列交叉验证**,5 折,**不是**随机 K 折——随机划分会泄漏时间信息。各折 AUC 在 0.828 到 0.908 之间,均值约 0.858,跟独立测试集 AUC 0.871 非常接近——说明模型没有对某个时间段过拟合。所有指标都在 `models/training_report.json` 和 model card 里。" | ### Q5 — "How will you validate this in the real world?" ### Q5 ——你怎么在真实世界验证这套系统? | English | 中文 | |---|---| | "Two-pronged plan for Chapter 5 evaluation. First, **hindcast validation** — I will replay the system against publicly documented Malaysian flood and landslide events from NaDMA archives and check whether the system would have produced a Warning or Danger verdict at the right time. Second, **user study** — a small panel of mountain hikers will compare the system's recommendations against their own field judgment over a one-month period. Both methodologies follow standard practice in the operational meteorology literature." | "Chapter 5 评估两条腿走路:(1) **历史事件回放** —— 用 NaDMA 公开记录的马来西亚洪水/滑坡事件,看系统在事件发生时是否会给出 Warning 或 Danger。(2) **用户研究** —— 找一小批登山者,一个月内对比系统建议和他们自己的判断。两种方法都是业务气象学界的标准做法。" | --- ## Closing 30 seconds / 收尾 30 秒 | English (say this) | 中文(口头要点) | |---|---| | "Sir, to summarise: I have addressed every point of your feedback — the missing Y is now derived, the documentation matches the data, the model is trained and evaluated before the app, and the choice of classification over regression is justified by the safety-critical nature of the application. The code is on GitHub at `KyoukoLi/microclimate-x` with CI passing, 97 % test coverage, and a published model card. May I have your guidance on the next priorities for Chapter 5?" | "老师,总结一下:您每条反馈我都已经回应——Y 已经构造好、文档跟数据完全对齐、模型在 app 之前就训好并评估过、分类而不是回归是因为应用本身就是安全关键。代码在 GitHub `KyoukoLi/microclimate-x`,CI 全过、测试覆盖率 97%、有完整的 model card。请问 Chapter 5 接下来您建议我重点做哪部分?" | --- ## Materials checklist before walking in / 开会前自检清单 - [ ] Laptop charged, browser tab open to `docs/dataset.md`. - [ ] All 6 figures in `figures/` rendered to a quick-flip slide deck (or just keep the PNG files in a single Finder window). - [ ] GitHub repo page open in another tab, ready to show CI green badge + commit history. - [ ] Frontend `frontend/index.html` ready to demo (open `make run` in a terminal **before** the meeting — not during). - [ ] `models/MODEL_CARD.md` open in a third tab, in case the supervisor asks for written evidence of any number you quote. - [ ] This brief (`docs/supervisor_meeting_brief.md`) open on screen — but **don't read from it word-for-word**, treat it as your safety net only. 中文版: - [ ] 笔记本充满电,浏览器开好 `docs/dataset.md` - [ ] `figures/` 里 6 张图全部预先点开过一次(图片预览快进就行,避免临时加载) - [ ] GitHub repo 页面开另一个标签页,CI 绿勾 + commit 历史随时可看 - [ ] 前端 `make run` **提前**起好(不要开会时才起) - [ ] `models/MODEL_CARD.md` 第三个标签页,老师追问任何数字时打开它 - [ ] **本文档**开着但**不要照念**,当兜底用即可