Spaces:
Paused
Supervisor Meeting Brief — bilingual script
导师开会双语逐字稿
Single-page meeting brief addressing every point of feedback from the 4/15 supervisor session. Bring this document open on screen during the meeting and walk through it in order.
一页式开会简报,逐条回应 4/15 导师 review 的所有反馈。开会时直接打开 此页,按顺序走一遍即可。
Opening 30 seconds / 开场 30 秒
| English (say this) | 中文(口头要点) |
|---|---|
| "Sir, since our last meeting I have addressed every point of your feedback. May I walk you through them in the correct order — dataset first, then model, then app — as you instructed?" | "老师,按您上次反馈,我已经把每一条都改了。我按您要求的顺序——先 dataset,再 model,最后才是 app——给您过一遍可以吗?" |
Why this opening works: it explicitly names the supervisor's #1 process complaint ("app is last"). He'll relax immediately because he can see you listened.
Concern #1 — Y target was missing
反馈一 · 缺少目标列 Y
His original words: "Y is missing. I don't have the output variable. If you don't have target, you cannot train a machine learning model."
| English (say this) | 中文(口头要点) |
|---|---|
"Sir, you were right — the raw Open-Meteo CSV has no Y column. I have engineered the target explicitly. The variable is called is_rain_event and it is defined as 1 if the precipitation in the next hour is greater than 0.1 mm, else 0. The code is one line in scripts/2_preprocess.py." |
"老师您说得对,原始 Open-Meteo CSV 确实没有 Y 列。我现在已经显式构造了目标变量,叫做 is_rain_event,定义是:下一小时降雨量 > 0.1 mm 则为 1,否则为 0。代码就一行,写在 scripts/2_preprocess.py。" |
[Show this code on screen:] df['is_rain_event'] = (df['precipitation'].shift(-1) > 0.1).astype(int) |
(把这一行代码投出来给老师看) |
"Three things to notice: .shift(-1) means I use future rain as the label — features at hour t predict outcome at t+1h, so there is no temporal leakage. The 0.1 mm threshold matches the WMO definition of trace precipitation, not an arbitrary choice. And it is binary classification, not regression, because the downstream decision is binary." |
"三个要点:(1) .shift(-1) 表示用下一小时的降雨作为标签,特征是 t 时刻、预测的是 t+1 小时——没有时间泄漏。(2) 0.1 mm 这个阈值不是我随便定的,对应 WMO 微量降水标准。(3) 是二分类不是回归,因为下游用户决策本身就是二元的(去 / 不去)。" |
Artefact to show: docs/dataset.md §5 (Target label derivation) — has all three points written out.
Concern #2 — features in the document did not match the Excel
反馈二 · 文档里的特征跟 CSV 列名对不上
His original words: "The features that you presented here, not... not mentioned in the Excel. So, it must be matched."
| English (say this) | 中文(口头要点) |
|---|---|
"Sir, that was also a fair point. I have rewritten the dataset specification so the documentation lists exactly the same column names that appear in the CSV. There is a one-to-one mapping in docs/dataset.md §4." |
"老师,这条您也说对了。我已经把数据集文档完全重写,文档里列出的就是 CSV 里的真实列名,一一对应。在 docs/dataset.md 第 4 节。" |
| [Open dataset.md §4 schema table] "Every row in this table is one column in the actual CSV. The role column says whether it is a feature (X), the target (Y), or just metadata." | (打开 dataset.md §4 列结构表)"表里每一行就是 CSV 里的一列,role 列写明了它是 feature(X)、target(Y)还是 metadata。" |
Artefact to show: docs/dataset.md §4 — single canonical schema table.
Concern #3 — study the data source
反馈三 · 研究数据源本身
His original words: "Please study the link. What is the purpose of the dataset? What is design for? What is the output variable?"
| English (say this) | 中文(口头要点) |
|---|---|
| "I read the Open-Meteo API documentation carefully. The dataset I use is the ERA5 reanalysis archive, which is ECMWF's gold-standard hourly reanalysis — they use it to validate other forecast models. It is not a forecast, it is a physically-consistent reconstruction of past weather, which is why it is the right dataset for training: the labels are reliable ground truth." | "我把 Open-Meteo 文档仔细读了。我用的是 ERA5 再分析数据,是 ECMWF 出的同化产品,气象学界用它当作真值去校验别的预报模型。它不是预报,而是对过去天气的物理一致的重建。所以用来训练 ML 是合适的——标签是可靠的 ground truth。" |
| "Spatial coverage: 5 Malaysian mountain sites — Genting, Cameron, Fraser's Hill, Klang Valley, Kinabalu — chosen to span elevations from 100 m to 1865 m and terrain types from valley to slope." | "空间覆盖 5 个马来西亚山地点位——云顶、金马仑、福隆港、巴生谷、神山——海拔从 100 m 到 1865 m,地形从山谷到山坡都有。" |
| "Temporal coverage: 5 years, hourly, 175 315 rows in total." | "时间范围 5 年,每小时一行,总共 175 315 行。" |
Artefact to show: docs/dataset.md §1-3, or open the Open-Meteo documentation page itself if he wants the original source.
Concern #4 — process order was wrong: app should be last
反馈四 · 流程顺序错了,app 应该最后做
His original words: "First, identify a dataset. Identify a dataset. And then train the model. And then predict it. First. Once everything is finished... okay, you can develop the app. App is the last."
| English (say this) | 中文(口头要点) |
|---|---|
| "Yes Sir, I followed your process. The current state is: Step 1 dataset is identified and documented. Step 2 the model is trained — let me show you the results before I open the app." | "好的老师,我严格按您的流程做的。当前状态是:第一步 dataset 已确认并文档化;第二步模型已训练完毕——在打开 app 之前,先给您看训练结果。" |
[Open figures/01_roc_curve.png] "Test ROC AUC is 0.871 on 35 063 held-out hourly samples. The hold-out is the last 20 % chronologically, not a random split — random splits leak temporal autocorrelation and would inflate accuracy unrealistically." |
(打开 ROC 图)"测试集 35 063 行,ROC AUC = 0.871。划分用的是**按时间排序的最后 20%**,不是随机划分——随机划分会泄漏时间自相关,把准确率虚高 5-15 个百分点。" |
[Open figures/03_calibration_curve.png] "Brier score is 0.138, which means the predicted probabilities are well-calibrated — when the model says 70 % chance of rain, the actual rate is close to 70 %." |
(打开 calibration 图)"Brier 分数 = 0.138,说明预测概率校准良好——模型说 70% 下雨概率时,实际频率接近 70%。" |
[Open figures/04_threshold_sweep.png] "I optimised the decision threshold for F2 score, not F1, because in this safety-critical application a missed rain event on a windward slope can lead to flash flooding — false negatives are much worse than false positives. F2 weights recall four times more than precision. The optimal threshold is τ = 0.20, giving F2 = 0.778 and 93.4 % recall." |
(打开阈值扫描图)"我用 F2 分数而不是 F1 来选最优阈值——因为这是安全关键应用,漏报比误报严重得多(在迎风坡漏掉一次降雨可能引发山洪)。F2 把召回率的权重设为精度的 4 倍,最优阈值是 τ = 0.20,F2 = 0.778,**召回率 93.4%**。" |
[Open figures/05_feature_importance.png] "Top-3 features the model relies on: previous hour's rain, time-of-day cyclic encoding, and 3-hour pressure tendency. These match the meteorological literature — autocorrelation, diurnal cycle, and storm precursor." |
(打开特征重要性图)"模型最看重的 3 个特征:上一小时降水、时间周期编码、3 小时气压变化。这跟气象文献吻合——自相关、日变化、风暴前兆。" |
| [NOW open the app] "Step 3, the app. This is FastAPI + Vue using the trained model. When I click a coordinate, the system returns the probability and the four hazard sub-scores per the proposal §3.7." | (这时才打开 app)"第三步,app。这是 FastAPI + Vue 调用上面训好的模型。我点地图任意一点,系统返回概率和四个分项灾害评分(按开题 §3.7)。" |
Why this order matters: he literally said "App is the last" three times. Showing dataset → ROC → calibration → threshold → importance → THEN the app is exactly the order he asked for. Each chart takes 20-30 seconds to explain; total before opening the app ≈ 2-3 minutes.
Concern #5 — regression or classification?
反馈五 · 回归还是分类?
His original words: "I don't think this is a classification problem because there is no class label. So I think this is a regression problem."
| English (say this) | 中文(口头要点) |
|---|---|
| "Sir, when you first looked at the raw CSV, there was no class label, so regression looked like the only option. I considered both. I chose binary classification for three reasons:" | "老师,您当时看原始 CSV 的时候确实没有 class label,所以看上去像 regression。我两个都考虑过,最后选了二分类,三个理由:" |
| (1) "The downstream decision is binary — go outside or don't. Regressing on mm of rain would still need a threshold to convert to a go/no-go output, so I would have to pick the threshold anyway." | (1) "下游决策本身就是二元的——出门 vs 不出门。即使做回归预测降雨毫米数,最后也要拿一个阈值转成 go/no-go,那个阈值反正要选。" |
| (2) "Classification lets me optimise F2 score, which is the right metric for a safety-critical setting where recall matters more than precision. I cannot directly optimise F2 on a regression target." | (2) "做分类才能直接优化 F2 分数——安全关键场景下召回比精度更重要,这个指标只在分类任务下有意义。" |
| (3) "But I still expose the raw probability in the API response, so any downstream component that needs a continuous score (e.g. the rule engine's rainfall sub-scorer) can still use it. So I keep the best of both worlds." | (3) "但 API 还是把原始概率暴露出来了,下游需要连续分数的组件(比如规则引擎的降雨子评分器)照样能用。两全其美。" |
Likely follow-up questions / 老师可能追问的问题
Q1 — "Why Random Forest and not deep learning / LSTM?"
Q1 ——为什么选 Random Forest 而不是深度学习 / LSTM?
| English | 中文 |
|---|---|
| "Three reasons. First, interpretability: feature importance lets me defend why the model predicts what it predicts — essential for a safety-critical application. A neural net is a black box. Second, data efficiency: with 175 K samples, Random Forest reaches state-of-the-art performance; LSTM would need an order of magnitude more data to outperform it. Third, inference latency: RF inference is sub-millisecond, which the FastAPI + cache architecture depends on. LSTM would be at least 10× slower and require GPU at inference time." | "三个理由:(1) 可解释性——feature importance 让我能为每个预测辩护,安全关键应用必须有这一点,神经网络是黑盒。(2) 数据效率——17 万样本下 RF 已经达到 SOTA,LSTM 需要至少 10 倍数据才能超过它。(3) 推理延迟——RF 推理 < 1 ms,FastAPI + 缓存架构依赖这一点;LSTM 至少慢 10 倍且推理时需要 GPU。" |
Q2 — "How do you handle out-of-distribution input (e.g. Mt Everest)?"
Q2 ——分布外输入怎么处理(比如珠峰)?
| English | 中文 |
|---|---|
"This is exactly what the hybrid architecture is for, Sir. The Random Forest only saw Malaysian mountains, so on Everest it returns a low probability. But the rule engine's Veto cascade catches three independent failures — altitude > 3500 m triggers hypoxia veto, temperature ≤ -5 °C triggers frostbite veto, and wind ≥ 40 km/h triggers gale veto. The composite output goes to Danger regardless of the ML probability. There is a unit test for exactly this scenario — test_mt_everest_veto_hypoxia in tests/test_rule_engine.py." |
"老师,这正是我做混合架构的原因。RF 只见过马来西亚的山,所以在珠峰上会返回很低的概率。但规则引擎的 Veto 级联会捕获三个独立的失败:海拔 > 3500 m 触发缺氧 Veto,温度 ≤ -5°C 触发冻伤 Veto,风速 ≥ 40 km/h 触发大风 Veto。无论 ML 给什么概率,输出都被强制设为 Danger。我专门为这个场景写了单元测试 test_mt_everest_veto_hypoxia。" |
Q3 — "What is the contribution of the topographic rule engine? Could you just use the ML model alone?"
Q3 ——地形规则引擎的贡献是什么?只用 ML 不行吗?
| English | 中文 |
|---|---|
| "ML alone is statistical — it learns averages. But terrain in complex mountainous regions amplifies precipitation locally by orders of magnitude (Roe, 2005, Annual Review of Earth & Planetary Sciences). The decision-table R1 in proposal §3.7.2 captures exactly this: when macro rain probability is low but the wind impinges on a windward slope with falling pressure, hidden rain risk emerges. The ML model would say 'safe' here; the rule engine fires R1 and warns the user. This is the Neuro-Symbolic AI paradigm — learn what is learnable, hand-code what is physical." | "纯 ML 是统计性的——它学的是平均值。但复杂山地的地形会把降水局部放大几个数量级(Roe 2005, Annual Review of Earth & Planetary Sciences)。开题 §3.7.2 的决策表 R1 抓住的正是这一点:宏观降雨概率低、但风正对迎风坡且气压在下降时——存在隐藏的降雨风险。ML 在这种情况下会说"安全";规则引擎会触发 R1 警告用户。这是 Neuro-Symbolic AI 范式——能学的让 ML 学,物理规律手工编码。" |
Q4 — "Did you do cross-validation? Did you check for overfitting?"
Q4 ——做过交叉验证吗?检查过过拟合吗?
| English | 中文 |
|---|---|
"Yes Sir, time-series cross-validation with 5 folds on the training portion — not random K-fold, which would leak temporal information. The fold AUCs range from 0.828 to 0.908, mean ≈ 0.858, which is very close to the held-out test AUC of 0.871. This consistency confirms the model is not overfitting to a single temporal slice. All fold metrics are in models/training_report.json and the model card." |
"做了,老师。时间序列交叉验证,5 折,不是随机 K 折——随机划分会泄漏时间信息。各折 AUC 在 0.828 到 0.908 之间,均值约 0.858,跟独立测试集 AUC 0.871 非常接近——说明模型没有对某个时间段过拟合。所有指标都在 models/training_report.json 和 model card 里。" |
Q5 — "How will you validate this in the real world?"
Q5 ——你怎么在真实世界验证这套系统?
| English | 中文 |
|---|---|
| "Two-pronged plan for Chapter 5 evaluation. First, hindcast validation — I will replay the system against publicly documented Malaysian flood and landslide events from NaDMA archives and check whether the system would have produced a Warning or Danger verdict at the right time. Second, user study — a small panel of mountain hikers will compare the system's recommendations against their own field judgment over a one-month period. Both methodologies follow standard practice in the operational meteorology literature." | "Chapter 5 评估两条腿走路:(1) 历史事件回放 —— 用 NaDMA 公开记录的马来西亚洪水/滑坡事件,看系统在事件发生时是否会给出 Warning 或 Danger。(2) 用户研究 —— 找一小批登山者,一个月内对比系统建议和他们自己的判断。两种方法都是业务气象学界的标准做法。" |
Closing 30 seconds / 收尾 30 秒
| English (say this) | 中文(口头要点) |
|---|---|
"Sir, to summarise: I have addressed every point of your feedback — the missing Y is now derived, the documentation matches the data, the model is trained and evaluated before the app, and the choice of classification over regression is justified by the safety-critical nature of the application. The code is on GitHub at KyoukoLi/microclimate-x with CI passing, 97 % test coverage, and a published model card. May I have your guidance on the next priorities for Chapter 5?" |
"老师,总结一下:您每条反馈我都已经回应——Y 已经构造好、文档跟数据完全对齐、模型在 app 之前就训好并评估过、分类而不是回归是因为应用本身就是安全关键。代码在 GitHub KyoukoLi/microclimate-x,CI 全过、测试覆盖率 97%、有完整的 model card。请问 Chapter 5 接下来您建议我重点做哪部分?" |
Materials checklist before walking in / 开会前自检清单
- Laptop charged, browser tab open to
docs/dataset.md. - All 6 figures in
figures/rendered to a quick-flip slide deck (or just keep the PNG files in a single Finder window). - GitHub repo page open in another tab, ready to show CI green badge + commit history.
- Frontend
frontend/index.htmlready to demo (openmake runin a terminal before the meeting — not during). -
models/MODEL_CARD.mdopen in a third tab, in case the supervisor asks for written evidence of any number you quote. - This brief (
docs/supervisor_meeting_brief.md) open on screen — but don't read from it word-for-word, treat it as your safety net only.
中文版:
- 笔记本充满电,浏览器开好
docs/dataset.md -
figures/里 6 张图全部预先点开过一次(图片预览快进就行,避免临时加载) - GitHub repo 页面开另一个标签页,CI 绿勾 + commit 历史随时可看
- 前端
make run提前起好(不要开会时才起) -
models/MODEL_CARD.md第三个标签页,老师追问任何数字时打开它 - 本文档开着但不要照念,当兜底用即可