# Supervisor Progress-Update Brief — bilingual script
# 导师进度汇报双语逐字稿
> Follow-up meeting after the v1.0.0 hardening pass on 2026-05-11.
> Walk-through order is unchanged: **dataset → model → app → next steps**.
> Open this file on screen during the meeting; do not read word-for-word.
>
> 紧接 2026-05-11 v1.0.0 强化提交之后的**进度汇报**会议。
> 顺序一律不变:**dataset → model → app → 下一步**。
> 开会时屏幕上打开本文档,**不要照念**,当兜底用即可。
---
## 0. What you need to do — three time windows
## 0. 你要做的事 —— 三个时间窗口
### 0.1 Before the meeting (T-15 min) / 会前 15 分钟
| ☐ | English | 中文 |
|---|---|---|
| ☐ | Charge laptop ≥ 80 %; charger in bag. | 笔记本充满 ≥ 80%,充电器带上。 |
| ☐ | `cd ~/Projects/microclimate-x && git pull && git status` — must print "working tree clean". | 拉最新代码,确认 working tree clean。 |
| ☐ | `make run` in **terminal A** (leave it running). | 终端 A 起后端,**不要关**。 |
| ☐ | `curl -s http://localhost:8000/api/health \| python3 -m json.tool` in **terminal B** — verify `"ml_loaded": true`. | 终端 B 验证健康检查,`ml_loaded` 必须为 `true`。 |
| ☐ | Open the 10 browser tabs in the order from `docs/MEETING_CHEAT_SHEET.md` §0 — **app tab is last**. | 按 cheat-sheet §0 顺序开 10 个标签页,**app 标签放最后**。 |
| ☐ | This file (`docs/progress_update_brief.md`) open on a separate screen / phone. | 把本文档单独开在副屏或手机上。 |
| ☐ | Phone on silent. Deep breath. | 手机静音,深呼吸。 |
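
The health check in the table above can also be scripted, so the pass/fail decision does not depend on reading JSON by eye. This is a minimal sketch using only the standard library; the endpoint URL and the `ml_loaded` key come from the checklist, while the function names and the timeout are assumptions made for illustration.

```python
import json
from urllib.request import urlopen

# Endpoint taken from the checklist; adjust if the backend runs elsewhere.
HEALTH_URL = "http://localhost:8000/api/health"


def ml_is_loaded(payload: dict) -> bool:
    """Return True only when the payload carries a literal JSON `true`."""
    # A string such as "true" or a missing key counts as a failure.
    return payload.get("ml_loaded") is True


def fetch_health(url: str = HEALTH_URL, timeout: float = 3.0) -> dict:
    """GET the health endpoint and decode its JSON body (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With `make run` active in terminal A, calling `ml_is_loaded(fetch_health())` from a Python shell should return `True`; anything else means the model did not load and the demo should not start.
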
### 0.2 During the meeting (≈ 8 minutes) / 会中 ≈ 8 分钟
| Block | EN heading | 中文标题 | Time |
|---|---|---|---|
| 1 | Opening 30 s | 开场 30 秒 | 0:00 → 0:30 |
| 2 | What changed since last meeting | 自上次会以来的进展 | 0:30 → 2:00 |
| 3 | Live demo — dataset → model → app | 现场演示(顺序不变) | 2:00 → 5:00 |
| 4 | Next steps for Chapter 5 | Chapter 5 下一步 | 5:00 → 6:30 |
| 5 | Asks + closing | 请示 + 收尾 | 6:30 → 8:00 |
### 0.3 After the meeting (T+24 h) / 会后 24 小时内
| ☐ | English | 中文 |
|---|---|---|
| ☐ | Write meeting minutes — capture every supervisor decision in `docs/meeting_log_<date>.md`. | 写会议纪要,把老师每条决定记到 `docs/meeting_log_<日期>.md`。 |
| ☐ | Open one GitHub issue per agreed action item (label: `chapter-5`). | 每个 action item 在 GitHub 开一个 issue,打 `chapter-5` 标签。 |
| ☐ | Email a 3-bullet summary back to the supervisor for written confirmation. | 给老师发 3 条要点的总结邮件,留书面确认。 |
| ☐ | Update `README.md` §9 Roadmap — tick boxes that were signed off. | 更新 `README.md` 第 9 节 Roadmap,把通过的项打勾。 |
| ☐ | Tag a new release if scope was confirmed (`git tag v1.1.0-rc.1`). | 如果范围确认了,打个新 tag (`v1.1.0-rc.1`)。 |
---
## 1. Opening 30 seconds / 开场 30 秒
| English (say this) | 中文(口头要点) |
|---|---|
| "Sir, thank you for your time. Following up on our last session, I've completed a production-grade hardening pass — version 1.0.0 — and the full pipeline is now reproducible end-to-end. May I walk you through what's new in the same order as before — dataset, then model, then app — and finish with my proposed plan for Chapter 5?" | "老师感谢您抽时间。接着上次的内容,我做完了**v1.0.0 工程化强化**,整条流水线现在可以**端到端复现**。我按上次的顺序——**dataset、model、app**——给您过一遍新的进展,最后讲我对 Chapter 5 的下一步计划,可以吗?" |
**Why this opening**: it (a) restates the supervisor's preferred process order without him having to ask, (b) signals real forward progress rather than just polish, and (c) ends with an explicit ask for direction on Chapter 5, which is what *he* wants to talk about.
**为什么这样开场**:(a) 不用他提就主动按他的流程顺序;(b) 强调是**前进了**而不是只在抛光;(c) 用对 Chapter 5 的请示收尾,**这正是他想聊的话题**。
---
## 2. What changed since the last meeting / 自上次会议以来的进展
> ~ 90 seconds. Stay on the GitHub repo tab — point to the commit history,
> the green CI badge, the v1.0.0 release.
>
> ≈ 90 秒。停在 GitHub repo 标签页,指给老师看 commit 历史、CI 绿勾、v1.0.0 release。
| Area | English | 中文 |
|---|---|---|
| **Backend hardening** | "I added a request-ID middleware, a typed `ErrorResponse` contract so no bare HTML 500s leak, structured logging, and an enriched `/api/health` exposing uptime, cache stats, and the loaded ML feature schema." | "后端我加了 **request-ID 中间件**、**类型化错误协议** `ErrorResponse`(不再泄漏裸 HTML 500)、结构化日志、以及**升级版 `/api/health`**(暴露 uptime、缓存统计、ML 特征 schema)。" |
| **ML pipeline** | "I shipped `scripts/4_evaluate_model.py` which produces six publication-quality figures plus a machine-readable `evaluation_summary.json`. I also wrote a HuggingFace-style `MODEL_CARD.md` covering intended use, training data, metrics, limitations, and ethical considerations." | "ML 流水线加了 **评估脚本** `scripts/4_evaluate_model.py`,自动出 6 张论文级别图 + 一份 `evaluation_summary.json`。还写了 HuggingFace 风格的 **MODEL_CARD.md**,覆盖用途、训练数据、指标、局限、伦理考量。" |
| **Tests + CI** | "Total tests went from 19 to **70**, backend coverage is **97 %**. CI runs on Python 3.9 / 3.11 / 3.12 plus a Docker image-build smoke test." | "测试数从 19 涨到 **70**,**后端覆盖率 97%**。CI 跑 Python 3.9/3.11/3.12 矩阵,外加 Docker 镜像构建烟测。" |
| **Dev-ex** | "Multi-stage Dockerfile, docker-compose, Makefile single-word recipes, pre-commit hooks. The whole project is now `docker compose up --build` away from a clean machine." | "多阶段 Dockerfile + compose + Makefile 单词命令 + pre-commit hooks。**新机器一句 `docker compose up --build` 就能跑起来**。" |
| **Documentation** | "Three new docs — `architecture.md`, `thresholds.md` with citations for every Veto threshold, and `pipeline_order.md` which explicitly enforces the dataset → model → app order you asked for." | "三份新文档——`architecture.md`、`thresholds.md`(每个 Veto 阈值都附学术引用)、以及 `pipeline_order.md`(**显式按您要求的 dataset→model→app 顺序写死**)。" |
**Artefact to show**: the GitHub commit history page; the green CI badge on the README; `CHANGELOG.md` v1.0.0 entry.
**展示物**:GitHub commit 历史页;README 上的 CI 绿勾;`CHANGELOG.md` 中 v1.0.0 那一段。
---
## 3. Live demo — dataset → model → app / 现场演示(顺序不变)
> ~ 3 minutes. Same order as the 5/11 dry-run script — no surprises for the supervisor.
>
> ≈ 3 分钟。跟 5/11 的脚本完全一样的顺序,**老师不会被打乱节奏**。
### 3.1 Dataset (Tab `docs/dataset.md`) — 30 s
| EN | 中文 |
|---|---|
| "Same dataset as last time — ERA5 reanalysis, 5 Malaysian mountain sites, 175 315 hourly rows. The Y column `is_rain_event` is derived in one line and documented in §5. No change here, just confirming the foundation is unchanged." | "数据集跟上次一样——ERA5 再分析、马来西亚 5 个山地点位、17.5 万行小时数据。Y 列 `is_rain_event` 一行代码构造,文档在 §5。**这里没有变**,只是确认地基没动。" |
### 3.2 Model (Tabs `01_roc_curve.png` → `03_calibration_curve.png` → `04_threshold_sweep.png` → `05_feature_importance.png`) — 90 s
| EN | 中文 |
|---|---|
| "Same model as last time: Random Forest, time-based split, τ = 0.20. Test ROC AUC **0.871**, PR AP **0.750**, Brier **0.138**, recall **93.4 %**. What's new are the **six figures and the model card**; every number you see here is reproducible from `make evaluate`." | "模型跟上次一样——RF、时间序列切分、τ = 0.20。测试 AUC **0.871**、PR AP **0.750**、Brier **0.138**、召回率 **93.4%**。**新东西**是 6 张图 + model card——上面任何一个数字都可以用 `make evaluate` 复现。" |
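
If the supervisor asks what the Brier score or the recall at τ = 0.20 actually measures, the definitions fit in a few lines. The sketch below is not the project's `scripts/4_evaluate_model.py`; it uses invented toy labels and probabilities, and it assumes a prediction counts as positive when the probability is at least τ.

```python
TAU = 0.20  # operating threshold quoted in the script


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def recall_at(y_true, y_prob, tau=TAU):
    """Share of true rain events whose probability reaches the threshold."""
    # Assumption: p >= tau is a positive prediction (the project may use >).
    tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= tau)
    fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < tau)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy data for illustration only; these are not rows from the ERA5 dataset.
y_true = [1, 0, 1, 0, 1]
y_prob = [0.90, 0.10, 0.30, 0.40, 0.15]

toy_brier = brier_score(y_true, y_prob)
toy_recall = recall_at(y_true, y_prob)
```

A low τ trades precision for recall, which is why the script reports a high recall next to a moderate average precision.
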
### 3.3 App (Tab `http://localhost:8000/app/`) — 60-90 s
| EN | 中文 |
|---|---|
| "Step 3, the app — opened **last** as agreed. Two demo scenarios. First, Genting Highlands — a slope at 1865 m inside the training distribution. The model gives a moderate rain probability; the rule engine picks up orographic lift; the four mini-gauges decompose the risk by hazard type." | "第三步 app——按约定**最后才开**。两个 demo 场景。第一个云顶高原——1865 m 的山坡,**在训练分布之内**。模型给中等降雨概率,规则引擎检测到地形抬升,四个 mini-gauge 把风险按灾害类型拆解。" |
| "Second, Mt Everest — completely out of distribution. The model alone would say 'safe'. The Veto cascade fires three independent overrides — hypoxia, frostbite, gale — and the composite is forced to Danger. There's a unit test for exactly this: `test_mt_everest_veto_hypoxia`." | "第二个珠峰——**完全分布外**。光看模型会说"安全",但 Veto 级联触发**三个独立否决**——缺氧、冻伤、大风——综合分被强制设为 Danger。**专门有单元测试覆盖这个场景**:`test_mt_everest_veto_hypoxia`。" |
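
The override logic described above can be sketched as a list of independent rules, any one of which forces the composite level to Danger. Everything numeric here is a placeholder: the thresholds are not the cited values in `thresholds.md`, the weather readings are invented, and the names `Conditions`, `VETO_RULES` and `composite_level` are hypothetical rather than the project's API.

```python
from dataclasses import dataclass


@dataclass
class Conditions:
    altitude_m: float
    temperature_c: float
    wind_kmh: float


# Placeholder thresholds for illustration only; see thresholds.md for the
# values the project actually uses and their citations.
VETO_RULES = {
    "hypoxia": lambda c: c.altitude_m >= 5500,
    "frostbite": lambda c: c.temperature_c <= -25,
    "gale": lambda c: c.wind_kmh >= 62,
}


def fired_vetoes(c: Conditions) -> list:
    """Names of every rule that triggers, evaluated independently."""
    return [name for name, rule in VETO_RULES.items() if rule(c)]


def composite_level(ml_level: str, c: Conditions):
    """Return (final level, fired vetoes); any veto overrides the ML level."""
    vetoes = fired_vetoes(c)
    return ("Danger" if vetoes else ml_level), vetoes


# Invented readings: an Everest-like summit and a Genting-like slope.
everest = Conditions(altitude_m=8849, temperature_c=-30, wind_kmh=80)
genting = Conditions(altitude_m=1865, temperature_c=18, wind_kmh=10)
```

Because each rule is checked on its own, the model's out-of-distribution "safe" output cannot mask a hazard, and the list of fired vetoes doubles as an explanation for the user.
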
---
## 4. Next steps for Chapter 5 / Chapter 5 下一步
> ~ 90 seconds. **This is the section the supervisor will react to most.**
> Frame each item as a concrete deliverable + estimated time + dependency.
>
> ≈ 90 秒。**老师反应最强烈的就是这一节**。每一项都以"**交付物 + 估时 + 依赖**"形式呈现。
### 4.1 Proposed Chapter 5 work plan / Chapter 5 工作计划
| # | Deliverable | EN one-liner | 中文一句话 | Estimate |
|---|---|---|---|---|
| 5.1 | **Comparative ablation** | "Train LogReg + XGBoost on the same features and report ROC / PR / F2 side-by-side with RF — answers 'why RF?' empirically." | "在同一特征集上训 LogReg + XGBoost,对比 ROC / PR / F2,**用数据回答"为什么选 RF"**。" | 1 week |
| 5.2 | **Hindcast validation** | "Replay 2020-2024 NaDMA-documented Malaysian flood / landslide events and check whether the system would have raised Warning / Danger at the right time. Reports hit-rate, lead-time, false-alarm rate." | "把 2020-2024 NaDMA 公开的马来西亚洪水/滑坡事件**逐一回放**,看系统能否在事发前给出 Warning/Danger。报告命中率、提前量、误报率。" | 2 weeks |
| 5.3 | **Threshold sensitivity** | "Sweep τ ∈ {0.10, 0.15, 0.20, 0.25, 0.30}, plot precision-recall trade-off, and justify the operating point with a cost-of-error analysis." | "扫 τ ∈ {0.10, 0.15, 0.20, 0.25, 0.30},画精度-召回权衡曲线,用**误差代价分析**为最终选点辩护。" | 3 days |
| 5.4 | **Component ablation** | "Compare three system variants — RF only / Rule only / Hybrid — on the held-out test set and on the OOD Mt Everest case. Quantifies the rule-engine contribution." | "对比三个系统变体——**纯 RF / 纯规则 / 混合**——在测试集和 OOD 珠峰场景上的表现。**量化规则引擎的贡献**。" | 4 days |
| 5.5 | **Small user study** *(optional)* | "Recruit 5-8 mountain hikers, run a 4-week panel, log system advice vs. their field judgment. Reports inter-rater agreement (Cohen's κ)." | "招募 5-8 名登山者,4 周面板研究,记录系统建议 vs 他们现场判断,报告 Cohen's κ 一致性。" | 4 weeks |
| 5.6 | **Thesis Chapter 5 draft** | "Pull §5.1-5.5 into a single 12-15 page evaluation chapter with all figures, tables, and discussion." | "把 §5.1-5.5 整合成 12-15 页的评估章节,含全部图表和讨论。" | 1 week (after 5.1-5.4) |
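
Items 5.1 and 5.3 both come down to computing precision, recall and F2 at each candidate threshold. The following is a minimal pure-Python sketch under stated assumptions: the labels and probabilities are toy values, a prediction is positive when the probability is at least τ, and the helper names are hypothetical.

```python
TAUS = [0.10, 0.15, 0.20, 0.25, 0.30]  # grid from item 5.3


def confusion(y_true, y_prob, tau):
    """Count true positives, false positives and false negatives at tau."""
    tp = fp = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = p >= tau
        if predicted and y == 1:
            tp += 1
        elif predicted and y == 0:
            fp += 1
        elif not predicted and y == 1:
            fn += 1
    return tp, fp, fn


def f_beta(precision, recall, beta=2.0):
    """F-beta score; beta = 2 weights recall more heavily than precision."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def sweep(y_true, y_prob, taus=TAUS):
    """One row of metrics per threshold, ready to tabulate or plot."""
    rows = []
    for tau in taus:
        tp, fp, fn = confusion(y_true, y_prob, tau)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        rows.append({"tau": tau, "precision": precision,
                     "recall": recall, "f2": f_beta(precision, recall)})
    return rows


# Toy data for illustration only.
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.35, 0.22, 0.12, 0.28, 0.16, 0.05]
rows = sweep(y_true, y_prob)
```

The same `sweep` output can feed the cost-of-error argument: attach a cost to each false negative and false positive and pick the τ with the lowest total.
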
### 4.2 Decision tree to ask the supervisor / 请示决策树
| # | EN | 中文 |
|---|---|---|
| **Q1** | "Sir, of the five evaluation tracks above, which two should I prioritise for the **next four weeks** before we converge on the Chapter 5 outline?" | "老师,上面 5 条评估方向,**未来四周**您建议我重点做哪两条,然后再收敛到 Chapter 5 大纲?" |
| **Q2** | "Do you want me to include the user study (5.5)? It is the longest item and depends on participant recruitment — I want your call before committing." | "**用户研究 (5.5) 您要不要做**?这一条最长、依赖招募——想请您拍板再投入。" |
| **Q3** | "For the comparative ablation, do you want the comparison framed as 'why RF wins' (defending current choice) or 'what if XGBoost wins' (open exploration)? The framing affects how I report inconclusive results." | "**对比实验**您希望框成"为什么 RF 胜出"(**捍卫现有选择**)还是"如果 XGBoost 更好怎么办"(**开放探索**)?两种 framing 对**模棱两可结果**的报告方式不同。" |
| **Q4** | "Should I treat the Mt Everest OOD test as a thesis-level contribution (a stand-alone subsection on safety) or just an appendix item?" | "**珠峰 OOD 测试**算论文级别的贡献(单独一节讲安全性),还是放附录就够?" |
---
## 5. Asks + closing 60 seconds / 请示 + 收尾 60 秒
| EN (say this) | 中文(口头要点) |
|---|---|
| "Sir, to summarise: since the last meeting I've shipped v1.0.0 — production-grade hardening, 70 tests at 97 % coverage, six evaluation figures, a published model card, full Docker reproducibility. The pipeline order is unchanged from what you asked: dataset, model, app. For Chapter 5 I have five evaluation tracks scoped; I'd like your guidance on which two to prioritise for the next four weeks." | "老师,总结:自上次会议以来交付了 **v1.0.0**——工程化强化、70 个测试 97% 覆盖率、6 张评估图、model card、Docker 全复现。流水线顺序按您要求**没动**:dataset、model、app。Chapter 5 我列了 5 条评估方向,**接下来四周您建议我先做哪两条**?" |
| "I'll send you a 3-bullet email summary by tomorrow morning so we have written agreement on the priorities. Thank you for your time." | "明早之前给您发 3 条要点的邮件总结,**留个书面确认**。谢谢老师。" |
---
## 6. Q&A defensive lines / Q&A 兜底话术
> Follow-up questions that this update in particular is likely to trigger.
> The classic Q1-Q7 from the 5/11 brief still apply; they are not repeated
> here.
>
> **针对本次进度汇报**可能出现的追问(5/11 那份的经典 Q1-Q7 仍然有效,
> 不重复罗列)。
### Q-N1 — "Why are you spending time on tests and Docker instead of the thesis?"
### Q-N1 ——为什么你在写测试和 Docker 上花时间,不写论文?
| EN | 中文 |
|---|---|
| "Sir, the v1.0.0 hardening was a one-time investment to make every Chapter 5 number reproducible by the examiner with a single command. Without it, every evaluation result would be a black box — the examiner could not verify the AUC of 0.871 herself. With `make evaluate` reproducing all six figures byte-for-byte, the thesis claims become falsifiable. From this point on, all my time goes to evaluation and writing." | "老师,v1.0.0 的强化是**一次性投资**——为了让评审老师**用一行命令就能复现 Chapter 5 的每一个数字**。没有它,AUC = 0.871 就是黑盒,**评审无法独立验证**。现在 `make evaluate` 能把 6 张图按字节复现,论文的每个 claim 都**可证伪**。从今天起所有时间都给评估和写作。" |
### Q-N2 — "Why hasn't the model improved since last time?"
### Q-N2 ——模型为什么自上次以后没提升?
| EN | 中文 |
|---|---|
| "Two reasons. First, your instruction last time was to *consolidate* dataset and model before adding more capacity, which is what I did. Second, the bottleneck right now is **not the model** but the **rule engine's coverage of OOD scenarios**, which is a Chapter 5 contribution rather than a hyperparameter tweak. I'd rather report a defensible 0.871 with a calibrated rule engine than chase 0.88 with an unprincipled stack." | "两个理由:(1) 您上次的指示是**先把 dataset 和 model 巩固好**再加复杂度——我严格照做了。(2) **当前瓶颈不是模型本身**,而是**规则引擎对 OOD 场景的覆盖**——这是 Chapter 5 的研究贡献,不是调超参。我宁愿报一个**可辩护的 0.871** 加一个校准好的规则引擎,**也不要不讲原理地堆栈到 0.88**。" |
### Q-N3 — "Show me one concrete weakness you have not yet fixed."
### Q-N3 ——给我说一个你目前**还没修**的具体弱点。
| EN | 中文 |
|---|---|
| "Honestly, Sir, the biggest one is `cape_jkg` — the ERA5 archive returns predominantly zero CAPE for these Malaysian coordinates, which is a known coverage gap. The Random Forest learns nothing from it (0 % importance). The rule engine still uses live Open-Meteo CAPE at inference time, so the production output is fine, but the *training* signal for thunderstorm risk is weaker than I'd like. I plan to address this in §5.4 ablation by quantifying how much it matters." | "老实说,老师,最大的弱点是 **`cape_jkg`**——ERA5 在这些马来西亚坐标上的 CAPE 几乎全为零(**已知覆盖缺口**),**RF 完全没学到东西**(特征重要性 0%)。规则引擎在推理时用的是 Open-Meteo 实时 CAPE,所以生产输出没问题,但**雷暴风险的训练信号**比我希望的弱。计划在 §5.4 消融实验里**量化它的影响**。" |
### Q-N4 — "When can I see the first draft of Chapter 5?"
### Q-N4 ——Chapter 5 初稿什么时候能给我看?
| EN | 中文 |
|---|---|
| "If you sign off on tracks 5.1 + 5.2 + 5.4 today, the data collection finishes in 3 weeks, writing takes 1 week, so you'd have a draft in **4 weeks from today**. If you also want 5.5 (user study), add 4 weeks. I'll lock the date the moment you confirm the scope." | "如果今天您拍板 **5.1 + 5.2 + 5.4** 三条,**3 周收数据 + 1 周写作 = 4 周后给您初稿**。如果再加 **5.5(用户研究)**,再加 4 周。**您一确认范围,我立刻锁定交稿日**。" |
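
Before quoting a delivery date, it helps to sum the estimates from the Section 4.1 table. The sketch below assumes one week equals seven calendar days and that tracks run strictly back to back; under those assumptions 5.1 + 5.2 + 5.4 come to 25 days of experiments plus 7 days of writing, slightly more than the four weeks quoted, so the spoken answer implicitly relies on some tracks overlapping. The dictionary and function names are hypothetical.

```python
from datetime import date, timedelta

# Estimates from the Section 4.1 table, converted to calendar days
# (assumption: 1 week = 7 days).
ESTIMATE_DAYS = {"5.1": 7, "5.2": 14, "5.3": 3, "5.4": 4, "5.5": 28, "5.6": 7}


def sequential_days(tracks):
    """Total duration if the given tracks run one after another."""
    return sum(ESTIMATE_DAYS[t] for t in tracks)


def draft_date(start, tracks, writing="5.6"):
    """Earliest draft date with no overlap between tracks."""
    return start + timedelta(days=sequential_days(tracks) + ESTIMATE_DAYS[writing])


core = ["5.1", "5.2", "5.4"]
experiment_days = sequential_days(core)        # experiments only
no_overlap_draft = draft_date(date(2026, 5, 13), core)
```

If the supervisor adds 5.5, rerun the same helpers with the extra track so the promised date is based on the table rather than on mental arithmetic.
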
---
## 7. Materials checklist before walking in / 开会前自检清单
```
☐ Laptop ≥ 80 % battery, charger in bag
☐ Terminal A: `make run` is running, do not close
☐ Terminal B: `curl /api/health` returned ml_loaded: true within last 5 min
☐ 10 browser tabs open in cheat-sheet §0 order — app tab is LAST
☐ This file open on a separate screen / phone, NOT to be read aloud
☐ docs/MEETING_CHEAT_SHEET.md open as a fall-back
☐ models/MODEL_CARD.md open in case any number is challenged
☐ figures/evaluation_summary.json downloadable on demand
☐ Phone on silent
☐ One deep breath. You shipped v1.0.0. You're prepared.
```
```
☐ 笔记本电池 ≥ 80%,充电器已带
☐ 终端 A:`make run` 跑着,不要关
☐ 终端 B:5 分钟内 `curl /api/health` 返回 ml_loaded: true
☐ 10 个浏览器标签页按 cheat-sheet §0 顺序开好——app 标签放最后
☐ 本文档开在副屏 / 手机,不要照念
☐ docs/MEETING_CHEAT_SHEET.md 开着兜底
☐ models/MODEL_CARD.md 开着,老师质疑任何数字立刻打开
☐ figures/evaluation_summary.json 随时可发
☐ 手机静音
☐ 深呼吸。v1.0.0 已经交付。你准备好了。
```
---
## 8. Cross-references / 相关文档索引
| Topic | File |
|---|---|
| Original 5/11 reply to 4/15 feedback | [`supervisor_meeting_brief.md`](supervisor_meeting_brief.md) |
| One-page cheat sheet (tab order, demo script) | [`MEETING_CHEAT_SHEET.md`](MEETING_CHEAT_SHEET.md) |
| Pipeline order ASCII chart | [`pipeline_order.md`](pipeline_order.md) |
| Dataset spec + Y derivation | [`dataset.md`](dataset.md) |
| Architecture deep-dive | [`architecture.md`](architecture.md) |
| Threshold citations | [`thresholds.md`](thresholds.md) |
| Model card | [`../models/MODEL_CARD.md`](../models/MODEL_CARD.md) |
| Evaluation summary JSON | [`../figures/evaluation_summary.json`](../figures/evaluation_summary.json) |
| What changed in v1.0.0 | [`../CHANGELOG.md`](../CHANGELOG.md) |
---
> *Generated 2026-05-13 for the MicroClimate-X progress-update meeting at UKM.
> 此页为 2026-05-13 UKM 毕业设计 MicroClimate-X 进度汇报准备文档。*