Supervisor Progress-Update Brief — bilingual script

导师进度汇报双语逐字稿

Follow-up meeting after the v1.0.0 hardening pass on 2026-05-11. Walk-through order is unchanged: dataset → model → app → next steps. Open this file on screen during the meeting as a fallback; do not read it word-for-word.

紧接 2026-05-11 v1.0.0 强化提交之后的进度汇报会议。顺序一律不变:dataset → model → app → 下一步。开会时屏幕上打开本文档,不要照念,当兜底用即可。


0. What you need to do — three time windows

0. 你要做的事 —— 三个时间窗口

0.1 Before the meeting (T-15 min) / 会前 15 分钟

| English | 中文 |
|---|---|
| Charge the laptop to ≥ 80 %; pack the charger. | 笔记本充电到 ≥ 80%,充电器带上。 |
| `cd ~/Projects/microclimate-x && git pull && git status`: the output must report `working tree clean`. | 拉最新代码,确认 working tree clean。 |
| Run `make run` in terminal A and leave it running. | 终端 A 运行 `make run` 起后端,不要关。 |
| In terminal B run `curl -s http://localhost:8000/api/health \| python3 -m json.tool` and verify `"ml_loaded": true`. | 终端 B 跑健康检查,`ml_loaded` 必须为 `true`。 |
| Open the 10 browser tabs in the order given in `docs/MEETING_CHEAT_SHEET.md` §0; the app tab goes last. | 按 cheat-sheet §0 顺序开 10 个标签页,app 标签放最后。 |
| Open this file (`docs/progress_update_brief.md`) on a separate screen or phone. | 把本文档单独开在副屏或手机上。 |
| Phone on silent. Take a deep breath. | 手机静音,深呼吸。 |
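If you would rather not eyeball the JSON in terminal B, you can script the same check so that it fails loudly. Below is a minimal, standard-library-only sketch. It assumes `/api/health` returns a JSON object with a top-level boolean `ml_loaded`. The helper names `ml_ready`, `fetch_health` and `main` are hypothetical, not part of the repo.

```python
import json
import sys
import urllib.request

# URL from the pre-meeting table; adjust the port if `make run` serves elsewhere.
HEALTH_URL = "http://localhost:8000/api/health"


def ml_ready(payload: object) -> bool:
    """True only if the payload is a JSON object whose ml_loaded is exactly True.

    Assumes ml_loaded is a top-level key; a missing key or the string "true" counts as not ready.
    """
    return isinstance(payload, dict) and payload.get("ml_loaded") is True


def fetch_health(url: str = HEALTH_URL, timeout: float = 3.0) -> object:
    """GET the health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def main() -> int:
    """Return 0 when the backend answers and reports the model as loaded, 1 otherwise."""
    try:
        health = fetch_health()
    except (OSError, ValueError) as exc:  # connection refused, timeout, or a non-JSON body
        print(f"backend not reachable or not JSON: {exc}", file=sys.stderr)
        return 1
    if not ml_ready(health):
        print(f"ml_loaded is not true in: {health!r}", file=sys.stderr)
        return 1
    print("OK: backend up, ML model loaded")
    return 0
```

Call it from a script of your own with `raise SystemExit(main())`. A non-zero exit code means the demo should wait until `make run` has finished loading the model.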

0.2 During the meeting (≈ 8 minutes) / 会中 ≈ 8 分钟

| Block | EN heading | 中文标题 | Time |
|---|---|---|---|
| 1 | Opening (30 s) | 开场 30 秒 | 0:00 → 0:30 |
| 2 | What changed since last meeting | 自上次会以来的进展 | 0:30 → 2:00 |
| 3 | Live demo: dataset → model → app | 现场演示(顺序不变) | 2:00 → 5:00 |
| 4 | Next steps for Chapter 5 | Chapter 5 下一步 | 5:00 → 6:30 |
| 5 | Asks + closing | 请示 + 收尾 | 6:30 → 8:00 |

0.3 After the meeting (T+24 h) / 会后 24 小时内

| English | 中文 |
|---|---|
| Write meeting minutes: record every supervisor decision in `docs/meeting_log_<date>.md`. | 写会议纪要,把老师每条决定记到 `docs/meeting_log_<日期>.md`。 |
| Open one GitHub issue per agreed action item (label: `chapter-5`). | 每个 action item 在 GitHub 开一个 issue,打 `chapter-5` 标签。 |
| Email a 3-bullet summary to the supervisor for written confirmation. | 给老师发 3 条要点的总结邮件,留书面确认。 |
| Update `README.md` §9 Roadmap: tick the items that were signed off. | 更新 `README.md` 第 9 节 Roadmap,把通过的项打勾。 |
| If the scope was confirmed, tag a new release (`git tag v1.1.0-rc.1`) and push it (`git push origin v1.1.0-rc.1`). | 如果范围确认了,打个新 tag(`v1.1.0-rc.1`)并推送到远端。 |

1. Opening 30 seconds / 开场 30 秒

| English (say this) | 中文(口头要点) |
|---|---|
| "Sir, thank you for your time. Following up on our last session, I've completed a production-grade hardening pass, version 1.0.0, and the full pipeline is now reproducible end-to-end. May I walk you through what's new in the same order as before (dataset, then model, then app) and finish with my proposed plan for Chapter 5?" | "老师,感谢您抽时间。接着上次的内容,我做完了 v1.0.0 工程化强化,整条流水线现在可以端到端复现。我按上次的顺序(dataset、model、app)给您过一遍新的进展,最后讲我对 Chapter 5 的下一步计划,可以吗?" |

Why this opening: it (a) restates the supervisor's preferred process order without his having to ask, (b) signals forward progress rather than mere polish, and (c) ends with an explicit request for direction on Chapter 5, which is what he wants to talk about.

为什么这样开场:(a) 不用他提就主动按他偏好的流程顺序;(b) 强调是前进了,而不是只在抛光;(c) 用对 Chapter 5 的请示收尾,这正是他想聊的话题。


2. What changed since the last meeting / 自上次会议以来的进展

~ 90 seconds. Stay on the GitHub repo tab — point to the commit history, the green CI badge, the v1.0.0 release.

≈ 90 秒。停在 GitHub repo 标签页,指给老师看 commit 历史、CI 绿勾、v1.0.0 release。

| Area | English | 中文 |
|---|---|---|
| Backend hardening | "I added a request-ID middleware, a typed `ErrorResponse` contract so that no bare HTML 500 pages leak, structured logging, and an enriched `/api/health` endpoint that exposes uptime, cache stats and the loaded ML feature schema." | "后端我加了 request-ID 中间件、类型化错误协议 `ErrorResponse`(不再泄漏裸 HTML 500)、结构化日志,以及升级版 `/api/health`(暴露 uptime、缓存统计、ML 特征 schema)。" |
| ML pipeline | "I shipped `scripts/4_evaluate_model.py`, which produces six publication-quality figures plus a machine-readable `evaluation_summary.json`. I also wrote a Hugging Face-style `MODEL_CARD.md` covering intended use, training data, metrics, limitations and ethical considerations." | "ML 流水线加了评估脚本 `scripts/4_evaluate_model.py`,自动出 6 张论文级别的图 + 一份 `evaluation_summary.json`。还写了 Hugging Face 风格的 `MODEL_CARD.md`,覆盖用途、训练数据、指标、局限、伦理考量。" |
| Tests + CI | "The test count went from 19 to 70, and backend coverage is 97 %. CI runs a Python 3.9 / 3.11 / 3.12 matrix plus a Docker image-build smoke test." | "测试数从 19 涨到 70,后端覆盖率 97%。CI 跑 Python 3.9 / 3.11 / 3.12 矩阵,外加 Docker 镜像构建冒烟测试。" |
| Developer experience | "A multi-stage Dockerfile, docker-compose, single-word Makefile recipes and pre-commit hooks. On a clean machine the whole project is now one `docker compose up --build` away." | "多阶段 Dockerfile + compose + Makefile 单词命令 + pre-commit hooks。新机器一句 `docker compose up --build` 就能跑起来。" |
| Documentation | "Three new docs: `architecture.md`; `thresholds.md`, with a citation for every Veto threshold; and `pipeline_order.md`, which spells out the dataset → model → app order you asked for." | "三份新文档:`architecture.md`、`thresholds.md`(每个 Veto 阈值都附学术引用),以及 `pipeline_order.md`(明确写死您要求的 dataset → model → app 顺序)。" |
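If the supervisor asks what a "typed error contract" means in practice, you can show the idea without any web framework. This is a minimal sketch under stated assumptions:

- The framework behind `/api/health` is not named in this brief.
- The `X-Request-ID` header is a common convention, not a confirmed detail of this project.
- The three `ErrorResponse` fields and all function names are hypothetical.

The sketch shows three properties: every failure returns the same JSON shape, the response carries a request ID that also appears in the server log line, and the exception text is never echoed to the client.

```python
import json
import logging
import uuid
from dataclasses import asdict, dataclass
from typing import Mapping, Tuple

logger = logging.getLogger("microclimate.errors")  # hypothetical logger name

REQUEST_ID_HEADER = "X-Request-ID"  # assumed header name (common convention, not confirmed here)


@dataclass(frozen=True)
class ErrorResponse:
    """Hypothetical error contract: every failure is JSON with the same three fields."""

    error: str       # short machine-readable code, e.g. "internal_error"
    detail: str      # message that is safe to show to the client
    request_id: str  # echoed so a client report can be matched to the server log line


def resolve_request_id(headers: Mapping[str, str]) -> str:
    """Reuse the caller's request ID if one was sent (case-insensitive match), else mint one."""
    for name, value in headers.items():
        if name.lower() == REQUEST_ID_HEADER.lower() and value.strip():
            return value.strip()
    return uuid.uuid4().hex


def internal_error(exc: Exception, request_id: str) -> Tuple[int, bytes]:
    """Turn an unhandled exception into (status, JSON body) without leaking its message."""
    # The full exception goes to the server log only, tagged with the request ID.
    logger.error("unhandled %s [request_id=%s]", type(exc).__name__, request_id, exc_info=exc)
    body = ErrorResponse(
        error="internal_error",
        detail="Unexpected server error; quote the request ID when reporting it.",
        request_id=request_id,
    )
    return 500, json.dumps(asdict(body)).encode("utf-8")
```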

Artefacts to show: the GitHub commit-history page, the green CI badge on the README, and the v1.0.0 entry in CHANGELOG.md.

展示物:GitHub commit 历史页;README 上的 CI 绿勾;CHANGELOG.md 中 v1.0.0 那一段。


3. Live demo — dataset → model → app / 现场演示(顺序不变)

~ 3 minutes. Same order as the 5/11 dry-run script — no surprises for the supervisor.

≈ 3 分钟。跟 5/11 的脚本完全一样的顺序,老师不会被打乱节奏。

3.1 Dataset (Tab docs/dataset.md) — 30 s

| EN | 中文 |
|---|---|
| "Same dataset as last time: ERA5 reanalysis, 5 Malaysian mountain sites, 175 315 hourly rows. The Y column `is_rain_event` is derived in one line and documented in §5. Nothing has changed here; I'm just confirming the foundation is the same." | "数据集跟上次一样:ERA5 再分析、马来西亚 5 个山地点位、约 17.5 万行小时数据。Y 列 `is_rain_event` 一行代码构造,文档在 §5。这里没有变,只是确认地基没动。" |
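If the supervisor asks "what exactly is Y?", a one-line label derivation looks roughly like the sketch below. The column name `precipitation_mm` and the 0.1 mm cut-off are both hypothetical placeholders. The actual rule is the one documented in §5 of `docs/dataset.md`.

```python
from typing import Dict, List

# Hypothetical placeholders: the real column name and cut-off are defined in docs/dataset.md §5.
PRECIP_COLUMN = "precipitation_mm"
RAIN_THRESHOLD_MM = 0.1


def add_is_rain_event(rows: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Return copies of the hourly rows with a 0/1 is_rain_event label appended."""
    return [
        {**row, "is_rain_event": int(row[PRECIP_COLUMN] >= RAIN_THRESHOLD_MM)}
        for row in rows
    ]
```

In pandas the same idea is the one-liner `df["is_rain_event"] = (df["precipitation_mm"] >= 0.1).astype(int)`, with the same placeholder names.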

3.2 Model (Tabs `01_roc_curve.png`, `03_calibration_curve.png`, `04_threshold_sweep.png`, `05_feature_importance.png`): 90 s

| EN | 中文 |
|---|---|
| "Same model as last time: Random Forest, time-based split, decision threshold τ = 0.20. On the test set: ROC AUC 0.871, average precision (PR AP) 0.750, Brier score 0.138, recall 93.4 %. What's new is the six figures plus the model card, and every number you see here is reproducible with `make evaluate`." | "模型跟上次一样:RF、时间序列切分、决策阈值 τ = 0.20。测试集 ROC AUC 0.871、PR AP 0.750、Brier 0.138、召回率 93.4%。新东西是 6 张图 + model card,上面任何一个数字都可以用 `make evaluate` 复现。" |
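If a single number is challenged, it helps to be able to recompute it from saved test-set labels and scores without rerunning the whole pipeline. The standard-library sketch below uses textbook definitions:

- rank-based ROC AUC;
- step-wise average precision, the non-interpolated form that scikit-learn's `average_precision_score` uses;
- Brier score;
- recall at τ.

Two parts are assumptions: treating `p >= τ` as a positive call, and the returned key names, which are hypothetical rather than the schema of `evaluation_summary.json`. Both classes must be present in `y`.

```python
from typing import Dict, Sequence


def roc_auc(y: Sequence[int], p: Sequence[float]) -> float:
    """Rank-based (Mann-Whitney) ROC AUC; tied scores share their average rank."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    ranks = [0.0] * len(p)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and p[order[j + 1]] == p[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank of the tie block
        i = j + 1
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    pos_rank_sum = sum(r for r, t in zip(ranks, y) if t == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(y: Sequence[int], p: Sequence[float]) -> float:
    """AP = sum over score thresholds of (recall step) x precision, highest score first."""
    pairs = sorted(zip(p, y), key=lambda t: -t[0])
    n_pos = sum(y)
    tp = fp = 0
    ap = prev_recall = 0.0
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == score:  # consume one group of tied scores
            tp += pairs[i][1]
            fp += 1 - pairs[i][1]
            i += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * tp / (tp + fp)
        prev_recall = recall
    return ap


def brier(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((s - t) ** 2 for s, t in zip(p, y)) / len(y)


def recall_at(y: Sequence[int], p: Sequence[float], tau: float) -> float:
    """Share of true rain events flagged, assuming a positive call means p >= tau."""
    tp = sum(1 for s, t in zip(p, y) if t == 1 and s >= tau)
    return tp / sum(y)


def summarise(y: Sequence[int], p: Sequence[float], tau: float = 0.20) -> Dict[str, float]:
    """Hypothetical key names; not the schema of figures/evaluation_summary.json."""
    return {
        "roc_auc": roc_auc(y, p),
        "pr_ap": average_precision(y, p),
        "brier": brier(y, p),
        "recall_at_tau": recall_at(y, p, tau),
        "tau": tau,
    }
```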

3.3 App (Tab http://localhost:8000/app/) — 60-90 s

| EN | 中文 |
|---|---|
| "Step 3, the app, opened last as agreed. Two demo scenarios. First, Genting Highlands: a slope at 1865 m, inside the training distribution. The model gives a moderate rain probability, the rule engine picks up orographic lift, and the four mini-gauges decompose the risk by hazard type." | "第三步 app,按约定最后才开。两个 demo 场景。第一个是云顶高原:1865 m 的山坡,在训练分布之内。模型给中等降雨概率,规则引擎检测到地形抬升,四个 mini-gauge 把风险按灾害类型拆解。" |
| "Second, Mt Everest: completely out of distribution. The model alone would say 'safe'. The Veto cascade fires three independent overrides (hypoxia, frostbite, gale), and the composite is forced to Danger. There's a unit test for exactly this case: `test_mt_everest_veto_hypoxia`." | "第二个是珠峰:完全分布外。光看模型会说‘安全’,但 Veto 级联触发三个独立否决(缺氧、冻伤、大风),综合分被强制设为 Danger。专门有单元测试覆盖这个场景:`test_mt_everest_veto_hypoxia`。" |
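The Everest scenario rests on one rule: any single veto overrides the model. Below is a minimal sketch of that cascade logic. The three cut-offs are placeholders chosen for illustration, not the cited thresholds in `docs/thresholds.md`; only the 62 km/h figure, the lower bound of Beaufort force 8, is a standard value. `Conditions`, `veto_reasons` and `composite_level` are hypothetical names, not the repo's API.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Placeholder cut-offs for illustration only; the cited values live in docs/thresholds.md.
HYPOXIA_ALTITUDE_M = 5500.0     # hypothetical "extreme altitude" cut-off
FROSTBITE_WIND_CHILL_C = -28.0  # hypothetical wind-chill cut-off for frostbite risk
GALE_WIND_KMH = 62.0            # lower bound of Beaufort force 8 (gale), used as a placeholder

LEVELS = ("Safe", "Warning", "Danger")  # levels named in this brief; the real scale may differ


@dataclass(frozen=True)
class Conditions:
    altitude_m: float
    wind_chill_c: float
    wind_kmh: float


def veto_reasons(c: Conditions) -> List[str]:
    """Check every veto independently so all firing overrides are reported, not just the first."""
    reasons = []
    if c.altitude_m >= HYPOXIA_ALTITUDE_M:
        reasons.append("hypoxia")
    if c.wind_chill_c <= FROSTBITE_WIND_CHILL_C:
        reasons.append("frostbite")
    if c.wind_kmh >= GALE_WIND_KMH:
        reasons.append("gale")
    return reasons


def composite_level(model_level: str, c: Conditions) -> Tuple[str, List[str]]:
    """Any single veto forces Danger, whatever level the model and rules produced."""
    if model_level not in LEVELS:
        raise ValueError(f"unknown level: {model_level!r}")
    reasons = veto_reasons(c)
    return ("Danger" if reasons else model_level), reasons
```

The sketch evaluates each veto independently instead of returning at the first one. That is what lets the app report all three overrides for Everest, as described above. The real behaviour is what `test_mt_everest_veto_hypoxia` checks.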

4. Next steps for Chapter 5 / Chapter 5 下一步

~ 90 seconds. This is the section the supervisor will react to most. Frame each item as a concrete deliverable + estimated time + dependency.

≈ 90 秒。老师反应最强烈的就是这一节。每一项都以"交付物 + 估时 + 依赖"形式呈现。

4.1 Proposed Chapter 5 work plan / Chapter 5 工作计划

| # | Deliverable | EN one-liner | 中文一句话 | Estimate |
|---|---|---|---|---|
| 5.1 | Comparative ablation | "Train LogReg and XGBoost on the same features and report ROC / PR / F2 side by side with RF, answering 'why RF?' empirically." | "在同一特征集上训练 LogReg + XGBoost,与 RF 对比 ROC / PR / F2,用数据回答‘为什么选 RF’。" | 1 week |
| 5.2 | Hindcast validation | "Replay the 2020-2024 Malaysian flood and landslide events documented by NaDMA and check whether the system would have raised Warning / Danger in time. Report hit rate, lead time and false-alarm rate." | "把 2020-2024 年 NaDMA 公开记录的马来西亚洪水/滑坡事件逐一回放,看系统能否在事发前给出 Warning/Danger。报告命中率、提前量、误报率。" | 2 weeks |
| 5.3 | Threshold sensitivity | "Sweep τ ∈ {0.10, 0.15, 0.20, 0.25, 0.30}, plot the precision-recall trade-off, and justify the operating point with a cost-of-error analysis." | "扫 τ ∈ {0.10, 0.15, 0.20, 0.25, 0.30},画精度-召回权衡曲线,用误差代价分析为最终选点辩护。" | 3 days |
| 5.4 | Component ablation | "Compare three system variants (RF only / rules only / hybrid) on the held-out test set and on the OOD Mt Everest case, quantifying what the rule engine contributes." | "对比三个系统变体(纯 RF / 纯规则 / 混合)在测试集和 OOD 珠峰场景上的表现,量化规则引擎的贡献。" | 4 days |
| 5.5 | Small user study (optional) | "Recruit 5-8 mountain hikers, run a 4-week panel, and log the system's advice against their field judgment. Report inter-rater agreement (Cohen's κ)." | "招募 5-8 名登山者,进行 4 周面板研究,记录系统建议与他们现场判断的对比,报告 Cohen's κ 一致性。" | 4 weeks |
| 5.6 | Thesis Chapter 5 draft | "Combine §5.1-5.5 into a single 12-15 page evaluation chapter with all figures, tables and discussion." | "把 §5.1-5.5 整合成 12-15 页的评估章节,含全部图表和讨论。" | 1 week (after 5.1-5.4) |
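Tracks 5.1 and 5.3 both come down to thresholded counts. Below is a minimal sketch of F2 and a cost-weighted τ sweep over the proposed grid. Two parts are assumptions:

- The 5:1 cost of a missed rain event versus a false alarm is a hypothetical placeholder; the cost-of-error analysis would have to justify the real ratio.
- A positive call is assumed to mean `p >= τ`.

```python
from typing import Dict, List, Sequence, Tuple

TAUS = (0.10, 0.15, 0.20, 0.25, 0.30)  # the grid proposed for track 5.3


def confusion(y: Sequence[int], p: Sequence[float], tau: float) -> Tuple[int, int, int]:
    """(TP, FP, FN) when a positive call means p >= tau (an assumed convention)."""
    tp = sum(1 for t, s in zip(y, p) if t == 1 and s >= tau)
    fp = sum(1 for t, s in zip(y, p) if t == 0 and s >= tau)
    fn = sum(1 for t, s in zip(y, p) if t == 1 and s < tau)
    return tp, fp, fn


def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta from counts; beta = 2 treats recall as twice as important as precision."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def sweep(
    y: Sequence[int],
    p: Sequence[float],
    cost_fn: float = 5.0,  # hypothetical cost of a missed rain event
    cost_fp: float = 1.0,  # hypothetical cost of a false alarm
    taus: Sequence[float] = TAUS,
) -> Tuple[List[Dict[str, float]], Dict[str, float]]:
    """Score each tau by F2 and by total error cost; return all rows and the cheapest one."""
    rows: List[Dict[str, float]] = []
    for tau in taus:
        tp, fp, fn = confusion(y, p, tau)
        rows.append({"tau": tau, "f2": f_beta(tp, fp, fn), "cost": cost_fn * fn + cost_fp * fp})
    best = min(rows, key=lambda r: r["cost"])  # min() keeps the first row on ties, i.e. the lowest tau
    return rows, best
```

The sketch reports F2 next to the cost column so the two views can be compared. F2 encodes one fixed preference for recall, while the cost ratio makes that preference explicit and open to challenge.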
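For track 5.2, the three headline numbers need definitions before any replay starts. One possible set is sketched below:

- An event counts as a hit if a Warning or Danger alert fired within 24 hours before it.
- Lead time is measured from the earliest such alert.
- The false-alarm figure is a ratio: alerts with no event in the following 24 hours, divided by all alerts.

The 24-hour window and all function names are hypothetical. "False-alarm rate" can also be defined per quiet period, so the definition is worth fixing with the supervisor.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Sequence, Tuple

# Hypothetical scoring rules for track 5.2; the real definitions still need to be agreed.
WINDOW = timedelta(hours=24)          # longest lead time that still counts as a useful warning
ALERT_LEVELS = {"Warning", "Danger"}  # levels that count as "the system raised an alert"


def hindcast_scores(
    events: Sequence[datetime],
    alerts: Sequence[Tuple[datetime, str]],
    window: timedelta = WINDOW,
) -> Dict[str, float]:
    """Hit rate, mean lead time in hours, and false-alarm ratio for one replayed site."""
    raised = sorted(t for t, level in alerts if level in ALERT_LEVELS)
    leads: List[timedelta] = []
    for event in events:
        in_window = [t for t in raised if event - window <= t < event]
        if in_window:
            leads.append(event - in_window[0])  # earliest qualifying alert gives the longest lead
    # An alert is a false alarm if no event follows it within the window.
    false_alarms = sum(1 for t in raised if not any(t < e <= t + window for e in events))
    return {
        "hit_rate": len(leads) / len(events) if events else 0.0,
        "mean_lead_hours": (
            sum(lead.total_seconds() for lead in leads) / 3600 / len(leads) if leads else 0.0
        ),
        "false_alarm_ratio": false_alarms / len(raised) if raised else 0.0,
    }
```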
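If the user study (5.5) goes ahead, its agreement statistic is simple enough to pre-compute on pilot data. Cohen's κ compares the observed agreement p_o with the agreement p_e expected by chance from each rater's label frequencies: κ = (p_o - p_e) / (1 - p_e). The sketch treats the system and one hiker as the two raters. The "go" / "stop" labels are hypothetical advice categories.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa = (p_o - p_e) / (1 - p_e) for two raters labelling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    count_a, count_b = Counter(a), Counter(b)
    labels = set(count_a) | set(count_b)
    p_e = sum(count_a[k] * count_b[k] for k in labels) / (n * n)  # chance agreement from marginals
    if p_e == 1.0:
        return math.nan  # both raters used one identical label throughout: kappa is undefined
    return (p_o - p_e) / (1 - p_e)
```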

4.2 Decision tree to ask the supervisor / 请示决策树

| # | Question to ask (EN) | 中文 |
|---|---|---|
| Q1 | "Sir, of the five evaluation tracks above, which two should I prioritise over the next four weeks before we converge on the Chapter 5 outline?" | "老师,上面 5 条评估方向,未来四周您建议我重点做哪两条,然后再收敛到 Chapter 5 大纲?" |
| Q2 | "Do you want me to include the user study (5.5)? It is the longest item and depends on participant recruitment, so I'd like your call before committing." | "用户研究(5.5)您要不要做?这一条最长、依赖招募,想请您拍板再投入。" |
| Q3 | "For the comparative ablation, should I frame the comparison as 'why RF wins' (defending the current choice) or 'what if XGBoost wins' (open exploration)? The framing affects how I report inconclusive results." | "对比实验您希望框成‘为什么 RF 胜出’(捍卫现有选择)还是‘如果 XGBoost 更好怎么办’(开放探索)?两种 framing 对模棱两可结果的报告方式不同。" |
| Q4 | "Should I treat the Mt Everest OOD test as a thesis-level contribution (a stand-alone subsection on safety) or just as an appendix item?" | "珠峰 OOD 测试算论文级别的贡献(单独一节讲安全性),还是放附录就够?" |

5. Asks + closing 60 seconds / 请示 + 收尾 60 秒

| EN (say this) | 中文(口头要点) |
|---|---|
| "Sir, to summarise: since the last meeting I've shipped v1.0.0, with production-grade hardening, 70 tests at 97 % backend coverage, six evaluation figures, a published model card and full Docker reproducibility. The pipeline order is unchanged from what you asked: dataset, model, app. For Chapter 5 I have scoped five evaluation tracks, and I'd like your guidance on which two to prioritise over the next four weeks." | "老师,总结一下:自上次会议以来交付了 v1.0.0,包括工程化强化、70 个测试(后端覆盖率 97%)、6 张评估图、model card、Docker 全复现。流水线顺序按您要求没动:dataset、model、app。Chapter 5 我列了 5 条评估方向,接下来四周您建议我先做哪两条?" |
| "I'll send you a 3-bullet email summary by tomorrow morning so we have written agreement on the priorities. Thank you for your time." | "明早之前给您发 3 条要点的邮件总结,留个书面确认。谢谢老师。" |

6. Q&A defensive lines / Q&A 兜底话术

Follow-up questions specific to this update. The classic Q1-Q7 from the 5/11 brief still apply and are not repeated here.

针对本次进度汇报可能出现的追问(5/11 那份的经典 Q1-Q7 仍然有效,不重复罗列)。

Q-N1 — "Why are you spending time on tests and Docker instead of the thesis?"

Q-N1 ——为什么你在写测试和 Docker 上花时间,不写论文?

| EN | 中文 |
|---|---|
| "Sir, the v1.0.0 hardening was a one-time investment so that the examiner can reproduce every Chapter 5 number with a single command. Without it, every evaluation result would be a black box: the examiner could not check the AUC of 0.871 for themselves. With `make evaluate` regenerating all six figures byte-for-byte, every thesis claim becomes verifiable. From now on, all my time goes to evaluation and writing." | "老师,v1.0.0 的强化是一次性投资,为了让评审老师用一行命令就能复现 Chapter 5 的每一个数字。没有它,AUC = 0.871 就是黑盒,评审无法独立验证。现在 `make evaluate` 能把 6 张图按字节复现,论文的每个 claim 都可以核验。从今天起所有时间都给评估和写作。" |
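The byte-for-byte claim is easy to back with evidence before the examiner tries it. A minimal sketch: hash every figure, rerun `make evaluate`, hash again, and compare. It assumes the figures are PNG files in a `figures/` folder, inferred from the file names in §3.2 and the cross-reference to `../figures/evaluation_summary.json`. Some plotting backends embed version strings or timestamps in image metadata, so confirm an empty diff on your own machine before repeating the claim.

```python
import hashlib
from pathlib import Path
from typing import Dict


def sha256_manifest(folder: Path, pattern: str = "*.png") -> Dict[str, str]:
    """Map each matching file name in `folder` to the SHA-256 digest of its bytes."""
    return {
        path.name: hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(folder.glob(pattern))
    }


def diff_manifests(before: Dict[str, str], after: Dict[str, str]) -> Dict[str, str]:
    """Return {file name: problem} for every figure that vanished, appeared or changed."""
    problems: Dict[str, str] = {}
    for name in sorted(before.keys() | after.keys()):
        if name not in after:
            problems[name] = "missing after rerun"
        elif name not in before:
            problems[name] = "new after rerun"
        elif before[name] != after[name]:
            problems[name] = "bytes differ"
    return problems
```

Usage: take `before = sha256_manifest(Path("figures"))`, run `make evaluate`, then call `diff_manifests(before, sha256_manifest(Path("figures")))`. An empty dict is the evidence.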

Q-N2 — "Why hasn't the model improved since last time?"

Q-N2 ——模型为什么自上次以后没提升?

| EN | 中文 |
|---|---|
| "Two reasons. First, your instruction last time was to consolidate the dataset and the model before adding more capacity, and that is what I did. Second, the bottleneck right now is not the model but the rule engine's coverage of OOD scenarios, which is a Chapter 5 research contribution rather than a hyperparameter tweak. I'd rather report a defensible 0.871 with a calibrated rule engine than chase 0.88 with an unprincipled model stack." | "两个理由:(1) 您上次的指示是先把 dataset 和 model 巩固好再加复杂度,我严格照做了。(2) 当前瓶颈不是模型本身,而是规则引擎对 OOD 场景的覆盖,这是 Chapter 5 的研究贡献,不是调超参。我宁愿报一个可辩护的 0.871 加一个校准好的规则引擎,也不要不讲原理地堆叠模型去追 0.88。" |

Q-N3 — "Show me one concrete weakness you have not yet fixed."

Q-N3 ——给我说一个你目前还没修的具体弱点。

| EN | 中文 |
|---|---|
| "Honestly, Sir, the biggest one is `cape_jkg`. The ERA5 archive returns predominantly zero CAPE for these Malaysian coordinates, which is a known coverage gap, so the Random Forest learns nothing from it (0 % feature importance). The rule engine still uses live Open-Meteo CAPE at inference time, so the production output is fine, but the training signal for thunderstorm risk is weaker than I'd like. I plan to quantify how much this matters in the §5.4 component ablation." | "老实说,老师,最大的弱点是 `cape_jkg`:ERA5 在这些马来西亚坐标上的 CAPE 几乎全为零(已知覆盖缺口),RF 完全没学到东西(特征重要性 0%)。规则引擎在推理时用的是 Open-Meteo 实时 CAPE,所以生产输出没问题,但雷暴风险的训练信号比我希望的弱。计划在 §5.4 组件消融实验里量化它的影响。" |
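"Predominantly zero" lands better with a number attached. Below is a minimal sketch for measuring it on the training table. It assumes the columns are available as plain Python sequences, for example lists pulled out of the dataset. The 95 % cut-off for flagging a column and the `temperature_c` name in the test are hypothetical choices, not project settings.

```python
from typing import Dict, List, Sequence


def zero_share(values: Sequence[float]) -> float:
    """Fraction of values that are exactly zero (0.0 for an empty column)."""
    return sum(1 for v in values if v == 0) / len(values) if values else 0.0


def near_constant_zero(columns: Dict[str, Sequence[float]], limit: float = 0.95) -> List[str]:
    """Names of columns whose zero share exceeds `limit` (a hypothetical cut-off)."""
    return sorted(name for name, values in columns.items() if zero_share(values) > limit)
```

Reporting `zero_share` per site, rather than once for the whole table, would also show whether the gap is uniform across the five locations.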

Q-N4 — "When can I see the first draft of Chapter 5?"

Q-N4 ——Chapter 5 初稿什么时候能给我看?

| EN | 中文 |
|---|---|
| "If you sign off on tracks 5.1 + 5.2 + 5.4 today, the evaluation work takes about three and a half weeks run back-to-back (1 week + 2 weeks + 4 days), and writing takes 1 week, so you'd have a draft about four and a half weeks from today. If you also want 5.5 (the user study), add 4 weeks. I'll lock the date the moment you confirm the scope." | "如果今天您拍板 5.1 + 5.2 + 5.4 三条,连续做约 3.5 周(1 周 + 2 周 + 4 天),再加 1 周写作,约 4.5 周后给您初稿。如果再加 5.5(用户研究),再加 4 周。您一确认范围,我立刻锁定交稿日。" |
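To have a dated answer ready, you can turn the §4.1 table estimates into a calendar date. This minimal sketch assumes the chosen tracks run strictly back-to-back with no overlap. It uses 2026-05-13, the date this brief was generated, only as an example start date. Run back-to-back, 5.1 + 5.2 + 5.4 add up to 25 days of evaluation plus 7 days of writing.

```python
from datetime import date, timedelta
from typing import Dict, Iterable

# Estimates copied from the §4.1 work-plan table, in days (1 week = 7 days).
ESTIMATE_DAYS: Dict[str, int] = {"5.1": 7, "5.2": 14, "5.3": 3, "5.4": 4, "5.5": 28}
WRITING_DAYS = 7  # 5.6: Chapter 5 draft, 1 week after the evaluation tracks


def draft_due(start: date, tracks: Iterable[str]) -> date:
    """Draft date if the chosen tracks run back-to-back and writing follows (no overlap assumed)."""
    return start + timedelta(days=sum(ESTIMATE_DAYS[t] for t in tracks) + WRITING_DAYS)
```

Any overlap between tracks brings the date forward. If the estimates hold, the result is therefore an upper bound for the chosen scope.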

7. Materials checklist before walking in / 开会前自检清单

☐ Laptop ≥ 80 % battery, charger in bag
☐ Terminal A: `make run` is running, do not close
☐ Terminal B: `curl /api/health` returned `ml_loaded: true` within the last 5 min
☐ 10 browser tabs open in cheat-sheet §0 order — app tab is LAST
☐ This file open on a separate screen / phone, NOT to be read aloud
☐ docs/MEETING_CHEAT_SHEET.md open as a fall-back
☐ models/MODEL_CARD.md open in case any number is challenged
☐ figures/evaluation_summary.json downloadable on demand
☐ Phone on silent
☐ One deep breath. You shipped v1.0.0. You're prepared.
☐ 笔记本电池 ≥ 80%,充电器已带
☐ 终端 A:`make run` 跑着,不要关
☐ 终端 B:5 分钟内 `curl /api/health` 返回 ml_loaded: true
☐ 10 个浏览器标签页按 cheat-sheet §0 顺序开好——app 标签放最后
☐ 本文档开在副屏 / 手机,不要照念
☐ docs/MEETING_CHEAT_SHEET.md 开着兜底
☐ models/MODEL_CARD.md 开着,老师质疑任何数字立刻打开
☐ figures/evaluation_summary.json 随时可发
☐ 手机静音
☐ 深呼吸。v1.0.0 已经交付。你准备好了。

8. Cross-references / 相关文档索引

| Topic | File |
|---|---|
| Original 5/11 reply to 4/15 feedback | `supervisor_meeting_brief.md` |
| One-page cheat sheet (tab order, demo script) | `MEETING_CHEAT_SHEET.md` |
| Pipeline order ASCII chart | `pipeline_order.md` |
| Dataset spec + Y derivation | `dataset.md` |
| Architecture deep-dive | `architecture.md` |
| Threshold citations | `thresholds.md` |
| Model card | `../models/MODEL_CARD.md` |
| Evaluation summary JSON | `../figures/evaluation_summary.json` |
| What changed in v1.0.0 | `../CHANGELOG.md` |

Generated 2026-05-13 for the MicroClimate-X progress-update meeting at UKM. 此页为 2026-05-13 UKM 毕业设计 MicroClimate-X 进度汇报准备文档。