# Monday handoff (May 4, 2026)

State of the repo at end of Sunday May 3 / overnight into May 4.
Demo is **Sunday May 10**.

## Overnight pass (Sunday evening → Monday)

Eight priorities closed against `audit/2026-05-03-evening-audit.md`:

1. `pitch/cold_open.md` restored (was accidentally deleted in `1cb5ee6`).
2. Granite Guardian / refusal-classification leftovers removed.
   Mellea is the sole grounding mechanism, period.
3. **Trace UI is now clickable.** Click any specialist row to reveal
   its raw structured output (formatted JSON, copy button,
   max-height + scroll). This is the auditability contract: every
   claim in the briefing can be traced to the specialist that produced
   it directly inside the UI, not only through the citation appendix.
4. Buffered-footprint overlap for the three Point-geometry register
   specialists. NYU Langone / Stuyvesant HS / P.S. 89 now correctly
   register `inside_sandy_2012=true`. Each output records its
   `footprint_buffer_m`.
5. Map renders register-asset pins (subway / school / hospital /
   NYCHA centroid) coloured by Sandy exposure, with click popups
   showing name + `[doc_id]`. NYCHA polygon fill is queued until
   `geometry_geojson` lands in the dataclass.
6. **`floodnet_forecast` specialist.** TTM r2 forecast of the
   nearest FloodNet sensor's flood-event recurrence. It reuses the
   (512, 96) singleton already loaded for `ttm_311_forecast`, so
   *no new model class is loaded into memory*. This is the strongest
   single TTM win for the NYU CUSP audience.
7. Trace UI groups TTM specialists under one parent node,
   `forecasting.granite-timeseries-ttm-r2 [N instances]`, so the
   "one foundation model, multiple data streams" architectural story
   is legible without reading per-row metadata.
8. `experiments/` cleanup: dropped two empty dirs (`05_sam2_promptable`,
   `06_chronos_bolt_forecast`), renamed `05_terramind_finetune` →
   `05a_terramind_finetune_micro` to disambiguate it from the active NYC
   fine-tune dir, and removed `Riprap.zip` from the repo root.

Commit chain: `a2143fc` … through `ed6ae9d`. The morning handoff doc
at `audit/2026-05-04-morning-handoff.md` summarises what to verify
and what's queued next.

## Where Sunday ended

All four keep-list items resolved, four register specialists shipped,
and AMD fine-tune prep green.

| Item | Status | Path |
|---|---|---|
| Pitch cold-open locked | ✓ | `pitch/cold_open.md` |
| TerraMind-NYC fine-tune eval spec | ✓ | `experiments/05_terramind_nyc_finetune/eval/eval_spec.md` |
| 200-query adversarial set + refusal eval | ✓ (planner pivot) | `experiments/06_granite_guardian/` |
| Subway-entrance specialist (Sheepshead Bay) | ✓ | `experiments/07_mta_entrances/` |
| NYCHA-developments specialist (Red Hook) | ✓ | `experiments/08_nycha_developments/` |
| DOE-schools specialist (Coney Island) | ✓ | `experiments/09_doe_schools/` |
| DOH-hospitals specialist (Coney Island) | ✓ | `experiments/10_doh_hospitals/` |
| FSM integration of all 4 register specialists | ✓ | `app/registers/`, `app/fsm.py`, `app/reconcile.py`, `web/static/agent.js` |
| AMD droplet TerraMind smoke + STAC manifest | ✓ | `129.212.182.52:/root/terramind_nyc/` |

End-to-end smoke on "Coney Island Brooklyn" produced citations
`[mta_entrance_56]`, `[nycha_dev_239]`, `[nycha_dev_166]` alongside
`[rag_mta]` and `[nyc311]`. Family-prefix chip routing works.
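
In the smoke output above, family-prefix routing amounts to stripping a trailing numeric id from each citation tag. Here is a minimal sketch. It assumes register tags follow the `<family>_<number>` shape seen in the smoke output, and it treats un-numbered tags such as `[rag_mta]` as their own family. The function and regex names are hypothetical and not the app's code.

```python
import re

# Assumption: register citations look like [<family>_<number>]; tags without
# a trailing number (e.g. [rag_mta], [nyc311]) are treated as their own family.
CITATION_RE = re.compile(r"\[([a-z0-9_]+)\]")
NUMBERED_RE = re.compile(r"^(?P<family>[a-z0-9_]+?)_(?P<num>\d+)$")


def route_citations(text):
    """Group every [doc_id] in `text` by family prefix, in order of appearance."""
    families = {}
    for doc_id in CITATION_RE.findall(text):
        match = NUMBERED_RE.match(doc_id)
        family = match.group("family") if match else doc_id
        families.setdefault(family, []).append(doc_id)
    return families


briefing = (
    "Exposure at [mta_entrance_56], [nycha_dev_239] and [nycha_dev_166]; "
    "see also [rag_mta] [nyc311]."
)
print(route_citations(briefing))
# {'mta_entrance': ['mta_entrance_56'], 'nycha_dev': ['nycha_dev_239', 'nycha_dev_166'], 'rag_mta': ['rag_mta'], 'nyc311': ['nyc311']}
```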

Last commit: `86861be` (FSM integration of 4 register specialists).

## Decisions locked

- **Refusal classification dropped entirely.** The planner-level
  classifier hit FN=0% but FP=7% (the gate was <5%). Granite Guardian
  itself was already abandoned (infeasible on a laptop). After the audit
  surfaced that the planner shim was documented but never wired in,
  the decision is now Option C: drop refusal handling. Cold-start
  framing scopes the audience; Mellea rejection sampling enforces
  grounding integrity; the four-tier glyph margin carries the
  epistemic-honesty signal. The `GuardianRefusal.svelte` component
  is deleted (it was only ever rendered on a documentation page).
  The demo's integrity beat is the **Mellea grounding-failure reroll on
  the curated Hollis 0.19% → 19% case**. `experiments/06_granite_guardian/`
  is preserved as a "considered and rejected" artifact for the
  methodology paper.
- **AMD path: `129.212.182.52` is production**, not `165.245.134.44`.
  CLAUDE.md says the latter; **fix CLAUDE.md to match reality**.
  Production vLLM is on `.52`. The TerraMind container shares the
  GPU with vLLM; both fit on one MI300X.
- **TerraMind manifest is 1028 paired chips**, 2021-05 → 2026-04,
  NYC 5-borough hull +5 km, S2 cloud cover <30%, ≤3-day pair window. One
  year (2022-05 → 2023-04) returned 0 chips due to Planetary Computer API
  intermittency; that gap is acceptable for the micro-fine-tune.
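
The pairing constraints in the manifest decision can be written as a small filter: S2 cloud cover under 30% and a pair window of at most 3 days. This is a sketch only. The real logic lives in `build_manifest.py`, which isn't shown here. The tuple layouts and the `pair_scenes` name are hypothetical. The partner modality stays generic because the note doesn't name it, and the 5-borough-hull spatial filter is omitted.

```python
from datetime import date

MAX_CLOUD_PCT = 30.0  # "S2-cloud <30%"
MAX_PAIR_DAYS = 3     # "<=3-day pair window"


def pair_scenes(s2_scenes, partner_scenes):
    """Pair each clear-enough S2 scene with its closest-in-time partner scene.

    s2_scenes: iterable of (scene_id, acquisition_date, cloud_pct)
    partner_scenes: list of (scene_id, acquisition_date)
    Returns (s2_id, partner_id, gap_days) tuples; S2 scenes with no partner
    inside the window are dropped.
    """
    pairs = []
    for s2_id, s2_date, cloud_pct in s2_scenes:
        if cloud_pct >= MAX_CLOUD_PCT:
            continue
        best = None
        for partner_id, partner_date in partner_scenes:
            gap = abs((partner_date - s2_date).days)
            if gap <= MAX_PAIR_DAYS and (best is None or gap < best[1]):
                best = (partner_id, gap)
        if best is not None:
            pairs.append((s2_id, best[0], best[1]))
    return pairs


s2 = [("a", date(2021, 5, 1), 12.0), ("b", date(2021, 5, 10), 45.0), ("c", date(2021, 6, 1), 5.0)]
partners = [("x", date(2021, 5, 3)), ("y", date(2021, 5, 2)), ("z", date(2021, 6, 8))]
print(pair_scenes(s2, partners))  # [('a', 'y', 1)]
```

Scene `b` is dropped for cloud cover, and scene `c` is dropped because its nearest partner is 7 days away. A month with no acceptable pairs simply contributes nothing, which is how an API outage can show up as a zero-chip year.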

## First thing Monday morning

1. **Refresh Microsoft Planetary Computer signed URLs.** They have a
   ~1 hr TTL, so the manifest from Sunday evening is stale by morning.
   On the droplet:
   ```bash
   # From your laptop: log in to the production droplet.
   ssh root@129.212.182.52
   # On the droplet: open a shell in the TerraMind container.
   docker exec -it terramind bash
   # Inside the container: refresh the signed URLs in both manifests.
   cd /root/terramind_nyc
   python build_manifest.py --refresh-only manifest_train.jsonl
   python build_manifest.py --refresh-only manifest_holdout.jsonl
   ```
   (Recipe is in `/root/terramind_nyc/NOTES.md` on the droplet.)

2. **Kick off the TerraMind-NYC fine-tune.** Spec at
   `experiments/05_terramind_nyc_finetune/eval/eval_spec.md`. Budget
   is 30 GPU-hours; alarm at 25 (set on the droplet). Predicted
   actual: ~0.16 GPU-hours at bs=8 / 3 epochs. Don't run anything
   experimental until the eval-spec gates pass on the held-out set.

3. **Decide bucket** (A ship-in-demo / B publish-only / C revert):
   - A: ship the fine-tuned checkpoint as a Riprap specialist.
   - B: publish to HF as `msradam/TerraMind-1.0-NYC` with a model card;
     don't ship in demo. **Bucket B is fully acceptable** per the
     spec. Civic-tech publication discipline is the durable goal.
   - C: discard checkpoint, no public artefact.
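
For step 1, you can tell whether a manifest's signed URLs have gone stale by reading the expiry out of each URL. This is a minimal sketch that assumes the manifest stores Planetary Computer hrefs signed with Azure SAS tokens, whose `se` query parameter is the signed-expiry timestamp in UTC. The helper names and the 10-minute safety margin are hypothetical. The sketch only detects staleness; re-signing is still `build_manifest.py --refresh-only`.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse


def sas_expiry(url):
    """Return the SAS expiry (`se`) of a signed URL as an aware datetime, or None."""
    values = parse_qs(urlparse(url).query).get("se")
    if not values:
        return None
    # `se` is ISO 8601, e.g. "2026-05-04T09:30:00Z"; older Pythons'
    # fromisoformat rejects the trailing "Z", so swap it for "+00:00".
    expiry = datetime.fromisoformat(values[0].replace("Z", "+00:00"))
    if expiry.tzinfo is None:
        # Date-only or naive values: SAS times are expressed in UTC.
        expiry = expiry.replace(tzinfo=timezone.utc)
    return expiry


def is_stale(url, now=None, margin=timedelta(minutes=10)):
    """True if the URL is unsigned, already expired, or expires within `margin`."""
    now = now or datetime.now(timezone.utc)
    expiry = sas_expiry(url)
    return expiry is None or expiry - now <= margin


url = (
    "https://example.blob.core.windows.net/container/chip.tif"
    "?st=2026-05-03T22%3A00%3A00Z&se=2026-05-03T23%3A00%3A00Z&sp=rl&sig=abc"
)
print(is_stale(url, now=datetime(2026, 5, 4, 8, 0, tzinfo=timezone.utc)))  # True
```

To audit a whole manifest, you would run `is_stale` over each row's asset hrefs. The JSONL schema of `manifest_train.jsonl` isn't shown in this note, so that loop is left out.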

## Working on Monday

- TerraMind-NYC fine-tune (above).
- **Mellea grounding-failure demo prep.** The pitch demo is the
  Hollis 0.19% → 19% case, where Granite emits a number that is off by
  two orders of magnitude and Mellea catches it. The demo script
  needs to:
  - Show the failed first attempt (banner: "Mellea reroll: numerics
    grounding failed").
  - Show the second attempt with the corrected number.
  - Show the audit panel with pass/fail per requirement.
  - Show wall-clock time for the reroll (target: under 30 s end-to-end).

  The case is currently reproducible via `scripts/probe_mellea.py --query
  "Hollis" --runs 5`; the demo script is the *visual* version.
- **MTA Sandy-recovery citation layer.** Parse the MTA "Hurricane
  Sandy: Three Years Later" report into per-station-id facts so
  the subway-entrance specialist can emit
  `[mta_recovery_<station_id>]` doc IDs alongside the
  exposure ones.
- **NYCHA polygon fill on the map.** The overnight session shipped
  NYCHA developments as centroid pins on the map (graded by
  `pct_inside_sandy ≥ 50%`). The next tightening is to add a
  `geometry_geojson` field to the `DevelopmentFinding` dataclass in
  `app/registers/nycha.py` and route it through SSE so
  `register-polygons` actually renders graded fills (the layer and
  source are already present in `RipMap.svelte`).
- **PLUTO/Building-Footprints join** for Stuyvesant Town etc.
  The overnight pass shipped buffered-point overlap (NYU Langone,
  Stuyvesant HS, P.S. 89 now correctly flip to
  `inside_sandy_2012=true`). The 100 m hospital buffer / 50 m school
  buffer is honest but coarse; a join against PLUTO and actual
  building footprints is the next step for the very-large-campus assets.
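
The demo reroll is rejection sampling against a numerics requirement. The sketch below shows the idea in plain Python. It is *not* Mellea's API, and Mellea's requirement and sampling-strategy objects are not reproduced here. The percentage regex is a deliberately narrow stand-in for the real numerics check. The draft wording is made up, and it assumes 0.19% is the grounded value.

```python
import re

# Hypothetical numerics requirement: every percentage in the draft must
# appear verbatim among the values the specialists actually produced.
PERCENT_RE = re.compile(r"\d+(?:\.\d+)?%")


def numerics_grounded(draft, grounded_values):
    return all(value in grounded_values for value in PERCENT_RE.findall(draft))


def generate_with_reroll(generate, grounded_values, max_attempts=3):
    """Return the first draft that passes, plus a per-attempt audit log."""
    audit = []
    for attempt in range(1, max_attempts + 1):
        draft = generate(attempt)
        passed = numerics_grounded(draft, grounded_values)
        audit.append({"attempt": attempt, "numerics_grounded": passed})
        if passed:
            return draft, audit
    return None, audit  # every attempt failed: surface that, don't guess


# Made-up drafts standing in for two Granite generations.
drafts = {1: "Hollis estimate: 19%.", 2: "Hollis estimate: 0.19%."}
draft, audit = generate_with_reroll(lambda attempt: drafts[attempt], {"0.19%"})
print(audit)
# [{'attempt': 1, 'numerics_grounded': False}, {'attempt': 2, 'numerics_grounded': True}]
```

The function returns the audit list even on success, which is exactly what a pass/fail-per-requirement panel needs. Wrapping the call in `time.perf_counter()` gives the wall-clock figure.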

## Outstanding through Friday

In rough priority order:

1. **More specialists**:
   - FEMA OpenFEMA NFIP claims, tract-aggregated (pending).
   - NWS NWPS reach-level forecast + USGS NWIS Bronx / Saw Mill /
     Hutchinson rivers.
   - NYC DEP CSO outfalls + Bluebelt + Green Infrastructure
     specialist (CSS-vs-MS4 distinction for ASCE).
   - Three more TTM r2 specialists (USGS streamgage stage, NWS
     rainfall accumulation, NYC 311 sewer-backup citywide rate).
     **FloodNet forecast already shipped in the overnight pass.**
2. **Visual identity refresh**: Carto Positron, IBM Plex, four-tier
   epistemic palette, WeasyPrint PDF export, trace UI as `<details>`
   tree.
3. **WCAG 2.2 AA pass.**
4. **Methodology paper draft** (6-8 page PDF). Goal: Saturday May 9.
5. **Historical-event mode**. Vintage-cutoff queries. Saturday.
6. **Five Build-in-Public posts** through the week.
7. **5-minute hackathon pitch + 3 demo queries.** Friday rehearsal.
8. **ASCE talk materials**. May 13 (post-hackathon).

## Sharp edges to remember

- **Static assets cache hard.** When iterating on Svelte or
  agent.js, hard-reload (⌘⇧R). No cache-busting is in place.
- **HF Space sleeps after idle.** Free tier; the first request after
  sleep is a 30-90 s cold start. Ping the Space before any demo.
- **vLLM cold compile.** The first few requests against a fresh
  `vllm serve` show surprisingly low throughput while ROCm kernels
  JIT-compile. Run benchmarks 3+ times before believing them.
- **Sandy GeoJSON has self-intersection issues** that blow up
  `unary_union`. Use `buffer(0)` (caught and fixed for NYCHA; it
  may surface again for any new polygon-overlap specialist).
- **DEP column is `Flooding_Category` (int16)**, not `depth_class`.
  Documented in NYCHA RESULTS.md.
- **Centroid-edge join false negatives** on NYU Langone / Stuyvesant
  / P.S. 89, because their centroid points lie just outside the OEM
  Sandy polygon despite real 2012 basement flooding. The overnight
  buffered-point overlap now catches these three; a PLUTO
  footprint join is the queued fix for very-large-campus assets.
- **Don't restart uvicorn while a model is mid-generation.** Ollama
  keeps the request alive but the FastAPI handler dies, leaving
  the user staring at a dead stream.

## Files to read in order on Monday morning

1. This file.
2. `experiments/05_terramind_nyc_finetune/eval/eval_spec.md`. The
   contract for what training output triggers ship/publish/revert.
3. `experiments/06_granite_guardian/RESULTS.md`. The Guardian →
   planner pivot decision record (so you know why Guardian is in
   the repo but not on the demo path).
4. `experiments/07_mta_entrances/RESULTS.md`. The canonical
   register-specialist pattern (the other three follow it).
5. `CLAUDE.md`. Fix the AMD droplet IP (`165.245.134.44` →
   `129.212.182.52`) at the same time as your first edit of the day.

## Status as of 2026-05-03 ~12:50 ET

- Both git remotes (origin + huggingface) are up to date through
  `86861be`.
- The HF Space rebuild was *not* triggered on the FSM-integration
  commit; run `git push huggingface main` when you want to deploy.
  (You may want to wait until Monday afternoon so a broken HF
  rebuild doesn't eat morning time.)
- Local Ollama has both `granite4.1:3b` and `granite4.1:8b` warm.
- AMD droplet `129.212.182.52` has the `terramind` container
  running with TerraTorch 1.2.7 + pystac-client + planetary-computer
  installed in system Python; HF cache populated.
- The 200-query adversarial set + planner-pivot eval results are
  reproducible from `experiments/06_granite_guardian/` in ~3 min.
- The Mellea probe still works: `scripts/probe_mellea.py --query
  "Hollis" --runs 5`.