# Monday handoff (May 4, 2026)

State of the repo at end of Sunday May 3 / overnight into May 4. Demo is **Sunday May 10**.

## Overnight pass (Sunday evening → Monday)

Eight priorities closed against `audit/2026-05-03-evening-audit.md`:

1. `pitch/cold_open.md` restored (was accidentally deleted in `1cb5ee6`).
2. Granite Guardian / refusal-classification leftovers removed. Mellea is the sole grounding mechanism, period.
3. **Trace UI is now clickable.** Click any specialist row to reveal its raw structured output (formatted JSON, copy button, max-height + scroll). This is the auditability contract: every claim in the briefing is traceable to the specialist that produced it directly inside the UI, not just the citation appendix.
4. Buffered-footprint overlap for the three Point-geometry register specialists. NYU Langone / Stuyvesant HS / P.S. 89 now correctly register `inside_sandy_2012=true`. Each output records its `footprint_buffer_m`.
5. Map renders register-asset pins (subway / school / hospital / NYCHA-centroid) coloured by Sandy exposure with click popups showing name + `[doc_id]`. NYCHA polygon-fill is queued for when `geometry_geojson` lands in the dataclass.
6. **`floodnet_forecast` specialist**. TTM r2 forecast on the nearest FloodNet sensor's flood-event recurrence. Reuses the (512, 96) singleton already loaded for `ttm_311_forecast`, so *no new model class is loaded into memory*. The strongest single TTM win for the NYU CUSP audience.
7. Trace UI groups TTM specialists under one parent node `forecasting.granite-timeseries-ttm-r2 [N instances]` so the "one foundation model, multiple data streams" architectural story is legible without reading per-row metadata.
8. `experiments/` cleanup: dropped two empty dirs (`05_sam2_promptable`, `06_chronos_bolt_forecast`), renamed `05_terramind_finetune` → `05a_terramind_finetune_micro` to dedupe with the active NYC fine-tune dir, removed `Riprap.zip` from repo root.

Commit chain: `a2143fc` … through `ed6ae9d`.
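For reference, the buffered-footprint overlap in item 4 can be sketched as follows. This is a minimal stdlib-only illustration, not the code in `app/registers/`: the function names are hypothetical, coordinates are assumed to be in a projected CRS measured in metres, and the Sandy extent is simplified to a single ring without holes. A point buffered by `footprint_buffer_m` overlaps the polygon exactly when the point is inside it or within that distance of its boundary.

```python
import math


def _point_segment_distance(px, py, ax, ay, bx, by):
    """Shortest planar distance from point P to segment AB."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment: A and B coincide
        return math.hypot(px - ax, py - ay)
    # Project P onto AB and clamp the projection to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def _point_in_ring(px, py, ring):
    """Ray-casting point-in-polygon test for a simple ring of (x, y) tuples."""
    inside = False
    n = len(ring)
    for i in range(n):
        ax, ay = ring[i]
        bx, by = ring[(i + 1) % n]
        if (ay > py) != (by > py):  # edge straddles the horizontal ray
            x_cross = ax + (py - ay) * (bx - ax) / (by - ay)
            if px < x_cross:
                inside = not inside
    return inside


def buffered_point_overlaps(point, ring, footprint_buffer_m):
    """True if a disc of radius footprint_buffer_m around point touches the ring."""
    px, py = point
    if _point_in_ring(px, py, ring):
        return True
    n = len(ring)
    nearest = min(
        _point_segment_distance(px, py, *ring[i], *ring[(i + 1) % n])
        for i in range(n)
    )
    return nearest <= footprint_buffer_m


# Hypothetical 1 km square "flood extent" and a centroid 60 m outside its edge.
extent = [(0.0, 0.0), (1000.0, 0.0), (1000.0, 1000.0), (0.0, 1000.0)]
centroid = (1060.0, 500.0)
print(buffered_point_overlaps(centroid, extent, 0.0))    # False: bare centroid misses
print(buffered_point_overlaps(centroid, extent, 100.0))  # True: 100 m buffer reaches
```

The toy numbers show why a centroid-only join can miss an asset that a modest buffer catches; the real extents are multi-polygons, which a geometry library handles more robustly than this sketch.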
Morning handoff doc at `audit/2026-05-04-morning-handoff.md` summarises what to verify and what's queued next.

## Where Sunday ended

All four keep-list items resolved + 4 register specialists shipped + AMD fine-tune prep green.

| Item | Status | Path |
|---|---|---|
| Pitch cold-open locked | ✓ | `pitch/cold_open.md` |
| TerraMind-NYC fine-tune eval spec | ✓ | `experiments/05_terramind_nyc_finetune/eval/eval_spec.md` |
| 200-query adversarial set + refusal eval | ✓ (planner pivot) | `experiments/06_granite_guardian/` |
| Subway-entrance specialist (Sheepshead Bay) | ✓ | `experiments/07_mta_entrances/` |
| NYCHA-developments specialist (Red Hook) | ✓ | `experiments/08_nycha_developments/` |
| DOE-schools specialist (Coney Island) | ✓ | `experiments/09_doe_schools/` |
| DOH-hospitals specialist (Coney Island) | ✓ | `experiments/10_doh_hospitals/` |
| FSM integration of all 4 register specialists | ✓ | `app/registers/`, `app/fsm.py`, `app/reconcile.py`, `web/static/agent.js` |
| AMD droplet TerraMind smoke + STAC manifest | ✓ | `129.212.182.52:/root/terramind_nyc/` |

End-to-end smoke on "Coney Island Brooklyn" produced citations `[mta_entrance_56]`, `[nycha_dev_239]`, `[nycha_dev_166]` alongside `[rag_mta]` and `[nyc311]`. Family-prefix chip routing works.

Last commit: `86861be` (FSM integration of 4 register specialists).

## Decisions locked

- **Refusal classification dropped entirely.** Planner-level classifier hit FN=0% but FP=7% (gate was <5%). Granite Guardian itself was already abandoned (laptop-infeasible). After the audit surfaced that the planner shim was documented-but-never-wired, the decision is now Option C: drop refusal handling. Cold-start framing scopes the audience; Mellea rejection sampling enforces grounding integrity; the four-tier glyph margin carries the epistemic-honesty signal. The `GuardianRefusal.svelte` component is deleted (was only ever rendered on a documentation page).
  Demo's integrity beat is the **Mellea grounding-failure reroll on the curated Hollis 0.19% → 19% case**. `experiments/06_granite_guardian/` is preserved as a "considered and rejected" artifact for the methodology paper.
- **AMD path: `129.212.182.52` is production**, not `165.245.134.44`. CLAUDE.md says the latter; **fix CLAUDE.md to match reality**. Production vLLM is on `.52`. The TerraMind container shares the GPU with vLLM; both fit on one MI300X.
- **TerraMind manifest is 1028 paired chips**, 2021-05 → 2026-04, NYC 5-borough hull +5 km, S2-cloud <30%, ≤3-day pair window. One year (2022-05 → 2023-04) returned 0 due to PC API intermittency; that is acceptable for the micro-fine-tune.

## First thing Monday morning

1. **Refresh Microsoft Planetary Computer signed URLs.** They have ~1 hr TTL; the manifest from Sunday evening is stale by morning. On the droplet:

   ```bash
   # From the laptop: open a shell on the production droplet.
   ssh root@129.212.182.52
   # On the droplet: open a shell inside the running terramind container.
   docker exec -it terramind bash
   # Inside the container: re-sign the URLs in both manifests.
   cd /root/terramind_nyc
   python build_manifest.py --refresh-only manifest_train.jsonl
   python build_manifest.py --refresh-only manifest_holdout.jsonl
   ```

   (Recipe is in `/root/terramind_nyc/NOTES.md` on the droplet.)

2. **Kick off TerraMind-NYC fine-tune.** Spec at `experiments/05_terramind_nyc_finetune/eval/eval_spec.md`. Budget is 30 GPU-hours; alarm at 25 (set on the droplet). Predicted actual: ~0.16 GPU-hours at bs=8 / 3 epochs. Don't run anything experimental until eval-spec gates pass on the held-out set.
3. **Decide bucket** (A ship-in-demo / B publish-only / C revert):
   - A: ship the fine-tuned checkpoint as a Riprap specialist.
   - B: publish to HF as `msradam/TerraMind-1.0-NYC` with model card, don't ship in demo. **Bucket B is fully acceptable** per the spec. Civic-tech publication discipline is the durable goal.
   - C: discard checkpoint, no public artefact.

## Working on Monday

- TerraMind-NYC fine-tune (above).
- **Mellea grounding-failure demo prep.** The pitch demo is the Hollis 0.19% → 19% case where Granite emits a number with the wrong order of magnitude and Mellea catches it. Demo script needs to:
  - Show the failed first attempt (banner: "Mellea reroll: numerics grounding failed").
  - Show the second attempt with the corrected number.
  - Show the audit panel with the pass/fail per-requirement.
  - Show wall-clock for the reroll (target: under 30 s end-to-end).
  - Currently reproducible via `scripts/probe_mellea.py --query "Hollis" --runs 5`. The demo script is the *visual* version.
- **MTA Sandy-recovery citation layer.** Parse the MTA "Hurricane Sandy: Three Years Later" report into per-station-id facts so the subway-entrance specialist can emit `[mta_recovery_]` doc messages alongside the exposure ones.
- **NYCHA polygon-fill on the map.** Overnight session shipped NYCHA developments as centroid pins on the map (graded by `pct_inside_sandy ≥ 50%`). The next tightening is to add a `geometry_geojson` field to `app/registers/nycha.py`'s `DevelopmentFinding` dataclass and route through SSE so `register-polygons` actually renders graded fills (the layer + source are already present in `RipMap.svelte`).
- **PLUTO/Building-Footprints join** for Stuyvesant Town etc. Overnight pass shipped buffered-point overlap (NYU Langone, Stuyvesant HS, P.S. 89 now correctly flip to `inside_sandy_2012=true`). The 100 m hospital buffer / 50 m school buffer is honest but coarse; PLUTO + actual building footprints is the next step for the very-large-campus assets.

## Outstanding through Friday

In rough priority order:

1. **More specialists**:
   - FEMA OpenFEMA NFIP claims tract-aggregated (pending).
   - NWS NWPS reach-level forecast + USGS NWIS Bronx / Saw Mill / Hutchinson rivers.
   - NYC DEP CSO outfalls + Bluebelt + Green Infrastructure specialist (CSS-vs-MS4 distinction for ASCE).
   - Three more TTM r2 specialists (USGS streamgage stage, NWS rainfall accumulation, NYC 311 sewer-backup citywide rate). **FloodNet forecast already shipped in the overnight pass.**
2. **Visual identity refresh**: Carto Positron, IBM Plex, four-tier epistemic palette, WeasyPrint PDF export, trace UI as `
` tree.
3. **WCAG 2.2 AA pass.**
4. **Methodology paper draft** (6-8 page PDF). Goal: Saturday May 9.
5. **Historical-event mode**. Vintage-cutoff queries. Saturday.
6. **Five Build-in-Public posts** through the week.
7. **5-minute hackathon pitch + 3 demo queries.** Friday rehearsal.
8. **ASCE talk materials**. May 13 (post-hackathon).

## Sharp edges to remember

- **Static assets cache hard.** When iterating on Svelte or agent.js, hard-reload (⌘⇧R). No cache-busting in place.
- **HF Space sleeps after idle.** Free tier; first request after sleep is a 30-90 s cold start. Ping the space before any demo.
- **vLLM cold compile.** First few requests against a fresh `vllm serve` log surprisingly low throughput while ROCm kernels JIT. Run benchmarks 3+ times before believing them.
- **Sandy GeoJSON has self-intersection issues** that blow up `unary_union`. Use `buffer(0)` (caught and fixed for NYCHA; may surface again for any new polygon-overlap specialist).
- **DEP column is `Flooding_Category` (int16)**, not `depth_class`. Documented in NYCHA RESULTS.md.
- **Centroid-edge join false-negatives** on NYU Langone / Stuyvesant / P.S. 89 because their centroid points lie just outside the OEM Sandy polygon despite real 2012 basement flooding. PLUTO footprint join is the queued fix.
- **Don't restart uvicorn while a model is mid-generation.** Ollama keeps the request alive but the FastAPI handler dies, leaving the user staring at a dead stream.

## Files to read in order on Monday morning

1. This file.
2. `experiments/05_terramind_nyc_finetune/eval/eval_spec.md`. The contract for what training output triggers ship/publish/revert.
3. `experiments/06_granite_guardian/RESULTS.md`. The Guardian → planner pivot decision record (so you know why Guardian is in the repo but not on the demo path).
4. `experiments/07_mta_entrances/RESULTS.md`. The canonical register-specialist pattern (the other three follow it).
5. `CLAUDE.md`.
   Fix the AMD droplet IP (165.245.134.44 → 129.212.182.52) at the same time as the first edit of the day.

## Status as of 2026-05-03 ~12:50 ET

- Both git remotes (origin + huggingface) up-to-date through `86861be`.
- HF Space rebuild was *not* triggered on the FSM-integration commit; do `git push huggingface main` when you want to deploy. (You may want to wait until Monday afternoon so a broken HF rebuild doesn't eat morning time.)
- Local Ollama has both `granite4.1:3b` and `granite4.1:8b` warm.
- AMD droplet `129.212.182.52` has the `terramind` container running with TerraTorch 1.2.7 + pystac-client + planetary-computer installed in system Python; HF cache populated.
- 200-query adversarial set + planner-pivot eval results reproducible from `experiments/06_granite_guardian/` in ~3 min.
- Mellea probe still works: `scripts/probe_mellea.py --query "Hollis" --runs 5`.
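
For anyone picking up the Hollis probe cold, the grounding-failure reroll it exercises has roughly the shape below. This is a hedged sketch of the idea only: it does not use Mellea's API, every name in it (`check_numerics`, `generate_with_reroll`, the toy drafts) is hypothetical, and it assumes the numerics requirement amounts to "every percentage in the draft must match a value present in the source data".

```python
import math
import re

# Matches figures such as "19%" or "0.19 %" and captures the number.
_PERCENT = re.compile(r"(\d+(?:\.\d+)?)\s*%")


def extract_percentages(text):
    """Return every percentage figure in the text as a float."""
    return [float(m) for m in _PERCENT.findall(text)]


def check_numerics(draft, source_values, rel_tol=0.01):
    """Return (passed, ungrounded): ungrounded lists figures absent from the source."""
    ungrounded = [
        value
        for value in extract_percentages(draft)
        if not any(math.isclose(value, s, rel_tol=rel_tol) for s in source_values)
    ]
    return (not ungrounded, ungrounded)


def generate_with_reroll(generate, source_values, max_attempts=3):
    """Call generate(attempt) until the draft passes the numerics check.

    Returns (draft, attempt). Raises ValueError if no attempt is grounded.
    """
    for attempt in range(1, max_attempts + 1):
        draft = generate(attempt)
        passed, _ = check_numerics(draft, source_values)
        if passed:
            return draft, attempt
    raise ValueError("no grounded draft within max_attempts")


# Toy stand-in for the model: first draft is off by two orders of magnitude.
drafts = {1: "The figure for Hollis is 19%.", 2: "The figure for Hollis is 0.19%."}
print(check_numerics(drafts[1], [0.19]))                      # (False, [19.0])
print(generate_with_reroll(lambda n: drafts[n], [0.19])[1])   # 2
```

The relative tolerance is what makes an order-of-magnitude slip fail while ordinary rounding passes; the actual requirement set and sampling loop live in the Mellea integration, not in this sketch.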