# Monday handoff (May 4, 2026)

State of the repo at end of Sunday May 3 / overnight into May 4.
Demo is **Sunday May 10**.

## Overnight pass (Sunday evening → Monday)

Eight priorities closed against `audit/2026-05-03-evening-audit.md`:

1. `pitch/cold_open.md` restored (was accidentally deleted in 1cb5ee6).
2. Granite Guardian / refusal-classification leftovers removed.
   Mellea is the sole grounding mechanism, period.
3. **Trace UI is now clickable.** Click any specialist row to reveal
   its raw structured output (formatted JSON, copy button,
   max-height + scroll). This is the auditability contract: every
   claim in the briefing is traceable to the specialist that produced
   it directly inside the UI, not just the citation appendix.
4. Buffered-footprint overlap for the three Point-geometry register
   specialists. NYU Langone / Stuyvesant HS / P.S. 89 now correctly
   register `inside_sandy_2012=true`. Each output records its
   `footprint_buffer_m`.
5. Map renders register-asset pins (subway / school / hospital /
   NYCHA-centroid) coloured by Sandy exposure with click popups
   showing name + `[doc_id]`. NYCHA polygon-fill is queued for when
   `geometry_geojson` lands in the dataclass.
6. **`floodnet_forecast` specialist**. TTM r2 forecast on the
   nearest FloodNet sensor's flood-event recurrence. Reuses the
   (512, 96) singleton already loaded for `ttm_311_forecast`, so
   *no new model class is loaded into memory*. The strongest single
   TTM win for the NYU CUSP audience.
7. Trace UI groups TTM specialists under one parent node
   `forecasting.granite-timeseries-ttm-r2 [N instances]` so the
   "one foundation model, multiple data streams" architectural story
   is legible without reading per-row metadata.
8. `experiments/` cleanup: dropped two empty dirs (`05_sam2_promptable`,
   `06_chronos_bolt_forecast`), renamed `05_terramind_finetune` →
   `05a_terramind_finetune_micro` to dedupe with the active NYC
   fine-tune dir, removed `Riprap.zip` from repo root.
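
The buffered-footprint overlap in item 4 can be read as "a Point asset counts as inside the Sandy extent if it lies within `footprint_buffer_m` of the polygon". The following is a minimal standard-library sketch of that idea, assuming coordinates already projected to metres and a simple polygon without holes. The function names and the toy square are hypothetical; the actual specialists presumably rely on a geometry library rather than hand-rolled tests.

```python
import math


def point_in_polygon(pt, ring):
    """Ray-casting test; ring is a list of (x, y) vertices in metres."""
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def dist_to_segment(pt, a, b):
    """Shortest distance from pt to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = pt, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def buffered_overlap(pt, ring, footprint_buffer_m):
    """True if pt is inside ring or within footprint_buffer_m of its edge."""
    if point_in_polygon(pt, ring):
        inside = True
    else:
        edges = zip(ring, ring[1:] + ring[:1])
        nearest = min(dist_to_segment(pt, a, b) for a, b in edges)
        inside = nearest <= footprint_buffer_m
    return {"inside_sandy_2012": inside, "footprint_buffer_m": footprint_buffer_m}


# Toy 1 km square standing in for a flood-extent polygon (not real data).
square = [(0, 0), (1000, 0), (1000, 1000), (0, 1000)]
asset = (1040, 500)  # 40 m outside the right-hand edge

print(buffered_overlap(asset, square, 0))    # plain point-in-polygon misses it
print(buffered_overlap(asset, square, 50))   # a 50 m buffer picks it up
```

Recording the buffer alongside the flag, as the specialists do, keeps the coarseness of the test visible to whoever reads the output.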

Commit chain: `a2143fc` … through `ed6ae9d`. Morning handoff doc
at `audit/2026-05-04-morning-handoff.md` summarises what to verify
and what's queued next.

## Where Sunday ended

All four keep-list items resolved + 4 register specialists shipped + AMD
fine-tune prep green.

| Item | Status | Path |
|---|---|---|
| Pitch cold-open locked | ✓ | `pitch/cold_open.md` |
| TerraMind-NYC fine-tune eval spec | ✓ | `experiments/05_terramind_nyc_finetune/eval/eval_spec.md` |
| 200-query adversarial set + refusal eval | ✓ (planner pivot) | `experiments/06_granite_guardian/` |
| Subway-entrance specialist (Sheepshead Bay) | ✓ | `experiments/07_mta_entrances/` |
| NYCHA-developments specialist (Red Hook) | ✓ | `experiments/08_nycha_developments/` |
| DOE-schools specialist (Coney Island) | ✓ | `experiments/09_doe_schools/` |
| DOH-hospitals specialist (Coney Island) | ✓ | `experiments/10_doh_hospitals/` |
| FSM integration of all 4 register specialists | ✓ | `app/registers/`, `app/fsm.py`, `app/reconcile.py`, `web/static/agent.js` |
| AMD droplet TerraMind smoke + STAC manifest | ✓ | `129.212.182.52:/root/terramind_nyc/` |

End-to-end smoke on "Coney Island Brooklyn" produced citations
`[mta_entrance_56]`, `[nycha_dev_239]`, `[nycha_dev_166]` alongside
`[rag_mta]` and `[nyc311]`. Family-prefix chip routing works.
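
One way to picture family-prefix routing on these citations: strip an optional trailing `_<number>` from each doc id and group by what remains. This is a sketch only; the regexes and function names are hypothetical, and the real routing lives in `web/static/agent.js`, which may work differently.

```python
import re
from collections import defaultdict

# A doc id is a family prefix, optionally followed by "_<digits>".
_DOC_ID = re.compile(r"^(?P<family>.+?)(?:_(?P<num>\d+))?$")
_CITATION = re.compile(r"\[([a-z0-9_]+)\]")


def split_doc_id(doc_id):
    """Return (family, numeric_suffix_or_None) for a citation doc id."""
    match = _DOC_ID.match(doc_id)
    num = match.group("num")
    return match.group("family"), (int(num) if num is not None else None)


def group_citations(text):
    """Map each family prefix to the doc ids cited under it, in order."""
    families = defaultdict(list)
    for doc_id in _CITATION.findall(text):
        family, _ = split_doc_id(doc_id)
        families[family].append(doc_id)
    return dict(families)


briefing = (
    "Entrances exposed [mta_entrance_56]; developments [nycha_dev_239] "
    "[nycha_dev_166]; context [rag_mta] [nyc311]."
)
print(group_citations(briefing))
```

Note that `nyc311` ends in digits but has no underscore before them, so it stays a family of its own rather than being split into `nyc` plus `311`.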

Last commit: `86861be` (FSM integration of 4 register specialists).

## Decisions locked

- **Refusal classification dropped entirely.** Planner-level
  classifier hit FN=0% but FP=7% (gate was <5%). Granite Guardian
  itself was already abandoned (laptop-infeasible). After the audit
  surfaced that the planner shim was documented-but-never-wired,
  the decision is now Option C: drop refusal handling. Cold-start
  framing scopes the audience; Mellea rejection sampling enforces
  grounding integrity; the four-tier glyph margin carries the
  epistemic-honesty signal. The `GuardianRefusal.svelte` component
  is deleted (was only ever rendered on a documentation page).
  Demo's integrity beat is the **Mellea grounding-failure reroll on
  the curated Hollis 0.19% → 19% case**. `experiments/06_granite_guardian/`
  is preserved as a "considered and rejected" artifact for the
  methodology paper.
- **AMD path: `129.212.182.52` is production**, not `165.245.134.44`.
  CLAUDE.md says the latter; **fix CLAUDE.md to match reality**.
  Production vLLM is on `.52`. The TerraMind container shares the
  GPU with vLLM; both fit on one MI300X.
- **TerraMind manifest is 1028 paired chips**, 2021-05 → 2026-04,
  NYC 5-borough hull +5 km, S2-cloud <30%, ≤3-day pair window. One
  year (2022-05 → 2023-04) returned 0 due to PC API intermittency,
  which is acceptable for the micro-fine-tune.
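
For reference, the FP/FN figures in the refusal decision are ordinary confusion-matrix rates checked against a threshold. The sketch below shows the arithmetic with made-up labels: the 100/100 benign-to-adversarial split is an illustrative assumption, not the composition of the real 200-query set, and only the FP gate (<5%) comes from this document.

```python
def error_rates(should_refuse, did_refuse):
    """Return (fp_rate, fn_rate) for a refusal classifier.

    should_refuse: ground-truth flags, True for queries that ought to be refused.
    did_refuse:    classifier decisions for the same queries.
    """
    pairs = list(zip(should_refuse, did_refuse))
    benign = [pred for truth, pred in pairs if not truth]
    harmful = [pred for truth, pred in pairs if truth]
    fp_rate = sum(benign) / len(benign) if benign else 0.0
    fn_rate = sum(not pred for pred in harmful) / len(harmful) if harmful else 0.0
    return fp_rate, fn_rate


def passes_fp_gate(fp_rate, max_fp=0.05):
    """The gate is strict: the FP rate must be below 5%."""
    return fp_rate < max_fp


# Illustrative labels only: 100 benign queries of which 7 are wrongly
# refused, then 100 adversarial queries that are all correctly refused.
should_refuse = [False] * 100 + [True] * 100
did_refuse = [True] * 7 + [False] * 93 + [True] * 100

fp_rate, fn_rate = error_rates(should_refuse, did_refuse)
print(f"FP={fp_rate:.0%} FN={fn_rate:.0%} gate_ok={passes_fp_gate(fp_rate)}")
```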

## First thing Monday morning

1. **Refresh Microsoft Planetary Computer signed URLs.** They have
   ~1 hr TTL; the manifest from Sunday evening is stale by morning.
   On the droplet:
   ```bash
   ssh root@129.212.182.52
   docker exec -it terramind bash
   cd /root/terramind_nyc
   python build_manifest.py --refresh-only manifest_train.jsonl
   python build_manifest.py --refresh-only manifest_holdout.jsonl
   ```
   (Recipe is in `/root/terramind_nyc/NOTES.md` on the droplet.)

2. **Kick off TerraMind-NYC fine-tune.** Spec at
   `experiments/05_terramind_nyc_finetune/eval/eval_spec.md`. Budget
   is 30 GPU-hours; alarm at 25 (set on the droplet). Predicted
   actual: ~0.16 GPU-hours at bs=8 / 3 epochs. Don't run anything
   experimental until eval-spec gates pass on the held-out set.

3. **Decide bucket** (A ship-in-demo / B publish-only / C revert):
   - A: ship the fine-tuned checkpoint as a Riprap specialist.
   - B: publish to HF as `msradam/TerraMind-1.0-NYC` with model card,
     don't ship in demo. **Bucket B is fully acceptable** per the
     spec. Civic-tech publication discipline is the durable goal.
   - C: discard checkpoint, no public artefact.
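
Before kicking off training in step 2, it can help to confirm that the refresh in step 1 actually took. The sketch below assumes the signed URLs carry the standard Azure SAS `se` (signed expiry) query parameter; the helper names and the example URL are hypothetical, and this does not replace `build_manifest.py --refresh-only`.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse


def sas_expiry(url):
    """Return the SAS expiry as an aware datetime, or None if absent."""
    values = parse_qs(urlparse(url).query).get("se")
    if not values:
        return None
    # SAS expiries are ISO 8601 in UTC; normalise the trailing "Z".
    return datetime.fromisoformat(values[0].replace("Z", "+00:00"))


def is_stale(url, now=None, margin=timedelta(minutes=5)):
    """Treat unsigned URLs, and URLs expiring within `margin`, as stale."""
    now = now or datetime.now(timezone.utc)
    expiry = sas_expiry(url)
    return expiry is None or expiry - now <= margin


# Placeholder URL, not a real asset; only the `se` parameter matters here.
example_url = (
    "https://example.invalid/chip.tif"
    "?se=2026-05-03T22%3A00%3A00Z&sig=placeholder"
)
monday_morning = datetime(2026, 5, 4, 8, 0, tzinfo=timezone.utc)
print(sas_expiry(example_url), is_stale(example_url, now=monday_morning))
```

Running a check like this over every URL in `manifest_train.jsonl` and `manifest_holdout.jsonl` would require knowing the manifest's field names, which this sketch does not assume.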

## Working on Monday

- TerraMind-NYC fine-tune (above).
- **Mellea grounding-failure demo prep.** The pitch demo is the
  Hollis 0.19% → 19% case where Granite emits a number with the
  wrong order of magnitude and Mellea catches it. Demo script
  needs to:
  - Show the failed first attempt (banner: "Mellea reroll: numerics
    grounding failed").
  - Show the second attempt with the corrected number.
  - Show the audit panel with the pass/fail per-requirement.
  - Show wall-clock for the reroll (target: under 30 s end-to-end).
  - Currently reproducible via `scripts/probe_mellea.py --query
    "Hollis" --runs 5`. The demo script is the *visual* version.
- **MTA Sandy-recovery citation layer.** Parse the MTA "Hurricane
  Sandy: Three Years Later" report into per-station-id facts so
  the subway-entrance specialist can emit
  `[mta_recovery_<station_id>]` doc messages alongside the
  exposure ones.
- **NYCHA polygon-fill on the map.** Overnight session shipped
  NYCHA developments as centroid pins on the map (graded by
  `pct_inside_sandy ≥ 50%`). The next tightening is to add a
  `geometry_geojson` field to `app/registers/nycha.py`'s
  `DevelopmentFinding` dataclass and route through SSE so
  `register-polygons` actually renders graded fills (the layer +
  source are already present in `RipMap.svelte`).
- **PLUTO/Building-Footprints join** for Stuyvesant Town etc.
  Overnight pass shipped buffered-point overlap (NYU Langone,
  Stuyvesant HS, P.S. 89 now correctly flip to
  `inside_sandy_2012=true`). The 100m hospital buffer / 50m school
  buffer is honest but coarse; PLUTO + actual building footprints
  is the next step for the very-large-campus assets.
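
The shape of the grounding-failure reroll the demo script has to visualise can be sketched as a rejection-sampling loop: generate, check every percentage against the source documents, and retry on failure while keeping a per-attempt audit record. This is not Mellea's API. All names here are hypothetical, the generator is a stub, and the sketch assumes purely for illustration that the grounded figure is 0.19% and the faulty first draft says 19%.

```python
import re

PERCENT = re.compile(r"\d+(?:\.\d+)?%")


def ungrounded_percentages(draft, sources):
    """Return percentages in the draft that appear in no source text."""
    allowed = set()
    for text in sources:
        allowed.update(PERCENT.findall(text))
    return [p for p in PERCENT.findall(draft) if p not in allowed]


def generate_with_reroll(generate, sources, max_attempts=3):
    """Call generate(attempt) until the numerics check passes.

    Returns (draft_or_None, audit), where audit has one entry per attempt.
    """
    audit = []
    for attempt in range(1, max_attempts + 1):
        draft = generate(attempt)
        bad = ungrounded_percentages(draft, sources)
        audit.append({"attempt": attempt, "passed": not bad, "ungrounded": bad})
        if not bad:
            return draft, audit
    return None, audit


# Stub standing in for the model: wrong magnitude first, corrected second.
canned = {1: "Hollis figure: 19%", 2: "Hollis figure: 0.19%"}
sources = ["Source record for Hollis reports 0.19%."]

draft, audit = generate_with_reroll(lambda n: canned[n], sources)
print(draft)
for entry in audit:
    print(entry)
```

The `audit` list maps naturally onto the per-requirement pass/fail panel, and timing the loop gives the wall-clock figure for the reroll.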

## Outstanding through Friday

In rough priority order:

1. **More specialists**:
   - FEMA OpenFEMA NFIP claims tract-aggregated (pending).
   - NWS NWPS reach-level forecast + USGS NWIS Bronx / Saw Mill /
     Hutchinson rivers.
   - NYC DEP CSO outfalls + Bluebelt + Green Infrastructure
     specialist (CSS-vs-MS4 distinction for ASCE).
   - Three more TTM r2 specialists (USGS streamgage stage, NWS
     rainfall accumulation, NYC 311 sewer-backup citywide rate).
     **FloodNet forecast already shipped in the overnight pass.**
2. **Visual identity refresh**: Carto Positron, IBM Plex, four-tier
   epistemic palette, WeasyPrint PDF export, trace UI as `<details>`
   tree.
3. **WCAG 2.2 AA pass.**
4. **Methodology paper draft** (6-8 page PDF). Goal: Saturday May 9.
5. **Historical-event mode**. Vintage-cutoff queries. Saturday.
6. **Five Build-in-Public posts** through the week.
7. **5-minute hackathon pitch + 3 demo queries.** Friday rehearsal.
8. **ASCE talk materials**. May 13 (post-hackathon).

## Sharp edges to remember

- **Static assets cache hard.** When iterating on Svelte or
  agent.js, hard-reload (⌘⇧R). No cache-busting in place.
- **HF Space sleeps after idle.** Free tier; first request after
  sleep is a 30-90 s cold start. Ping the space before any demo.
- **vLLM cold compile.** First few requests against a fresh
  `vllm serve` log surprisingly low throughput while ROCm kernels
  JIT. Run benchmarks 3+ times before believing them.
- **Sandy GeoJSON has self-intersection issues** that blow up
  `unary_union`. Use `buffer(0)` (caught and fixed for NYCHA;
  may surface again for any new polygon-overlap specialist).
- **DEP column is `Flooding_Category` (int16)**, not `depth_class`.
  Documented in NYCHA RESULTS.md.
- **Centroid-edge join false-negatives** on NYU Langone / Stuyvesant
  / P.S. 89 because their centroid points lie just outside the OEM
  Sandy polygon despite real 2012 basement flooding. PLUTO
  footprint join is the queued fix.
- **Don't restart uvicorn while a model is mid-generation.** Ollama
  keeps the request alive but the FastAPI handler dies, leaving
  the user staring at a dead stream.
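
For the vLLM cold-compile edge, a small harness that discards warm-up calls before timing makes the "run it 3+ times" habit automatic. This is a generic sketch: `benchmark` and its parameters are hypothetical, and `call` stands in for whatever function issues one request against the server.

```python
import statistics
import time


def benchmark(call, warmup=3, runs=5):
    """Time `call` after discarding `warmup` untimed invocations."""
    for _ in range(warmup):
        call()  # lets JIT compilation and caches settle; result ignored
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        timings.append(time.perf_counter() - start)
    return {
        "runs": runs,
        "median_s": statistics.median(timings),
        "min_s": min(timings),
        "max_s": max(timings),
    }


if __name__ == "__main__":
    # Stand-in workload; replace with a single request to the server.
    print(benchmark(lambda: sum(range(100_000))))
```

Reporting the median alongside min and max makes a lingering cold-start outlier easy to spot.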

## Files to read in order on Monday morning

1. This file.
2. `experiments/05_terramind_nyc_finetune/eval/eval_spec.md`. The
   contract for what training output triggers ship/publish/revert.
3. `experiments/06_granite_guardian/RESULTS.md`. The Guardian →
   planner pivot decision record (so you know why Guardian is in
   the repo but not on the demo path).
4. `experiments/07_mta_entrances/RESULTS.md`. The canonical
   register-specialist pattern (the other three follow it).
5. `CLAUDE.md`. Fix the AMD droplet IP (165.245.134.44 →
   129.212.182.52) at the same time as the first edit of the day.

## Status as of 2026-05-03 ~12:50 ET

- Both git remotes (origin + huggingface) up-to-date through
  `86861be`.
- HF Space rebuild was *not* triggered on the FSM-integration
  commit; do `git push huggingface main` when you want to deploy.
  (You may want to wait until Monday afternoon so a broken HF
  rebuild doesn't eat morning time.)
- Local Ollama has both `granite4.1:3b` and `granite4.1:8b` warm.
- AMD droplet `129.212.182.52` has the `terramind` container
  running with TerraTorch 1.2.7 + pystac-client +
  planetary-computer installed in system Python; HF cache populated.
- 200-query adversarial set + planner-pivot eval results
  reproducible from `experiments/06_granite_guardian/` in ~3 min.
- Mellea probe still works: `scripts/probe_mellea.py --query
  "Hollis" --runs 5`.