riprap-nyc / MONDAY.md

seriffic

Voice pass: strip em-dashes from user-facing docs

f6423e1 2 days ago


Monday handoff (May 4, 2026)

State of the repo at end of Sunday May 3 / overnight into May 4. Demo is Sunday May 10.

Overnight pass (Sunday evening → Monday)

Eight priorities closed against audit/2026-05-03-evening-audit.md:

pitch/cold_open.md restored (was accidentally deleted in 1cb5ee6).
Granite Guardian / refusal-classification leftovers removed. Mellea is the sole grounding mechanism, period.
Trace UI is now clickable. Click any specialist row to reveal its raw structured output (formatted JSON, copy button, max-height + scroll). This is the auditability contract: every claim in the briefing is traceable to the specialist that produced it directly inside the UI, not just the citation appendix.
Buffered-footprint overlap for the three Point-geometry register specialists. NYU Langone / Stuyvesant HS / P.S. 89 now correctly register inside_sandy_2012=true. Each output records its footprint_buffer_m.
Map renders register-asset pins (subway / school / hospital / NYCHA-centroid) coloured by Sandy exposure with click popups showing name + [doc_id]. NYCHA polygon-fill is queued for when geometry_geojson lands in the dataclass.
floodnet_forecast specialist. TTM r2 forecast on the nearest FloodNet sensor's flood-event recurrence. Reuses the (512, 96) singleton already loaded for ttm_311_forecast, so no new model class is loaded into memory. The strongest single TTM win for the NYU CUSP audience.
Trace UI groups TTM specialists under one parent node forecasting.granite-timeseries-ttm-r2 [N instances] so the "one foundation model, multiple data streams" architectural story is legible without reading per-row metadata.
experiments/ cleanup: dropped two empty dirs (05_sam2_promptable, 06_chronos_bolt_forecast), renamed 05_terramind_finetune → 05a_terramind_finetune_micro to dedupe with the active NYC fine-tune dir, removed Riprap.zip from repo root.
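
The TTM grouping in the trace UI can be pictured with a small sketch. This is not the shipped `agent.js` logic, only an illustration in Python; the row fields `name` and `model` and the helper `group_trace_rows` are hypothetical, and only the parent label format comes from the notes above.

```python
def group_trace_rows(rows, parent_prefix="forecasting"):
    """Nest rows that share a model under one parent node.

    `rows` is a list of dicts with hypothetical keys "name" and "model".
    Rows whose model is None stay as top-level leaves.
    """
    grouped = {}
    tree = []
    for row in rows:
        model = row.get("model")
        if model is None:
            tree.append({"label": row["name"], "children": []})
            continue
        if model not in grouped:
            node = {"model": model, "children": []}
            grouped[model] = node
            tree.append(node)  # parent keeps the position of its first child
        grouped[model]["children"].append(row["name"])
    for node in grouped.values():
        count = len(node["children"])
        node["label"] = f"{parent_prefix}.{node['model']} [{count} instances]"
    return tree


rows = [
    {"name": "ttm_311_forecast", "model": "granite-timeseries-ttm-r2"},
    {"name": "floodnet_forecast", "model": "granite-timeseries-ttm-r2"},
    {"name": "mta_entrances", "model": None},
]
for node in group_trace_rows(rows):
    print(node["label"], node["children"])
```

With the two TTM rows above, the parent label reads `forecasting.granite-timeseries-ttm-r2 [2 instances]`.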

Commit chain: a2143fc … through ed6ae9d. Morning handoff doc at audit/2026-05-04-morning-handoff.md summarises what to verify and what's queued next.

Where Sunday ended

All four keep-list items resolved + 4 register specialists shipped + AMD fine-tune prep green.

| Item | Status | Path |
| --- | --- | --- |
| Pitch cold-open locked | ✓ | `pitch/cold_open.md` |
| TerraMind-NYC fine-tune eval spec | ✓ | `experiments/05_terramind_nyc_finetune/eval/eval_spec.md` |
| 200-query adversarial set + refusal eval | ✓ (planner pivot) | `experiments/06_granite_guardian/` |
| Subway-entrance specialist (Sheepshead Bay) | ✓ | `experiments/07_mta_entrances/` |
| NYCHA-developments specialist (Red Hook) | ✓ | `experiments/08_nycha_developments/` |
| DOE-schools specialist (Coney Island) | ✓ | `experiments/09_doe_schools/` |
| DOH-hospitals specialist (Coney Island) | ✓ | `experiments/10_doh_hospitals/` |
| FSM integration of all 4 register specialists | ✓ | `app/registers/`, `app/fsm.py`, `app/reconcile.py`, `web/static/agent.js` |
| AMD droplet TerraMind smoke + STAC manifest | ✓ | `129.212.182.52:/root/terramind_nyc/` |

End-to-end smoke on "Coney Island Brooklyn" produced citations [mta_entrance_56], [nycha_dev_239], [nycha_dev_166] alongside [rag_mta] and [nyc311]. Family-prefix chip routing works.
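
A rough sketch of what family-prefix routing means for the citation ids above. The family list is an assumption read off those five ids, and `citation_family` / `route_citations` are hypothetical helpers, not the functions in `web/static/agent.js`.

```python
import re

# Assumed families, inferred only from the ids quoted in the smoke test.
FAMILIES = ("mta_entrance", "nycha_dev", "rag", "nyc311")

CITATION_RE = re.compile(r"\[([a-z0-9_]+)\]")


def citation_family(doc_id):
    """Map a doc_id such as 'mta_entrance_56' to its family prefix."""
    # Longest prefix first so a short family never shadows a longer one.
    for family in sorted(FAMILIES, key=len, reverse=True):
        if doc_id == family or doc_id.startswith(family + "_"):
            return family
    return "unknown"


def route_citations(text):
    """Collect every [doc_id] in `text`, bucketed by family."""
    routed = {}
    for doc_id in CITATION_RE.findall(text):
        routed.setdefault(citation_family(doc_id), []).append(doc_id)
    return routed


briefing = (
    "Entrances exposed [mta_entrance_56]; developments [nycha_dev_239], "
    "[nycha_dev_166]; context from [rag_mta] and [nyc311]."
)
print(route_citations(briefing))
```

An id with no known prefix falls into an `unknown` bucket rather than being dropped, which keeps a mis-tagged citation visible during smoke tests.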

Last commit: 86861be (FSM integration of 4 register specialists).

Decisions locked

Refusal classification dropped entirely. Planner-level classifier hit FN=0% but FP=7% (gate was <5%). Granite Guardian itself was already abandoned (laptop-infeasible). After the audit surfaced that the planner shim was documented-but-never-wired, the decision is now Option C: drop refusal handling. Cold-start framing scopes the audience; Mellea rejection sampling enforces grounding integrity; the four-tier glyph margin carries the epistemic-honesty signal. The GuardianRefusal.svelte component is deleted (was only ever rendered on a documentation page). Demo's integrity beat is the Mellea grounding-failure reroll on the curated Hollis 0.19% → 19% case. experiments/06_granite_guardian/ is preserved as a "considered and rejected" artifact for the methodology paper.
AMD path: 129.212.182.52 is production, not 165.245.134.44. CLAUDE.md says the latter; fix CLAUDE.md to match reality. Production vLLM is on .52. The TerraMind container shares the GPU with vLLM; both fit on one MI300X.
TerraMind manifest is 1028 paired chips, 2021-05 → 2026-04, NYC 5-borough hull +5 km, S2-cloud <30%, ≤3-day pair window. One year (2022-05 → 2023-04) returned 0 chips due to PC API intermittency, which is acceptable for the micro-fine-tune.

First thing Monday morning

Refresh Microsoft Planetary Computer signed URLs. They have ~1 hr TTL; the manifest from Sunday evening is stale by morning. On the droplet:

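```bash
# Connect to the production droplet and open a shell in the terramind container
ssh root@129.212.182.52
docker exec -it terramind bash
# Re-sign the URLs in both manifests
cd /root/terramind_nyc
python build_manifest.py --refresh-only manifest_train.jsonl
python build_manifest.py --refresh-only manifest_holdout.jsonl
```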

(Recipe is in /root/terramind_nyc/NOTES.md on the droplet.)
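
To check whether a manifest entry is actually stale before refreshing, something like the sketch below may help. It assumes the signed URLs carry an Azure SAS-style `se` (expiry) query parameter in ISO 8601 UTC; that assumption, the example URL, and the helper names `sas_expiry` and `needs_refresh` are illustrative and not taken from `build_manifest.py`.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse


def sas_expiry(url):
    """Return the expiry encoded in the `se` query parameter, or None."""
    values = parse_qs(urlparse(url).query).get("se")
    if not values:
        return None
    # fromisoformat() on older Pythons rejects a trailing "Z".
    expiry = datetime.fromisoformat(values[0].replace("Z", "+00:00"))
    if expiry.tzinfo is None:
        expiry = expiry.replace(tzinfo=timezone.utc)  # assume UTC
    return expiry


def needs_refresh(url, now=None, margin=timedelta(minutes=5)):
    """True when the URL is unsigned, expired, or within `margin` of expiry."""
    now = now or datetime.now(timezone.utc)
    expiry = sas_expiry(url)
    return expiry is None or expiry - now <= margin


# Made-up URL, for illustration only.
url = "https://example.blob.core.windows.net/c/chip.tif?se=2026-05-04T13%3A00%3A00Z&sig=abc"
print(needs_refresh(url, now=datetime(2026, 5, 4, 12, 58, tzinfo=timezone.utc)))
```

The safety margin matters because a URL that is valid when a training epoch starts can expire before the last chip is read.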

Kick off TerraMind-NYC fine-tune. Spec at experiments/05_terramind_nyc_finetune/eval/eval_spec.md. Budget is 30 GPU-hours; alarm at 25 (set on the droplet). Predicted actual: ~0.16 GPU-hours at bs=8 / 3 epochs. Don't run anything experimental until eval-spec gates pass on the held-out set.
Decide bucket (A ship-in-demo / B publish-only / C revert):
- A: ship the fine-tuned checkpoint as a Riprap specialist.
- B: publish to HF as msradam/TerraMind-1.0-NYC with model card, don't ship in demo. Bucket B is fully acceptable per the spec. Civic-tech publication discipline is the durable goal.
- C: discard checkpoint, no public artefact.
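
For a sense of how much headroom the 30 GPU-hour budget leaves, here is a minimal sketch of the arithmetic. The 30 / 25 / ~0.16 figures come from the notes above; the helper names and the idea of checking before each run are assumptions, not what the droplet alarm does.

```python
BUDGET_GPU_HOURS = 30.0        # hard budget from the notes
ALARM_GPU_HOURS = 25.0         # alarm threshold from the notes
PREDICTED_RUN_GPU_HOURS = 0.16  # predicted cost of one bs=8 / 3-epoch run


def budget_status(used_gpu_hours, planned_gpu_hours=0.0):
    """Classify projected usage as 'ok', 'alarm', or 'over_budget'."""
    projected = used_gpu_hours + planned_gpu_hours
    if projected > BUDGET_GPU_HOURS:
        return "over_budget"
    if projected >= ALARM_GPU_HOURS:
        return "alarm"
    return "ok"


def runs_remaining(used_gpu_hours, per_run=PREDICTED_RUN_GPU_HOURS):
    """Whole runs that still fit under the hard budget."""
    return int(max(0.0, BUDGET_GPU_HOURS - used_gpu_hours) // per_run)


print(budget_status(0.0, PREDICTED_RUN_GPU_HOURS))
print(runs_remaining(0.0))
```

If the prediction holds, a single run uses well under 1% of the budget, so the alarm is mainly a guard against a runaway job rather than normal training.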

Working on Monday

TerraMind-NYC fine-tune (above).
Mellea grounding-failure demo prep. The pitch demo is the Hollis 0.19% → 19% case where Granite emits a number with the wrong order of magnitude and Mellea catches it. Demo script needs to:
- Show the failed first attempt (banner: "Mellea reroll: numerics grounding failed").
- Show the second attempt with the corrected number.
- Show the audit panel with the pass/fail per-requirement.
- Show wall-clock for the reroll (target: under 30 s end-to-end).
- Currently reproducible via scripts/probe_mellea.py --query "Hollis" --runs 5. The demo script is the visual version.
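
The shape of the reroll can be sketched as plain rejection sampling with a numeric grounding check. This is a generic illustration and does not use Mellea's API; `numerics_grounded`, `generate_with_reroll`, and the sample strings are hypothetical, and treating 0.19% as the source figure is an assumption made for the example.

```python
import re
import time

NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def numbers_in(text):
    """Every numeric literal in `text`, as floats."""
    return {float(match) for match in NUMBER_RE.findall(text)}


def numerics_grounded(draft, sources):
    """True when every number in the draft also appears in some source."""
    allowed = set()
    for source in sources:
        allowed |= numbers_in(source)
    return numbers_in(draft) <= allowed


def generate_with_reroll(generate, sources, max_attempts=3):
    """Call `generate(attempt)` until a draft passes the numeric check.

    Returns (draft or None, per-attempt audit list, elapsed seconds).
    """
    audit = []
    start = time.perf_counter()
    for attempt in range(1, max_attempts + 1):
        draft = generate(attempt)
        grounded = numerics_grounded(draft, sources)
        audit.append({"attempt": attempt, "numerics_grounded": grounded})
        if grounded:
            return draft, audit, time.perf_counter() - start
    return None, audit, time.perf_counter() - start


# Stand-in generator: the first draft is off by two orders of magnitude.
drafts = {1: "About 19% of Hollis is exposed.", 2: "About 0.19% of Hollis is exposed."}
draft, audit, elapsed = generate_with_reroll(lambda n: drafts[n], ["Hollis share: 0.19%"])
print(draft, audit, f"{elapsed:.3f}s")
```

The audit list maps onto the per-requirement pass/fail panel, and the elapsed value is the wall-clock figure the demo script needs to display.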
MTA Sandy-recovery citation layer. Parse the MTA "Hurricane Sandy: Three Years Later" report into per-station-id facts so the subway-entrance specialist can emit [mta_recovery_<station_id>] doc messages alongside the exposure ones.
NYCHA polygon-fill on the map. Overnight session shipped NYCHA developments as centroid pins on the map (graded by pct_inside_sandy ≥ 50%). The next tightening is to add a geometry_geojson field to app/registers/nycha.py's DevelopmentFinding dataclass and route through SSE so register-polygons actually renders graded fills (the layer + source are already present in RipMap.svelte).
PLUTO/Building-Footprints join for Stuyvesant Town etc. Overnight pass shipped buffered-point overlap (NYU Langone, Stuyvesant HS, P.S. 89 now correctly flip to inside_sandy_2012=true). The 100m hospital buffer / 50m school buffer is honest but coarse; PLUTO + actual building footprints is the next step for the very-large-campus assets.
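
The buffered-point overlap can be illustrated without any GIS dependency. The sketch below assumes coordinates already projected to metres (for example a UTM zone) and a single polygon ring with no holes; the function names and the toy square are hypothetical, and the real specialists presumably use a geometry library instead.

```python
import math


def point_in_polygon(pt, ring):
    """Ray-casting test for a simple ring given as [(x, y), ...]."""
    x, y = pt
    inside = False
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def distance_to_segment(pt, a, b):
    """Shortest distance from `pt` to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = pt, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def buffered_point_overlaps(pt, ring, buffer_m):
    """True if a circle of radius `buffer_m` around `pt` touches the polygon."""
    if point_in_polygon(pt, ring):
        return True
    edges = ((ring[i], ring[(i + 1) % len(ring)]) for i in range(len(ring)))
    return any(distance_to_segment(pt, a, b) <= buffer_m for a, b in edges)


# Toy inundation polygon: a 1 km square. The asset sits 60 m outside its edge.
zone = [(0, 0), (1000, 0), (1000, 1000), (0, 1000)]
asset = (1060, 500)
for buffer_m in (50, 100):  # school and hospital buffers from the notes
    print({"inside_sandy_2012": buffered_point_overlaps(asset, zone, buffer_m),
           "footprint_buffer_m": buffer_m})
```

The same asset flips between false and true depending on the buffer, which is why each output records its footprint_buffer_m and why a true footprint join is the more defensible long-term fix.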

Outstanding through Friday

In rough priority order:

More specialists:
- FEMA OpenFEMA NFIP claims tract-aggregated (pending).
- NWS NWPS reach-level forecast + USGS NWIS Bronx / Saw Mill / Hutchinson rivers.
- NYC DEP CSO outfalls + Bluebelt + Green Infrastructure specialist (CSS-vs-MS4 distinction for ASCE).
- Three more TTM r2 specialists (USGS streamgage stage, NWS rainfall accumulation, NYC 311 sewer-backup citywide rate). FloodNet forecast already shipped in the overnight pass.
Visual identity refresh: Carto Positron, IBM Plex, four-tier epistemic palette, WeasyPrint PDF export, trace UI as <details> tree.
WCAG 2.2 AA pass.
Methodology paper draft (6-8 page PDF). Goal: Saturday May 9.
Historical-event mode. Vintage-cutoff queries. Saturday.
Five Build-in-Public posts through the week.
5-minute hackathon pitch + 3 demo queries. Friday rehearsal.
ASCE talk materials. May 13 (post-hackathon).

Sharp edges to remember

Static assets cache hard. When iterating on Svelte or agent.js, hard-reload (⌘⇧R). No cache-busting in place.
HF Space sleeps after idle. Free tier; first request after sleep is a 30-90 s cold start. Ping the space before any demo.
vLLM cold compile. The first few requests against a fresh vllm serve show surprisingly low throughput while ROCm kernels JIT-compile. Run benchmarks 3+ times before believing them.
Sandy GeoJSON has self-intersection issues that blow up unary_union. Use buffer(0) (caught and fixed for NYCHA; may surface again for any new polygon-overlap specialist).
DEP column is Flooding_Category (int16), not depth_class. Documented in NYCHA RESULTS.md.
Centroid-edge join false-negatives on NYU Langone / Stuyvesant / P.S. 89 because their centroid points lie just outside the OEM Sandy polygon despite real 2012 basement flooding. PLUTO footprint join is the queued fix.
Don't restart uvicorn while a model is mid-generation. Ollama keeps the request alive but the FastAPI handler dies, leaving the user staring at a dead stream.
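
For the vLLM cold-compile note above, one way to avoid trusting warm-up numbers is to time several runs and discard the first few. This is a minimal sketch; the `benchmark` helper and its defaults are assumptions, and the callable passed in would be whatever issues one request against the server.

```python
import statistics
import time


def benchmark(fn, runs=5, warmup=2):
    """Time `fn` `runs` times and report the median of the post-warm-up runs."""
    if runs <= warmup:
        raise ValueError("runs must exceed warmup so some timings remain")
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    steady = timings[warmup:]
    return {
        "warmup_s": timings[:warmup],  # kept for inspection, not for reporting
        "steady_s": steady,
        "median_s": statistics.median(steady),
    }


# Stand-in workload; replace with a function that sends one real request.
result = benchmark(lambda: sum(range(100_000)))
print(result["median_s"])
```

Keeping the warm-up timings in the result makes the JIT penalty visible without letting it skew the reported figure.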

Files to read in order on Monday morning

This file.
experiments/05_terramind_nyc_finetune/eval/eval_spec.md. The contract for what training output triggers ship/publish/revert.
experiments/06_granite_guardian/RESULTS.md. The Guardian → planner pivot decision record (so you know why Guardian is in the repo but not on the demo path).
experiments/07_mta_entrances/RESULTS.md. The canonical register-specialist pattern (the other three follow it).
CLAUDE.md. Fix the AMD droplet IP (165.245.134.44 → 129.212.182.52) at the same time as the first edit of the day.

Status as of 2026-05-03 ~12:50 ET

Both git remotes (origin + huggingface) up-to-date through 86861be.
HF Space rebuild was not triggered on the FSM-integration commit; do git push huggingface main when you want to deploy. (You may want to wait until Monday afternoon so a broken HF rebuild doesn't eat morning time.)
Local Ollama has both granite4.1:3b and granite4.1:8b warm.
AMD droplet 129.212.182.52 has the terramind container running with TerraTorch 1.2.7 + pystac-client + planetary-computer installed in system Python; HF cache populated.
200-query adversarial set + planner-pivot eval results reproducible from experiments/06_granite_guardian/ in ~3 min.
Mellea probe still works: scripts/probe_mellea.py --query "Hollis" --runs 5.