riprap-nyc · docs/design_handoff/V0.4.5_SPEC.md · commit 79cf005 by seriffic ("docs: replace handoff bundle with v0.4.5 spec")

Riprap v0.4.5 · Polish on v0.4.4 (Findings region)

Status: design spec, ready for Claude Code implementation. Frame: surgical polish on the live local SvelteKit app. Nothing structural changes. Reuse v0.4.4 components; apply nine deltas.

The v0.4.4 Findings region (five Stones, evidence cards, raster thumbnails, time-series, Capstone meta-card) is shipping correctly. Real queries (e.g., 80 Pioneer Street, Red Hook) render Stones that produce genuinely different kinds of evidence card, provenance that collapses under each Stone, briefing prose whose citations resolve cleanly, and a map with three tier-encoded layers plus the address pin. The work below corrects nine specific issues observed in production-shaped local runs.

What v0.4.5 must NOT change

  • Card grammar (header / body / footer / tier badge / source link)
  • The four-section briefing prose structure
  • The Mellea reroll status strip ("Regenerating to satisfy citation grounding · attempt N of N · previous draft dimmed below")
  • The four-tier color palette and glyphs
  • The cold-start state with the v0.4.3 Stones one-liner
  • The trust-signal footer
  • The PDF template's core layout (Stone color print handling is the only new consideration)
  • The map's base style and the existing layer rendering for Sandy / FEMA / 311

What v0.4.5 must NOT introduce

  • New card body variants beyond what v0.4.4 specified
  • Card-level decoration that competes with tier badges (Stone color hints live at the Stone region level; at the card level they appear only as a subtle, designer-chosen touch such as a tinted left border)
  • Animations on card render
  • New typography or spacing tokens unless something genuinely new emerges from the fixes
  • Mascots, icons, or thematic visual associations for individual Stones beyond their color hint

1. Status semantics: split conflated states

Problem. The top tally currently reads "5 Stones · 15/18 functions fired · 9 evidence cards · 24.0s · 3 error" on a query where Mellea grounding passed 4/4 and the briefing came out clean. Two of those three "errors" are not errors at all:

  • Keystone's mta_entrance_exposure returned "no entrances within radius" — that's the specialist working correctly. Absence of nearby subway entrances at 80 Pioneer is a true and useful finding, not a failure.
  • Lodestone's floodnet_forecast hit its silent-floor (sensor has only 2 historical events, <5 required) — the four-tier discipline working as intended.
  • Lodestone's ttm_311_forecast actually failed (311 history fetch error). That is the only real error.

Three different epistemic states are being rendered as one red square.

Fix. Split the FSM specialist status into five values:

| status | meaning | provenance row treatment | counts toward |
|---|---|---|---|
| fired | completed and produced output the reconciler used | tier-colored square | "fired" tally |
| silent_by_design | completed and correctly produced no output | open square in neutral tone, italicized message | "silent" tally |
| warned | output produced with a non-fatal warning | tier-colored square + small warn sidemark | "fired" tally + "warnings" count |
| errored | failed to complete, no usable output | red square; 1-line collapsed summary; full trace behind click-to-expand (v0.4.2 drilldown pattern) | "errored" tally |
| not_invoked | FSM skipped the specialist (precondition unmet) | hollow gray square; one-line reason | "not invoked" count |

Aggregate count rules. Top tally:

5 Stones · 15 fired · 5 silent · 1 errored · 24.0s

(no more rolled-up "errors" number)

Per-Stone summary (replaces "0 cards · 0 fired · 1 error · 30ms"):

Keystone · 0 cards · 5 silent · 30ms
Lodestone · 1 card · 3 fired · 1 silent · 1 errored · 1.5s
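The counting rules above can be sketched as a pure function over per-specialist statuses. This is a minimal sketch, not the app's code. The `SpecialistRun` and `StoneRun` shapes are hypothetical. The examples above only show fired, silent, and errored counts, so the position of the warnings and not-invoked counts in the string is an assumption.

```typescript
// The five specialist states from the table above.
type SpecialistStatus = "fired" | "silent_by_design" | "warned" | "errored" | "not_invoked";

// Hypothetical run records; field names are illustrative.
interface SpecialistRun {
  name: string;
  status: SpecialistStatus;
}

interface StoneRun {
  stone: string;
  cards: number;
  wallMs: number;
  specialists: SpecialistRun[];
}

interface Counts {
  fired: number; // includes "warned", per the "counts toward" column
  warnings: number;
  silent: number;
  errored: number;
  notInvoked: number;
}

function countStatuses(runs: SpecialistRun[]): Counts {
  const c: Counts = { fired: 0, warnings: 0, silent: 0, errored: 0, notInvoked: 0 };
  for (const r of runs) {
    switch (r.status) {
      case "fired":
        c.fired++;
        break;
      case "warned":
        c.fired++;
        c.warnings++;
        break;
      case "silent_by_design":
        c.silent++;
        break;
      case "errored":
        c.errored++;
        break;
      case "not_invoked":
        c.notInvoked++;
        break;
    }
  }
  return c;
}

// "30ms" below one second, otherwise one decimal place: "1.5s", "24.0s".
function formatDuration(ms: number): string {
  return ms < 1000 ? `${ms}ms` : `${(ms / 1000).toFixed(1)}s`;
}

// Zero counts are omitted, as in "Keystone · 0 cards · 5 silent · 30ms".
function countParts(c: Counts): string[] {
  const parts: Array<[number, string]> = [
    [c.fired, "fired"],
    [c.warnings, c.warnings === 1 ? "warning" : "warnings"],
    [c.silent, "silent"],
    [c.errored, "errored"],
    [c.notInvoked, "not invoked"],
  ];
  return parts.filter((p) => p[0] > 0).map((p) => `${p[0]} ${p[1]}`);
}

function stoneSummary(s: StoneRun): string {
  const cards = `${s.cards} ${s.cards === 1 ? "card" : "cards"}`;
  return [s.stone, cards]
    .concat(countParts(countStatuses(s.specialists)), [formatDuration(s.wallMs)])
    .join(" · ");
}

// Re-counts every specialist across all Stones instead of summing per-Stone strings.
function topTally(stones: StoneRun[], totalMs: number): string {
  let all: SpecialistRun[] = [];
  for (const s of stones) all = all.concat(s.specialists);
  return [`${stones.length} Stones`]
    .concat(countParts(countStatuses(all)), [formatDuration(totalMs)])
    .join(" · ");
}
```

The top tally re-counts every specialist across all Stones instead of summing pre-formatted per-Stone strings. This keeps the two levels from drifting apart.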

Status messages — voice. Engineering-honest, no euphemism. Examples to copy:

  • "no entrances within radius"
  • "sensor has only 2 historical events; forecast omitted (silent-floor: 5)"
  • "PLUTO join skipped: queried address not in NYC PLUTO dataset"
  • "311 history fetch failed: HTTP 503 at NYC OpenData (3 retries)"

Match v0.4.1–v0.4.4 voice — precise, slightly understated, comfortable with technical detail, engineer-to-engineer.

This is the most important fix in v0.4.5, because it corrects the one issue that actively misrepresents system integrity. Do this first.


2. Capstone meta-card field-mapping

Problem. The Capstone meta-card displays:

mellea reroll: 0 attempts
grounding checks: 0/4 passed
citations resolved: 0
wall-clock: 24.0 s

But the briefing has 4 resolved citations ([1] Sandy, [2] NYC311, [3] Ida, [4] Microtopo). The Mellea status strip explicitly shows "attempt 2 of 2", and the briefing rendered clean, so grounding passed. Three of the four metrics show zero on a clean run. This is a state-plumbing bug, not a logic bug.

Fix. Wire the four metrics to the reconciler's actual state fields:

| display | reads from | expected for this query |
|---|---|---|
| mellea reroll: N attempts | per-query reroll count from Mellea (likely melleaState.rerollCount) | 1 (one reroll on top of the initial attempt) |
| grounding checks: N/4 passed | per-query grounding-check results array → count true | 4/4 |
| citations resolved: N | count of resolved citations in the final briefing payload | 4 |
| wall-clock: NN.N s | already correct; no change | 24.0 s |

Acceptance. The Capstone meta-card on a clean Red Hook run shows 1 / 4/4 / 4 / 24.0 s. On a failed-grounding run it shows the actual numbers (e.g. 2 / 3/4 / 4 / 31.0 s) so the meta-card honestly reports what happened.
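Below is a minimal sketch of the wiring. It assumes a reconciler state shaped like the hypothetical `ReconcilerState`; the spec names only `melleaState.rerollCount`, and only tentatively. Each metric is read from the run's own state and formatted in the compact order used in the acceptance line.

```typescript
// Hypothetical reconciler state. Only melleaState.rerollCount is suggested
// (tentatively) by the spec; the other field names are illustrative.
interface ReconcilerState {
  melleaState: { rerollCount: number };
  groundingChecks: boolean[]; // one entry per grounding check
  briefing: { citations: Array<{ id: number; resolved: boolean }> };
  wallClockMs: number;
}

interface MetaCard {
  rerolls: number;
  groundingPassed: number;
  groundingTotal: number;
  citationsResolved: number;
  wallClockS: number;
}

// Every metric comes from the run's own state; none falls back to a hard-coded zero.
function toMetaCard(state: ReconcilerState): MetaCard {
  return {
    rerolls: state.melleaState.rerollCount,
    groundingPassed: state.groundingChecks.filter((ok) => ok).length,
    groundingTotal: state.groundingChecks.length,
    citationsResolved: state.briefing.citations.filter((c) => c.resolved).length,
    wallClockS: state.wallClockMs / 1000,
  };
}

// Compact "rerolls / grounding / citations / wall-clock" form from the acceptance line.
function formatMetaCard(m: MetaCard): string {
  return `${m.rerolls} / ${m.groundingPassed}/${m.groundingTotal} / ${m.citationsResolved} / ${m.wallClockS.toFixed(1)} s`;
}
```

The status strip's "attempt 2 of 2" corresponds to one reroll. If Mellea exposes an attempt count rather than a reroll count, the displayed figure would be attempts minus one.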

The Capstone meta-card is the integrity-narration UI for the entire pipeline. Zeros on a clean run undersell the system's integrity. Accurate numbers turn it into a proof point: "this query went through 1 reroll to satisfy 4/4 grounding checks, resolved 4 citations to primary sources, and took 24 seconds end-to-end."


3. Provenance roster: always show full inventory

Problem. Expanded Keystone provenance shows only step-16 · mta_entrance_exposure — no entrances within radius (30ms). Keystone fires five specialists (MTA, NYCHA, DOE, DOH, PLUTO). Four are missing from the UI.

Fix. Each Stone's provenance expander shows the complete roster of specialists the Stone could have fired, with one row per specialist and its status from §1. A reader who expands a Stone sees the full inventory — never a filtered subset. This is the auditability contract.

If a specialist genuinely didn't run, that's not_invoked with a one-line reason. The reader sees the intended roster and understands every absence.

Implementation. The Stone definition (in src/lib/data/stones.ts or wherever the registry lives) should list its full specialist roster. The provenance component renders one row per registry entry, joining against the run's actual specialist outputs. Specialists missing from the run output → not_invoked status, one-line reason from the FSM's skip log.
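The join can be sketched as follows. It assumes a registry entry that lists a Stone's full roster and run output keyed by specialist name. `StoneDef`, `SpecialistOutput`, the skip-log shape, and every specialist name except `mta_entrance_exposure` are hypothetical.

```typescript
type SpecialistStatus = "fired" | "silent_by_design" | "warned" | "errored" | "not_invoked";

// Hypothetical registry entry: a Stone and every specialist it could fire.
interface StoneDef {
  id: string;
  roster: string[];
}

// Hypothetical run output. A specialist that produced a record ran,
// so its status can never be not_invoked.
interface SpecialistOutput {
  specialist: string;
  status: Exclude<SpecialistStatus, "not_invoked">;
  message: string;
  ms: number;
}

interface ProvenanceRow {
  specialist: string;
  status: SpecialistStatus;
  message: string;
  ms?: number;
}

// One row per roster entry, in registry order; absent specialists become not_invoked.
function provenanceRows(
  stone: StoneDef,
  outputs: SpecialistOutput[],
  skipLog: Record<string, string>,
): ProvenanceRow[] {
  const byName: Record<string, SpecialistOutput> = {};
  for (const o of outputs) byName[o.specialist] = o;
  return stone.roster.map((name): ProvenanceRow => {
    const out = Object.prototype.hasOwnProperty.call(byName, name) ? byName[name] : undefined;
    if (out) return { specialist: name, status: out.status, message: out.message, ms: out.ms };
    const reason = Object.prototype.hasOwnProperty.call(skipLog, name)
      ? skipLog[name]
      : "no skip reason recorded";
    return { specialist: name, status: "not_invoked", message: reason };
  });
}
```

Rows come from the registry, not from the run output, so a specialist the FSM never reached still gets a row. Output from a specialist that isn't on the roster is ignored here and is worth logging separately.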


4. Touchstone — five cards, not three

Problem. Touchstone renders three cards (FloodNet, NYC 311, NOAA CO-OPS). v0.4.4 specified five: those three plus TerraMind LULC (SYN tier) and Prithvi-NYC-Pluvial (MOD tier).

Fix. Wire the structured outputs of step_terramind_lulc and step_prithvi_live (now Prithvi v2) to Touchstone cards.

TerraMind LULC card (Touchstone, SYN tier — synthetic prior, dashed top-rule):

  • Title: "Land use / land cover · TerraMind v1.2"
  • Body variant: existing raster thumbnail (240×120, segmentation overlay) + a horizontal stacked class-mix bar below the thumbnail showing percentage by LULC class. Use the conventional LULC palette (urban / water / vegetation / barren / wetland) — those colors are visual conventions for the layer itself, not new tier signals.
  • Source: TerraMind v1.2
  • Agency: IBM TerraMind v1.2 · Sentinel-2 inputs
  • Vintage: latest Sentinel scene date for the AOI (e.g. Sentinel-2 · 2024-09-18)
  • mapLayer: "terramind-lulc"
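The class-mix bar needs per-class percentages over the AOI. Below is a minimal sketch. It assumes the segmentation raster arrives as one integer class code per pixel. The class order, the codes, and the 255 nodata value are assumptions; align them with the actual TerraMind legend.

```typescript
// Class order and integer codes are assumptions; align with the TerraMind LULC legend.
const LULC_CLASSES = ["urban", "water", "vegetation", "barren", "wetland"] as const;
type LulcClass = (typeof LULC_CLASSES)[number];

// Percentage of classified pixels in each class, for the stacked class-mix bar.
// `mask` holds one class code per pixel; nodata and unknown codes are skipped.
function classMix(mask: ArrayLike<number>, nodata: number = 255): Record<LulcClass, number> {
  const counts: number[] = LULC_CLASSES.map(() => 0);
  let valid = 0;
  for (let i = 0; i < mask.length; i++) {
    const code = mask[i];
    if (code === nodata || code < 0 || code >= LULC_CLASSES.length || code % 1 !== 0) continue;
    counts[code]++;
    valid++;
  }
  const mix = {} as Record<LulcClass, number>;
  LULC_CLASSES.forEach((name, k) => {
    mix[name] = valid === 0 ? 0 : (100 * counts[k]) / valid;
  });
  return mix;
}
```

The percentages are computed over classified pixels only. The five segments therefore sum to 100, or are all zero when nothing is classified, and can be used directly as bar widths.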

Prithvi-NYC-Pluvial card (Touchstone, MOD tier):

  • Title: "Pluvial flood prediction · Prithvi-NYC-Pluvial"
  • Body variant: raster thumbnail (flood-mask overlay) with a headline scalar above it showing the flooded percentage of the AOI.
  • Source: Prithvi-NYC-Pluvial
  • Agency: NASA-IBM Prithvi v2 · NYC fine-tune
  • Vintage: prediction time + Sentinel scene date
  • mapLayer: "prithvi-pluvial"
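The headline scalar is the flooded share of the AOI. Below is a sketch. It assumes the Prithvi output and an AOI mask are same-size rasters, with 1 marking flooded pixels and in-AOI pixels respectively. The encoding and the headline wording are illustrative.

```typescript
// Share of AOI pixels that the flood mask marks as flooded. Both rasters are
// assumed same-size and row-major, with 1 = flooded / inside AOI.
function floodPercentOfAoi(floodMask: ArrayLike<number>, aoiMask: ArrayLike<number>): number | null {
  if (floodMask.length !== aoiMask.length) {
    throw new Error(`mask size mismatch: ${floodMask.length} vs ${aoiMask.length}`);
  }
  let inAoi = 0;
  let flooded = 0;
  for (let i = 0; i < aoiMask.length; i++) {
    if (aoiMask[i] !== 1) continue;
    inAoi++;
    if (floodMask[i] === 1) flooded++;
  }
  // No AOI pixels: return null so the card can omit the scalar.
  return inAoi === 0 ? null : (100 * flooded) / inAoi;
}

// Headline wording is illustrative, not specified.
function floodHeadline(pct: number): string {
  return `${pct.toFixed(1)}% of AOI predicted flooded`;
}
```

Returning `null` for an empty AOI lets the card omit the scalar instead of printing a misleading 0.0%.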

Layout. v0.4.4's horizontal scrolling treatment handles five cards on wide viewports; on narrow viewports they stack vertically as designed. No new layout work.

If the specialists aren't firing yet (UPDATE_STONES.md commits 4 + 5 may still be in flight): surface that as a backend issue, but the card definitions land in v0.4.5 so they're ready when the data is.


5. Lodestone — both TTM cards, not just the zero-shot

Problem. Lodestone shows one time-series card: zero-shot Granite TTM r2 surge nowcast (9.6h, 6-min cadence). The footer correctly notes "Distinct from the fine-tuned Battery surge nowcast" — but that fine-tuned card is missing.

Fix. Add the fine-tuned TTM card. Both cards live in Lodestone:

| | Zero-shot card (existing) | Fine-tuned card (new) |
|---|---|---|
| Title | "Storm surge nowcast at The Battery — 9.6 h horizon (regional)" | "Storm surge nowcast at The Battery — 96 h horizon (NYC-specialized fine-tune)" |
| Horizon | 9.6 h | 96 h |
| Cadence | 6-min | hourly |
| Tier | MOD | MOD |
| Body | timeseries (existing) | timeseries (96-h forecast) |
| Source label | Granite TTM r2 (zero-shot) | msradam/Granite-TTM-r2-Battery-Surge |
| Footer extras | "regional disclosure" tag | model-card link to HF artifact (huggingface.co/msradam/Granite-TTM-r2-Battery-Surge), RMSE 0.157 m, −35% vs persistence, AMD MI300X badge |

The fine-tuned card is a load-bearing piece of the Riprap-on-AMD story for the hackathon submission. Its presence as a Lodestone card alongside the zero-shot card is the visible "this system uses NYC-specialized fine-tuned models you can verify on HuggingFace" claim that closes the loop.

In the production trace, step_ttm_battery_surge fires at ~1.5s with context_h=1024 · horizon_h=96. If it fires but doesn't produce a structured output the card layer can render, surface that as a backend issue before v0.4.5 implementation.
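Both Lodestone cards imply the same number of forecast points. A 9.6 h horizon at 6-minute cadence is 96 steps, and a 96 h horizon at hourly cadence is also 96. A check along these lines could sit between the specialist and the card layer and turn a malformed payload into a one-line backend issue. The `TtmForecast` shape is hypothetical.

```typescript
// Hypothetical structured output a TTM specialist hands to the card layer.
interface TtmForecast {
  horizonH: number; // forecast horizon, hours
  cadenceMin: number; // spacing between forecast points, minutes
  values: number[]; // one surge value per step
}

// Number of forecast points implied by horizon and cadence (rounded to absorb float error).
function expectedSteps(horizonH: number, cadenceMin: number): number {
  return Math.round((horizonH * 60) / cadenceMin);
}

// null when the card can render; otherwise a one-line reason to file as a backend issue.
function checkTtmPayload(p: TtmForecast | null | undefined): string | null {
  if (!p) return "no structured output";
  if (!isFinite(p.horizonH) || p.horizonH <= 0) return `invalid horizon: ${p.horizonH}`;
  if (!isFinite(p.cadenceMin) || p.cadenceMin <= 0) return `invalid cadence: ${p.cadenceMin}`;
  const want = expectedSteps(p.horizonH, p.cadenceMin);
  if (p.values.length !== want) return `expected ${want} steps, got ${p.values.length}`;
  if (p.values.some((v) => !isFinite(v))) return "non-finite value in forecast";
  return null;
}
```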


6. The "anomaly" tag

Problem. Provenance expanders show Hide provenance · 1 function · anomaly (Keystone) and Hide provenance · 5 functions · anomaly (Lodestone). The label is overloaded: it does different work in each context and partially duplicates row-level information.

Decision: drop the "anomaly" tag.

After §1 lands, Stone summaries say Keystone · 0 cards · 5 silent · 30ms (expected behavior — no exposed assets at this address) or Lodestone · 1 card · 3 fired · 1 silent · 1 errored · 1.5s (one specialist errored, visible in the count). The status counts make "anomaly" redundant. Less is more.

(Alternative — retained for record but not recommended: keep "anomaly" and define it as "Stone-level outcome differs from typical: 0 cards landed despite specialists firing, OR ≥1 specialist errored, OR Stone retried." If chosen, document the rule and apply mechanically. But §1's count breakdown carries the same information without the extra label.)


7. LAYERS panel — restructure to mirror Stones

Problem. Production shows the existing three layers (Sandy / FEMA-DEP / 311) in a flat LAYERS panel without Stone grouping. The four new raster layers (Microtopography, TerraMind LULC, TerraMind Buildings, Prithvi-NYC-Pluvial) don't have a home.

Fix. Restructure the LAYERS panel to mirror the Findings Stones structure:

LAYERS

▾ Cornerstone — what NYC's ground remembers
  ◧ Sandy Inundation Zone (2012)        EMP   [on]
  ◨ FEMA / DEP scenarios                MOD   [on]
  ◧ Ida HWM points (2021)               EMP   [off]
  ▥ Microtopography (HAND/TWI)          PRX   [off]

▾ Keystone — what's exposed
  ◉ MTA subway entrances                EMP   [on]
  ▭ NYCHA developments                  EMP   [on]
  ✕ DOE schools                         EMP   [on]
  ● DOH hospitals                       EMP   [on]
  ▦ TerraMind Buildings (current)       SYN   [off]

▾ Touchstone — what's happening now
  ● 311 flood complaints                PRX   [on]
  ◉ FloodNet sensors                    EMP   [on]
  ▥ TerraMind LULC (current)            SYN   [off]
  ▥ Prithvi-NYC-Pluvial flood pred.     MOD   [off]

▾ Lodestone — what's coming
  (no map layers — see Findings cards)

▾ Capstone — synthesis
  (not a map layer)

New layer specs:

  • TerraMind LULC (SYN tier): conventional LULC palette (urban/water/vegetation/barren/wetland). Default off. Label includes Sentinel scene date.
  • TerraMind Buildings (SYN tier): synthetic-prior glyph. Default off.
  • Prithvi-NYC-Pluvial (MOD tier): Modeled-tier color treatment. Default off. Label includes Sentinel scene date.
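The Stone-grouped panel can be driven from one flat layer registry. Below is a minimal sketch. The `LayerDef` shape is hypothetical, and the registry is a subset of the mock above. Only `terramind-lulc` and `prithvi-pluvial` are layer ids taken from this spec.

```typescript
type Tier = "EMP" | "MOD" | "PRX" | "SYN";
type Stone = "cornerstone" | "keystone" | "touchstone" | "lodestone" | "capstone";

interface LayerDef {
  id: string; // only "terramind-lulc" and "prithvi-pluvial" come from the spec
  label: string;
  stone: Stone;
  tier: Tier;
  defaultOn: boolean;
}

const STONE_ORDER: Stone[] = ["cornerstone", "keystone", "touchstone", "lodestone", "capstone"];

// Subset of the panel mock; ids other than the two above are hypothetical.
const LAYERS: LayerDef[] = [
  { id: "sandy-2012", label: "Sandy Inundation Zone (2012)", stone: "cornerstone", tier: "EMP", defaultOn: true },
  { id: "fema-dep", label: "FEMA / DEP scenarios", stone: "cornerstone", tier: "MOD", defaultOn: true },
  { id: "microtopo", label: "Microtopography (HAND/TWI)", stone: "cornerstone", tier: "PRX", defaultOn: false },
  { id: "terramind-buildings", label: "TerraMind Buildings (current)", stone: "keystone", tier: "SYN", defaultOn: false },
  { id: "nyc311", label: "311 flood complaints", stone: "touchstone", tier: "PRX", defaultOn: true },
  { id: "terramind-lulc", label: "TerraMind LULC (current)", stone: "touchstone", tier: "SYN", defaultOn: false },
  { id: "prithvi-pluvial", label: "Prithvi-NYC-Pluvial flood pred.", stone: "touchstone", tier: "MOD", defaultOn: false },
];

interface PanelGroup {
  stone: Stone;
  layers: LayerDef[];
}

// Every Stone gets a group, even with no layers, so the panel can name the absence.
function groupByStone(layers: LayerDef[]): PanelGroup[] {
  return STONE_ORDER.map((stone) => ({
    stone,
    layers: layers.filter((l) => l.stone === stone),
  }));
}
```

Every Stone yields a group even when it has no layers. This lets the panel render the explicit Lodestone and Capstone notes instead of silently dropping those headings.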

The "no map layers — see Findings cards" label under Lodestone is explicit by design: the TTM Battery Surge is a Lodestone card, not a map layer. Naming the absence prevents the reader from looking for it.


8. Card-to-map hover linking

Problem. v0.4.4 specified hover-to-highlight from card → map element. Production may or may not have this wired.

Fix. Verify and ship the connection:

| Card type | Map link | Hover treatment |
|---|---|---|
| FEMA card | FEMA AE polygon layer | fill opacity bump (0.4 → 0.6), 100ms |
| HWM tabular | HWM contour + points | line weight 1px → 2px |
| Register cards (NYCHA, schools, etc.) | corresponding pins | pins gain 2px accent ring; on click, fitBounds() |
| FloodNet sensor card | sensor pin | pin glow + accent outline |
| Raster prediction (TerraMind, Prithvi) | the matching raster layer | layer fill opacity bump |
| Address card | address pin | pin pulse (single, 200ms) |
| TTM Battery Surge | (none; not spatial) | no map behavior |
| Capstone meta-card | (none) | no map behavior |

Implementation. Page-level linkedKey state (already specced in v0.4.4 README). Cards set it on pointerenter / focus, clear on pointerleave / blur. Map watches $derived of linkedKey and applies layer-specific class + label badge ("linked: {layer}") bottom-right.
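Here is a framework-free sketch of the linkedKey contract, which makes the ownership rule easy to test outside Svelte. In the app this would be page-level Svelte state. `LinkedKeyState`, the key format, and `linkedBadge` are hypothetical names. Cards call `enter` on pointerenter/focus and `leave` on pointerleave/blur.

```typescript
type Listener = (key: string | null) => void;

// Framework-free stand-in for page-level linkedKey state. subscribe() follows
// the Svelte store contract: it calls the listener immediately and returns an unsubscriber.
class LinkedKeyState {
  private key: string | null = null;
  private listeners: Listener[] = [];

  get current(): string | null {
    return this.key;
  }

  subscribe(fn: Listener): () => void {
    this.listeners.push(fn);
    fn(this.key);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== fn);
    };
  }

  // pointerenter / focus
  enter(key: string): void {
    if (this.key !== key) this.set(key);
  }

  // pointerleave / blur: clear only if this card still owns the key.
  leave(key: string): void {
    if (this.key === key) this.set(null);
  }

  private set(key: string | null): void {
    this.key = key;
    for (const fn of this.listeners) fn(key);
  }
}

// Keys are assumed to look like "<layer>" or "<layer>/<featureId>".
function linkedBadge(key: string | null): string | null {
  return key === null ? null : `linked: ${key.split("/")[0]}`;
}
```

A card clears the key only while it still owns it. This stops a late blur from one card (for example, one that kept keyboard focus) from wiping a highlight that another card has just set by hover.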

Click-to-fit-bounds for register cards is new in v0.4.5: when clicking a register row inside a register card, the map calls fitBounds() on the affected feature(s) with 80px padding, 400ms easing.
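Below is a sketch of the click-to-fit step. It assumes the map is MapLibre GL JS, or the API-compatible Mapbox GL JS, whose `Map.fitBounds` takes `[[west, south], [east, north]]` bounds plus `padding`, `duration`, and `maxZoom` options. `MapLike` is a minimal structural stand-in that lets the function run without a browser. The `maxZoom` cap is illustrative, not from the spec.

```typescript
type LngLat = [number, number]; // [longitude, latitude]
type Bounds = [LngLat, LngLat]; // [[west, south], [east, north]]

interface FitBoundsOptionsLike {
  padding: number;
  duration: number;
  maxZoom?: number;
}

// Structural subset of a MapLibre/Mapbox Map; a real Map instance satisfies it.
interface MapLike {
  fitBounds(bounds: Bounds, options: FitBoundsOptionsLike): unknown;
}

// Axis-aligned bounding box of the given coordinates (no antimeridian handling; NYC-only).
function boundsOf(points: LngLat[]): Bounds {
  if (points.length === 0) throw new Error("no features to fit");
  let west = points[0][0];
  let east = points[0][0];
  let south = points[0][1];
  let north = points[0][1];
  for (const [lng, lat] of points) {
    west = Math.min(west, lng);
    east = Math.max(east, lng);
    south = Math.min(south, lat);
    north = Math.max(north, lat);
  }
  return [[west, south], [east, north]];
}

// Register-row click handler body. 80px padding and 400ms come from the spec;
// maxZoom 17 is an illustrative cap so a single pin isn't fitted at maximum zoom.
function fitToRegisterRow(map: MapLike, points: LngLat[]): void {
  map.fitBounds(boundsOf(points), { padding: 80, duration: 400, maxZoom: 17 });
}
```

For polygon features such as NYCHA developments, pass every vertex coordinate so the bounds cover the whole footprint.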


9. Stone-tinted accent colors — light theming

Problem. v0.4.3/v0.4.4 prohibited per-Stone color coding to avoid competing with the four-tier epistemic palette. That prohibition is being relaxed: each Stone gets a single muted accent color the design system can apply as a hint.

Five Stone accent tokens (proposed values — designer can adjust within the constraints below):

| token | hex | name | rationale |
|---|---|---|---|
| --stone-cornerstone | #7C6F5E | warm taupe | grounded, soil-adjacent without being literally brown |
| --stone-keystone | #5E6E7C | cool slate | structural / architectural reading |
| --stone-touchstone | #6B7C66 | muted sage | present-tense, alive without being signal-green |
| --stone-lodestone | #7C6E5E | softened ochre | forward-pointing warmth without competing with --accent |
| --stone-capstone | #5E5E6E | neutral indigo-gray | synthetic, cool, narrative |

All five sit at mid lightness in OKLCH (L≈0.49–0.57), chroma ≤0.04. Verified non-overlap with the four tier hues (which sit at chroma 0–0.10 in different L regions).
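The lightness and chroma constraints, and whether the five tokens can be told apart, can be checked mechanically. Below is a minimal sketch using Björn Ottosson's published OKLab conversion, with lightness on the 0–1 scale that CSS `oklch()` also accepts. The 0.02 ΔE threshold for "too close" is an assumption, not a value from this spec.

```typescript
// sRGB transfer function: gamma-encoded channel (0..1) to linear light.
function srgbToLinear(c: number): number {
  return c <= 0.04045 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// "#RRGGBB" to OKLab [L, a, b] using Ottosson's published matrices.
function hexToOklab(hex: string): [number, number, number] {
  const m = /^#?([0-9a-fA-F]{6})$/.exec(hex);
  if (!m) throw new Error(`bad hex: ${hex}`);
  const v = parseInt(m[1], 16);
  const r = srgbToLinear(((v >> 16) & 255) / 255);
  const g = srgbToLinear(((v >> 8) & 255) / 255);
  const b = srgbToLinear((v & 255) / 255);
  // LMS responses are non-negative for in-gamut sRGB, so pow(x, 1/3) is a safe cube root.
  const l = Math.pow(0.4122214708 * r + 0.5363325363 * g + 0.0514459929 * b, 1 / 3);
  const mm = Math.pow(0.2119034982 * r + 0.6806995451 * g + 0.1073969566 * b, 1 / 3);
  const s = Math.pow(0.0883024619 * r + 0.2817188376 * g + 0.6299787005 * b, 1 / 3);
  return [
    0.2104542553 * l + 0.793617785 * mm - 0.0040720468 * s,
    1.9779984951 * l - 2.428592205 * mm + 0.4505937099 * s,
    0.0259040371 * l + 0.7827717662 * mm - 0.808675766 * s,
  ];
}

// OKLCH lightness (0..1) and chroma.
function oklch(hex: string): { L: number; C: number } {
  const [L, a, b] = hexToOklab(hex);
  return { L, C: Math.sqrt(a * a + b * b) };
}

// Euclidean distance in OKLab.
function deltaEOk(x: string, y: string): number {
  const p = hexToOklab(x);
  const q = hexToOklab(y);
  const dL = p[0] - q[0], da = p[1] - q[1], db = p[2] - q[2];
  return Math.sqrt(dL * dL + da * da + db * db);
}

const STONE_TOKENS: Record<string, string> = {
  "--stone-cornerstone": "#7C6F5E",
  "--stone-keystone": "#5E6E7C",
  "--stone-touchstone": "#6B7C66",
  "--stone-lodestone": "#7C6E5E",
  "--stone-capstone": "#5E5E6E",
};

// Token pairs closer than minDelta in OKLab (0.02 is an assumed threshold).
function tooClosePairs(tokens: Record<string, string>, minDelta: number = 0.02): Array<[string, string, number]> {
  const names = Object.keys(tokens);
  const out: Array<[string, string, number]> = [];
  for (let i = 0; i < names.length; i++) {
    for (let j = i + 1; j < names.length; j++) {
      const d = deltaEOk(tokens[names[i]], tokens[names[j]]);
      if (d < minDelta) out.push([names[i], names[j], d]);
    }
  }
  return out;
}
```

Against the proposed table, the check flags only one pair: --stone-cornerstone (#7C6F5E) and --stone-lodestone (#7C6E5E). Their hex values differ by one unit in the green channel, so as proposed those two Stones would not read as different colors.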

Constraints (non-negotiable)

  1. One color per Stone, five total.
  2. Hint-level, not feature-level. These are not signal-carrying like the tier palette. They are decoration that helps a reader navigate the five-region structure on first encounter.
  3. Must not compete with the four-tier palette. Tier badges (filled/open square / dotted ring / hatched square in #0B5394 / #2A6FA8 / #6B6B6B / #2A6FA8+stripe) are the load-bearing epistemic signal. Stone colors must be muted enough or applied lightly enough that they read as decoration. If a reader could mistake a Stone color for a tier badge, it's wrong.
  4. Not technicolor. Carto Positron base + IBM Plex + restrained ink palette is quiet and serious. Five primary-bright colors would shatter that.
  5. Not mascot. No Stone gets an icon paired with its color. No "Cornerstone is brown because earthy" thematic mapping. The colors differentiate five regions; they don't express character.
  6. Print/PDF: all five degrade to neutral gray (#999) in @media print. Each token has a print-media override.
  7. WCAG: any text in or on a Stone color passes contrast. As tints (low-opacity backgrounds), body text stays in standard --ink. As foreground accents, contrast against --paper is verified.

Recommended placement (one to three locations — pick conservatively)

Primary (recommend): a 3px left-rule on each <StoneRegion> header strip, in that Stone's color. Subtle, navigational, hard to confuse with tier badges (which are inside cards, not on region headers).

Secondary (recommend): a 6px colored dot beside each Stone name in the cold-start "How Riprap is built" five-line list. Helps recognition when the reader later sees the Stones in a query result.

Tertiary (optional, designer's call): a Stone-tinted 2px left-border on cards within that Stone's strip. Risk: this brings color to the card chrome, which is where tier badges live. Apply only if the designer judges the chroma low enough that the tint reads as a region cue, not a tier cue. If in doubt, skip.

Methodology page (recommend): the 5×4 tier-Stone matrix uses Stone tints on its row labels (background tint at ~10% opacity, body text in standard ink). The matrix is where Stones-as-architecture is most visually present, and tints help readers parse the rows.

Designer's veto

If the designer judges that adding Stone colors anywhere makes the UI busier without making it more legible, they may recommend keeping the existing zero-color Stone treatment with documented rationale. This is permission, not requirement.

Print-media override

@media print {
  :root {
    --stone-cornerstone: #999;
    --stone-keystone: #999;
    --stone-touchstone: #999;
    --stone-lodestone: #999;
    --stone-capstone: #999;
  }
}

The PDF template's hierarchy is preserved by structure (Stone headings, type scale, rules), not by color. Stone tints are ignored in print.


"v0.4.5 ready" looks like

A query at 80 Pioneer Street, Red Hook produces a Findings region where:

  • Top tally reads 5 Stones · 15 fired · 5 silent · 1 errored · 24.0s (no more 3-error mismatch).
  • Capstone meta-card shows 1 reroll · 4/4 grounding · 4 citations · 24.0s (no more zeroes).
  • Every Stone's expanded provenance shows its full specialist roster, with each row carrying one of the five status states.
  • Touchstone shows five cards (FloodNet, NYC 311, NOAA CO-OPS, TerraMind LULC, Prithvi-NYC-Pluvial).
  • Lodestone shows two TTM time-series cards (zero-shot 9.6h, fine-tuned 96h with HF model-card link).
  • LAYERS panel is grouped by Stone, with the four new raster layers wired and default-off.
  • Hovering any spatial card lights up the corresponding map element; hovering a map layer outlines the corresponding card.
  • "anomaly" tag is gone.
  • Each Stone region carries a 3px left-rule in its accent color; the cold-start list shows colored dots beside Stone names. No card-level chrome shouts color.

The hackathon submission goes in (when public re-hosting completes) with a system that's not just impressive in capability but transparent in operational state and visually composed as a deliberate architecture.