File size: 41,286 Bytes
f6423e1 6a82282 f6423e1 6a82282 f6423e1 6a82282 f6423e1 6a82282 f6423e1 6a82282 f6423e1 6a82282 f6423e1 6a82282 f6423e1 6a82282 f6423e1 6a82282 f6423e1 6a82282 f6423e1 6a82282 f6423e1 6a82282 f6423e1 6a82282 f6423e1 6a82282 f6423e1 6a82282 f6423e1 6a82282 f6423e1 6a82282 f6423e1 6a82282 f6423e1 6a82282 f6423e1 6a82282 f6423e1 6a82282 caa28aa 6a82282 caa28aa 6a82282 caa28aa 6a82282 caa28aa 6a82282 caa28aa 6a82282 f6423e1 caa28aa 6a82282 f6423e1 6a82282 caa28aa 6a82282 f6423e1 6a82282 f6423e1 6a82282 f6423e1 6a82282 f6423e1 6a82282 f6423e1 6a82282 f6423e1 6a82282 f6423e1 6a82282 f6423e1 6a82282 f6423e1 6a82282 f6423e1 6a82282 caa28aa 6a82282 caa28aa 6a82282 caa28aa 6a82282 f6423e1 6a82282 f6423e1 6a82282 f6423e1 6a82282 caa28aa 6a82282 caa28aa 6a82282 caa28aa 6a82282 caa28aa 6a82282 f6423e1 6a82282 caa28aa 6a82282 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 | # Riprap architecture
> **What it is.** A web tool that takes any NYC address and produces a
> short, citation-grounded **flood-exposure briefing**. A tier (1β4)
> with a paragraph of evidence, where every numeric claim links back to
> the specific dataset, agency report, or model output it came from.
>
> **Who it's for.** Urban planners, journalists on deadline, NYCEM
> grant writers filing FEMA BRIC sub-applications, agency capital
> planners, researchers under FOIL/IRB constraints. Not consumers
> shopping for flood insurance.
>
> **Why local foundation models.** A newsroom with FOIL'd documents
> can't paste them into a vendor LLM. We run Granite 4.1 (3 B-param
> chat model), Granite Embedding 278M (RAG), Prithvi-EO 2.0 (300 M-param
> Earth-observation model, offline pre-compute) and Granite TimeSeries
> TTM r2 (1.5 M-param zero-shot forecaster) inside one container. No
> vendor LLM is contacted at runtime.
---
## 1. A 60-second primer on NYC flooding
Skip if you already know this. Most architecture docs assume you do.
This one doesn't.
### 1.1 Three kinds of flood
NYC gets hit by three flood mechanisms that look completely different
on a map and are caused by different physics:
- **Coastal / surge flooding**. The ocean rises into the city.
Driven by storm surge (wind pushing water against the coast),
astronomical high tide, and wave run-up. Affects the **shoreline:**
Brighton Beach, Coney Island, Red Hook, Lower Manhattan, the
Rockaways, Staten Island east shore. **Hurricane Sandy 2012** is
the canonical event. Water came over the seawall and flooded
subway tunnels, hospitals, and electrical substations. Affects
buildings that were dry that morning.
- **Pluvial / stormwater flooding**. Rain falls faster than the
drainage system can carry it away. Affects **inland low points,
basement apartments, and chronically under-sewered neighborhoods**:
Hollis (Queens), Carroll Gardens (Brooklyn), Jamaica. **Hurricane
Ida 2021** is the canonical event for NYC. Most of the deaths
were in basement apartments far from any coast. Optical satellites
largely *can't see* this kind of flooding because the water drains
fast and is often sub-surface.
- **Compound flooding**. Coastal + pluvial happening at the same
time, with groundwater rising too. Currently the active research
frontier (NPCC4 Ch. 3 calls it out explicitly). Most agencies model
these mechanisms separately; reality combines them.
A good civic flood tool has to cover all three and be honest about
what each signal can and cannot see. Riprap surfaces evidence for all
three but **doesn't predict damage**. See scope below.
### 1.2 Empirical vs modeled vs proxy
Each piece of flood evidence falls into one of three classes, and the
distinction matters for how much weight to give it:
- **Empirical**. Something flooded a place and was measured. USGS
high-water marks (people went out after Hurricane Ida and surveyed
where water reached on building walls). The 2012 Sandy Inundation
Zone (mapped by the city after the storm). FloodNet ultrasonic
sensors that recorded an actual depth. **Highest-confidence**: this
flood happened here.
- **Modeled scenarios**. Hydraulic models simulate "what if" cases.
FEMA's regulatory floodplains (1 % and 0.2 % annual chance). NYC
DEP's Stormwater Maps (modeled water depth under three rainfall
scenarios with varying sea-level-rise assumptions). **Useful but
scenario-bounded**: this could happen here under those conditions.
- **Proxy signals**. Indirect indicators of flooding. NYC 311
complaints ("street flooding", "sewer backup") clustering around an
address. Topographic indices (HAND, TWI) suggesting water *would*
pool here based on terrain. **Useful but biased**: 311 reflects
civic engagement as well as flooding; terrain says nothing about
drainage capacity.
Riprap surfaces all three classes. The score weights them in that
order (empirical > modeled > proxy), with empirical hits granted a
**floor rule**. See [Β§5](#5-the-scoring-rubric).
### 1.3 Hydrology indices used in this app
Two terrain-derived numbers come up repeatedly. They're cheap to
compute from a Digital Elevation Model (DEM) and they're the
hydrological literature's canonical exposure proxies:
- **HAND (Height Above Nearest Drainage)**. Vertical distance from
the address up to the nearest river/drainage channel. **<1 m** = at
drainage level (water *will* reach here in flood). **>10 m** =
hillslope (very dry). Nobre et al. 2011.
- **TWI (Topographic Wetness Index)**. `ln(catchment_area / tan
slope)`. **High TWI** = water tends to accumulate here (large
contributing area, gentle slope). Beven & Kirkby 1979.
Neither is a flood prediction; both are exposure indicators that say
"water *would* pool here based on terrain alone."
---
## 2. What Riprap actually produces
For a given address (or any of three modes; see [Β§4](#4-three-user-modes)),
Riprap returns:
1. **A tier 1β4** computed by a deterministic, published rubric
([Β§5](#5-the-scoring-rubric)). Tier 1 = "high exposure"; Tier 4 =
"limited exposure"; Tier 0 = "no flagged exposure."
2. **A 4-section briefing paragraph** synthesised by Granite 4.1 with
`[doc_id]` citations after every numeric claim. Sections:
*Status*, *Empirical evidence*, *Modeled scenarios*, *Policy
context*. A section is omitted entirely if no specialist fired for
it (silence-over-confabulation contract).
3. **Evidence cards**. One per fired specialist, with the raw values
and a link to the source dataset.
4. **Map overlay**. The address pinned, with the empirical and
modeled flood extents that overlap it.
5. **Live "right now" signals**. Active NWS flood alerts, current
tide residual at the Battery, recent precipitation at the nearest
ASOS, and a Granite TTM short-horizon forecast of the surge
residual. **These do not modify the tier** (per IPCC AR6 WG II's
distinction between exposure and event occurrence).
The full output is a JSON blob with all specialist outputs preserved,
so a journalist or planner can audit every number that appears in the
prose.
---
## 3. The Burr FSM and how the specialists chain
Riprap is a **state machine**, a Burr FSM (DAGWorks), that walks
through a fixed list of "specialist" functions in order. Each
specialist either produces a structured fact or stays silent. At the
end, the reconciler reads all the produced facts and writes the
paragraph.
The full chain, in execution order:
```
βββββββββββββββββββββββββββββββ
query βββΊ β 1. geocode (DCP Geosearch) β address text β lat/lon, BBL, borough
ββββββββββββββ¬βββββββββββββββββ
βΌ
βββββββββββββββββββββββββββββββββββββββββββββββ
β STATIC EMPIRICAL + REGULATORY LAYERS β
β (snapshot of city-published flood layers) β
βββββββββββββββββββββββββββββββββββββββββββββββ€
β 2. sandy in 2012 Sandy zone? Y/N β empirical
β 3. dep_stormwater in 3 modeled scenarios? β modeled
β 4. floodnet live sensor history β empirical
β 5. nyc311 flood complaints in 200m β proxy
ββββββββββββββ¬βββββββββββββββββββββββββββββββββ
βΌ
βββββββββββββββββββββββββββββββββββββββββββββββ
β LIVE "RIGHT NOW" LAYER β
β (out of static score; reported separately) β
βββββββββββββββββββββββββββββββββββββββββββββββ€
β 6. noaa_tides Battery / Kings Pt level β live, 6-min
β 7. nws_alerts active flood-relevant β live
β 8. nws_obs nearest ASOS recent precip β live
β 9. ttm_forecast 9.6h surge-residual nowcastβ Granite TTM r2
ββββββββββββββ¬βββββββββββββββββββββββββββββββββ
βΌ
βββββββββββββββββββββββββββββββββββββββββββββββ
β TERRAIN + EVENT-LEVEL EMPIRICAL LAYERS β
βββββββββββββββββββββββββββββββββββββββββββββββ€
β 10. microtopo DEM + TWI + HAND at point β proxy
β 11. ida_hwm USGS Ida 2021 HWM proximityβ empirical
β 12. prithvi Prithvi-EO Ida flood polys β empirical (model-derived)
ββββββββββββββ¬βββββββββββββββββββββββββββββββββ
βΌ
βββββββββββββββββββββββββββββββββββββββββββββββ
β 10. microtopo DEM + TWI + HAND at point β proxy
β 11. ida_hwm USGS Ida 2021 HWM proximityβ empirical
β 12. mta_entrance MTA subway entrance exposureβ empirical
β 13. prithvi_v2 Prithvi Ida flood polys β empirical (model-derived)
β 14. prithvi_live live Prithvi inference β (gpu-only; skipped cpu-basic)
β 15. terramind TerraMind LULC synthesis β (gpu-only; skipped cpu-basic)
ββββββββββββββ¬βββββββββββββββββββββββββββββββββ
βΌ
βββββββββββββββββββββββββββββββββββββββββββββββ
β 16. rag (Granite Embedding 278M) β retrieves policy paragraphs
β query corpus of 5 NYC agency PDFs β relevant to this address
β 17. gliner_extract (GLiNER medium-v2.1) β entity extraction over RAG hits
ββββββββββββββ¬βββββββββββββββββββββββββββββββββ
βΌ
βββββββββββββββββββββββββββββββββββββββββββββββ
β 18. reconcile (Granite 4.1 :8b) β document-grounded synthesis
β reads all "documents" produced by 1-17 β β 4-section cited paragraph
β Mellea rejects ungrounded outputs β β audit trail
ββββββββββββββ¬βββββββββββββββββββββββββββββββββ
βΌ
cited briefing
+ tier badge + evidence cards + map
```
The `single_address` path emits **24 step events** (including the live
sub-specialists ttm_311_forecast, ttm_battery_surge, floodnet_forecast,
and the eo_chip_fetch / terramind_lulc / terramind_buildings gates).
`neighborhood` emits 8β10 steps (NTA-level specialists only; per-address
registers don't run).
Each step is implemented as a `@action` in `app/fsm.py`. The Burr
runtime handles state-passing between actions and emits a trace record
per step (timing, ok/err, summary fields) which the front-end shows live.
### 3.1 What every specialist does, plain language
| # | Specialist | Plain-language description | Class |
|---|---------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------|
| 1 | **geocode** | Resolve the user's text ("116-50 Sutphin Blvd, Queens") to a (lat, lon) and a NYC tax-lot ID (BBL). Uses NYC Planning's free Geosearch API. | n/a |
| 2 | **sandy** | Did the address get flooded by Hurricane Sandy in 2012? Point-in-polygon over the official NYC Sandy Inundation Zone. | empirical |
| 3 | **dep_stormwater** | Three modeled stormwater-flooding scenarios from NYC DEP: Moderate-2050, Extreme-2080, Tidal-2050. Each tells you depth (none / 0.4β0.8 ft / etc.) at this point. | modeled |
| 4 | **floodnet** | NYC's ultrasonic flood-sensor network. How many sensors are within 600 m, and have any of them registered a flood event in the last 3 years? | empirical |
| 5 | **nyc311** | The 311 service-request archive. How many flood-related complaints (street flooding, sewer backup, catch-basin clogged) within 200 m of the address over the last 5 years? | proxy |
| 6 | **noaa_tides** *(live)* | Current tide observation at the nearest of three NOAA gauges (Battery / Kings Pt / Sandy Hook). Reports observed water level, predicted astronomical tide, and the **residual** (β surge). | live |
| 7 | **nws_alerts** *(live)* | Are there active NWS flood-relevant alerts at this point right now? Flash Flood Warnings, Coastal Flood Advisories, etc. | live |
| 8 | **nws_obs** *(live)* | Recent precipitation from the nearest airport ASOS station (KNYC / KLGA / KJFK / KEWR / KFRG). | live |
| 9 | **ttm_forecast** *(live)* | Granite TTM r2 zero-shot forecast of the surge **residual** at the Battery for the next ~9.6 h. NOAA already publishes the astronomical tide; TTM forecasts the part NOAA doesn't. | live (model-derived) |
| 10 | **microtopo** | LiDAR-derived terrain features at the point: elevation, HAND, TWI, local relief percentile. | proxy |
| 11 | **ida_hwm** | USGS Hurricane Ida 2021 high-water marks. Actual measured water heights surveyed in the days after the storm. | empirical |
| 12 | **mta_entrance_exposure** | MTA subway entrances within radius: how many, how many are inside Sandy 2012 zone, how many are in DEP Extreme-2080. | empirical |
| 13 | **prithvi_eo_v2** | Pre-computed point-in-polygon against 166 Prithvi-derived Ida 2021 flood polygons (offline-built; instant at request time). | empirical (model-derived) |
| 14 | **prithvi_eo_live** / **terramind_synthesis** | Live Prithvi / TerraMind inference over fresh EO chips. GPU-only; silenced (deterministic skip) on cpu-basic HF Space. | empirical (model-derived) |
| 15 | **rag** | Granite Embedding 278M retrieves the most-relevant paragraphs from 5 NYC policy PDFs (Comptroller, NPCC4, MTA, NYCHA, ConEd) given the address's borough + which scenarios fired. | policy |
| 16 | **gliner_extract** | GLiNER medium-v2.1 runs named-entity extraction over the RAG-retrieved paragraphs: locations, agencies, dates, infrastructure-project names. Results ride into the reconciler as additional grounding context. | ancillary |
| 17 | **reconcile** | Granite 4.1 :8b reads all documents produced by steps 1β16 and writes the cited briefing paragraph. Mellea rejection-sampling validates 4 grounding requirements; up to 3 attempts. See [Β§6](#6-document-grounded-reconciliation). | LLM synthesis |
### 3.2 Worked example: 2940 Brighton 3rd St, Brooklyn
To make the chain concrete, here's what fires for a Brighton Beach
address:
| Step | What it returns |
|---|---|
| geocode | `(40.5780, -73.9617)`, BBL `3-08660-0001`, Brooklyn |
| sandy | **YES**. Inside the 2012 Sandy Inundation Zone |
| dep_stormwater | `dep_moderate_2050`: depth 0.4-0.8 ft; `dep_extreme_2080`: depth 0.8-2.0 ft |
| floodnet | 2 sensors within 600 m; 1 trigger event in last 3 yr (peak 14 cm) |
| nyc311 | 11 flood-related complaints in 200 m, 5-yr window |
| noaa_tides | Sandy Hook gauge, +0.49 ft residual *(today's reading)* |
| nws_alerts | 0 active alerts |
| nws_obs | KJFK ASOS, no recent precipitation |
| ttm_forecast | Forecast peak residual +0.6 ft in 4.2 h *(today's run)* |
| microtopo | Elevation 2.36 m, HAND 0.7 m, TWI 11.3, percentile 8 (very low) |
| ida_hwm | 0 USGS HWMs within 800 m (Ida hit Queens hardest, not Brighton) |
| prithvi | Inside an Ida-attributable polygon? **NO** (Ida was pluvial-inland) |
| rag | Top hits: NPCC4 Ch.3 (coastal), MTA Resilience (Coney Island D-train), Comptroller |
| reconcile | (see below) |
| **Tier** | **1 (High exposure)** with empirical floor applied |
The reconciler then writes:
```
**Status.** This Brighton Beach address sits **inside the 2012 Sandy
Inundation Zone** [sandy], on relatively low ground with HAND of 0.7 m
[microtopo].
**Empirical evidence.** NYC 311 records show **11 flood-related
complaints** within 200 m over the last 5 years [nyc311]; 2 FloodNet
sensors are within 600 m and one logged a 14 cm event in the last 3
years [floodnet].
**Modeled scenarios.** The address sits inside **DEP Moderate-2050**
with depth class 0.4-0.8 ft and **DEP Extreme-2080** with depth class
0.8-2.0 ft [dep_moderate_2050][dep_extreme_2080].
**Policy context.** **NPCC4 Ch. 3** documents accelerating coastal-
flood frequency along this stretch [rag_npcc4].
```
Note what *didn't* fire: no Ida HWM doc (Ida didn't flood here), no
Prithvi doc (no Ida-attributable polygon), no NWS alerts (clear day),
no TTM doc (forecast residual under threshold). The reconciler never
saw those headers and didn't invent them.
---
## 4. Five planner intents
The planner (`app/planner.py`) classifies every free-text query into one of
five intents before the FSM runs. This happens in a single Granite 4.1 call
that streams its JSON output to the client as `plan_token` events.
| Intent | Triggered by | FSM path | Steps |
|-----------------------|-----------------------------------------------|----------------------------------|-------|
| `single_address` | Fully-qualified street address | Full linear FSM (geocode β 19 specialists β reconcile) | 24 |
| `neighborhood` | NTA name, borough name, bare zip | NTA-level specialists only (no per-address registers) | 8β10 |
| `compare` | "A vs B", "compare X to Y" | Two sequential single_address runs; merged two-column paragraph | 2 Γ 24 |
| `development_check` | "what's being built at X", "is Y risky" | DOB filings + flood layers | 3β5 |
| `live_now` | "is it flooding now", "current alerts" | Live-only specialists (tides, alerts, obs) β no Mellea | 4 |
| `not_implemented` | Retrospective, ranking, cross-city queries | Returns rationale immediately | 0 |
### Compare intent detail
`_run_compare()` in `web/main.py` executes the full `single_address` FSM
sequentially for each target, then merges the two paragraphs under
`## PLACE A: β¦` / `## PLACE B: β¦` headers separated by `---`. The
`CompareBriefing.svelte` component renders this as a two-column layout with
a "Key differences" delta bar above. During streaming the tokens are rendered
in a single column (sequential); the two-column layout appears when the
`final` event lands.
**Registered routes**
| Path | Serves |
|-------------------------------------------|--------|
| `/` | SvelteKit landing + live query UI |
| `/api/agent/stream?q=β¦` | SSE stream β planner + all intent paths |
| `/register/{schools,nycha,mta_entrances}` | Pre-computed bulk register browser |
| `/legacy`, `/single`, `/compare`, `/register/*` | Legacy custom-element bundle (compatibility) |
Registers are pre-computed because running 1,900 reconciler calls at request
time is a non-starter; the register build runs offline
(`scripts/build_*_register.py`) and results are loaded from
`data/registers/*.json` at boot.
---
## 5. The scoring rubric
This is the part of the system that produces the tier 1β4. It is
**deterministic, published, and not done by the language model**.
See `METHODOLOGY.md` for the full citation list; here's the
high-level structure.
### 5.1 Three thematic sub-indices
Following Cutter et al. 2003 (SoVI hazards-of-place) and Tate 2012
(uncertainty analysis), indicators are grouped into thematic sub-
indices, equal-weighted within each group, normalized to [0, 1]:
| Sub-index | What it captures | Top weights |
|-----------------|----------------------------------------------------------|-------------|
| **Regulatory** | Inside FEMA / DEP / NPCC4 modeled or regulated zones | FEMA 1 %; DEP-2050; DEP Tidal |
| **Hydrological**| Terrain-based exposure (HAND, TWI, percentile, relief) | HAND (Nobre 2011); TWI half-weighted (urban DEM noise) |
| **Empirical** | Did flooding actually happen here (Sandy, Ida HWMs, 311) | Sandy + HWM<100m β also trigger floor |
The **composite** is the sum of the three sub-indices (range 0β3).
Tier breakpoints: β₯1.5 β Tier 1, β₯1.0 β Tier 2, β₯0.5 β Tier 3, >0 β
Tier 4, 0 β Tier 0.
### 5.2 Max-empirical floor
If **Sandy 2012 inundation** OR **a USGS Ida HWM within 100 m** fired,
the tier is capped at **2 (Elevated)**. It cannot be worse,
regardless of the additive composite.
This recovers the *important* multiplicative behaviour Balica 2012
argues for (empirical observations should not be cancelled by
terrain or modeled scenarios) without giving up additive transparency.
The 100 m radius is chosen because USGS HWM positional uncertainty is
typically 5β30 m. 100 m gives ~3Ο headroom for a confident "this
address was inundated" signal.
### 5.3 Live signals stay out
NWS alerts, NOAA tide residual, and NWS hourly precipitation are
**not** in the static tier. Per IPCC AR6 WG II glossary and NPCC4
Ch. 3, exposure is a quasi-stationary property of place; event
occurrence is time-varying. They appear separately as live evidence
cards.
---
## 6. Document-grounded reconciliation
`app/reconcile.py` builds a list of OpenAI-style chat messages where
each specialist's emission is its own message with a stable `doc_id`
ride-along on the role. Granite 4.1's Ollama chat template recognises
any `role: "document <doc_id>"` message and lifts it into a
`<documents>` block, prepending IBM's official grounded-generation
system message ("Write the response by strictly aligning with the
facts in the provided documents").
Example packet for the Brighton Beach address (abbreviated):
```python
[
{"role": "system", "content": "<citation-discipline + 4-section skeleton>"},
{"role": "document sandy", "content": "Address is INSIDE the 2012 Sandy zone. ..."},
{"role": "document dep_extreme_2080", "content": "Depth class 0.8-2.0 ft. ..."},
{"role": "document floodnet", "content": "2 sensors; peak 14 cm. ..."},
{"role": "document nyc311", "content": "11 flood complaints in 200 m. ..."},
{"role": "document microtopo", "content": "Elev 2.36 m, HAND 0.7 m, TWI 11.3. ..."},
{"role": "document rag_npcc4", "content": "<retrieved paragraph>"},
{"role": "user", "content": "Write the cited briefing now."},
]
```
The four-section structure (`**Status.** / **Empirical evidence.** /
**Modeled scenarios.** / **Policy context.**`) is enforced by the
`EXTRA_SYSTEM_PROMPT`. Sections without supporting documents are
omitted entirely.
### 6.1 Two reconciler models
- **`granite4.1:3b`** runs the planner and `live_now` (short outputs,
routing decisions). Always streamed.
- **`granite4.1:8b`** runs the synthesis path for `single_address`,
`neighborhood`, and `development_check` (long outputs, dense
citations). Pre-warmed into VRAM in `entrypoint.sh` so the first
query doesn't pay the model-load tax. Both fit warm on the T4 with
`OLLAMA_MAX_LOADED_MODELS=2` and `OLLAMA_KEEP_ALIVE=24h`.
### 6.2 Mellea-validated rejection sampling
`app/mellea_validator.py` wraps the Granite-via-Ollama call in IBM
Research's [Mellea](https://github.com/generative-computing/mellea)
framework. Instruct, validate, repair. The synthesis intents call
`reconcile_strict_streaming(...)` which:
1. **Streams** each generation attempt's tokens to the user (via the
FSM threadlocal `set_token_callback` for `single_address` or a
`progress_q` for the polygon intents).
2. After each attempt, runs **four deterministic checks** on the
accumulated paragraph:
- **`numerics_grounded`**. Every non-trivial number in the output
appears verbatim in a source document.
- **`no_placeholder_tokens`**. Output contains no leaked
`[source]` / `<document>` template markup.
- **`citations_dense`**. Every non-trivial number has a
`[doc_id]` citation **somewhere in the same sentence** (sentence
boundaries: `. ` / `.\n` / end-of-text).
- **`citations_resolve`**. Cited `doc_id`s are a subset of the
input doc_ids.
3. If any check fails, fires a `mellea_attempt` SSE event with the
failed-requirement names, then **rerolls** with a feedback prompt
that names the specific failing sentences (the model usually
responds well to surgical corrections). Loop budget: 3 attempts.
The frontend renders an inline banner above the briefing. Amber on
reroll (with the failed-req list), green on first-try pass. The final
reconcile step in the trace shows the `passed: N/4 Β· rerolls: M`
metadata for full audit transparency.
### 6.3 Number recognition is identifier-aware
The numeric guardrail uses `\b-?\d[\d,]*(?:\.\d+)?\b` so that
identifier codes embedded in prose (`QN1206` NTA codes, `BBL
3-00589-0003` parcels, `BIN`, `B12` community boards) are *not*
treated as numeric claims demanding citation. This was the dominant
false-positive in early probing; without it, almost every neighborhood
briefing failed `citations_dense` because the opening sentence
typically reads "*X (NTA QN1206) in Queensβ¦*".
### 6.4 Why no native Granite 4.x inline citations
We investigated using Granite's native `<|start_of_cite|>{document_id:
X}fact<|end_of_cite|>` mode. **It's deprecated in 4.x.** Verified:
- The official Ollama chat template for `granite4.x` has no citation
branch (the 3.3 / 4.0-preview templates did).
- `granite_common` ships only `granite3/granite32` and
`granite3/granite33` subdirs. No 4.x equivalent.
- `granite-io` has only `granite_3_2/` and `granite_3_3/` processor
dirs.
The base 4.1 weights still contain the cite tokens (training residue),
so the model emits them as real tokens when nudged. But only as an
end-of-response list, not inline in prose. IBM's published 4.x
grounding path is a separate **Citation Generation LoRA** (built on
`granite-4.0-micro`, not 4.1) requiring HF transformers + LoRA
loading. Mellea's `OllamaBackend` explicitly raises
`NotImplementedError` for activated LoRAs. So our hand-rolled
`[doc_id]` regex + reroll **is** the right pattern for our setup
(Granite 4.1 via Ollama, inline placement).
---
## 7. The four foundation models
| Model | Params | Runtime | Role |
|-------|--------|---------|------|
| **Granite 4.1 :3b alias** | 8 Bβ | Ollama or vLLM (AMD MI300X) | Planner (intent + specialist routing) + `live_now` reconciler. β Production alias `RIPRAP_OLLAMA_3B_TAG=granite4.1:8b` β planner runs 8b in production. |
| **Granite 4.1 :8b** | 8 B | Ollama or vLLM (AMD MI300X) | Synthesis reconciler for `single_address`, `neighborhood`, `development_check`, `compare`. Validated by Mellea (4 grounding requirements + reroll). |
| **Granite Embedding 278M** | 278 M | sentence-transformers (CPU) | RAG retrieval over 5 policy PDFs at query time. |
| **Prithvi-EO 2.0** | 300 M | TerraTorch (offline pre-compute) | NYC-Pluvial fine-tune; segmented Hurricane Ida 2021 pre/post Sentinel-2 polygons baked into `data/`. Fine-tune: `msradam/Prithvi-EO-2.0-NYC-Pluvial`. |
| **Granite TimeSeries TTM r2** | 1.5 M | granite-tsfm (CPU) | Zero-shot forecast of the Battery surge residual, ~9.6 h horizon. Fine-tune: `msradam/Granite-TTM-r2-Battery-Surge`. |
| **GLiNER medium-v2.1** | ~200 M | gliner (CPU) | Named-entity extraction over RAG hits (locations, agencies, dates, infrastructure). `urchade/gliner_medium-v2.1`. |
**Granite 4.1 β Granite Time Series.** Granite 4.1 is IBM's chat-LLM
family. Granite TimeSeries TTM is a separate IBM Research product
line (Ekambaram et al. 2024, NeurIPS). Both happen to share the
"Granite" brand but have different architectures, training data, and
authors.
**LiteLLM Router.** All LLM calls go through `app/llm.py`, a ~250-line
shim over a LiteLLM Router. Two backends are wired: `RIPRAP_LLM_PRIMARY=ollama`
(local + HF Space default) and `RIPRAP_LLM_PRIMARY=vllm` (AMD MI300X demo
path, auto-fails over to Ollama). The shim normalizes role names and
citation-token format so the rest of the codebase is backend-agnostic.
### 7.1 Why Prithvi runs offline
Prithvi-EO 2.0 with TerraTorch needs a GPU and minutes per HLS tile.
We segmented Hurricane Ida 2021 once (pre: 2021-08-25, post:
2021-09-02 ~12 h after peak), filtered the output (>30 000 sqft to
drop noise, <1 kmΒ² to drop tidal artifacts) into **166 polygons**
baked into `data/prithvi_ida_2021.geojson`. The runtime FSM does a
point-in-polygon test, not fresh inference. This is honest about
where foundation models earn their keep: **once, to produce a
defensible event-level signal. Not per request**.
### 7.2 Why TTM r2 runs live
TTM r2 is **1.5 M params**. Vastly smaller than Prithvi or Granite
4.1. Inference is millisecond-scale even on CPU. It forecasts only
the residual (surge component) at the Battery, which complements the
NOAA snapshot specialist; it does **not** try to forecast the
astronomical tide (NOAA already publishes that exactly).
---
## 8. Live signals separation
Live data (steps 6β9 in the FSM diagram) is fundamentally different
from static layers and is handled separately:
- **Surface**: in evidence cards and a "Right now" section in the UI.
- **Score**: explicitly excluded. Tier is reproducible across queries
unless source data changed.
- **Cadence**: NOAA tides update every 6 min; NWS alerts on push;
NWS obs ~hourly; TTM is computed per query (cheap).
- **Failure mode**: graceful. If NOAA times out, no `noaa_tides`
doc is emitted; the reconciler simply doesn't see it.
This mirrors how First Street separates Flood Factor (static, 30-yr)
from event-day Flood Lab products, and how Fathom separates Global
Flood Map from real-time intelligence.
---
## 9. Repository layout
```
riprap-nyc/
ARCHITECTURE.md this file
METHODOLOGY.md scoring methodology + full citations
README.md HF Spaces frontmatter + user-facing summary
Dockerfile nvidia/cuda:12.4 base + Ollama + Granite
entrypoint.sh Ollama daemon + uvicorn launcher
requirements.txt runtime deps (FastAPI, geopandas, sentence-transformers, ollama, burr, granite-tsfm)
pyproject.toml ruff + vulture config
riprap.py CLI driver for register builds
agent.py single-address CLI
app/
fsm.py Burr FSM (14 actions; Mellea hooks via threadlocal)
planner.py Granite 4.1:3b intent router (5 intents)
geocode.py NYC DCP Geosearch + borough-hint filter
reconcile.py Granite 4.1 grounded reconciler + numeric guardrail
mellea_validator.py streaming rejection sampler + 4 grounding checks
rag.py Granite Embedding 278M retrieval
score.py deterministic exposure rubric (3 sub-indices, floor)
spatial.py geopandas join helpers
energy.py per-query inference Wh accounting
register_builder.py bulk-mode runner (offline)
intents/ per-intent orchestration on top of fsm.py
live_now.py shoreline tide + alerts (cheap, non-strict)
single_address.py drives the linear FSM with strict reconcile
neighborhood.py polygon-aggregated specialists
development_check.py DOB permit overlap with flood polygons
llm.py LiteLLM Router shim β all LLM calls go here.
Routes to vLLM (AMD) or Ollama; normalizes
role names and citation token format.
areas/
nta.py NYC NTA 2020 polygon resolver
flood_layers/
sandy_inundation.py NYC OD 5xsi-dfpx
dep_stormwater.py 9i7c-xyvv (3 scenarios)
ida_hwm.py USGS STN Event 312
prithvi_water.py Ida pre/post diff polygons (offline-built)
context/
microtopo.py DEM + TWI + HAND raster sampling
nyc311.py erm2-nwe9 buffer aggregation
floodnet.py api.floodnet.nyc Hasura GraphQL
noaa_tides.py live water level + residual
nws_alerts.py live alerts at point
nws_obs.py nearest ASOS hourly METAR
live/
ttm_forecast.py Granite TTM r2 surge-residual nowcast
assets/
schools.py DCP FacDB
nycha.py phvi-damg
mta_entrances.py i9wp-a4ja
web/
main.py FastAPI. Primary SSE at /api/agent/stream.
_run_compare() handles the compare intent
(sequential single_address Γ 2; no separate
intent module). /api/backend returns live
backend descriptor for the UI pill.
static/
agent.html legacy primary UI (Svelte custom elements)
dist/ Svelte 5 custom-element bundle (committed).
Built from web/svelte/ via `npm run build`.
web/sveltekit/ SvelteKit app (primary UI). Build β
web/sveltekit/build/. Served at / by FastAPI.
src/routes/
+page.svelte landing + query form
q/[queryId]/+page.svelte live query page (SSE stream consumer)
src/lib/components/
briefing/
Briefing.svelte 4-section cited paragraph renderer
CompareBriefing.svelte two-column compare layout + delta bar
shell/StatusPill.svelte AMD / Ollama / Local backend indicator
web/svelte/ Legacy Svelte 5 custom-element source.
Builds <r-briefing>, <r-trace>, <r-sources-footer>.
Still loaded by agent.html / register/*.html.
scripts/ offline pre-compute + diagnostic probes
run_prithvi_ida.py
compute_hydrology_indices.py
fetch_nyc_dem.py
fetch_ida_hwms.py
build_schools_register.py
build_nycha_register.py
build_mta_entrances_register.py
probe_mellea.py drives the SSE stream N times, dumps
per-attempt pass/fail to CSV
corpus/ 5 LFS-tracked NYC policy PDFs
data/ LFS-tracked baked fixtures
sandy_inundation.geojson
prithvi_ida_2021.geojson 166 Hurricane Ida polygons
ida_2021_hwms_ny.geojson
nyc_dem_30m.tif, twi.tif, hand.tif
schools.geojson, nycha.geojson, mta_entrances.geojson
dep/ Esri FileGDBs (DEP scenarios)
registers/ pre-computed register outputs
```
---
## 10. Honest scope (what Riprap does NOT do)
- **Not a damage probability.** Riprap is exposure triage. We have no
labeled flood-damage outcomes (claim records, insurance loss data),
so we cannot calibrate. The tier is a literature-grounded prior,
not a prediction.
- **Not a flood insurance rating.** For that, see FEMA Risk Rating 2.0
(claims-driven GLM over decades of labeled outcomes).
- **Not a vulnerability assessment.** Engineering fragility (foundation
type, electrical hardening, drainage condition), social capacity,
and financial absorption are out of scope.
- **No sub-surface flooding.** Optical satellites can't see basement
apartments or subway entrances. The dominant Hurricane Ida damage
mode in NYC. Prithvi correctly emits no polygons for Hollis or
Carroll Gardens. That silence is a feature, not a bug.
- **Vintage-bounded.** FEMA NFHL is years stale; DEP Stormwater Maps
are 2021; corpus PDFs are point-in-time. All vintages are cited in
the methodology panel.
- **Public infrastructure only.** ConEd substations, water-supply
components, and other adversarially-sensitive registers are not
published. NYC OD has the same redaction posture; we follow it.
---
## 11. Why local foundation models
1. **Data governance.** A newsroom with FOIL'd documents, an agency
capital planner with internal data, or a researcher under IRB
constraints can't paste organization context into a vendor LLM.
All four models run inside this container; the org boundary
holds. Public NYC and USGS services receive resolved address
coordinates only; no LLM vendor does.
2. **Inference energy.** Granite 4.1 :3b draws roughly **0.03 Wh per
query** vs an estimated **~0.3 Wh per query** for GPT-4o-class
frontier models ([Epoch AI, 2025](https://epoch.ai/gradient-updates/how-much-energy-does-chatgpt-use)).
Order of magnitude lower per-query inference energy. The
methodology panel reports a per-query Wh estimate so users can
verify.
3. **Reproducibility.** Apache-2.0 stack end to end; no commercial
licenses required to reproduce the system.
---
## 12. Deployment
### 12.1 Hugging Face Spaces (production)
**HF Space**: `lablab-ai-amd-developer-hackathon/riprap-nyc` (cpu-basic).
Serves the FastAPI + SvelteKit UI. Hardware: cpu-basic (no GPU).
**AMD MI300X droplet** (separate): vLLM + riprap-models containers
(`services/riprap-models/`). The Space talks to the droplet over HTTP;
env vars `RIPRAP_LLM_BASE_URL` / `RIPRAP_ML_BASE_URL` point at it.
The bootstrap droplet was destroyed 2026-05-06; redeploy via
`scripts/deploy_droplet.sh <ip> <token>`.
LLM routing: `RIPRAP_LLM_PRIMARY=vllm` β AMD MI300X (30β50Γ faster than
T4 Ollama). Falls over to local Ollama on connection failure. Backend
status visible in the UI pill (top-right corner; backed by `GET /api/backend`).
Verified warm query times on AMD MI300X + vLLM (2026-05-06 probe):
- `single_address`: 5β12 s (4/4 Mellea, 0β2 rerolls)
- `neighborhood`: 3β5 s
- `compare` (two sequential legs): ~15 s
Cold-start after container restart: ~30 s for vLLM kernel JIT compile + prefix cache warmup. Run one warm-up query before a demo.
The SvelteKit build in `web/sveltekit/build/` and the Svelte bundle in
`web/static/dist/` are both committed, so HF Spaces runs no Node build step.
### 12.2 Local development
```bash
uv venv --python 3.12
source .venv/bin/activate
uv pip install -r requirements.txt
ollama pull granite4.1:3b
ollama pull granite4.1:8b
uvicorn web.main:app --reload --port 8000
# Frontend (only when changing components)
cd web/svelte && npm install && npm run build
```
The fixtures in `data/` and the policy PDFs in `corpus/` are LFS-
tracked. Granite Embedding and TTM download on first query.
### 12.3 Diagnostic probes
```bash
# Drive the live stream N times, dump per-attempt Mellea outcomes:
.venv/bin/python scripts/probe_mellea.py --query "Hollis" --runs 5
# Output: outputs/probe_*.csv with per-attempt pass/fail, paragraph,
# elapsed time, reroll count.
```
---
## 13. License
Apache-2.0. All foundation models (Granite 4.1, Granite Embedding,
Prithvi-EO 2.0, Granite TimeSeries TTM r2) and all input datasets
(NYC OpenData, USGS, NOAA, NWS, FloodNet NYC, NASA/MS Planetary
Computer for HLS Sentinel-2) are public. Visual idiom adapted from
[NYC Planning Labs](https://planninglabs.nyc/).
|