Riprap: Owner's Brief
1. The system in one paragraph
Riprap takes any NYC address, neighborhood, or development-permit query and produces a four-section flood-exposure briefing where every numeric claim is anchored to a [doc_id] citation that traces back to the source dataset, agency report, or model output. A natural-language planner (Granite 4.1 3b) routes each query to one of four intent paths; the chosen path fans out across up to ~25 atomic data specialists; a synthesizer (Granite 4.1 8b) reads only the specialist outputs that fired and writes the briefing; a Mellea rejection sampler checks four grounding invariants and rerolls if any fail. The system is NYC-specific and public-record-only: all data comes from NYC OpenData, USGS, NOAA, NWS, or FloodNet, and all four models run inside the container; no vendor LLM is contacted at runtime. The output is a tier 1-4 exposure score (deterministic, published rubric, not generated by the LLM) plus a cited prose paragraph. What Riprap does not do: damage probability, insurance rating, flood prediction, or any claim about basement apartments or infrastructure that isn't in a public register.
2. Architecture map
HTTP request lifecycle
User browser → GET /api/agent/stream?q=<query>
web/main.py: api_agent_stream() (async SSE generator)
  runs runner() in a threadpool executor
  app/planner.plan(q, on_token=...) → streams plan_token events while Granite generates
    returns Plan(intent, targets, specialists, rationale)
  out_q.put({kind:"plan", ...}) → SSE plan event
  intent dispatch:
    "single_address" → app/intents/single_address.run(plan, q, progress_q, strict=True)
    "neighborhood" → app/intents/neighborhood.run(plan, q, progress_q, strict=True)
    "development_check" → app/intents/development_check.run(plan, q, progress_q, strict=True)
    "live_now" → app/intents/live_now.run(plan, q, progress_q)
    "not_implemented" → inline JSON response, no FSM
  each intent calls fsm.iter_steps() or its own specialist loop
    → out_q.put({kind:"step", ...}) per specialist
    → out_q.put({kind:"token", ...}) per Granite reconcile chunk
    → out_q.put({kind:"mellea_attempt", ...}) per Mellea pass/fail
  out_q.put({kind:"final", ...})
event_stream() async generator reads out_q, wraps steps in
  stone_start / stone_done envelopes keyed by the _STEP_TO_STONE dict,
  yields SSE frames
SSE response headers: Cache-Control: no-cache, X-Accel-Buffering: no
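The out_q-to-SSE bridge above reduces to a small generator; a minimal sketch (hypothetical helper names mirroring `out_q` and `event_stream()`, not the actual implementation):

```python
import json
import queue

def sse_frame(event: str, data: dict) -> str:
    """Format one Server-Sent Events frame: event name + JSON payload."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

def event_stream(out_q: "queue.Queue[dict]"):
    """Drain the worker queue and yield SSE frames until a 'done' item.

    Mirrors the lifecycle above: each dict carries a 'kind' key that
    becomes the SSE event name; the remaining keys are the payload.
    """
    while True:
        item = out_q.get()
        kind = item.pop("kind")
        yield sse_frame(kind, item)
        if kind == "done":
            break

# Usage: the worker thread puts dicts; the HTTP handler iterates frames.
q: "queue.Queue[dict]" = queue.Queue()
q.put({"kind": "plan", "intent": "single_address"})
q.put({"kind": "done"})
frames = list(event_stream(q))
```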
Planner: app/planner.py
- Entry: `plan(query, model, on_token) → Plan`
- Model: `RIPRAP_PLANNER_MODEL` env, default `granite4.1:3b`
- Uses `llm.chat(format="json")` with `temperature=0` for deterministic JSON output via Ollama's constrained-decode mode
- Pre-filter: `_not_implemented_message(query)` checks two regex patterns (retrospective, ranking) and returns early with a `Plan(intent="not_implemented")` so no LLM call is made
- Post-validator: `_validate(d, raw_query)` sanitizes intent, targets, and specialists against the declared INTENTS/SPECIALISTS dicts; adds floor specialists via `_required_specialists(intent)` if the planner omitted them
- Floor specialists (always added regardless of planner output): geocode+sandy+dep_stormwater+microtopo for single_address; nta_resolve+sandy+dep_stormwater+nyc311 for neighborhood; nws_alerts+noaa_tides for live_now
- Returns: `Plan(intent, targets: list[dict], specialists: list[str], rationale: str)`
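The post-validator's floor-specialist merge can be sketched as follows (hypothetical `_FLOOR` table and `merge_floor` name; the real logic lives in `app/planner.py:_required_specialists`):

```python
# Floor specialists per intent, as listed above.
_FLOOR = {
    "single_address": ["geocode", "sandy", "dep_stormwater", "microtopo"],
    "neighborhood": ["nta_resolve", "sandy", "dep_stormwater", "nyc311"],
    "live_now": ["nws_alerts", "noaa_tides"],
}

def merge_floor(intent: str, planned: list[str]) -> list[str]:
    """Append any floor specialist the planner omitted, preserving order."""
    merged = list(planned)
    for s in _FLOOR.get(intent, []):
        if s not in merged:
            merged.append(s)
    return merged

out = merge_floor("single_address", ["geocode", "nyc311"])
```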
FSM: app/fsm.py
- Entry: `build_app(query) → Burr Application`; `run(query) → dict`; `iter_steps(query) → generator`
- Burr 0.x `ApplicationBuilder` with `with_state(query=query, trace=[])`, `with_entrypoint("geocode")`
- Actions registered in dict order; transitions are consecutive pairs (linear, not a DAG)
- Each `@action` writes one state key and appends to the `trace` list
- Out-of-NYC guard: `_NYC_S/W/N/E = 40.49, -74.27, 40.92, -73.69` → NYC-specific specialists skip with an "out of NYC scope" reason; live/national specialists (NWS/NOAA/TTM) run unconditionally
- Thread-locals for streaming (since Burr runs sync in a background thread): `set_strict_mode(bool)` / `_current_strict_mode()`; `set_token_callback(fn)` / `_current_token_callback()`; `set_mellea_attempt_callback(fn)` / `_current_mellea_attempt_callback()`; `set_planned_specialists(set)` / `_current_planned_specialists()`; `set_user_query(str)` / `_current_user_query()`; `set_planner_intent(str)` / `_current_planner_intent()`
- `iter_steps` spawns a daemon thread running `app.iterate(halt_after=["reconcile"])`; snapshots thread-locals from the caller thread and re-installs them on the iterate thread; deduplicates trace records by (step_name, started_at)
- Heavy-specialist gate: `_HEAVY_SPECIALISTS_ENABLED` = True when `RIPRAP_LLM_PRIMARY != ollama` OR `RIPRAP_ML_BASE_URL` is set; otherwise False. Controls whether prithvi_live, terramind, eo_chip, terramind_lulc, terramind_buildings fire
- NYCHA register gate: `_NYCHA_REGISTERS_ENABLED`, controlled by `RIPRAP_NYCHA_REGISTERS=1` (default off); registers load a 91 MB GeoJSON file on first call
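The heavy-specialist gate is a one-line env check; a minimal sketch (hypothetical function name; the real flag is computed in `app/fsm.py`):

```python
def heavy_specialists_enabled(env: dict) -> bool:
    """Mirror of the gate above: heavy EO/LLM specialists fire only when
    a non-Ollama primary or a remote ML service is configured."""
    primary = env.get("RIPRAP_LLM_PRIMARY", "ollama")
    return primary != "ollama" or bool(env.get("RIPRAP_ML_BASE_URL"))

a = heavy_specialists_enabled({})                                  # local-dev default
b = heavy_specialists_enabled({"RIPRAP_ML_BASE_URL": "http://x"})  # droplet attached
```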
Full action sequence (default, single_address)
| # | Action name | State key written | Data source |
|---|---|---|---|
| 1 | geocode | geocode, lat, lon | NYC DCP Geosearch → OSM Nominatim fallback |
| 2 | sandy | sandy | data/sandy_inundation.geojson (lru_cache) |
| 3 | dep | dep | data/dep/*.gdb (3 scenarios, lru_cache) |
| 4 | floodnet | floodnet | api.floodnet.nyc Hasura GraphQL |
| 5 | nyc311 | nyc311 | Socrata erm2-nwe9 |
| 6 | noaa_tides | noaa_tides | api.tidesandcurrents.noaa.gov |
| 7 | nws_alerts | nws_alerts | api.weather.gov/alerts/active |
| 8 | nws_obs | nws_obs | api.weather.gov/stations/&lt;id&gt;/observations |
| 9 | ttm_forecast | ttm_forecast | ibm-granite/granite-timeseries-ttm-r2 (in-process or remote) |
| 10 | ttm_311_forecast | ttm_311_forecast | TTM r2 on local 311 weekly series |
| 11 | floodnet_forecast | floodnet_forecast | TTM r2 on nearest FloodNet sensor history |
| 12 | ttm_battery_surge | ttm_battery_surge | msradam/Granite-TTM-r2-Battery-Surge (remote or local) |
| 13 | microtopo | microtopo | data/nyc_dem_30m.tif, twi.tif, hand.tif |
| 14 | ida_hwm | ida_hwm | data/ida_2021_hwms_ny.geojson |
| 15 | mta_entrances | mta_entrances | data/mta_entrances.geojson |
| 16 | prithvi | prithvi_water | data/prithvi_ida_2021.geojson (166 polygons) |
| 17-22 | (heavy, if enabled) | prithvi_live, terramind, eo_chip, terramind_lulc, terramind_buildings | STAC/Sentinel-2, msradam/TerraMind-NYC-Adapters |
| 23 | rag | rag | Granite Embedding 278M over corpus/*.pdf (5 PDFs) |
| 24 | gliner | gliner | GLiNER typed-entity extraction over RAG hits |
| 25 | reconcile | paragraph, audit, mellea | Granite 4.1:8b via Mellea strict sampler |
Capstone reconciliation: app/reconcile.py + app/mellea_validator.py
- `build_documents(state) → list[dict]`: emits one `{"role": "document <doc_id>", "content": "..."}` per specialist that fired, in Stones order; gated by both specialist fire status and the out-of-NYC guard
- `trim_docs_to_plan(doc_msgs, planned_specialists)`: drops doc messages not matching the planner's specialist set; saves ~30-50% of prompt tokens; `RIPRAP_TRIM_DOCS=0` disables it
- `EXTRA_SYSTEM_PROMPT`: the 4-section skeleton with the citation-discipline rules
- `augment_system_prompt(EXTRA_SYSTEM_PROMPT, query, intent)`: calls `app/framing.detect()` to classify the question type (11 types, deterministic regex), then appends a `QUESTION-AWARE OPENING:` directive to the system prompt for non-generic questions
- Strict path (production): `reconcile_strict_streaming(doc_msgs, system_prompt, ...)` in `app/mellea_validator.py`
  - Streams each attempt's tokens via the `on_token(delta, attempt_idx)` callback
  - After each attempt, runs the four checks and fires the `on_attempt_end(attempt_idx, passed, failed)` callback
  - On failure, appends a feedback user-turn naming the failing sentences and rerolls
  - Budget: `DEFAULT_LOOP_BUDGET` = 2 (Ollama primary) or 3 (vLLM primary), overridable via `RIPRAP_MELLEA_MAX_ATTEMPTS`
- Legacy path (non-strict): `reconcile.reconcile(state)` streams tokens, then calls `verify_paragraph()`, which drops sentences with ungrounded numbers (post-hoc filtering, not rejection sampling)
- The `step_reconcile` action detects strict mode via `_current_strict_mode()` and routes to one path or the other
Four Mellea grounding checks (app/mellea_validator.py)
- `numerics_grounded` → `_check_no_invented_numbers()`: every non-trivial number in the output appears verbatim in the haystack (joined document content). Trivial set: {0-10, 100, 311, 911, 211}. Number regex: `\b-?\d[\d,]*(?:\.\d+)?\b` (word-boundary anchored, so it skips `QN1206`, `B12`)
- `no_placeholder_tokens` → `_check_no_placeholder_tokens()`: the output contains none of `[source]`, `<document`, `</document`, `[doc_id]`
- `citations_dense` → `_check_every_claim_cited()`: each non-trivial number has a `[doc_id]` citation somewhere in the same sentence (sentence boundary: `\.[\s)]` or end of string)
- `citations_resolve` → `_check_referenced_doc_ids_exist()`: every `[id]` cited in the output is a member of the input doc_id set
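The first check can be reimplemented in a few lines using the number regex and trivial set quoted above (a sketch, not the actual `_check_no_invented_numbers` source):

```python
import re

# Regex and trivial set as documented above.
_NUM_RE = re.compile(r"\b-?\d[\d,]*(?:\.\d+)?\b")
_TRIVIAL = {str(n) for n in range(11)} | {"100", "311", "911", "211"}

def numerics_grounded(output: str, haystack: str) -> bool:
    """Every non-trivial number in the output must appear verbatim in
    the joined document content (the haystack)."""
    for m in _NUM_RE.finditer(output):
        num = m.group(0)
        if num in _TRIVIAL:
            continue
        if num not in haystack:
            return False
    return True

docs = "Sandy zone depth 1.2 ft [sandy]; 47 complaints [nyc311]"
ok = numerics_grounded("The address logged 47 complaints [nyc311].", docs)
bad = numerics_grounded("Roughly 52 complaints were filed.", docs)
```

Note how the leading `\b` is why identifiers like `QN1206` never match: there is no word boundary between `N` and `1`, so the scan never starts inside them.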
SSE event vocabulary (/api/agent/stream)
| event | payload | when |
|---|---|---|
| hello | {query} | connection open |
| plan_token | {delta} | planner JSON tokens |
| plan | {intent, targets, specialists, rationale} | planner done |
| stone_start | {name, tagline, description} | first step in a Stone fires |
| step | {step, ok, elapsed_s, result?, err?} | each FSM action completes |
| token | {delta, attempt?} | Granite reconcile chunk (attempt idx resets on reroll) |
| mellea_attempt | {attempt, passed, failed} | end of each Mellea attempt |
| stone_done | {name, tagline, description, n_steps} | last step in a Stone done |
| final | full state dict (geocode, sandy, dep, paragraph, mellea, energy, ...) | reconcile done |
| error | {err} | exception |
| done | {} | stream closing |
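A client consuming this vocabulary only needs a tiny frame parser; a sketch (hypothetical helper, not part of the repo):

```python
def parse_sse(stream: str) -> list[tuple[str, str]]:
    """Split a raw SSE text stream into (event, data) pairs.
    Frames are separated by a blank line; each frame carries
    'event:' and 'data:' lines per the table above."""
    frames = []
    for chunk in stream.split("\n\n"):
        event, data = "message", ""
        for line in chunk.splitlines():
            if line.startswith("event: "):
                event = line[len("event: "):]
            elif line.startswith("data: "):
                data = line[len("data: "):]
        if data or event != "message":
            frames.append((event, data))
    return frames

raw = 'event: hello\ndata: {"query": "80 Pioneer St"}\n\nevent: done\ndata: {}\n\n'
frames = parse_sse(raw)
```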
3. The Five Stones, one section each
Cornerstone: Hazard Reader
Job: Establish the historical and modeled flood record at the address. These are static datasets that do not change between queries.
Specialists (file:function, what it does):
| Specialist | File:function | What it returns |
|---|---|---|
| step_sandy | fsm.py:step_sandy | Boolean: inside 2012 Sandy Inundation Zone; gpd.sjoin point-in-polygon against data/sandy_inundation.geojson (91 MB) |
| step_dep | fsm.py:step_dep | Three DEP stormwater scenarios: dep_extreme_2080 (3.66 in/hr rainfall, 2080 SLR), dep_moderate_2050 (2.13 in/hr, 2050 SLR), dep_moderate_current; depth class 1-3 per point |
| step_microtopo | fsm.py:step_microtopo | Point elevation (m), HAND (height above nearest drainage, m), TWI (topographic wetness index), rel_elev_pct_200m, rel_elev_pct_750m, basin_relief_m from rasters in data/ |
| step_ida_hwm | fsm.py:step_ida_hwm | USGS Hurricane Ida 2021 high-water marks; n_within_800m, max_height_above_gnd_ft, nearest_dist_m |
| step_prithvi | fsm.py:step_prithvi | Point-in-polygon against 166 pre-computed polygons in data/prithvi_ida_2021.geojson; inside_water_polygon bool, nearest_distance_m |
Data sources:
- Sandy: NYC OpenData `5xsi-dfpx`, downloaded to `data/sandy_inundation.geojson`
- DEP: NYC DEP Stormwater Flood Maps (2021), Esri FileGDBs at `data/dep/*.gdb`
- Microtopo: USGS 3DEP 30 m DEM via py3dep + whitebox-workflows for TWI/HAND computation, baked to `data/nyc_dem_30m.tif`, `data/twi.tif`, `data/hand.tif` by `scripts/compute_hydrology_indices.py`
- Ida HWMs: USGS STN Event 312 (NY State), baked to `data/ida_2021_hwms_ny.geojson` by `scripts/fetch_ida_hwms.py`
- Prithvi polygons: offline Prithvi-EO 2.0 segmentation on a Sentinel-2 HLS tile (pre-event 2021-08-25, post-event 2021-09-02), baked to `data/prithvi_ida_2021.geojson` by `scripts/run_prithvi_ida.py`
Models invoked: Prithvi-EO 2.0 ran offline (TerraTorch) to produce the 166-polygon GeoJSON; no live model at query time for this Stone
Failure modes: Sandy/DEP/Prithvi fail silently if GeoJSON/GDB load errors; microtopo/ida_hwm fail if raster files absent from data/; all check _in_nyc() and skip with "out of NYC scope" for non-NYC addresses
UI evidence cards: Sandy inundation zone (boolean), DEP scenario depth classes (three cards), microtopo terrain indices, Ida HWM count/height, Prithvi polygon proximity
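The Cornerstone's spatial joins use `gpd.sjoin` in production; a dependency-free sketch of the underlying point-in-polygon question for a single ring (ray casting; coordinates are illustrative, not real flood-zone geometry):

```python
def point_in_polygon(lon: float, lat: float, ring: list[tuple[float, float]]) -> bool:
    """Ray-casting point-in-polygon test: count edge crossings of a
    horizontal ray from the point; odd count means inside."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

# Hypothetical square roughly around Red Hook, Brooklyn.
square = [(-74.02, 40.67), (-74.00, 40.67), (-74.00, 40.69), (-74.02, 40.69)]
hit = point_in_polygon(-74.007, 40.677, square)
miss = point_in_polygon(-73.90, 40.75, square)
```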
Keystone: Asset Register
Job: Quantify what public assets (transit, housing, schools, hospitals, buildings) are exposed to the hazards the Cornerstone established.
Specialists (file:function):
| Specialist | File:function | What it returns |
|---|---|---|
| step_mta_entrances | fsm.py:step_mta_entrances | MTA subway entrances within 500 m: n_entrances, n_inside_sandy_2012, n_in_dep_extreme_2080; per-entrance elevation + HAND |
| step_nycha | fsm.py:step_nycha | NYCHA developments within 1.5 km: n_developments, n_majority_inside_sandy_2012, n_with_dep_2080_overlap; per-development footprint overlap percentages |
| step_doe_schools | fsm.py:step_doe_schools | DOE schools within 1 km: n_schools, n_inside_sandy_2012, n_in_dep_extreme_2080 |
| step_doh_hospitals | fsm.py:step_doh_hospitals | NYS DOH hospitals within 2 km: n_hospitals, n_inside_sandy_2012, n_in_dep_extreme_2080 |
| step_terramind_buildings | fsm.py:step_terramind_buildings | msradam/TerraMind-NYC-Adapters Buildings LoRA: pct_buildings, n_building_components in the per-query Sentinel-2 chip; heavy, needs _HEAVY_SPECIALISTS_ENABLED |
Data sources:
- MTA: `data/mta_entrances.geojson` (pre-computed register with elevation + flood layer joins)
- NYCHA: `data/registers/nycha.json` (built by `scripts/build_nycha_register.py`)
- DOE schools: `data/registers/schools.json`
- DOH hospitals: fetched from NYS DOH Health Facility Certification (`vn5v-hh5r`) at register-build time
- TerraMind-Buildings: `msradam/TerraMind-NYC-Adapters` adapter `nyc-buildings-v1`, via `app/context/terramind_nyc.py:buildings()`
Models invoked: msradam/TerraMind-NYC-Adapters (TerraMind 1.0 base + Buildings LoRA), ~1.6 GB base + ~325 MB LoRA, loaded lazily and cached; runs on RIPRAP_ML remote if configured
Failure modes: NYCHA/DOE/DOH registers require RIPRAP_NYCHA_REGISTERS=1 and the 91 MB Sandy GeoJSON to be loaded, so they are disabled by default on local dev. TerraMind-Buildings skips silently if eo_chip didn't fire, its dependencies are unavailable, or heavy specialists are disabled
UI evidence cards: MTA entrance exposure summary, NYCHA development exposure, school exposure, hospital exposure, TerraMind building footprint fraction
Touchstone: Live Observer
Job: Report current conditions: what sensors, 311 data, and EO imagery show right now.
Specialists (file:function):
| Specialist | File:function | What it returns |
|---|---|---|
| step_floodnet | fsm.py:step_floodnet | FloodNet sensors within 600 m: n_sensors, n_sensors_with_events, n_flood_events_3y, peak_event (max_depth_mm) |
| step_311 | fsm.py:step_311 | NYC 311 flood complaints within 200 m, last 5 years: count, by_descriptor breakdown, by_year |
| step_nws_obs | fsm.py:step_nws_obs | Nearest ASOS hourly METAR: station_id, precip_last_hour_mm, precip_last_3h_mm, precip_last_6h_mm |
| step_noaa_tides | fsm.py:step_noaa_tides | Nearest of 3 NOAA gauges (Battery 8518750, Kings Point 8516945, Sandy Hook 8531680): observed_ft_mllw, predicted_ft_mllw, residual_ft |
| step_prithvi_live | fsm.py:step_prithvi_live | Live Sentinel-2 L2A water segmentation via msradam/Prithvi-EO-2.0-NYC-Pluvial v2: pct_water_within_500m, pct_water_full, scene_date, cloud_cover; heavy |
| step_terramind_lulc | fsm.py:step_terramind_lulc | msradam/TerraMind-NYC-Adapters LULC LoRA: dominant_class, dominant_pct, per-class fractions; heavy |
Data sources:
- FloodNet: `https://api.floodnet.nyc/v1/graphql` (Hasura GraphQL, no auth; ~350 sensors)
- 311: Socrata `erm2-nwe9` (live API call, 200 m buffer, last 5 years)
- NWS obs: `https://api.weather.gov/stations/<id>/observations/latest`; nearest of KNYC, KLGA, KJFK, KEWR, KFRG
- NOAA tides: `https://api.tidesandcurrents.noaa.gov/api/prod/datagetter`; 6-min cadence
- Prithvi live: Microsoft Planetary Computer STAC API for Sentinel-2 L2A; msradam/Prithvi-EO-2.0-NYC-Pluvial v2 weights
- TerraMind LULC: shared chip from `step_eo_chip` (also STAC/Planetary Computer)
Models invoked: Prithvi-EO-2.0-NYC-Pluvial v2 (300 M params, TerraTorch, flood IoU 0.5979 vs 0.10 base); TerraMind-NYC-Adapters LULC LoRA (mIoU 0.5866, +6.13 pp over full-FT)
Failure modes: FloodNet GraphQL call sets verify=False (self-signed cert); 311 Socrata times out gracefully; NOAA/NWS calls have 15-20 s timeouts; Prithvi/TerraMind LULC require _HEAVY_SPECIALISTS_ENABLED and app/context/eo_chip_cache.py:fetch() succeeding
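The graceful-degradation pattern shared by these live calls can be sketched as a wrapper that turns any network failure into a silent skip (hypothetical helper; the real specialists each handle this inline):

```python
from typing import Callable, Optional

def fetch_or_skip(fetch: Callable[[], dict], reason_prefix: str) -> Optional[dict]:
    """Run a live fetch; on any failure return None so the specialist's
    state key stays None and no document reaches the synthesizer."""
    try:
        return fetch()
    except Exception as exc:  # timeout, connection error, bad payload, ...
        print(f"{reason_prefix}: skipped ({exc.__class__.__name__})")
        return None

def flaky():
    raise TimeoutError("api.weather.gov did not answer in 15 s")

result = fetch_or_skip(flaky, "nws_obs")
```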
Lodestone: Projector
Job: Report forward-looking signals: NWS alerts, surge forecasts, and complaint-rate trends.
Specialists (file:function):
| Specialist | File:function | What it returns |
|---|---|---|
| step_nws_alerts | fsm.py:step_nws_alerts | Active NWS flood-relevant alerts at the point (Flash Flood, Coastal Flood, etc.): n_active, list of alerts with event/severity/urgency/expires |
| step_ttm_forecast | fsm.py:step_ttm_forecast | TTM r2 zero-shot Battery surge residual: context 512 steps |
| step_ttm_311_forecast | fsm.py:step_ttm_311_forecast | TTM r2 zero-shot on 52 weeks of 311 complaint history → 4-week forecast; forecast_mean_per_week, forecast_peak_per_week, accelerating flag |
| step_floodnet_forecast | fsm.py:step_floodnet_forecast | TTM r2 on the nearest FloodNet sensor's flood-event recurrence; forecast_28d_expected_events, accelerating; silent if sensor history too sparse |
| step_ttm_battery_surge | fsm.py:step_ttm_battery_surge | msradam/Granite-TTM-r2-Battery-Surge fine-tune: hourly cadence, 96 h horizon; forecast_peak_m, forecast_peak_hours_ahead; only emits a doc when interesting |
Data sources:
- NWS alerts: `https://api.weather.gov/alerts/active`, filtered to flood event types in the point's county
- TTM context data: live pull of NOAA CO-OPS 6-min water level (for Battery/Kings Point/Sandy Hook); Socrata 311 history; FloodNet GraphQL event history
- Battery surge fine-tune: NOAA hourly verified water level from the Battery gauge (NOAA 8518750), loaded by `app/live/ttm_battery_surge.py`
Models invoked: ibm-granite/granite-timeseries-ttm-r2 (1.5 M params, ~30 MB, CPU-viable, zero-shot); msradam/Granite-TTM-r2-Battery-Surge fine-tune (same backbone, test MAE 0.1091 m, β41% vs persistence, β25% vs zero-shot)
Failure modes: NWS alerts call gracefully returns n_active=0 on timeout; TTM models loaded lazily via app/live/ttm_forecast.py:_load_model() with _DEPS_OK = False fallback pattern; all Lodestone specialists fire unconditionally (no NYC bbox gate except floodnet/311 which are NYC-specific)
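The post-processing these forecasters share, scanning the model's horizon for a peak and its lead time, reduces to a few lines (illustrative values; the real series comes from TTM r2):

```python
def scan_forecast_peak(horizon: list[float]) -> tuple[float, int]:
    """Return (forecast_peak_m, forecast_peak_hours_ahead) for an hourly
    forecast series, matching the fields the Lodestone docs report."""
    peak = max(horizon)
    # +1 because index 0 is the first forecast hour, i.e. one hour ahead.
    return peak, horizon.index(peak) + 1

# Illustrative 96 h surge-residual forecast (metres): flat with one bump.
horizon = [0.05] * 96
horizon[40] = 0.62
peak_m, hours_ahead = scan_forecast_peak(horizon)
```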
Capstone: Synthesizer
Job: Read all documents produced by the four data-Stones and write a citation-grounded four-section prose briefing.
Entry: app/mellea_validator.py:reconcile_strict_streaming(doc_msgs, system_prompt, user_prompt, loop_budget, on_token, on_attempt_end)
Document ordering in prompt: geocode preamble → Cornerstone (sandy, dep_*, ida_hwm, prithvi_water, microtopo) → Keystone (mta_entrance_*, nycha_dev_*, doe_school_*, nyc_hospital_*, tm_buildings) → Touchstone (floodnet, nyc311, nws_obs, noaa_tides, prithvi_live, tm_lulc) → Lodestone (nws_alerts, ttm_forecast, ttm_311_forecast, floodnet_forecast_*, ttm_battery) → Policy (rag_*, gliner_*)
Four-section skeleton (from EXTRA_SYSTEM_PROMPT):
- Status. → dominant exposure signal, strongest doc_id citation
- Empirical evidence. → Sandy, 311, FloodNet, Ida HWMs, Prithvi polygons
- Modeled scenarios. → DEP dep_* scenarios, microtopo terrain (HAND, TWI, percentile)
- Policy context. → one sentence per RAG hit, citing agency name + rag_* doc_id
Four grounding checks (described in §2 above): numerics_grounded, no_placeholder_tokens, citations_dense, citations_resolve
Reroll feedback mechanism: _failing_sentences_for_citations(text) identifies sentences with uncited numbers; on reroll the feedback user-turn names those specific sentences and instructs surgical citation additions
Model: RIPRAP_RECONCILER_MODEL env, default granite4.1:8b; num_ctx=4096, num_predict=400
Return shape from step_reconcile: {paragraph, audit: {raw, dropped}, mellea: {rerolls, n_attempts, requirements_passed, requirements_failed, requirements_total, model, loop_budget}}
4. The three NYC fine-tunes
msradam/Prithvi-EO-2.0-NYC-Pluvial
- HF Hub path: `msradam/Prithvi-EO-2.0-NYC-Pluvial`
- Base model: IBM/NASA Prithvi-EO 2.0 (300 M params, ViT-L foundation model pre-trained on HLS Sentinel-2 multispectral imagery), Apache-2.0
- Training data: NYC HLS Sentinel-2 tiles with pluvial flood labels derived from the USGS Ida HWM survey and NYC DEP records; Lovász-Softmax loss with copy-paste augmentation; trained on AMD Instinct MI300X
- Metrics: test flood IoU 0.5979 vs 0.10 for the Sen1Floods11 base (6x improvement)
- Invocation, two paths:
  - Offline (Cornerstone): produced `data/prithvi_ida_2021.geojson` via `scripts/run_prithvi_ida.py`; runtime does point-in-polygon, no model call
  - Live (Touchstone): `app/flood_layers/prithvi_live.py:fetch(lat, lon)` fetches the latest Sentinel-2 L2A chip from Planetary Computer STAC, runs a model forward pass, and returns `pct_water_within_500m`, `pct_water_full`; slow (~30 s), gated by `_HEAVY_SPECIALISTS_ENABLED`; input is a 6-band S2L2A chip, output a binary segmentation mask
- Degradation: if Planetary Computer STAC is unavailable or cloud cover is too high, `fetch()` returns `{ok: False, skipped: "...reason..."}` and no doc is emitted
msradam/TerraMind-NYC-Adapters
- HF Hub path: `msradam/TerraMind-NYC-Adapters`
- Base model: TerraMind 1.0 (IBM/ESA any-to-any generative EO foundation model), Apache-2.0
- Training data: NYC Sentinel-2 + SAR chips matched to ESRI Land Cover 2020-2022 labels (LULC adapter) and NYC building footprints (Buildings adapter); trained on AMD Instinct MI300X in ~18 minutes
- Metrics: LULC test mIoU 0.5866 (+6.13 pp over the full-FT baseline); Buildings test mIoU 0.5511; TiM 0.6023
- Two adapters:
  - `lulc`: 5-class land cover (water, built, vegetation, bare, agriculture); invoked by `step_terramind_lulc` via `app/context/terramind_nyc.py:lulc(s2_tensor, s1rtc, dem)`
  - `buildings`: binary building footprint mask; invoked by `step_terramind_buildings` via `app/context/terramind_nyc.py:buildings(s2_tensor, s1rtc, dem)`
- Shared chip: both consume tensors from `step_eo_chip` → `app/context/eo_chip_cache.py:fetch(lat, lon)`, which fetches the S2L2A + S1RTC + DEM chip once per query
- Degradation: if `eo_chip` didn't fire successfully, both LoRA specialists silently no-op. Lazy load + cached in-process; first call ~30 s, subsequent calls ~3-7 s
msradam/Granite-TTM-r2-Battery-Surge
- HF Hub path: `msradam/Granite-TTM-r2-Battery-Surge`
- Base model: ibm-granite/granite-timeseries-ttm-r2 (1.5 M params, Tiny Time Mixer, Ekambaram et al. NeurIPS 2024), Apache-2.0
- Training data: NOAA CO-OPS Battery gauge (station 8518750) hourly verified water level; surge residual computed as verified minus harmonic tide; trained on AMD Instinct MI300X
- Metrics: test MAE 0.1091 m, -41% vs persistence, -25% vs zero-shot TTM r2
- Invocation: `app/live/ttm_battery_surge.py:fetch()` loads the model via `tsfm_public.get_model()`, fetches NOAA hourly context, and returns `{available, context_hours, horizon_hours: 96, forecast_peak_m, forecast_peak_hours_ahead, interesting}`; runs in-process on CPU
- Input shape: `(context_length, 1)` float tensor of hourly surge residuals; context = 336 h (~14 days)
- Output shape: `(96,)` hourly forecast, scanned for its peak
- Degradation: `_DEPS_OK` module-level flag set at import time; on failure returns `{available: False, reason: "..."}` and no doc is emitted
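The `_DEPS_OK` degradation pattern used by the TTM loaders can be sketched as follows (hypothetical module layout; the real flags live in `app/live/ttm_forecast.py` and `app/live/ttm_battery_surge.py`):

```python
# Probe the optional heavy dependency once, at import time.
try:
    import tsfm_public  # noqa: F401  (heavy, optional)
    _DEPS_OK = True
except ImportError:
    _DEPS_OK = False

def fetch() -> dict:
    """Return a forecast dict, or an 'unavailable' record that the
    document builder interprets as 'emit no doc'."""
    if not _DEPS_OK:
        return {"available": False, "reason": "tsfm_public not importable"}
    ...  # load model, pull NOAA context, forecast 96 h ahead
    return {"available": True}

result = fetch()
```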
5. The deployment topology
Local development
- Python 3.12 venv (`.venv`), `uv` for package management
- Ollama serving `granite4.1:3b` + `granite4.1:8b` locally; `uvicorn web.main:app --host 127.0.0.1 --port 7860`
- `_HEAVY_SPECIALISTS_ENABLED = False` by default (no `RIPRAP_ML_BASE_URL` set, no vLLM)
- `RIPRAP_NYCHA_REGISTERS = 0` by default (avoids the heavy 91 MB GeoJSON loads)
- Granite Embedding 278M and TTM r2 download to the HF cache on first query (~280 MB + ~30 MB)
- SvelteKit UI built at `web/sveltekit/build/`; rebuild only needed when sources change
HF Space (production demo URL)
- URL: `https://lablab-ai-amd-developer-hackathon-riprap-nyc.hf.space`
- Docker SDK, base `nvidia/cuda:12.4.1-cudnn-runtime-ubuntu22.04`, hardware `cpu-basic` (the ARCHITECTURE.md mentions T4, but the Dockerfile's GPU notes are aspirational; actual hardware is cpu-basic)
- Python 3.10 inside the container (pinning `mellea<0.4`, `transformers<5`, `huggingface_hub<1`)
- `entrypoint.sh` flow:
  - Attempts the EO toolchain install at runtime into `$HOME/.eo-pkgs` (bypasses the HF build disk limit); if it fails, terramind/prithvi-live silently skip
  - Starts `ollama serve` in the background, polls until ready (up to 60 s)
  - Pulls `granite4.1:8b` at runtime if not cached (~5 GB, ~2 min first cold start); 3b is optional
  - Pre-warms 8b via `curl POST /api/generate` with `keep_alive=24h`
  - Launches `uvicorn web.main:app --host 0.0.0.0 --port 7860`
- `RIPRAP_OLLAMA_3B_TAG=granite4.1:8b` set in the Dockerfile so the planner routes to 8b (avoids the disk cost of two separate model pulls)
- `web/main.py:_warm_caches()` on startup: loads Sandy + DEP layers, optionally the NYCHA registers, warms RAG (Granite Embedding 278M + 5 PDFs), pre-imports heavy ML stacks to avoid import races, warms Ollama models via HTTP
AMD MI300X droplet (demo GPU path; currently destroyed)
- Two Docker containers on the same host, both with `--device=/dev/kfd --device=/dev/dri`
- Container 1: `vllm/vllm-openai-rocm:v0.17.1`, serving `granite-4.1-8b` on port 8001
  - `--max-model-len 8192`, `--served-model-name granite-4.1-8b`
  - `GLOO_SOCKET_IFNAME=eth0` required or gloo fails to bind
- Container 2: `riprap-models:latest` (built from `services/riprap-models/Dockerfile`), FastAPI on port 8002 (or 7860 per scripts)
  - Endpoints: `GET /healthz`, `POST /v1/prithvi-pluvial`, `POST /v1/terramind`, `POST /v1/ttm-forecast`, `POST /v1/granite-embed`, `POST /v1/gliner-extract`
  - Model loading: lazy + per-model threading.Lock to prevent double-load on concurrent requests
  - ROCm device: `cuda` (ROCm's CUDA shim maps `cuda` to the first `/dev/kfd` device)
Env vars to connect HF Space to droplet:
RIPRAP_LLM_PRIMARY=vllm
RIPRAP_LLM_BASE_URL=http://<ip>:8001/v1
RIPRAP_LLM_API_KEY=<token>
RIPRAP_ML_BASE_URL=http://<ip>:8002
RIPRAP_ML_API_KEY=<token>
What breaks if droplet IP changes: Set the four env vars above via huggingface-cli space variables and restart the Space. The LiteLLM Router builds at import time from env, so a Space restart is required.
Deterministic redeploy: scripts/deploy_droplet.sh <new-ip> $TOKEN β idempotent, ~10β20 min first run (pulls images, builds riprap-models); re-runs on same droplet ~1 min. Known fragile: safetensors==0.8.0rc0 pin in services/riprap-models/requirements-full.txt is an RC and may fail on future pip resolves.
6. One query traced end-to-end: "80 Pioneer Street, Brooklyn"
Query enters: GET /api/agent/stream?q=80+Pioneer+Street%2C+Brooklyn
1. Planner (app/planner.py:plan)
- No not-implemented regex matches
- Calls `llm.chat(model="granite4.1:3b", messages=[system, user], format="json", stream=True, temperature=0)`
- Streams `plan_token` SSE events as the JSON generates
- Returns `Plan(intent="single_address", targets=[{type:"address", text:"80 Pioneer Street, Brooklyn"}], specialists=[...], rationale="...")`
- Validator adds floor specialists: geocode, sandy, dep_stormwater, microtopo
- SSE: `plan` event emitted
2. single_address.run (app/intents/single_address.py:run)
- Sets threadlocals: `strict=True`, `planned_specialists={...}`, `user_query="80 Pioneer Street, Brooklyn"`, `planner_intent="single_address"`
- Registers `on_token` and `on_mellea_attempt` callbacks on `progress_q`
- Calls `fsm.iter_steps("80 Pioneer Street, Brooklyn")`
3. FSM: step_geocode
- `app/geocode.py:geocode_one("80 Pioneer Street, Brooklyn")`
- Detects the borough hint "Brooklyn", calls DCP Geosearch with `size=8`, filters for Brooklyn results
- Returns `GeocodeHit(address="80 Pioneer Street, Brooklyn, NY 11231", borough="Brooklyn", lat=40.6772, lon=-74.0070, bbl="3-00589-0003", ...)`
- State: `{geocode: {...}, lat: 40.6772, lon: -74.0070}`
- SSE: `step` event `{step: "geocode", ok: true, elapsed_s: 0.4, result: {address:..., lat:..., lon:...}}`
4. FSM: step_sandy
- Confirmed inside the NYC bbox
- `sandy_inundation.join(point)`: spatial join against `data/sandy_inundation.geojson`
- Red Hook is inside the 2012 Sandy inundation zone → `sandy=True`
- State: `{sandy: True}`
- SSE: `step` event; opens `stone_start`: Cornerstone
5. FSM: step_dep
- `dep_stormwater.join(pt, scen)` for each of 3 scenarios against `data/dep/*.gdb`
- Likely returns `dep_moderate_2050: depth_class=2` (Deep & Contiguous 1-4 ft), `dep_extreme_2080: depth_class=3` (Deep Contiguous >4 ft), `dep_moderate_current: depth_class=1`
6β8. FSM: step_floodnet, step_311, step_noaa_tides
- FloodNet: GraphQL POST to `api.floodnet.nyc`, checking sensors within 600 m of (40.6772, -74.0070)
- 311: Socrata API call for flood complaints within 200 m, last 5 years
- NOAA: fetches the Battery gauge (closest of the 3 stations to Red Hook), returns observed/predicted/residual
9β12. FSM: TTM forecast steps
- `ttm_forecast.summary_for_point(40.6772, -74.0070)`: loads ibm-granite/granite-timeseries-ttm-r2, fetches 512 steps of Battery residual history via NOAA, forecasts 96 steps ahead; emits a doc only if peak > 0.3 ft
- `ttm_311_forecast.weekly_311_forecast_for_point(...)`: fetches the 52-week complaint history for the 200 m buffer from 311, runs TTM zero-shot
- `floodnet_forecast.summary_for_point(...)`: nearest sensor's historical events → TTM recurrence forecast
- `ttm_battery_surge.fetch()`: msradam/Granite-TTM-r2-Battery-Surge, hourly context → 96 h forecast
13β14. FSM: step_microtopo, step_ida_hwm
- `microtopo.microtopo_at(40.6772, -74.0070)`: samples `data/nyc_dem_30m.tif`, `hand.tif`, `twi.tif` at the point; returns elevation ~3 m, HAND ~0.8 m (near drainage), TWI ~11
- `ida_hwm.summary_for_point(...)`: checks `data/ida_2021_hwms_ny.geojson` within 800 m; Ida hit Queens hardest, Red Hook had no USGS HWMs
15. FSM: step_mta_entrances
- `app/registers/mta_entrances.py:summary_for_point(...)`: loads `data/mta_entrances.geojson`, finds entrances within 500 m (likely the Smith-9th and Carroll St stations)
16. FSM: step_prithvi
- `prithvi_water.summary_for_point(40.6772, -74.0070)`: point-in-polygon against the 166 polygons in `data/prithvi_ida_2021.geojson`; Red Hook is coastal, so likely `inside_water_polygon=True` or close proximity
17. FSM: step_rag
- Builds query: "address 80 Pioneer Street, Brooklyn; inside Hurricane Sandy 2012 inundation zone; in Deep Contiguous pluvial scenario; flood resilience plan..."
- `rag.retrieve(q, k=3, min_score=0.45)`: Granite Embedding 278M cosine similarity over the embedded corpus; likely returns `rag_npcc4` (NPCC4 coastal) + `rag_mta` (MTA Resilience Roadmap coastal references) + `rag_comptroller`
- External reads: none after startup (RAG index built at startup via `rag.warm()`)
18. FSM: step_gliner
- `gliner_extract.extract_for_rag_hits(hits)`: GLiNER NER extraction over the RAG paragraphs; extracts agency names, dollar amounts, infrastructure projects, NYC locations, date ranges
- Emits `gliner_{source}` doc messages
19. FSM: step_reconcile
- `_current_strict_mode() = True`
- `build_documents(snap)` → ~15 doc messages
- `trim_docs_to_plan(doc_msgs, planned_specialists)` → drops specialists the planner didn't ask for
- `augment_system_prompt(EXTRA_SYSTEM_PROMPT, query="80 Pioneer Street, Brooklyn", intent="single_address")` → `framing.detect()` → `generic_exposure` → no directive added (the Red Hook query has no question-shape keywords)
- `reconcile_strict_streaming(doc_msgs, framed_prompt, loop_budget=2, on_token=..., on_attempt_end=...)`
- Attempt 0: streams tokens to the frontend; runs the 4 checks; likely passes
- If it fails: a feedback user-turn names the failing sentences, then attempt 1
- Emits: paragraph, mellea metadata (`rerolls=0`, `requirements_passed=4/4`)
- SSE: multiple `token` events → `mellea_attempt` event → `stone_done`: Capstone → `final` event
Scoring (computed in web/main.py from final state, or explicitly via app/score.py:composite()):
- `sandy=True` → empirical.sandy=1.0 → floor triggered (final tier capped at no worse than 2)
- `dep_moderate_2050 depth_class=2` → regulatory.dep_moderate_2050=0.75
- `microtopo HAND=0.8` → hydrological.hand_band=1.0 (HAND < 1 m)
- Composite likely ≥ 1.5 → raw tier 1; floor_applied=True caps the tier at no worse than 2, and since tier 1 (higher exposure) already satisfies the floor, the tier stays 1
- Final tier: 1 (High exposure)
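The Sandy floor rule described above can be sketched in a few lines (hypothetical helper; the real logic lives in `app/score.py:composite()`):

```python
def apply_sandy_floor(raw_tier: int, sandy: bool) -> int:
    """Tier 1 = highest exposure, tier 4 = lowest. A confirmed Sandy hit
    floors the result at tier 2: the final tier may be better (a lower
    number) than 2, but never worse."""
    return min(raw_tier, 2) if sandy else raw_tier

t1 = apply_sandy_floor(1, sandy=True)   # already above the floor: stays 1
t2 = apply_sandy_floor(4, sandy=True)   # floored up to tier 2
t3 = apply_sandy_floor(4, sandy=False)  # no floor: stays 4
```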
7. What's robust vs fragile
Robust (load-bearing, tested)
- Silence-over-confabulation in specialists: Every FSM action returns the declared state key as `None` on failure; `build_documents()` gates on `state.get(key) is not None`; Granite never invents content from absent documents. The pattern is consistent across 25 specialists.
- NYC-scope guard: `_in_nyc()` check in every FSM action, plus the `build_documents()` scope_note mechanism for out-of-NYC addresses. National specialists (NOAA, NWS) still fire and a live-conditions-only briefing is produced.
- LiteLLM Router failover: `app/llm.py` auto-fails over from vLLM to Ollama on timeout/5xx. `num_retries=0` so the Router doesn't burn seconds re-hitting dead endpoints. The Ollama fallback fires from the same call site.
- Planner validator floor: `_required_specialists()` adds geocode/sandy/dep/microtopo even if the planner forgot them; this prevents silently missing Stones in the briefing.
- Four Mellea grounding checks with reroll feedback: The `_failing_sentences_for_citations()` targeted-feedback mechanism is the reason neighborhood queries went from a chronic 3/4 to 4/4. The identifier-aware `\b` regex in `_NUM_RE` is specifically why it stopped false-firing on NTA codes.
- End-to-end probe suite: `scripts/probe_addresses.py` drives `/api/agent/stream` against 5 addresses (442 E Houston, 80 Pioneer, 100 Gold, Hollis, Coney Island) and asserts Stone fire patterns, Mellea 4/4, and the four-section structure. Last green run: 5/5, 5.8-13.1 s per address at `RIPRAP_MELLEA_MAX_ATTEMPTS=3`.
- Startup warmup in `web/main.py:_warm_caches()`: Sandy, DEP, RAG, Ollama models, and heavy ML module pre-imports all happen before the first request. The startup function catches exceptions individually so one failure doesn't kill the app.
- Threadlocal cleanup in `finally:` blocks: `app/intents/single_address.py` always resets all five threadlocals in a `finally:` clause, preventing state from bleeding between requests.
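The silence-over-confabulation gate in the first bullet can be sketched in a few lines. This is an illustration of the pattern only; the real `build_documents()` is ~750 lines, and the keys and doc text here are invented for the example.

```python
def build_documents_sketch(state: dict) -> list[dict]:
    """Minimal sketch of the silence-over-confabulation gate: a doc
    message is emitted only when the specialist actually produced a
    value. Keys and doc text are illustrative, not the real ones."""
    docs = []
    if state.get("sandy_hwm") is not None:      # specialist fired
        docs.append({"role": "document sandy",
                     "content": f"Sandy high-water mark: {state['sandy_hwm']} ft"})
    if state.get("nyc311_count") is not None:
        docs.append({"role": "document nyc311",
                     "content": f"{state['nyc311_count']} flood complaints within 200 m"})
    return docs                                  # absent specialist, absent doc

# A failed specialist returns None, so no doc is emitted and the model
# has nothing to confabulate from:
assert build_documents_sketch({"sandy_hwm": None, "nyc311_count": 11}) == [
    {"role": "document nyc311", "content": "11 flood complaints within 200 m"}]
```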
Fragile (single points of failure, missing error handling)
- Burr FSM concurrent queries: `iter_steps()` mutates module-level Burr state. Two concurrent `single_address` queries to the same uvicorn worker will interleave threadlocals; there is no per-request isolation. The production HF Space is single-worker; local dev with `--workers 2` would break.
- `build_documents()` complexity (radon F=101): an ~750-line function with one `if/elif` branch per specialist. Order matters for the Granite prompt. Small edits risk subtle doc-ordering regressions that are silent but affect citation density.
- entrypoint.sh EO install: The runtime `pip install --target` of terratorch/einops/diffusers/timm/torchvision into `$HOME/.eo-pkgs` is brittle. If pip fails mid-install the marker isn't created and the next container start retries, but if the Space's filesystem cache persists a partial install, it might never clear. The build log won't show this failure clearly.
- Droplet redeploy: Dockerfile unverified end-to-end. The last full E2E Dockerfile build was never confirmed; the bootstrap droplet was destroyed before final verification. `safetensors==0.8.0rc0` in `services/riprap-models/requirements-full.txt` is an RC that may fail on a fresh pip resolve.
- NOAA/NWS live calls without rate-limit handling: `app/context/noaa_tides.py` and `nws_obs.py` call live APIs on every request with no caching and no retry-after handling. Under concurrent load or a NOAA outage, specialists fail silently (returning an `error` key in the result dict) but every request re-hits the failed endpoint.
- FloodNet GraphQL `verify=False`: Certificate validation is disabled in `app/context/floodnet.py:_gql()`. This is a permanent workaround for FloodNet's self-signed cert, not a temporary one.
- Static asset cache: `web/sveltekit/build/` assets have no cache-busting. When iterating on Svelte sources, a browser hard-reload is required.
- Planner 3b → 8b alias on HF Space: `RIPRAP_OLLAMA_3B_TAG=granite4.1:8b` in the Dockerfile means both the planner and the reconciler use the 8b on the Space. Because the 3b is never pulled, the `granite4.1:3b` model is absent and an explicit call to that tag would fail. Current routing via the alias system prevents this, but a direct tag reference in new code would break.
- vLLM `[doc_id=X]` normalization in `app/llm.py:_normalize_citations()`: Applied per-chunk in streaming and once on non-streaming responses. If vLLM ever splits a citation tag across two stream chunks, the regex would miss it. This hasn't happened in practice but is a known theoretical gap.
- RAG startup failure doesn't prevent startup: `rag.warm()` is wrapped in a try/except that prints and continues. If sentence-transformers fails to load, all queries return without policy context; the briefing still works but silently loses the RAG section.
- Mellea API shape versioning: `reconcile_strict()` uses `mellea.start_session(backend_name="ollama")` from Mellea 0.3/0.4 (HF has 0.3, local has 0.4). The `_extract_text()` and `_extract_attempts()` helpers duck-type multiple attribute names. `reconcile_strict_streaming()` avoids Mellea's session entirely (hand-rolled) and is version-independent; this is the production path. The `reconcile_strict()` function is only exercised in offline contexts.
- NYC 311 Socrata calls uncached: Each query fetches fresh from Socrata. Under rate-limiting or extended 311 maintenance, the specialist returns `n=0` and no 311 doc is emitted; the briefing silently lacks that signal.
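One fragile point above, the per-chunk `[doc_id=X]` normalization, can be sketched with a small regex, including the theoretical split-chunk gap. The pattern here is illustrative; the real one in `app/llm.py:_normalize_citations()` may differ.

```python
import re

# Illustrative version of the [doc_id=X] -> [X] normalization; the real
# regex lives in app/llm.py and may not match this exactly.
_CITE_RE = re.compile(r"\[doc_id=([A-Za-z0-9_]+)\]")

def normalize_citations(chunk: str) -> str:
    """Rewrite vLLM-style [doc_id=X] tags to the bare [X] form the
    Mellea checks and the frontend chip parser expect."""
    return _CITE_RE.sub(r"[\1]", chunk)

assert normalize_citations("flooded in 2012 [doc_id=sandy].") == "flooded in 2012 [sandy]."

# The known gap: normalization is per-chunk, so a tag split across two
# stream chunks survives un-normalized.
assert normalize_citations("[doc_id=") + normalize_citations("sandy]") == "[doc_id=sandy]"
```

Buffering a small tail of each chunk before applying the regex would close the gap, at the cost of slightly delayed token emission.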
Known gaps / out-of-scope
- `compare` intent defined in the `planner.py` INTENTS dict, but no routing to a `compare.py` intent module exists in `web/main.py:api_agent_stream`. The planner would route to it but the runner would fall through to `single_address`.
- Retrospective mode ("what would Riprap have said on date X"): blocked at the planner with a not-implemented message. No historical data replay exists.
- Cross-register ranking ("rank top 5 neighborhoods by flood exposure"): blocked at the planner. Would require a cross-register join that doesn't exist.
- FEMA NFHL integration: FEMA 1% and 0.2% floodplain indicators are in the scoring rubric (`app/score.py:REGULATORY`) but the corresponding FSM step and data layer are absent; they're stubbed at 0 in practice. The score still works but the FEMA regulatory sub-index doesn't contribute.
- Sub-surface flooding (Ida basement mode): Optical satellites can't see basement flooding. Prithvi correctly emits no polygons for inland Queens. This is documented as an honest scope limit, not a bug.
- `/api/compare` endpoint exists at `web/main.py:compare_stream` and works as a two-parallel-FSM-runs endpoint, but the SvelteKit UI doesn't expose a compare page (the legacy `compare.html` was retired in v0.4.5).
8. The non-obvious decisions
Why not a risk score from 0-100
The tier is a deterministic, published rubric (Cutter et al. 2003 construction, Tate 2012 equal-weights argument, Balica 2012 empirical floor). A continuous score would imply calibration against labeled damage outcomes, which don't exist here. Riprap has no closed claim records; producing "flood risk 0.73" without claims-driven calibration would be fabricated precision. The tier is explicitly a prior (METHODOLOGY.md §1). FEMA Risk Rating 2.0 is the product to use if you want claims-driven numbers.
Why silence over confabulation
Specialists that don't fire emit nothing. `build_documents()` gates on `state.get(key) is not None`. Granite's post-training includes grounded-generation discipline ("don't generate from absent documents"). This, plus the Mellea citation checks, means a calm-weather query produces no NWS-alerts section in the briefing rather than "no alerts were found", which would be correct but uncitable. The section is simply absent. This is explicit in the system prompt: "Omit any section whose supporting facts are absent from the documents."
Why public-record-only at runtime
Data governance: a newsroom with FOIL'd documents, or an agency with internal capital plans, can't paste that data into a vendor LLM (ARCHITECTURE.md §11). All specialist data comes from NYC OpenData, USGS, NOAA, NWS, FloodNet NYC (public sensor network). No commercial data; no private address databases. The system is reproducible and auditable.
Why the four epistemic tiers (empirical / modeled / proxy / synthetic)
The distinction matters for how much weight to give each signal, documented in ARCHITECTURE.md §1.2. Empirical (Sandy HWMs, Ida HWMs, FloodNet events) = something flooded a place and was measured. Modeled scenarios (DEP, FEMA NFHL) = hydraulic simulation under assumptions. Proxy (311 complaints, HAND, TWI) = indirect indicators. Synthetic prior (TerraMind synthesis) = generative model output, never "imaged" or "reconstructed." The `build_documents()` function embeds these interpretive framing sentences directly into the doc bodies so Granite is instructed in the document itself how to characterize each source.
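The framing-embedded-in-the-doc idea can be sketched as follows. The framing sentences, tier names as dict keys, and the `frame_doc` helper are all illustrative; the real text lives inside `build_documents()`.

```python
# Illustrative framing sentences per epistemic tier (tier names follow
# ARCHITECTURE.md Section 1.2; the wording here is invented for the sketch).
FRAMING = {
    "empirical": "This is a measured observation of actual flooding.",
    "modeled":   "This is a hydraulic simulation under stated assumptions.",
    "proxy":     "This is an indirect indicator, not direct flood evidence.",
    "synthetic": "This is generative model output, not imagery.",
}

def frame_doc(doc_id: str, tier: str, body: str) -> dict:
    """Prepend the tier's interpretive framing to the doc body, so the
    model is told inside the document how to characterize the source."""
    return {"role": f"document {doc_id}",
            "content": f"{FRAMING[tier]} {body}"}

doc = frame_doc("hand", "proxy", "HAND = 0.8 m at this parcel.")
assert doc["content"].startswith("This is an indirect indicator")
```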
Why the Five Stones names
Functional grouping for a trace UI with 25+ specialists. Stonework vocabulary maps to function: Cornerstone remembers the foundation (static hazard record); Keystone is the load-bearing arch piece (what's exposed); Touchstone is the evaluative reference (current state); Lodestone draws you toward something (forecast pull); Capstone is the crown that holds the vault (synthesis). The names let a non-technical demo audience follow the 25-step trace without reading each step label.
Why citation-grounded prose vs structured output
JSON structured output (tier + per-field arrays) is easy to produce but hard to cite in a grant application or news article. The four-section prose format with [doc_id] tags produces text a planner can quote in a FEMA BRIC sub-application or a journalist can use verbatim with inline sourcing. The citation tags map to clickable source chips in the frontend. Structured JSON of the underlying specialist outputs is also available in the final SSE event for machine consumption.
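The mapping from inline tags to clickable source chips can be sketched with a small parser. The chip dict shape and helper name here are hypothetical; the real parsing lives in the SvelteKit frontend.

```python
import re

# Bare [doc_id] tags as emitted after citation normalization.
_TAG_RE = re.compile(r"\[([a-z0-9_]+)\]")

def extract_chips(paragraph: str, sources: dict) -> list[dict]:
    """Collect unique [doc_id] tags in order of first appearance and
    resolve each against the known sources. Illustrative sketch of the
    frontend chip parser, not the actual implementation."""
    chips = []
    for doc_id in dict.fromkeys(_TAG_RE.findall(paragraph)):  # dedupe, keep order
        if doc_id in sources:
            chips.append({"doc_id": doc_id, "title": sources[doc_id]})
    return chips

chips = extract_chips(
    "11 complaints [nyc311] within Sandy extent [sandy] [nyc311].",
    {"nyc311": "NYC 311 Flood Complaints", "sandy": "USGS Sandy HWMs"})
assert [c["doc_id"] for c in chips] == ["nyc311", "sandy"]
```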
Why Mellea rejection sampling (vs post-hoc sentence dropping)
The original `verify_paragraph()` in `app/reconcile.py` drops sentences after generation. This produces a shorter briefing and a silent quality improvement, but the user sees a briefing that may have had sentences removed. The Mellea rejection sampler rerolls the entire generation when it fails, streams each attempt's tokens to the user live (visible progress), then shows a green/amber inline banner. The user understands the system is enforcing quality, not silently deleting content. Psychologically this is more defensible in a professional context.
Why planner-then-Capstone two-LLM split
The planner is a structured-output routing task (small JSON, deterministic, temperature=0). It should be fast and cheap. The reconciler is a long-form synthesis task requiring dense citation discipline; it benefits from the larger context window and stronger instruction-following of the 8b model. Using 3b for routing keeps TTFB low (planner JSON appears in ~2 s vs ~8 s for 8b). On the HF Space both aliases map to 8b via `RIPRAP_OLLAMA_3B_TAG=granite4.1:8b` to avoid disk cost, accepting the TTFB penalty.
Why LiteLLM Router
The alternative was a hand-rolled `if primary == "vllm": ... else: ollama.chat(...)` dispatch. LiteLLM's Router gives model aliasing, failover, and a common call signature for free. The ~250-line shim in `app/llm.py` covers: Ollama-vs-vLLM backend selection, document-role message extraction for vLLM's HF chat template, `[doc_id=X]` → `[X]` citation normalization, JSON-mode translation, and backend info for the UI badge. Any future backend (mlx-lm, llama.cpp, etc.) is a 10-line entry in `_build_router()`.
Why vLLM emits [doc_id=X] while Ollama emits [X]
Ollama's Granite 4.1 Modelfile template lifts role="document <id>" messages into a <documents> block and the model emits bare [X] citations. The HF tokenizer template used by vLLM emits [doc_id=X]. The rest of Riprap (Mellea regex, frontend citation chip parser, sources footer) was written against [X]. The _CITE_NORMALIZE_RE in app/llm.py normalizes per-chunk in streaming, preventing any vLLM-specific citation format from leaking downstream.
Why Prithvi runs offline (baked GeoJSON) while TTM runs live
Prithvi-EO 2.0 with TerraTorch needs GPU and minutes per HLS tile. Running it per-query on a CPU-basic Space is not viable. The 166-polygon GeoJSON was computed once on AMD MI300X, filtered (>30,000 sq ft to drop noise, <1 km² to drop tidal artifacts), and committed. The runtime FSM does point-in-polygon (milliseconds). This is honest about what EO models earn their keep on: a one-time defensible event-level signal, not per-request inference. TTM r2 at 1.5 M params runs in milliseconds on CPU; no such tradeoff exists there.
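The millisecond-scale runtime check can be sketched as even-odd ray casting over one baked polygon ring. This is a pure-Python illustration under the assumption of simple (non-self-intersecting) rings; the real code may just use shapely over the committed GeoJSON.

```python
def point_in_ring(lon: float, lat: float, ring: list) -> bool:
    """Even-odd ray casting over one polygon ring: the cheap runtime
    check the FSM needs once the Prithvi polygons are baked. Pure-Python
    sketch; assumes a simple ring of (lon, lat) vertex tuples."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        (x1, y1), (x2, y2) = ring[i], ring[j]
        # Count crossings of a horizontal ray extending left from the point
        if (y1 > lat) != (y2 > lat) and \
           lon < (x2 - x1) * (lat - y1) / (y2 - y1) + x1:
            inside = not inside
        j = i
    return inside

# Unit square as a stand-in for one baked Ida polygon:
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
assert point_in_ring(0.5, 0.5, square) is True
assert point_in_ring(1.5, 0.5, square) is False
```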
Why citations_dense uses sentence scope, not character window
The original implementation used ~40 chars of proximity between a number and its citation tag. This was fragile for normal English sentence structure ("The address has 11 flood-related complaints [nyc311] within 200 m"): the citation might be 60 chars from the number. Switching to sentence scope (a `.[\s)]` split) eliminated the chronic 3/4 neighborhood-query failure mode. "Sentence scope" is also how human readers actually assign attribution: the citation at the end of the sentence covers the claim anywhere in that sentence.
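The sentence-scope check can be sketched as: split on sentence boundaries, then flag any sentence that contains a non-trivial number but no citation tag. The regexes here are illustrative; the real ones are `_NUM_RE` and the split in `app/mellea_validator.py`.

```python
import re

# Illustrative stand-ins for the validator's patterns.
_SENT_SPLIT = re.compile(r"(?<=\.)[\s)]+")      # sentence boundary: "." then space/paren
_NUM = re.compile(r"\b\d+(?:\.\d+)?\b")         # a numeric claim
_CITE = re.compile(r"\[[a-z0-9_]+\]")           # a [doc_id] tag

def uncited_numeric_sentences(paragraph: str) -> list:
    """Return sentences that carry a number but no citation; the check
    passes when this list is empty. Sketch of the sentence-scope rule,
    not the production validator."""
    return [s for s in _SENT_SPLIT.split(paragraph)
            if _NUM.search(s) and not _CITE.search(s)]

ok = "The address has 11 flood-related complaints within 200 m [nyc311]."
bad = "The address has 11 complaints [nyc311]. Sandy reached 2.1 m here."
assert uncited_numeric_sentences(ok) == []
assert uncited_numeric_sentences(bad) == ["Sandy reached 2.1 m here."]
```

Because the whole sentence is the scope, a citation 60 chars from its number no longer false-fires the check.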
9. What's next
From OPEN-ISSUES.md, CLAUDE.md polish targets, and code-level TODO comments, in priority order for the May 13 ASCE presentation:
- Demo-script dry run against the live Space. The Space sometimes sleeps after idle; cold start is 30-90 s. Pre-ping the Space before presenting. Verify the backend pill shows correct hardware.
- `compare` intent wiring. `planner.py` declares the `compare` intent (noted as a `NOT_IMPLEMENTED` comment; in fact the planner doesn't short-circuit compare, it just routes to `single_address` by default). If you want the compare flow to work end-to-end, `web/main.py:api_agent_stream` needs routing that runs `i_addr.run` twice in parallel, or a new `compare.py` intent module.
- FEMA NFHL layer. The scoring rubric has `fema_1pct` and `fema_02pct` weights but no FSM step or data layer. Adding the FEMA NFHL download and a `step_fema_nfhl` action would materially improve Regulatory sub-index accuracy for addresses in AE/VE zones that aren't in the Sandy extent.
- NYCHA/DOE/DOH registers on Space. `RIPRAP_NYCHA_REGISTERS=0` by default. Enabling it on the HF Space would add 3 more Keystone specialists to every `single_address` query but requires the 91 MB Sandy GeoJSON pre-load to complete within Space startup time.
- Droplet redeploy verification. The `services/riprap-models/Dockerfile` was never tested end-to-end. The `safetensors==0.8.0rc0` RC pin is the most likely failure point. The next droplet bring-up should test this first.
- Experiments `OPEN-ISSUES.md` items. All four issues are in `experiments/` only (F821 numpy annotation in exp17, f-string Py 3.12+ syntax in exp18, B023 closure variable in exp05, F841 unused `api` in exp18). They won't affect production but would clean up the codebase.
- Reranker integration. `app/rag.py` has a full `_ensure_reranker()` and a `RIPRAP_RERANKER_ENABLE` flag for the `ibm-granite/granite-embedding-reranker-english-r2` cross-encoder. Off by default (no HF Space disk for the CrossEncoder model). Enabling it on the AMD droplet path would improve Policy context quality at no latency cost.
- Historical replay / retrospective mode. Blocked at the planner with a not-implemented message. A substantial feature: it would require snapshotting specialist output at query time, or storing NOAA/311/FloodNet historical pull results.
10. Quick reference: files that matter
| Task | Open first |
|---|---|
| Add a new specialist | app/fsm.py (add @action + wire into build_app()), app/reconcile.py:build_documents() (add doc emission), app/intents/single_address.py (no change usually needed), web/sveltekit/src/ (add step label + source card) |
| Change the briefing structure / system prompt | app/reconcile.py:EXTRA_SYSTEM_PROMPT, then app/intents/neighborhood.py:EXTRA_SYSTEM_PROMPT for neighborhood path; rebuild web/sveltekit if adding new section rendering |
| Tune the Mellea grounding checks | app/mellea_validator.py β _NUM_RE, _TRIVIAL_NUMS, _check_every_claim_cited(), _failing_sentences_for_citations() |
| Change which backend (vLLM vs Ollama) | app/llm.py env vars; no code change needed |
| Add a new intent | app/planner.py:INTENTS + SPECIALISTS entries, _required_specialists(), then new app/intents/<intent>.py; wire in web/main.py:api_agent_stream and api_agent |
| Change the exposure tier scoring | app/score.py:REGULATORY/HYDROLOGICAL/EMPIRICAL dicts + TIER_BREAKPOINTS; update METHODOLOGY.md |
| Debug why a specialist fired wrong | scripts/probe_mellea.py --query "<address>" --runs 1; check step events in SSE stream; look at final.mellea.requirements_failed |
| Rebuild the frontend | cd web/sveltekit && npm run build (new design-system UI); cd web/svelte && npm run build (legacy Svelte 5 custom elements to web/static/dist/riprap.js) |
| Run the full end-to-end test | .venv/bin/python scripts/probe_addresses.py |
| Rebuild the pre-computed registers | scripts/build_mta_entrances_register.py, scripts/build_nycha_register.py, scripts/build_schools_register.py |
| Rebuild Prithvi Ida polygons | scripts/run_prithvi_ida.py β needs GPU + TerraTorch |
| Rebuild the pitch deck | cd slides && make pdf html pptx (needs marp-cli) |
| Add a question-type framing | app/framing.py:_PATTERNS + _DIRECTIVES |
| Understand why a doc was missing from the briefing | Check build_documents() in app/reconcile.py β each block has an explicit gate condition; also check trim_docs_to_plan() |
| Understand the SSE stream structure | web/main.py:api_agent_stream, the _STEP_TO_STONE dict, and the stone_start/stone_done wrapping logic |
| Deploy to HF Space | git push && git push huggingface main; monitor rebuild via `curl -sf "https://huggingface.co/api/spaces/lablab-ai-amd-developer-hackathon/riprap-nyc/runtime"` |
| Deploy to AMD droplet | scripts/deploy_droplet.sh <ip> <token>, then set Space env vars via huggingface-cli space variables, restart Space |