fix: catch RuntimeError in EO deps probes, add demo playbook

The deps-probe pattern in terramind_nyc / terramind_synthesis /
prithvi_live only caught ImportError. But terratorch's import chain
on the HF Space raises RuntimeError("operator torchvision::nms does
not exist") because torchvision's C extension can't load against
our CPU torch wheel. The exception propagates past the probe, the
module fails to load, and the FSM step's outer except surfaces the
raw RuntimeError as 'err' in the trace instead of 'skipped'.

Catch any Exception during the probe and treat the dep as
unavailable. The specialist returns a clean 'skipped' entry, the
trace UI renders it gray (silent) instead of red (errored), and the
demo reads as honest engineering instead of broken plumbing.

DEMO-PLAYBOOK.md: top-level handoff doc with the 3 demo queries,
what each one shows, the trace skip messages to expect, and the
end-to-end smoke test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- DEMO-PLAYBOOK.md +92 -0
- app/context/terramind_nyc.py +12 -1
- app/context/terramind_synthesis.py +7 -0
- app/flood_layers/prithvi_live.py +8 -0

DEMO-PLAYBOOK.md (new file, @@ -0,0 +1,92 @@)

# Riprap Demo Playbook
**For:** AMD Developer Cloud Hackathon · May 4–10, 2026

**Live URL:** https://lablab-ai-amd-developer-hackathon-riprap-nyc.hf.space

## What was fixed for the demo

1. **Cornerstone (Hazard Reader) latency:** the DEP stormwater + Sandy 2012 join went from a 33-second cold load to <100 ms. Both layers are baked to compact GeoTIFFs (`data/baked/`, 7 MB total) and sampled with rasterio. The 33 s ReadTimeouts seen in batch testing are gone.

2. **Heavy register specialists no longer hang:** `step_nycha`, `step_doe_schools`, and `step_doh_hospitals` previously ran 20 polygon×polygon intersections per query (an 8+ minute hang on the HF Space CPU). They now read pre-computed exposure flags from `data/registers/*.json` (sub-millisecond). Hospitals have no pre-built register; they read the 30 KB GeoJSON directly and sample the new Cornerstone rasters for each hit.

3. **Live EO chain enabled:** `eo_chip_fetch` (multi-modal Sentinel-2 + Sentinel-1 from Microsoft Planetary Computer) and `prithvi_eo_live` (NYC-Pluvial flood segmentation on live imagery, with remote inference on the AMD MI300X) now fire on every query. This is the marquee live-data Stone for the demo.

4. **Misleading UI copy fixed:** three Stone specialists (TerraMind Buildings/LULC/Synthesis, Prithvi-NYC-Pluvial) previously attributed their silent skips to `RIPRAP_HEAVY_SPECIALISTS=0`. Heavy specialists are actually enabled in production, so the new copy states the real cause (no recent <30%-cloud Sentinel-2 chip, or inference unavailable).
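
The Cornerstone speed-up in item 1 works because sampling a baked raster at a point is a constant-time array lookup once the coordinate has been converted to a row and column. The sketch below shows that conversion for a north-up grid in pure Python. The `BakedGrid` class, its class codes, and the nodata value are hypothetical. The query point is assumed to already be in the raster's CRS. With rasterio, `src.index(x, y)` performs the same row/column conversion, and `src.sample([(x, y)])` yields the band values.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class BakedGrid:
    """Hypothetical in-memory view of one baked, north-up GeoTIFF band."""

    x0: float  # x coordinate of the grid's left edge
    y0: float  # y coordinate of the grid's top edge
    dx: float  # pixel width (positive)
    dy: float  # pixel height (positive); row index grows southward
    values: Sequence[Sequence[int]]  # values[row][col], e.g. scenario class codes
    nodata: int = 255  # hypothetical "no data" marker

    def sample(self, x: float, y: float) -> Optional[int]:
        """Return the value of the cell containing (x, y), or None if outside/nodata."""
        col = math.floor((x - self.x0) / self.dx)
        row = math.floor((self.y0 - y) / self.dy)
        if not (0 <= row < len(self.values) and 0 <= col < len(self.values[0])):
            return None
        value = self.values[row][col]
        return None if value == self.nodata else value


grid = BakedGrid(x0=0.0, y0=30.0, dx=10.0, dy=10.0,
                 values=[[0, 1, 2], [3, 4, 5], [6, 7, 255]])
print(grid.sample(15.0, 15.0))  # cell at row 1, col 1 -> prints 4
```

Because the grid is loaded once and each query only does two divisions and an index, per-query cost stays in the millisecond range regardless of how complex the original polygons were.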

## The 3 demo queries

### 1. **2508 Beach Channel Drive, Queens**: full single_address activation

**What it shows:** the data-dense address from FRIDAY-REPORT. The single_address intent triggers every specialist:
- **Cornerstone:** Sandy outside, all DEP scenarios outside (this is Bayswater, just inland of the Sandy zone; a useful counter-example)
- **Touchstone:** 2 FloodNet sensors (600 m radius), 64 NYC 311 flood complaints, NOAA station 8516945 live, NWS hourly METAR
- **Lodestone:** Granite TTM forecasts (peak 0.47 ft surge ~2 h ahead), TTM 311-forecast, NWS alerts
- **Keystone:** **7 MTA entrances** (6 in Sandy, 5 in DEP-2080), **2 NYCHA developments**, **5 schools** (4 in Sandy, 3 in DEP-2080), **1 hospital**, all with elevation/HAND from baked rasters
- **Live EO:** **`prithvi_eo_live`** fires with a real Sentinel-2 chip (≤30% cloud, ≤120 days old) and runs flood segmentation on the MI300X; **`eo_chip_fetch`** pulls a multi-modal S2L2A + S1RTC chip
- **Capstone:** Granite Embedding RAG (3 hits) + GLiNER typed extraction + Mellea-grounded Granite-4.1 8B reconciliation (4/4 requirements pass)
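
The Live EO freshness rule above (≤30% cloud, ≤120 days old) comes down to a filter followed by a most-recent pick over candidate scenes. The sketch below is a hypothetical version of that selection step, not the `eo_chip_fetch` code. Candidates are plain dicts standing in for STAC items, which normally carry cloud cover under `properties["eo:cloud_cover"]`. In practice, a pystac-client search against Planetary Computer can apply the cloud filter server-side, so only the recency pick remains on the client.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def pick_chip(candidates: List[dict], now: datetime,
              max_age_days: int = 120, max_cloud_pct: float = 30.0) -> Optional[dict]:
    """Return the newest candidate within the cloud and age limits, else None.

    Each candidate is a hypothetical dict such as
    {"id": "...", "datetime": <tz-aware datetime>, "cloud": 12.0}.
    """
    oldest_allowed = now - timedelta(days=max_age_days)
    eligible = [
        c for c in candidates
        if c["cloud"] <= max_cloud_pct and oldest_allowed <= c["datetime"] <= now
    ]
    # None means "no recent clear scene": the specialist should skip, not error.
    return max(eligible, key=lambda c: c["datetime"], default=None)


now = datetime(2026, 5, 5, tzinfo=timezone.utc)
scenes = [
    {"id": "a", "datetime": datetime(2026, 4, 20, tzinfo=timezone.utc), "cloud": 12.0},
    {"id": "b", "datetime": datetime(2026, 5, 1, tzinfo=timezone.utc), "cloud": 45.0},  # too cloudy
    {"id": "c", "datetime": datetime(2025, 12, 1, tzinfo=timezone.utc), "cloud": 5.0},  # too old
    {"id": "d", "datetime": datetime(2026, 3, 1, tzinfo=timezone.utc), "cloud": 20.0},
]
print(pick_chip(scenes, now)["id"])  # prints a
```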

**Talking points:** "This is what 'resilient infrastructure briefing' produces when every specialist fires. The Touchstone and Cornerstone disagree: Bayswater is just inland of Sandy 2012, but its subway entrances 200 m away are deep in the zone. Live data: the Prithvi specialist just pulled a Sentinel-2 image from this month and ran flood segmentation on the MI300X."

### 2. **Coney Island I Houses, Brooklyn**: neighborhood path, narrative briefing
**What it shows:** neighborhood intent routing. The planner reads "Houses" with no street number, resolves the query to an NTA polygon, and runs the neighborhood specialist set (sandy_nta, dep_*_nta, nyc311_nta). The result:
- Returns a Markdown briefing structured as Status / Empirical / Modeled / Policy
- Uses NTA-aggregated metrics ("X% of the neighborhood was inundated during Sandy")
- ~7-second total latency
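
The routing rule described here sends queries with a street number to single_address and queries that name an area without one to neighborhood. It can be approximated with a leading-house-number check. This is a hypothetical heuristic for illustration only; the playbook does not say how Riprap's planner actually decides. A real planner would also have to handle intersections and landmarks. The optional `-\d+` group covers hyphenated Queens house numbers such as `108-10`.

```python
import re

# Hypothetical heuristic: a leading house number (optionally hyphenated, as in
# Queens "108-10", optionally with a letter suffix such as "12A") followed by
# whitespace and a street name marks a single-address query.
_HOUSE_NUMBER = re.compile(r"^\s*\d+(?:-\d+)?[A-Za-z]?\s+\S")


def classify_intent(query: str) -> str:
    """Return "single_address" or "neighborhood" for a free-text location query."""
    return "single_address" if _HOUSE_NUMBER.match(query) else "neighborhood"


for q in ("2508 Beach Channel Drive, Queens",
          "Coney Island I Houses, Brooklyn",
          "80 Pioneer Street, Brooklyn"):
    print(f"{q!r} -> {classify_intent(q)}")
```

Ordinal street names such as "5th Avenue" do not match, because the regex requires whitespace right after the number (or its one-letter suffix).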

**Talking points:** "Same system, different intent. The planner picks the neighborhood intent for queries that name an area without a house number. The briefing is denser narrative; the underlying data is NTA-aggregated, which is the right unit for emergency-management framing."

### 3. **80 Pioneer Street, Brooklyn (Red Hook)**: single_address, full activation

**What it shows:** Red Hook is canonical Sandy turf. All Stones are populated:
- **Cornerstone:** Sandy inside, DEP-2080 outside, microtopo 0.83 m elevation (low-lying)
- **NYCHA:** Red Hook Houses (East/West), both inside Sandy
- **Schools:** PS 27, PS 30, both inside Sandy
- **Hospitals:** 3 nearby
- **MTA:** entrances inside Sandy
- **Live Sentinel-2 chip + Prithvi flood segmentation** runs

**Talking points:** "Red Hook is the canonical 'this is what we got wrong in 2012' address. The system surfaces a NYCHA development, a school, a hospital, and a subway entrance, all at risk, in the same query. That's the demo: one query, four asset classes, every Stone audited, plus a live Sentinel-2 chip."

## Reading the trace

The trace UI groups specialists by Stone. Each row shows a status (fired / silent / errored) and, for silent rows, a one-line skip reason. Silent isn't broken. It's the engineering-honest contract: when a specialist's preconditions aren't met, it stays silent rather than fabricating output.
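
The fired / silent / errored contract can be sketched as a small wrapper around each specialist call. A specialist signals unmet preconditions with a dedicated exception, and only unexpected exceptions count as errors. The `Skip` exception, the `TraceRow` type, and the `run_specialist` helper below are hypothetical names, not Riprap's FSM API. This commit's deps-probe fix gets the same effect for the EO specialists by turning import-time failures into a `skipped` outcome.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple


class Skip(Exception):
    """Hypothetical signal: the specialist's preconditions are not met."""


@dataclass
class TraceRow:
    specialist: str
    status: str                   # "fired", "silent", or "errored"
    reason: Optional[str] = None  # one-line skip reason or error summary


def run_specialist(name: str, fn: Callable[[], Any]) -> Tuple[TraceRow, Any]:
    """Run one specialist and classify its outcome for the trace UI."""
    try:
        result = fn()
    except Skip as s:
        # Honest silence: nothing is fabricated; the reason is surfaced instead.
        return TraceRow(name, "silent", str(s)), None
    except Exception as e:
        # Anything else is a genuine failure and should render as an error.
        return TraceRow(name, "errored", f"{type(e).__name__}: {e}"), None
    return TraceRow(name, "fired"), result


def nws_alerts() -> dict:
    raise Skip("no active flood-relevant alerts at this address")


row, _ = run_specialist("nws_alerts", nws_alerts)
print(row.status, "-", row.reason)
```

Keeping `Skip` separate from generic exceptions lets the UI gray out expected absences while still flagging real bugs in red.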

**Honest skips you'll see in the demo:**
- *"FloodNet sensor recurrence: sensor has < silent-floor historical events; forecast omitted"* (the sensor is too new to forecast)
- *"NPCC4 SLR projection: not yet wired into FSM"* (out of scope; listed for transparency)
- *"NWS public alerts: no active flood-relevant alerts at this address"* (accurate: no alert is active today)

**Specialists that may show as skipped/errored in the demo:**
- **TerraMind LULC / Buildings / Synthesis:** the droplet's MI300X has Prithvi loaded but not the TerraMind weights yet. These specialists return *"remote inference unreachable + local torchvision binary unavailable on this deployment"*, which is the honest outcome. Fixing it is out of scope from the Space side; it needs a droplet rebuild with the TerraMind models.
- **floodnet_forecast:** sensors with <5 historical events skip the forecast.

## Caveats to be ready for

- **NYCHA cards are binary, not pct-overlap:** the per-query view shows `inside_sandy_2012: bool` and `dep_*_class: int` instead of `pct_inside_sandy_2012` floats. Same source data, a less precise representation, but fast (milliseconds instead of 8 minutes). The `/api/register/nycha` city-wide register is unchanged.
- **Heavy specialists are enabled** but may silently skip if the Sentinel-2 chip fetch returns nothing recent for the query address. Prithvi-EO Live looks back 120 days for a <30%-cloud scene. Most NYC addresses have a recent hit; addresses at the very edge of NYC may not.
- **Inference is remote** on the AMD MI300X via vLLM at `165.245.141.218:8001`. If the droplet is down, the reconciler fails and the Capstone Stone won't render a paragraph; the specialists still fire and surface their data.
- **Bake re-runs:** `data/baked/*.tif` (7 MB) was generated once via `scripts/bake_cornerstone_rasters.py`. Re-bake when NYC DEP republishes the DEP scenarios (rare, roughly every 5 years).
- **Register rebuilds:** `data/registers/*.json` are regenerated by `scripts/build_*_register.py` when the underlying NYCHA / DOE / NYS DOH datasets refresh.
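
Reading pre-computed flags instead of intersecting polygons per query turns each lookup into a dictionary access on a file parsed once per process. The sketch below assumes a hypothetical register layout: a JSON list of objects with an `id`, an `inside_sandy_2012` boolean, and `dep_*_class` integers. The real files under `data/registers/` may be shaped differently. `functools.lru_cache` keeps the parsed index in memory, so only the first call pays the parse cost.

```python
import json
from functools import lru_cache
from typing import Optional


@lru_cache(maxsize=None)
def load_register(path: str) -> dict:
    """Parse a register file once and index its rows by id (hypothetical schema)."""
    with open(path, encoding="utf-8") as f:
        rows = json.load(f)
    return {row["id"]: row for row in rows}


def exposure_flags(path: str, asset_id: str) -> Optional[dict]:
    """Return the binary/class exposure flags for one asset, or None if absent."""
    row = load_register(path).get(asset_id)
    if row is None:
        return None
    flags = {"inside_sandy_2012": bool(row["inside_sandy_2012"])}
    # Copy every DEP scenario class field, e.g. a hypothetical "dep_extreme_2080_class".
    flags.update({k: int(v) for k, v in row.items()
                  if k.startswith("dep_") and k.endswith("_class")})
    return flags
```

After a register rebuild, a long-running process needs `load_register.cache_clear()` (or a restart) to pick up the new file.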

## End-to-end smoke test

To verify before showtime:

```
.venv/bin/python scripts/probe_addresses.py \
  --base https://lablab-ai-amd-developer-hackathon-riprap-nyc.hf.space \
  --addresses "2508 Beach Channel Drive, Queens|Coney Island I Houses, Brooklyn|80 Pioneer Street, Brooklyn" \
  --timeout 240
```

Expected: 3/3 PASS, each in 6–17 s after warm-up. If 2508 Beach Channel Drive takes >60 s, the post-restart pre-warm is still finishing; re-run.
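
The smoke test's contract is pipe-separated addresses, a per-query timeout, and PASS/FAIL per address. It can be outlined as below. This is a hypothetical sketch, not the contents of `scripts/probe_addresses.py`. The HTTP call is left as an injected `query` function because this document does not specify the Space's API route. A query counts as PASS only if it returns truthy within the timeout.

```python
import argparse
import time
from typing import Callable, List, Optional, Tuple


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    p = argparse.ArgumentParser(description="Probe demo addresses end to end.")
    p.add_argument("--base", required=True, help="base URL of the deployed Space")
    p.add_argument("--addresses", required=True, help="pipe-separated addresses")
    p.add_argument("--timeout", type=float, default=240.0, help="seconds per query")
    return p.parse_args(argv)


def split_addresses(raw: str) -> List[str]:
    """Split the pipe-separated --addresses value, dropping empty entries."""
    return [a.strip() for a in raw.split("|") if a.strip()]


def run_probe(addresses: List[str], query: Callable[[str, float], bool],
              timeout: float) -> List[Tuple[str, bool, float]]:
    """Run each query in turn and return (address, passed, elapsed_seconds)."""
    results = []
    for addr in addresses:
        start = time.monotonic()
        try:
            # query() is expected to enforce the timeout itself (e.g. via its HTTP client).
            ok = bool(query(addr, timeout))
        except Exception:
            ok = False  # network errors and timeouts count as FAIL
        elapsed = time.monotonic() - start
        results.append((addr, ok and elapsed <= timeout, elapsed))
    return results


# Usage with a stub in place of the real (undocumented, hypothetical) HTTP call.
args = parse_args([
    "--base", "https://lablab-ai-amd-developer-hackathon-riprap-nyc.hf.space",
    "--addresses", "2508 Beach Channel Drive, Queens|Coney Island I Houses, Brooklyn|80 Pioneer Street, Brooklyn",
    "--timeout", "240",
])
results = run_probe(split_addresses(args.addresses), lambda addr, t: True, args.timeout)
for addr, ok, elapsed in results:
    print(f"{'PASS' if ok else 'FAIL'}  {elapsed:5.1f}s  {addr}")
print(f"{sum(ok for _, ok, _ in results)}/{len(results)} PASS")
```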

## Final summary of changes shipped this cycle

| Change | Files | Effect |
|---|---|---|
| Cornerstone raster bake | `app/flood_layers/{dep_stormwater,sandy_inundation}.py`, `scripts/bake_cornerstone_rasters.py`, `data/baked/*.tif` | 33 s → <100 ms cold; <5 ms per query |
| Register refactor | `app/registers/{nycha,doe_schools,doh_hospitals}.py`, `app/registers/_loader.py` | 8+ min hang → <100 ms total |
| EO deps | `requirements.txt` (planetary-computer/pystac-client/rioxarray/xarray/einops) | Live Sentinel-2 + Prithvi remote inference |
| Deps gate split | `app/flood_layers/prithvi_live.py`, `app/context/terramind_synthesis.py`, `app/context/terramind_nyc.py` | Tier-1 chip fetch separated from Tier-2 local inference |
| UI honesty | `web/sveltekit/src/lib/data/stoneRegistry.ts`, `web/sveltekit/src/lib/client/registerAdapter.ts`, `web/sveltekit/src/routes/q/[queryId]/+page.svelte` | `RIPRAP_HEAVY_SPECIALISTS=0` copy removed; new NYCHA schema |
| HF env | `scripts/update_hf_env.sh` (`RIPRAP_NYCHA_REGISTERS=1`), set on the live Space | Heavy register specialists actually attached to the FSM |
| FSM consumers | `app/fsm.py`, `app/reconcile.py` | Match the new NYCHA schema |
| Warmup hygiene | `web/main.py` | Drop the 91 MB Sandy GeoJSON pre-load (no longer needed) |

```diff
--- a/app/context/terramind_nyc.py
+++ b/app/context/terramind_nyc.py
@@ -85,7 +85,13 @@ def _has_required_deps() -> tuple[bool, str | None]:
     """Probe the heavy-EO deps. Same shape as prithvi_live's check:
     a missing dep (terratorch / peft / safetensors / hf_hub) returns a
     clean `skipped: deps_unavailable` outcome instead of a noisy
-    ModuleNotFoundError in the trace."""
+    ModuleNotFoundError in the trace.
+
+    On the HF Space, terratorch's import chain itself can raise
+    RuntimeError("operator torchvision::nms does not exist") when the
+    torchvision binary extension can't load against our CPU torch
+    wheel. Treat that as 'unavailable' too; the local inference path
+    is dead-on-arrival there."""
     missing: list[str] = []
     for name in ("terratorch", "peft", "safetensors", "huggingface_hub",
                  "torch", "yaml"):
@@ -93,6 +99,11 @@ def _has_required_deps() -> tuple[bool, str | None]:
             __import__(name)
         except ImportError:
             missing.append(name)
+        except Exception as e:
+            # torchvision::nms RuntimeError, libcuda load failure, etc.
+            log.warning("terramind_nyc: %s import raised %s; treating as "
+                        "unavailable", name, type(e).__name__)
+            missing.append(f"{name} ({type(e).__name__})")
     if missing:
         return False, ", ".join(missing)
     return True, None
```

```diff
--- a/app/context/terramind_synthesis.py
+++ b/app/context/terramind_synthesis.py
@@ -91,6 +91,13 @@ def _has_required_deps() -> tuple[bool, str | None]:
             missing.append(name)
         except ImportError:
             log.debug("terramind: import race on %s, will retry on demand", name)
+        except Exception as e:
+            # torchvision::nms RuntimeError on HF Space; local inference
+            # is unavailable, so treat as missing and fetch() returns a clean
+            # skip rather than crashing in _ensure_model.
+            log.warning("terramind: %s import raised %s; treating as "
+                        "unavailable", name, type(e).__name__)
+            missing.append(f"{name} ({type(e).__name__})")
     return (not missing, ", ".join(missing) if missing else None)
 
 
```

```diff
--- a/app/flood_layers/prithvi_live.py
+++ b/app/flood_layers/prithvi_live.py
@@ -98,11 +98,19 @@ def _has_required_deps() -> tuple[bool, str | None]:
 
 
 def _has_module(name: str) -> bool:
+    """True if `name` imports cleanly. ImportError → not installed.
+    Other exceptions (e.g. torchvision::nms RuntimeError on the HF
+    Space) → treat as unavailable too; we don't want a clean-skip
+    intent to crash the FSM at deps-probe time."""
     try:
         __import__(name)
         return True
     except ImportError:
         return False
+    except Exception as e:
+        log.warning("prithvi_live: %s import raised %s; treating as "
+                    "unavailable", name, type(e).__name__)
+        return False
 
 
 _DEPS_OK, _DEPS_MISSING = _has_required_deps()
```
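
All three diffs apply the same rule. `ImportError` means the package is absent. Any other exception raised while importing means the package is present but unusable. Both become a clean skip. The standalone sketch below reproduces that rule outside Riprap. `probe_deps` is a hypothetical helper, and `no_such_module_demo` is a made-up name chosen so the import fails. Catching bare `Exception` can also hide genuine bugs inside a dependency, which is why the diffs log the exception type at warning level instead of swallowing it silently.

```python
import logging
from typing import Iterable, Optional, Tuple

log = logging.getLogger("deps_probe")


def probe_deps(names: Iterable[str]) -> Tuple[bool, Optional[str]]:
    """Try importing each module; return (all_ok, description_of_failures)."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            # Not installed (ModuleNotFoundError is a subclass of ImportError).
            missing.append(name)
        except Exception as e:
            # Installed but broken at import time, e.g. a compiled extension
            # that fails to register its operators. Log so the cause stays visible.
            log.warning("%s import raised %s; treating as unavailable",
                        name, type(e).__name__)
            missing.append(f"{name} ({type(e).__name__})")
    return (not missing, ", ".join(missing) or None)


print(probe_deps(["json", "no_such_module_demo"]))  # prints (False, 'no_such_module_demo')
```

To reproduce the HF Space failure locally, drop a throwaway module whose body raises `RuntimeError("operator torchvision::nms does not exist")` onto `sys.path`. `probe_deps` then reports it as `"<name> (RuntimeError)"` instead of letting the exception escape.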