File size: 2,325 Bytes
48be8c8 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 | # 22 β Cornerstone optimization
**Goal:** drop the 33s DEP join and 5β10s Sandy join on the HF Space CPU
to <1s without changing Stone semantics.
## Layer triage: live vs bakeable
The Cornerstone is a **Hazard Reader** β it reads what the ground
*already remembers*. Every Cornerstone source is by definition
historical or modeled, so the per-query cost of recomputing a
spatial join is unwarranted. Live recency belongs to the **Touchstone**
(FloodNet) and **Lodestone** (forecasts), not here.
| Source | Nature | Updates | Verdict |
|---|---|---|---|
| `dep_stormwater` | Modeled scenarios (2050/2080 SLR + design storm) | NYC DEP republishes every few years | **bake** to GeoTIFF |
| `sandy_inundation` | Empirical 2012 extent | Will not change | **bake** to GeoTIFF |
| `ida_hwm` | USGS HWMs (point set, ~few hundred) | Will not change | already O(n) haversine β leave alone |
| `prithvi_water` | Pre-baked Ida polygons | Will not change | already baked |
| `microtopo` (DEM/HAND/TWI) | LiDAR-derived rasters | Re-baked on terrain changes | already raster β already fast |
**Live (kept live for demo recency):**
- Geocoding (Geosearch + Nominatim fallback)
- FloodNet sensor pull (Touchstone)
- TTM battery surge / pluvial forecast (Lodestone)
- NYCHA / DOE / MTA registers (semi-static, prebuilt at boot β already fast)
So this experiment only touches the two slow Cornerstone specialists.
## Approaches benchmarked
1. **baseline** β current `gpd.sjoin` (full layer)
2. **strtree** β pre-warm `gdf.sindex`, query with single-point `intersects`
3. **bbox-prefilter** β clip layer to bbox(point, 100ft) then sjoin
4. **raster** β bake polygons β uint8 GeoTIFF in EPSG:2263; `rasterio.sample()` per point
For DEP, the raster encodes max `Flooding_Category` per pixel
(0=outside, 1/2/3 = depth class). Sandy is a 1-bit raster.
## Files
- `bench.py` β runs all four paths on canonical addresses
- `bake_rasters.py` β one-time bake of DEP + Sandy to GeoTIFF
- `RESULTS.md` β written after `bench.py` completes
## Canonical addresses
Per CLAUDE.md / probe set:
1. 80 Pioneer Street, Brooklyn β (40.6790, -74.0050)
2. 2508 Beach Channel Drive, Queens β (40.5867, -73.8062)
3. Coney Island I Houses, Brooklyn β (40.5772, -73.9870)
4. Carleton Manor, Queens β (40.6033, -73.7626)
|