seriffic committed
Commit 76f3ae6 · 1 Parent(s): bc00192

Reconciler skeleton + composite score + energy stub

The capstone of v0.1's nine-specialist pipeline:

reconcile.py — the prompt scaffold for Granite 4.1:3b. Specialist
              outputs become role='document <doc_id>' messages so
              Ollama's chat template bundles them into a <documents>
              block; the system prompt enforces a short cited
              paragraph (4-7 sentences) and strict citation
              discipline. Includes a numeric-claim guardrail that
              drops sentences whose numbers don't appear verbatim
              in the doc messages; see the sketch below.
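
For illustration, one such document-role message is just a dict (hypothetical
floodnet values; the real bodies are assembled in build_documents):

    # Illustrative shape of one Granite document-role message.
    doc_msg = {
        "role": "document floodnet",
        "content": "Sensors within 400 m: 2.\nPeak event: 312 mm depth.",
    }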

score.py — composite weighted exposure score across the layers,
           with the empirical layers weighted heaviest
           (Sandy/Ida-observed evidence always pulls the score up).

energy.py — per-query Wh estimate for the local-Granite path
            vs. a cloud LLM baseline. An honesty check for the
            'no vendor LLM' claim.

This unblocks the FSM (next slot) — every specialist now has a
consumer that turns its dataclass back into prose.

Files changed (3)
  1. app/energy.py +56 -0
  2. app/reconcile.py +338 -0
  3. app/score.py +47 -0
app/energy.py ADDED
@@ -0,0 +1,56 @@
+ """Per-query energy footprint estimate.
+
+ Conservative, defensible numbers — no overclaim. We measure local
+ inference time and apply a published-range package-power figure for
+ Apple-Silicon LLM inference; we compare to the most recent published
+ estimate of frontier-cloud per-query energy (Epoch AI, 2025).
+
+ This is not a benchmark — it's a transparent rule-of-thumb that the
+ user can audit. The system prompt and the UI both surface the
+ underlying numbers and the citation.
+ """
+ from __future__ import annotations
+
+ # Local: Granite 4.1:3b on Apple M-series (M3/M4 Pro range).
+ # Sustained package power during ~5 s of LLM inference, q4_K_M quant.
+ # Source: ml.energy + community measurements; conservative midpoint.
+ LOCAL_PACKAGE_POWER_W = 20.0
+
+ # Frontier cloud per-query inference energy.
+ # Source: Epoch AI, "How much energy does ChatGPT use?" (2025).
+ # https://epoch.ai/gradient-updates/how-much-energy-does-chatgpt-use
+ # This is a typical-query estimate for GPT-4o-class inference; long-context
+ # queries scale roughly linearly with token count.
+ CLOUD_PER_QUERY_WH = 0.30
+
+ # Citation strings used in the UI.
+ LOCAL_SOURCE = ("ml.energy / community measurements; ~20 W package power "
+                 "during Granite 4.1:3b q4_K_M inference on Apple M-series.")
+ CLOUD_SOURCE = ('Epoch AI (2025), "How much energy does ChatGPT use?", '
+                 "estimating ~0.3 Wh per typical GPT-4o query.")
+
+
+ def estimate(reconcile_seconds: float, total_seconds: float | None = None) -> dict:
+     """Return a per-query energy estimate.
+
+     Args:
+         reconcile_seconds: wallclock of the Granite reconcile step (the
+             only step that meaningfully draws CPU/GPU power).
+         total_seconds: optional full-FSM wallclock for context.
+     """
+     local_wh = LOCAL_PACKAGE_POWER_W * reconcile_seconds / 3600.0
+     return {
+         "local_wh": round(local_wh, 4),
+         "local_mwh": round(local_wh * 1000, 1),
+         "cloud_wh": CLOUD_PER_QUERY_WH,
+         "cloud_mwh": round(CLOUD_PER_QUERY_WH * 1000, 1),
+         "ratio_cloud_over_local": round(CLOUD_PER_QUERY_WH / local_wh, 1) if local_wh > 0 else None,
+         "method": {
+             "local": f"{LOCAL_PACKAGE_POWER_W} W × {reconcile_seconds:.2f} s ÷ 3600",
+             "local_source": LOCAL_SOURCE,
+             "cloud": f"{CLOUD_PER_QUERY_WH} Wh per query (published estimate)",
+             "cloud_source": CLOUD_SOURCE,
+         },
+         "reconcile_seconds": round(reconcile_seconds, 2),
+         "total_seconds": round(total_seconds, 2) if total_seconds is not None else None,
+     }
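
A minimal usage sketch of the arithmetic (hypothetical timings; assumes the
app package is importable):

    # Hypothetical timings: a 5 s reconcile step at 20 W package power.
    from app.energy import estimate

    est = estimate(reconcile_seconds=5.0, total_seconds=12.3)
    print(est["local_wh"])                # 0.0278  (Wh; 20 W x 5 s / 3600)
    print(est["cloud_wh"])                # 0.3     (published estimate)
    print(est["ratio_cloud_over_local"])  # 10.8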
app/reconcile.py ADDED
@@ -0,0 +1,338 @@
+ """Document-grounded reconciliation via Granite 4.1 (local Ollama).
+
+ Uses Granite 4.1's native grounded-generation interface: each specialist
+ that produced data becomes a separate message with role="document <doc_id>".
+ Ollama's chat template lifts those into the model's `<documents>` system
+ block and prepends IBM's official grounded-generation system prompt.
+
+ Specialists that didn't fire emit nothing — silence over confabulation.
+ The model is post-trained to refuse to ground on absent documents.
+
+ A server-side post-check verifies every numeric token in the output appears
+ verbatim in the source documents. Sentences with ungrounded numbers are
+ dropped from the rendered paragraph (still recorded in the trace as
+ unverified for audit). This is the cheapest reliable guardrail against
+ the worst hallucination class — fabricated stats — and it's deterministic.
+ """
+ from __future__ import annotations
+
+ import logging
+ import os
+ import re
+ from typing import Any
+
+ import ollama
+
+ log = logging.getLogger("riprap.reconcile")
+
+ OLLAMA_MODEL = os.environ.get("HELIOS_NYC_OLLAMA_MODEL", "granite4.1:3b")
+
+ # Granite auto-prepends its own grounded-generation system prompt when the
+ # message list contains "document" roles. This adds *additional* rules.
+ EXTRA_SYSTEM_PROMPT = """You are Riprap's grounded reconciler. Produce a SHORT factual paragraph (4-7 sentences) summarising flood risk at a NYC address. Use ONLY information from the documents provided.
+
+ Citation format — STRICT:
+ - After every factual or numerical claim, cite the originating document by its doc_id in square brackets, e.g. [sandy] or [floodnet].
+ - Use square brackets [ and ]. Never parentheses, never the word "source".
+ - A claim drawn from multiple documents may carry multiple tags, e.g. [sandy][floodnet].
+
+ Hard rules — non-negotiable:
+ - Copy numerical values verbatim from documents. Do not round.
+ - Do NOT name a specific weather event (Hurricane Sandy, Ida, Henri, Ophelia, etc.) unless THIS document set explicitly mentions that event applies to THIS address. The fact that a RAG passage discusses an event in passing is NOT licence to apply it to the address. If you mention an event, you must cite the specific document supporting that the event affected this address.
+ - Do NOT invent dates, sensor IDs, hazard categories, or street/neighborhood names beyond what the documents contain.
+ - For RAG documents whose id starts with `rag_`: paraphrase the retrieved passage at the policy / agency level — talk about what the agency report SAYS about flood risk in general or for this asset class — do not assert findings the report did not make about this specific address. Cite with the doc_id.
+ - Stay neutral. No editorialising. No future speculation.
+ - If no documents are present, output exactly: No grounded data available for this address.
+
+ Microtopo interpretation hint:
+ - A LOW percentile (e.g. 5%) means the address is at a topographic LOW POINT in its surroundings — water tends to pool there. A HIGH percentile (e.g. 80%) means the address sits on relatively HIGH ground. Get this direction right or omit the percentile.
+ """
+
+
+ # ---- Hallucination guardrail: numeric grounding post-check -----------------
+
+ _NUM_RE = re.compile(r"-?\d[\d,]*(?:\.\d+)?")
+ _SENTENCE_END_RE = re.compile(r"(?<=[.!?])\s+(?=[A-Z\[])")
+ # Numbers that are too generic to be useful as grounding evidence; ignore
+ # them when matching numeric tokens.
+ _TRIVIAL_NUMS = {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "100"}
+
+
+ def _normalize_num(s: str) -> set[str]:
+     """A numeric value can appear in a document with or without commas, with
+     or without trailing zeros. Return a small set of plausible string
+     representations to substring-search for."""
+     forms = {s}
+     no_comma = s.replace(",", "")
+     forms.add(no_comma)
+     if "." in no_comma:
+         forms.add(no_comma.rstrip("0").rstrip("."))
+     return {f for f in forms if f}
+
+
+ def _docs_corpus(doc_msgs: list[dict]) -> str:
+     """Join all document message contents into one big haystack we
+     substring-search for numeric claims. Role strings (which carry the
+     doc_id) are deliberately left out so a fabricated value can't match
+     a substring of a doc_id."""
+     return "\n".join(m.get("content", "") for m in doc_msgs)
+
+
+ def verify_paragraph(paragraph: str, doc_msgs: list[dict]) -> tuple[str, list[dict]]:
+     """Drop sentences whose numeric tokens don't appear in any source doc.
+
+     Returns (clean_paragraph, dropped_sentences_with_reason). Sentences are
+     split on sentence-end punctuation followed by whitespace + a capital
+     letter or '['. Bracketed citation tags `[doc_id]` are stripped from a
+     sentence before number extraction (they are not claims), and document
+     roles are excluded from the haystack, so we don't accidentally accept
+     fabricated values that happen to be substrings of doc_ids.
+     """
+     haystack = _docs_corpus(doc_msgs)
+
+     sentences = _SENTENCE_END_RE.split(paragraph.strip())
+     kept: list[str] = []
+     dropped: list[dict] = []
+
+     for sent in sentences:
+         sent_stripped = sent.strip()
+         if not sent_stripped:
+             continue
+         # remove citation tags before extracting numbers (they're not claims)
+         sent_no_cites = re.sub(r"\[[a-z0-9_]+\]", "", sent_stripped, flags=re.I)
+         nums = _NUM_RE.findall(sent_no_cites)
+         ungrounded = []
+         for n in nums:
+             if n in _TRIVIAL_NUMS:
+                 continue
+             forms = _normalize_num(n)
+             if not any(f in haystack for f in forms):
+                 ungrounded.append(n)
+
+         if ungrounded:
+             dropped.append({"sentence": sent_stripped, "ungrounded_numbers": ungrounded})
+             log.warning("dropped ungrounded sentence: %r (nums: %s)", sent_stripped, ungrounded)
+             continue
+         kept.append(sent_stripped)
+
+     cleaned = " ".join(kept).strip()
+     if not cleaned:
+         cleaned = "Could not produce a verifiable summary; see the data panels."
+     return cleaned, dropped
+
+
+ def _doc_message(doc_id: str, body_lines: list[str]) -> dict:
+     """One Granite-native document message. The doc_id rides on the role
+     suffix; Ollama's template uses it as the document title and lifts the
+     pair into the <documents> block."""
+     return {"role": f"document {doc_id}", "content": "\n".join(body_lines)}
+
+
+ def build_documents(state: dict[str, Any]) -> list[dict]:
+     """Build Granite-native document-role messages, gated so absent
+     specialists emit no document at all."""
+     docs: list[dict] = []
+
+     geo = state.get("geocode")
+     if geo:
+         body = [
+             "Source: NYC DCP Geosearch (geosearch.planninglabs.nyc).",
+             f"Resolved address: {geo['address']}.",
+             f"Borough: {geo.get('borough') or 'unknown'}.",
+             # NYC longitudes are negative (west); print the magnitude with W.
+             f"Coordinates: {geo['lat']:.5f} N, {abs(geo['lon']):.5f} W.",
+         ]
+         if geo.get("bbl"):
+             body.append(f"BBL (tax-lot id): {geo['bbl']}.")
+         docs.append(_doc_message("geocode", body))
+
+     # Gate: only emit the Sandy doc when the address is actually inside the
+     # 2012 extent. Granite has a strong training prior associating NYC + flood
+     # + Brooklyn with Sandy and will misread "outside" as "inside" if given
+     # the chance — silence-over-confabulation rules.
+     if state.get("sandy") is True:
+         body = [
+             "Source: NYC Sandy Inundation Zone (NYC OpenData 5xsi-dfpx, "
+             "empirical extent of areas flooded by Hurricane Sandy in 2012).",
+             "FACT: The address is LOCATED WITHIN this empirical 2012 inundation extent.",
+             "INTERPRETATION: Hurricane Sandy did flood this address (or this immediate parcel) on October 29-30, 2012. This is a historical fact, not a model prediction.",
+             "Do not state the opposite. The address is inside the Sandy inundation zone.",
+         ]
+         docs.append(_doc_message("sandy", body))
+
+     dep = state.get("dep")
+     if dep:
+         for scen, info in dep.items():
+             if info.get("depth_class", 0) > 0:
+                 body = [
+                     f"Source: {info['citation']}.",
+                     "Address inside scenario footprint: yes.",
+                     f"Modeled depth class: {info['depth_label']}.",
+                 ]
+                 docs.append(_doc_message(scen, body))
+
+     fn = state.get("floodnet")
+     if fn and fn.get("n_sensors", 0) > 0:
+         body = [
+             "Source: FloodNet NYC ultrasonic depth sensor network (api.floodnet.nyc).",
+             f"Sensors within {fn['radius_m']} m: {fn['n_sensors']}.",
+             f"Sensors with labeled flood events in last 3 years: {fn['n_sensors_with_events']}.",
+             f"Total flood events at those sensors: {fn['n_flood_events_3y']}.",
+         ]
+         peak = fn.get("peak_event")
+         if peak and peak.get("max_depth_mm") is not None:
+             ts = (peak.get("start_time") or "")[:10]
+             body.append(
+                 f"Peak event: {peak['max_depth_mm']} mm depth at sensor "
+                 f"{peak['deployment_id']} starting {ts}."
+             )
+         docs.append(_doc_message("floodnet", body))
+
+     pw = state.get("prithvi_water")
+     if pw and pw.get("nearest_distance_m") is not None:
+         body = [
+             "Source: Prithvi-EO 2.0 (300M params, NASA/IBM, Apache-2.0). "
+             "Sen1Floods11 fine-tune for water/flood semantic segmentation, "
+             "run via TerraTorch on a real Hurricane Ida pre/post HLS Sentinel-2 "
+             f"pair: {pw['scene_id']} (dates: {pw['scene_date']}).",
+             "INTERPRETATION: the polygons are pixels classified as water in the "
+             "post-event scene (2021-09-02, ~12 h after Ida peak rainfall) but NOT "
+             "in the pre-event reference (2021-08-25). They are candidate "
+             "Ida-attributable surface inundation.",
+             f"Address sits inside an Ida-attributable inundation polygon: "
+             f"{'YES' if pw['inside_water_polygon'] else 'no'}.",
+             f"Distance to nearest Ida-attributable polygon: {pw['nearest_distance_m']} m.",
+             f"Distinct Ida-attributable polygons within 500 m: "
+             f"{pw['n_polygons_within_500m']}.",
+             "Honest scope: subway entrances and basement apartments — the dominant "
+             "Ida damage mode in NYC — are not visible to optical satellites. By the "
+             "Sep 2 16:02 UTC pass much pluvial street water had drained. The signal "
+             "primarily captures marsh/parkland ponding, riverside spillover, and "
+             "low-lying inundation that survived ~12 hours.",
+         ]
+         docs.append(_doc_message("prithvi_water", body))
+
+     ida = state.get("ida_hwm")
+     if ida and (ida.get("n_within_radius") or 0) > 0:
+         body = [
+             "Source: USGS STN Hurricane Ida 2021 high-water marks (Event 312, NY State).",
+             f"USGS HWMs within {ida['radius_m']} m: {ida['n_within_radius']}.",
+         ]
+         if ida.get("max_height_above_gnd_ft") is not None:
+             body.append(f"Max water height above ground: {ida['max_height_above_gnd_ft']} ft.")
+         if ida.get("max_elev_ft") is not None:
+             body.append(f"Max HWM elevation: {ida['max_elev_ft']} ft.")
+         if ida.get("nearest_dist_m") is not None:
+             body.append(f"Nearest HWM site: {ida['nearest_site']} ({ida['nearest_dist_m']} m away).")
+         docs.append(_doc_message("ida_hwm", body))
+
+     mt = state.get("microtopo")
+     if mt:
+         # Compute a categorical topographic position so Granite can't flip
+         # the directional reading of the percentile.
+         p200 = mt["rel_elev_pct_200m"]
+         if p200 < 25:
+             position = ("topographic LOW POINT — surface runoff in the "
+                         "200 m neighbourhood routes toward this location")
+         elif p200 > 75:
+             position = ("RELATIVELY HIGH GROUND — most of the 200 m "
+                         "neighbourhood is at lower elevation than this address")
+         else:
+             position = "MID-SLOPE — neither a clear low point nor high ground"
+         body = [
+             "Source: USGS 3DEP 30 m DEM (LiDAR-derived) via py3dep, with TWI and HAND derived using whitebox-workflows hydrology toolkit.",
+             f"Point elevation at this address: {mt['point_elev_m']} m above sea level.",
+             f"Topographic position relative to surroundings: {position}.",
+             f"Fraction of cells within 200 m radius that are LOWER in elevation than this address: {mt['rel_elev_pct_200m']}%.",
+             f"Fraction of cells within 750 m radius that are LOWER in elevation than this address: {mt['rel_elev_pct_750m']}%.",
+             f"Basin relief (max elevation in 750 m AOI minus address elevation): {mt['basin_relief_m']} m.",
+         ]
+         if mt.get("hand_m") is not None:
+             hand_v = mt["hand_m"]
+             hand_interp = (
+                 "very low (sub-meter) — the address sits at or near drainage level"
+                 if hand_v < 1.0 else
+                 "low (1-3 m) — the address is close to the local drainage line"
+                 if hand_v < 3.0 else
+                 "moderate (3-8 m) — typical urban-block elevation above drainage"
+                 if hand_v < 8.0 else
+                 "high (>8 m) — the address sits well above the local drainage network"
+             )
+             body.append(
+                 f"Height Above Nearest Drainage (HAND): {hand_v} m. "
+                 f"Interpretation: {hand_interp}. HAND is the standard hydrology "
+                 f"index for vertical distance from a cell to the nearest channel; "
+                 f"used by USGS, USACE, and InfoWorks ICM."
+             )
+         if mt.get("twi") is not None:
+             twi_v = mt["twi"]
+             twi_interp = (
+                 "low — the cell sheds water; not saturation-prone"
+                 if twi_v < 6 else
+                 "moderate"
+                 if twi_v < 10 else
+                 "high — the cell tends to accumulate water"
+                 if twi_v < 14 else
+                 "very high — saturation-prone terrain"
+             )
+             body.append(
+                 f"Topographic Wetness Index (TWI): {twi_v}. "
+                 f"Interpretation: {twi_interp}. TWI = ln(specific catchment area / tan slope) "
+                 f"is the TOPMODEL framework's saturation propensity metric."
+             )
+         docs.append(_doc_message("microtopo", body))
+
+     rag_hits = state.get("rag") or []
+     for h in rag_hits:
+         body = [
+             f"Source: {h['citation']}, page {h['page']}.",
+             f"Retrieved passage (verbatim): {h['text']}",
+         ]
+         docs.append(_doc_message(h["doc_id"], body))
+
+     nyc311 = state.get("nyc311")
+     if nyc311 and nyc311.get("n", 0) > 0:
+         body = [
+             "Source: NYC 311 service requests (Socrata erm2-nwe9, 2010-present).",
+             f"311 flood-related complaints within {nyc311['radius_m']} m, last {nyc311['years']} years: {nyc311['n']}.",
+         ]
+         if nyc311.get("by_descriptor"):
+             top = "; ".join(f"{k}: {v}" for k, v in nyc311["by_descriptor"].items())
+             body.append(f"Top descriptors and counts: {top}.")
+         if nyc311.get("by_year"):
+             yrs = ", ".join(f"{y}: {n}" for y, n in nyc311["by_year"].items())
+             body.append(f"Per-year counts: {yrs}.")
+         docs.append(_doc_message("nyc311", body))
+
+     return docs
+
+
+ def reconcile(state: dict[str, Any], model: str = OLLAMA_MODEL,
+               return_audit: bool = False) -> str | tuple[str, dict]:
+     """Run Granite reconciliation, then drop sentences with ungrounded numbers.
+
+     If return_audit=True, returns (paragraph, audit_dict) where audit_dict
+     has 'raw' (Granite's original output) and 'dropped' (list of dropped
+     sentences with their ungrounded numeric tokens).
+     """
+     doc_msgs = build_documents(state)
+     if not doc_msgs:
+         msg = "No grounded data available for this address."
+         return (msg, {"raw": msg, "dropped": []}) if return_audit else msg
+
+     messages = (
+         doc_msgs
+         + [
+             {"role": "system", "content": EXTRA_SYSTEM_PROMPT},
+             {"role": "user", "content": "Write the cited paragraph now."},
+         ]
+     )
+     resp = ollama.chat(
+         model=model,
+         messages=messages,
+         options={"temperature": 0, "num_ctx": 8192},
+     )
+     raw = resp["message"]["content"].strip()
+     cleaned, dropped = verify_paragraph(raw, doc_msgs)
+
+     if return_audit:
+         return cleaned, {"raw": raw, "dropped": dropped}
+     return cleaned
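
To see the numeric guardrail in isolation, here is a minimal sketch with
fabricated documents and one grounded plus one ungrounded number (no Ollama
call is made, though importing the module assumes the ollama client package
is installed):

    from app.reconcile import verify_paragraph

    # Fabricated doc: "312" is grounded; "950" appears nowhere.
    docs = [{"role": "document floodnet",
             "content": "Sensors within 400 m: 2.\nPeak event: 312 mm depth."}]
    para = ("The peak event reached 312 mm [floodnet]. "
            "Another event reached 950 mm [floodnet].")
    clean, dropped = verify_paragraph(para, docs)
    print(clean)    # keeps only the 312 mm sentence
    print(dropped)  # [{'sentence': 'Another event reached 950 mm [floodnet].',
                    #   'ungrounded_numbers': ['950']}]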
app/score.py ADDED
@@ -0,0 +1,47 @@
+ """Transparent exposure scoring rubric. Published, not a black box.
+
+ Each signal contributes a small integer weight; the sum maps to tier 1..4,
+ with tier 0 meaning no signals fired.
+ """
+ from __future__ import annotations
+
+ import pandas as pd
+
+ WEIGHTS = {
+     "sandy": 3,                 # empirical Sandy 2012 inundation
+     "dep_extreme_2080": 2,      # pluvial scenario, 3.66 in/hr + 2080 SLR
+     "dep_moderate_2050": 2,     # pluvial scenario, 2.13 in/hr + 2050 SLR
+     "dep_moderate_current": 1,  # pluvial scenario, 2.13 in/hr current
+     "complaints_3plus": 1,      # >=3 flood-related 311s within 200 m, last 5 years
+     "floodnet_trigger": 1,      # FloodNet sensor within 400 m with >=1 trigger event
+     "policy_named": 1,          # named in HMP/NPCC4/agency plan paragraph (RAG hit)
+ }
+
+
+ def tier(score: int) -> int:
+     """Map a composite score to an exposure tier (1 = highest exposure)."""
+     if score >= 6:
+         return 1
+     if score >= 4:
+         return 2
+     if score >= 2:
+         return 3
+     if score >= 1:
+         return 4
+     return 0
+
+
+ def score_row(signals: dict) -> tuple[int, int]:
+     """Score a single address from a dict of boolean signals."""
+     s = 0
+     for k, w in WEIGHTS.items():
+         if signals.get(k):
+             s += w
+     return s, tier(s)
+
+
+ def score_frame(df: pd.DataFrame) -> pd.DataFrame:
+     """Vectorised scoring over a DataFrame with one boolean column per signal."""
+     out = df.copy()
+     out["score"] = 0
+     for k, w in WEIGHTS.items():
+         if k in out.columns:
+             # fillna(False) first: NaN would otherwise cast to True via bool().
+             out["score"] += out[k].fillna(False).astype(bool).astype(int) * w
+     out["tier"] = out["score"].map(tier)
+     return out
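
A worked example of the rubric (hypothetical signals): empirical Sandy (3)
plus the extreme 2080 scenario (2) plus a FloodNet trigger (1) sums to 6,
which crosses the tier-1 threshold:

    import pandas as pd

    from app.score import score_frame, score_row

    # Single address: 3 + 2 + 1 = 6 -> tier 1.
    print(score_row({"sandy": True, "dep_extreme_2080": True,
                     "floodnet_trigger": True}))        # (6, 1)

    # Batch: NaN cells (signal absent for a row) count as False.
    df = pd.DataFrame([
        {"sandy": True, "dep_moderate_current": True},  # 3 + 1 = 4 -> tier 2
        {"complaints_3plus": True},                     # 1 -> tier 4
    ])
    print(score_frame(df)[["score", "tier"]])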