mekosotto Claude Opus 4.7 (1M context) committed on
Commit 3c2d45f · 1 Parent(s): fc4e33b

docs: Day-7 close-out — AGENTS §10 drift + §11 LLM explainer + README recipe

- AGENTS §10 documents the per-worker deque, train-stats stash, and
z-score formula. §11 documents the explainer's two-path contract,
env knobs (OPENROUTER_API_KEY, NEUROBRIDGE_DISABLE_LLM=1), and the
/explain/bbb endpoint shape.
- README adds Day 7 to the status table (175 tests green), pointers
to the Day-7 spec + plan + new surfaces, and a Demo Recipe section
with curl invocations for both endpoints (template-only and LLM).
- DoD-1 through DoD-5 all green: pytest 175, UserWarning gate clean,
Streamlit boot 200, predict body shape, explain template path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Files changed (2)
  1. AGENTS.md +31 -0
  2. README.md +49 -0
AGENTS.md CHANGED
@@ -200,3 +200,34 @@ the core contract:
200
  data + site-gap KPIs (Pre, Post, Reduction factor). The UI renders
201
  a faceted altair density plot — visual proof that ComBat removes
202
  site-driven domain shift.
203
+
204
+ ## 10. Drift Surface (Day 7)
205
+
206
+ Each predict route maintains a per-worker rolling window of recent
207
+ prediction confidences (`collections.deque(maxlen=100)`). Train-time
208
+ median + std are stashed on `model._neurobridge_train_stats` (joblib
209
+ roundtrip-safe). The drift z-score is `(rolling_median - train_median) /
210
+ max(train_std, 1e-9)`, computed only when the buffer holds ≥10 samples
211
+ AND the model has the train-stats attribute. The `/predict/bbb`
212
+ response carries `drift_z: float | None` and `rolling_n: int`. The UI
213
+ renders a one-line caption with a magnitude tag (in-band, mild,
214
+ significant). A worker restart clears the deque; this is acceptable for
215
+ a demo and removes the audit-trail concern.
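The drift rule described in §10 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the repository's code: the function name `drift_z`, the constant names, and the `{"median", "std"}` dictionary layout for the train stats are hypothetical; only the window size, the 10-sample threshold, and the formula come from the text above.

```python
from collections import deque
from statistics import median
from typing import Optional

WINDOW = 100      # matches collections.deque(maxlen=100) in the text
MIN_SAMPLES = 10  # z-score is only computed with at least 10 samples


def drift_z(window: deque, train_stats: Optional[dict]) -> Optional[float]:
    """Return the drift z-score, or None while warming up or without train stats.

    `train_stats` is assumed (hypothetically) to look like
    {"median": float, "std": float}.
    """
    if train_stats is None or len(window) < MIN_SAMPLES:
        return None
    rolling_median = median(window)
    # The 1e-9 floor guards against division by zero when train_std is 0.
    return (rolling_median - train_stats["median"]) / max(train_stats["std"], 1e-9)


# Per-worker buffer: old confidences fall off automatically once 100 are held.
confidences = deque(maxlen=WINDOW)
stats = {"median": 0.80, "std": 0.10}  # illustrative values only

for c in [0.9] * 9:
    confidences.append(c)
warming_up = drift_z(confidences, stats)  # None: only 9 samples so far

confidences.append(0.9)
z = drift_z(confidences, stats)  # roughly 1.0: median sits one train std above
```

Returning `None` rather than `0.0` during warm-up keeps "not enough data" distinguishable from "no drift", which lines up with the `drift_z: float | None` field in the response.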
216
+
217
+ ## 11. LLM Explainer Surface (Day 7)
218
+
219
+ `src/llm/explainer.py` is the single entry point for natural-language
220
+ rationales. `explain(payload)` always returns `{rationale, source,
221
+ model}`. The deterministic template path is the source of truth for
222
+ tests; the LLM path is OpenRouter via the `openai==1.51.0` SDK using
223
+ `meta-llama/llama-3.2-3b-instruct:free`. Two env knobs control the
224
+ behavior:
225
+
226
+ - `OPENROUTER_API_KEY` — when absent, fall back to the template.
227
+ - `NEUROBRIDGE_DISABLE_LLM=1` — hard kill-switch; force template even
228
+ if a key is set. Use this for demo days when you want fully
229
+ deterministic, reproducible rationales.
230
+
231
+ The `POST /explain/bbb` endpoint mirrors this contract. Pydantic
232
+ enforces a non-empty `top_features` list (422 on empty); every other
233
+ failure mode degrades to template + WARNING log + `source="template"`.
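The two-path contract in §11 might look roughly like the sketch below. It is an assumption-laden illustration, not the contents of `src/llm/explainer.py`: the injected `llm_call` callable stands in for the real OpenRouter request, the template wording is invented, and returning `model=None` on the template path is a guess. Only the return keys, the two environment variables, the model identifier, and the fall-back-with-WARNING behavior come from the text.

```python
import logging
import os
from typing import Callable, Optional

logger = logging.getLogger(__name__)
LLM_MODEL = "meta-llama/llama-3.2-3b-instruct:free"


def _template(payload: dict) -> str:
    """Deterministic rationale (wording is hypothetical)."""
    top = payload["top_features"][0]
    return (
        f"Predicted {payload['label_text']} with confidence "
        f"{payload['confidence']:.2f}; strongest feature {top['feature']} "
        f"(SHAP {top['shap_value']:+.3f})."
    )


def explain(payload: dict, llm_call: Optional[Callable[[dict], str]] = None) -> dict:
    """Always return {rationale, source, model}; never raise on LLM failure."""
    disabled = os.environ.get("NEUROBRIDGE_DISABLE_LLM") == "1"  # hard kill-switch
    has_key = bool(os.environ.get("OPENROUTER_API_KEY"))
    if not disabled and has_key and llm_call is not None:
        try:
            return {"rationale": llm_call(payload), "source": "llm", "model": LLM_MODEL}
        except Exception as exc:  # any LLM failure degrades to the template
            logger.warning("LLM path failed, falling back to template: %s", exc)
    # Assumption: the template path reports no model name.
    return {"rationale": _template(payload), "source": "template", "model": None}
```

Checking the kill-switch before the key means `NEUROBRIDGE_DISABLE_LLM=1` wins even when a key is exported, which is the precedence the bullet list describes.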
README.md CHANGED
@@ -16,6 +16,7 @@ and Docker shipping.
16
  | 4 | API + MLOps + Frontend | FastAPI + MLflow + Streamlit + Docker | Shipped — 142 tests green |
17
  | 5 | Decision Layer (Model + XAI + Interactive UI) | [`bbb_model.py`](src/models/bbb_model.py) — RandomForest + SHAP + `POST /predict/bbb` | Shipped — 158 tests green |
18
  | 6 | Final Polish & Demo Features (Edge cases + Calibration + ComBat viz) | Calibration metadata + edge-case probes + `POST /pipeline/mri/diagnostics` | Shipped — 165 tests green |
19
+ | 7 | Final 5% (Drift, Traceability & Agents) | Per-worker drift z-score + MLflow provenance badge + `POST /explain/bbb` (LLM + template fallback) + AI Assistant tab | Shipped — 175 tests green |
20
 
21
  ## Quick Start
22
 
@@ -194,3 +195,51 @@ finishes in under 4 seconds on a 2024 laptop.
195
  - **BBB downstream model (classifier + SHAP explainer + trainer CLI):** [`src/models/bbb_model.py`](src/models/bbb_model.py) + [`tests/models/test_bbb_model.py`](tests/models/test_bbb_model.py) (12 tests)
196
  - **Day-6 plan (full TDD task breakdown):** [`docs/superpowers/plans/2026-05-04-day6-final-polish-demo-features.md`](docs/superpowers/plans/2026-05-04-day6-final-polish-demo-features.md)
197
  - **MRI ComBat diagnostics surface (pre/post site-gap KPIs):** `POST /pipeline/mri/diagnostics` — see [`src/api/routes.py`](src/api/routes.py) + [`src/pipelines/mri_pipeline.py`](src/pipelines/mri_pipeline.py)
198
+ - **Day-7 design spec:** [`docs/superpowers/specs/2026-05-05-day7-drift-traceability-agents-design.md`](docs/superpowers/specs/2026-05-05-day7-drift-traceability-agents-design.md)
199
+ - **Day-7 plan (full TDD task breakdown):** [`docs/superpowers/plans/2026-05-05-day7-drift-traceability-agents.md`](docs/superpowers/plans/2026-05-05-day7-drift-traceability-agents.md)
200
+ - **New surface:** `POST /explain/bbb` — natural-language rationale (LLM + deterministic fallback)
201
+ - **New surface:** `drift_z` / `rolling_n` / `provenance` fields in `POST /predict/bbb` response
202
+
203
+ ## Day 7 — Demo Recipe
204
+
205
+ Pre-flight (one terminal):
206
+
207
+ ```bash
208
+ # Start API with deterministic explainer (no LLM key needed)
209
+ NEUROBRIDGE_DISABLE_LLM=1 BBB_MODEL_PATH=data/processed/bbb_model.joblib \
210
+ uvicorn src.api.main:app --port 8000
211
+ ```
212
+
213
+ Predict + explain (other terminal):
214
+
215
+ ```bash
216
+ # 1) Predict — body now carries drift_z, rolling_n, provenance
217
+ curl -s -X POST http://localhost:8000/predict/bbb \
218
+ -H "Content-Type: application/json" \
219
+ -d '{"smiles": "CCO", "top_k": 5}' | jq
220
+
221
+ # 2) Explain — feed the predict response back as the explain payload
222
+ curl -s -X POST http://localhost:8000/explain/bbb \
223
+ -H "Content-Type: application/json" \
224
+ -d '{
225
+ "smiles": "CCO",
226
+ "label": 1,
227
+ "label_text": "permeable",
228
+ "confidence": 0.82,
229
+ "top_features": [
230
+ {"feature": "fp_341", "shap_value": 0.045},
231
+ {"feature": "fp_902", "shap_value": -0.031}
232
+ ],
233
+ "drift_z": 0.42,
234
+ "user_question": "Why permeable?"
235
+ }' | jq
236
+
237
+ # 3) Same call but with LLM enabled (set the key first)
238
+ unset NEUROBRIDGE_DISABLE_LLM
239
+ export OPENROUTER_API_KEY="sk-or-v1-…"
240
+ # Repeat the curl above; expect "source": "llm" and a model name.
241
+ ```
242
+
243
+ Streamlit demo: `streamlit run src/frontend/app.py` → BBB tab → Predict → AI Assistant tab → ask a preset question.
244
+
245
+ Drift demo: refresh the BBB tab and predict 10+ times in a row — the drift caption transitions from "warming up" to a numeric z-score.
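
Step 2 of the recipe feeds the predict response back as the explain payload. The same hand-off could be scripted in Python with only the standard library. The sketch below is hypothetical: `build_explain_payload` and `post_json` are illustrative helpers, and the key list is copied from the example explain body in the recipe, on the assumption that the predict response carries those keys under the same names.

```python
import json
import urllib.request

# Keys taken from the example explain payload in the recipe; treating them as
# the complete set, and as present in the predict response, is an assumption.
EXPLAIN_KEYS = ("smiles", "label", "label_text", "confidence", "top_features", "drift_z")


def build_explain_payload(predict_response: dict, user_question: str) -> dict:
    """Copy the explain-relevant fields and attach the user's question."""
    payload = {k: predict_response[k] for k in EXPLAIN_KEYS if k in predict_response}
    payload["user_question"] = user_question
    return payload


def post_json(url: str, body: dict) -> dict:
    """POST a JSON body and decode the JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage against a running API (not executed here):
#   pred = post_json("http://localhost:8000/predict/bbb", {"smiles": "CCO", "top_k": 5})
#   out = post_json("http://localhost:8000/explain/bbb",
#                   build_explain_payload(pred, "Why permeable?"))
```

Fields such as `rolling_n` and `provenance` are left out because the recipe's example explain body does not include them; whether the endpoint would tolerate them is not stated in the source.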