docs: Day-7 close-out — AGENTS §10 drift + §11 LLM explainer + README recipe
- AGENTS §10 documents the per-worker deque, train-stats stash, and
z-score formula. §11 documents the explainer's two-path contract,
env knobs (OPENROUTER_API_KEY, NEUROBRIDGE_DISABLE_LLM=1), and the
/explain/bbb endpoint shape.
- README adds Day 7 to the status table (175 tests green), pointers
to the Day-7 spec + plan + new surfaces, and a Demo Recipe section
with curl invocations for both endpoints (template-only and LLM).
- DoD-1 through DoD-5 all green: pytest 175, UserWarning gate clean,
Streamlit boot 200, predict body shape, explain template path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AGENTS.md
CHANGED
@@ -200,3 +200,34 @@ the core contract:
 data + site-gap KPIs (Pre, Post, Reduction factor). The UI renders
 a faceted altair density plot — visual proof that ComBat removes
 site-driven domain shift.
+
+## 10. Drift Surface (Day 7)
+
+Each predict route maintains a per-worker rolling window of recent
+prediction confidences (`collections.deque(maxlen=100)`). Train-time
+median + std are stashed on `model._neurobridge_train_stats` (joblib
+roundtrip-safe). The drift z-score is `(rolling_median − train_median) /
+max(train_std, 1e-9)`, computed only when the buffer holds ≥10 samples
+AND the model has the train-stats attribute. The `/predict/bbb`
+response carries `drift_z: float | None` and `rolling_n: int`. The UI
+renders a one-line caption with a magnitude tag (in-band, mild,
+significant). Worker restart clears the deque; this is acceptable for
+demo and removes the audit-trail concern.
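
Reviewer note: the drift rule in §10 can be sketched in a few lines of standard-library Python. This is only an illustration of the formula and its two guards as the text states them; the class name `DriftWindow` and its methods are hypothetical and are not taken from the repository.

```python
from collections import deque
from statistics import median
from typing import Optional

WINDOW = 100      # from the text: collections.deque(maxlen=100)
MIN_SAMPLES = 10  # from the text: z-score only once the buffer holds >= 10 samples


class DriftWindow:
    """Hypothetical per-worker rolling window of prediction confidences."""

    def __init__(self, train_median: Optional[float], train_std: Optional[float]):
        self.buffer = deque(maxlen=WINDOW)
        # None stands in for "the model has no train-stats attribute".
        self.train_median = train_median
        self.train_std = train_std

    def observe(self, confidence: float) -> None:
        # Once the deque is full, appending evicts the oldest confidence.
        self.buffer.append(confidence)

    @property
    def rolling_n(self) -> int:
        return len(self.buffer)

    def drift_z(self) -> Optional[float]:
        # Guard 1: still warming up.
        if len(self.buffer) < MIN_SAMPLES:
            return None
        # Guard 2: no train-time statistics available.
        if self.train_median is None or self.train_std is None:
            return None
        # (rolling_median - train_median) / max(train_std, 1e-9)
        return (median(self.buffer) - self.train_median) / max(self.train_std, 1e-9)
```

Flooring the denominator at `1e-9` keeps the division defined when the train-time standard deviation is zero, at the cost of a very large z-score in that case.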
+
+## 11. LLM Explainer Surface (Day 7)
+
+`src/llm/explainer.py` is the single entry point for natural-language
+rationales. `explain(payload)` always returns `{rationale, source,
+model}`. The deterministic template path is the source of truth for
+tests; the LLM path is OpenRouter via the `openai==1.51.0` SDK using
+`meta-llama/llama-3.2-3b-instruct:free`. Two env knobs control the
+behavior:
+
+- `OPENROUTER_API_KEY` — when absent, fall back to template.
+- `NEUROBRIDGE_DISABLE_LLM=1` — hard kill-switch; force template even
+  if a key is set. Use this for demo days when you want fully
+  deterministic, reproducible rationales.
+
+The `POST /explain/bbb` endpoint mirrors this contract. Pydantic
+enforces a non-empty `top_features` list (422 on empty); every other
+failure mode degrades to template + WARNING log + `source="template"`.
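
Reviewer note: the two-path contract in §11 can be illustrated with the sketch below. It is not the contents of `src/llm/explainer.py`. The helper names `llm_enabled` and `template_rationale`, the injected `llm_call` callable (a stand-in for the OpenRouter request, so the sketch makes no network call), the template wording, and the choice of `model: None` on the template path are all assumptions made for illustration.

```python
import logging
import os
from typing import Callable, Mapping, Optional

LLM_MODEL = "meta-llama/llama-3.2-3b-instruct:free"  # model id named in the text
logger = logging.getLogger("explainer_sketch")


def llm_enabled(env: Mapping[str, str]) -> bool:
    """Kill-switch wins; otherwise the LLM path needs a non-empty key."""
    if env.get("NEUROBRIDGE_DISABLE_LLM") == "1":
        return False
    return bool(env.get("OPENROUTER_API_KEY"))


def template_rationale(payload: dict) -> str:
    """Deterministic fallback; this wording is invented for the sketch."""
    names = ", ".join(f["feature"] for f in payload["top_features"])
    return (
        f"Predicted {payload['label_text']} with confidence "
        f"{payload['confidence']:.2f}; top features: {names}."
    )


def explain(
    payload: dict,
    env: Optional[Mapping[str, str]] = None,
    llm_call: Optional[Callable[[dict], str]] = None,
) -> dict:
    """Always return the same three keys: rationale, source, model."""
    env = os.environ if env is None else env
    if llm_call is not None and llm_enabled(env):
        try:
            return {"rationale": llm_call(payload), "source": "llm", "model": LLM_MODEL}
        except Exception:
            # Any LLM failure degrades to the template path with a WARNING log.
            logger.warning("LLM path failed; falling back to template", exc_info=True)
    # Assumption: the template path reports no model name.
    return {"rationale": template_rationale(payload), "source": "template", "model": None}
```

Passing `env` and `llm_call` as arguments keeps the path selection testable without touching the process environment or the network.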
README.md
CHANGED
@@ -16,6 +16,7 @@ and Docker shipping.
 | 4 | API + MLOps + Frontend | FastAPI + MLflow + Streamlit + Docker | Shipped — 142 tests green |
 | 5 | Decision Layer (Model + XAI + Interactive UI) | [`bbb_model.py`](src/models/bbb_model.py) — RandomForest + SHAP + `POST /predict/bbb` | Shipped — 158 tests green |
 | 6 | Final Polish & Demo Features (Edge cases + Calibration + ComBat viz) | Calibration metadata + edge-case probes + `POST /pipeline/mri/diagnostics` | Shipped — 165 tests green |
+| 7 | Final 5% (Drift, Traceability & Agents) | Per-worker drift z-score + MLflow provenance badge + `POST /explain/bbb` (LLM + template fallback) + AI Assistant tab | Shipped — 175 tests green |

 ## Quick Start

@@ -194,3 +195,51 @@ finishes in under 4 seconds on a 2024 laptop.
 - **BBB downstream model (classifier + SHAP explainer + trainer CLI):** [`src/models/bbb_model.py`](src/models/bbb_model.py) + [`tests/models/test_bbb_model.py`](tests/models/test_bbb_model.py) (12 tests)
 - **Day-6 plan (full TDD task breakdown):** [`docs/superpowers/plans/2026-05-04-day6-final-polish-demo-features.md`](docs/superpowers/plans/2026-05-04-day6-final-polish-demo-features.md)
 - **MRI ComBat diagnostics surface (pre/post site-gap KPIs):** `POST /pipeline/mri/diagnostics` — see [`src/api/routes.py`](src/api/routes.py) + [`src/pipelines/mri_pipeline.py`](src/pipelines/mri_pipeline.py)
+- **Day-7 design spec:** [`docs/superpowers/specs/2026-05-05-day7-drift-traceability-agents-design.md`](docs/superpowers/specs/2026-05-05-day7-drift-traceability-agents-design.md)
+- **Day-7 plan (full TDD task breakdown):** [`docs/superpowers/plans/2026-05-05-day7-drift-traceability-agents.md`](docs/superpowers/plans/2026-05-05-day7-drift-traceability-agents.md)
+- **New surface:** `POST /explain/bbb` — natural-language rationale (LLM + deterministic fallback)
+- **New surface:** `drift_z` / `rolling_n` / `provenance` fields in `POST /predict/bbb` response
+
+## Day 7 — Demo Recipe
+
+Pre-flight (one terminal):
+
+```bash
+# Start API with deterministic explainer (no LLM key needed)
+NEUROBRIDGE_DISABLE_LLM=1 BBB_MODEL_PATH=data/processed/bbb_model.joblib \
+  uvicorn src.api.main:app --port 8000
+```
+
+Predict + explain (other terminal):
+
+```bash
+# 1) Predict — body now carries drift_z, rolling_n, provenance
+curl -s -X POST http://localhost:8000/predict/bbb \
+  -H "Content-Type: application/json" \
+  -d '{"smiles": "CCO", "top_k": 5}' | jq
+
+# 2) Explain — feed the predict response back as the explain payload
+curl -s -X POST http://localhost:8000/explain/bbb \
+  -H "Content-Type: application/json" \
+  -d '{
+    "smiles": "CCO",
+    "label": 1,
+    "label_text": "permeable",
+    "confidence": 0.82,
+    "top_features": [
+      {"feature": "fp_341", "shap_value": 0.045},
+      {"feature": "fp_902", "shap_value": -0.031}
+    ],
+    "drift_z": 0.42,
+    "user_question": "Why permeable?"
+  }' | jq
+
+# 3) Same call but with LLM enabled (set the key in the API terminal, then restart uvicorn)
+unset NEUROBRIDGE_DISABLE_LLM
+export OPENROUTER_API_KEY="sk-or-v1-…"
+# Repeat the curl above; expect "source": "llm" and a model name.
+```
+
+Streamlit demo: `streamlit run src/frontend/app.py` → BBB tab → Predict → AI Assistant tab → ask a preset question.
+
+Drift demo: refresh the BBB tab and predict 10+ times in a row — the drift caption transitions from "warming up" to a numeric z-score.
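
Reviewer note: step 2 of the recipe ("feed the predict response back as the explain payload") can also be scripted. The sketch below uses only the standard library. The function names are hypothetical, and the field list is an assumption read off the curl payload in the recipe; in particular, it assumes the predict response exposes `smiles`, `label`, `label_text`, `confidence`, `top_features`, and `drift_z` under those names, and that `rolling_n` and `provenance` are not part of the explain payload.

```python
import json
import urllib.request

# Fields copied from the curl example in the recipe (assumed, not verified).
EXPLAIN_FIELDS = ("smiles", "label", "label_text", "confidence", "top_features", "drift_z")


def build_explain_payload(predict_response: dict, user_question: str) -> dict:
    """Keep only the fields shown in the recipe and attach the question."""
    payload = {k: predict_response[k] for k in EXPLAIN_FIELDS if k in predict_response}
    if not payload.get("top_features"):
        # The endpoint is documented to answer 422 on an empty list; fail early instead.
        raise ValueError("top_features must be a non-empty list")
    payload["user_question"] = user_question
    return payload


def post_json(url: str, body: dict) -> dict:
    """POST a JSON body and decode the JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def predict_then_explain(base_url: str, smiles: str, question: str) -> dict:
    """Chain the two calls from the recipe against a running API."""
    prediction = post_json(f"{base_url}/predict/bbb", {"smiles": smiles, "top_k": 5})
    prediction.setdefault("smiles", smiles)  # in case the response omits it
    return post_json(f"{base_url}/explain/bbb", build_explain_payload(prediction, question))
```

With the API from the pre-flight step running, `predict_then_explain("http://localhost:8000", "CCO", "Why permeable?")` would issue the same two requests as the curl commands.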