mekosotto Claude Opus 4.7 (1M context) committed 7 days ago

Commit 3c2d45f (1 parent: fc4e33b)

docs: Day-7 close-out — AGENTS §10 drift + §11 LLM explainer + README recipe

- AGENTS §10 documents the per-worker deque, train-stats stash, and
z-score formula. §11 documents the explainer's two-path contract,
env knobs (OPENROUTER_API_KEY, NEUROBRIDGE_DISABLE_LLM=1), and the
/explain/bbb endpoint shape.
- README adds Day 7 to the status table (175 tests green), pointers
to the Day-7 spec + plan + new surfaces, and a Demo Recipe section
with curl invocations for both endpoints (template-only and LLM).
- DoD-1 through DoD-5 all green: pytest 175, UserWarning gate clean,
Streamlit boot 200, predict body shape, explain template path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Files changed (2)

AGENTS.md +31 -0
README.md +49 -0

AGENTS.md CHANGED

@@ -200,3 +200,34 @@ the core contract:
   data + site-gap KPIs (Pre, Post, Reduction factor). The UI renders
   a faceted altair density plot — visual proof that ComBat removes
   site-driven domain shift.
+## 10. Drift Surface (Day 7)
+Each predict route maintains a per-worker rolling window of recent
+prediction confidences (`collections.deque(maxlen=100)`). Train-time
+median + std are stashed on `model._neurobridge_train_stats` (joblib
+roundtrip-safe). The drift z-score is `(rolling_median − train_median) /
+max(train_std, 1e-9)`, computed only when the buffer holds ≥10 samples
+AND the model has the train-stats attribute. The `/predict/bbb`
+response carries `drift_z: float | None` and `rolling_n: int`. The UI
+renders a one-line caption with a magnitude tag (in-band, mild,
+significant). Worker restart clears the deque; this is acceptable for
+demo and removes the audit-trail concern.
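The drift rule in §10 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the repository's code: the `DriftWindow` class and its method names are hypothetical, while the window size (100), the 10-sample minimum, and the `1e-9` floor on the standard deviation come from the text above.

```python
from collections import deque
from statistics import median
from typing import Optional

WINDOW = 100      # from §10: collections.deque(maxlen=100)
MIN_SAMPLES = 10  # from §10: z-score only once the buffer holds >= 10 samples


class DriftWindow:
    """Hypothetical helper; class and method names are illustrative only."""

    def __init__(self, train_median: Optional[float], train_std: Optional[float]) -> None:
        # Per-worker state: one instance lives in one process and is lost on restart.
        self.buffer = deque(maxlen=WINDOW)
        self.train_median = train_median
        self.train_std = train_std

    def observe(self, confidence: float) -> None:
        # Appending to a full deque silently drops the oldest confidence.
        self.buffer.append(confidence)

    def drift_z(self) -> Optional[float]:
        # No train-time stats on the model: drift is undefined.
        if self.train_median is None or self.train_std is None:
            return None
        # Still warming up: too few samples for a stable rolling median.
        if len(self.buffer) < MIN_SAMPLES:
            return None
        rolling_median = median(self.buffer)
        # The floor on train_std guards against division by zero.
        return (rolling_median - self.train_median) / max(self.train_std, 1e-9)
```

In this sketch, a `None` result corresponds to `drift_z: None` in the response, and `len(window.buffer)` plays the role of `rolling_n`.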
+## 11. LLM Explainer Surface (Day 7)
+`src/llm/explainer.py` is the single entry point for natural-language
+rationales. `explain(payload)` always returns `{rationale, source,
+model}`. The deterministic template path is the source of truth for
+tests; the LLM path is OpenRouter via the `openai==1.51.0` SDK using
+`meta-llama/llama-3.2-3b-instruct:free`. Two env knobs control the
+behavior:
+- `OPENROUTER_API_KEY` — when absent, fall back to the template.
+- `NEUROBRIDGE_DISABLE_LLM=1` — hard kill-switch; force template even
+  if a key is set. Use this for demo days when you want fully
+  deterministic, reproducible rationales.
+The `POST /explain/bbb` endpoint mirrors this contract. Pydantic
+enforces a non-empty `top_features` list (422 on empty); every other
+failure mode degrades to template + WARNING log + `source="template"`.
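The two-path contract in §11 can be illustrated with a small, dependency-free sketch. Several parts are assumptions made for illustration: the template wording, the `llm_call` parameter (a stand-in for the real OpenRouter request, which is not shown in this commit), and the use of `None` for `model` on the template path. The env knobs, the model name, the return keys, and the WARNING-plus-template fallback come from the text above.

```python
import logging
import os
from typing import Callable, Optional

logger = logging.getLogger("explainer_sketch")

LLM_MODEL = "meta-llama/llama-3.2-3b-instruct:free"  # model named in §11


def template_rationale(payload: dict) -> str:
    # ASSUMPTION: illustrative wording; the real template text is not in this diff.
    top = payload["top_features"][0]
    return (
        f"Predicted {payload['label_text']} "
        f"(confidence {payload['confidence']:.2f}); "
        f"strongest feature: {top['feature']} (SHAP {top['shap_value']:+.3f})."
    )


def explain(payload: dict, llm_call: Optional[Callable[[dict], str]] = None) -> dict:
    # ASSUMPTION: model is None on the template path; the source does not specify it.
    template = {
        "rationale": template_rationale(payload),
        "source": "template",
        "model": None,
    }
    # Hard kill-switch: wins even when a key is set.
    if os.environ.get("NEUROBRIDGE_DISABLE_LLM") == "1":
        return template
    # No key (or no LLM client wired in): use the template.
    if not os.environ.get("OPENROUTER_API_KEY") or llm_call is None:
        return template
    try:
        return {"rationale": llm_call(payload), "source": "llm", "model": LLM_MODEL}
    except Exception:
        # Any LLM failure degrades to the template and logs a WARNING.
        logger.warning("LLM path failed; falling back to template", exc_info=True)
        return template
```

Checking the kill-switch before the key keeps demo runs deterministic regardless of what else is set in the environment.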

README.md CHANGED

@@ -16,6 +16,7 @@ and Docker shipping.
 | 4 | API + MLOps + Frontend | FastAPI + MLflow + Streamlit + Docker | Shipped — 142 tests green |
 | 5 | Decision Layer (Model + XAI + Interactive UI) | [`bbb_model.py`](src/models/bbb_model.py) — RandomForest + SHAP + `POST /predict/bbb` | Shipped — 158 tests green |
 | 6 | Final Polish & Demo Features (Edge cases + Calibration + ComBat viz) | Calibration metadata + edge-case probes + `POST /pipeline/mri/diagnostics` | Shipped — 165 tests green |
+| 7 | Final 5% (Drift, Traceability & Agents) | Per-worker drift z-score + MLflow provenance badge + `POST /explain/bbb` (LLM + template fallback) + AI Assistant tab | Shipped — 175 tests green |
 ## Quick Start
@@ -194,3 +195,51 @@ finishes in under 4 seconds on a 2024 laptop.
 - **BBB downstream model (classifier + SHAP explainer + trainer CLI):** [`src/models/bbb_model.py`](src/models/bbb_model.py) + [`tests/models/test_bbb_model.py`](tests/models/test_bbb_model.py) (12 tests)
 - **Day-6 plan (full TDD task breakdown):** [`docs/superpowers/plans/2026-05-04-day6-final-polish-demo-features.md`](docs/superpowers/plans/2026-05-04-day6-final-polish-demo-features.md)
 - **MRI ComBat diagnostics surface (pre/post site-gap KPIs):** `POST /pipeline/mri/diagnostics` — see [`src/api/routes.py`](src/api/routes.py) + [`src/pipelines/mri_pipeline.py`](src/pipelines/mri_pipeline.py)
+- **Day-7 design spec:** [`docs/superpowers/specs/2026-05-05-day7-drift-traceability-agents-design.md`](docs/superpowers/specs/2026-05-05-day7-drift-traceability-agents-design.md)
+- **Day-7 plan (full TDD task breakdown):** [`docs/superpowers/plans/2026-05-05-day7-drift-traceability-agents.md`](docs/superpowers/plans/2026-05-05-day7-drift-traceability-agents.md)
+- **New surface:** `POST /explain/bbb` — natural-language rationale (LLM + deterministic fallback)
+- **New surface:** `drift_z` / `rolling_n` / `provenance` fields in `POST /predict/bbb` response
+## Day 7 — Demo Recipe
+Pre-flight (one terminal):
+```bash
+# Start API with deterministic explainer (no LLM key needed)
+NEUROBRIDGE_DISABLE_LLM=1 BBB_MODEL_PATH=data/processed/bbb_model.joblib \
+  uvicorn src.api.main:app --port 8000
+```
+Predict + explain (other terminal):
+```bash
+# 1) Predict — body now carries drift_z, rolling_n, provenance
+curl -s -X POST http://localhost:8000/predict/bbb \
+  -H "Content-Type: application/json" \
+  -d '{"smiles": "CCO", "top_k": 5}' | jq
+# 2) Explain — feed the predict response back as the explain payload
+curl -s -X POST http://localhost:8000/explain/bbb \
+  -H "Content-Type: application/json" \
+  -d '{
+    "smiles": "CCO",
+    "label": 1,
+    "label_text": "permeable",
+    "confidence": 0.82,
+    "top_features": [
+      {"feature": "fp_341", "shap_value": 0.045},
+      {"feature": "fp_902", "shap_value": -0.031}
+    ],
+    "drift_z": 0.42,
+    "user_question": "Why permeable?"
+  }' | jq
+# 3) Same call but with LLM enabled (set the key first)
+unset NEUROBRIDGE_DISABLE_LLM
+export OPENROUTER_API_KEY="sk-or-v1-…"
+# Repeat the curl above; expect "source": "llm" and a model name.
+```
+Streamlit demo: `streamlit run src/frontend/app.py` → BBB tab → Predict → AI Assistant tab → ask a preset question.
+Drift demo: refresh the BBB tab and predict 10+ times in a row — the drift caption transitions from "warming up" to a numeric z-score.
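The predict-then-explain round trip from the recipe can also be scripted. The sketch below uses only the standard library and rests on one loud assumption: that the `/predict/bbb` response exposes `label`, `label_text`, `confidence`, and `top_features` under exactly those keys. The diff confirms only `drift_z`, `rolling_n`, and `provenance`. The helper names `post_json`, `build_explain_payload`, and `run_demo` are hypothetical.

```python
import json
import urllib.request

API = "http://localhost:8000"  # port taken from the uvicorn command in the recipe


def post_json(path: str, body: dict) -> dict:
    # POST a JSON body and decode the JSON response.
    req = urllib.request.Request(
        API + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def build_explain_payload(smiles: str, prediction: dict, question: str) -> dict:
    # ASSUMPTION: the predict response carries these keys; adjust to the real shape.
    return {
        "smiles": smiles,
        "label": prediction["label"],
        "label_text": prediction["label_text"],
        "confidence": prediction["confidence"],
        "top_features": prediction["top_features"],
        # drift_z may be None while the rolling window is still warming up.
        "drift_z": prediction.get("drift_z"),
        "user_question": question,
    }


def run_demo(smiles: str = "CCO", question: str = "Why permeable?") -> dict:
    # Requires the API from the pre-flight step to be running.
    prediction = post_json("/predict/bbb", {"smiles": smiles, "top_k": 5})
    return post_json("/explain/bbb", build_explain_payload(smiles, prediction, question))
```

With the API running, `print(run_demo())` should show a response whose `source` field is `"template"` or `"llm"`, depending on the env knobs set when the server started.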