docs: Day-6 close-out — AGENTS §8 calibration + §9 demo features
- §8 calibration sub-section: documents how train() computes
precision-at-confidence bins and how the API surfaces them.
- §9 Demo Features: edge-case dropdown, trust caption, MRI ComBat
diagnostics — the three jury-day amplifiers landed in Day 6.
- README: Day 6 row in status table (165 tests green), pointers to
new plan + new /pipeline/mri/diagnostics surface.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AGENTS.md
CHANGED
@@ -172,3 +172,31 @@ Parquet produces identical predictions.
 
 **Override `BBB_MODEL_PATH`** env var to point the API at a non-default
 artifact location (used by tests for tmp_path isolation).
+
+**Calibration metadata** (Day 6): `train()` does an 80/20 stratified split,
+computes precision-at-confidence-threshold bins on the held-out test set,
+and stashes them on `model._neurobridge_calibration: list[dict]` (sorted
+ascending by threshold). The API includes the bin matching each
+prediction's confidence in `BBBPredictResponse.calibration`. UI uses this
+to render an honest trust caption ("≥75% confident → 92% precision, n=18").
+For tiny test fixtures where stratified split fails, calibration falls
+back to zero-support bins so the API contract is always populated.
+
+## 9. Demo Features (Day 6)
+
+The frontend includes three jury-day demo amplifiers that don't change
+the core contract:
+
+- **Edge-case dropdown** (BBB tab): a curated catalog of 5 robustness
+probes — invalid SMILES, empty input, OOD macrocycle (cyclosporine-like),
+heavy halogenated aromatic. Each has a stated expectation; the UI
+visualizes graceful failure (HTTP 400 → recoverable warning, never
+a crash).
+- **Calibration trust caption** (BBB decision card): renders the
+precision-at-confidence-threshold from `BBBPredictResponse.calibration`.
+Demonstrates that the system knows what it doesn't know.
+- **MRI ComBat diagnostics** (MRI tab): `POST /pipeline/mri/diagnostics`
+runs the pipeline twice (pre + post ComBat) and returns long-format
+data + site-gap KPIs (Pre, Post, Reduction factor). The UI renders
+a faceted altair density plot — visual proof that ComBat removes
+site-driven domain shift.
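Review note: the calibration flow described in §8 and consumed by the trust caption in §9 can be sketched roughly as follows. This is a minimal illustration, not the code in `bbb_model.py`. It assumes that "precision" means the fraction of predictions at or above a confidence threshold that match the label, that confidence is the probability of the predicted class, and that the matching bin is the highest threshold the confidence meets. The function names, the dict keys (`threshold`, `precision`, `support`), and the default thresholds are all hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def precision_at_confidence_bins(
    y_true: Sequence[int],
    proba_positive: Sequence[float],
    thresholds: Sequence[float] = (0.5, 0.6, 0.75, 0.9),  # assumed values
) -> list[dict]:
    """Build one bin per threshold, sorted ascending by threshold."""
    scored = []
    for label, p in zip(y_true, proba_positive):
        pred = 1 if p >= 0.5 else 0
        # Confidence is the probability assigned to the predicted class.
        confidence = p if pred == 1 else 1.0 - p
        scored.append((confidence, pred == label))

    bins = []
    for t in sorted(thresholds):
        hits = [correct for confidence, correct in scored if confidence >= t]
        support = len(hits)
        # Zero-support bins report 0.0 so the field is always populated.
        precision = sum(hits) / support if support else 0.0
        bins.append({"threshold": t, "precision": precision, "support": support})
    return bins


def matching_bin(bins: list[dict], confidence: float) -> dict | None:
    """Return the highest-threshold bin that the confidence reaches, if any."""
    eligible = [b for b in bins if confidence >= b["threshold"]]
    return eligible[-1] if eligible else None


def trust_caption(bin_: dict) -> str:
    """Format a bin in the style of the caption quoted in §8."""
    return (
        f"≥{bin_['threshold']:.0%} confident → "
        f"{bin_['precision']:.0%} precision, n={bin_['support']}"
    )
```

Under these assumptions, an empty held-out set still yields a fully populated list of bins with `support` equal to 0, which mirrors the fallback behaviour the section describes for tiny fixtures.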
README.md
CHANGED
@@ -15,6 +15,7 @@ and Docker shipping.
 | 3 | Image (MRI / fMRI) | [`mri_pipeline.py`](src/pipelines/mri_pipeline.py) | Shipped — 106 tests green |
 | 4 | API + MLOps + Frontend | FastAPI + MLflow + Streamlit + Docker | Shipped — 142 tests green |
 | 5 | Decision Layer (Model + XAI + Interactive UI) | [`bbb_model.py`](src/models/bbb_model.py) — RandomForest + SHAP + `POST /predict/bbb` | Shipped — 158 tests green |
+| 6 | Final Polish & Demo Features (Edge cases + Calibration + ComBat viz) | Calibration metadata + edge-case probes + `POST /pipeline/mri/diagnostics` | Shipped — 165 tests green |
 
 ## Quick Start
 
@@ -171,6 +172,7 @@ finishes in under 4 seconds on a 2024 laptop.
 - **Day 3 (shipped):** `mri_pipeline.py` — NIfTI volume loading, brain masking, ROI feature extraction, ComBat harmonization (`neuroHarmonize`) for site-level domain shift → Parquet (48 features, 106 tests green).
 - **Day 4 (shipped):** FastAPI surface in `src/api/` (POST `/pipeline/{bbb,eeg,mri}` + `/health`), MLflow experiment tracking via `src.core.tracking` (see AGENTS.md §7), Streamlit dashboard at `src/frontend/app.py`, and Docker / `docker-compose.yml` for the api + MLflow stack — 142 tests green.
 - **Day 5 (shipped):** Decision layer in `src/models/bbb_model.py` — RandomForest BBB classifier on Morgan fingerprints, SHAP top-k explanations, `POST /predict/bbb` endpoint, interactive Streamlit BBB tab with SMILES input + decision card + SHAP bar chart, and trainer CLI (`python -m src.models.bbb_model`). See AGENTS.md §8 — 158 tests green.
+- **Day 6 (shipped):** Final polish & demo features — calibration metadata bins on the BBB classifier (precision-at-confidence in `BBBPredictResponse.calibration`), edge-case dropdown in the Streamlit BBB tab (5 curated robustness probes), trust caption on the decision card, and `POST /pipeline/mri/diagnostics` returning Pre/Post ComBat long-format data + site-gap KPIs visualized as a faceted altair KDE in the MRI tab. See AGENTS.md §8 (calibration) + §9 (demo features) — 165 tests green.
 
 ## Where to Look
 
@@ -190,3 +192,5 @@ finishes in under 4 seconds on a 2024 laptop.
 - **Day-4 tests:** [`tests/api/`](tests/api/), [`tests/frontend/`](tests/frontend/), [`tests/pipelines/test_cross_pipeline_smoke.py`](tests/pipelines/test_cross_pipeline_smoke.py)
 - **Day-5 plan (full TDD task breakdown):** [`docs/superpowers/plans/2026-05-03-day5-downstream-model-xai-interactive.md`](docs/superpowers/plans/2026-05-03-day5-downstream-model-xai-interactive.md)
 - **BBB downstream model (classifier + SHAP explainer + trainer CLI):** [`src/models/bbb_model.py`](src/models/bbb_model.py) + [`tests/models/test_bbb_model.py`](tests/models/test_bbb_model.py) (12 tests)
+- **Day-6 plan (full TDD task breakdown):** [`docs/superpowers/plans/2026-05-04-day6-final-polish-demo-features.md`](docs/superpowers/plans/2026-05-04-day6-final-polish-demo-features.md)
+- **MRI ComBat diagnostics surface (pre/post site-gap KPIs):** `POST /pipeline/mri/diagnostics` — see [`src/api/routes.py`](src/api/routes.py) + [`src/pipelines/mri_pipeline.py`](src/pipelines/mri_pipeline.py)
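Review note: the site-gap KPIs (Pre, Post, Reduction factor) returned by `POST /pipeline/mri/diagnostics` are not defined in this diff, so the sketch below only illustrates one plausible reading. It assumes the site gap is the spread between the highest and lowest per-site mean of a single feature, and that the reduction factor is the Pre gap divided by the Post gap. The function name `site_gap_kpis` and the row keys `stage`, `site`, and `value` are hypothetical and are not taken from `mri_pipeline.py`.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import mean


def site_gap_kpis(rows: list[dict]) -> dict:
    """Compute pre/post site gaps from long-format rows.

    Each row is assumed to look like
    {"stage": "pre" | "post", "site": str, "value": float}.
    """
    values = defaultdict(list)
    for row in rows:
        values[(row["stage"], row["site"])].append(row["value"])

    gaps = {}
    for stage in ("pre", "post"):
        site_means = [mean(v) for (s, _site), v in values.items() if s == stage]
        if not site_means:
            raise ValueError(f"no rows for stage {stage!r}")
        # Gap = spread between the highest and lowest per-site mean (assumed).
        gaps[stage] = float(max(site_means) - min(site_means))

    # A post gap of zero would mean the per-site means coincide exactly.
    reduction = gaps["pre"] / gaps["post"] if gaps["post"] > 0 else float("inf")
    return {"pre": gaps["pre"], "post": gaps["post"], "reduction_factor": reduction}
```

With two sites whose means differ by 4.0 before harmonization and by 1.0 afterwards, this returns a reduction factor of 4.0. The real endpoint may aggregate across many features or use a different gap statistic.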