mekosotto Claude Opus 4.7 (1M context) committed on
Commit d05fcf1 · 1 Parent(s): 4a861ef

docs: Day-6 close-out — AGENTS §8 calibration + §9 demo features

- §8 calibration sub-section: documents how train() computes
precision-at-confidence bins and how the API surfaces them.
- §9 Demo Features: edge-case dropdown, trust caption, MRI ComBat
diagnostics — the three jury-day amplifiers landed in Day 6.
- README: Day 6 row in status table (165 tests green), pointers to
new plan + new /pipeline/mri/diagnostics surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Files changed (2)
  1. AGENTS.md +28 -0
  2. README.md +4 -0
AGENTS.md CHANGED
@@ -172,3 +172,31 @@ Parquet produces identical predictions.
 
  **Override `BBB_MODEL_PATH`** env var to point the API at a non-default
  artifact location (used by tests for tmp_path isolation).
+
+ **Calibration metadata** (Day 6): `train()` does an 80/20 stratified split,
+ computes precision-at-confidence-threshold bins on the held-out test set,
+ and stashes them on `model._neurobridge_calibration: list[dict]` (sorted
+ ascending by threshold). The API includes the bin matching each
+ prediction's confidence in `BBBPredictResponse.calibration`. UI uses this
+ to render an honest trust caption ("≥75% confident → 92% precision, n=18").
+ For tiny test fixtures where stratified split fails, calibration falls
+ back to zero-support bins so the API contract is always populated.
+
+ ## 9. Demo Features (Day 6)
+
+ The frontend includes three jury-day demo amplifiers that don't change
+ the core contract:
+
+ - **Edge-case dropdown** (BBB tab): a curated catalog of 5 robustness
+ probes — invalid SMILES, empty input, OOD macrocycle (cyclosporine-like),
+ heavy halogenated aromatic. Each has a stated expectation; the UI
+ visualizes graceful failure (HTTP 400 → recoverable warning, never
+ a crash).
+ - **Calibration trust caption** (BBB decision card): renders the
+ precision-at-confidence-threshold from `BBBPredictResponse.calibration`.
+ Demonstrates that the system knows what it doesn't know.
+ - **MRI ComBat diagnostics** (MRI tab): `POST /pipeline/mri/diagnostics`
+ runs the pipeline twice (pre + post ComBat) and returns long-format
+ data + site-gap KPIs (Pre, Post, Reduction factor). The UI renders
+ a faceted altair density plot — visual proof that ComBat removes
+ site-driven domain shift.
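
Reviewer note: the calibration paragraph added to AGENTS.md §8 can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from `bbb_model.py`: the function names, the dict keys (`threshold`, `precision`, `support`), the default thresholds, the reading of "precision" as the fraction of correct predictions among those at or above a threshold, and the rule that the matching bin is the highest threshold not exceeding the prediction's confidence.

```python
from bisect import bisect_right


def precision_at_confidence_bins(y_true, y_pred, confidence,
                                 thresholds=(0.5, 0.75, 0.9)):
    """Build one bin per threshold, sorted ascending by threshold.

    Each bin covers the held-out predictions whose confidence is at or
    above the threshold (hypothetical layout, see lead-in).
    """
    bins = []
    for t in sorted(thresholds):
        selected = [(yt, yp) for yt, yp, c in zip(y_true, y_pred, confidence)
                    if c >= t]
        support = len(selected)
        correct = sum(1 for yt, yp in selected if yt == yp)
        bins.append({
            "threshold": t,
            "precision": correct / support if support else 0.0,
            "support": support,
        })
    return bins


def zero_support_bins(thresholds=(0.5, 0.75, 0.9)):
    """Fallback when no held-out split is available: empty but well-formed bins."""
    return [{"threshold": t, "precision": 0.0, "support": 0}
            for t in sorted(thresholds)]


def matching_bin(bins, confidence):
    """Return the bin with the highest threshold <= confidence, or None."""
    thresholds = [b["threshold"] for b in bins]
    idx = bisect_right(thresholds, confidence) - 1
    return bins[idx] if idx >= 0 else None


def trust_caption(bin_):
    """Format a bin in the style of the caption quoted in AGENTS.md."""
    return (f"≥{bin_['threshold']:.0%} confident → "
            f"{bin_['precision']:.0%} precision, n={bin_['support']}")


# Toy held-out set: labels, predicted labels, and predicted-class confidence.
y_true = [1, 1, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1]
conf = [0.95, 0.80, 0.55, 0.60, 0.78, 0.52]

example_bins = precision_at_confidence_bins(y_true, y_pred, conf)
print(trust_caption(matching_bin(example_bins, 0.80)))
# prints: ≥75% confident → 100% precision, n=3
```

Under these assumptions the zero-support fallback returns the same list shape as the normal path, so a response field populated from it never needs a special case.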
README.md CHANGED
@@ -15,6 +15,7 @@ and Docker shipping.
  | 3 | Image (MRI / fMRI) | [`mri_pipeline.py`](src/pipelines/mri_pipeline.py) | Shipped — 106 tests green |
  | 4 | API + MLOps + Frontend | FastAPI + MLflow + Streamlit + Docker | Shipped — 142 tests green |
  | 5 | Decision Layer (Model + XAI + Interactive UI) | [`bbb_model.py`](src/models/bbb_model.py) — RandomForest + SHAP + `POST /predict/bbb` | Shipped — 158 tests green |
+ | 6 | Final Polish & Demo Features (Edge cases + Calibration + ComBat viz) | Calibration metadata + edge-case probes + `POST /pipeline/mri/diagnostics` | Shipped — 165 tests green |
 
  ## Quick Start
 
@@ -171,6 +172,7 @@ finishes in under 4 seconds on a 2024 laptop.
  - **Day 3 (shipped):** `mri_pipeline.py` — NIfTI volume loading, brain masking, ROI feature extraction, ComBat harmonization (`neuroHarmonize`) for site-level domain shift → Parquet (48 features, 106 tests green).
  - **Day 4 (shipped):** FastAPI surface in `src/api/` (POST `/pipeline/{bbb,eeg,mri}` + `/health`), MLflow experiment tracking via `src.core.tracking` (see AGENTS.md §7), Streamlit dashboard at `src/frontend/app.py`, and Docker / `docker-compose.yml` for the api + MLflow stack — 142 tests green.
  - **Day 5 (shipped):** Decision layer in `src/models/bbb_model.py` — RandomForest BBB classifier on Morgan fingerprints, SHAP top-k explanations, `POST /predict/bbb` endpoint, interactive Streamlit BBB tab with SMILES input + decision card + SHAP bar chart, and trainer CLI (`python -m src.models.bbb_model`). See AGENTS.md §8 — 158 tests green.
+ - **Day 6 (shipped):** Final polish & demo features — calibration metadata bins on the BBB classifier (precision-at-confidence in `BBBPredictResponse.calibration`), edge-case dropdown in the Streamlit BBB tab (5 curated robustness probes), trust caption on the decision card, and `POST /pipeline/mri/diagnostics` returning Pre/Post ComBat long-format data + site-gap KPIs visualized as a faceted altair KDE in the MRI tab. See AGENTS.md §8 (calibration) + §9 (demo features) — 165 tests green.
 
  ## Where to Look
 
@@ -190,3 +192,5 @@ finishes in under 4 seconds on a 2024 laptop.
  - **Day-4 tests:** [`tests/api/`](tests/api/), [`tests/frontend/`](tests/frontend/), [`tests/pipelines/test_cross_pipeline_smoke.py`](tests/pipelines/test_cross_pipeline_smoke.py)
  - **Day-5 plan (full TDD task breakdown):** [`docs/superpowers/plans/2026-05-03-day5-downstream-model-xai-interactive.md`](docs/superpowers/plans/2026-05-03-day5-downstream-model-xai-interactive.md)
  - **BBB downstream model (classifier + SHAP explainer + trainer CLI):** [`src/models/bbb_model.py`](src/models/bbb_model.py) + [`tests/models/test_bbb_model.py`](tests/models/test_bbb_model.py) (12 tests)
+ - **Day-6 plan (full TDD task breakdown):** [`docs/superpowers/plans/2026-05-04-day6-final-polish-demo-features.md`](docs/superpowers/plans/2026-05-04-day6-final-polish-demo-features.md)
+ - **MRI ComBat diagnostics surface (pre/post site-gap KPIs):** `POST /pipeline/mri/diagnostics` — see [`src/api/routes.py`](src/api/routes.py) + [`src/pipelines/mri_pipeline.py`](src/pipelines/mri_pipeline.py)
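
Reviewer note: the commit does not define how the site-gap KPIs (Pre, Post, Reduction factor) are computed, so the following is only one plausible reading, sketched for illustration. It assumes long-format rows with hypothetical keys `stage`, `site`, and `value`, defines the site gap as the spread between the largest and smallest per-site mean, and takes the reduction factor as the pre gap divided by the post gap. None of these names or definitions come from `mri_pipeline.py` or `routes.py`.

```python
from collections import defaultdict
from statistics import mean


def site_gap(rows, stage):
    """Spread between the largest and smallest per-site mean for one stage."""
    by_site = defaultdict(list)
    for row in rows:
        if row["stage"] == stage:
            by_site[row["site"]].append(row["value"])
    site_means = [mean(values) for values in by_site.values()]
    return max(site_means) - min(site_means)


def site_gap_kpis(rows):
    """Summarise pre/post harmonization gaps and their ratio (assumed definition)."""
    pre = site_gap(rows, "pre")
    post = site_gap(rows, "post")
    # Guard against a perfectly closed gap to avoid dividing by zero.
    reduction = pre / post if post > 0 else float("inf")
    return {"pre": pre, "post": post, "reduction_factor": reduction}


# Toy long-format data: one feature, two sites, before and after harmonization.
rows = [
    {"stage": "pre", "site": "A", "value": 1.0},
    {"stage": "pre", "site": "A", "value": 1.2},
    {"stage": "pre", "site": "B", "value": 2.0},
    {"stage": "pre", "site": "B", "value": 2.2},
    {"stage": "post", "site": "A", "value": 1.5},
    {"stage": "post", "site": "A", "value": 1.7},
    {"stage": "post", "site": "B", "value": 1.6},
    {"stage": "post", "site": "B", "value": 1.8},
]

kpis = site_gap_kpis(rows)
print(kpis)
```

In this toy data the per-site means move from roughly 1.1 and 2.1 before harmonization to roughly 1.6 and 1.7 after, so the gap shrinks by about a factor of ten. A real implementation may well aggregate across features or use a different distance between site distributions.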