Spaces: mekosotto/hackathon

mekosotto and Claude Opus 4.7 (1M context) committed 7 days ago

Commit d05fcf1 (1 parent: 4a861ef)

docs: Day-6 close-out — AGENTS §8 calibration + §9 demo features

- §8 calibration sub-section: documents how train() computes
precision-at-confidence bins and how the API surfaces them.
- §9 Demo Features: edge-case dropdown, trust caption, MRI ComBat
diagnostics — the three jury-day amplifiers landed in Day 6.
- README: Day 6 row in status table (165 tests green), pointers to
new plan + new /pipeline/mri/diagnostics surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Files changed (2)

AGENTS.md +28 -0
README.md +4 -0

AGENTS.md CHANGED

@@ -172,3 +172,31 @@ Parquet produces identical predictions.
 **Override `BBB_MODEL_PATH`** env var to point the API at a non-default
 artifact location (used by tests for tmp_path isolation).

+**Calibration metadata** (Day 6): `train()` does an 80/20 stratified split,
+computes precision-at-confidence-threshold bins on the held-out test set,
+and stashes them on `model._neurobridge_calibration: list[dict]` (sorted
+ascending by threshold). The API includes the bin matching each
+prediction's confidence in `BBBPredictResponse.calibration`. The UI uses this
+to render an honest trust caption ("≥75% confident → 92% precision, n=18").
+For tiny test fixtures where the stratified split fails, calibration falls
+back to zero-support bins so the API contract is always populated.
+## 9. Demo Features (Day 6)
+The frontend includes three jury-day demo amplifiers that don't change
+the core contract:
+- **Edge-case dropdown** (BBB tab): a curated catalog of 5 robustness
+  probes, including invalid SMILES, empty input, an OOD macrocycle
+  (cyclosporine-like), and a heavily halogenated aromatic. Each probe has a
+  stated expectation; the UI handles failures gracefully (HTTP 400 →
+  recoverable warning, never a crash).
+- **Calibration trust caption** (BBB decision card): renders the
+  precision-at-confidence-threshold from `BBBPredictResponse.calibration`.
+  Shows how precise held-out predictions at this confidence level were,
+  so the system signals when a prediction is less reliable.
+- **MRI ComBat diagnostics** (MRI tab): `POST /pipeline/mri/diagnostics`
+  runs the pipeline twice (pre + post ComBat) and returns long-format
+  data + site-gap KPIs (Pre, Post, Reduction factor). The UI renders
+  a faceted Altair density plot, a visual check of how far ComBat
+  reduces site-driven domain shift.
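The calibration paragraph above describes precision-at-confidence bins without spelling out their shape. The following is a minimal sketch under stated assumptions, not the code in `bbb_model.py`: it skips the 80/20 stratified split and takes held-out labels, predictions, and confidences directly; the threshold grid, the dict keys `threshold`/`precision`/`support`, and the helper names are hypothetical; "precision" is read as the fraction of correct predictions among those at or above each threshold; and a prediction's matching bin is assumed to be the highest threshold its confidence meets.

```python
from bisect import bisect_right
from typing import Optional, Sequence

# Hypothetical threshold grid; the values train() actually uses are not documented here.
DEFAULT_THRESHOLDS = (0.5, 0.6, 0.75, 0.9)


def compute_calibration_bins(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    confidences: Sequence[float],
    thresholds: Sequence[float] = DEFAULT_THRESHOLDS,
) -> list[dict]:
    """Fraction of correct held-out predictions with confidence >= each threshold.

    Bins come back sorted ascending by threshold. A bin with no supporting
    predictions gets precision 0.0 and support 0.
    """
    bins = []
    for t in sorted(thresholds):
        hits = [yt == yp for yt, yp, c in zip(y_true, y_pred, confidences) if c >= t]
        support = len(hits)
        precision = sum(hits) / support if support else 0.0
        bins.append({"threshold": t, "precision": precision, "support": support})
    return bins


def zero_support_bins(thresholds: Sequence[float] = DEFAULT_THRESHOLDS) -> list[dict]:
    """Hypothetical fallback when no held-out split is possible (e.g. tiny fixtures)."""
    return [{"threshold": t, "precision": 0.0, "support": 0} for t in sorted(thresholds)]


def matching_bin(bins: list[dict], confidence: float) -> Optional[dict]:
    """Highest-threshold bin whose threshold does not exceed `confidence`."""
    keys = [b["threshold"] for b in bins]  # bins are already ascending
    idx = bisect_right(keys, confidence) - 1
    return bins[idx] if idx >= 0 else None
```

Keeping the bins ascending lets `bisect_right` find the matching bin in logarithmic time, and returning `None` below the lowest threshold is one way to let the UI omit the caption entirely.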

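The edge-case bullet promises that an HTTP 400 becomes a recoverable warning rather than a crash. One way to structure that on the frontend is a pure function from API response to UI message, sketched here. `EdgeCaseProbe`, `PROBES`, `to_ui_message`, the placeholder SMILES, and the level names are all hypothetical; reading the error text from a `detail` key is an assumption based on how FastAPI's `HTTPException` serializes its message, not something the source confirms for this API.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class EdgeCaseProbe:
    label: str
    smiles: str
    expectation: str


# Hypothetical entries for two of the documented probe categories; the SMILES
# strings are illustrative placeholders, not the project's real catalog.
PROBES = (
    EdgeCaseProbe("Invalid SMILES", "C1CC(", "HTTP 400, shown as a recoverable warning"),
    EdgeCaseProbe("Empty input", "", "HTTP 400, shown as a recoverable warning"),
)


def to_ui_message(status_code: int, body: Optional[Mapping[str, Any]]) -> tuple[str, str]:
    """Map an API response to a (level, text) pair without ever raising."""
    body = body or {}
    if status_code == 200:
        return ("success", "Prediction available.")
    if status_code == 400:
        # Input problem: recoverable, because the user can edit the SMILES and retry.
        detail = body.get("detail", "The input could not be parsed.")
        return ("warning", f"Input rejected: {detail}")
    # Anything else is treated as a service-side failure, still without crashing.
    return ("error", f"Service error (HTTP {status_code}); please retry.")
```

Keeping the mapping free of Streamlit calls makes it unit-testable; the UI layer can then dispatch on the level, for example to `st.warning` or `st.error`.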
README.md CHANGED

@@ -15,6 +15,7 @@ and Docker shipping.
 | 3 | Image (MRI / fMRI) | [`mri_pipeline.py`](src/pipelines/mri_pipeline.py) | Shipped — 106 tests green |
 | 4 | API + MLOps + Frontend | FastAPI + MLflow + Streamlit + Docker | Shipped — 142 tests green |
 | 5 | Decision Layer (Model + XAI + Interactive UI) | [`bbb_model.py`](src/models/bbb_model.py) — RandomForest + SHAP + `POST /predict/bbb` | Shipped — 158 tests green |
+| 6 | Final Polish & Demo Features (Edge cases + Calibration + ComBat viz) | Calibration metadata + edge-case probes + `POST /pipeline/mri/diagnostics` | Shipped — 165 tests green |
 ## Quick Start
@@ -171,6 +172,7 @@ finishes in under 4 seconds on a 2024 laptop.
 - **Day 3 (shipped):** `mri_pipeline.py` — NIfTI volume loading, brain masking, ROI feature extraction, ComBat harmonization (`neuroHarmonize`) for site-level domain shift → Parquet (48 features, 106 tests green).
 - **Day 4 (shipped):** FastAPI surface in `src/api/` (POST `/pipeline/{bbb,eeg,mri}` + `/health`), MLflow experiment tracking via `src.core.tracking` (see AGENTS.md §7), Streamlit dashboard at `src/frontend/app.py`, and Docker / `docker-compose.yml` for the api + MLflow stack — 142 tests green.
 - **Day 5 (shipped):** Decision layer in `src/models/bbb_model.py` — RandomForest BBB classifier on Morgan fingerprints, SHAP top-k explanations, `POST /predict/bbb` endpoint, interactive Streamlit BBB tab with SMILES input + decision card + SHAP bar chart, and trainer CLI (`python -m src.models.bbb_model`). See AGENTS.md §8 — 158 tests green.
+- **Day 6 (shipped):** Final polish & demo features — calibration metadata bins on the BBB classifier (precision-at-confidence in `BBBPredictResponse.calibration`), edge-case dropdown in the Streamlit BBB tab (5 curated robustness probes), trust caption on the decision card, and `POST /pipeline/mri/diagnostics` returning Pre/Post ComBat long-format data + site-gap KPIs visualized as a faceted Altair KDE in the MRI tab. See AGENTS.md §8 (calibration) + §9 (demo features) — 165 tests green.
 ## Where to Look
@@ -190,3 +192,5 @@ finishes in under 4 seconds on a 2024 laptop.
 - **Day-4 tests:** [`tests/api/`](tests/api/), [`tests/frontend/`](tests/frontend/), [`tests/pipelines/test_cross_pipeline_smoke.py`](tests/pipelines/test_cross_pipeline_smoke.py)
 - **Day-5 plan (full TDD task breakdown):** [`docs/superpowers/plans/2026-05-03-day5-downstream-model-xai-interactive.md`](docs/superpowers/plans/2026-05-03-day5-downstream-model-xai-interactive.md)
 - **BBB downstream model (classifier + SHAP explainer + trainer CLI):** [`src/models/bbb_model.py`](src/models/bbb_model.py) + [`tests/models/test_bbb_model.py`](tests/models/test_bbb_model.py) (12 tests)
+- **Day-6 plan (full TDD task breakdown):** [`docs/superpowers/plans/2026-05-04-day6-final-polish-demo-features.md`](docs/superpowers/plans/2026-05-04-day6-final-polish-demo-features.md)
+- **MRI ComBat diagnostics surface (pre/post site-gap KPIs):** `POST /pipeline/mri/diagnostics` — see [`src/api/routes.py`](src/api/routes.py) + [`src/pipelines/mri_pipeline.py`](src/pipelines/mri_pipeline.py)
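The diagnostics endpoint reports three site-gap KPIs (Pre, Post, Reduction factor), but the source does not define them. The sketch below assumes one plausible definition: the site gap for a stage is the spread between the highest and lowest per-site mean of a single feature in long-format rows, and the reduction factor is Pre divided by Post. The row keys `stage`, `site`, and `value`, the stage labels `"pre"`/`"post"`, and the function names are hypothetical; the real endpoint may aggregate across all 48 MRI features or use a different gap statistic.

```python
from collections import defaultdict
from statistics import mean
from typing import Iterable, Mapping


def site_gap(rows: Iterable[Mapping], stage: str) -> float:
    """Spread of per-site means for one stage: max(site mean) - min(site mean)."""
    by_site: dict[str, list[float]] = defaultdict(list)
    for r in rows:
        if r["stage"] == stage:
            by_site[r["site"]].append(float(r["value"]))
    site_means = [mean(vals) for vals in by_site.values()]
    return max(site_means) - min(site_means) if site_means else 0.0


def site_gap_kpis(rows: list[Mapping]) -> dict[str, float]:
    """Hypothetical Pre/Post/Reduction-factor KPIs from long-format rows."""
    pre = site_gap(rows, "pre")
    post = site_gap(rows, "post")
    if post > 0:
        reduction = pre / post
    elif pre > 0:
        reduction = float("inf")  # harmonization closed a non-zero gap completely
    else:
        reduction = 1.0  # no gap before or after: nothing to reduce
    return {"pre": pre, "post": post, "reduction_factor": reduction}
```

As a worked example, two sites whose pre-ComBat means are 11 and 16 and whose post-ComBat means are 13.5 and 14.5 give Pre = 5.0, Post = 1.0, and a reduction factor of 5.0.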