mekosotto Claude Opus 4.7 (1M context) committed on
Commit 053cbbc · 1 Parent(s): e8e922d

docs(readme): document MRI 2D, clinical RAG, EEG stub; bump test count to 330+2

Adds Day-10 (fusion engine) and Day-11 (external assets integration) rows to
the status table, plus three new feature sections:

- 'MRI Deep-Learning Backends' — volumetric_onnx default vs resnet18_2d
(4-class Alzheimer's). Streamlit auto-adapts the form; switch via
MRI_MODEL_KIND env without restarting workers.
- 'Clinical Corpus (TF-IDF, Turkish + English)' — 14-PDF index with
TR->EN query expansion (egzersiz/beslenme/unutkanlik/...). Agent calls
retrieve_context(corpus="clinical"); CLI smoke at scripts/clinical_rag_smoke.py.
- 'EEG Pretrained Classifier (stub-able for demo)' — POST /predict/eeg loads
any sklearn predict_proba joblib. Default stub flows into fusion as the
eeg modality with zero code changes when the real artifact arrives.

Updates Quick Start and Executive Summary test counts to 330+2 and lists
the new env-var demo lifelines (MRI_MODEL_PATH_2D, EEG_CLF_ARTIFACT,
CLINICAL_RAG_INDEX_PATH).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Files changed (1)
  1. README.md +58 -2
README.md CHANGED
@@ -25,7 +25,7 @@ short_description: Living decision system for BBB, EEG, and MRI clinical ML
25
 
26
  **4.** Adapt-Over-Time is built in: each FastAPI worker keeps a rolling 100-prediction window; the trailing median is z-scored against the train-time confidence distribution and surfaced both in the API response and the UI ("trailing-100 confidence median is +1.42σ from training distribution — mild distribution shift").
27
 
28
- **5.** Current verification: 242 passed, 2 skipped. Demo lifelines (`NEUROBRIDGE_DISABLE_MLFLOW=1`, `NEUROBRIDGE_DISABLE_LLM=1`, `BBB_MODEL_PATH`, `MRI_MODEL_PATH`) keep the system usable when MLflow, OpenRouter, or model artifacts are unavailable.
29
 
30
  ## Status
31
 
@@ -40,6 +40,8 @@ short_description: Living decision system for BBB, EEG, and MRI clinical ML
40
  | 7 | Final 5% (Drift, Traceability & Agents) | Per-worker drift z-score + MLflow provenance badge + `POST /explain/bbb` (LLM + template fallback) + AI Assistant tab | Shipped |
41
  | 8 | Grand Finale (Multi-Modal Agents, Track 5 & Public Deploy) | Multi-modal explainers + experiments + deploy surface | Shipped |
42
  | 9 | Agent/RAG hardening + MRI DL decision layer | Guarded orchestration + `POST /predict/mri` ONNX surface | Shipped — 242 passed, 2 skipped |
43
 
44
  ### Fusion Engine
45
 
@@ -54,6 +56,60 @@ heuristic — adjust there. **BBB is intentionally NOT a fusion modality**:
54
  it is a researcher-side concern (drug permeability) and stays decoupled
55
  from disease classification.
56
 
57
  ## Quick Start
58
 
59
  **Prerequisite:** Python 3.10–3.12. The pinned `requirements.txt` has no cp313+ wheels;
@@ -63,7 +119,7 @@ from disease classification.
63
  # 1. Create venv and install
64
  python3.12 -m venv .venv312 && source .venv312/bin/activate && pip install -r requirements.txt
65
 
66
- # 2. Verify — current full suite: 242 passed, 2 skipped
67
  pytest -v
68
 
69
  # 3. Smoke run with the bundled 6-row fixture
 
25
 
26
  **4.** Adapt-Over-Time is built in: each FastAPI worker keeps a rolling 100-prediction window; the trailing median is z-scored against the train-time confidence distribution and surfaced both in the API response and the UI ("trailing-100 confidence median is +1.42σ from training distribution — mild distribution shift").
27
 
28
+ **5.** Current verification: 330 passed, 2 skipped. Demo lifelines (`NEUROBRIDGE_DISABLE_MLFLOW=1`, `NEUROBRIDGE_DISABLE_LLM=1`, `BBB_MODEL_PATH`, `MRI_MODEL_PATH`, `MRI_MODEL_PATH_2D`, `EEG_CLF_ARTIFACT`, `CLINICAL_RAG_INDEX_PATH`) keep the system usable when MLflow, OpenRouter, or model artifacts are unavailable.
29
 
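The drift check described in point 4 can be sketched as follows. This is a minimal illustration, not the project's implementation: the class name `DriftMonitor`, the use of a train-time mean and standard deviation to form the z-score, and the example numbers are all assumptions.

```python
from collections import deque
from statistics import median


class DriftMonitor:
    """Rolling-window drift check (illustrative; all names are hypothetical)."""

    def __init__(self, train_mean: float, train_std: float, window: int = 100):
        if train_std <= 0:
            raise ValueError("train_std must be positive")
        self.train_mean = train_mean
        self.train_std = train_std
        # deque(maxlen=...) silently drops the oldest entry once the window is full.
        self.confidences = deque(maxlen=window)

    def observe(self, confidence: float) -> float:
        """Record one prediction confidence and return the current z-score."""
        self.confidences.append(confidence)
        return (median(self.confidences) - self.train_mean) / self.train_std


# Assumed train-time statistics, for demonstration only.
monitor = DriftMonitor(train_mean=0.80, train_std=0.05)
for c in (0.90, 0.88, 0.86):
    z = monitor.observe(c)
print(f"trailing median is {z:+.2f} sigma from the training distribution")
```

Because each worker would hold its own instance, the window in this sketch is per-process, which matches the "each FastAPI worker" wording above.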
30
  ## Status
31
 
 
40
  | 7 | Final 5% (Drift, Traceability & Agents) | Per-worker drift z-score + MLflow provenance badge + `POST /explain/bbb` (LLM + template fallback) + AI Assistant tab | Shipped |
41
  | 8 | Grand Finale (Multi-Modal Agents, Track 5 & Public Deploy) | Multi-modal explainers + experiments + deploy surface | Shipped |
42
  | 9 | Agent/RAG hardening + MRI DL decision layer | Guarded orchestration + `POST /predict/mri` ONNX surface | Shipped — 242 passed, 2 skipped |
43
+ | 10 | Multi-modal fusion engine | `POST /fusion/predict` + `run_fusion` agent tool — MRI + EEG + clinical scores → per-disease confidence with attribution | Shipped — 295 passed, 1 skipped |
44
+ | 11 | External assets integration | 2D resnet18 MRI Alzheimer's path · TF-IDF clinical RAG with TR query expansion · stub-able EEG pretrained classifier | Shipped — 330 passed, 2 skipped |
45
 
46
  ### Fusion Engine
47
 
 
56
  it is a researcher-side concern (drug permeability) and stays decoupled
57
  from disease classification.
58
 
59
+ ### MRI Deep-Learning Backends
60
+
61
+ The MRI prediction route supports two backends, selected via the `MRI_MODEL_KIND` environment variable at request time:
62
+
63
+ - `MRI_MODEL_KIND=volumetric_onnx` (default). Loads an ONNX volumetric model
64
+ from `MRI_MODEL_PATH` (default `data/processed/mri_model.onnx`). Input:
65
+ `.nii` / `.nii.gz`. Two-class output by default (`control`, `abnormal`).
66
+ - `MRI_MODEL_KIND=resnet18_2d`. Loads a PyTorch state_dict from
67
+ `MRI_MODEL_PATH_2D` (default `data/processed/mri_dl_2d/best_model.pt`).
68
+ Input: 2D image (`.png` / `.jpg`). 4-class Alzheimer's classifier:
69
+ `MildDemented`, `ModerateDemented`, `NonDemented`, `VeryMildDemented`.
70
+ The trainer's `BEST_PARAMS` bake in `image_size=160`, ImageNet normalisation,
71
+ and a resnet18 backbone with a 4-class head.
72
+
73
+ The Streamlit `Predict` tab auto-adapts its form to the active backend.
74
+ Backends can be switched without restarting workers because `MRI_MODEL_KIND` is read on each request.
75
+
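The per-request backend selection described above could look roughly like this. The helper name `resolve_mri_backend` and the `DEFAULT_PATHS` table are hypothetical; only the environment variable names and default paths come from the README text.

```python
import os

# Variable names and default paths are taken from the README text above;
# the table itself and the helper below are illustrative assumptions.
DEFAULT_PATHS = {
    "volumetric_onnx": ("MRI_MODEL_PATH", "data/processed/mri_model.onnx"),
    "resnet18_2d": ("MRI_MODEL_PATH_2D", "data/processed/mri_dl_2d/best_model.pt"),
}


def resolve_mri_backend(environ=None):
    """Return (kind, model_path) for the current request (hypothetical helper)."""
    env = os.environ if environ is None else environ
    kind = env.get("MRI_MODEL_KIND", "volumetric_onnx")
    if kind not in DEFAULT_PATHS:
        raise ValueError(f"unknown MRI_MODEL_KIND: {kind!r}")
    path_var, default_path = DEFAULT_PATHS[kind]
    return kind, env.get(path_var, default_path)


print(resolve_mri_backend({"MRI_MODEL_KIND": "resnet18_2d"}))
```

Calling the helper inside the request handler, rather than once at import time, is what would let a changed value take effect without a worker restart.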
76
+ ### Clinical Corpus (TF-IDF, Turkish + English)
77
+
78
+ A second RAG index covers 14 peer-reviewed PDFs (Alzheimer's, Parkinson's,
79
+ lifestyle, nutrition, exercise) using TF-IDF + sklearn. The source PDFs live in
80
+ `data/external_rag/clinical_pdfs/` (gitignored; copy them from the team
81
+ shared drive), and the pre-built index is at `data/external_rag/index/rag_index.pkl`.
82
+
83
+ Agent invocation:
84
+
85
+ ```python
86
+ retrieve_context(query="egzersiz Alzheimer feedback", corpus="clinical", k=5)
87
+ ```
88
+
89
+ Local CLI smoke:
90
+
91
+ ```bash
92
+ python scripts/clinical_rag_smoke.py "egzersiz Alzheimer feedback"
93
+ ```
94
+
95
+ The Turkish keywords `alzheimer`, `parkinson`, `egzersiz`, `beslenme`,
96
+ `tani`, `tedavi`, `risk`, `unutkanlik`, `titreme`, `demans` auto-expand
97
+ to English equivalents so Turkish queries hit English chunks.
98
+
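The keyword expansion described above can be illustrated with a small sketch. The mapping `TR_TO_EN` and the function `expand_query` are assumptions made for this example; the project's actual expansion table and tokenisation are not shown in this diff.

```python
# Hypothetical TR -> EN table covering the keywords listed above.
TR_TO_EN = {
    "alzheimer": ["alzheimer"],
    "parkinson": ["parkinson"],
    "egzersiz": ["exercise"],
    "beslenme": ["nutrition"],
    "tani": ["diagnosis"],
    "tedavi": ["treatment"],
    "risk": ["risk"],
    "unutkanlik": ["forgetfulness"],
    "titreme": ["tremor"],
    "demans": ["dementia"],
}


def expand_query(query: str) -> str:
    """Append English equivalents of known Turkish keywords to the query."""
    tokens = query.lower().split()
    extra = []
    for tok in tokens:
        for en in TR_TO_EN.get(tok, []):
            # Skip terms that are already present to avoid duplicate weighting.
            if en not in tokens and en not in extra:
                extra.append(en)
    return " ".join(tokens + extra)


print(expand_query("egzersiz Alzheimer feedback"))
```

Appending rather than replacing keeps the original Turkish terms in the query, so any Turkish chunks in the corpus can still match.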
99
+ ### EEG Pretrained Classifier (stub-able for demo)
100
+
101
+ `POST /predict/eeg` runs an sklearn-style classifier (any `predict_proba`
102
+ interface) on a feature vector and returns probability + attribution. The
103
+ artifact loads from `data/processed/eeg_clf.joblib` (override via
104
+ `EEG_CLF_ARTIFACT`). Default labels are `(control, alzheimers)` — override
105
+ via `EEG_CLF_LABELS=label0,label1,...`.
106
+
107
+ For the hackathon demo, a synthetic stub
108
+ (`tests/fixtures/build_dummy_eeg_clf.py`) is acceptable. Dropping the real
109
+ `.joblib` at the artifact path swaps in production weights with **zero
110
+ code changes**. The fusion engine consumes this prediction as the `eeg`
111
+ modality automatically.
112
+
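The environment overrides described above might be resolved along these lines. The helpers `resolve_eeg_config` and `label_probabilities` are hypothetical; only the variable names, the default artifact path, and the default labels come from the README text.

```python
import os

# Defaults quoted from the README text; the helpers are illustrative assumptions.
DEFAULT_ARTIFACT = "data/processed/eeg_clf.joblib"
DEFAULT_LABELS = ("control", "alzheimers")


def resolve_eeg_config(environ=None):
    """Return (artifact_path, labels), honouring the env overrides."""
    env = os.environ if environ is None else environ
    artifact = env.get("EEG_CLF_ARTIFACT", DEFAULT_ARTIFACT)
    raw = env.get("EEG_CLF_LABELS", "")
    # Comma-separated list; empty entries and surrounding spaces are ignored.
    labels = tuple(p.strip() for p in raw.split(",") if p.strip()) or DEFAULT_LABELS
    return artifact, labels


def label_probabilities(labels, proba_row):
    """Pair one row of predict_proba output with its class labels."""
    if len(labels) != len(proba_row):
        raise ValueError("label count does not match classifier output width")
    return dict(zip(labels, proba_row))


artifact, labels = resolve_eeg_config({})
print(artifact, label_probabilities(labels, [0.3, 0.7]))
```

Checking the label count against the classifier output width is a cheap guard for the moment the stub is replaced by a real artifact with a different number of classes.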
113
  ## Quick Start
114
 
115
  **Prerequisite:** Python 3.10–3.12. The pinned `requirements.txt` has no cp313+ wheels;
 
119
  # 1. Create venv and install
120
  python3.12 -m venv .venv312 && source .venv312/bin/activate && pip install -r requirements.txt
121
 
122
+ # 2. Verify — current full suite: 330 passed, 2 skipped
123
  pytest -v
124
 
125
  # 3. Smoke run with the bundled 6-row fixture