mekosotto Claude Opus 4.7 (1M context) committed on
Commit 8ec95a0 · 1 Parent(s): 270f76f

docs(plan): swap OASIS branch for EEG stub plan; soften MRI prerequisite


User clarified intent:
1. MRI training is final — best_model.pt is the artifact, just load+predict.
Removed the ambiguous 'state_dict vs full model' blocker framing; tests
already use a synthetic dummy resnet18 so TDD works before the real .pt
lands. Real-artifact sanity test auto-skips when absent.
2. EEG pretrained model assumed present for the hackathon demo. Replaced
the OASIS-tabular fork with a stub-able EEG plan: src/models/eeg_model.py
loads any sklearn predict_proba classifier from joblib, with a synthetic
stub fixture so tests pass today. Real artifact swaps in at
data/processed/eeg_clf.joblib with zero code changes.

Roadmap, MRI plan, and the new EEG plan all preserve independence
guarantees and align on the same loader+stub pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

docs/superpowers/plans/2026-05-02-eeg-stub-integration.md ADDED
@@ -0,0 +1,542 @@
1
+ # EEG Pretrained Classifier — Stub Integration Plan
2
+
3
+ > **For agentic workers:** REQUIRED SUB-SKILL: `superpowers:subagent-driven-development`. TDD throughout.
4
+
5
+ **Goal.** Add an EEG classifier to the decision layer that flows into the fusion engine as the `eeg` modality. The real pretrained artifact will arrive later; for the hackathon demo we ship a stub-able contract so the entire flow (Streamlit → API → fusion) works **today**, and swapping in the real `.joblib` later is a one-file drop with **zero** code changes.
6
+
7
+ **Architecture.** New module `src/models/eeg_model.py` parallel to `src/models/mri_dl_2d.py`. Loads an sklearn-style classifier with `joblib`, runs `predict_proba` on a feature row produced by the existing `src/pipelines/eeg_pipeline.py` (which already extracts band-power features). Output dict shape mirrors the other model surfaces, so the API and fusion engine consume it without dispatch logic. A new route `POST /predict/eeg` exposes it. The fusion engine already accepts an `eeg` `ModalityPrediction` — no fusion code changes.
8
+
9
+ **Contract for the eventual real artifact.**
10
+
11
+ ```
12
+ - Path: data/processed/eeg_clf.joblib (override via EEG_CLF_ARTIFACT env)
13
+ - Type: any object with sklearn's predict_proba interface (e.g. RandomForest,
14
+ SVC with probability=True, MLPClassifier, or a thin wrapper around
15
+ a torch model)
16
+ - Input: numpy array of shape (1, n_features) where n_features matches the
17
+ column count of eeg_pipeline.py's parquet output
18
+ - Output: probability vector of length len(EEG_CLF_LABELS); default labels
19
+ are ("control", "alzheimers")
20
+ ```
21
+
22
+ The stub fixture (`tests/fixtures/build_dummy_eeg_clf.py`) writes a `RandomForestClassifier` with the same interface, so the entire pipeline is testable before the real model arrives.
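
To make the contract concrete, here is a sketch of how the eventual real artifact could be exported so it satisfies the interface above. The `n_features` value and the local filename are illustrative, not the project's real values; any fitted classifier exposing `predict_proba` works the same way.

```python
# Sketch: exporting any predict_proba classifier to satisfy the contract.
# n_features and the local path here are illustrative assumptions.
import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

n_features = 16  # must match the eeg_pipeline.py parquet column count
rng = np.random.default_rng(0)
X = rng.normal(size=(40, n_features))
X[20:] += 1.0                      # crude class separation
y = np.array([0] * 20 + [1] * 20)  # 0 = control, 1 = alzheimers

clf = LogisticRegression(max_iter=500).fit(X, y)
joblib.dump(clf, "eeg_clf.joblib")  # then drop at data/processed/eeg_clf.joblib

proba = joblib.load("eeg_clf.joblib").predict_proba(X[:1])  # shape (1, 2)
```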
23
+
24
+ **Tech stack.** scikit-learn (already in deps), joblib, numpy, pandas. No new dependencies.
25
+
26
+ ---
27
+
28
+ ## Asset note
29
+
30
+ For the demo we **assume the real EEG model exists**. Tests use the stub fixture so they pass regardless. When the real artifact arrives:
31
+
32
+ 1. Save it to `data/processed/eeg_clf.joblib`.
33
+ 2. If its label order isn't `("control", "alzheimers")`, set `EEG_CLF_LABELS=label0,label1,...` env (comma-separated). The fusion engine's `signal_for_disease` already case-insensitively matches labels, so as long as one of them is `"alzheimers"` (or `"parkinsons"`), it flows.
34
+ 3. If `n_features` doesn't match the pipeline's parquet output, update the EEG pipeline's feature contract — out of scope for this plan, separate sub-plan if needed.
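
The label swap in step 2 can be sketched as follows. This mirrors the comma-separated parsing the loader uses (env var name from the contract above; the second assertion is what lets the fusion engine's case-insensitive matching pick the disease up):

```python
import os

# Operator-side swap sketch: the real artifact uses a different class order.
os.environ["EEG_CLF_LABELS"] = "no_disease,alzheimers"

# Parsed exactly like the plan's loader: split on commas, drop blanks.
labels = tuple(
    s.strip() for s in os.environ["EEG_CLF_LABELS"].split(",") if s.strip()
)
assert labels == ("no_disease", "alzheimers")
assert any(l.lower() == "alzheimers" for l in labels)  # fusion can match it
```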
35
+
36
+ ---
37
+
38
+ ## File structure
39
+
40
+ | Path | Responsibility |
41
+ |---|---|
42
+ | Create `src/models/eeg_model.py` | sklearn-style classifier loader + `predict_features()` |
43
+ | Modify `src/api/routes.py` | new route `POST /predict/eeg` |
44
+ | Modify `src/api/schemas.py` | `EEGPredictRequest` / `EEGPredictResponse` |
45
+ | Create `tests/fixtures/build_dummy_eeg_clf.py` | stub joblib-pickled RF for tests |
46
+ | Create `tests/models/test_eeg_model.py` | unit tests for loader + predict |
47
+ | Create `tests/api/test_eeg_predict_route.py` | integration test through `POST /predict/eeg` |
48
+ | Create `tests/fusion/test_eeg_modality_flow.py` | confirms an EEG prediction flows into fusion as the `eeg` modality |
49
+ | Create `tests/models/test_eeg_model_real.py` | real-artifact sanity (skips when absent — same pattern as MRI Task 4) |
50
+ | Modify `README.md` | document the contract + how to swap the real artifact in |
51
+
52
+ ---
53
+
54
+ ## Tasks
55
+
56
+ ### Task 1: EEG model module + dummy fixture
57
+
58
+ **Files:**
59
+ - Create: `src/models/eeg_model.py`
60
+ - Create: `tests/fixtures/build_dummy_eeg_clf.py`
61
+ - Create: `tests/models/test_eeg_model.py`
62
+
63
+ - [ ] **Step 1: Dummy fixture.**
64
+
65
+ `tests/fixtures/build_dummy_eeg_clf.py`:
66
+
67
+ ```python
68
+ """Build a stub EEG classifier (sklearn RF) for tests.
69
+
70
+ Demo-time placeholder — produces a 2-class probability output matching the
71
+ eeg_model.predict_features contract. Replace with the real artifact when
72
+ the user provides it; tests don't change.
73
+ """
74
+ from __future__ import annotations
75
+
76
+ from pathlib import Path
77
+
78
+ import joblib
79
+ import numpy as np
80
+ from sklearn.ensemble import RandomForestClassifier
81
+
82
+
83
+ def build(path: Path, n_features: int = 16, seed: int = 0) -> Path:
84
+ """Save a fitted RandomForestClassifier at `path` and return the path."""
85
+ path = Path(path)
86
+ if path.exists():
87
+ return path
88
+ path.parent.mkdir(parents=True, exist_ok=True)
89
+
90
+ rng = np.random.default_rng(seed)
91
+ n = 200
92
+ n_alz = n // 2
93
+ # Synthetic separable features: alzheimers half has higher mean.
94
+ X_ctrl = rng.normal(0.0, 1.0, size=(n - n_alz, n_features))
95
+ X_alz = rng.normal(2.0, 1.0, size=(n_alz, n_features))
96
+ X = np.vstack([X_ctrl, X_alz])
97
+ y = np.array([0] * (n - n_alz) + [1] * n_alz)
98
+
99
+ clf = RandomForestClassifier(n_estimators=12, max_depth=6, random_state=seed)
100
+ clf.fit(X, y)
101
+ joblib.dump(clf, str(path))
102
+ return path
103
+ ```
104
+
105
+ - [ ] **Step 2: Failing test.**
106
+
107
+ `tests/models/test_eeg_model.py`:
108
+
109
+ ```python
110
+ """Tests for src.models.eeg_model."""
111
+ from __future__ import annotations
112
+
113
+ from pathlib import Path
114
+
115
+ import numpy as np
116
+ import pytest
117
+
118
+ from src.models import eeg_model
119
+ from tests.fixtures.build_dummy_eeg_clf import build as build_dummy_eeg
120
+
121
+
122
+ class TestEEGModel:
123
+ def test_load_missing_artifact_raises(self, tmp_path: Path) -> None:
124
+ with pytest.raises(FileNotFoundError, match="EEG classifier artifact not found"):
125
+ eeg_model.load(tmp_path / "nope.joblib")
126
+
127
+ def test_predict_returns_full_dict(self, tmp_path: Path) -> None:
128
+ ckpt = build_dummy_eeg(tmp_path / "eeg.joblib", n_features=16)
129
+ clf = eeg_model.load(ckpt)
130
+ features = np.zeros((16,), dtype=np.float32)
131
+
132
+ out = eeg_model.predict_features(clf, features)
133
+
134
+ assert set(out) == {"label", "label_text", "confidence", "probabilities"}
135
+ assert out["label"] in {0, 1}
136
+ assert out["label_text"] in eeg_model.DEFAULT_LABELS
137
+ assert 0.0 <= out["confidence"] <= 1.0
138
+ probs = out["probabilities"]
139
+ assert len(probs) == 2
140
+ assert abs(sum(p["probability"] for p in probs) - 1.0) < 1e-5
141
+
142
+ def test_alzheimers_separation_with_synthetic_features(self, tmp_path: Path) -> None:
143
+ # Synthetic stub clusters alzheimers around mean=2.0, control around 0.0.
144
+ ckpt = build_dummy_eeg(tmp_path / "eeg.joblib", n_features=16)
145
+ clf = eeg_model.load(ckpt)
146
+ alz_features = np.full((16,), 2.0, dtype=np.float32)
147
+ ctrl_features = np.zeros((16,), dtype=np.float32)
148
+
149
+ alz_pred = eeg_model.predict_features(clf, alz_features)
150
+ ctrl_pred = eeg_model.predict_features(clf, ctrl_features)
151
+
152
+ assert alz_pred["label_text"] == "alzheimers"
153
+ assert ctrl_pred["label_text"] == "control"
154
+
155
+ def test_label_override_via_env(self, tmp_path: Path, monkeypatch) -> None:
156
+ monkeypatch.setenv("EEG_CLF_LABELS", "no_disease,alzheimers")
157
+ ckpt = build_dummy_eeg(tmp_path / "eeg.joblib", n_features=16)
158
+ clf = eeg_model.load(ckpt)
159
+ out = eeg_model.predict_features(clf, np.zeros((16,), dtype=np.float32))
160
+ assert out["label_text"] in {"no_disease", "alzheimers"}
161
+
162
+ def test_feature_count_mismatch_raises(self, tmp_path: Path) -> None:
163
+ ckpt = build_dummy_eeg(tmp_path / "eeg.joblib", n_features=16)
164
+ clf = eeg_model.load(ckpt)
165
+ with pytest.raises(ValueError, match="feature count"):
166
+ eeg_model.predict_features(clf, np.zeros((8,), dtype=np.float32))
167
+ ```
168
+
169
+ Run → `ModuleNotFoundError: No module named 'src.models.eeg_model'`.
170
+
171
+ - [ ] **Step 3: Minimal impl.**
172
+
173
+ `src/models/eeg_model.py`:
174
+
175
+ ```python
176
+ """EEG classifier inference utilities.
177
+
178
+ Loads any sklearn-style classifier (object with `predict_proba`) from joblib
179
+ and emits the same dict shape as src.models.mri_model.predict_with_proba so
180
+ the API surface and fusion engine treat MRI and EEG predictions identically.
181
+
182
+ The real pretrained artifact swaps in at data/processed/eeg_clf.joblib (or
183
+ override via EEG_CLF_ARTIFACT env). Tests use a stub fixture; the real model
184
+ drops in without code changes.
185
+ """
186
+ from __future__ import annotations
187
+
188
+ import os
189
+ from pathlib import Path
190
+ from typing import Any, Sequence
191
+
192
+ import joblib
193
+ import numpy as np
194
+
195
+ from src.core.logger import get_logger
196
+
197
+ logger = get_logger(__name__)
198
+
199
+ DEFAULT_LABELS: tuple[str, ...] = ("control", "alzheimers")
200
+
201
+
202
+ def _resolve_labels() -> tuple[str, ...]:
203
+ raw = os.environ.get("EEG_CLF_LABELS")
204
+ if not raw:
205
+ return DEFAULT_LABELS
206
+ return tuple(s.strip() for s in raw.split(",") if s.strip())
207
+
208
+
209
+ def load(path: Path) -> Any:
210
+ path = Path(path)
211
+ if not path.exists():
212
+ raise FileNotFoundError(f"EEG classifier artifact not found: {path}")
213
+ return joblib.load(str(path))
214
+
215
+
216
+ def predict_features(
217
+ model: Any,
218
+ features: np.ndarray,
219
+ labels: Sequence[str] | None = None,
220
+ ) -> dict[str, Any]:
221
+ """Run inference on one row of EEG features.
222
+
223
+ Args:
224
+ model: sklearn-style classifier (must expose `predict_proba`).
225
+ features: 1-D numpy array of shape (n_features,) matching the
226
+ classifier's training-time feature count.
227
+ labels: optional label tuple. Defaults to env-derived or ("control",
228
+ "alzheimers").
229
+ """
230
+ arr = np.asarray(features, dtype=np.float32).reshape(-1)
231
+ expected = int(getattr(model, "n_features_in_", arr.size))
232
+ if arr.size != expected:
233
+ raise ValueError(
234
+ f"EEG feature count mismatch: model expects {expected}, got {arr.size}"
235
+ )
236
+
237
+ proba = np.asarray(model.predict_proba(arr.reshape(1, -1))[0], dtype=np.float32)
238
+ label_names = tuple(labels or _resolve_labels())
239
+ if len(label_names) != proba.shape[0]:
240
+ logger.warning(
241
+ "EEG label count (%d) != model output dim (%d); falling back to class_0..N",
242
+ len(label_names), proba.shape[0],
243
+ )
244
+ label_names = tuple(f"class_{i}" for i in range(proba.shape[0]))
245
+
246
+ label_idx = int(np.argmax(proba))
247
+ return {
248
+ "label": label_idx,
249
+ "label_text": label_names[label_idx],
250
+ "confidence": float(proba[label_idx]),
251
+ "probabilities": [
252
+ {"label": i, "label_text": label_names[i], "probability": float(p)}
253
+ for i, p in enumerate(proba)
254
+ ],
255
+ }
256
+ ```
257
+
258
+ Run tests → expect 5 passed.
259
+
260
+ - [ ] **Step 4:** `pytest -q` no regressions.
261
+
262
+ - [ ] **Step 5:** commit:
263
+
264
+ ```bash
265
+ git add src/models/eeg_model.py tests/fixtures/build_dummy_eeg_clf.py tests/models/test_eeg_model.py
266
+ git commit -m "feat(models): EEG classifier loader + predict (stub-able for hackathon demo)"
267
+ ```
268
+
269
+ ---
270
+
271
+ ### Task 2: `POST /predict/eeg` route
272
+
273
+ **Files:**
274
+ - Modify: `src/api/schemas.py` (add `EEGPredictRequest` / `EEGPredictResponse`)
275
+ - Modify: `src/api/routes.py`
276
+ - Create: `tests/api/test_eeg_predict_route.py`
277
+
278
+ - [ ] **Step 1: Failing test.**
279
+
280
+ `tests/api/test_eeg_predict_route.py`:
281
+
282
+ ```python
283
+ """Integration: POST /predict/eeg."""
284
+ from __future__ import annotations
285
+
286
+ from pathlib import Path
287
+
288
+ import pytest
289
+ from fastapi.testclient import TestClient
290
+
291
+ from src.api.main import app
292
+ from tests.fixtures.build_dummy_eeg_clf import build as build_dummy_eeg
293
+
294
+
295
+ @pytest.fixture()
296
+ def client(monkeypatch, tmp_path):
297
+ artifact = build_dummy_eeg(tmp_path / "eeg.joblib", n_features=16)
298
+ monkeypatch.setenv("EEG_CLF_ARTIFACT", str(artifact))
299
+ return TestClient(app)
300
+
301
+
302
+ def test_predict_eeg_happy_path(client):
303
+ body = {"features": [0.0] * 16}
304
+ r = client.post("/predict/eeg", json=body)
305
+ assert r.status_code == 200, r.text
306
+ data = r.json()
307
+ assert data["label_text"] in {"control", "alzheimers"}
308
+ assert 0.0 <= data["confidence"] <= 1.0
309
+ assert len(data["probabilities"]) == 2
310
+
311
+
312
+ def test_predict_eeg_alzheimers_profile(client):
313
+ body = {"features": [2.0] * 16}
314
+ r = client.post("/predict/eeg", json=body)
315
+ assert r.status_code == 200, r.text
316
+ data = r.json()
317
+ assert data["label_text"] == "alzheimers"
318
+
319
+
320
+ def test_predict_eeg_feature_mismatch_returns_500(client):
321
+ # Stub was trained on 16 features; sending 8 must surface as a 500 (or 400).
322
+ body = {"features": [0.0] * 8}
323
+ r = client.post("/predict/eeg", json=body)
324
+ assert r.status_code in {400, 500}
325
+ ```
326
+
327
+ - [ ] **Step 2: Schemas.**
328
+
329
+ In `src/api/schemas.py`, append (before the fusion re-export block):
330
+
331
+ ```python
332
+ class EEGPredictRequest(BaseModel):
333
+ features: list[float] = Field(
334
+ ..., min_length=1,
335
+ description="EEG features matching the classifier's training-time feature count.",
336
+ )
337
+
338
+
339
+ class EEGClassProbability(BaseModel):
340
+ label: int
341
+ label_text: str
342
+ probability: float
343
+
344
+
345
+ class EEGPredictResponse(BaseModel):
346
+ label: int
347
+ label_text: str
348
+ confidence: float
349
+ probabilities: list[EEGClassProbability]
350
+ ```
351
+
352
+ - [ ] **Step 3: Route.**
353
+
354
+ In `src/api/routes.py`, add near the existing predict routes:
355
+
356
+ ```python
357
+ @predict_router.post("/eeg", response_model=EEGPredictResponse)
358
+ def predict_eeg(req: EEGPredictRequest) -> EEGPredictResponse:
359
+ import os
360
+ from pathlib import Path
361
+ import numpy as np
362
+ from src.models import eeg_model
363
+
364
+ artifact = Path(os.environ.get("EEG_CLF_ARTIFACT", "data/processed/eeg_clf.joblib"))
365
+ clf = eeg_model.load(artifact)
366
+ features = np.asarray(req.features, dtype=np.float32)
367
+ out = eeg_model.predict_features(clf, features)
368
+ return EEGPredictResponse(**out)
369
+ ```
370
+
371
+ Add `EEGPredictRequest`, `EEGPredictResponse` to the schema imports at the top of `routes.py`.
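
For reference, a request/response pair for the new route might look like the sketch below (all values illustrative; the response mirrors `EEGPredictResponse`):

```python
# Illustrative request/response shapes for POST /predict/eeg (values made up).
request_body = {"features": [0.0] * 16}

response_body = {
    "label": 0,
    "label_text": "control",
    "confidence": 0.91,
    "probabilities": [
        {"label": 0, "label_text": "control", "probability": 0.91},
        {"label": 1, "label_text": "alzheimers", "probability": 0.09},
    ],
}
assert len(request_body["features"]) == 16
assert abs(sum(p["probability"] for p in response_body["probabilities"]) - 1.0) < 1e-9
```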
372
+
373
+ - [ ] **Step 4:** `pytest tests/api/test_eeg_predict_route.py -v` → 3 passed.
374
+
375
+ - [ ] **Step 5:** commit: `feat(api): add POST /predict/eeg route (stub-able for demo)`.
376
+
377
+ ---
378
+
379
+ ### Task 3: End-to-end fusion flow with EEG
380
+
381
+ **Files:**
382
+ - Create: `tests/fusion/test_eeg_modality_flow.py`
383
+
384
+ This task validates that an EEG prediction (via `predict_features` or via `/predict/eeg`) plugs into the fusion engine's `eeg` modality without any code change in the engine.
385
+
386
+ - [ ] **Step 1: Test.**
387
+
388
+ `tests/fusion/test_eeg_modality_flow.py`:
389
+
390
+ ```python
391
+ """End-to-end: EEG classifier output flows into fusion as the `eeg` modality."""
392
+ from __future__ import annotations
393
+
394
+ from pathlib import Path
395
+
396
+ import numpy as np
397
+ import pytest
398
+
399
+ from src.fusion import engine
400
+ from src.fusion.types import (
401
+ FusionInput,
402
+ ModalityClassProb,
403
+ ModalityPrediction,
404
+ )
405
+ from src.models import eeg_model
406
+ from tests.fixtures.build_dummy_eeg_clf import build as build_dummy_eeg
407
+
408
+
409
+ def _eeg_pred_from_features(model, features: np.ndarray) -> ModalityPrediction:
410
+ raw = eeg_model.predict_features(model, features)
411
+ return ModalityPrediction(
412
+ label_text=raw["label_text"],
413
+ label=raw["label"],
414
+ confidence=raw["confidence"],
415
+ probabilities=[
416
+ ModalityClassProb(label_text=p["label_text"], probability=p["probability"])
417
+ for p in raw["probabilities"]
418
+ ],
419
+ )
420
+
421
+
422
+ class TestEEGFusionFlow:
423
+ def test_alzheimers_eeg_lifts_alzheimers_disease_score(self, tmp_path: Path) -> None:
424
+ ckpt = build_dummy_eeg(tmp_path / "eeg.joblib", n_features=16)
425
+ model = eeg_model.load(ckpt)
426
+ eeg_pred = _eeg_pred_from_features(model, np.full((16,), 2.0, dtype=np.float32))
427
+
428
+ out = engine.fuse(FusionInput(eeg=eeg_pred))
429
+
430
+ alz = next(d for d in out.diseases if d.disease == "alzheimers")
431
+ assert alz.probability > 0.5
432
+ assert any(c.modality == "eeg" for c in alz.contributions)
433
+ # Missing-MRI list should mention mri (it wasn't supplied).
434
+ assert "mri" in out.missing_inputs
435
+
436
+ def test_control_eeg_does_not_inflate_alzheimers(self, tmp_path: Path) -> None:
437
+ ckpt = build_dummy_eeg(tmp_path / "eeg.joblib", n_features=16)
438
+ model = eeg_model.load(ckpt)
439
+ eeg_pred = _eeg_pred_from_features(model, np.zeros((16,), dtype=np.float32))
440
+
441
+ out = engine.fuse(FusionInput(eeg=eeg_pred))
442
+
443
+ alz = next(d for d in out.diseases if d.disease == "alzheimers")
444
+ assert alz.probability < 0.5
445
+ ```
446
+
447
+ - [ ] **Step 2:** Run → 2 passed (engine and types unchanged; the test exercises the existing `eeg` modality path with real predictions instead of hand-built fakes).
448
+
449
+ - [ ] **Step 3:** commit: `test(fusion): EEG classifier output flows into fusion modality end-to-end`.
450
+
451
+ ---
452
+
453
+ ### Task 4: Real-artifact sanity (skips when absent)
454
+
455
+ **Files:**
456
+ - Create: `tests/models/test_eeg_model_real.py`
457
+
458
+ - [ ] **Step 1: Test.**
459
+
460
+ ```python
461
+ """Real-artifact EEG sanity. Skipped unless data/processed/eeg_clf.joblib exists."""
462
+ from __future__ import annotations
463
+
464
+ from pathlib import Path
465
+
466
+ import numpy as np
467
+ import pytest
468
+
469
+ from src.models import eeg_model
470
+
471
+
472
+ REAL_CKPT = Path("data/processed/eeg_clf.joblib")
473
+
474
+
475
+ @pytest.mark.skipif(not REAL_CKPT.exists(), reason="real EEG checkpoint not present")
476
+ def test_real_eeg_checkpoint_loads_and_predicts():
477
+ model = eeg_model.load(REAL_CKPT)
478
+ n_features = int(getattr(model, "n_features_in_", 16))
479
+ features = np.zeros((n_features,), dtype=np.float32)
480
+ out = eeg_model.predict_features(model, features)
481
+ s = sum(p["probability"] for p in out["probabilities"])
482
+ assert abs(s - 1.0) < 1e-5
483
+ assert out["label_text"] # not empty
484
+ ```
485
+
486
+ - [ ] **Step 2:** `pytest tests/models/test_eeg_model_real.py -v` → **skipped** today (expected). Will run automatically once the user drops the real artifact in.
487
+
488
+ - [ ] **Step 3:** commit: `test(models): EEG real-artifact sanity (skips when absent)`.
489
+
490
+ ---
491
+
492
+ ### Task 5: Streamlit form + README
493
+
494
+ **Files:**
495
+ - Modify: `src/frontend/app.py` (add an EEG features input — number array or file upload of a parquet row from `eeg_pipeline`)
496
+ - Modify: `README.md`
497
+
498
+ - [ ] **Step 1:** Streamlit. The simplest demo path: a `st.text_area` accepting comma-separated floats, parsed and POSTed to `/predict/eeg`. Place it in the doctor-view tab next to the existing MRI predict form.
499
+
500
+ ```python
501
+ eeg_csv = st.text_area("EEG features (comma-separated)", placeholder="0.1,0.2,...")
502
+ if st.button("Predict (EEG)"):
503
+ try:
504
+ features = [float(x.strip()) for x in eeg_csv.split(",") if x.strip()]
505
+ except ValueError:
506
+ st.error("EEG features must be numeric.")
507
+ else:
508
+ r = httpx.post(f"{API_BASE}/predict/eeg", json={"features": features}, timeout=10.0)
509
+ st.json(r.json())
510
+ ```
511
+
512
+ (Integrate with the existing `httpx`/`requests` style the file already uses. If the file uses `requests`, follow that.)
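
The comma-separated parsing above can be factored into a small testable helper (a sketch; the function name is mine, not an existing project symbol):

```python
def parse_features(csv_text: str) -> list[float]:
    """Parse comma-separated floats, ignoring blank entries.

    Raises ValueError on non-numeric input, which the Streamlit
    handler above turns into st.error().
    """
    return [float(x.strip()) for x in csv_text.split(",") if x.strip()]

print(parse_features("0.1, 0.2,,0.3"))  # [0.1, 0.2, 0.3]
```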
513
+
514
+ - [ ] **Step 2:** README. Append:
515
+
516
+ ```markdown
517
+ ### EEG Pretrained Classifier
518
+
519
+ `POST /predict/eeg` runs an sklearn-style classifier (any `predict_proba` interface) on a feature vector and returns probability + attribution. The artifact loads from `data/processed/eeg_clf.joblib` (override via `EEG_CLF_ARTIFACT` env). Default labels are `("control", "alzheimers")` — override via `EEG_CLF_LABELS=label0,label1,...`.
520
+
521
+ For the hackathon demo a synthetic stub (`tests/fixtures/build_dummy_eeg_clf.py`) is used — drop the real `.joblib` at the artifact path to swap in production weights. The fusion engine consumes this prediction as the `eeg` modality automatically; no fusion-side code changes.
522
+ ```
523
+
524
+ - [ ] **Step 3:** `pytest -q` → no regressions.
525
+
526
+ - [ ] **Step 4:** commit: `feat(frontend,docs): EEG predict form + README contract`.
527
+
528
+ ---
529
+
530
+ ## Self-review checklist
531
+
532
+ 1. **Spec coverage.** User asked: "for the demo assume the EEG pretrained model exists; we can find it and put it into the project later." This plan ships a working demo today (stub fixture) and a documented swap-in path for the real artifact. ✓
533
+ 2. **Independence.** EEG plumbing uses only `src.core.logger`, sklearn (already in deps), joblib, numpy. No coupling to MRI / BBB / fusion-internal code. The fusion engine consumes the EEG `ModalityPrediction` through its existing public API only. ✓
534
+ 3. **No re-training.** The plan loads a classifier and runs `predict_proba` — never trains anything at runtime. ✓
535
+ 4. **Demo ready without the real artifact.** Tests pass green using only the stub fixture; the real-artifact sanity test auto-skips. ✓
536
+ 5. **No placeholders.** Every step has full code blocks. ✓
537
+
538
+ ---
539
+
540
+ ## Execution handoff
541
+
542
+ Save and choose: subagent-driven (recommended) or inline executing-plans.
docs/superpowers/plans/2026-05-02-external-assets-integration-roadmap.md CHANGED
@@ -8,7 +8,7 @@
8
  |---|---|---|
9
  | Pretrained MRI 2D classifier | PyTorch resnet18 trained on Kaggle's 4-class Alzheimer's MRI dataset (`MildDemented` / `ModerateDemented` / `NonDemented` / `VeryMildDemented`) | The dummy ONNX model in `tests/fixtures/build_dummy_mri_onnx.py`; the placeholder behaviour in `src/models/mri_model.py` |
10
  | TF-IDF RAG corpus | 14 medical PDFs (Alzheimer + Parkinson + lifestyle/nutrition/exercise) with a pre-built TF-IDF index and Turkish query expansion | The existing FAISS+fastembed RAG in `src/rag/` (or runs alongside it) |
11
- | OASIS tabular classifier (ipynb) | sklearn ensemble on OASIS longitudinal biomarkers (MMSE, eTIV, nWBV, ASF, …) | **Not an EEG model** — see sub-plan #3 for two routing options |
12
 
13
  ---
14
 
@@ -18,7 +18,7 @@
18
  |---|---|---|---|---|
19
  | 1 | `2026-05-02-mri-dl-2d-integration.md` | Real MRI deep-learning model in production path | — (parallel to fusion) | yes (Streamlit + curl) |
20
  | 2 | `2026-05-02-tfidf-rag-integration.md` | Lifestyle / clinical-paper RAG with Turkish support | — | yes (CLI + agent tool) |
21
- | 3 | `2026-05-02-oasis-tabular-fusion-integration.md` | Tabular OASIS classifier as a fusion-engine feature **OR** wait for a real EEG model | fusion engine (#1 of clinical-platform-roadmap) | yes (POST /fusion/predict) |
22
 
23
  ---
24
 
@@ -33,13 +33,13 @@
33
  └──────────────┬───────────────────┘
34
 
35
  ┌──────▼─────────┐
36
- │ #3 OASIS
37
- classifier as
38
- fusion feature
39
  └────────────────┘
40
  ```
41
 
42
- #1 and #2 can be built in parallel (different files). #3 should follow once both are stable so the demo flows end-to-end.
43
 
44
  ---
45
 
@@ -47,9 +47,9 @@
47
 
48
  These are **not** dev gaps — they are inputs we need from outside this codebase. Each sub-plan calls them out explicitly in its preamble, but listing here so they are in one place.
49
 
50
- ### A. MRI checkpoint file is not on this machine
51
 
52
- The user said the artifact lives at `outputs\checkpoints\best_model.pt` (Windows-style path). `find /Users/mertgungor` returns no `best_model.pt`. Sub-plan #1 cannot start until the file is at `data/processed/mri_dl_2d/best_model.pt` (gitignored — never commit a model binary). Confirm class index order matches the trainer:
53
 
54
  ```python
55
  CLASS_TO_IDX = {
@@ -60,18 +60,11 @@ CLASS_TO_IDX = {
60
  }
61
  ```
62
 
63
- If the trainer used a different ordering (`ImageFolder` alphabetises by default), the labels we surface will be wrong. Sub-plan #1 ships a sanity test that catches this.
64
 
65
- ### B. The "EEG ipynb" is OASIS tabular, not EEG
66
 
67
- `/Users/mertgungor/Downloads/rag/detecting-early-alzheimer-s (1).ipynb` trains an sklearn ensemble (LogReg / SVM / DT / RF / AdaBoost) on the OASIS longitudinal MRI **tabular** dataset (`oasis_longitudinal.csv` MMSE, eTIV, nWBV, ASF, EDUC, SES, ). It contains zero EEG signal processing and saves no model artifact.
68
-
69
- Sub-plan #3 has **two branches**:
70
-
71
- - **Branch 3a (default).** Treat the OASIS biomarker model as a clinical-tests extension to the fusion engine (already accepts MMSE etc. as features — this just adds eTIV/nWBV/ASF and re-runs the trained sklearn model in-process).
72
- - **Branch 3b.** If the user has a real EEG model elsewhere (a checkpoint file that consumes raw FIF / EDF data and emits class probabilities), the user must point us to it and we re-scope sub-plan #3 around that artifact.
73
-
74
- The user must pick the branch before sub-plan #3 starts.
75
 
76
  ### C. RAG corpus location
77
 
 
8
  |---|---|---|
9
  | Pretrained MRI 2D classifier | PyTorch resnet18 trained on Kaggle's 4-class Alzheimer's MRI dataset (`MildDemented` / `ModerateDemented` / `NonDemented` / `VeryMildDemented`) | The dummy ONNX model in `tests/fixtures/build_dummy_mri_onnx.py`; the placeholder behaviour in `src/models/mri_model.py` |
10
  | TF-IDF RAG corpus | 14 medical PDFs (Alzheimer + Parkinson + lifestyle/nutrition/exercise) with a pre-built TF-IDF index and Turkish query expansion | The existing FAISS+fastembed RAG in `src/rag/` (or runs alongside it) |
11
+ | EEG pretrained classifier (assumed for demo) | Any classifier with a `predict_proba` interface that emits Alzheimer's-related class probabilities. **The real artifact is not in the repo yet** — for the hackathon demo we ship a stub-able contract; swap the real `.joblib` (or `.onnx` / `.pt`) in later. | The current EEG path which is signal-processing only (no classifier yet in `src/models/`) |
12
 
13
  ---
14
 
 
18
  |---|---|---|---|---|
19
  | 1 | `2026-05-02-mri-dl-2d-integration.md` | Real MRI deep-learning model in production path | — (parallel to fusion) | yes (Streamlit + curl) |
20
  | 2 | `2026-05-02-tfidf-rag-integration.md` | Lifestyle / clinical-paper RAG with Turkish support | — | yes (CLI + agent tool) |
21
+ | 3 | `2026-05-02-eeg-stub-integration.md` | Stub-able EEG classifier contract that flows into fusion as the `eeg` modality. Real artifact swaps in later without code changes. | fusion engine (already shipped) | yes (POST /predict/eeg + fusion) |
22
 
23
  ---
24
 
 
33
  └──────────────┬───────────────────┘
34
 
35
  ┌──────▼─────────┐
36
+ │ #3 EEG stub
37
+ (real artifact
38
+ drops in later)
39
  └────────────────┘
40
  ```
41
 
42
+ All three are independent on file boundaries — they can be built in parallel by different subagents. The diagram shows demo flow priority, not a build dependency.
43
 
44
  ---
45
 
 
47
 
48
  These are **not** dev gaps — they are inputs we need from outside this codebase. Each sub-plan calls them out explicitly in its preamble, but listing here so they are in one place.
49
 
50
+ ### A. MRI checkpoint drop-in
51
 
52
+ The artifact lives at `outputs\checkpoints\best_model.pt` on the trainer machine. Drop it at `data/processed/mri_dl_2d/best_model.pt` in this repo (gitignored — never commit a model binary). The user's BEST_PARAMS are final: `image_size=160`, `model_name=resnet18`, 4-class head with the index order below. The integration code does not retrain or second-guess; it loads and predicts.
53
 
54
  ```python
55
  CLASS_TO_IDX = {
 
60
  }
61
  ```
62
 
63
+ Sub-plan #1 ships a real-artifact sanity test that runs only when the file is present (skipped otherwise) and catches any class-order or input-shape drift the trainer might surprise us with later.
64
 
65
+ ### B. EEG artifact is intentionally a stub for the demo
66
 
67
+ The real EEG checkpoint will land later. For the hackathon, sub-plan #3 ships a stub artifact (`tests/fixtures/build_dummy_eeg_clf.py` produces a synthetic joblib-pickled `RandomForestClassifier`) and a clear contract: **input** = numpy array of shape `(n_features,)` matching the existing `eeg_pipeline.py` feature output; **output** = class probabilities for `("control", "alzheimers")`. Swapping in the real artifact later requires zero code changes: just drop the file at `data/processed/eeg_clf.joblib` and update the labels via env if the real classes differ.
68
 
69
  ### C. RAG corpus location
70
 
docs/superpowers/plans/2026-05-02-mri-dl-2d-integration.md CHANGED
@@ -32,18 +32,13 @@ CLASS_TO_IDX = {
 
 ---
 
- ## Prerequisite (controller blocker)
 
- The artifact `best_model.pt` is **not** present on this filesystem. Before any task starts:
 
- 1. Copy the file from the trainer machine to `data/processed/mri_dl_2d/best_model.pt`.
- 2. Confirm with `python -c "import torch; sd = torch.load('data/processed/mri_dl_2d/best_model.pt', map_location='cpu'); print(type(sd), list(sd.keys())[:5] if isinstance(sd, dict) else sd)"`. Two possible structures:
-    - **`state_dict` only** (most common): `dict[str, Tensor]`. Task 1 builds the resnet18 architecture and `load_state_dict`s.
-    - **Full model** (`torch.save(model, ...)`): a pickled `nn.Module`. Task 1 just calls `torch.load(...)`.
-    - The plan defaults to **state_dict** (more portable). If the file turns out to be a full model, Task 1 has a fallback branch.
- 3. Add the artifact path to `.gitignore` if it isn't already covered (`data/processed/` should already be ignored — verify).
 
- If step 2 fails, **stop and surface to the user**: the trainer either produced a different artifact or saved with an unexpected structure.
 
 ---
 
 
 
 ---
 
+ ## Asset note
 
+ The user's trained checkpoint (`outputs\checkpoints\best_model.pt` on the trainer machine) is the artifact this plan loads. Drop it at `data/processed/mri_dl_2d/best_model.pt`. The training is finished — this plan does **not** retrain or re-tune; it loads the user's exact `state_dict` (or pickled `nn.Module`) and runs inference with the documented preprocessing contract (resize to `image_size=160`, ImageNet normalisation, 4-class head).
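The documented preprocessing contract can be pinned down with a framework-light sketch (a numpy/PIL stand-in for the usual torchvision `Resize`/`ToTensor`/`Normalize` chain; the plan's actual loader code may differ in details):

```python
"""Sketch of the MRI preprocessing contract: resize to image_size=160,
scale to [0, 1], ImageNet-normalise, channels-first float32."""
import numpy as np
from PIL import Image

IMAGE_SIZE = 160  # BEST_PARAMS: image_size=160
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess(img: Image.Image, size: int = IMAGE_SIZE) -> np.ndarray:
    """Return a (3, size, size) float32 array ready for the resnet18 head."""
    arr = np.asarray(img.convert("RGB").resize((size, size)), dtype=np.float32) / 255.0
    arr = (arr - IMAGENET_MEAN) / IMAGENET_STD  # broadcasts over the channel axis
    return arr.transpose(2, 0, 1)  # HWC -> CHW, the layout torch models expect
```

The real-artifact sanity test can feed this through the loaded model and check the 4-class output shape.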
 
+ `data/processed/` is already gitignored; never commit the binary.
 
+ The plan's tests use a synthetic dummy resnet18 (built on demand in `tests/fixtures/build_dummy_resnet18_2d.py`), so every TDD step runs green even before the real artifact arrives. Task 4 adds a real-artifact sanity test that auto-skips when the checkpoint is absent and runs once the user drops it in.
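The auto-skip gate is plain `pytest.mark.skipif`; a minimal sketch (the assertion body is illustrative only — the real sanity test from Task 4 checks class order and input shape):

```python
"""Sketch of the real-artifact sanity-test gate: the test collects everywhere
but only runs once the user drops best_model.pt in place."""
from pathlib import Path

import pytest

ARTIFACT = Path("data/processed/mri_dl_2d/best_model.pt")

requires_artifact = pytest.mark.skipif(
    not ARTIFACT.exists(),
    reason="real best_model.pt not present; dummy-fixture tests still cover the code path",
)


@requires_artifact
def test_real_artifact_sanity() -> None:
    # Illustrative placeholder: the real test loads the checkpoint and
    # asserts the 4-class head and 160x160 input contract.
    assert ARTIFACT.stat().st_size > 0
```

The same gate pattern works for the EEG artifact at `data/processed/eeg_clf.joblib`.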
 
  ---
 
docs/superpowers/plans/2026-05-02-oasis-tabular-fusion-integration.md DELETED
@@ -1,641 +0,0 @@
- # OASIS Tabular Classifier — Fusion Integration Plan
-
- > **For agentic workers:** REQUIRED SUB-SKILL: `superpowers:subagent-driven-development`. TDD throughout.
-
- ## ⚠️ Important context — read before executing
-
- The user said "I have the pretrained model for eeg, integrate it into the eeg pipeline. its the ipynb file named detecting-early-alzheimers...".
-
- The notebook (`/Users/mertgungor/Downloads/rag/detecting-early-alzheimer-s (1).ipynb`) is **NOT an EEG model**. It is an sklearn ensemble (LogReg / SVM / DT / RF / AdaBoost) trained on the OASIS longitudinal **tabular** dataset — features are MMSE, eTIV, nWBV, ASF, EDUC, SES, M/F, Age. Zero EEG signal processing. Zero saved model artifact (the notebook trains in-memory only).
-
- This plan therefore has **two branches**. Pick one with the user before executing.
-
- ### Branch 3a — Train + integrate the OASIS *tabular* classifier as a fusion feature
-
- We re-train the best variant (Random Forest, AUC 84.4 % per the notebook) from the OASIS CSV, save a `joblib` artifact, and expose it as a fusion-engine modality named `tabular_oasis`. The fusion engine already handles arbitrary modality keys; this plugs in cleanly.
-
- **Demo value:** When a doctor has only OASIS-style biomarkers (MMSE / eTIV / nWBV / ASF / Age / EDUC / SES / M/F) but no MRI image, the fusion engine still produces an Alzheimer's confidence with attribution.
-
- ### Branch 3b — User has a real EEG model elsewhere
-
- If the user can point us to a checkpoint that consumes raw FIF / EDF EEG data (e.g., a `.pt`, `.pth`, `.h5`, `.onnx`, or `.joblib` file) and emits Alzheimer's class probabilities, this plan is rewritten around that artifact: signature, expected input shape, label order. We replace `src/models/eeg_model.py` (currently absent — `eeg_pipeline.py` only does signal processing) with a new module similar to `mri_dl_2d.py`.
-
- **The user must pick a branch** before any task starts. The default below is **Branch 3a**, because the notebook is what's actually on disk.
-
- ---
-
- ## Branch 3a (default): OASIS tabular classifier as fusion modality
-
- **Goal.** Save a Random Forest trained on OASIS biomarkers; wire it into the fusion engine as a new modality `tabular_oasis`. The doctor enters MMSE/eTIV/nWBV/ASF (fusion already takes MMSE; this extends to the other three) and gets an Alzheimer's signal that flows through the existing logit/sigmoid combiner.
-
- **Architecture.** New module `src/models/tabular_oasis.py` trains-or-loads a `joblib`-pickled `Pipeline(scaler -> RandomForestClassifier)`. The fusion engine grows one entry in `_CLINICAL_FNS` (or, more cleanly, a sibling `_TABULAR_FNS`) so the model's class probability for `Demented=1` becomes a signed signal. New API route `POST /predict/tabular_oasis` lets the frontend call it directly. All optional — if the OASIS CSV is absent, the module degrades gracefully and fusion ignores the modality.
-
- **Tech stack.** scikit-learn (already in deps), pandas, joblib (likely in deps via sklearn).
-
- ---
-
- ## Prerequisite (controller blocker)
-
- The OASIS dataset is not in this repo. Two acquisition options:
-
- 1. **Download from Kaggle** (https://www.kaggle.com/datasets/jboysen/mri-and-alzheimers, file `oasis_longitudinal.csv`). Save to `data/external/oasis_longitudinal.csv`. Gitignore (already covered by `data/external_rag/` if you broaden it; otherwise add `data/external/`).
-
- 2. **Use a local copy** if the user already downloaded it for the notebook. Same destination.
-
- If the dataset is unavailable, **stop and surface to the user**. The classifier cannot be trained without it; we will not fabricate synthetic OASIS-shaped data for a clinical demo.
-
- ---
-
- ## File structure
-
- | Path | Responsibility |
- |---|---|
- | Modify `requirements.txt` | confirm `joblib` (sklearn pulls it transitively but pin explicitly is safer) |
- | Modify `.gitignore` | ensure `data/external/` is ignored |
- | Create `src/models/tabular_oasis.py` | train + persist + load + predict the OASIS RF classifier |
- | Create `scripts/train_oasis.py` | one-shot CLI: trains and saves the model artifact |
- | Modify `src/fusion/types.py` | extend `ClinicalScores` with `etiv`, `nwbv`, `asf`, `educ`, `ses`, `is_male` |
- | Modify `src/fusion/weights.py` | add `tabular_oasis` weight key for `alzheimers` |
- | Modify `src/fusion/engine.py` | add `tabular_oasis` to the modality dispatch |
- | Modify `src/api/routes.py` | new route `POST /predict/tabular_oasis` |
- | Modify `src/api/schemas.py` | request/response for the new route |
- | Create `tests/models/test_tabular_oasis.py` | training + persistence + prediction tests |
- | Create `tests/fixtures/build_synthetic_oasis.py` | synthetic OASIS-shaped CSV for tests (clearly labelled non-clinical) |
- | Create `tests/fusion/test_tabular_oasis_modality.py` | fusion-side integration |
- | Create `tests/api/test_tabular_oasis_route.py` | API integration |
- | Modify `README.md` | document the modality + how to acquire the OASIS CSV |
-
- ---
-
- ## Tasks
-
- ### Task 0: Deps + ignore
-
- **Files:** `requirements.txt`, `.gitignore`
-
- - [ ] **Step 1:** verify `joblib` and `pandas` are in `requirements.txt`. `pandas` already is (used by every pipeline). Add `joblib>=1.3,<2.0` if not pinned.
-
- - [ ] **Step 2:** `.gitignore` should cover `data/external/`. Add it if needed.
-
- - [ ] **Step 3:** `pytest -q` baseline. Commit: `chore(oasis): pin joblib; gitignore external dataset dir`.
-
- ---
-
- ### Task 1: Training + persistence module
-
- **Files:**
- - Create: `src/models/tabular_oasis.py`
- - Create: `scripts/train_oasis.py`
- - Create: `tests/fixtures/build_synthetic_oasis.py`
- - Create: `tests/models/test_tabular_oasis.py`
-
- - [ ] **Step 1: Synthetic-fixture helper** (clearly synthetic — never confused with real clinical data):
-
- `tests/fixtures/build_synthetic_oasis.py`:
-
- ```python
- """Build a synthetic OASIS-shaped CSV for tests. NON-CLINICAL data."""
- from __future__ import annotations
-
- from pathlib import Path
-
- import numpy as np
- import pandas as pd
-
-
- def build(path: Path, n: int = 200, seed: int = 42) -> Path:
-     """Save a synthetic CSV at `path` with the columns the trainer expects."""
-     path = Path(path)
-     if path.exists():
-         return path
-     rng = np.random.default_rng(seed)
-     n_dem = n // 2
-
-     # Demented half — lower MMSE, higher CDR, smaller nWBV.
-     dem = pd.DataFrame({
-         "Group": ["Demented"] * n_dem,
-         "M/F": rng.choice(["M", "F"], n_dem),
-         "Age": rng.integers(70, 95, n_dem),
-         "EDUC": rng.integers(8, 18, n_dem),
-         "SES": rng.integers(1, 5, n_dem),
-         "MMSE": rng.integers(15, 26, n_dem),
-         "CDR": rng.choice([0.5, 1.0], n_dem),
-         "eTIV": rng.integers(1200, 1700, n_dem),
-         "nWBV": rng.uniform(0.65, 0.74, n_dem),
-         "ASF": rng.uniform(1.0, 1.4, n_dem),
-         "Visit": 1,
-         "Hand": "R",
-     })
-     nondem = pd.DataFrame({
-         "Group": ["Nondemented"] * (n - n_dem),
-         "M/F": rng.choice(["M", "F"], n - n_dem),
-         "Age": rng.integers(60, 90, n - n_dem),
-         "EDUC": rng.integers(10, 22, n - n_dem),
-         "SES": rng.integers(1, 5, n - n_dem),
-         "MMSE": rng.integers(26, 31, n - n_dem),
-         "CDR": rng.choice([0.0], n - n_dem),
-         "eTIV": rng.integers(1300, 1900, n - n_dem),
-         "nWBV": rng.uniform(0.70, 0.83, n - n_dem),
-         "ASF": rng.uniform(0.9, 1.5, n - n_dem),
-         "Visit": 1,
-         "Hand": "R",
-     })
-
-     pd.concat([dem, nondem], ignore_index=True).to_csv(path, index=False)
-     return path
- ```
-
- - [ ] **Step 2: Failing test.**
-
- `tests/models/test_tabular_oasis.py`:
-
- ```python
- """Tests for src.models.tabular_oasis."""
- from __future__ import annotations
-
- from pathlib import Path
-
- import pytest
-
- from src.models import tabular_oasis
- from tests.fixtures.build_synthetic_oasis import build as build_synth
-
-
- class TestTrainAndPredict:
-     def test_train_persists_loadable_artifact(self, tmp_path: Path) -> None:
-         csv = build_synth(tmp_path / "oasis.csv")
-         artifact = tabular_oasis.train_from_csv(csv, tmp_path / "rf.joblib")
-         assert artifact.exists()
-         loaded = tabular_oasis.load(artifact)
-         assert hasattr(loaded, "predict_proba")
-
-     def test_predict_returns_full_dict(self, tmp_path: Path) -> None:
-         csv = build_synth(tmp_path / "oasis.csv")
-         artifact = tabular_oasis.train_from_csv(csv, tmp_path / "rf.joblib")
-         model = tabular_oasis.load(artifact)
-         out = tabular_oasis.predict_one(model, {
-             "is_male": 1, "age": 80, "educ": 10, "ses": 3.0,
-             "mmse": 18.0, "etiv": 1500.0, "nwbv": 0.68, "asf": 1.2,
-         })
-         assert set(out) == {"label", "label_text", "confidence", "probabilities"}
-         assert out["label"] in {0, 1}
-         assert out["label_text"] in {"Nondemented", "Demented"}
-         assert 0.0 <= out["confidence"] <= 1.0
-         probs = out["probabilities"]
-         assert len(probs) == 2
-         assert abs(sum(p["probability"] for p in probs) - 1.0) < 1e-5
-
-     def test_predict_with_synthetic_demented_profile_yields_demented_label(self, tmp_path: Path) -> None:
-         # The synthetic data has clean separation, so a clearly-demented profile
-         # (MMSE=15, low nWBV, age 88) should classify as Demented.
-         csv = build_synth(tmp_path / "oasis.csv")
-         artifact = tabular_oasis.train_from_csv(csv, tmp_path / "rf.joblib")
-         model = tabular_oasis.load(artifact)
-         out = tabular_oasis.predict_one(model, {
-             "is_male": 1, "age": 88, "educ": 8, "ses": 3.0,
-             "mmse": 15.0, "etiv": 1300.0, "nwbv": 0.66, "asf": 1.3,
-         })
-         assert out["label_text"] == "Demented"
-
-     def test_load_missing_artifact_raises(self, tmp_path: Path) -> None:
-         with pytest.raises(FileNotFoundError, match="OASIS classifier artifact not found"):
-             tabular_oasis.load(tmp_path / "missing.joblib")
- ```
-
- Run → ImportError.
-
- - [ ] **Step 3: Minimal impl.**
-
- `src/models/tabular_oasis.py`:
-
- ```python
- """OASIS tabular Alzheimer's classifier — Random Forest with full pipeline."""
- from __future__ import annotations
-
- from pathlib import Path
- from typing import Any
-
- import joblib
- import numpy as np
- import pandas as pd
- from sklearn.ensemble import RandomForestClassifier
- from sklearn.pipeline import Pipeline
- from sklearn.preprocessing import MinMaxScaler
-
- from src.core.logger import get_logger
-
- logger = get_logger(__name__)
-
- FEATURE_ORDER: tuple[str, ...] = (
-     "is_male", "age", "educ", "ses", "mmse", "etiv", "nwbv", "asf",
- )
- LABEL_NAMES: tuple[str, ...] = ("Nondemented", "Demented")
-
-
- def _df_from_oasis_csv(csv_path: Path) -> tuple[pd.DataFrame, pd.Series]:
-     """Replicate the notebook's preprocessing: first visit only, M/F encoded,
-     Converted-as-Demented, drop unused columns, median-impute SES on EDUC."""
-     df = pd.read_csv(csv_path)
-     df = df.loc[df["Visit"] == 1].reset_index(drop=True)
-     df["M/F"] = df["M/F"].replace({"F": 0, "M": 1})
-     df["Group"] = df["Group"].replace({"Converted": "Demented"}).replace(
-         {"Demented": 1, "Nondemented": 0}
-     )
-     df = df.drop(columns=[c for c in ("MRI ID", "Visit", "Hand") if c in df.columns])
-     df["SES"] = df["SES"].fillna(df.groupby("EDUC")["SES"].transform("median"))
-
-     feature_df = pd.DataFrame({
-         "is_male": df["M/F"].astype(float),
-         "age": df["Age"].astype(float),
-         "educ": df["EDUC"].astype(float),
-         "ses": df["SES"].astype(float),
-         "mmse": df["MMSE"].astype(float),
-         "etiv": df["eTIV"].astype(float),
-         "nwbv": df["nWBV"].astype(float),
-         "asf": df["ASF"].astype(float),
-     })[list(FEATURE_ORDER)]
-     return feature_df, df["Group"].astype(int)
-
-
- def train_from_csv(csv_path: Path, artifact_path: Path) -> Path:
-     """Train and persist a MinMaxScaler→RandomForest pipeline. Returns artifact path."""
-     csv_path = Path(csv_path)
-     artifact_path = Path(artifact_path)
-     if not csv_path.exists():
-         raise FileNotFoundError(f"OASIS CSV not found: {csv_path}")
-
-     X, y = _df_from_oasis_csv(csv_path)
-     pipeline = Pipeline([
-         ("scaler", MinMaxScaler()),
-         ("rf", RandomForestClassifier(
-             n_estimators=12, max_depth=8, max_features=8,
-             n_jobs=4, random_state=0,
-         )),
-     ])
-     pipeline.fit(X, y)
-     artifact_path.parent.mkdir(parents=True, exist_ok=True)
-     joblib.dump(pipeline, artifact_path)
-     logger.info("trained OASIS RF: n=%d, artifact=%s", len(X), artifact_path)
-     return artifact_path
-
-
- def load(artifact_path: Path) -> Pipeline:
-     p = Path(artifact_path)
-     if not p.exists():
-         raise FileNotFoundError(f"OASIS classifier artifact not found: {p}")
-     return joblib.load(p)
-
-
- def predict_one(model: Pipeline, features: dict[str, float]) -> dict[str, Any]:
-     """Predict for a single subject. `features` must have all FEATURE_ORDER keys."""
-     missing = [k for k in FEATURE_ORDER if k not in features]
-     if missing:
-         raise ValueError(f"OASIS prediction missing features: {missing}")
-     row = pd.DataFrame([{k: float(features[k]) for k in FEATURE_ORDER}])
-     probs = np.asarray(model.predict_proba(row))[0]
-     label_idx = int(np.argmax(probs))
-     return {
-         "label": label_idx,
-         "label_text": LABEL_NAMES[label_idx],
-         "confidence": float(probs[label_idx]),
-         "probabilities": [
-             {"label": i, "label_text": LABEL_NAMES[i], "probability": float(p)}
-             for i, p in enumerate(probs)
-         ],
-     }
- ```
-
- `scripts/train_oasis.py`:
-
- ```python
- """CLI: train the OASIS RF classifier and save it.
-
- Usage:
-     python scripts/train_oasis.py data/external/oasis_longitudinal.csv data/processed/oasis_rf.joblib
- """
- from __future__ import annotations
-
- import sys
- from pathlib import Path
-
- from src.models.tabular_oasis import train_from_csv
-
-
- def main() -> None:
-     if len(sys.argv) != 3:
-         print(__doc__)
-         sys.exit(1)
-     csv = Path(sys.argv[1])
-     out = Path(sys.argv[2])
-     train_from_csv(csv, out)
-     print(f"saved: {out}")
-
-
- if __name__ == "__main__":
-     main()
- ```
-
- Run tests → 4 passed.
-
- - [ ] **Step 4:** commit: `feat(models): OASIS tabular Alzheimer's RF classifier (joblib + train CLI)`.
-
- ---
-
- ### Task 2: Extend fusion's clinical inputs
-
- **Files:**
- - Modify: `src/fusion/types.py` (extend `ClinicalScores`)
- - Modify: `src/fusion/clinical.py` (add normalisers for the new fields)
- - Modify: `tests/fusion/test_types.py` (loosen / extend bound tests)
- - Modify: `tests/fusion/test_clinical.py` (add new normaliser tests)
-
- - [ ] **Step 1: Failing test for new ClinicalScores fields.**
-
- In `tests/fusion/test_types.py`, append:
-
- ```python
- class TestExtendedClinicalScores:
-     def test_etiv_in_range(self) -> None:
-         s = ClinicalScores(etiv=1500.0)
-         assert s.etiv == pytest.approx(1500.0)
-
-     def test_etiv_out_of_range_rejected(self) -> None:
-         with pytest.raises(ValidationError):
-             ClinicalScores(etiv=5000.0)
-
-     def test_nwbv_in_range(self) -> None:
-         s = ClinicalScores(nwbv=0.72)
-         assert s.nwbv == pytest.approx(0.72)
- ```
-
- - [ ] **Step 2: Update `src/fusion/types.py` ClinicalScores.**
-
- Add fields (preserve existing ones):
-
- ```python
- class ClinicalScores(BaseModel):
-     mmse: Annotated[float, Field(ge=0.0, le=30.0)] | None = None
-     moca: Annotated[float, Field(ge=0.0, le=30.0)] | None = None
-     updrs: Annotated[float, Field(ge=0.0, le=199.0)] | None = None
-     gait_speed_m_s: Annotated[float, Field(ge=0.0, le=2.5)] | None = None
-     age_years: Annotated[float, Field(ge=0.0, le=120.0)] | None = None
-     # OASIS biomarkers — used by the tabular_oasis modality.
-     etiv: Annotated[float, Field(ge=900.0, le=2200.0)] | None = None
-     nwbv: Annotated[float, Field(ge=0.5, le=0.95)] | None = None
-     asf: Annotated[float, Field(ge=0.5, le=2.0)] | None = None
-     educ: Annotated[float, Field(ge=0.0, le=30.0)] | None = None
-     ses: Annotated[float, Field(ge=1.0, le=5.0)] | None = None
-     is_male: Annotated[int, Field(ge=0, le=1)] | None = None
- ```
-
- - [ ] **Step 3:** the tests should pass after the type change. `pytest tests/fusion/test_types.py -v`.
-
- - [ ] **Step 4:** commit: `feat(fusion): extend ClinicalScores with OASIS biomarker fields`.
-
- ---
-
- ### Task 3: Wire `tabular_oasis` modality into the fusion engine
-
- **Files:**
- - Modify: `src/fusion/weights.py`
- - Modify: `src/fusion/engine.py`
- - Create: `tests/fusion/test_tabular_oasis_modality.py`
-
- - [ ] **Step 1: Update weights.**
-
- `src/fusion/weights.py`, in the `alzheimers` table:
-
- ```python
- "alzheimers": {
-     "mri": 0.25,            # was 0.35
-     "eeg": 0.15,            # was 0.20
-     "tabular_oasis": 0.20,  # new
-     "clinical_mmse": 0.20,
-     "clinical_moca": 0.10,  # was 0.15
-     "clinical_age": 0.10,
- },
- ```
-
- Re-balance so the table still sums to 1.0. Add a comment that re-balancing changed the existing tests' tolerances — verify which tests need updating.
-
- - [ ] **Step 2: Failing fusion-modality test.**
-
- `tests/fusion/test_tabular_oasis_modality.py`:
-
- ```python
- """Tests: tabular_oasis modality contributes to alzheimers fusion score."""
- from __future__ import annotations
-
- import os
- from pathlib import Path
-
- import pytest
-
- from src.fusion import engine
- from src.fusion.types import ClinicalScores, FusionInput
- from src.models.tabular_oasis import train_from_csv
- from tests.fixtures.build_synthetic_oasis import build as build_synth
-
-
- @pytest.fixture()
- def trained_artifact(tmp_path: Path, monkeypatch) -> Path:
-     csv = build_synth(tmp_path / "oasis.csv")
-     art = train_from_csv(csv, tmp_path / "rf.joblib")
-     monkeypatch.setenv("OASIS_RF_ARTIFACT", str(art))
-     return art
-
-
- class TestTabularOasisModality:
-     def test_demented_profile_raises_alzheimers(self, trained_artifact: Path) -> None:
-         out = engine.fuse(FusionInput(clinical=ClinicalScores(
-             is_male=1, age_years=88, educ=8, ses=3.0,
-             mmse=15.0, etiv=1300.0, nwbv=0.66, asf=1.3,
-         )))
-         alz = next(d for d in out.diseases if d.disease == "alzheimers")
-         assert alz.probability > 0.6
-         assert any(c.modality == "tabular_oasis" for c in alz.contributions)
-
-     def test_missing_oasis_inputs_skips_modality(self, trained_artifact: Path) -> None:
-         # MMSE alone but no etiv/nwbv → tabular_oasis should be skipped, not error.
-         out = engine.fuse(FusionInput(clinical=ClinicalScores(mmse=12.0)))
-         alz = next(d for d in out.diseases if d.disease == "alzheimers")
-         names = {c.modality for c in alz.contributions}
-         assert "tabular_oasis" not in names
- ```
-
- - [ ] **Step 3: Update the engine.**
-
- In `src/fusion/engine.py`, add a tabular-modality dispatcher that lazy-loads the joblib artifact once and treats the OASIS classifier's `P(Demented)` as the alzheimers signal `2*P-1`:
-
- ```python
- import os
-
- _oasis_cache: dict[str, Any] = {}
-
-
- def _signal_for_tabular_oasis(disease: str, clinical: ClinicalScores) -> float | None:
-     if disease != "alzheimers":
-         return None
-     required = ("is_male", "age_years", "educ", "ses", "mmse", "etiv", "nwbv", "asf")
-     if any(getattr(clinical, k, None) is None for k in required):
-         return None
-     artifact = os.environ.get("OASIS_RF_ARTIFACT", "data/processed/oasis_rf.joblib")
-     artifact_path = Path(artifact)
-     if not artifact_path.exists():
-         logger.warning("tabular_oasis artifact missing at %s; skipping modality", artifact_path)
-         return None
-     if "model" not in _oasis_cache:
-         from src.models.tabular_oasis import load
-         _oasis_cache["model"] = load(artifact_path)
-     from src.models.tabular_oasis import predict_one
-     feats = {
-         "is_male": int(clinical.is_male),
-         "age": float(clinical.age_years),
-         "educ": float(clinical.educ),
-         "ses": float(clinical.ses),
-         "mmse": float(clinical.mmse),
-         "etiv": float(clinical.etiv),
-         "nwbv": float(clinical.nwbv),
-         "asf": float(clinical.asf),
-     }
-     pred = predict_one(_oasis_cache["model"], feats)
-     p_dem = next(p["probability"] for p in pred["probabilities"] if p["label_text"] == "Demented")
-     return 2.0 * p_dem - 1.0
- ```
-
- In `_signal_for_modality`, add the dispatch:
-
- ```python
- if modality_key == "tabular_oasis":
-     return _signal_for_tabular_oasis(disease, clinical)
- ```
-
- - [ ] **Step 4:** `pytest tests/fusion/ -v` — expect re-balancing to perturb a couple of existing thresholds. Adjust thresholds in the affected tests (e.g., the disagreement test) so they still hold with the new weights, OR adjust the new weights so existing tests still pass within tolerance. Prefer the latter — existing thresholds were chosen carefully.
-
- - [ ] **Step 5:** commit: `feat(fusion): add tabular_oasis modality with lazy joblib load`.
-
- ---
-
- ### Task 4: API + Streamlit + README
-
- **Files:**
- - Modify: `src/api/routes.py` — add `POST /predict/tabular_oasis`
- - Modify: `src/api/schemas.py` — request/response schemas
- - Modify: `src/frontend/app.py` — extend the Doctor view's clinical-input form with eTIV / nWBV / ASF / EDUC / SES
- - Modify: `README.md` — describe the new modality and the OASIS dataset path
-
- - [ ] **Step 1: New schemas.**
-
- `src/api/schemas.py`:
-
- ```python
- class TabularOasisRequest(BaseModel):
-     is_male: int = Field(..., ge=0, le=1)
-     age: float = Field(..., ge=0.0, le=120.0)
-     educ: float = Field(..., ge=0.0, le=30.0)
-     ses: float = Field(..., ge=1.0, le=5.0)
-     mmse: float = Field(..., ge=0.0, le=30.0)
-     etiv: float = Field(..., ge=900.0, le=2200.0)
-     nwbv: float = Field(..., ge=0.5, le=0.95)
-     asf: float = Field(..., ge=0.5, le=2.0)
-
-
- class TabularOasisProbability(BaseModel):
-     label: int
-     label_text: str
-     probability: float
-
-
- class TabularOasisResponse(BaseModel):
-     label: int
-     label_text: str
-     confidence: float
-     probabilities: list[TabularOasisProbability]
- ```
-
- - [ ] **Step 2: Route.**
-
- `src/api/routes.py`:
-
- ```python
- @predict_router.post("/tabular_oasis", response_model=TabularOasisResponse)
- def predict_tabular_oasis(req: TabularOasisRequest) -> TabularOasisResponse:
-     from src.models.tabular_oasis import load, predict_one
-     artifact = Path(os.environ.get("OASIS_RF_ARTIFACT", "data/processed/oasis_rf.joblib"))
-     model = load(artifact)
-     out = predict_one(model, req.model_dump())
-     return TabularOasisResponse(**out)
- ```
-
- - [ ] **Step 3: Test (`tests/api/test_tabular_oasis_route.py`).**
-
- ```python
- """Integration: POST /predict/tabular_oasis."""
- from __future__ import annotations
-
- from pathlib import Path
-
- import pytest
- from fastapi.testclient import TestClient
-
- from src.api.main import app
- from src.models.tabular_oasis import train_from_csv
- from tests.fixtures.build_synthetic_oasis import build as build_synth
-
-
- @pytest.fixture()
- def client(monkeypatch, tmp_path):
-     csv = build_synth(tmp_path / "oasis.csv")
-     artifact = train_from_csv(csv, tmp_path / "rf.joblib")
-     monkeypatch.setenv("OASIS_RF_ARTIFACT", str(artifact))
-     return TestClient(app)
-
-
- def test_predict_tabular_oasis_demented_profile(client):
-     body = {
-         "is_male": 1, "age": 88, "educ": 8, "ses": 3.0,
-         "mmse": 15.0, "etiv": 1300.0, "nwbv": 0.66, "asf": 1.3,
-     }
-     r = client.post("/predict/tabular_oasis", json=body)
-     assert r.status_code == 200, r.text
-     data = r.json()
-     assert data["label_text"] == "Demented"
- ```
-
- - [ ] **Step 4:** Streamlit form extension. In `src/frontend/app.py`, find the clinical-inputs section the doctor view exposes (likely under a "Clinical scores" expander; if absent, add it under the fusion tab). Add number_input widgets for the seven new fields (`is_male`, `age`, `educ`, `ses`, `etiv`, `nwbv`, `asf`) that flow into the existing `/fusion/predict` payload's `clinical` block.
-
- - [ ] **Step 5:** README. Append:
-
- ```markdown
- ### OASIS Tabular Alzheimer's Classifier
-
- A scikit-learn Random Forest trained on the OASIS longitudinal dataset (https://www.oasis-brains.org/) classifies Demented vs Nondemented from 8 biomarkers (sex, age, education, SES, MMSE, eTIV, nWBV, ASF). It contributes to the fusion engine as modality `tabular_oasis` (weight 0.20 for Alzheimer's).
-
- To use: download `oasis_longitudinal.csv` from Kaggle, save to `data/external/oasis_longitudinal.csv`, then:
-
- ```bash
- python scripts/train_oasis.py data/external/oasis_longitudinal.csv data/processed/oasis_rf.joblib
- export OASIS_RF_ARTIFACT=data/processed/oasis_rf.joblib
- ```
-
- The fusion engine and `POST /predict/tabular_oasis` will pick it up. If the artifact is missing, the modality is skipped — fusion still works.
- ```
-
- - [ ] **Step 6:** commit: `feat(oasis): /predict/tabular_oasis route + Streamlit form + README`.
-
- ---
-
- ## Self-review checklist
-
- 1. **Independence.** OASIS classifier and fusion remain decoupled when the artifact is absent (`OASIS_RF_ARTIFACT` unset → modality skipped). ✓
- 2. **No real-data fabrication.** Tests use a clearly-labelled synthetic CSV. The real OASIS dataset is never committed. ✓
- 3. **Backward compatibility.** Existing `ClinicalScores` fields untouched. New fields are all `Optional`. ✓
- 4. **Branch 3a vs 3b.** This plan is Branch 3a. If the user picks Branch 3b, this plan is replaced wholesale.
-
- ---
-
- ## Execution handoff
-
- Save and choose: subagent-driven (recommended) or inline executing-plans.
-
- **Reminder to controller:** before starting any task, confirm with the user: "Do you have a real EEG checkpoint I'm missing, or shall I proceed with Branch 3a (OASIS tabular Alzheimer's classifier)?"