bekir32419 committed on
Commit c0a7163 · 1 Parent(s): 5af70c3

Add project files

.dockerignore CHANGED
@@ -11,6 +11,9 @@ mlruns/
11
  .github/
12
  docs/
13
  tests/
 
 
14
  !tests/fixtures/
 
15
  .streamlit/
16
  notebooks/
 
11
  .github/
12
  docs/
13
  tests/
14
+ !tests/
15
+ tests/*
16
  !tests/fixtures/
17
+ !tests/fixtures/**
18
  .streamlit/
19
  notebooks/
.gitattributes ADDED
@@ -0,0 +1 @@
 
 
1
+ *.sh text eol=lf
AGENTS.md CHANGED
@@ -50,10 +50,14 @@ All experiment runs are tracked in **MLflow**. All services ship as **Docker** i
50
  │ │ ├── storage.py # Parquet read/write helpers (snappy, single-threaded, deterministic)
51
  │ │ └── tracking.py # MLflow `track_pipeline_run` context manager (see §7)
52
  │ ├── pipelines/ # One file per modality. Pure functions + a `run_pipeline()` entry.
53
- │ ├── models/ # Downstream decision-layer models (consume processed features)
54
- │ │ └── bbb_model.py # BBB-permeability classifier + SHAP explainer + trainer CLI
 
 
 
 
55
  │ └── frontend/
56
- │ └── app.py # Streamlit dashboard (3 tabs, one per modality)
57
  └── tests/
58
  ├── core/
59
  ├── api/
@@ -148,31 +152,43 @@ The repo-wide `conftest.py` autouse fixture pins `MLFLOW_TRACKING_URI` to a tmp
148
  ## 8. Decision Layer (Downstream Models)
149
 
150
  Pipelines produce features (`data/processed/<modality>_features.parquet`).
151
- Downstream models live in `src/models/` and consume those features:
 
152
 
153
  | Model | File | Output | Endpoint |
154
  |---|---|---|---|
155
  | BBB permeability | `src/models/bbb_model.py` | `data/processed/bbb_model.joblib` | `POST /predict/bbb` |
 
156
 
157
- Each downstream model module exposes a uniform surface:
158
  - `train(df, label_col, ...)` → fitted classifier
159
  - `save(model, path)` / `load(path)` → joblib artifact I/O
160
  - `predict_with_proba(model, smiles)` → `{label, confidence}` (confidence is the max-class probability)
161
  - `explain_prediction(model, smiles, top_k)` → SHAP top-k attributions sorted by `|shap_value|` descending
162
 
163
- The API loads the joblib artifact at request time. If the artifact is
164
- missing, the endpoint returns **HTTP 503** with a remediation hint pointing
165
- at the trainer CLI (`python -m src.models.<name>`). This keeps the API
166
- process startup fast and lets operators retrain without redeploying — the
167
- Day-5 analog of Day-4's `NEUROBRIDGE_DISABLE_MLFLOW` lifeline.
168
 
169
- **Determinism**: all classifiers are seeded (`random_state=42` default),
170
- `n_jobs=1` (no tree-parallelism races). Re-running the trainer on the same
171
- Parquet produces identical predictions.
 
 
 
 
 
 
172
 
173
  **Override `BBB_MODEL_PATH`** env var to point the API at a non-default
174
  artifact location (used by tests for tmp_path isolation).
175
 
 
 
 
 
176
  **Calibration metadata** (Day 6): `train()` does an 80/20 stratified split,
177
  computes precision-at-confidence-threshold bins on the held-out test set,
178
  and stashes them on `model._neurobridge_calibration: list[dict]` (sorted
@@ -282,8 +298,9 @@ metrics, params). `POST /experiments/diff {run_id_a, run_id_b}`
282
  returns a side-by-side metric+param diff (`RunDiffRow`).
283
 
284
  When `NEUROBRIDGE_DISABLE_MLFLOW=1`, both endpoints return empty
285
- responses without raising — required for the HF Spaces deployment
286
- where there is no writable mlruns/ tree. Unknown run ids 404.
 
287
 
288
  The Streamlit "Experiments" tab is the user-facing surface. Cached
289
  in session state with an explicit Refresh button.
@@ -293,15 +310,22 @@ in session state with an explicit Refresh button.
293
  `Dockerfile.hf` is the Hugging Face Spaces image. Single container,
294
  two processes (FastAPI :8000 + Streamlit :7860) launched via
295
  `supervisord.conf`. Build-time `RUN python -m src.models.bbb_model`
296
- bakes the model artifact into the image so the first `/predict/bbb`
297
- call is instant on cold start.
298
-
299
- Default environment: `DEPLOY_ENV=hf_spaces`,
300
- `NEUROBRIDGE_DISABLE_MLFLOW=1`. The LLM kill-switch is **not** set
301
- deployed Spaces use the real OpenRouter free-tier chain (§11) when
 
 
 
 
 
 
 
302
  `OPENROUTER_API_KEY` is configured in the Space's Secrets panel. Set
303
- `NEUROBRIDGE_DISABLE_LLM=1` only when you want to force the
304
- deterministic template path for a fully-reproducible demo.
305
 
306
  The README's YAML front-matter declares the Space metadata
307
  (SDK=docker, port=7860, app_file=src/frontend/app.py).
@@ -309,24 +333,30 @@ The README's YAML front-matter declares the Space metadata
309
  ## 15. Orchestrator Agent Surface
310
 
311
  `src/agents/orchestrator.py` exposes a single-agent function-calling
312
- loop over the openai SDK (no LangChain / framework dep). The agent
313
- holds 4 tools, defined in `src/agents/tools.py`:
 
 
 
314
 
315
  - `run_bbb_pipeline(smiles, top_k)` — wraps `POST /predict/bbb`
316
  - `run_eeg_pipeline(input_path)` — wraps `POST /pipeline/eeg`
317
- - `run_mri_pipeline(input_dir, sites_csv)` — wraps `POST /pipeline/mri`
 
318
  - `retrieve_context(query, k)` — wraps `src/rag/retrieve.py`
319
 
320
  The system prompt (`src/agents/prompts.py:ORCHESTRATOR_SYSTEM_PROMPT`)
321
- locks the workflow: pick exactly one pipeline → run it → formulate a
322
- focused retrieval query → call retrieve_context → synthesize a
323
- 3-5 sentence response that cites at least one chunk. Language of the
324
- final response is mirrored from the user's question.
325
-
326
- `POST /agent/run` is the public surface. Default model is
327
- `google/gemini-2.0-flash-exp:free` on OpenRouter (function-calling
328
- support verified). Override via `NEUROBRIDGE_AGENT_MODEL` env var.
329
- Returns 503 when `OPENROUTER_API_KEY` is unset.
 
 
330
 
331
  Diagnostics: `GET /diag/agent` returns key presence, configured model,
332
  RAG index status (chunk count), and the registered tool names.
@@ -345,9 +375,10 @@ user-supplied `.md` / `.txt` / `.pdf`). Build the FAISS index with:
345
 
346
  Defaults: input=`data/knowledge_base/`, output=`data/processed/faiss_index/`.
347
  The Dockerfile runs this at build time so deployed Spaces start with
348
- a populated index. Empty KB empty index → `retrieve_context`
349
- returns 0 chunks; the agent surfaces this and answers from the
350
- pipeline result alone.
 
351
 
352
  `tests/fixtures/kb_sample/` ships 3 seed markdown files (Lipinski,
353
  ComBat, MNE+ICA) — these double as test fixtures and as the demo
 
50
  │ │ ├── storage.py # Parquet read/write helpers (snappy, single-threaded, deterministic)
51
  │ │ └── tracking.py # MLflow `track_pipeline_run` context manager (see §7)
52
  │ ├── pipelines/ # One file per modality. Pure functions + a `run_pipeline()` entry.
53
+ │ ├── models/ # Downstream decision-layer models
54
+ │ │ ├── bbb_model.py # BBB-permeability classifier + SHAP explainer + trainer CLI
55
+ │ │ └── mri_model.py # Volumetric MRI ONNX inference surface (external training)
56
+ │ ├── llm/ # Natural-language explainers (template + OpenRouter fallback)
57
+ │ ├── rag/ # Fastembed + FAISS retrieval layer
58
+ │ ├── agents/ # Tool registry + guarded OpenRouter orchestrator
59
  │ └── frontend/
60
+ │ └── app.py # Streamlit dashboard
61
  └── tests/
62
  ├── core/
63
  ├── api/
 
152
  ## 8. Decision Layer (Downstream Models)
153
 
154
  Pipelines produce features (`data/processed/<modality>_features.parquet`).
155
+ Downstream models live in `src/models/` and either consume processed features or
156
+ apply a deterministic model-local preprocessing contract:
157
 
158
  | Model | File | Output | Endpoint |
159
  |---|---|---|---|
160
  | BBB permeability | `src/models/bbb_model.py` | `data/processed/bbb_model.joblib` | `POST /predict/bbb` |
161
+ | MRI image classifier | `src/models/mri_model.py` | `data/processed/mri_model.onnx` | `POST /predict/mri` |
162
 
163
+ Downstream model modules trained in-repo expose a uniform surface (see the usage sketch after this list):
164
  - `train(df, label_col, ...)` → fitted classifier
165
  - `save(model, path)` / `load(path)` → joblib artifact I/O
166
  - `predict_with_proba(model, smiles)` → `{label, confidence}` (confidence is the max-class probability)
167
  - `explain_prediction(model, smiles, top_k)` → SHAP top-k attributions sorted by `|shap_value|` descending
168
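A minimal usage sketch of that surface, assuming the module layout above; the exact `train()` keyword arguments and the label column name are assumptions, not confirmed by this document:

```python
# Hypothetical walk-through of the uniform surface exposed by src/models/bbb_model.py.
import pandas as pd
from src.models import bbb_model

df = pd.read_parquet("data/processed/bbb_features.parquet")
model = bbb_model.train(df, label_col="p_np")  # "p_np" is an assumed label column
bbb_model.save(model, "data/processed/bbb_model.joblib")

model = bbb_model.load("data/processed/bbb_model.joblib")
print(bbb_model.predict_with_proba(model, "CCO"))       # -> {label, confidence}
print(bbb_model.explain_prediction(model, "CCO", top_k=5))  # SHAP top-5 by |shap_value|
```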
 
169
+ MRI DL exception: training happens outside this repo and exports ONNX, so the
170
+ module does not expose `train()` or SHAP. At runtime it loads the ONNX
171
+ artifact with `mri_model.load()`, preprocesses one NIfTI via the same
172
+ deterministic resize + z-score contract used during training
173
+ (`preprocess_nifti()`), then returns class probabilities via `predict_nifti()`.
174
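A minimal sketch of that resize + z-score contract, assuming `nibabel` and `scipy` as the underlying tools (the real `preprocess_nifti()` also does 3D finite-volume validation and may differ in detail):

```python
# Illustrative preprocessing sketch; library choices are assumptions.
import nibabel as nib
import numpy as np
from scipy.ndimage import zoom

def preprocess_nifti_sketch(path: str, target_shape=(64, 64, 64)) -> np.ndarray:
    vol = nib.load(path).get_fdata().astype(np.float32)  # assumes a 3D volume
    # Trilinear resize (order=1) to the fixed training shape.
    factors = [t / s for t, s in zip(target_shape, vol.shape)]
    vol = zoom(vol, factors, order=1)
    # Z-score over non-zero voxels only, so background stays at zero.
    mask = vol != 0
    if mask.any():
        vol[mask] = (vol[mask] - vol[mask].mean()) / (vol[mask].std() + 1e-8)
    # [1, 1, D, H, W] float32 tensor, matching the ONNX input contract.
    return vol[np.newaxis, np.newaxis, ...]
```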
 
175
+ The API loads model artifacts at request time. If an artifact is missing,
176
+ the endpoint returns **HTTP 503** with a remediation hint instead of failing
177
+ process startup. BBB points at the trainer CLI (`python -m src.models.bbb_model`);
178
+ MRI points at the external ONNX export path.
179
+
180
+ **Determinism**: all in-repo classifiers are seeded (`random_state=42`
181
+ default), `n_jobs=1` (no tree-parallelism races). Re-running the BBB trainer
182
+ on the same Parquet produces identical predictions. MRI ONNX determinism is
183
+ bounded by the exported model plus the fixed runtime preprocessing contract.
184
 
185
  **Override `BBB_MODEL_PATH`** env var to point the API at a non-default
186
  artifact location (used by tests for tmp_path isolation).
187
 
188
+ **Override `MRI_MODEL_PATH`** env var to point the API at a non-default ONNX
189
+ artifact location. If the ONNX artifact is missing, `POST /predict/mri`
190
+ returns **HTTP 503** with a remediation hint.
191
+
192
  **Calibration metadata** (Day 6): `train()` does an 80/20 stratified split,
193
  computes precision-at-confidence-threshold bins on the held-out test set,
194
  and stashes them on `model._neurobridge_calibration: list[dict]` (sorted
 
298
  returns a side-by-side metric+param diff (`RunDiffRow`).
299
 
300
  When `NEUROBRIDGE_DISABLE_MLFLOW=1`, both endpoints return empty
301
+ responses without raising — useful for deployments where there is no
302
+ writable `mlruns/` tree or the tracking server is unavailable. Unknown
303
+ run ids → 404.
304
 
305
  The Streamlit "Experiments" tab is the user-facing surface. Cached
306
  in session state with an explicit Refresh button.
 
310
  `Dockerfile.hf` is the Hugging Face Spaces image. Single container,
311
  two processes (FastAPI :8000 + Streamlit :7860) launched via
312
  `supervisord.conf`. Build-time `RUN python -m src.models.bbb_model`
313
+ bakes the BBB model artifact into the image so the first `/predict/bbb`
314
+ call is instant on cold start. Build-time RAG ingest creates
315
+ `data/processed/faiss_index/`.
316
+
317
+ `docker-entrypoint.sh` is the runtime guard for local Docker/Compose demos:
318
+ when a mounted `./data` volume hides image-built artifacts, it seeds fixture
319
+ raw data, rebuilds missing BBB features/model artifacts, and rebuilds the
320
+ FAISS index before starting supervisord. It does not bake
321
+ `NEUROBRIDGE_DISABLE_MLFLOW=1` into the image; operators may set that env at
322
+ runtime if their tracking service is unavailable.
323
+
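A shell sketch of that startup-guard behavior — the shipped `docker-entrypoint.sh` may differ, and the fixture-seeding layout here is an assumption:

```sh
#!/bin/sh
# Sketch: rebuild artifacts hidden by an empty mounted ./data volume,
# then exec the given CMD (supervisord).
set -e

if [ ! -f data/processed/bbb_model.joblib ]; then
  mkdir -p data/raw
  cp -r tests/fixtures/. data/raw/ || true   # seed fixture raw data (assumed layout)
  python -m src.pipelines.bbb_pipeline       # rebuild BBB features
  python -m src.models.bbb_model             # retrain the BBB model artifact
fi

if [ ! -d data/processed/faiss_index ]; then
  python -m src.rag.ingest data/knowledge_base data/processed/faiss_index
fi

exec "$@"   # hand off to the Dockerfile CMD
```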
324
+ Default environment: `DEPLOY_ENV=hf_spaces`. The LLM kill-switch is **not**
325
+ set — deployed Spaces use the real OpenRouter free-tier chain (§11) when
326
  `OPENROUTER_API_KEY` is configured in the Space's Secrets panel. Set
327
+ `NEUROBRIDGE_DISABLE_LLM=1` only when you want to force the deterministic
328
+ template path for a fully-reproducible demo.
329
 
330
  The README's YAML front-matter declares the Space metadata
331
  (SDK=docker, port=7860, app_file=src/frontend/app.py).
 
333
  ## 15. Orchestrator Agent Surface
334
 
335
  `src/agents/orchestrator.py` exposes a single-agent function-calling
336
+ loop over the openai SDK (no LangChain / framework dep). The API enables
337
+ the guarded workflow mode: if the LLM skips or mis-shapes a required tool
338
+ call, deterministic routing in `src/agents/routing.py` falls back to exactly
339
+ one pipeline tool, then exactly one retrieval tool, then final synthesis.
340
+ The agent holds 4 tools, defined in `src/agents/tools.py`:
341
 
342
  - `run_bbb_pipeline(smiles, top_k)` — wraps `POST /predict/bbb`
343
  - `run_eeg_pipeline(input_path)` — wraps `POST /pipeline/eeg`
344
+ - `run_mri_pipeline(input_dir, sites_csv=None)` — wraps `POST /pipeline/mri`
345
+ and defaults `sites_csv` to `<input_dir>/sites.csv`
346
  - `retrieve_context(query, k)` — wraps `src/rag/retrieve.py`
347
 
348
  The system prompt (`src/agents/prompts.py:ORCHESTRATOR_SYSTEM_PROMPT`)
349
+ describes the workflow: pick exactly one pipeline → run it → formulate a
350
+ focused retrieval query → call retrieve_context → synthesize a 3-5 sentence
351
+ response that cites at least one chunk. The API-side workflow guard enforces
352
+ that order in code; the prompt is guidance, not the only control plane.
353
+ Language of the final response is mirrored from the user's question.
354
+
355
+ `POST /agent/run` is the public surface. It accepts `user_input`,
356
+ optional `user_question`, and optional MRI `sites_csv`. Default model is
357
+ `google/gemini-2.0-flash-exp:free` on OpenRouter (function-calling support
358
+ verified). Override via `NEUROBRIDGE_AGENT_MODEL` env var. Returns 503 when
359
+ `OPENROUTER_API_KEY` is unset.
360
 
361
  Diagnostics: `GET /diag/agent` returns key presence, configured model,
362
  RAG index status (chunk count), and the registered tool names.
 
375
 
376
  Defaults: input=`data/knowledge_base/`, output=`data/processed/faiss_index/`.
377
  The Dockerfile runs this at build time so deployed Spaces start with
378
+ a populated index. `docker-entrypoint.sh` also rebuilds the index at
379
+ startup when a mounted `data/` volume hides the image-built artifacts.
380
+ Empty KB → empty index → `retrieve_context` returns 0 chunks; the agent
381
+ surfaces this and answers from the pipeline result alone.
382
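The rebuild uses the same ingest CLI the Dockerfiles run at build time, which you can also invoke by hand:

```bash
python -m src.rag.ingest data/knowledge_base data/processed/faiss_index
```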
 
383
  `tests/fixtures/kb_sample/` ships 3 seed markdown files (Lipinski,
384
  ComBat, MNE+ICA) — these double as test fixtures and as the demo
Dockerfile CHANGED
@@ -30,6 +30,8 @@ RUN pip install -r requirements.txt
30
  COPY src/ ./src/
31
  COPY tests/fixtures/ ./tests/fixtures/
32
  COPY supervisord.conf ./supervisord.conf
 
 
33
 
34
  # Seed raw data from fixtures so the deployed Signal/Image/Molecule tabs
35
  # work on first click. Then run all three pipelines so mlruns/ contains
@@ -55,4 +57,5 @@ RUN python -m src.rag.ingest data/knowledge_base data/processed/faiss_index
55
  EXPOSE 7860
56
 
57
  # --- launch FastAPI + Streamlit under supervisord ---
 
58
  CMD ["supervisord", "-n", "-c", "/app/supervisord.conf"]
 
30
  COPY src/ ./src/
31
  COPY tests/fixtures/ ./tests/fixtures/
32
  COPY supervisord.conf ./supervisord.conf
33
+ COPY docker-entrypoint.sh ./docker-entrypoint.sh
34
+ RUN chmod +x /app/docker-entrypoint.sh
35
 
36
  # Seed raw data from fixtures so the deployed Signal/Image/Molecule tabs
37
  # work on first click. Then run all three pipelines so mlruns/ contains
 
57
  EXPOSE 7860
58
 
59
  # --- launch FastAPI + Streamlit under supervisord ---
60
+ ENTRYPOINT ["/app/docker-entrypoint.sh"]
61
  CMD ["supervisord", "-n", "-c", "/app/supervisord.conf"]
Dockerfile.hf CHANGED
@@ -30,6 +30,8 @@ RUN pip install -r requirements.txt
30
  COPY src/ ./src/
31
  COPY tests/fixtures/ ./tests/fixtures/
32
  COPY supervisord.conf ./supervisord.conf
 
 
33
 
34
  # Seed raw data from fixtures so the deployed Signal/Image/Molecule tabs
35
  # work on first click. Then run all three pipelines so mlruns/ contains
@@ -55,4 +57,5 @@ RUN python -m src.rag.ingest data/knowledge_base data/processed/faiss_index
55
  EXPOSE 7860
56
 
57
  # --- launch FastAPI + Streamlit under supervisord ---
 
58
  CMD ["supervisord", "-n", "-c", "/app/supervisord.conf"]
 
30
  COPY src/ ./src/
31
  COPY tests/fixtures/ ./tests/fixtures/
32
  COPY supervisord.conf ./supervisord.conf
33
+ COPY docker-entrypoint.sh ./docker-entrypoint.sh
34
+ RUN chmod +x /app/docker-entrypoint.sh
35
 
36
  # Seed raw data from fixtures so the deployed Signal/Image/Molecule tabs
37
  # work on first click. Then run all three pipelines so mlruns/ contains
 
57
  EXPOSE 7860
58
 
59
  # --- launch FastAPI + Streamlit under supervisord ---
60
+ ENTRYPOINT ["/app/docker-entrypoint.sh"]
61
  CMD ["supervisord", "-n", "-c", "/app/supervisord.conf"]
PROJECT_OVERVIEW.md CHANGED
@@ -9,7 +9,7 @@
9
 
10
## 1. What Did We Build, in One Sentence?
11
 
12
- We built a B2B "Living Decision System" that processes three different clinical data types (molecule / EEG signal / MRI image) behind a single API + a single web UI, returns **label + confidence score + calibration + drift signal + MLflow traceability + a natural-language AI explanation** for every prediction, and guards every likely demo failure point with a "kill-switch".
13
 
14
Hackathon theme: **"Building AI Systems for Neurotechnology & Health"** — and the jury scores 6 dimensions (Problem Depth, System Quality, Robustness, Interaction, Execution, Creativity). We answered each dimension with specific features. Details below.
15
 
@@ -106,6 +106,19 @@ The pipeline runs twice (pre + post ComBat) and returns a long-format DataFrame
106
 
107
**Why ComBat?** ComBat was originally invented for gene-expression batch effects (Johnson et al. 2007) and later adapted to neuroimaging (Fortin et al. 2017, 2018). Using an empirical Bayes approach it learns site-dependent location + scale parameters and removes site bias **while preserving the biological signal**. Z-score normalization alone cannot fully close the gap; ComBat corrects both mean and variance.
108

109
  ---
110
 
111
## 4. "Living Decision System" — Seven Transparency Layers
@@ -144,9 +157,11 @@ Together these seven layers refute the "Black-Box AI ≠ Trust" myth: the black box
144
  │ /pipeline/{bbb,eeg,mri} → batch processing │
145
  │ /pipeline/mri/diagnostics → pre/post ComBat KPIs │
146
  │ /predict/bbb → single-molecule infer │
 
147
  │ /explain/{bbb,eeg,mri} → LLM/template rationale │
148
  │ /experiments/runs → MLflow run list │
149
  │ /experiments/diff → side-by-side run diff │
 
150
  │ /health → liveness check │
151
  └─┬────────────┬────────────┬────────────┬─────────────────┘
152
  │ │ │ │
@@ -156,6 +171,8 @@ bbb_pipeline eeg_pipeline mri_pipeline llm.explainer
156
  + shap + template fallback
157
  ```
158
 
 
 
159
### 5.2 Process Model
160

161
In a single Docker container, supervisord runs two processes:
@@ -169,7 +186,9 @@ Inside the container, Streamlit calls `httpx.post("http://127.0.0.1:8000/...")` to reach FastA
169
| Data | Location | Lifetime |
170
|---|---|---|
171
| Trained BBB model (joblib) | `data/processed/bbb_model.joblib` | Trained at container build time, baked into the image |
172
- | MLflow runs | `mlruns/` (default backend: SQLite) | Disabled on HF Spaces via `NEUROBRIDGE_DISABLE_MLFLOW=1` (filesystem read-only edge case) |


173
| Worker drift deque | In-memory (`collections.deque(maxlen=100)`) | Until container restart; worker restart = state reset |
174
| Streamlit session state | Browser tab | Until the tab closes |
175
 
@@ -182,7 +201,7 @@ Inside the container, Streamlit calls `httpx.post("http://127.0.0.1:8000/...")` to reach FastA
182
- **Type-safe schemas:** Pydantic v2 request/response models give automatic validation + 422 errors
183
- **OpenAPI auto-generation:** the `/docs` endpoint gives the jury a Swagger UI; integration documentation comes for free
184
- **Async-ready:** our use case is sync, but an async pipeline can be added easily if needed
185
- - **Test-friendly:** `fastapi.testclient.TestClient` backs most of the 175 tests
186
- **Why not the alternatives:** Flask is too bare (you hand-write everything), Django is overkill (admin + ORM unneeded)
187
 
188
  ### 6.2 Frontend: Streamlit
@@ -241,17 +260,26 @@ Inside the container, Streamlit calls `httpx.post("http://127.0.0.1:8000/...")` to reach FastA
241
 
242
- **Self-contained:** all dependencies + code + data in one image
243
- **Portable:** the same image runs locally, on HF, and later on Railway/AWS
244
- - **Build-time train:** `RUN python -m src.pipelines.bbb_pipeline && python -m src.models.bbb_model` bakes the model into the image; cold start is instant

245
- **Supervisord:** two processes in one container with minimal overhead
246
- **Why not the alternatives:** docker-compose multi-container is nice, but HF Spaces wants a single container
247

248
  ---
249
 
250
## 7. Test Discipline: TDD + Subagent-Driven Development
251

252
### 7.1 The Numbers
253

254
- - **184 tests, all green**
255
- 8-day sprint, ~50 atomic commits
256
- Every test-bearing task was written with **RED → GREEN → REFACTOR** discipline
257
- Each task was implemented by a separate Subagent (Claude Code); the main agent coordinated + reviewed
@@ -260,20 +288,22 @@ Inside the container, Streamlit calls `httpx.post("http://127.0.0.1:8000/...")` to reach FastA
260
 
261
  ```
262
  tests/
263
- ├── core/test_logger.py 4 tests (logger idempotency)
264
  ├── pipelines/
265
- │ ├── test_bbb_pipeline.py 24 tests (SMILES validation, FP, drop+log, idempotence)
266
- │ ├── test_eeg_pipeline.py 14 tests (filter, ICA, epoching, feature extraction)
267
- │ └── test_mri_pipeline.py 42 tests (volume validation, masking, ComBat split, diagnostics)
268
- ├── models/test_bbb_model.py 16 tests (train, save/load, predict, SHAP, calibration, train_stats)
269
- ├── api/
270
- │ ├── test_main.py 2 tests (FastAPI app boots, /health responds)
271
- │ └── test_routes.py 14 tests (all 6 routers, error mapping, drift/calibration/provenance)
272
- ├── llm/test_explainer.py 7 tests (template determinism, modality dispatch, kill-switch)
273
- ├── frontend/test_app_import.py 2 tests (Streamlit module imports cleanly)
274
- └── deploy/test_dockerfile_hf.py 2 tests (Dockerfile.hf well-formed, expected stages present)
275
-
276
- Total: 184
 
 
277
  ```
278
 
279
  ### 7.3 UserWarning Gate
@@ -289,7 +319,7 @@ pytest -W error::UserWarning tests/
289
- Each feature's **acceptance criteria** are written as test code → zero spec ambiguity
290
- Implementation-first would have left no courage to refactor
291
- Every dispatched Subagent task has a **clear exit condition**: "tests pass + lint clean"
292
- - the 184-test demo is a "production-aware" signal for the jury
293
 
294
  ---
295
 
@@ -323,17 +353,20 @@ except httpx.HTTPStatusError as e:
323
 
324
The 400 → `st.warning` distinction matters: the jury should see the "the system rejected it but didn't crash" story — a yellow WARNING instead of a red ERROR.
325
 
326
- ### 8.3 Three Demo Lifelines (kill-switches)
327
 
328
- On demo day anything can go wrong. Three env variables rescue every disaster scenario:
329
 
330
| Env | Effect |
331
|---|---|
332
| `NEUROBRIDGE_DISABLE_LLM=1` | No OpenRouter call is made; the deterministic template path always answers |
333
| `NEUROBRIDGE_DISABLE_MLFLOW=1` | No MLflow lookup; the provenance badge shows "—" and the system keeps running |
334
| `BBB_MODEL_PATH=...` | A different path instead of the default `data/processed/bbb_model.joblib` |
 
 
 
335
 
336
- In the HF Spaces deploy **only** `NEUROBRIDGE_DISABLE_MLFLOW=1` is set (filesystem read-only edge case). The LLM is **ON by default** — `Dockerfile` and `Dockerfile.hf` no longer hard-code `NEUROBRIDGE_DISABLE_LLM`; a deployed Space uses the free-tier chain when the `OPENROUTER_API_KEY` Secret is present and falls back to the template otherwise. If you want the LLM 100% deterministic for the jury demo, add Space → Settings → Variables → `NEUROBRIDGE_DISABLE_LLM=1`. To see the LLM's live state, the sidebar "🔧 Diagnose LLM" button (which hits `GET /diag/openrouter`) returns key presence + chain head + an 8-token probe.
337
 
338
  ### 8.4 Drift detection
339
 
@@ -392,10 +425,10 @@ This is the "Adapt Over Time" layer (Living Systems pillar). The system **its own predictions
392
| Dimension | Score | Evidence |
393
|---|---|---|
394
| **Problem Depth** | 9.5/10 | 3 hard real-world problems (BBB drug discovery, EEG artifacts, MRI multi-site domain shift); direct reference to slide 11's "blood-brain barrier" example |
395
- | **System Quality** | 9.7/10 | 184 tests, TDD, 50+ atomic commits, FastAPI+Streamlit+MLflow+Docker, error mapping (400/404/422/503), 3 lifeline gates |
396
| **Robustness** | 9.5/10 | Edge-case dropdown (5 probes), HTTP 400 → graceful warning, fallback chains everywhere |
397
| **Interaction** | 9.8/10 | 5 tabs + edge-case probes + calibration caption + drift caption + AI Assistant chat (3 modalities × inline expander + standalone tab) |
398
- | **Execution** | 9.8/10 | 8-day disciplined sprint, atomic commits, AGENTS.md 14 sections, README executive summary + demo recipe, all DoD checks green |
399
| **Creativity** | 9.7/10 | LLM hybrid (template fallback) + drift z-score + ComBat KDE faceted + "Living Decision System" framing + Track-1 multi-modal AI agents + Track-5 Experiments tab |
400
| **TOTAL** | **58.0/60 (~96.8%)** | |
401
 
@@ -415,7 +448,7 @@ This is the "Adapt Over Time" layer (Living Systems pillar). The system **its own predictions
415
  - ✅ Working Prototype (FastAPI + Streamlit + Docker, end-to-end functional)
416
- ✅ Interactive System (5 tabs, real-time predictions, custom SMILES, AI Assistant chat)
417
- ✅ Explanation of Behavior (7-layer transparency stack)
418
- - ✅ Tested Under Real Conditions (184 tests + edge-case dropdown probes)
419
- ❌ No slides-only — a real working system
420
- ❌ No perfect-data-only — the edge-case dropdown proves it
421
 
@@ -433,6 +466,7 @@ This is the "Adapt Over Time" layer (Living Systems pillar). The system **its own predictions
433
  | Day 6 | Edge-case dropdown + calibration metadata + ComBat diagnostics endpoint + altair faceted KDE | 165 |
434
  | Day 7 | Drift detection (deque + z-score) + MLflow provenance badge + LLM explainer (OpenRouter hybrid) + AI Assistant tab | 175 |
435
  | Day 8 | Multi-modal explain (`/explain/{eeg,mri}`) + Experiments tab (MLflow runs + diff) + HF Spaces deploy (Dockerfile.hf + supervisord) + README pitch craft | 184 |
 
436
 
437
Separate plan and spec files exist for each day: `docs/superpowers/plans/` and `docs/superpowers/specs/`.
438
 
@@ -498,8 +532,8 @@ Not on the HF free tier. In production:
498
- Because the drift deque is per-worker, 4 workers = 4 independent buffers (moved to a Redis sentinel in production)
499
- ComBat batch (~500 subjects/minute single-threaded, vectorized)
500
 
501
- ### "Test sayısı 184 ama nasıl?"
502
- TDD disipliniyle her feature'a 2-4 test yazıldı. Pipeline'lar fixture-driven (synthetic NIfTI, sample SMILES CSV, sample EEG FIF). API testleri `fastapi.testclient.TestClient` üzerinden (no real network). LLM testleri env-gated (kill-switch ile force-template path).
503
 
504
  ---
505
 
@@ -527,25 +561,36 @@ hackathon/
527
  ├── src/
528
  │ ├── api/ # FastAPI app + routes + schemas
529
  │ │ ├── main.py
530
- │ │ ├── routes.py # 4 routers: pipeline, predict, explain, experiments
531
- │ │ └── schemas.py # 15+ Pydantic models
532
  │ ├── core/
533
- │ │ └── logger.py # Structured logging (no print())
 
 
534
  │ ├── pipelines/
535
  │ │ ├── bbb_pipeline.py # SMILES → Morgan FP → Parquet
536
  │ │ ├── eeg_pipeline.py # FIF/EDF → ICA → epochs → features
537
  │ │ └── mri_pipeline.py # NIfTI → ROI → ComBat → diagnostics
538
  │ ├── models/
539
- │ │ └── bbb_model.py # RF train + SHAP + calibration + train_stats
 
540
  │ ├── llm/
541
  │ │ ├── __init__.py
542
  │ │ └── explainer.py # OpenRouter + deterministic template fallback
 
 
 
 
 
 
 
543
  │ └── frontend/
544
  │ └── app.py # Streamlit 5-tab dashboard (editorial redesign)
545
- ├── tests/ # 184 tests across 9 modules
546
  ├── data/
547
  │ ├── raw/ # Input data (gitignored)
548
- │ └── processed/ # Pipeline outputs + trained model joblib
 
549
  ├── docs/
550
  │ └── superpowers/
551
  │ ├── plans/ # 8 day-by-day implementation plans
@@ -554,7 +599,7 @@ hackathon/
554
  ├── Dockerfile # Alias for HF (auto-discovery)
555
  ├── supervisord.conf # Two-process launcher
556
  ├── requirements.txt # Pinned deps (fastapi==0.115, sklearn==1.5.1, openai==1.51, ...)
557
- ├── AGENTS.md # Team contract — 14 sections
558
  ├── README.md # Public-facing overview + Demo Recipe
559
  └── PROJECT_OVERVIEW.md # This file
560
  ```
@@ -667,7 +712,7 @@ Plain-Turkish equivalents of the technical terms appearing in the sections above
667
 
668
## 17. Closing
669

670
- NeuroBridge Enterprise is the most direct answer to the hackathon's slogan ("**Stop Building Ideas. Start Building Systems.**"). We built it across 8 days of disciplined TDD + Subagent-Driven Development. It is publicly deployed — the jury can click and poke it from a browser. 184 tests green, a projected jury score of 96.8%, 5/5 hackathon tracks strong, 4/4 Living Systems pillars full.
671

672
We're playing for the championship.
673
 
 
9
 
10
## 1. What Did We Build, in One Sentence?
11
 
12
+ We built a B2B "Living Decision System" that processes three different clinical data types (molecule / EEG signal / MRI image) behind a single API + a single web UI, returns **label + confidence score + calibration + drift signal + MLflow traceability + a natural-language AI explanation** for every prediction, plugs the externally-trained volumetric deep-learning model for MRI into the system via ONNX through `POST /predict/mri`, orchestrates the pipeline tools through a RAG-backed agent surface, and guards the likely demo failure points with "kill-switches".
13
 
14
Hackathon theme: **"Building AI Systems for Neurotechnology & Health"** — and the jury scores 6 dimensions (Problem Depth, System Quality, Robustness, Interaction, Execution, Creativity). We answered each dimension with specific features. Details below.
15
 
 
106
 
107
**Why ComBat?** ComBat was originally invented for gene-expression batch effects (Johnson et al. 2007) and later adapted to neuroimaging (Fortin et al. 2017, 2018). Using an empirical Bayes approach it learns site-dependent location + scale parameters and removes site bias **while preserving the biological signal**. Z-score normalization alone cannot fully close the gap; ComBat corrects both mean and variance.
108
 
109
+ ### 3.4 MRI Image Deep-Learning Model (External Training → ONNX)
110
+
111
+ The deep-learning model we will train for MRI is not trained in this repo. Training will happen in a separate GPU environment, and the exported ONNX artifact will be plugged into the NeuroBridge runtime:
112
+
113
+ - Artifact path: `data/processed/mri_model.onnx`
114
+ - Override: `MRI_MODEL_PATH=/path/to/model.onnx`
115
+ - Input: `.nii` / `.nii.gz` NIfTI volume
116
+ - Preprocess: 3D finite-volume validation → trilinear resize (`64×64×64` default) → non-zero voxel z-score normalization → `[1, 1, D, H, W]` float32 tensor
117
+ - Output: `[1, C]` class vector; logits or probabilities are both accepted
118
+ - API: `POST /predict/mri`
119
+
120
+ This separation matters: `src/pipelines/mri_pipeline.py` cleans multi-site MRI data and harmonizes it with ComBat, while `src/models/mri_model.py` runs the externally-trained volumetric model at inference time for clinical classification. If the artifact is missing, the endpoint returns HTTP 503 and points the operator at the ONNX export path.
121
+
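A minimal inference sketch under the contract above, using `onnxruntime`; the single-output unpacking and the label names are assumptions, and the real `mri_model.py` surface may differ:

```python
# Illustrative only: run the exported ONNX model on a preprocessed
# [1, 1, D, H, W] float32 tensor and normalize the class vector.
import numpy as np
import onnxruntime as ort

def predict_sketch(x: np.ndarray,
                   model_path: str = "data/processed/mri_model.onnx",
                   label_names=("control", "abnormal")) -> dict:  # assumed labels
    sess = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    (out,) = sess.run(None, {sess.get_inputs()[0].name: x})  # assumes one [1, C] output
    v = out[0].astype(np.float64)
    # Contract accepts logits or probabilities: apply softmax only when needed.
    if v.min() < 0.0 or abs(v.sum() - 1.0) > 1e-3:
        v = np.exp(v - v.max())
        v /= v.sum()
    i = int(v.argmax())
    return {"label": label_names[i], "confidence": float(v[i])}
```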
122
  ---
123
 
124
## 4. "Living Decision System" — Seven Transparency Layers
 
157
  │ /pipeline/{bbb,eeg,mri} → batch processing │
158
  │ /pipeline/mri/diagnostics → pre/post ComBat KPIs │
159
  │ /predict/bbb → single-molecule infer │
160
+ │ /predict/mri → volumetric ONNX infer │
161
  │ /explain/{bbb,eeg,mri} → LLM/template rationale │
162
  │ /experiments/runs → MLflow run list │
163
  │ /experiments/diff → side-by-side run diff │
164
+ │ /agent/run → pipeline tools + RAG │
165
  │ /health → liveness check │
166
  └─┬────────────┬────────────┬────────────┬─────────────────┘
167
  │ │ │ │
 
171
  + shap + template fallback
172
  ```
173
 
174
+ The agent surface (`src/agents/orchestrator.py`) first tries LLM function-calling; if the model skips a tool call or gets the order wrong, the guarded workflow kicks in. The deterministic router (`src/agents/routing.py`) picks one pipeline, runs the corresponding tool, pulls FAISS/RAG context via `retrieve_context`, and returns the final synthesis under the same API contract.
175
+
176
### 5.2 Process Model
177

178
In a single Docker container, supervisord runs two processes:
 
186
| Data | Location | Lifetime |
187
|---|---|---|
188
| Trained BBB model (joblib) | `data/processed/bbb_model.joblib` | Trained at container build time, baked into the image |
189
+ | MRI DL model (ONNX) | `data/processed/mri_model.onnx` or `MRI_MODEL_PATH` | Exported from the external training environment; the runtime only loads + runs inference |
190
+ | RAG FAISS index | `data/processed/faiss_index/` | Build-time ingest; rebuilt by the container startup guard when missing |
191
+ | MLflow runs | `mlruns/` (default backend: SQLite) | Depends on the runtime environment; can be switched off with `NEUROBRIDGE_DISABLE_MLFLOW=1` |
192
| Worker drift deque | In-memory (`collections.deque(maxlen=100)`) | Until container restart; worker restart = state reset |
193
| Streamlit session state | Browser tab | Until the tab closes |
194
 
 
201
- **Type-safe schemas:** Pydantic v2 request/response models give automatic validation + 422 errors
202
- **OpenAPI auto-generation:** the `/docs` endpoint gives the jury a Swagger UI; integration documentation comes for free
203
- **Async-ready:** our use case is sync, but an async pipeline can be added easily if needed
204
+ - **Test-friendly:** `fastapi.testclient.TestClient` runs most API tests without a real network
205
- **Why not the alternatives:** Flask is too bare (you hand-write everything), Django is overkill (admin + ORM unneeded)
206
 
207
  ### 6.2 Frontend: Streamlit
 
260
 
261
- **Self-contained:** all dependencies + code + data in one image
262
- **Portable:** the same image runs locally, on HF, and later on Railway/AWS
263
+ - **Build-time artifacts:** the BBB model is trained and the RAG index is ingested at build time; the main demo artifacts are ready on cold start
264
+ - **Runtime guard:** `docker-entrypoint.sh` regenerates the BBB model and the FAISS index from fixture data when the host volume arrives empty
265
- **Supervisord:** two processes in one container with minimal overhead
266
- **Why not the alternatives:** docker-compose multi-container is nice, but HF Spaces wants a single container
267
 
268
+ ### 6.9 Agent + RAG
269
+
270
+ - **Tool-first orchestration:** the tools available to the agent mirror the pipeline surface: BBB predict, EEG pipeline, MRI pipeline, and RAG retrieval.
271
+ - **Guarded workflow:** if LLM function-calling fails, the API enforces the pipeline → retrieval → synthesis order via the deterministic router; no "the agent never called a tool" risk during the demo.
272
+ - **RAG stack:** `fastembed` (`BAAI/bge-small-en-v1.5`, 384 dim) + `faiss-cpu` (`IndexFlatIP`, cosine search over L2-normalized vectors). No torch dependency. A sketch follows this list.
273
+ - **Knowledge base:** `.md`, `.txt`, `.pdf` files under `data/knowledge_base/` are written into `data/processed/faiss_index/` via `python -m src.rag.ingest`.
274
+ - **Default agent model:** overridable via `NEUROBRIDGE_AGENT_MODEL`; without `OPENROUTER_API_KEY`, `/agent/run` returns HTTP 503.
275
+
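A minimal sketch of that retrieval core, assuming only the two libraries named above (the real `src/rag/` layer adds chunking and index persistence):

```python
# Inner product over L2-normalized embeddings == cosine similarity.
import faiss
import numpy as np
from fastembed import TextEmbedding

chunks = ["Lipinski's rule of five ...", "ComBat harmonization ...", "MNE ICA ..."]
embedder = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")  # 384-dim vectors
vecs = np.array(list(embedder.embed(chunks)), dtype="float32")

faiss.normalize_L2(vecs)                  # normalize so IP behaves as cosine
index = faiss.IndexFlatIP(vecs.shape[1])
index.add(vecs)

query = np.array(list(embedder.embed(["site effect correction"])), dtype="float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, 2)      # top-2 chunks with cosine scores
print([(chunks[i], float(s)) for i, s in zip(ids[0], scores[0])])
```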
276
  ---
277
 
278
## 7. Test Discipline: TDD + Subagent-Driven Development
279
 
280
### 7.1 The Numbers
281
 
282
+ - **242 passed, 2 skipped** (verified on Windows / Python 3.11)
283
- 8-day sprint, ~50 atomic commits
284
- Every test-bearing task was written with **RED → GREEN → REFACTOR** discipline
285
- Each task was implemented by a separate Subagent (Claude Code); the main agent coordinated + reviewed
 
288
 
289
  ```
290
  tests/
291
+ ├── core/ logger, storage, tracking, determinism
292
  ├── pipelines/
293
+ │ ├── test_bbb_pipeline.py SMILES validation, FP, drop+log, idempotence
294
+ │ ├── test_eeg_pipeline.py filter, ICA, epoching, feature extraction
295
+ │ └── test_mri_pipeline.py volume validation, masking, ComBat split, diagnostics
296
+ ├── models/
297
+ │ ├── test_bbb_model.py train, save/load, predict, SHAP, calibration, train_stats
298
+ │ └── test_mri_model.py NIfTI preprocess + ONNX inference contract
299
+ ├── api/ route contracts, error mapping, drift/calibration/provenance
300
+ ├── llm/ template determinism, modality dispatch, kill-switch
301
+ ├── rag/ ingest, empty-index behavior, retrieval
302
+ ├── agents/ tool schemas, guarded orchestration, agent route
303
+ ├── frontend/ Streamlit module import smoke
304
+ └── deploy/ Dockerfile.hf / startup contract
305
+
306
+ Total: 242 passed, 2 skipped
307
  ```
308
 
309
  ### 7.3 UserWarning Gate
 
319
- Each feature's **acceptance criteria** are written as test code → zero spec ambiguity
320
- Implementation-first would have left no courage to refactor
321
- Every dispatched Subagent task has a **clear exit condition**: "tests pass + lint clean"
322
+ - the 242 passed / 2 skipped demo is a "production-aware" signal for the jury
323
 
324
  ---
325
 
 
353
 
354
The 400 → `st.warning` distinction matters: the jury should see the "the system rejected it but didn't crash" story — a yellow WARNING instead of a red ERROR.
355
 
356
+ ### 8.3 Demo Lifelines (kill-switches + artifact overrides)
357
 
358
+ On demo day anything can go wrong. These env variables keep the critical scenarios under control:
359
 
360
| Env | Effect |
361
|---|---|
362
| `NEUROBRIDGE_DISABLE_LLM=1` | No OpenRouter call is made; the deterministic template path always answers |
363
| `NEUROBRIDGE_DISABLE_MLFLOW=1` | No MLflow lookup; the provenance badge shows "—" and the system keeps running |
364
| `BBB_MODEL_PATH=...` | A different path instead of the default `data/processed/bbb_model.joblib` |
365
+ | `MRI_MODEL_PATH=...` | Path to the externally-trained ONNX artifact instead of the default `data/processed/mri_model.onnx` |
366
+ | `OPENROUTER_API_KEY=...` | Enables the LLM explainer and the orchestrator agent with real OpenRouter calls |
367
+ | `NEUROBRIDGE_AGENT_MODEL=...` | Overrides the agent's OpenRouter model |
368
 
369
+ The Docker image no longer hard-codes `NEUROBRIDGE_DISABLE_MLFLOW=1`; the operator toggles it per environment. The LLM is **ON by default** — `Dockerfile` and `Dockerfile.hf` do not hard-code `NEUROBRIDGE_DISABLE_LLM`; a deployed Space uses the free-tier chain when the `OPENROUTER_API_KEY` Secret is present and falls back to the template otherwise. If you want the LLM 100% deterministic for the jury demo, add Space → Settings → Variables → `NEUROBRIDGE_DISABLE_LLM=1`. To see the LLM's live state, the sidebar "🔧 Diagnose LLM" button (which hits `GET /diag/openrouter`) returns key presence + chain head + an 8-token probe.
370
 
371
  ### 8.4 Drift detection
372
 
 
425
| Dimension | Score | Evidence |
426
|---|---|---|
427
| **Problem Depth** | 9.5/10 | 3 hard real-world problems (BBB drug discovery, EEG artifacts, MRI multi-site domain shift); direct reference to slide 11's "blood-brain barrier" example |
428
+ | **System Quality** | 9.7/10 | 242 passed / 2 skipped, TDD, 50+ atomic commits, FastAPI+Streamlit+MLflow+Docker, error mapping (400/404/422/503), lifeline gates |
429
| **Robustness** | 9.5/10 | Edge-case dropdown (5 probes), HTTP 400 → graceful warning, fallback chains everywhere |
430
| **Interaction** | 9.8/10 | 5 tabs + edge-case probes + calibration caption + drift caption + AI Assistant chat (3 modalities × inline expander + standalone tab) |
431
+ | **Execution** | 9.8/10 | 8-day disciplined sprint + post-Day-8 hardening, atomic commits, AGENTS.md contract, README executive summary + demo recipe, all DoD checks green |
432
| **Creativity** | 9.7/10 | LLM hybrid (template fallback) + drift z-score + ComBat KDE faceted + "Living Decision System" framing + Track-1 multi-modal AI agents + Track-5 Experiments tab |
433
| **TOTAL** | **58.0/60 (~96.8%)** | |
434
 
 
448
  - ✅ Working Prototype (FastAPI + Streamlit + Docker, end-to-end functional)
449
- ✅ Interactive System (5 tabs, real-time predictions, custom SMILES, AI Assistant chat)
450
- ✅ Explanation of Behavior (7-layer transparency stack)
451
+ - ✅ Tested Under Real Conditions (242 passed / 2 skipped + edge-case dropdown probes)
452
- ❌ No slides-only — a real working system
453
- ❌ No perfect-data-only — the edge-case dropdown proves it
454
 
 
466
  | Day 6 | Edge-case dropdown + calibration metadata + ComBat diagnostics endpoint + altair faceted KDE | 165 |
467
  | Day 7 | Drift detection (deque + z-score) + MLflow provenance badge + LLM explainer (OpenRouter hybrid) + AI Assistant tab | 175 |
468
  | Day 8 | Multi-modal explain (`/explain/{eeg,mri}`) + Experiments tab (MLflow runs + diff) + HF Spaces deploy (Dockerfile.hf + supervisord) + README pitch craft | 184 |
469
+ | Day 9 | Agent/RAG hardening + guarded orchestration + Docker startup guard + Windows-safe MLflow tests + MRI ONNX decision layer (`/predict/mri`) | 242 passed, 2 skipped |
470
 
471
Separate plan and spec files exist for each day: `docs/superpowers/plans/` and `docs/superpowers/specs/`.
472
 
 
532
- Because the drift deque is per-worker, 4 workers = 4 independent buffers (moved to a Redis sentinel in production)
533
- ComBat batch (~500 subjects/minute single-threaded, vectorized)
534
 
535
+ ### "Test sayısı 242 passed / 2 skipped ama nasıl?"
536
+ TDD disipliniyle her feature'a 2-4 test yazıldı. Pipeline'lar fixture-driven (synthetic NIfTI, sample SMILES CSV, sample EEG FIF). API testleri `fastapi.testclient.TestClient` üzerinden (no real network). LLM ve agent testleri env-gated; RAG testleri fixture knowledge base ile çalışır; MRI ONNX kontratı dummy ONNX artifact ile doğrulanır.
537
 
538
  ---
539
 
 
561
  ├── src/
562
  │ ├── api/ # FastAPI app + routes + schemas
563
  │ │ ├── main.py
564
+ │ │ ├── routes.py # pipeline, predict, explain, experiments, agent routers
565
+ │ │ └── schemas.py # Pydantic request/response contracts
566
  │ ├── core/
567
+ │ │ ├── logger.py # Structured logging (no print())
568
+ │ │ ├── storage.py # Deterministic Parquet helpers
569
+ │ │ └── tracking.py # MLflow tracking context
570
  │ ├── pipelines/
571
  │ │ ├── bbb_pipeline.py # SMILES → Morgan FP → Parquet
572
  │ │ ├── eeg_pipeline.py # FIF/EDF → ICA → epochs → features
573
  │ │ └── mri_pipeline.py # NIfTI → ROI → ComBat → diagnostics
574
  │ ├── models/
575
+ │ │ ├── bbb_model.py # RF train + SHAP + calibration + train_stats
576
+ │ │ └── mri_model.py # External ONNX MRI inference surface
577
  │ ├── llm/
578
  │ │ ├── __init__.py
579
  │ │ └── explainer.py # OpenRouter + deterministic template fallback
580
+ │ ├── rag/
581
+ │ │ ├── ingest.py # KB → chunks + FAISS index
582
+ │ │ └── retrieve.py # Top-k retrieval API
583
+ │ ├── agents/
584
+ │ │ ├── orchestrator.py # OpenRouter function-calling + guarded workflow
585
+ │ │ ├── routing.py # Deterministic pipeline/query routing fallback
586
+ │ │ └── tools.py # Pipeline/RAG tool registry
587
  │ └── frontend/
588
  │ └── app.py # Streamlit 5-tab dashboard (editorial redesign)
589
+ ├── tests/ # 242 passed, 2 skipped across core/api/pipelines/models/rag/agents
590
  ├── data/
591
  │ ├── raw/ # Input data (gitignored)
592
+ │ ├── knowledge_base/ # User-supplied RAG docs (gitignored)
593
+ │ └── processed/ # Pipeline outputs + model artifacts + FAISS index
594
  ├── docs/
595
  │ └── superpowers/
596
  │ ├── plans/ # 8 day-by-day implementation plans
 
599
  ├── Dockerfile # Alias for HF (auto-discovery)
600
  ├── supervisord.conf # Two-process launcher
601
  ├── requirements.txt # Pinned deps (fastapi==0.115, sklearn==1.5.1, openai==1.51, ...)
602
+ ├── AGENTS.md # Team contract
603
  ├── README.md # Public-facing overview + Demo Recipe
604
  └── PROJECT_OVERVIEW.md # This file
605
  ```
 
712
 
713
## 17. Closing
714

715
+ NeuroBridge Enterprise is the most direct answer to the hackathon's slogan ("**Stop Building Ideas. Start Building Systems.**"). After the 8-day sprint we pushed the system further with agent/RAG hardening and the MRI ONNX decision layer. It is publicly deployed — the jury can click and poke it from a browser. 242 passed / 2 skipped, a projected jury score of 96.8%, 5/5 hackathon tracks strong, 4/4 Living Systems pillars full.
716

717
We're playing for the championship.
718
 
README.md CHANGED
@@ -19,26 +19,27 @@ short_description: Living decision system for BBB, EEG, and MRI clinical ML
19
 
20
  **1.** Multi-site clinical ML pipelines fail in production because they assume clean data, single-site distributions, and black-box trust — all of which break in real labs. NeuroBridge Enterprise is the *living decision system* that closes those three gaps end-to-end across BBB drug-screening, EEG signal-cleaning, and MRI multi-site harmonization.
21
 
22
- **2.** Three production pipelines (RDKit + Morgan, MNE+ICA, neuroHarmonize ComBat) sit behind one FastAPI surface and one Streamlit dashboard, with a Random Forest BBB classifier on top every inference returns label + confidence + 6-bin precision-at-threshold calibration + top-k SHAP attributions + drift z-score + MLflow provenance + an LLM/template natural-language rationale.
23
 
24
  **3.** Robustness is demoed live: a curated edge-case dropdown probes invalid SMILES, OOD molecules, and boundary inputs — the system never crashes, always degrades gracefully (HTTP 400 → recoverable warning, low confidence + lower drift score, calibration caption hedge).
25
 
26
  **4.** Adapt-Over-Time is built in: each FastAPI worker keeps a rolling 100-prediction window; the trailing median is z-scored against the train-time confidence distribution and surfaced both in the API response and the UI ("trailing-100 confidence median is +1.42σ from training distribution — mild distribution shift").
27
 
28
- **5.** 184 tests green, 8-day disciplined sprint, ~30 atomic commits, three demo lifelines (`NEUROBRIDGE_DISABLE_MLFLOW=1`, `NEUROBRIDGE_DISABLE_LLM=1`, `BBB_MODEL_PATH` env) so the system is jury-day bulletproof. Public-deployable on Hugging Face Spaces with one push.
29
 
30
  ## Status
31
 
32
  | Day | Modality | Pipeline | Status |
33
  |-----|----------|----------|--------|
34
- | 1 | Tabular (BBB / molecules) | [`bbb_pipeline.py`](src/pipelines/bbb_pipeline.py) | Shipped — 30 tests green |
35
- | 2 | Signal (EEG) | [`eeg_pipeline.py`](src/pipelines/eeg_pipeline.py) | Shipped — 67 tests green |
36
- | 3 | Image (MRI / fMRI) | [`mri_pipeline.py`](src/pipelines/mri_pipeline.py) | Shipped — 106 tests green |
37
- | 4 | API + MLOps + Frontend | FastAPI + MLflow + Streamlit + Docker | Shipped — 142 tests green |
38
- | 5 | Decision Layer (Model + XAI + Interactive UI) | [`bbb_model.py`](src/models/bbb_model.py) — RandomForest + SHAP + `POST /predict/bbb` | Shipped — 158 tests green |
39
- | 6 | Final Polish & Demo Features (Edge cases + Calibration + ComBat viz) | Calibration metadata + edge-case probes + `POST /pipeline/mri/diagnostics` | Shipped — 165 tests green |
40
- | 7 | Final 5% (Drift, Traceability & Agents) | Per-worker drift z-score + MLflow provenance badge + `POST /explain/bbb` (LLM + template fallback) + AI Assistant tab | Shipped — 175 tests green |
41
- | Day 8 The Grand Finale (Multi-Modal Agents, Track 5 & Public Deploy) | Shipped 184 tests green |
 
42
 
43
  ## Quick Start
44
 
@@ -49,7 +50,7 @@ short_description: Living decision system for BBB, EEG, and MRI clinical ML
49
  # 1. Create venv and install
50
  python3.12 -m venv .venv312 && source .venv312/bin/activate && pip install -r requirements.txt
51
 
52
- # 2. Verify — expect 106 passed
53
  pytest -v
54
 
55
  # 3. Smoke run with the bundled 6-row fixture
@@ -99,6 +100,37 @@ curl -s -X POST http://localhost:8000/predict/bbb \
99
  -d '{"smiles": "CCO", "top_k": 5}' | python3 -m json.tool
100
  ```
101

102
  ### Run the full stack with Docker
103
 
104
  ```bash
@@ -112,6 +144,22 @@ Then browse to:
112
 
113
  Live-demo robustness: if the MLflow service is unreachable, set `NEUROBRIDGE_DISABLE_MLFLOW=1` to make the pipelines run without tracking.
114

115
  ## Repository Layout
116
 
117
  ```text
@@ -126,14 +174,21 @@ Live-demo robustness: if the MLflow service is unreachable, set `NEUROBRIDGE_DIS
126
  │ └── processed/ # Parquet outputs from pipelines; gitignored
127
  ├── docs/superpowers/plans/ # Per-day implementation plans
128
  ├── src/
129
- │ ├── core/logger.py # Shared structured logger (mandatory in every pipeline)
130
  │ ├── pipelines/
131
  │ │ ├── bbb_pipeline.py # Day-1 pipeline (4 public funcs + CLI entry)
132
  │ │ ├── eeg_pipeline.py # Day-2 pipeline (6 public funcs + CLI entry)
133
  │ │ └── mri_pipeline.py # Day-3 pipeline (5 public funcs + CLI entry)
134
- │ └── api/ # FastAPI surface (placeholder until Day 4+)
 
 
 
 
 
 
 
135
  └── tests/
136
- ├── core/, pipelines/ # Mirror src/ structure
137
  └── fixtures/ # bbbp_sample.csv, eeg_sample.fif, mri_sample/ + build_*.py
138
  ```
139
 
@@ -175,6 +230,23 @@ The pipeline is seeded (`random_state=97`) and produces byte-identical Parquet o
175
 
176
  Output schema: one row per surviving subject with columns `subject_id, site, feat_roi{i}_<stat>` (8 ROIs × 6 stats = 48 features). All `feat_*` are float64 (preserved through the Parquet round-trip).
177

178
  ## Storage Format
179
 
180
  Pipeline outputs are written as Parquet files using the `pyarrow` engine with snappy
@@ -186,16 +258,17 @@ for the `float64` EEG features Day 2 produces. See AGENTS.md §6.
186
 
187
  All pipeline functions and the shared logger were built TDD-first across Days 1–3 (RED → GREEN →
188
  REFACTOR). Each task ended in a green commit; review-and-fix loops landed as separate
189
- commits with `fix:` / `refactor:` prefixes. Run `pytest -v` at any time — the full suite
190
- finishes in under 4 seconds on a 2024 laptop.
191
 
192
  ## Roadmap
193
 
194
  - **Day 2 (shipped):** `eeg_pipeline.py` — bandpass + MNE ICA artifact removal + PSD + statistical features → Parquet.
195
- - **Day 3 (shipped):** `mri_pipeline.py` — NIfTI volume loading, brain masking, ROI feature extraction, ComBat harmonization (`neuroHarmonize`) for site-level domain shift → Parquet (48 features, 106 tests green).
196
- - **Day 4 (shipped):** FastAPI surface in `src/api/` (POST `/pipeline/{bbb,eeg,mri}` + `/health`), MLflow experiment tracking via `src.core.tracking` (see AGENTS.md §7), Streamlit dashboard at `src/frontend/app.py`, and Docker / `docker-compose.yml` for the api + MLflow stack — 142 tests green.
197
- - **Day 5 (shipped):** Decision layer in `src/models/bbb_model.py` — RandomForest BBB classifier on Morgan fingerprints, SHAP top-k explanations, `POST /predict/bbb` endpoint, interactive Streamlit BBB tab with SMILES input + decision card + SHAP bar chart, and trainer CLI (`python -m src.models.bbb_model`). See AGENTS.md §8 — 158 tests green.
198
- - **Day 6 (shipped):** Final polish & demo features — calibration metadata bins on the BBB classifier (precision-at-confidence in `BBBPredictResponse.calibration`), edge-case dropdown in the Streamlit BBB tab (5 curated robustness probes), trust caption on the decision card, and `POST /pipeline/mri/diagnostics` returning Pre/Post ComBat long-format data + site-gap KPIs visualized as a faceted altair KDE in the MRI tab. See AGENTS.md §8 (calibration) + §9 (demo features) — 165 tests green.
 
199
 
200
  ## Where to Look
201
 
@@ -214,7 +287,8 @@ finishes in under 4 seconds on a 2024 laptop.
214
  - **Container stack:** [`Dockerfile`](Dockerfile), [`docker-compose.yml`](docker-compose.yml)
215
  - **Day-4 tests:** [`tests/api/`](tests/api/), [`tests/frontend/`](tests/frontend/), [`tests/pipelines/test_cross_pipeline_smoke.py`](tests/pipelines/test_cross_pipeline_smoke.py)
216
  - **Day-5 plan (full TDD task breakdown):** [`docs/superpowers/plans/2026-05-03-day5-downstream-model-xai-interactive.md`](docs/superpowers/plans/2026-05-03-day5-downstream-model-xai-interactive.md)
217
- - **BBB downstream model (classifier + SHAP explainer + trainer CLI):** [`src/models/bbb_model.py`](src/models/bbb_model.py) + [`tests/models/test_bbb_model.py`](tests/models/test_bbb_model.py) (12 tests)
 
218
  - **Day-6 plan (full TDD task breakdown):** [`docs/superpowers/plans/2026-05-04-day6-final-polish-demo-features.md`](docs/superpowers/plans/2026-05-04-day6-final-polish-demo-features.md)
219
  - **MRI ComBat diagnostics surface (pre/post site-gap KPIs):** `POST /pipeline/mri/diagnostics` — see [`src/api/routes.py`](src/api/routes.py) + [`src/pipelines/mri_pipeline.py`](src/pipelines/mri_pipeline.py)
220
  - **Day-7 design spec:** [`docs/superpowers/specs/2026-05-05-day7-drift-traceability-agents-design.md`](docs/superpowers/specs/2026-05-05-day7-drift-traceability-agents-design.md)
@@ -225,10 +299,10 @@ finishes in under 4 seconds on a 2024 laptop.
225
  - **New surfaces:** `POST /explain/eeg`, `POST /explain/mri`, `GET /experiments/runs`, `POST /experiments/diff`
226
  - **New deploy artifacts:** `Dockerfile.hf`, `supervisord.conf`
227
  - **LLM hardening (post-Day 8):** real OpenRouter LLM is now the default in deployed Spaces — `Dockerfile`/`Dockerfile.hf` no longer hard-code `NEUROBRIDGE_DISABLE_LLM=1`. Free-tier fallback chain (10 models, smartest → smallest) in [`src/llm/explainer.py`](src/llm/explainer.py), 401/400 status classification, and language-matching / intent-split prompt. Diagnostic endpoint `GET /diag/openrouter` ([`src/api/main.py`](src/api/main.py)) + Streamlit sidebar "🔧 Diagnose LLM" button. Live verification helper: [`scripts/diagnose_openrouter.py`](scripts/diagnose_openrouter.py).
228
- - **Orchestrator agent (Task 13):** [`src/agents/orchestrator.py`](src/agents/orchestrator.py), [`src/agents/tools.py`](src/agents/tools.py), [`src/agents/prompts.py`](src/agents/prompts.py)
229
  - **RAG layer:** [`src/rag/`](src/rag/) — chunker, embedder (fastembed), FAISS store, retriever, ingest CLI
230
  - **Agent endpoint:** `POST /agent/run` (orchestrator + RAG); diagnostic at `GET /diag/agent`
231
- - **Streamlit Agent tab:** "🤖 Agent" tab in [`src/frontend/app.py`](src/frontend/app.py) — input box + decision-trace expander
232
  - **RAG knowledge base:** drop `.md`/`.pdf` into [`data/knowledge_base/`](data/knowledge_base/) — see its README
233
 
234
  ## Day 7 — Demo Recipe
 
19
 
20
  **1.** Multi-site clinical ML pipelines fail in production because they assume clean data, single-site distributions, and black-box trust — all of which break in real labs. NeuroBridge Enterprise is the *living decision system* that closes those three gaps end-to-end across BBB drug-screening, EEG signal-cleaning, and MRI multi-site harmonization.
21
 
22
+ **2.** Three production pipelines (RDKit + Morgan, MNE+ICA, neuroHarmonize ComBat) sit behind one FastAPI surface and one Streamlit dashboard, with decision layers on top: a Random Forest BBB classifier today and an MRI image ONNX inference surface ready for an externally-trained volumetric deep-learning model. The agent surface can route a user request to exactly one pipeline tool, retrieve FAISS-backed context, and synthesize a cited answer.
23
 
24
  **3.** Robustness is demoed live: a curated edge-case dropdown probes invalid SMILES, OOD molecules, and boundary inputs — the system never crashes, always degrades gracefully (HTTP 400 → recoverable warning, low confidence + lower drift score, calibration caption hedge).
25
 
26
  **4.** Adapt-Over-Time is built in: each FastAPI worker keeps a rolling 100-prediction window; the trailing median is z-scored against the train-time confidence distribution and surfaced both in the API response and the UI ("trailing-100 confidence median is +1.42σ from training distribution — mild distribution shift").
27
 
28
+ **5.** Current verification: 242 passed, 2 skipped. Demo lifelines (`NEUROBRIDGE_DISABLE_MLFLOW=1`, `NEUROBRIDGE_DISABLE_LLM=1`, `BBB_MODEL_PATH`, `MRI_MODEL_PATH`) keep the system usable when MLflow, OpenRouter, or model artifacts are unavailable.
29
 
30
  ## Status
31
 
32
  | Day | Modality | Pipeline | Status |
33
  |-----|----------|----------|--------|
34
+ | 1 | Tabular (BBB / molecules) | [`bbb_pipeline.py`](src/pipelines/bbb_pipeline.py) | Shipped |
35
+ | 2 | Signal (EEG) | [`eeg_pipeline.py`](src/pipelines/eeg_pipeline.py) | Shipped |
36
+ | 3 | Image (MRI / fMRI) | [`mri_pipeline.py`](src/pipelines/mri_pipeline.py) | Shipped |
37
+ | 4 | API + MLOps + Frontend | FastAPI + MLflow + Streamlit + Docker | Shipped |
38
+ | 5 | Decision Layer (Model + XAI + Interactive UI) | [`bbb_model.py`](src/models/bbb_model.py) — RandomForest + SHAP + `POST /predict/bbb` | Shipped |
39
+ | 6 | Final Polish & Demo Features (Edge cases + Calibration + ComBat viz) | Calibration metadata + edge-case probes + `POST /pipeline/mri/diagnostics` | Shipped |
40
+ | 7 | Final 5% (Drift, Traceability & Agents) | Per-worker drift z-score + MLflow provenance badge + `POST /explain/bbb` (LLM + template fallback) + AI Assistant tab | Shipped |
41
+ | 8 | Grand Finale (Multi-Modal Agents, Track 5 & Public Deploy) | Multi-modal explainers + experiments + deploy surface | Shipped |
42
+ | 9 | Agent/RAG hardening + MRI DL decision layer | Guarded orchestration + `POST /predict/mri` ONNX surface | Shipped — 242 passed, 2 skipped |
43
 
44
  ## Quick Start
45
 
 
50
  # 1. Create venv and install
51
  python3.12 -m venv .venv312 && source .venv312/bin/activate && pip install -r requirements.txt
52
 
53
+ # 2. Verify — current full suite: 242 passed, 2 skipped
54
  pytest -v
55
 
56
  # 3. Smoke run with the bundled 6-row fixture
 
100
  -d '{"smiles": "CCO", "top_k": 5}' | python3 -m json.tool
101
  ```
102
 
103
+ ### Add the MRI image deep-learning model
104
+
105
+ MRI deep-learning training happens outside this repository. Export the trained
106
+ volumetric model to ONNX and place it at:
107
+
108
+ ```text
109
+ data/processed/mri_model.onnx
110
+ ```
111
+
112
+ The runtime contract is:
113
+
114
+ - Input file: one `.nii` / `.nii.gz` MRI volume.
115
+ - Preprocess: trilinear resize to `target_shape` (default `[64, 64, 64]`), z-score normalization over non-zero voxels, then tensor shape `[1, 1, D, H, W]`.
116
+ - ONNX output: one class vector `[1, C]`, either logits or probabilities.
117
+ - Override artifact path with `MRI_MODEL_PATH=/path/to/model.onnx`.
118
+
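For reference, a hypothetical export step on the training side. PyTorch here is an assumption, not a repo dependency; any framework that can emit a `[1, C]` ONNX output satisfies the contract:

```python
import torch
import torch.nn as nn

# Stand-in for the externally trained volumetric classifier; substitute the
# real model. Only the [1, 1, D, H, W] -> [1, C] shapes matter to this repo.
model = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64 * 64, 2)).eval()

dummy = torch.zeros(1, 1, 64, 64, 64)  # matches the preprocessing contract above
torch.onnx.export(
    model,
    dummy,
    "data/processed/mri_model.onnx",
    input_names=["volume"],
    output_names=["scores"],  # one [1, C] vector of logits or probabilities
)
```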
119
+ Try the endpoint after adding the artifact:
120
+
121
+ ```bash
122
+ curl -s -X POST http://localhost:8000/predict/mri \
123
+ -H 'Content-Type: application/json' \
124
+ -d '{
125
+ "input_path": "tests/fixtures/mri_sample/subject_0.nii.gz",
126
+ "target_shape": [64, 64, 64],
127
+ "label_names": ["control", "abnormal"]
128
+ }' | python3 -m json.tool
129
+ ```
130
+
131
+ If the ONNX artifact is missing, the endpoint returns HTTP 503 with a
132
+ remediation hint instead of crashing.
133
+
134
  ### Run the full stack with Docker
135
 
136
  ```bash
 
144
 
145
  Live-demo robustness: if the MLflow service is unreachable, set `NEUROBRIDGE_DISABLE_MLFLOW=1` to make the pipelines run without tracking.
146
 
147
+ The container startup script ([`docker-entrypoint.sh`](docker-entrypoint.sh)) also
+ protects local demos with a mounted `./data` directory: if the host volume is
+ empty, it seeds fixture data, trains the BBB model artifact, and builds the
+ RAG FAISS index before launching the app.
150
+
151
+ ## Runtime Configuration
152
+
153
+ | Variable | Purpose |
154
+ |---|---|
155
+ | `BBB_MODEL_PATH` | Override the BBB joblib artifact path (`data/processed/bbb_model.joblib`). |
156
+ | `MRI_MODEL_PATH` | Override the MRI ONNX artifact path (`data/processed/mri_model.onnx`). |
157
+ | `OPENROUTER_API_KEY` | Enables LLM explainer and orchestrator agent calls through OpenRouter. |
158
+ | `OPENROUTER_FREE_MODELS` | Optional comma-separated fallback chain for the explainer. |
159
+ | `NEUROBRIDGE_AGENT_MODEL` | OpenRouter model id for `/agent/run`. |
160
+ | `NEUROBRIDGE_DISABLE_LLM=1` | Forces deterministic template explanations. |
161
+ | `NEUROBRIDGE_DISABLE_MLFLOW=1` | Skips MLflow tracking/lookups when the tracking service is unavailable. |
162
+
163
  ## Repository Layout
164
 
165
  ```text
 
174
  │ └── processed/ # Parquet outputs from pipelines; gitignored
175
  ├── docs/superpowers/plans/ # Per-day implementation plans
176
  ├── src/
177
+ │ ├── core/ # logger, deterministic storage, MLflow tracking
178
  │ ├── pipelines/
179
  │ │ ├── bbb_pipeline.py # Day-1 pipeline (4 public funcs + CLI entry)
180
  │ │ ├── eeg_pipeline.py # Day-2 pipeline (6 public funcs + CLI entry)
181
  │ │ └── mri_pipeline.py # Day-3 pipeline (5 public funcs + CLI entry)
182
+ │ ├── models/
183
+ │ │ ├── bbb_model.py # RandomForest BBB classifier + SHAP
184
+ │ │ └── mri_model.py # External ONNX MRI inference surface
185
+ │ ├── rag/ # fastembed + FAISS ingest/retrieve layer
186
+ │ ├── agents/ # OpenRouter orchestrator + guarded routing + tools
187
+ │ ├── llm/ # LLM/template explanation surface
188
+ │ ├── api/ # FastAPI routes + schemas
189
+ │ └── frontend/ # Streamlit dashboard
190
  └── tests/
191
+ ├── core/, pipelines/, models/, api/, frontend/, rag/, agents/
192
  └── fixtures/ # bbbp_sample.csv, eeg_sample.fif, mri_sample/ + build_*.py
193
  ```
194
 
 
230
 
231
  Output schema: one row per surviving subject with columns `subject_id, site, feat_roi{i}_<stat>` (8 ROIs × 6 stats = 48 features). All `feat_*` are float64 (preserved through the Parquet round-trip).
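
A quick sanity check of that contract after a run (a sketch, assuming the default output path):

```python
import pandas as pd

df = pd.read_parquet("data/processed/mri_features.parquet")
feat_cols = [c for c in df.columns if c.startswith("feat_")]
assert len(feat_cols) == 48  # 8 ROIs x 6 stats
assert all(df[c].dtype == "float64" for c in feat_cols)
```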
232
 
233
+ ## MRI Image Model
234
+
235
+ `src/models/mri_model.py` is intentionally separate from `mri_pipeline.py`.
236
+ The pipeline remains the deterministic ComBat feature-preparation surface. The
237
+ image model is a decision layer for externally-trained volumetric DL models:
238
+
239
+ | Function | Purpose |
240
+ |---|---|
241
+ | `load(path)` | Loads an ONNX artifact with `onnxruntime` CPU execution. |
242
+ | `load_nifti_volume(path)` | Reads one `.nii` / `.nii.gz` volume as `float32`. |
243
+ | `preprocess_volume(volume, target_shape)` | Validates 3-D finite data, resizes, z-scores, returns `[1, 1, D, H, W]`. |
244
+ | `predict_nifti(model, input_path, target_shape, label_names)` | Runs preprocessing + ONNX inference and returns label, confidence, probabilities. |
245
+
246
+ Public API: `POST /predict/mri`. Streamlit exposes it in the Image tab under
247
+ "MRI Image Model". The trained artifact is not committed; put it in
248
+ `data/processed/mri_model.onnx` or set `MRI_MODEL_PATH`.
249
+
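A minimal in-process usage sketch, assuming the ONNX artifact exists and using the bundled fixture volume:

```python
from pathlib import Path

from src.models import mri_model

model = mri_model.load(Path("data/processed/mri_model.onnx"))
pred = mri_model.predict_nifti(
    model,
    Path("tests/fixtures/mri_sample/subject_0.nii.gz"),
    target_shape=(64, 64, 64),
    label_names=["control", "abnormal"],
)
print(pred["label_text"], pred["confidence"])
```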
250
  ## Storage Format
251
 
252
  Pipeline outputs are written as Parquet files using the `pyarrow` engine with snappy
 
258
 
259
  All pipeline functions and the shared logger were built TDD-first across Days 1–3 (RED → GREEN →
260
  REFACTOR). Each task ended in a green commit; review-and-fix loops landed as separate
261
+ commits with `fix:` / `refactor:` prefixes. Run `pytest -v` at any time. Current
262
+ verification on Windows/Python 3.11: `242 passed, 2 skipped`.
263
 
264
  ## Roadmap
265
 
266
  - **Day 2 (shipped):** `eeg_pipeline.py` — bandpass + MNE ICA artifact removal + PSD + statistical features → Parquet.
267
+ - **Day 3 (shipped):** `mri_pipeline.py` — NIfTI volume loading, brain masking, ROI feature extraction, ComBat harmonization (`neuroHarmonize`) for site-level domain shift → Parquet.
268
+ - **Day 4 (shipped):** FastAPI surface in `src/api/` (POST `/pipeline/{bbb,eeg,mri}` + `/health`), MLflow experiment tracking via `src.core.tracking` (see AGENTS.md §7), Streamlit dashboard at `src/frontend/app.py`, and Docker / `docker-compose.yml` for the api + MLflow stack.
269
+ - **Day 5 (shipped):** Decision layer in `src/models/bbb_model.py` — RandomForest BBB classifier on Morgan fingerprints, SHAP top-k explanations, `POST /predict/bbb` endpoint, interactive Streamlit BBB tab with SMILES input + decision card + SHAP bar chart, and trainer CLI (`python -m src.models.bbb_model`). See AGENTS.md §8.
270
+ - **Day 6 (shipped):** Final polish & demo features — calibration metadata bins on the BBB classifier (precision-at-confidence in `BBBPredictResponse.calibration`), edge-case dropdown in the Streamlit BBB tab (5 curated robustness probes), trust caption on the decision card, and `POST /pipeline/mri/diagnostics` returning Pre/Post ComBat long-format data + site-gap KPIs visualized as a faceted altair KDE in the MRI tab. See AGENTS.md §8 (calibration) + §9 (demo features).
271
+ - **Post-Day-8 hardening (shipped):** Orchestrator workflow guard enforces pipeline → RAG → synthesis even when the LLM skips tool calls; Docker startup guard rebuilds missing demo artifacts behind a mounted `data/`; Windows-safe MLflow test URI; MRI ONNX image decision layer at `POST /predict/mri` — 242 passed, 2 skipped.
272
 
273
  ## Where to Look
274
 
 
287
  - **Container stack:** [`Dockerfile`](Dockerfile), [`docker-compose.yml`](docker-compose.yml)
288
  - **Day-4 tests:** [`tests/api/`](tests/api/), [`tests/frontend/`](tests/frontend/), [`tests/pipelines/test_cross_pipeline_smoke.py`](tests/pipelines/test_cross_pipeline_smoke.py)
289
  - **Day-5 plan (full TDD task breakdown):** [`docs/superpowers/plans/2026-05-03-day5-downstream-model-xai-interactive.md`](docs/superpowers/plans/2026-05-03-day5-downstream-model-xai-interactive.md)
290
+ - **BBB downstream model (classifier + SHAP explainer + trainer CLI):** [`src/models/bbb_model.py`](src/models/bbb_model.py) + [`tests/models/test_bbb_model.py`](tests/models/test_bbb_model.py)
291
+ - **MRI image DL decision layer:** [`src/models/mri_model.py`](src/models/mri_model.py) + [`tests/models/test_mri_model.py`](tests/models/test_mri_model.py); `POST /predict/mri` consumes an externally-trained ONNX artifact at `data/processed/mri_model.onnx` (`MRI_MODEL_PATH` override).
292
  - **Day-6 plan (full TDD task breakdown):** [`docs/superpowers/plans/2026-05-04-day6-final-polish-demo-features.md`](docs/superpowers/plans/2026-05-04-day6-final-polish-demo-features.md)
293
  - **MRI ComBat diagnostics surface (pre/post site-gap KPIs):** `POST /pipeline/mri/diagnostics` — see [`src/api/routes.py`](src/api/routes.py) + [`src/pipelines/mri_pipeline.py`](src/pipelines/mri_pipeline.py)
294
  - **Day-7 design spec:** [`docs/superpowers/specs/2026-05-05-day7-drift-traceability-agents-design.md`](docs/superpowers/specs/2026-05-05-day7-drift-traceability-agents-design.md)
 
299
  - **New surfaces:** `POST /explain/eeg`, `POST /explain/mri`, `GET /experiments/runs`, `POST /experiments/diff`
300
  - **New deploy artifacts:** `Dockerfile.hf`, `supervisord.conf`
301
  - **LLM hardening (post-Day 8):** real OpenRouter LLM is now the default in deployed Spaces — `Dockerfile`/`Dockerfile.hf` no longer hard-code `NEUROBRIDGE_DISABLE_LLM=1`. Free-tier fallback chain (10 models, smartest → smallest) in [`src/llm/explainer.py`](src/llm/explainer.py), 401/400 status classification, and language-matching / intent-split prompt. Diagnostic endpoint `GET /diag/openrouter` ([`src/api/main.py`](src/api/main.py)) + Streamlit sidebar "🔧 Diagnose LLM" button. Live verification helper: [`scripts/diagnose_openrouter.py`](scripts/diagnose_openrouter.py).
302
+ - **Orchestrator agent (Task 13):** [`src/agents/orchestrator.py`](src/agents/orchestrator.py), [`src/agents/routing.py`](src/agents/routing.py), [`src/agents/tools.py`](src/agents/tools.py), [`src/agents/prompts.py`](src/agents/prompts.py). Guarded workflow enforces one pipeline tool, then `retrieve_context`, then final synthesis. A construction sketch follows this list.
303
  - **RAG layer:** [`src/rag/`](src/rag/) — chunker, embedder (fastembed), FAISS store, retriever, ingest CLI
304
  - **Agent endpoint:** `POST /agent/run` (orchestrator + RAG); diagnostic at `GET /diag/agent`
305
+ - **Streamlit Agent tab:** "🤖 Agent" tab in [`src/frontend/app.py`](src/frontend/app.py) — input box + optional MRI `sites_csv` + decision-trace expander.
306
  - **RAG knowledge base:** drop `.md`/`.pdf` into [`data/knowledge_base/`](data/knowledge_base/) — see its README
307
 
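A minimal construction sketch of the guarded orchestrator, mirroring `_build_orchestrator()` in [`src/api/routes.py`](src/api/routes.py); the OpenRouter-pointed client and the placeholder key are illustrative assumptions:

```python
from openai import OpenAI

from src.agents.orchestrator import Orchestrator
from src.agents.prompts import ORCHESTRATOR_SYSTEM_PROMPT
from src.agents.routing import build_retrieval_query, route_pipeline_input
from src.agents.tools import build_default_tools

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="sk-or-...")
orch = Orchestrator(
    llm_client=client,
    tools=build_default_tools(rag_index_dir=None),
    system_prompt=ORCHESTRATOR_SYSTEM_PROMPT,
    model="google/gemini-2.0-flash-exp:free",
    max_steps=5,
    enforce_workflow=True,
    workflow_pipeline_tools={"run_bbb_pipeline", "run_eeg_pipeline", "run_mri_pipeline"},
    workflow_retrieval_tool="retrieve_context",
    workflow_router=route_pipeline_input,
    workflow_query_builder=build_retrieval_query,
)
result = orch.run("CCO", context={"sites_csv": None})  # trace: pipeline -> retrieve -> synthesis
print(result.text)
```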
308
  ## Day 7 — Demo Recipe
conftest.py CHANGED
@@ -17,7 +17,7 @@ import pytest
17
  @pytest.fixture(autouse=True, scope="session")
18
  def _isolate_mlflow_tracking_uri() -> Iterator[None]:
19
  tmp_root = Path(tempfile.mkdtemp(prefix="mlflow_test_"))
20
- os.environ["MLFLOW_TRACKING_URI"] = f"file://{tmp_root}"
21
  yield
22
  # Don't rmtree — pytest tmpdir cleanup or OS handles it; rmtree
23
  # races with mlflow background writes on slow CI.
 
17
  @pytest.fixture(autouse=True, scope="session")
18
  def _isolate_mlflow_tracking_uri() -> Iterator[None]:
19
  tmp_root = Path(tempfile.mkdtemp(prefix="mlflow_test_"))
20
+ os.environ["MLFLOW_TRACKING_URI"] = tmp_root.as_uri()
21
  yield
22
  # Don't rmtree — pytest tmpdir cleanup or OS handles it; rmtree
23
  # races with mlflow background writes on slow CI.
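
Why `as_uri()`: a quick illustration of the Windows failure mode the old f-string had (runs on any OS via `PureWindowsPath`):

```python
from pathlib import PureWindowsPath

tmp = PureWindowsPath(r"C:\Users\ci\mlflow_test_x")
print(f"file://{tmp}")  # file://C:\Users\ci\mlflow_test_x  (backslashes: not a valid URI)
print(tmp.as_uri())     # file:///C:/Users/ci/mlflow_test_x (valid file URI)
```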
data/knowledge_base/README.md CHANGED
@@ -1,8 +1,10 @@
1
  # RAG Knowledge Base
2
 
3
- Drop reference documents here (`.md`, `.txt`, or `.pdf`). They will be
4
- ingested by `python -m src.rag.ingest` at Docker build time and surfaced
5
- to the orchestrator agent via the `retrieve_context` tool.
 
 
6
 
7
  ## Recommended seed set
8
 
 
1
  # RAG Knowledge Base
2
 
3
+ Drop reference documents here (`.md`, `.txt`, or `.pdf`). They are ingested by
4
+ `python -m src.rag.ingest` at Docker build time and surfaced to the orchestrator
5
+ agent via the `retrieve_context` tool. The container entrypoint also rebuilds
6
+ the index at startup when a mounted `data/` volume does not already contain
7
+ `data/processed/faiss_index/`.
8
 
9
  ## Recommended seed set
10
 
docker-compose.yml CHANGED
@@ -18,6 +18,9 @@ services:
18
  - "8000:8000"
19
  environment:
20
  MLFLOW_TRACKING_URI: http://mlflow:5000
 
 
 
21
  depends_on:
22
  - mlflow
23
  volumes:
 
18
  - "8000:8000"
19
  environment:
20
  MLFLOW_TRACKING_URI: http://mlflow:5000
21
+ NEUROBRIDGE_DISABLE_MLFLOW: "0"
22
+ OPENROUTER_API_KEY: ${OPENROUTER_API_KEY:-}
23
+ NEUROBRIDGE_AGENT_MODEL: ${NEUROBRIDGE_AGENT_MODEL:-google/gemini-2.0-flash-exp:free}
24
  depends_on:
25
  - mlflow
26
  volumes:
docker-entrypoint.sh ADDED
@@ -0,0 +1,30 @@
1
+ #!/bin/sh
2
+ set -eu
3
+
4
+ mkdir -p data/raw data/processed data/knowledge_base/seed
5
+
6
+ if [ -f tests/fixtures/bbbp_sample.csv ] && [ ! -f data/raw/bbbp.csv ]; then
7
+ cp tests/fixtures/bbbp_sample.csv data/raw/bbbp.csv
8
+ fi
9
+
10
+ if [ -f tests/fixtures/eeg_sample.fif ] && [ ! -f data/raw/eeg.fif ]; then
11
+ cp tests/fixtures/eeg_sample.fif data/raw/eeg.fif
12
+ fi
13
+
14
+ if [ -d tests/fixtures/kb_sample ] && [ ! -f data/knowledge_base/seed/lipinski_rule_of_five.md ]; then
15
+ cp tests/fixtures/kb_sample/* data/knowledge_base/seed/
16
+ fi
17
+
18
+ if [ ! -f data/processed/bbbp_features.parquet ]; then
19
+ NEUROBRIDGE_DISABLE_MLFLOW=1 python -m src.pipelines.bbb_pipeline
20
+ fi
21
+
22
+ if [ ! -f data/processed/bbb_model.joblib ]; then
23
+ python -m src.models.bbb_model
24
+ fi
25
+
26
+ if [ ! -f data/processed/faiss_index/index.bin ]; then
27
+ python -m src.rag.ingest data/knowledge_base data/processed/faiss_index
28
+ fi
29
+
30
+ exec "$@"
requirements.txt CHANGED
@@ -31,6 +31,7 @@ mlflow==2.16.0
31
  # --- Downstream ML / XAI (Day 5 decision layer) ---
32
  shap==0.46.0
33
  joblib==1.4.2
 
34
 
35
  # --- Tooling / tests ---
36
  pytest==8.3.3
@@ -47,3 +48,4 @@ streamlit==1.39.0
47
 
48
  # --- LLM provider (Day 7 explainer) ---
49
  openai==1.51.0 # OpenRouter SDK (Day-7 LLM explainer; deterministic-template fallback always available)
 
 
31
  # --- Downstream ML / XAI (Day 5 decision layer) ---
32
  shap==0.46.0
33
  joblib==1.4.2
34
+ onnxruntime==1.19.2 # MRI volumetric ONNX inference (external DL artifact)
35
 
36
  # --- Tooling / tests ---
37
  pytest==8.3.3
 
48
 
49
  # --- LLM provider (Day 7 explainer) ---
50
  openai==1.51.0 # OpenRouter SDK (Day-7 LLM explainer; deterministic-template fallback always available)
51
+ python-dotenv==1.0.1 # Load OPENROUTER_API_KEY from local .env for API/agent demos
src/agents/orchestrator.py CHANGED
@@ -10,6 +10,7 @@ Returns an `AgentResult` with synthesized text + full tool-call trace.
10
  from __future__ import annotations
11
 
12
  import json
 
13
  from typing import Any
14
 
15
  from src.agents.schemas import AgentResult, ToolTraceItem
@@ -19,6 +20,10 @@ from src.core.logger import get_logger
19
  logger = get_logger(__name__)
20
 
21
 
 
 
 
 
22
  class Orchestrator:
23
  """Single-agent function-calling loop. Stops on (a) text response, (b) max steps."""
24
 
@@ -30,16 +35,34 @@ class Orchestrator:
30
  model: str,
31
  max_steps: int = 5,
32
  temperature: float = 0.0,
 
 
 
 
 
33
  ) -> None:
34
  self._client = llm_client
35
  self._tools_by_name = {t.name: t for t in tools}
36
  self._tool_schemas = [t.openai_schema() for t in tools]
 
 
 
 
37
  self._system_prompt = system_prompt
38
  self._model = model
39
  self._max_steps = max_steps
40
  self._temperature = temperature
 
 
 
 
 
41
 
42
- def run(self, user_input: str) -> AgentResult:
 
 
 
 
43
  messages: list[dict[str, Any]] = [
44
  {"role": "system", "content": self._system_prompt},
45
  {"role": "user", "content": user_input},
@@ -47,16 +70,33 @@ class Orchestrator:
47
  trace: list[ToolTraceItem] = []
48
 
49
  for _step in range(self._max_steps):
50
- response = self._client.chat.completions.create(
51
- model=self._model,
52
- messages=messages,
53
- tools=self._tool_schemas,
54
- tool_choice="auto",
55
- temperature=self._temperature,
56
- )
57
  msg = response.choices[0].message
58
 
59
  if not getattr(msg, "tool_calls", None):
 
60
  return AgentResult(
61
  text=(msg.content or "").strip(),
62
  trace=trace,
@@ -64,13 +104,37 @@ class Orchestrator:
64
  finish_reason="complete",
65
  )
66
 
67
  messages.append({
68
  "role": "assistant",
69
  "content": msg.content,
70
- "tool_calls": [tc.model_dump() for tc in msg.tool_calls],
71
  })
72
 
73
- for tc in msg.tool_calls:
74
  name = tc.function.name
75
  tool = self._tools_by_name.get(name)
76
  if tool is None:
@@ -106,3 +170,146 @@ class Orchestrator:
106
  model=self._model,
107
  finish_reason="max_steps",
108
  )
10
  from __future__ import annotations
11
 
12
  import json
13
+ from collections.abc import Callable
14
  from typing import Any
15
 
16
  from src.agents.schemas import AgentResult, ToolTraceItem
 
20
  logger = get_logger(__name__)
21
 
22
 
23
+ WorkflowRouter = Callable[[str, dict[str, Any] | None], tuple[str, dict[str, Any]] | None]
24
+ WorkflowQueryBuilder = Callable[[str, ToolTraceItem, dict[str, Any] | None], str]
25
+
26
+
27
  class Orchestrator:
28
  """Single-agent function-calling loop. Stops on (a) text response, (b) max steps."""
29
 
 
35
  model: str,
36
  max_steps: int = 5,
37
  temperature: float = 0.0,
38
+ enforce_workflow: bool = False,
39
+ workflow_pipeline_tools: set[str] | None = None,
40
+ workflow_retrieval_tool: str | None = None,
41
+ workflow_router: WorkflowRouter | None = None,
42
+ workflow_query_builder: WorkflowQueryBuilder | None = None,
43
  ) -> None:
44
  self._client = llm_client
45
  self._tools_by_name = {t.name: t for t in tools}
46
  self._tool_schemas = [t.openai_schema() for t in tools]
47
+ self._tool_schemas_by_name = {t.name: t.openai_schema() for t in tools}
51
  self._system_prompt = system_prompt
52
  self._model = model
53
  self._max_steps = max_steps
54
  self._temperature = temperature
55
+ self._enforce_workflow = enforce_workflow
56
+ self._workflow_pipeline_tools = workflow_pipeline_tools or set()
57
+ self._workflow_retrieval_tool = workflow_retrieval_tool
58
+ self._workflow_router = workflow_router
59
+ self._workflow_query_builder = workflow_query_builder
60
 
61
+ def run(
62
+ self,
63
+ user_input: str,
64
+ context: dict[str, Any] | None = None,
65
+ ) -> AgentResult:
66
  messages: list[dict[str, Any]] = [
67
  {"role": "system", "content": self._system_prompt},
68
  {"role": "user", "content": user_input},
 
70
  trace: list[ToolTraceItem] = []
71
 
72
  for _step in range(self._max_steps):
73
+ stage = self._workflow_stage(trace)
74
+ request_kwargs = self._completion_kwargs(messages, stage)
75
+ response = self._client.chat.completions.create(**request_kwargs)
 
 
 
 
76
  msg = response.choices[0].message
77
 
78
  if not getattr(msg, "tool_calls", None):
79
+ if self._enforce_workflow and stage == "pipeline":
80
+ if self._invoke_routed_pipeline(user_input, context, trace, messages):
81
+ continue
82
+ return AgentResult(
83
+ text=(
84
+ "Cannot identify modality. Provide a SMILES, .fif/.edf "
85
+ "path, or NIfTI directory."
86
+ ),
87
+ trace=trace,
88
+ model=self._model,
89
+ finish_reason="error",
90
+ )
91
+ if self._enforce_workflow and stage == "retrieve":
92
+ if self._invoke_fallback_retrieval(user_input, context, trace, messages):
93
+ continue
94
+ return AgentResult(
95
+ text="Pipeline completed, but retrieval could not be executed.",
96
+ trace=trace,
97
+ model=self._model,
98
+ finish_reason="error",
99
+ )
100
  return AgentResult(
101
  text=(msg.content or "").strip(),
102
  trace=trace,
 
104
  finish_reason="complete",
105
  )
106
 
107
+ selected_tool_calls = self._select_tool_calls(msg.tool_calls, stage)
108
+ if self._enforce_workflow and not selected_tool_calls:
109
+ if stage == "pipeline":
110
+ if self._invoke_routed_pipeline(user_input, context, trace, messages):
111
+ continue
112
+ return AgentResult(
113
+ text=(
114
+ "Cannot identify modality. Provide a SMILES, .fif/.edf "
115
+ "path, or NIfTI directory."
116
+ ),
117
+ trace=trace,
118
+ model=self._model,
119
+ finish_reason="error",
120
+ )
121
+ if stage == "retrieve":
122
+ if self._invoke_fallback_retrieval(user_input, context, trace, messages):
123
+ continue
124
+ return AgentResult(
125
+ text="Pipeline completed, but retrieval could not be executed.",
126
+ trace=trace,
127
+ model=self._model,
128
+ finish_reason="error",
129
+ )
130
+
131
  messages.append({
132
  "role": "assistant",
133
  "content": msg.content,
134
+ "tool_calls": [tc.model_dump() for tc in selected_tool_calls],
135
  })
136
 
137
+ for tc in selected_tool_calls:
138
  name = tc.function.name
139
  tool = self._tools_by_name.get(name)
140
  if tool is None:
 
170
  model=self._model,
171
  finish_reason="max_steps",
172
  )
173
+
174
+ def _completion_kwargs(
175
+ self,
176
+ messages: list[dict[str, Any]],
177
+ stage: str,
178
+ ) -> dict[str, Any]:
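+ """Build chat-completion kwargs, exposing only the tools valid at this stage."""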
179
+ kwargs: dict[str, Any] = {
180
+ "model": self._model,
181
+ "messages": messages,
182
+ "temperature": self._temperature,
183
+ }
184
+ if not self._enforce_workflow:
185
+ kwargs["tools"] = self._tool_schemas
186
+ kwargs["tool_choice"] = "auto"
187
+ return kwargs
188
+
189
+ schemas = self._schemas_for_stage(stage)
190
+ if schemas:
191
+ kwargs["tools"] = schemas
192
+ kwargs["tool_choice"] = "auto"
193
+ return kwargs
194
+
195
+ def _schemas_for_stage(self, stage: str) -> list[dict[str, Any]]:
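+ """Return the OpenAI tool schemas the model may call at this workflow stage."""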
196
+ if stage == "pipeline":
197
+ return [
198
+ self._tool_schemas_by_name[name]
199
+ for name in sorted(self._workflow_pipeline_tools)
200
+ if name in self._tool_schemas_by_name
201
+ ]
202
+ if stage == "retrieve" and self._workflow_retrieval_tool:
203
+ schema = self._tool_schemas_by_name.get(self._workflow_retrieval_tool)
204
+ return [schema] if schema else []
205
+ return []
206
+
207
+ def _workflow_stage(self, trace: list[ToolTraceItem]) -> str:
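+ """Return the guard stage: 'open' (unenforced), 'pipeline', 'retrieve', or 'final'."""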
208
+ if not self._enforce_workflow:
209
+ return "open"
210
+ has_pipeline = any(
211
+ t.name in self._workflow_pipeline_tools and t.result is not None and t.error is None
212
+ for t in trace
213
+ )
214
+ if not has_pipeline:
215
+ return "pipeline"
216
+ has_retrieval = any(
217
+ t.name == self._workflow_retrieval_tool and t.result is not None and t.error is None
218
+ for t in trace
219
+ )
220
+ return "final" if has_retrieval else "retrieve"
221
+
222
+ def _select_tool_calls(self, tool_calls: list[Any], stage: str) -> list[Any]:
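+ """Filter the model's tool calls to the single call allowed at this stage."""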
223
+ if not self._enforce_workflow:
224
+ return list(tool_calls)
225
+ if stage == "pipeline":
226
+ for tc in tool_calls:
227
+ if tc.function.name in self._workflow_pipeline_tools:
228
+ return [tc]
229
+ return []
230
+ if stage == "retrieve":
231
+ for tc in tool_calls:
232
+ if tc.function.name == self._workflow_retrieval_tool:
233
+ return [tc]
234
+ return []
235
+ return []
236
+
237
+ def _invoke_routed_pipeline(
238
+ self,
239
+ user_input: str,
240
+ context: dict[str, Any] | None,
241
+ trace: list[ToolTraceItem],
242
+ messages: list[dict[str, Any]],
243
+ ) -> bool:
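+ """Deterministically run the routed pipeline tool; True means continue the loop."""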
244
+ if self._workflow_router is None:
245
+ return False
246
+ routed = self._workflow_router(user_input, context)
247
+ if routed is None:
248
+ return False
249
+ name, args = routed
250
+ tool = self._tools_by_name.get(name)
251
+ if tool is None:
252
+ trace.append(ToolTraceItem(name=name, args=args, error=f"unknown tool: {name}"))
253
+ return False
254
+ try:
255
+ result = tool.invoke(args)
256
+ trace.append(ToolTraceItem(name=name, args=args, result=result))
257
+ messages.append({
258
+ "role": "user",
259
+ "content": (
260
+ "Workflow guard executed the required pipeline tool. "
261
+ f"Tool: {name}. Result: {json.dumps(result, default=str)}. "
262
+ "Now call retrieve_context with a focused scientific query."
263
+ ),
264
+ })
265
+ return True
266
+ except Exception as e:
267
+ trace.append(ToolTraceItem(name=name, args=args, error=str(e)))
268
+ return False
269
+
270
+ def _invoke_fallback_retrieval(
271
+ self,
272
+ user_input: str,
273
+ context: dict[str, Any] | None,
274
+ trace: list[ToolTraceItem],
275
+ messages: list[dict[str, Any]],
276
+ ) -> bool:
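+ """Run retrieve_context with a canonical query; True means continue the loop."""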
277
+ if self._workflow_retrieval_tool is None or self._workflow_query_builder is None:
278
+ return False
279
+ pipeline_trace = next(
280
+ (
281
+ t for t in trace
282
+ if t.name in self._workflow_pipeline_tools and t.result is not None and t.error is None
283
+ ),
284
+ None,
285
+ )
286
+ if pipeline_trace is None:
287
+ return False
288
+ tool = self._tools_by_name.get(self._workflow_retrieval_tool)
289
+ if tool is None:
290
+ return False
291
+ query = self._workflow_query_builder(user_input, pipeline_trace, context)
292
+ args = {"query": query, "k": 4}
293
+ try:
294
+ result = tool.invoke(args)
295
+ trace.append(ToolTraceItem(
296
+ name=self._workflow_retrieval_tool,
297
+ args=args,
298
+ result=result,
299
+ ))
300
+ messages.append({
301
+ "role": "user",
302
+ "content": (
303
+ "Workflow guard executed retrieve_context. "
304
+ f"Result: {json.dumps(result, default=str)}. "
305
+ "Now synthesize the final answer in the user's language."
306
+ ),
307
+ })
308
+ return True
309
+ except Exception as e:
310
+ trace.append(ToolTraceItem(
311
+ name=self._workflow_retrieval_tool,
312
+ args=args,
313
+ error=str(e),
314
+ ))
315
+ return False
src/agents/prompts.py CHANGED
@@ -20,6 +20,7 @@ Workflow — follow exactly:
20
  - SMILES (short, all-letters/digits, no slashes, no .ext) → run_bbb_pipeline
21
  - Path ending in .fif or .edf → run_eeg_pipeline
22
  - Path that is a directory (no file extension at the tail) → run_mri_pipeline
 
23
  If ambiguous, prefer SMILES if it parses; otherwise return:
24
  "Cannot identify modality. Provide a SMILES, .fif/.edf path, or NIfTI directory."
25
 
 
20
  - SMILES (short, all-letters/digits, no slashes, no .ext) → run_bbb_pipeline
21
  - Path ending in .fif or .edf → run_eeg_pipeline
22
  - Path that is a directory (no file extension at the tail) → run_mri_pipeline
23
+ Use sites_csv="<input_dir>/sites.csv" unless the user explicitly gives another CSV.
24
  If ambiguous, prefer SMILES if it parses; otherwise return:
25
  "Cannot identify modality. Provide a SMILES, .fif/.edf path, or NIfTI directory."
26
 
src/agents/routing.py ADDED
@@ -0,0 +1,81 @@
1
+ """Deterministic fallbacks for the orchestrator workflow.
2
+
3
+ The LLM remains responsible for normal function-calling, but these helpers
4
+ keep the public agent route reliable when a model skips or mis-shapes a tool
5
+ call during a live demo.
6
+ """
7
+ from __future__ import annotations
8
+
9
+ from pathlib import Path
10
+ from typing import Any
11
+
12
+ from src.agents.schemas import ToolTraceItem
13
+
14
+
15
+ _EEG_SUFFIXES = {".fif", ".edf"}
16
+
17
+
18
+ def route_pipeline_input(
19
+ user_input: str,
20
+ context: dict[str, Any] | None = None,
21
+ ) -> tuple[str, dict[str, Any]] | None:
22
+ """Map raw user input to exactly one pipeline tool and argument dict."""
23
+ text = _primary_input(user_input)
24
+ if not text:
25
+ return None
26
+
27
+ path = Path(text)
28
+ lower = text.lower()
29
+ if path.suffix.lower() in _EEG_SUFFIXES:
30
+ return "run_eeg_pipeline", {"input_path": text}
31
+
32
+ if _looks_like_mri_input(path, lower):
33
+ input_dir = path.parent if lower.endswith(".nii.gz") or path.suffix.lower() == ".nii" else path
34
+ sites_csv = _sites_csv_for(input_dir, context)
35
+ return "run_mri_pipeline", {
36
+ "input_dir": str(input_dir),
37
+ "sites_csv": sites_csv,
38
+ }
39
+
40
+ if _looks_like_path(text):
41
+ return None
42
+
43
+ return "run_bbb_pipeline", {"smiles": text, "top_k": 5}
44
+
45
+
46
+ def build_retrieval_query(
47
+ user_input: str,
48
+ pipeline_trace: ToolTraceItem,
49
+ context: dict[str, Any] | None = None,
50
+ ) -> str:
51
+ """Build the canonical scientific RAG query for a completed pipeline tool."""
52
+ if pipeline_trace.name == "run_eeg_pipeline":
53
+ return "ICA artifact removal in multi-channel EEG"
54
+ if pipeline_trace.name == "run_mri_pipeline":
55
+ return "ComBat scanner site harmonization in multi-center MRI"
56
+ return "BBB permeability of small lipophilic molecules"
57
+
58
+
59
+ def _primary_input(user_input: str) -> str:
60
+ """Return the first non-empty input line, excluding appended user questions."""
61
+ before_question = user_input.split("\n\nUser question:", 1)[0]
62
+ return before_question.strip().strip("\"'")
63
+
64
+
65
+ def _looks_like_mri_input(path: Path, lower: str) -> bool:
66
+ if lower.endswith(".nii.gz") or path.suffix.lower() == ".nii":
67
+ return True
68
+ if path.exists() and path.is_dir():
69
+ return True
70
+ return not path.suffix and _looks_like_path(str(path))
71
+
72
+
73
+ def _looks_like_path(text: str) -> bool:
74
+ return "/" in text or "\\" in text
75
+
76
+
77
+ def _sites_csv_for(input_dir: Path, context: dict[str, Any] | None) -> str:
78
+ explicit = (context or {}).get("sites_csv")
79
+ if explicit:
80
+ return str(explicit)
81
+ return str(input_dir / "sites.csv")
src/agents/schemas.py CHANGED
@@ -28,7 +28,10 @@ class EEGPipelineInput(BaseModel):
28
  class MRIPipelineInput(BaseModel):
29
  """Input for `run_mri_pipeline` — directory of NIfTI files + sites CSV."""
30
  input_dir: str = Field(..., description="Directory containing .nii.gz volumes")
31
- sites_csv: str = Field(..., description="CSV mapping subject_id → site")
 
 
 
32
 
33
 
34
  class RetrieveContextInput(BaseModel):
 
28
  class MRIPipelineInput(BaseModel):
29
  """Input for `run_mri_pipeline` — directory of NIfTI files + sites CSV."""
30
  input_dir: str = Field(..., description="Directory containing .nii.gz volumes")
31
+ sites_csv: str | None = Field(
32
+ None,
33
+ description="CSV mapping subject_id → site; defaults to <input_dir>/sites.csv",
34
+ )
35
 
36
 
37
  class RetrieveContextInput(BaseModel):
src/agents/tools.py CHANGED
@@ -130,11 +130,12 @@ def _make_mri_executor(processed_dir: Path) -> Callable[[MRIPipelineInput], MRIP
130
  from src.api import routes as api_routes
131
  from fastapi import HTTPException
132
  out_path = processed_dir / "mri_features.parquet"
 
133
  try:
134
  response = api_routes.run_mri(
135
  MRIRequest(
136
  input_dir=inp.input_dir,
137
- sites_csv=inp.sites_csv,
138
  output_path=str(out_path),
139
  )
140
  )
 
130
  from src.api import routes as api_routes
131
  from fastapi import HTTPException
132
  out_path = processed_dir / "mri_features.parquet"
133
+ sites_csv = inp.sites_csv or str(Path(inp.input_dir) / "sites.csv")
134
  try:
135
  response = api_routes.run_mri(
136
  MRIRequest(
137
  input_dir=inp.input_dir,
138
+ sites_csv=sites_csv,
139
  output_path=str(out_path),
140
  )
141
  )
src/api/routes.py CHANGED
@@ -37,8 +37,11 @@ from src.api.schemas import (
37
  ModelProvenance,
38
  MRIDiagnosticsRequest,
39
  MRIDiagnosticsResponse,
 
40
  MRIExplainRequest,
41
  MRIExplainResponse,
 
 
42
  MRIRequest,
43
  PipelineResponse,
44
  RunDiffRequest,
@@ -47,7 +50,7 @@ from src.api.schemas import (
47
  )
48
  from src.core.logger import get_logger
49
  from src.llm import explainer as llm_explainer
50
- from src.models import bbb_model
51
  from src.pipelines import bbb_pipeline, eeg_pipeline, mri_pipeline
52
 
53
  logger = get_logger(__name__)
@@ -75,12 +78,7 @@ def _wrap(
75
  duration_sec = time.perf_counter() - started
76
 
77
  df = pd.read_parquet(output_path)
78
- runs = mlflow.search_runs(
79
- experiment_names=[experiment_name],
80
- max_results=1,
81
- order_by=["start_time DESC"],
82
- )
83
- run_id = runs.iloc[0]["run_id"] if len(runs) else None
84
 
85
  return PipelineResponse(
86
  status="ok",
@@ -92,6 +90,22 @@ def _wrap(
92
  )
93
 
94
 
95
  @router.post("/bbb", response_model=PipelineResponse)
96
  def run_bbb(req: BBBRequest) -> PipelineResponse:
97
  """Run the BBB pipeline; return rows/cols/duration + the MLflow run id."""
@@ -142,6 +156,7 @@ def run_mri(req: MRIRequest) -> PipelineResponse:
142
  # Default artifact location. Overridable via BBB_MODEL_PATH env var so tests
143
  # can point at a tmp-built model without touching production paths.
144
  _DEFAULT_BBB_MODEL_PATH = Path("data/processed/bbb_model.joblib")
 
145
 
146
 
147
  def _bbb_model_path() -> Path:
@@ -149,6 +164,11 @@ def _bbb_model_path() -> Path:
149
  return Path(os.environ.get("BBB_MODEL_PATH", str(_DEFAULT_BBB_MODEL_PATH)))
150
 
151
 
 
 
 
 
 
152
  # Per-worker rolling window of recent prediction confidences.
153
  # Cleared on worker restart; multi-worker setups have independent windows.
154
  WORKER_CONFIDENCE_DEQUE: deque[float] = deque(maxlen=100)
@@ -295,6 +315,45 @@ def predict_bbb(req: BBBPredictRequest) -> BBBPredictResponse:
295
  )
296
 
297
 
298
  @router.post("/mri/diagnostics", response_model=MRIDiagnosticsResponse)
299
  def mri_diagnostics(req: MRIDiagnosticsRequest) -> MRIDiagnosticsResponse:
300
  """Run the MRI pipeline twice and return pre/post ComBat data + site-gap KPIs."""
@@ -521,6 +580,7 @@ def _build_orchestrator():
521
 
522
  from src.agents.orchestrator import Orchestrator
523
  from src.agents.prompts import ORCHESTRATOR_SYSTEM_PROMPT
 
524
  from src.agents.tools import build_default_tools
525
 
526
  api_key = os.environ.get("OPENROUTER_API_KEY")
@@ -543,6 +603,15 @@ def _build_orchestrator():
543
  system_prompt=ORCHESTRATOR_SYSTEM_PROMPT,
544
  model=model,
545
  max_steps=5,
 
 
 
 
 
 
 
 
 
546
  )
547
 
548
 
@@ -553,7 +622,7 @@ def run_agent(req: AgentRunRequest) -> AgentRunResponse:
553
  user_text = req.user_input
554
  if req.user_question:
555
  user_text = f"{req.user_input}\n\nUser question: {req.user_question}"
556
- result = orch.run(user_text)
557
  return AgentRunResponse(
558
  text=result.text,
559
  trace=[
 
37
  ModelProvenance,
38
  MRIDiagnosticsRequest,
39
  MRIDiagnosticsResponse,
40
+ MRIClassProbability,
41
  MRIExplainRequest,
42
  MRIExplainResponse,
43
+ MRIPredictRequest,
44
+ MRIPredictResponse,
45
  MRIRequest,
46
  PipelineResponse,
47
  RunDiffRequest,
 
50
  )
51
  from src.core.logger import get_logger
52
  from src.llm import explainer as llm_explainer
53
+ from src.models import bbb_model, mri_model
54
  from src.pipelines import bbb_pipeline, eeg_pipeline, mri_pipeline
55
 
56
  logger = get_logger(__name__)
 
78
  duration_sec = time.perf_counter() - started
79
 
80
  df = pd.read_parquet(output_path)
81
+ run_id = _latest_mlflow_run_id(experiment_name)
 
 
 
 
 
82
 
83
  return PipelineResponse(
84
  status="ok",
 
90
  )
91
 
92
 
93
+ def _latest_mlflow_run_id(experiment_name: str) -> str | None:
94
+ """Return the newest MLflow run id, degrading to None when tracking is off."""
95
+ if os.environ.get("NEUROBRIDGE_DISABLE_MLFLOW") == "1":
96
+ return None
97
+ try:
98
+ runs = mlflow.search_runs(
99
+ experiment_names=[experiment_name],
100
+ max_results=1,
101
+ order_by=["start_time DESC"],
102
+ )
103
+ except Exception as e:
104
+ logger.warning("MLflow run lookup failed for %s: %s", experiment_name, e)
105
+ return None
106
+ return str(runs.iloc[0]["run_id"]) if len(runs) else None
107
+
108
+
109
  @router.post("/bbb", response_model=PipelineResponse)
110
  def run_bbb(req: BBBRequest) -> PipelineResponse:
111
  """Run the BBB pipeline; return rows/cols/duration + the MLflow run id."""
 
156
  # Default artifact location. Overridable via BBB_MODEL_PATH env var so tests
157
  # can point at a tmp-built model without touching production paths.
158
  _DEFAULT_BBB_MODEL_PATH = Path("data/processed/bbb_model.joblib")
159
+ _DEFAULT_MRI_MODEL_PATH = Path("data/processed/mri_model.onnx")
160
 
161
 
162
  def _bbb_model_path() -> Path:
 
164
  return Path(os.environ.get("BBB_MODEL_PATH", str(_DEFAULT_BBB_MODEL_PATH)))
165
 
166
 
167
+ def _mri_model_path() -> Path:
168
+ """Return the MRI ONNX model artifact path, overridable via MRI_MODEL_PATH."""
169
+ return Path(os.environ.get("MRI_MODEL_PATH", str(_DEFAULT_MRI_MODEL_PATH)))
170
+
171
+
172
  # Per-worker rolling window of recent prediction confidences.
173
  # Cleared on worker restart; multi-worker setups have independent windows.
174
  WORKER_CONFIDENCE_DEQUE: deque[float] = deque(maxlen=100)
 
315
  )
316
 
317
 
318
+ @predict_router.post("/mri", response_model=MRIPredictResponse)
319
+ def predict_mri(req: MRIPredictRequest) -> MRIPredictResponse:
320
+ """Predict from one MRI NIfTI image using an externally-trained ONNX model."""
321
+ artifact = _mri_model_path()
322
+ if not artifact.exists():
323
+ raise HTTPException(
324
+ status_code=503,
325
+ detail=(
326
+ f"MRI model artifact not available at {artifact}. "
327
+ "Export the trained volumetric model to ONNX and place it there, "
328
+ "or set MRI_MODEL_PATH."
329
+ ),
330
+ )
331
+ try:
332
+ model = mri_model.load(artifact)
333
+ pred = mri_model.predict_nifti(
334
+ model,
335
+ Path(req.input_path),
336
+ target_shape=req.target_shape,
337
+ label_names=req.label_names,
338
+ )
339
+ except FileNotFoundError as e:
340
+ raise HTTPException(status_code=404, detail=str(e))
341
+ except ValueError as e:
342
+ raise HTTPException(status_code=400, detail=str(e))
343
+
344
+ return MRIPredictResponse(
345
+ label=int(pred["label"]),
346
+ label_text=str(pred["label_text"]),
347
+ confidence=float(pred["confidence"]),
348
+ probabilities=[
349
+ MRIClassProbability(**p)
350
+ for p in pred["probabilities"]
351
+ ],
352
+ input_path=req.input_path,
353
+ model_path=str(artifact),
354
+ )
355
+
356
+
357
  @router.post("/mri/diagnostics", response_model=MRIDiagnosticsResponse)
358
  def mri_diagnostics(req: MRIDiagnosticsRequest) -> MRIDiagnosticsResponse:
359
  """Run the MRI pipeline twice and return pre/post ComBat data + site-gap KPIs."""
 
580
 
581
  from src.agents.orchestrator import Orchestrator
582
  from src.agents.prompts import ORCHESTRATOR_SYSTEM_PROMPT
583
+ from src.agents.routing import build_retrieval_query, route_pipeline_input
584
  from src.agents.tools import build_default_tools
585
 
586
  api_key = os.environ.get("OPENROUTER_API_KEY")
 
603
  system_prompt=ORCHESTRATOR_SYSTEM_PROMPT,
604
  model=model,
605
  max_steps=5,
606
+ enforce_workflow=True,
607
+ workflow_pipeline_tools={
608
+ "run_bbb_pipeline",
609
+ "run_eeg_pipeline",
610
+ "run_mri_pipeline",
611
+ },
612
+ workflow_retrieval_tool="retrieve_context",
613
+ workflow_router=route_pipeline_input,
614
+ workflow_query_builder=build_retrieval_query,
615
  )
616
 
617
 
 
622
  user_text = req.user_input
623
  if req.user_question:
624
  user_text = f"{req.user_input}\n\nUser question: {req.user_question}"
625
+ result = orch.run(user_text, context={"sites_csv": req.sites_csv})
626
  return AgentRunResponse(
627
  text=result.text,
628
  trace=[
src/api/schemas.py CHANGED
@@ -113,6 +113,38 @@ class BBBPredictResponse(BaseModel):
113
  )
114
 
115
 
116
  class MRIDiagnosticsRequest(BaseModel):
117
  """Request body for /pipeline/mri/diagnostics — same as MRIRequest minus output_path."""
118
  input_dir: str = Field(..., description="Directory of .nii.gz files")
@@ -238,6 +270,10 @@ class AgentRunRequest(BaseModel):
238
  user_question: str | None = Field(
239
  None, description="Optional natural-language question to language-match the response"
240
  )
 
 
 
 
241
 
242
 
243
  class AgentToolTraceItem(BaseModel):
 
113
  )
114
 
115
 
116
+ class MRIPredictRequest(BaseModel):
117
+ """Single-subject MRI image prediction request."""
118
+ input_path: str = Field(..., description="Path to one .nii or .nii.gz MRI volume")
119
+ target_shape: tuple[int, int, int] = Field(
120
+ (64, 64, 64),
121
+ description="Model preprocessing resize target as (D, H, W)",
122
+ )
123
+ label_names: list[str] | None = Field(
124
+ None,
125
+ description="Optional class labels matching ONNX output order",
126
+ )
127
+
128
+
129
+ class MRIClassProbability(BaseModel):
130
+ """One MRI model class probability."""
131
+ label: int
132
+ label_text: str
133
+ probability: float
134
+
135
+
136
+ class MRIPredictResponse(BaseModel):
137
+ """MRI DL decision payload from a volumetric ONNX model."""
138
+ model_config = ConfigDict(protected_namespaces=())
139
+
140
+ label: int
141
+ label_text: str
142
+ confidence: float
143
+ probabilities: list[MRIClassProbability]
144
+ input_path: str
145
+ model_path: str
146
+
147
+
148
  class MRIDiagnosticsRequest(BaseModel):
149
  """Request body for /pipeline/mri/diagnostics — same as MRIRequest minus output_path."""
150
  input_dir: str = Field(..., description="Directory of .nii.gz files")
 
270
  user_question: str | None = Field(
271
  None, description="Optional natural-language question to language-match the response"
272
  )
273
+ sites_csv: str | None = Field(
274
+ None,
275
+ description="Optional MRI sites CSV. Defaults to <user_input>/sites.csv for directory inputs.",
276
+ )
277
 
278
 
279
  class AgentToolTraceItem(BaseModel):
src/frontend/app.py CHANGED
@@ -1318,6 +1318,48 @@ def _render_mri_tab() -> None:
1318
  except httpx.RequestError as e:
1319
  st.error(f"Cannot reach FastAPI at {_API_URL}: {e!r}")
1320
 
1321
 
1322
  def _render_prediction_card(result: dict) -> None:
1323
  """Editorial decision card: provenance · verdict · signals · SHAP."""
@@ -1790,6 +1832,11 @@ def main() -> None:
1790
  value="",
1791
  help="Ask in any language — the agent will mirror it in the response",
1792
  )
 
 
 
 
 
1793
  submitted = st.form_submit_button("Run agent")
1794
 
1795
  if submitted and agent_input:
@@ -1798,6 +1845,8 @@ def main() -> None:
1798
  payload: dict = {"user_input": agent_input}
1799
  if agent_question:
1800
  payload["user_question"] = agent_question
 
 
1801
  response = _post("/agent/run", payload, timeout=120.0)
1802
  except Exception as e:
1803
  st.error(f"Agent run failed: {e}")
 
1318
  except httpx.RequestError as e:
1319
  st.error(f"Cannot reach FastAPI at {_API_URL}: {e!r}")
1320
 
1321
+ st.markdown("#### MRI Image Model")
1322
+ mri_image = st.text_input(
1323
+ "NIfTI image",
1324
+ "tests/fixtures/mri_sample/subject_0.nii.gz",
1325
+ key="mri_predict_image",
1326
+ )
1327
+ mri_labels = st.text_input(
1328
+ "Class labels",
1329
+ "control,abnormal",
1330
+ key="mri_predict_labels",
1331
+ )
1332
+ if st.button("Predict MRI image", key="mri_predict"):
1333
+ labels = [x.strip() for x in mri_labels.split(",") if x.strip()]
1334
+ payload: dict = {
1335
+ "input_path": mri_image,
1336
+ "target_shape": [64, 64, 64],
1337
+ }
1338
+ if labels:
1339
+ payload["label_names"] = labels
1340
+ with st.spinner("Running MRI image model..."):
1341
+ try:
1342
+ result = _post("/predict/mri", payload, timeout=120.0)
1343
+ except httpx.HTTPStatusError as e:
1344
+ detail = e.response.text
1345
+ if e.response.status_code == 503:
1346
+ st.warning(
1347
+ "MRI model artifact is not available yet. Export the trained "
1348
+ "ONNX model to `data/processed/mri_model.onnx` or set `MRI_MODEL_PATH`."
1349
+ )
1350
+ else:
1351
+ st.error(f"MRI prediction failed (HTTP {e.response.status_code}): {detail}")
1352
+ except httpx.RequestError as e:
1353
+ st.error(f"Cannot reach FastAPI at {_API_URL}: {e!r}")
1354
+ else:
1355
+ st.metric(
1356
+ label=result.get("label_text", "prediction"),
1357
+ value=f"{float(result.get('confidence', 0.0)) * 100:.1f}%",
1358
+ )
1359
+ probs = result.get("probabilities", [])
1360
+ if probs:
1361
+ st.dataframe(probs, use_container_width=True, hide_index=True)
1362
+
1363
 
1364
  def _render_prediction_card(result: dict) -> None:
1365
  """Editorial decision card: provenance · verdict · signals · SHAP."""
 
1832
  value="",
1833
  help="Ask in any language — the agent will mirror it in the response",
1834
  )
1835
+ agent_sites_csv = st.text_input(
1836
+ "MRI sites CSV (optional)",
1837
+ value="",
1838
+ help="Defaults to <MRI input directory>/sites.csv",
1839
+ )
1840
  submitted = st.form_submit_button("Run agent")
1841
 
1842
  if submitted and agent_input:
 
1845
  payload: dict = {"user_input": agent_input}
1846
  if agent_question:
1847
  payload["user_question"] = agent_question
1848
+ if agent_sites_csv:
1849
+ payload["sites_csv"] = agent_sites_csv
1850
  response = _post("/agent/run", payload, timeout=120.0)
1851
  except Exception as e:
1852
  st.error(f"Agent run failed: {e}")
src/models/mri_model.py ADDED
@@ -0,0 +1,149 @@
1
+ """MRI image deep-learning inference utilities.
2
+
3
+ This module is the decision-layer bridge for an externally-trained volumetric
4
+ MRI model. The training code can live outside this repo; production only needs
5
+ an ONNX artifact plus the preprocessing contract below.
6
+ """
7
+ from __future__ import annotations
8
+
9
+ from pathlib import Path
10
+ from typing import Any, Sequence
11
+
12
+ import nibabel as nib
13
+ import numpy as np
14
+ from scipy import ndimage as scipy_ndimage
15
+
16
+ from src.core.logger import get_logger
17
+ from src.pipelines.mri_pipeline import is_valid_volume
18
+
19
+ logger = get_logger(__name__)
20
+
21
+ DEFAULT_MODEL_PATH = Path("data/processed/mri_model.onnx")
22
+ DEFAULT_TARGET_SHAPE: tuple[int, int, int] = (64, 64, 64)
23
+ DEFAULT_LABEL_NAMES: tuple[str, ...] = ("class_0", "class_1")
24
+ _MIN_STD = 1e-6
25
+
26
+
27
+ def load(path: Path) -> Any:
28
+ """Load an ONNX MRI model artifact.
29
+
30
+ Args:
31
+ path: Path to an externally-trained `.onnx` artifact.
32
+
33
+ Returns:
34
+ An `onnxruntime.InferenceSession`.
35
+
36
+ Raises:
37
+ FileNotFoundError: if the artifact does not exist.
38
+ """
39
+ path = Path(path)
40
+ if not path.exists():
41
+ raise FileNotFoundError(f"MRI model artifact not found: {path}")
42
+ import onnxruntime as ort
43
+
44
+ return ort.InferenceSession(str(path), providers=["CPUExecutionProvider"])
45
+
46
+
47
+ def load_nifti_volume(path: Path) -> np.ndarray:
48
+ """Read a NIfTI volume from disk as float32."""
49
+ path = Path(path)
50
+ if not path.exists():
51
+ raise FileNotFoundError(f"MRI input not found: {path}")
52
+ img = nib.load(str(path))
53
+ return np.asarray(img.get_fdata(dtype=np.float32), dtype=np.float32)
54
+
55
+
56
+ def preprocess_volume(
57
+ volume: np.ndarray,
58
+ target_shape: tuple[int, int, int] = DEFAULT_TARGET_SHAPE,
59
+ ) -> np.ndarray:
60
+ """Convert a 3-D MRI volume into model input `[1, 1, D, H, W]`.
61
+
62
+ The external trainer must use the same contract: trilinear resize to
63
+ `target_shape`, z-score over non-zero voxels when present, then add batch
64
+ and channel dimensions.
65
+ """
66
+ if not is_valid_volume(volume):
67
+ raise ValueError("MRI volume must be a finite numeric 3-D array")
68
+ if len(target_shape) != 3 or any(int(x) <= 0 for x in target_shape):
69
+ raise ValueError(f"target_shape must contain three positive integers: {target_shape}")
70
+
71
+ resized = _resize_volume(np.asarray(volume, dtype=np.float32), target_shape)
72
+ normalized = _zscore_volume(resized)
73
+ return normalized[np.newaxis, np.newaxis, :, :, :].astype(np.float32, copy=False)
74
+
75
+
76
+ def preprocess_nifti(
77
+ input_path: Path,
78
+ target_shape: tuple[int, int, int] = DEFAULT_TARGET_SHAPE,
79
+ ) -> np.ndarray:
80
+ """Read and preprocess one NIfTI file for ONNX inference."""
81
+ return preprocess_volume(load_nifti_volume(input_path), target_shape=target_shape)
82
+
83
+
84
+ def predict_with_proba(
85
+ model: Any,
86
+ model_input: np.ndarray,
87
+ label_names: Sequence[str] | None = None,
88
+ ) -> dict[str, object]:
89
+ """Run an ONNX model and return label, confidence, and per-class probabilities."""
90
+ labels = tuple(label_names or DEFAULT_LABEL_NAMES)
91
+ if model_input.ndim != 5:
92
+ raise ValueError(f"model_input must have shape [1, 1, D, H, W], got {model_input.shape}")
93
+
94
+ input_name = model.get_inputs()[0].name
95
+ output = model.run(None, {input_name: model_input.astype(np.float32, copy=False)})[0]
96
+ proba = _as_probabilities(np.asarray(output, dtype=np.float32))
97
+ if len(labels) != proba.shape[0]:
98
+ labels = tuple(f"class_{i}" for i in range(proba.shape[0]))
99
+
100
+ label_idx = int(np.argmax(proba))
101
+ return {
102
+ "label": label_idx,
103
+ "label_text": labels[label_idx],
104
+ "confidence": float(proba[label_idx]),
105
+ "probabilities": [
106
+ {"label": i, "label_text": labels[i], "probability": float(p)}
107
+ for i, p in enumerate(proba)
108
+ ],
109
+ }
110
+
111
+
112
+ def predict_nifti(
113
+ model: Any,
114
+ input_path: Path,
115
+ target_shape: tuple[int, int, int] = DEFAULT_TARGET_SHAPE,
116
+ label_names: Sequence[str] | None = None,
117
+ ) -> dict[str, object]:
118
+ """Preprocess one NIfTI image and run MRI model inference."""
119
+ model_input = preprocess_nifti(input_path, target_shape=target_shape)
120
+ return predict_with_proba(model, model_input, label_names=label_names)
121
+
122
+
123
+ def _resize_volume(volume: np.ndarray, target_shape: tuple[int, int, int]) -> np.ndarray:
124
+ zoom = tuple(t / s for t, s in zip(target_shape, volume.shape, strict=True))
125
+ return scipy_ndimage.zoom(volume, zoom=zoom, order=1).astype(np.float32, copy=False)
126
+
127
+
128
+ def _zscore_volume(volume: np.ndarray) -> np.ndarray:
129
+ mask = volume != 0
130
+ ref = volume[mask] if np.any(mask) else volume.reshape(-1)
131
+ mean = float(ref.mean())
132
+ std = float(ref.std())
133
+ if std < _MIN_STD:
134
+ return np.zeros_like(volume, dtype=np.float32)
135
+ return ((volume - mean) / std).astype(np.float32, copy=False)
136
+
137
+
138
+ def _as_probabilities(raw_output: np.ndarray) -> np.ndarray:
139
+ logits = np.squeeze(raw_output)
140
+ if logits.ndim != 1:
141
+ raise ValueError(f"MRI model output must be one class vector, got shape {raw_output.shape}")
142
+ if logits.size < 2:
143
+ raise ValueError("MRI model output must contain at least two class scores")
144
+
145
+ if np.all(logits >= 0.0) and np.all(logits <= 1.0) and np.isclose(logits.sum(), 1.0, atol=1e-4):
146
+ return logits.astype(np.float32, copy=False)
147
+ shifted = logits - np.max(logits)
148
+ exp = np.exp(shifted)
149
+ return (exp / exp.sum()).astype(np.float32, copy=False)
tests/agents/test_agent_route.py CHANGED
@@ -19,7 +19,7 @@ class _FakeOrchestrator:
19
  def __init__(self, *args: Any, **kwargs: Any) -> None:
20
  pass
21
 
22
- def run(self, user_input: str) -> AgentResult:
23
  return AgentResult(
24
  text=f"Synthesized answer for: {user_input}",
25
  trace=[
 
19
  def __init__(self, *args: Any, **kwargs: Any) -> None:
20
  pass
21
 
22
+ def run(self, user_input: str, context: dict[str, Any] | None = None) -> AgentResult:
23
  return AgentResult(
24
  text=f"Synthesized answer for: {user_input}",
25
  trace=[
tests/agents/test_orchestrator.py CHANGED
@@ -67,6 +67,45 @@ def _make_ping_tool() -> Tool:
67
  )
68
 
69
 
70
  # --- Tests ------------------------------------------------------------------
71
 
72
 
@@ -159,3 +198,34 @@ class TestOrchestrator:
159
  result = orch.run("trivial input")
160
  assert result.text == "Direct answer."
161
  assert result.trace == []
67
  )
68
 
69
 
70
+ class _BBBInput(BaseModel):
71
+ smiles: str
72
+
73
+
74
+ class _BBBOutput(BaseModel):
75
+ label_text: str
76
+ confidence: float
77
+
78
+
79
+ class _RetrieveInput(BaseModel):
80
+ query: str
81
+ k: int = 4
82
+
83
+
84
+ class _RetrieveOutput(BaseModel):
85
+ chunks: list[dict[str, Any]]
86
+
87
+
88
+ def _make_workflow_tools() -> list[Tool]:
89
+ return [
90
+ Tool(
91
+ name="run_bbb_pipeline",
92
+ description="Run BBB.",
93
+ input_model=_BBBInput,
94
+ output_model=_BBBOutput,
95
+ execute=lambda inp: _BBBOutput(label_text="permeable", confidence=0.82),
96
+ ),
97
+ Tool(
98
+ name="retrieve_context",
99
+ description="Retrieve context.",
100
+ input_model=_RetrieveInput,
101
+ output_model=_RetrieveOutput,
102
+ execute=lambda inp: _RetrieveOutput(
103
+ chunks=[{"source": "lipinski.md", "text": "BBB context"}]
104
+ ),
105
+ ),
106
+ ]
107
+
108
+
109
  # --- Tests ------------------------------------------------------------------
110
 
111
 
 
198
  result = orch.run("trivial input")
199
  assert result.text == "Direct answer."
200
  assert result.trace == []
201
+
202
+ def test_enforced_workflow_falls_back_when_model_skips_tool_calls(self) -> None:
203
+ client = MagicMock()
204
+ client.chat.completions.create.side_effect = [
205
+ _fake_choice_with_text("I will answer directly."),
206
+ _fake_choice_with_text("Still no retrieval."),
207
+ _fake_choice_with_text("Grounded final answer."),
208
+ ]
209
+ orch = Orchestrator(
210
+ llm_client=client,
211
+ tools=_make_workflow_tools(),
212
+ system_prompt="sys",
213
+ model="stub-model",
214
+ max_steps=5,
215
+ enforce_workflow=True,
216
+ workflow_pipeline_tools={"run_bbb_pipeline"},
217
+ workflow_retrieval_tool="retrieve_context",
218
+ workflow_router=lambda user_input, context: (
219
+ "run_bbb_pipeline",
220
+ {"smiles": user_input},
221
+ ),
222
+ workflow_query_builder=lambda user_input, pipeline_trace, context: (
223
+ "BBB permeability of small lipophilic molecules"
224
+ ),
225
+ )
226
+ result = orch.run("CCO")
227
+ assert result.finish_reason == "complete"
228
+ assert result.text == "Grounded final answer."
229
+ assert [t.name for t in result.trace] == ["run_bbb_pipeline", "retrieve_context"]
230
+ assert result.trace[0].result == {"label_text": "permeable", "confidence": 0.82}
231
+ assert result.trace[1].args["query"] == "BBB permeability of small lipophilic molecules"
tests/agents/test_tools.py CHANGED
@@ -2,6 +2,8 @@
2
  from __future__ import annotations
3
 
4
  from pathlib import Path
 
 
5
 
6
  import pytest
7
  from pydantic import BaseModel
@@ -91,6 +93,7 @@ class TestBuildDefaultTools:
91
  assert "input_path" in EEGPipelineInput.model_fields
92
  assert "input_dir" in MRIPipelineInput.model_fields
93
  assert "sites_csv" in MRIPipelineInput.model_fields
 
94
  assert "query" in RetrieveContextInput.model_fields
95
  assert "k" in RetrieveContextInput.model_fields
96
 
@@ -116,7 +119,6 @@ class TestBuildDefaultTools:
116
  assert len(tools) == 4
117
 
118
  def test_bbb_executor_translates_httpexception_to_valueerror(self) -> None:
119
- from unittest.mock import patch
120
  from fastapi import HTTPException
121
 
122
  tools = build_default_tools(rag_index_dir=None)
@@ -126,3 +128,25 @@ class TestBuildDefaultTools:
126
  side_effect=HTTPException(status_code=503, detail="model missing")):
127
  with pytest.raises(ValueError, match="bbb tool failed"):
128
  bbb.invoke({"smiles": "CCO"})
 
2
  from __future__ import annotations
3
 
4
  from pathlib import Path
5
+ from types import SimpleNamespace
6
+ from unittest.mock import patch
7
 
8
  import pytest
9
  from pydantic import BaseModel
 
93
  assert "input_path" in EEGPipelineInput.model_fields
94
  assert "input_dir" in MRIPipelineInput.model_fields
95
  assert "sites_csv" in MRIPipelineInput.model_fields
96
+ assert "sites_csv" not in MRIPipelineInput.model_json_schema().get("required", [])
97
  assert "query" in RetrieveContextInput.model_fields
98
  assert "k" in RetrieveContextInput.model_fields
99
 
 
119
  assert len(tools) == 4
120
 
121
  def test_bbb_executor_translates_httpexception_to_valueerror(self) -> None:
 
122
  from fastapi import HTTPException
123
 
124
  tools = build_default_tools(rag_index_dir=None)
 
128
  side_effect=HTTPException(status_code=503, detail="model missing")):
129
  with pytest.raises(ValueError, match="bbb tool failed"):
130
  bbb.invoke({"smiles": "CCO"})
131
+
132
+ def test_mri_executor_defaults_sites_csv_to_input_dir_sites_csv(self, tmp_path: Path) -> None:
133
+ tools = build_default_tools(rag_index_dir=None, processed_dir=tmp_path / "processed")
134
+ mri = next(t for t in tools if t.name == "run_mri_pipeline")
135
+ input_dir = tmp_path / "mri"
136
+ input_dir.mkdir()
137
+
138
+ with patch(
139
+ "src.api.routes.run_mri",
140
+ return_value=SimpleNamespace(
141
+ output_path=str(tmp_path / "processed" / "mri_features.parquet"),
142
+ rows=2,
143
+ columns=3,
144
+ duration_sec=0.1,
145
+ ),
146
+ ) as run_mri:
147
+ out = mri.invoke({"input_dir": str(input_dir)})
148
+
149
+ assert out["rows"] == 2
150
+ req = run_mri.call_args.args[0]
151
+ assert req.input_dir == str(input_dir)
152
+ assert req.sites_csv == str(input_dir / "sites.csv")
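The schema assertion and the new executor test together pin down the design: `sites_csv` stays optional in the tool's input model, and the executor supplies the `<input_dir>/sites.csv` default at invoke time, so the LLM never has to spell out the covariates path. A plausible sketch of that defaulting logic, assuming the executor builds the request object handed to `src.api.routes.run_mri` (the surrounding names are illustrative):

    # Sketch: only run_mri, input_dir, and the sites_csv default are pinned
    # by the tests; MRIPipelineRequest and the return-dict shape are assumptions.
    def _mri_executor(inp: MRIPipelineInput) -> dict:
        sites_csv = inp.sites_csv or str(Path(inp.input_dir) / "sites.csv")
        resp = run_mri(MRIPipelineRequest(input_dir=inp.input_dir, sites_csv=sites_csv))
        return {
            "output_path": resp.output_path,
            "rows": resp.rows,
            "columns": resp.columns,
            "duration_sec": resp.duration_sec,
        }

Keeping the default in the executor rather than in the Pydantic model means the JSON schema shown to the model stays honest: the field really is optional.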
tests/api/test_routes.py CHANGED
@@ -2,7 +2,9 @@
  from __future__ import annotations
 
  from pathlib import Path
+ from unittest.mock import patch
 
+ import pandas as pd
  import pytest
  from fastapi.testclient import TestClient
 
@@ -73,6 +75,22 @@ class TestMRIRoute:
          assert resp.json()["rows"] > 0
 
 
+ class TestPipelineWrap:
+     def test_wrap_skips_mlflow_lookup_when_disabled(self, tmp_path: Path, monkeypatch):
+         from src.api import routes
+
+         out = tmp_path / "out.parquet"
+         pd.DataFrame({"x": [1]}).to_parquet(out)
+         monkeypatch.setenv("NEUROBRIDGE_DISABLE_MLFLOW", "1")
+
+         with patch("src.api.routes.mlflow.search_runs") as search_runs:
+             resp = routes._wrap("bbb_pipeline", out, lambda: None)
+
+         search_runs.assert_not_called()
+         assert resp.status == "ok"
+         assert resp.mlflow_run_id is None
+
+
  class TestBBBPredictRoute:
      def _setup_model_artifact(self, tmp_path: Path) -> Path:
          """Build features + train + save a tiny model. Returns artifact path."""
@@ -198,6 +216,56 @@ class TestBBBPredictRoute:
          assert resp.status_code == 503
 
 
+ class TestMRIPredictRoute:
+     def test_returns_503_when_artifact_missing(self, tmp_path: Path, monkeypatch):
+         monkeypatch.setenv("MRI_MODEL_PATH", str(tmp_path / "missing.onnx"))
+
+         resp = client.post(
+             "/predict/mri",
+             json={"input_path": str(_FIXTURES / "mri_sample" / "subject_0.nii.gz")},
+         )
+
+         assert resp.status_code == 503
+         assert "MRI model artifact not available" in resp.text
+
+     def test_returns_404_when_input_missing(self, tmp_path: Path, monkeypatch):
+         from tests.fixtures.build_dummy_mri_onnx import build as build_dummy_mri_onnx
+
+         artifact = build_dummy_mri_onnx(tmp_path / "mri_model.onnx")
+         monkeypatch.setenv("MRI_MODEL_PATH", str(artifact))
+
+         resp = client.post(
+             "/predict/mri",
+             json={"input_path": str(tmp_path / "missing.nii.gz"), "target_shape": [8, 8, 8]},
+         )
+
+         assert resp.status_code == 404
+
+     def test_returns_200_with_prediction(self, tmp_path: Path, monkeypatch):
+         from tests.fixtures.build_dummy_mri_onnx import build as build_dummy_mri_onnx
+
+         artifact = build_dummy_mri_onnx(tmp_path / "mri_model.onnx")
+         monkeypatch.setenv("MRI_MODEL_PATH", str(artifact))
+
+         resp = client.post(
+             "/predict/mri",
+             json={
+                 "input_path": str(_FIXTURES / "mri_sample" / "subject_0.nii.gz"),
+                 "target_shape": [8, 8, 8],
+                 "label_names": ["control", "abnormal"],
+             },
+         )
+
+         assert resp.status_code == 200, resp.text
+         body = resp.json()
+         assert body["label"] == 1
+         assert body["label_text"] == "abnormal"
+         assert body["confidence"] > 0.5
+         assert body["input_path"].endswith("subject_0.nii.gz")
+         assert body["model_path"] == str(artifact)
+         assert len(body["probabilities"]) == 2
+
+
  class TestMRIDiagnosticsRoute:
      def test_returns_200_with_pre_and_post_data(self, tmp_path: Path):
          from tests.fixtures.build_mri_fixture import build as build_mri
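`TestMRIPredictRoute` fixes the same operational contract the BBB endpoint already follows: a missing artifact is an operator problem (503 plus a remediation hint), a missing input is a caller problem (404), and the happy path echoes back enough metadata to audit the prediction. A rough sketch of a handler satisfying those assertions (the request model, the env-var fallback default, and any detail text beyond the asserted fragment are assumptions):

    # Sketch of the /predict/mri contract pinned by the tests above; only
    # the status codes, the "MRI model artifact not available" fragment,
    # and the response fields are actually asserted.
    import os
    from pathlib import Path

    from fastapi import HTTPException

    from src.models import mri_model

    @app.post("/predict/mri")
    def predict_mri(req: MRIPredictRequest) -> dict:
        model_path = Path(os.environ.get("MRI_MODEL_PATH", "data/processed/mri_model.onnx"))
        if not model_path.exists():
            raise HTTPException(503, detail="MRI model artifact not available; run the trainer/export CLI first")
        if not Path(req.input_path).exists():
            raise HTTPException(404, detail=f"input not found: {req.input_path}")
        model = mri_model.load(model_path)
        result = mri_model.predict_nifti(
            model,
            req.input_path,
            target_shape=tuple(req.target_shape),
            label_names=tuple(req.label_names),
        )
        return {**result, "input_path": req.input_path, "model_path": str(model_path)}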
tests/fixtures/build_dummy_mri_onnx.py ADDED
@@ -0,0 +1,20 @@
+ """Build a tiny ONNX MRI classifier fixture for API/model tests."""
+ from __future__ import annotations
+
+ from pathlib import Path
+
+
+ def build(path: Path, logits: tuple[float, float] = (0.1, 2.0)) -> Path:
+     """Write an ONNX model that returns constant logits for any MRI tensor."""
+     import onnx
+     from onnx import TensorProto, helper
+
+     input_info = helper.make_tensor_value_info("input", TensorProto.FLOAT, [1, 1, 8, 8, 8])
+     output_info = helper.make_tensor_value_info("logits", TensorProto.FLOAT, [1, 2])
+     value = helper.make_tensor("const_logits", TensorProto.FLOAT, [1, 2], list(logits))
+     node = helper.make_node("Constant", inputs=[], outputs=["logits"], value=value)
+     graph = helper.make_graph([node], "dummy_mri_classifier", [input_info], [output_info])
+     model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 13)])
+     model.ir_version = 10
+     onnx.save(model, path)
+     return path
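The default logits are chosen so the fixture is decisive without being degenerate: assuming the inference path applies a softmax (which the probability-sum assertion in the model tests implies), (0.1, 2.0) maps to roughly (0.13, 0.87), so class 1 wins with confidence comfortably above the 0.5 thresholds asserted downstream. A quick standalone check:

    # Why logits (0.1, 2.0) yield label 1 with confidence ~0.87 under softmax.
    import numpy as np

    logits = np.array([0.1, 2.0])
    probs = np.exp(logits) / np.exp(logits).sum()
    print(probs)                      # ~[0.130, 0.870], sums to 1.0
    assert probs.argmax() == 1 and probs[1] > 0.5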
tests/models/test_mri_model.py ADDED
@@ -0,0 +1,54 @@
+ """Tests for src.models.mri_model — image-based MRI DL inference surface."""
+ from __future__ import annotations
+
+ from pathlib import Path
+
+ import numpy as np
+ import pytest
+
+ from src.models import mri_model
+ from tests.fixtures.build_dummy_mri_onnx import build as build_dummy_mri_onnx
+
+
+ _FIXTURE_MRI = Path(__file__).resolve().parents[1] / "fixtures" / "mri_sample" / "subject_0.nii.gz"
+
+
+ class TestMRIDLModel:
+     def test_preprocess_volume_returns_batch_channel_tensor(self) -> None:
+         volume = np.ones((4, 5, 6), dtype=np.float32)
+         volume[1:3, 1:4, 2:5] = 5.0
+
+         out = mri_model.preprocess_volume(volume, target_shape=(8, 8, 8))
+
+         assert out.shape == (1, 1, 8, 8, 8)
+         assert out.dtype == np.float32
+         assert np.all(np.isfinite(out))
+
+     def test_preprocess_rejects_nan_volume(self) -> None:
+         volume = np.zeros((4, 4, 4), dtype=np.float32)
+         volume[0, 0, 0] = np.nan
+
+         with pytest.raises(ValueError, match="finite numeric 3-D"):
+             mri_model.preprocess_volume(volume, target_shape=(8, 8, 8))
+
+     def test_load_missing_artifact_raises(self, tmp_path: Path) -> None:
+         with pytest.raises(FileNotFoundError, match="MRI model artifact not found"):
+             mri_model.load(tmp_path / "missing.onnx")
+
+     def test_predict_nifti_with_dummy_onnx(self, tmp_path: Path) -> None:
+         artifact = build_dummy_mri_onnx(tmp_path / "mri_model.onnx")
+         model = mri_model.load(artifact)
+
+         result = mri_model.predict_nifti(
+             model,
+             _FIXTURE_MRI,
+             target_shape=(8, 8, 8),
+             label_names=("control", "abnormal"),
+         )
+
+         assert result["label"] == 1
+         assert result["label_text"] == "abnormal"
+         assert result["confidence"] > 0.5
+         probs = result["probabilities"]
+         assert len(probs) == 2
+         assert sum(p["probability"] for p in probs) == pytest.approx(1.0, abs=1e-6)
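The preprocessing tests fully specify the contract (finite 3-D volume in, `(1, 1, *target_shape)` float32 tensor out, NaN rejected with a message matching "finite numeric 3-D") without fixing the implementation. One plausible implementation under those constraints; the zoom-based resize, the z-scoring, and the scipy dependency are all guesses here:

    # Sketch of a preprocess_volume satisfying the tested contract; only the
    # error message, output shape/dtype, and finiteness are pinned by tests.
    import numpy as np
    from scipy.ndimage import zoom

    def preprocess_volume(volume, target_shape):
        arr = np.asarray(volume, dtype=np.float32)
        if arr.ndim != 3 or not np.all(np.isfinite(arr)):
            raise ValueError("expected a finite numeric 3-D volume")
        factors = [t / s for t, s in zip(target_shape, arr.shape)]
        arr = zoom(arr, factors, order=1).astype(np.float32)
        std = arr.std()
        if std > 0:                        # z-score, guarding constant volumes
            arr = (arr - arr.mean()) / std
        return arr[np.newaxis, np.newaxis, ...]   # add batch + channel dims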