# AGENTS.md — NeuroBridge Enterprise Pipeline
> Read this file at the start of every session. It is the contract every agent
> (human or LLM) operates under in this repository.
## 1. Project Vision
**NeuroBridge Enterprise** is a B2B SaaS platform that solves three structural
problems in real-world clinical/biomedical ML pipelines:
1. **Data Drift** between hospitals and acquisition sites (multi-center MRI).
2. **Missing Modalities** (a patient may have MRI but no EEG, or vice versa).
3. **Artifacts** in raw biosignals (eye blinks, line noise, motion in EEG).
The platform exposes three production pipelines behind a single FastAPI surface:
| Modality | Pipeline | Core Technique |
|---|---|---|
| Image (MRI / fMRI) | `src/pipelines/mri_pipeline.py` | ComBat Harmonization for site-level domain shift |
| Signal (EEG) | `src/pipelines/eeg_pipeline.py` | MNE-Python + ICA for artifact removal |
| Tabular (BBB / molecules) | `src/pipelines/bbb_pipeline.py` | RDKit Morgan fingerprints from SMILES |
All experiment runs are tracked in **MLflow**. All services ship as **Docker** images.
## 2. Directory Layout (load-bearing — do not violate)
```
.
├── AGENTS.md # This file
├── README.md
├── requirements.txt
├── pytest.ini
├── conftest.py # Repo-wide pytest fixtures (autouse: pins MLFLOW_TRACKING_URI to tmp dir for test isolation)
├── Dockerfile # Production image (FastAPI + pipelines)
├── docker-compose.yml # api + mlflow services for local stack
├── .dockerignore
├── .streamlit/
│ └── config.toml # Streamlit theme tokens
├── data/
│ ├── raw/ # Untouched source data. NEVER train on this directly.
│ └── processed/ # Pipeline output as Parquet (preserves dtypes; overwritten each run; see §4).
├── src/
│ ├── api/ # FastAPI surface
│ │ ├── main.py # App factory + /health
│ │ ├── routes.py # POST /pipeline/{bbb,eeg,mri} dispatch
│ │ └── schemas.py # Shared Pydantic request/response models
│ ├── core/ # Cross-cutting utilities
│ │ ├── logger.py # Structured logger (mandatory in every pipeline)
│ │ ├── determinism.py # Thread-pin env vars (OMP/OPENBLAS/MKL/pyarrow)
│ │ ├── storage.py # Parquet read/write helpers (snappy, single-threaded, deterministic)
│ │ └── tracking.py # MLflow `track_pipeline_run` context manager (see §7)
│ ├── pipelines/ # One file per modality. Pure functions + a `run_pipeline()` entry.
│ ├── models/ # Downstream decision-layer models
│ │ ├── bbb_model.py # BBB-permeability classifier + SHAP explainer + trainer CLI
│ │ └── mri_model.py # Volumetric MRI ONNX inference surface (external training)
│ ├── llm/ # Natural-language explainers (template + OpenRouter fallback)
│ ├── rag/ # Fastembed + FAISS retrieval layer
│ ├── agents/ # Tool registry + guarded OpenRouter orchestrator
│ └── frontend/
│ └── app.py # Streamlit dashboard
└── tests/
├── core/
├── api/
├── frontend/
├── pipelines/ # incl. test_cross_pipeline_smoke.py for integration coverage
└── fixtures/ # Tiny synthetic data files used by tests (NOT a Python package — no __init__.py)
```
**Rules:**
- New modality → new file under `src/pipelines/`. No mixing modalities in one file.
- Anything imported by 2+ pipelines → `src/core/`.
- Pipeline code (`src/pipelines/`, `src/core/`) must not read from or write to any path outside `data/`. Test code may read `tests/fixtures/`. The `data/` boundary is the storage contract for production data.
- `tests/fixtures/` holds CSV / numpy / DICOM blobs — do **not** add an `__init__.py` there.
## 3. Coding Standards
- **Python 3.10–3.12** (the pinned native-extension dependencies do not yet ship cp313+ wheels). Use `from __future__ import annotations` when needed for forward refs.
- **Type hints are mandatory** on every public function/method (parameters and return).
- **Modular structure.** One responsibility per function. If a function exceeds ~40 lines or 3 levels of nesting, split it.
- **TDD is the default workflow.** Write the failing test first, watch it fail, then implement. Tests live in `tests/` mirroring `src/`.
- **Logging is mandatory** for every pipeline. Use `src.core.logger.get_logger(__name__)`. No `print()` in `src/`.
- **Docstrings** on every public function — one-line summary + Args/Returns when non-trivial.
- **No hard-coded paths in business logic.** Pass paths as arguments to `run_pipeline(input_path, output_path)`.
- **Format & lint:** keep imports sorted; prefer `pathlib.Path` over `os.path`.
- **Commits are small and frequent.** Each green test → commit.
## 4. Data Readiness Principles
> **The Golden Rule: never train a model directly on raw data. Raw data must always pass through a pipeline first.**
Every modality pipeline MUST guarantee, before writing to `data/processed/`:
1. **Schema validity** — required columns present, expected dtypes.
2. **Domain validity** — invalid records (e.g. unparseable SMILES, NaN-only EEG epochs, corrupted DICOMs) are **logged with their identifier and dropped**, never silently coerced.
3. **Determinism** — given the same `data/raw/` input, the pipeline produces byte-identical `data/processed/` output. No wall-clock, no random seeds without explicit seeding.
4. **Traceability** — log row count in, row count out, and percentage dropped at INFO level.
5. **Idempotence** — re-running the pipeline overwrites `data/processed/` cleanly; no append, no partial writes.
**Determinism environment**: byte-identical output requires deterministic
floating-point reductions. Each pipeline module sets `OMP_NUM_THREADS=1`,
`OPENBLAS_NUM_THREADS=1`, `MKL_NUM_THREADS=1`, and pins pyarrow to
single-threaded mode at import time. CI runners and developer machines do
not need to set these manually — the pipeline modules handle it — but
overriding them in the environment will break Determinism rule 3.
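The thread-pinning described above can be sketched as follows. This is an illustrative stand-in for what `src/core/determinism.py` does at import time, not its actual contents; the pyarrow pin is elided because it requires the library to be importable.

```python
import os

# Env vars that control BLAS/OpenMP thread pools. They must be set before
# numpy (and its BLAS backend) initializes, i.e. at module import time.
_PIN_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def pin_threads() -> None:
    """Force single-threaded reductions so float sums are order-stable."""
    for var in _PIN_VARS:
        os.environ[var] = "1"


# Executed at import time by each pipeline module (per §4).
pin_threads()
```

The real module additionally pins pyarrow to single-threaded mode (e.g. via `pyarrow.set_cpu_count(1)`); the env vars alone cover the BLAS/OpenMP side.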
**ComBat determinism boundary**: the MRI pipeline's `harmonize_combat` wraps
`neuroHarmonize.harmonizationLearn` and rounds its output to 14 decimal places (`np.round(arr, 14)`).
This is a defensive measure: with the thread-pinning above, harmonization is
already bit-identical, but the rounding guarantees byte-identity even when
the env-pin discipline is bypassed (e.g. a sub-process that re-exports a
thread count). It discards ~5 trailing-mantissa bits of float64 — well below
ComBat's biological effect-size precision floor.
A model training script may read from `data/processed/` only. If a
training script references `data/raw/` directly, that is a bug and must be
refactored into a pipeline.
## 5. How to Add a New Pipeline (checklist)
1. Add `tests/pipelines/test_<name>_pipeline.py` with the failing tests first.
2. Create `src/pipelines/<name>_pipeline.py` exposing `run_pipeline(input_path: Path, output_path: Path) -> None`.
3. Use `get_logger(__name__)` for all status output (per §3).
4. Apply the §4 Data Readiness contract: validate + drop invalid records with a logged WARNING (identifier + count), log row count in/out/dropped at INFO, write deterministically, and overwrite (do not append) on re-run.
5. Write deterministic output to `output_path`.
6. Document any new dependency in `requirements.txt` (pinned).
7. Add a one-line entry to this file's pipeline table.
## 6. Storage Format Convention
All `data/processed/` outputs MUST be **Parquet** (`pyarrow` engine, `compression="snappy"`):
- Preserves dtypes (uint8 fingerprints stay uint8; float64 EEG features stay float64) — CSV silently widens numeric columns and is unsuitable for the high-dimensional float arrays produced by the EEG and MRI pipelines.
- Byte-deterministic with fixed compression and single-threaded writes (satisfies §4 Determinism).
- Read with `pd.read_parquet(path)`; no dtype hints required.
The raw `data/raw/` inputs may be in any vendor-supplied format (CSV for BBBP, EDF/FIF for EEG, NIfTI for MRI).
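The dtype-widening claim above is easy to demonstrate without any repo code: round-trip a `uint8` column through CSV and it comes back as `int64`, whereas `to_parquet(path, engine="pyarrow", compression="snappy")` preserves it.

```python
import io

import numpy as np
import pandas as pd

# A uint8 fingerprint column, as the BBB pipeline produces.
df = pd.DataFrame({"fp": np.array([0, 1, 1], dtype="uint8")})

# CSV round-trip: the dtype is lost and re-inferred as int64.
csv_roundtrip = pd.read_csv(io.StringIO(df.to_csv(index=False)))
widened = str(csv_roundtrip["fp"].dtype)  # "int64" — 8x the storage, contract broken
```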
## 7. Experiment Tracking
Every `run_pipeline()` invocation logs to MLflow via `src.core.tracking.track_pipeline_run`:
- **Experiment names** match the pipeline module: `bbb_pipeline`, `eeg_pipeline`, `mri_pipeline`.
- **Params**: input/output paths and pipeline hyperparameters (e.g. BBB `n_bits` / `radius`, EEG `epoch_duration_s` / `random_state`, MRI `intensity_threshold` / `n_roi_axes`).
- **Metrics**: row counts (`rows_in`, `rows_out`, `rows_dropped` — or modality equivalent like `subjects_in/out/dropped`) and `duration_sec`.
- **Artifact**: the produced Parquet at `data/processed/<modality>_features.parquet`.
The tracking URI is read from `MLFLOW_TRACKING_URI` (defaults to `./mlruns/` when unset).
**Live-demo lifeline**: set `NEUROBRIDGE_DISABLE_MLFLOW=1` to skip tracking entirely — the helper yields `None` and emits no MLflow calls. Use this when the tracking server is unreachable (offline demo, network outage, or CI without an MLflow service). Pipelines complete normally; only the run metadata is lost.
The repo-wide `conftest.py` autouse fixture pins `MLFLOW_TRACKING_URI` to a tmp directory for the test session, so the production `mlruns/` directory is never written by the test suite. Tests that interact with MLflow (in `tests/core/test_tracking.py` and the per-pipeline `Test<Modality>PipelineMLflow` classes) all share this isolated store.
## 8. Decision Layer (Downstream Models)
Pipelines produce features (`data/processed/<modality>_features.parquet`).
Downstream models live in `src/models/` and consume processed features or a
deterministic model-local preprocessing contract:
| Model | File | Output | Endpoint |
|---|---|---|---|
| BBB permeability | `src/models/bbb_model.py` | `data/processed/bbb_model.joblib` | `POST /predict/bbb` |
| MRI image classifier | `src/models/mri_model.py` | `data/processed/mri_model.onnx` | `POST /predict/mri` |
In-repo trainable downstream model modules expose a uniform surface:
- `train(df, label_col, ...)` → fitted classifier
- `save(model, path)` / `load(path)` → joblib artifact I/O
- `predict_with_proba(model, smiles)` → `{label, confidence}` (confidence is the max-class probability)
- `explain_prediction(model, smiles, top_k)` → SHAP top-k attributions sorted by `|shap_value|` descending
MRI DL exception: training happens outside this repo and exports ONNX, so it
does not expose `train()` or SHAP. Runtime
loads the ONNX artifact with `mri_model.load()`, preprocesses one NIfTI via the
same deterministic resize + z-score contract used during training
(`preprocess_nifti()`), then returns class probabilities via `predict_nifti()`.
The API loads model artifacts at request time. If an artifact is missing,
the endpoint returns **HTTP 503** with a remediation hint instead of failing
process startup. BBB points at the trainer CLI (`python -m src.models.bbb_model`);
MRI points at the external ONNX export path.
**Determinism**: all in-repo classifiers are seeded (`random_state=42`
default), `n_jobs=1` (no tree-parallelism races). Re-running the BBB trainer
on the same Parquet produces identical predictions. MRI ONNX determinism is
bounded by the exported model plus the fixed runtime preprocessing contract.
**Override `BBB_MODEL_PATH`** env var to point the API at a non-default
artifact location (used by tests for tmp_path isolation).
**Override `MRI_MODEL_PATH`** env var to point the API at a non-default ONNX
artifact location. If the ONNX artifact is missing, `POST /predict/mri`
returns **HTTP 503** with a remediation hint.
**Calibration metadata** (Day 6): `train()` does an 80/20 stratified split,
computes precision-at-confidence-threshold bins on the held-out test set,
and stashes them on `model._neurobridge_calibration: list[dict]` (sorted
ascending by threshold). The API includes the bin matching each
prediction's confidence in `BBBPredictResponse.calibration`. UI uses this
to render an honest trust caption ("≥75% confident → 92% precision, n=18").
For tiny test fixtures where stratified split fails, calibration falls
back to zero-support bins so the API contract is always populated.
## 9. Demo Features (Day 6)
The frontend includes three jury-day demo amplifiers that don't change
the core contract:
- **Edge-case dropdown** (BBB tab): a curated catalog of five robustness
  probes, including invalid SMILES, empty input, an OOD macrocycle
  (cyclosporine-like), and a heavily halogenated aromatic. Each has a
  stated expectation; the UI visualizes graceful failure (HTTP 400 →
  recoverable warning, never a crash).
- **Calibration trust caption** (BBB decision card): renders the
precision-at-confidence-threshold from `BBBPredictResponse.calibration`.
Demonstrates that the system knows what it doesn't know.
- **MRI ComBat diagnostics** (MRI tab): `POST /pipeline/mri/diagnostics`
runs the pipeline twice (pre + post ComBat) and returns long-format
data + site-gap KPIs (Pre, Post, Reduction factor). The UI renders
a faceted altair density plot — visual proof that ComBat removes
site-driven domain shift.
## 10. Drift Surface (Day 7)
Each predict route maintains a per-worker rolling window of recent
prediction confidences (`collections.deque(maxlen=100)`). Train-time
median + std are stashed on `model._neurobridge_train_stats` (joblib
roundtrip-safe). The drift z-score is `(rolling_median − train_median) /
max(train_std, 1e-9)`, computed only when the buffer holds ≥10 samples
AND the model has the train-stats attribute. The `/predict/bbb`
response carries `drift_z: float | None` and `rolling_n: int`. The UI
renders a one-line caption with a magnitude tag (in-band, mild,
significant). Worker restart clears the deque; this is acceptable for
demo and removes the audit-trail concern.
## 11. LLM Explainer Surface (Day 7 + 9)
`src/llm/explainer.py` is the single entry point for natural-language
rationales. `explain(payload)` always returns `{rationale, source,
model}`. The deterministic template path is the source of truth for
tests; the LLM path is OpenRouter via the `openai==1.51.0` SDK and
walks a **smartest → smallest free-tier fallback chain**
(`_DEFAULT_FREE_MODEL_CHAIN`, 10 ids — head: `inclusionai/ling-2.6-1t:free`).
The chain is overridable at runtime via `OPENROUTER_FREE_MODELS`
(comma-separated). Status-code classification:
- `401` → key is bad → bail to template + actionable WARNING (rotate at
https://openrouter.ai/keys, enable free-model data-sharing at
https://openrouter.ai/settings/privacy).
- `400` → prompt-shape mismatch on this model → advance to next.
- `402 / 403 / 404 / 429 / 5xx` → advance to next.
- Network/timeout → bail to template (switching models won't help).
Two env knobs control the gate:
- `OPENROUTER_API_KEY` — when absent, fallback to template.
- `NEUROBRIDGE_DISABLE_LLM=1` — hard kill-switch; force template even
if a key is set. Use this for demo days when you want fully
deterministic, reproducible rationales.
**Prompt design** (`_build_llm_prompt`): two intent modes. When the
caller supplies `user_question`, the model is instructed to
language-match (Turkish question → Turkish answer), answer the
question directly (not a canned paper-style summary), and respond
conversationally to off-topic / greeting questions. When no
`user_question` is supplied, falls back to the original 2-4 sentence
paper-style rationale.
The `POST /explain/bbb` endpoint mirrors this contract. Pydantic
enforces a non-empty `top_features` list (422 on empty); every other
failure mode degrades to template + WARNING log + `source="template"`.
**Diagnostics**: `GET /diag/openrouter` (`src/api/main.py`) returns
key-presence (length + 12-char prefix only), kill-switch state, chain
length, first model id, and the result of an 8-token probe call
against that model. Surfaced in Streamlit as the sidebar "🔧 Diagnose
LLM" button. Use it when the deployed Space shows `source="template"`
unexpectedly — the most common causes are a missing/misnamed
`OPENROUTER_API_KEY` Space secret or a revoked key.
## 12. Multi-Modal Explainer (Day 8)
`src/llm/explainer.py` exposes `explain(payload, modality)` where
`modality ∈ {"bbb", "eeg", "mri"}`. Each modality has its own
deterministic template (`_template_explain_bbb / _eeg / _mri`) and
its own LLM prompt header. Unknown modality strings degrade to the
BBB template with a warning log; the function never raises. The
hybrid OpenRouter fallback contract from §11 applies uniformly.
The API exposes three matching endpoints — `POST /explain/{bbb,eeg,mri}` —
each on the `explain_router` (`/explain` prefix). Streamlit surfaces
the BBB version in the AI Assistant tab and the EEG/MRI versions as
inline expanders inside their respective pipeline tabs.
## 13. Experiments Surface (Day 8)
`GET /experiments/runs` returns up to 50 most recent MLflow runs
across the bbb/eeg/mri experiments, flattened into a list of
`MLflowRunSummary` (run_id, experiment_name, start_time, status,
metrics, params). `POST /experiments/diff {run_id_a, run_id_b}`
returns a side-by-side metric+param diff (`RunDiffRow`).
When `NEUROBRIDGE_DISABLE_MLFLOW=1`, both endpoints return empty
responses without raising — useful for deployments where there is no
writable `mlruns/` tree or the tracking server is unavailable. Unknown
run ids → 404.
The Streamlit "Experiments" tab is the user-facing surface. Cached
in session state with an explicit Refresh button.
## 14. Deploy Surface (Day 8)
`Dockerfile.hf` is the Hugging Face Spaces image. Single container,
two processes (FastAPI :8000 + Streamlit :7860) launched via
`supervisord.conf`. Build-time `RUN python -m src.models.bbb_model`
bakes the BBB model artifact into the image so the first `/predict/bbb`
call is instant on cold start. Build-time RAG ingest creates
`data/processed/faiss_index/`.
`docker-entrypoint.sh` is the runtime guard for local Docker/Compose demos:
when a mounted `./data` volume hides image-built artifacts, it seeds fixture
raw data, rebuilds missing BBB features/model artifacts, and rebuilds the
FAISS index before starting supervisord. It does not bake
`NEUROBRIDGE_DISABLE_MLFLOW=1` into the image; operators may set that env at
runtime if their tracking service is unavailable.
Default environment: `DEPLOY_ENV=hf_spaces`. The LLM kill-switch is **not**
set — deployed Spaces use the real OpenRouter free-tier chain (§11) when
`OPENROUTER_API_KEY` is configured in the Space's Secrets panel. Set
`NEUROBRIDGE_DISABLE_LLM=1` only when you want to force the deterministic
template path for a fully-reproducible demo.
The README's YAML front-matter declares the Space metadata
(SDK=docker, port=7860, app_file=src/frontend/app.py).
## 15. Orchestrator Agent Surface
`src/agents/orchestrator.py` exposes a single-agent function-calling
loop over the openai SDK (no LangChain / framework dep). The API enables
the guarded workflow mode: if the LLM skips or mis-shapes a required tool
call, deterministic routing in `src/agents/routing.py` falls back to exactly
one pipeline tool, then exactly one retrieval tool, then final synthesis.
The agent holds 4 tools, defined in `src/agents/tools.py`:
- `run_bbb_pipeline(smiles, top_k)` — wraps `POST /predict/bbb`
- `run_eeg_pipeline(input_path)` — wraps `POST /pipeline/eeg`
- `run_mri_pipeline(input_dir, sites_csv=None)` — wraps `POST /pipeline/mri`
and defaults `sites_csv` to `<input_dir>/sites.csv`
- `retrieve_context(query, k)` — wraps `src/rag/retrieve.py`
The system prompt (`src/agents/prompts.py:ORCHESTRATOR_SYSTEM_PROMPT`)
describes the workflow: pick exactly one pipeline → run it → formulate a
focused retrieval query → call retrieve_context → synthesize a 3-5 sentence
response that cites at least one chunk. The API-side workflow guard enforces
that order in code; the prompt is guidance, not the only control plane.
Language of the final response is mirrored from the user's question.
`POST /agent/run` is the public surface. It accepts `user_input`,
optional `user_question`, and optional MRI `sites_csv`. Default model is
`google/gemini-2.0-flash-exp:free` on OpenRouter (function-calling support
verified). Override via `NEUROBRIDGE_AGENT_MODEL` env var. Returns 503 when
`OPENROUTER_API_KEY` is unset.
Diagnostics: `GET /diag/agent` returns key presence, configured model,
RAG index status (chunk count), and the registered tool names.
## 16. RAG Surface
`src/rag/` is the retrieval layer. Stack: `fastembed`
(`BAAI/bge-small-en-v1.5`, 384-dim, ONNX, no torch dep) for
embeddings + `faiss-cpu` (`IndexFlatIP` after L2-norm = cosine) for
vector search.
Knowledge base lives at `data/knowledge_base/` (gitignored;
user-supplied `.md` / `.txt` / `.pdf`). Build the FAISS index with:
```
python -m src.rag.ingest [<input_dir> [<output_dir>]]
```
Defaults: input=`data/knowledge_base/`, output=`data/processed/faiss_index/`.
The Dockerfile runs this at build time so deployed Spaces start with
a populated index. `docker-entrypoint.sh` also rebuilds the index at
startup when a mounted `data/` volume hides the image-built artifacts.
Empty KB → empty index → `retrieve_context` returns 0 chunks; the agent
surfaces this and answers from the pipeline result alone.
`tests/fixtures/kb_sample/` ships 3 seed markdown files (Lipinski,
ComBat, MNE+ICA) — these double as test fixtures and as the demo
seed if no user-supplied PDFs are added.