docs: Day-4 close-out — AGENTS §7 tracking, README MLOps surface
AGENTS.md
CHANGED
|
@@ -27,18 +27,36 @@ All experiment runs are tracked in **MLflow**. All services ship as **Docker** i
|
|
| 27 |
```
|
| 28 |
.
|
| 29 |
├── AGENTS.md # This file
|
| 30 |
├── requirements.txt
|
| 31 |
├── pytest.ini
|
| 32 |
├── data/
|
| 33 |
│ ├── raw/ # Untouched source data. NEVER train on this directly.
|
| 34 |
│ └── processed/ # Pipeline output as Parquet (preserves dtypes; overwritten each run; see §4).
|
| 35 |
├── src/
|
| 36 |
-
│ ├── api/ # FastAPI
|
| 37 |
│ ├── pipelines/ # One file per modality. Pure functions + a `run_pipeline()` entry.
|
| 38 |
-
│ └──
|
| 39 |
└── tests/
|
| 40 |
├── core/
|
| 41 |
-
├──
|
| 42 |
└── fixtures/ # Tiny synthetic data files used by tests (NOT a Python package — no __init__.py)
|
| 43 |
```
|
| 44 |
|
|
@@ -109,3 +127,18 @@ All `data/processed/` outputs MUST be **Parquet** (`pyarrow` engine, `compressio
|
|
| 109 |
- Read with `pd.read_parquet(path)`; no dtype hints required.
|
| 110 |
|
| 111 |
The raw `data/raw/` inputs may be in any vendor-supplied format (CSV for BBBP, EDF/FIF for EEG, NIfTI for MRI).
|
| 27 |
```
|
| 28 |
.
|
| 29 |
├── AGENTS.md # This file
|
| 30 |
+
├── README.md
|
| 31 |
├── requirements.txt
|
| 32 |
├── pytest.ini
|
| 33 |
+
├── conftest.py # Repo-wide pytest fixtures (autouse: pins MLFLOW_TRACKING_URI to tmp dir for test isolation)
|
| 34 |
+
├── Dockerfile # Production image (FastAPI + pipelines)
|
| 35 |
+
├── docker-compose.yml # api + mlflow services for local stack
|
| 36 |
+
├── .dockerignore
|
| 37 |
+
├── .streamlit/
|
| 38 |
+
│ └── config.toml # Streamlit theme tokens
|
| 39 |
├── data/
|
| 40 |
│ ├── raw/ # Untouched source data. NEVER train on this directly.
|
| 41 |
│ └── processed/ # Pipeline output as Parquet (preserves dtypes; overwritten each run; see §4).
|
| 42 |
├── src/
|
| 43 |
+
│ ├── api/ # FastAPI surface
|
| 44 |
+
│ │ ├── main.py # App factory + /health
|
| 45 |
+
│ │ ├── routes.py # POST /pipeline/{bbb,eeg,mri} dispatch
|
| 46 |
+
│ │ └── schemas.py # Shared Pydantic request/response models
|
| 47 |
+
│ ├── core/ # Cross-cutting utilities
|
| 48 |
+
│ │ ├── logger.py # Structured logger (mandatory in every pipeline)
|
| 49 |
+
│ │ ├── determinism.py # Thread-pin env vars (OMP/OPENBLAS/MKL/pyarrow)
|
| 50 |
+
│ │ ├── storage.py # Parquet read/write helpers (snappy, single-threaded, deterministic)
|
| 51 |
+
│ │ └── tracking.py # MLflow `track_pipeline_run` context manager (see §7)
|
| 52 |
│ ├── pipelines/ # One file per modality. Pure functions + a `run_pipeline()` entry.
|
| 53 |
+
│ └── frontend/
|
| 54 |
+
│ └── app.py # Streamlit dashboard (3 tabs, one per modality)
|
| 55 |
└── tests/
|
| 56 |
├── core/
|
| 57 |
+
├── api/
|
| 58 |
+
├── frontend/
|
| 59 |
+
├── pipelines/ # incl. test_cross_pipeline_smoke.py for integration coverage
|
| 60 |
└── fixtures/ # Tiny synthetic data files used by tests (NOT a Python package — no __init__.py)
|
| 61 |
```
|
| 62 |
|
|
|
|
| 127 |
- Read with `pd.read_parquet(path)`; no dtype hints required.
|
| 128 |
|
| 129 |
The raw `data/raw/` inputs may be in any vendor-supplied format (CSV for BBBP, EDF/FIF for EEG, NIfTI for MRI).
|
| 130 |
+
|
| 131 |
+
## 7. Experiment Tracking
|
| 132 |
+
|
| 133 |
+
Every `run_pipeline()` invocation logs to MLflow via `src.core.tracking.track_pipeline_run`:
|
| 134 |
+
|
| 135 |
+
- **Experiment names** match the pipeline module: `bbb_pipeline`, `eeg_pipeline`, `mri_pipeline`.
|
| 136 |
+
- **Params**: input/output paths and pipeline hyperparameters (e.g. BBB `n_bits` / `radius`, EEG `epoch_duration_s` / `random_state`, MRI `intensity_threshold` / `n_roi_axes`).
|
| 137 |
+
- **Metrics**: row counts (`rows_in`, `rows_out`, `rows_dropped` — or modality equivalent like `subjects_in/out/dropped`) and `duration_sec`.
|
| 138 |
+
- **Artifact**: the produced Parquet at `data/processed/<modality>_features.parquet`.
|
| 139 |
+
|
| 140 |
+
The tracking URI is read from `MLFLOW_TRACKING_URI` (defaults to `./mlruns/` when unset).
|
| 141 |
+
|
| 142 |
+
**Live-demo lifeline**: set `NEUROBRIDGE_DISABLE_MLFLOW=1` to skip tracking entirely — the helper yields `None` and emits no MLflow calls. Use this when the tracking server is unreachable (offline demo, network outage, or CI without an MLflow service). Pipelines complete normally; only the run metadata is lost.
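
As a rough illustration of the behaviour described in this section, the sketch below shows one way a `track_pipeline_run`-style context manager could honour the `NEUROBRIDGE_DISABLE_MLFLOW` switch. It is not the contents of `src/core/tracking.py`: the function signature, the `params` argument, the `tracking_disabled()` helper, and the lazy `mlflow` import are all assumptions made for illustration.

```python
import os
import time
from contextlib import contextmanager


def tracking_disabled() -> bool:
    """Return True when the kill switch NEUROBRIDGE_DISABLE_MLFLOW is set to "1"."""
    return os.environ.get("NEUROBRIDGE_DISABLE_MLFLOW") == "1"


@contextmanager
def track_pipeline_run(experiment, params=None):
    """Hypothetical sketch: wrap one pipeline run in an MLflow run.

    Yields None (and makes no MLflow calls) when tracking is disabled,
    otherwise yields the active MLflow run.
    """
    if tracking_disabled():
        # Offline demo / CI without MLflow: the pipeline body still executes.
        yield None
        return

    # Imported lazily (an assumption) so the disabled path never touches MLflow.
    import mlflow

    # Fall back to a local ./mlruns store when the variable is unset.
    mlflow.set_tracking_uri(os.environ.get("MLFLOW_TRACKING_URI", "./mlruns"))
    mlflow.set_experiment(experiment)

    start = time.perf_counter()
    with mlflow.start_run() as run:
        mlflow.log_params(params or {})
        try:
            yield run
        finally:
            # duration_sec is recorded even if the pipeline body raises.
            mlflow.log_metric("duration_sec", time.perf_counter() - start)
```

A pipeline would then wrap its body in `with track_pipeline_run("bbb_pipeline", {"n_bits": 2048}) as run:` and guard any further logging with `if run is not None`. The parameter value here is only an example.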
|
| 143 |
+
|
| 144 |
+
The repo-wide `conftest.py` autouse fixture pins `MLFLOW_TRACKING_URI` to a tmp directory for the test session, so the production `mlruns/` directory is never written by the test suite. Tests that interact with MLflow (in `tests/core/test_tracking.py` and the per-pipeline `Test<Modality>PipelineMLflow` classes) all share this isolated store.
|
README.md
CHANGED
|
@@ -13,6 +13,7 @@ and Docker shipping.
|
|
| 13 |
| 1 | Tabular (BBB / molecules) | [`bbb_pipeline.py`](src/pipelines/bbb_pipeline.py) | Shipped — 30 tests green |
|
| 14 |
| 2 | Signal (EEG) | [`eeg_pipeline.py`](src/pipelines/eeg_pipeline.py) | Shipped — 67 tests green |
|
| 15 |
| 3 | Image (MRI / fMRI) | [`mri_pipeline.py`](src/pipelines/mri_pipeline.py) | Shipped — 106 tests green |
|
|
|
|
| 16 |
|
| 17 |
## Quick Start
|
| 18 |
|
|
@@ -58,6 +59,19 @@ Result lives at `data/processed/mri_features.parquet` (48 ROI features per subje
|
|
| 58 |
> [Kaggle](https://www.kaggle.com/datasets/priyanagda/bbbp) or
|
| 59 |
> [MoleculeNet](https://moleculenet.org/datasets-1); place as `data/raw/bbbp.csv`.
|
| 60 |
|
| 61 |
## Repository Layout
|
| 62 |
|
| 63 |
```text
|
|
@@ -139,8 +153,7 @@ finishes in under 4 seconds on a 2024 laptop.
|
|
| 139 |
|
| 140 |
- **Day 2 (shipped):** `eeg_pipeline.py` — bandpass + MNE ICA artifact removal + PSD + statistical features → Parquet.
|
| 141 |
- **Day 3 (shipped):** `mri_pipeline.py` — NIfTI volume loading, brain masking, ROI feature extraction, ComBat harmonization (`neuroHarmonize`) for site-level domain shift → Parquet (48 features, 106 tests green).
|
| 142 |
-
- **Day 4
|
| 143 |
-
CI.
|
| 144 |
|
| 145 |
## Where to Look
|
| 146 |
|
|
@@ -152,3 +165,9 @@ finishes in under 4 seconds on a 2024 laptop.
|
|
| 152 |
- **EEG pipeline:** [`src/pipelines/eeg_pipeline.py`](src/pipelines/eeg_pipeline.py) + [`tests/pipelines/test_eeg_pipeline.py`](tests/pipelines/test_eeg_pipeline.py)
|
| 153 |
- **Day-3 plan (full TDD task breakdown):** [`docs/superpowers/plans/2026-05-01-day3-mri-combat-pipeline.md`](docs/superpowers/plans/2026-05-01-day3-mri-combat-pipeline.md)
|
| 154 |
- **MRI pipeline:** [`src/pipelines/mri_pipeline.py`](src/pipelines/mri_pipeline.py) + [`tests/pipelines/test_mri_pipeline.py`](tests/pipelines/test_mri_pipeline.py)
|
| 13 |
| 1 | Tabular (BBB / molecules) | [`bbb_pipeline.py`](src/pipelines/bbb_pipeline.py) | Shipped — 30 tests green |
|
| 14 |
| 2 | Signal (EEG) | [`eeg_pipeline.py`](src/pipelines/eeg_pipeline.py) | Shipped — 67 tests green |
|
| 15 |
| 3 | Image (MRI / fMRI) | [`mri_pipeline.py`](src/pipelines/mri_pipeline.py) | Shipped — 106 tests green |
|
| 16 |
+
| 4 | API + MLOps + Frontend | FastAPI + MLflow + Streamlit + Docker | Shipped — 142 tests green |
|
| 17 |
|
| 18 |
## Quick Start
|
| 19 |
|
|
|
|
| 59 |
> [Kaggle](https://www.kaggle.com/datasets/priyanagda/bbbp) or
|
| 60 |
> [MoleculeNet](https://moleculenet.org/datasets-1); place as `data/raw/bbbp.csv`.
|
| 61 |
|
| 62 |
+
### Run the full stack with Docker
|
| 63 |
+
|
| 64 |
+
```bash
|
| 65 |
+
docker compose up
|
| 66 |
+
```
|
| 67 |
+
|
| 68 |
+
Then browse to:
|
| 69 |
+
- **FastAPI Swagger** — <http://localhost:8000/docs>
|
| 70 |
+
- **Streamlit dashboard** — `streamlit run src/frontend/app.py` (port 8501; not in compose by default)
|
| 71 |
+
- **MLflow UI** — <http://localhost:5000>
|
| 72 |
+
|
| 73 |
+
Live-demo robustness: if the MLflow service is unreachable, set `NEUROBRIDGE_DISABLE_MLFLOW=1` to make the pipelines run without tracking.
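
Once the stack is up, the `POST /pipeline/{bbb,eeg,mri}` endpoints can be exercised from a script as well as from Swagger. The sketch below builds such a request with the standard library only. The payload field `input_path` is a placeholder: the real request models live in `src/api/schemas.py` and are not shown here, so check them before relying on this shape.

```python
import json
from urllib.request import Request

# Modalities exposed by the API according to the route description.
MODALITIES = ("bbb", "eeg", "mri")


def build_pipeline_request(modality, payload, base_url="http://localhost:8000"):
    """Build (but do not send) a JSON POST request for one pipeline endpoint."""
    if modality not in MODALITIES:
        raise ValueError(f"unknown modality {modality!r}; expected one of {MODALITIES}")
    body = json.dumps(payload).encode("utf-8")
    return Request(
        f"{base_url}/pipeline/{modality}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical payload; field names are assumptions, not the documented schema.
request = build_pipeline_request("eeg", {"input_path": "data/raw/sample.edf"})
```

Passing `request` to `urllib.request.urlopen` sends it to the running API. Keeping construction separate from sending makes the URL and body easy to inspect without a live server.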
|
| 74 |
+
|
| 75 |
## Repository Layout
|
| 76 |
|
| 77 |
```text
|
|
|
|
| 153 |
|
| 154 |
- **Day 2 (shipped):** `eeg_pipeline.py` — bandpass + MNE ICA artifact removal + PSD + statistical features → Parquet.
|
| 155 |
- **Day 3 (shipped):** `mri_pipeline.py` — NIfTI volume loading, brain masking, ROI feature extraction, ComBat harmonization (`neuroHarmonize`) for site-level domain shift → Parquet (48 features, 106 tests green).
|
| 156 |
+
- **Day 4 (shipped):** FastAPI surface in `src/api/` (POST `/pipeline/{bbb,eeg,mri}` + `/health`), MLflow experiment tracking via `src.core.tracking` (see AGENTS.md §7), Streamlit dashboard at `src/frontend/app.py`, and Docker / `docker-compose.yml` for the api + MLflow stack — 142 tests green.
|
|
|
|
| 157 |
|
| 158 |
## Where to Look
|
| 159 |
|
|
|
|
| 165 |
- **EEG pipeline:** [`src/pipelines/eeg_pipeline.py`](src/pipelines/eeg_pipeline.py) + [`tests/pipelines/test_eeg_pipeline.py`](tests/pipelines/test_eeg_pipeline.py)
|
| 166 |
- **Day-3 plan (full TDD task breakdown):** [`docs/superpowers/plans/2026-05-01-day3-mri-combat-pipeline.md`](docs/superpowers/plans/2026-05-01-day3-mri-combat-pipeline.md)
|
| 167 |
- **MRI pipeline:** [`src/pipelines/mri_pipeline.py`](src/pipelines/mri_pipeline.py) + [`tests/pipelines/test_mri_pipeline.py`](tests/pipelines/test_mri_pipeline.py)
|
| 168 |
+
- **Day-4 plan (full TDD task breakdown):** [`docs/superpowers/plans/2026-05-02-day4-api-mlops-frontend.md`](docs/superpowers/plans/2026-05-02-day4-api-mlops-frontend.md)
|
| 169 |
+
- **Shared core helpers:** [`src/core/determinism.py`](src/core/determinism.py), [`src/core/storage.py`](src/core/storage.py), [`src/core/tracking.py`](src/core/tracking.py)
|
| 170 |
+
- **FastAPI surface:** [`src/api/main.py`](src/api/main.py), [`src/api/routes.py`](src/api/routes.py), [`src/api/schemas.py`](src/api/schemas.py)
|
| 171 |
+
- **Streamlit dashboard:** [`src/frontend/app.py`](src/frontend/app.py)
|
| 172 |
+
- **Container stack:** [`Dockerfile`](Dockerfile), [`docker-compose.yml`](docker-compose.yml)
|
| 173 |
+
- **Day-4 tests:** [`tests/api/`](tests/api/), [`tests/frontend/`](tests/frontend/), [`tests/pipelines/test_cross_pipeline_smoke.py`](tests/pipelines/test_cross_pipeline_smoke.py)
|