mekosotto committed · Commit d3d1ac7 · 1 Parent(s): ef4cf4a

docs: Day-4 close-out — AGENTS §7 tracking, README MLOps surface

Files changed (2)
  1. AGENTS.md +36 -3
  2. README.md +21 -2
AGENTS.md CHANGED
@@ -27,18 +27,36 @@ All experiment runs are tracked in **MLflow**. All services ship as **Docker** i
  ```
  .
  ├── AGENTS.md # This file
+ ├── README.md
  ├── requirements.txt
  ├── pytest.ini
+ ├── conftest.py # Repo-wide pytest fixtures (autouse: pins MLFLOW_TRACKING_URI to tmp dir for test isolation)
+ ├── Dockerfile # Production image (FastAPI + pipelines)
+ ├── docker-compose.yml # api + mlflow services for local stack
+ ├── .dockerignore
+ ├── .streamlit/
+ │   └── config.toml # Streamlit theme tokens
  ├── data/
  │   ├── raw/ # Untouched source data. NEVER train on this directly.
  │   └── processed/ # Pipeline output as Parquet (preserves dtypes; overwritten each run; see §4).
  ├── src/
- │   ├── api/ # FastAPI routers, request/response schemas
+ │   ├── api/ # FastAPI surface
+ │   │   ├── main.py # App factory + /health
+ │   │   ├── routes.py # POST /pipeline/{bbb,eeg,mri} dispatch
+ │   │   └── schemas.py # Shared Pydantic request/response models
+ │   ├── core/ # Cross-cutting utilities
+ │   │   ├── logger.py # Structured logger (mandatory in every pipeline)
+ │   │   ├── determinism.py # Thread-pin env vars (OMP/OPENBLAS/MKL/pyarrow)
+ │   │   ├── storage.py # Parquet read/write helpers (snappy, single-threaded, deterministic)
+ │   │   └── tracking.py # MLflow `track_pipeline_run` context manager (see §7)
  │   ├── pipelines/ # One file per modality. Pure functions + a `run_pipeline()` entry.
- │   └── core/ # Cross-cutting utilities: logging, config (MLflow helpers planned)
+ │   └── frontend/
+ │       └── app.py # Streamlit dashboard (3 tabs, one per modality)
  └── tests/
      ├── core/
-     ├── pipelines/
+     ├── api/
+     ├── frontend/
+     ├── pipelines/ # incl. test_cross_pipeline_smoke.py for integration coverage
      └── fixtures/ # Tiny synthetic data files used by tests (NOT a Python package — no __init__.py)
  ```

@@ -109,3 +127,18 @@ All `data/processed/` outputs MUST be **Parquet** (`pyarrow` engine, `compressio
  - Read with `pd.read_parquet(path)`; no dtype hints required.

  The raw `data/raw/` inputs may be in any vendor-supplied format (CSV for BBBP, EDF/FIF for EEG, NIfTI for MRI).
+
+ ## 7. Experiment Tracking
+
+ Every `run_pipeline()` invocation logs to MLflow via `src.core.tracking.track_pipeline_run`:
+
+ - **Experiment names** match the pipeline module: `bbb_pipeline`, `eeg_pipeline`, `mri_pipeline`.
+ - **Params**: input/output paths and pipeline hyperparameters (e.g. BBB `n_bits` / `radius`, EEG `epoch_duration_s` / `random_state`, MRI `intensity_threshold` / `n_roi_axes`).
+ - **Metrics**: row counts (`rows_in`, `rows_out`, `rows_dropped` — or modality equivalent like `subjects_in/out/dropped`) and `duration_sec`.
+ - **Artifact**: the produced Parquet at `data/processed/<modality>_features.parquet`.
+
+ The tracking URI is read from `MLFLOW_TRACKING_URI` (defaults to `./mlruns/` when unset).
+
+ **Live-demo lifeline**: set `NEUROBRIDGE_DISABLE_MLFLOW=1` to skip tracking entirely — the helper yields `None` and emits no MLflow calls. Use this when the tracking server is unreachable (offline demo, network outage, or CI without an MLflow service). Pipelines complete normally; only the run metadata is lost.
+
+ The repo-wide `conftest.py` autouse fixture pins `MLFLOW_TRACKING_URI` to a tmp directory for the test session, so the production `mlruns/` directory is never written by the test suite. Tests that interact with MLflow (in `tests/core/test_tracking.py` and the per-pipeline `Test<Modality>PipelineMLflow` classes) all share this isolated store.
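Reviewer note on §7: the kill-switch behaviour described above can be pictured with a small context manager. The actual `src/core/tracking.py` is not part of this diff, so the sketch below is an assumption about its shape, not the shipped code. The signature of `track_pipeline_run`, the helpers `tracking_disabled` and `count_rows`, and the `n_bits` value of 2048 are all hypothetical; only the environment variable name, the experiment name, and the metric names come from the text.

```python
import os
import time
from contextlib import contextmanager
from typing import Any, Iterator, Mapping, Optional

DISABLE_ENV = "NEUROBRIDGE_DISABLE_MLFLOW"  # variable name taken from AGENTS.md §7


def tracking_disabled() -> bool:
    """Return True when the kill switch is set to "1"."""
    return os.environ.get(DISABLE_ENV) == "1"


@contextmanager
def track_pipeline_run(
    experiment: str,
    params: Optional[Mapping[str, Any]] = None,
) -> Iterator[Optional[Any]]:
    """Hypothetical sketch: wrap one pipeline run in an MLflow run."""
    if tracking_disabled():
        # Kill switch: yield None and never import or call MLflow.
        yield None
        return

    import mlflow  # lazy import, so the disabled path needs no MLflow install

    # MLflow resolves MLFLOW_TRACKING_URI from the environment on its own.
    mlflow.set_experiment(experiment)
    started = time.perf_counter()
    with mlflow.start_run() as run:
        if params:
            mlflow.log_params(dict(params))
        try:
            yield run
        finally:
            # Logged even when the pipeline body raises.
            mlflow.log_metric("duration_sec", time.perf_counter() - started)


def count_rows(rows: list) -> dict:
    """Toy stand-in for a pipeline body: drop None rows, report the counts."""
    with track_pipeline_run("bbb_pipeline", {"n_bits": 2048}) as run:
        kept = [r for r in rows if r is not None]
        metrics = {
            "rows_in": len(rows),
            "rows_out": len(kept),
            "rows_dropped": len(rows) - len(kept),
        }
        if run is not None:
            import mlflow

            mlflow.log_metrics(metrics)
        return metrics


if __name__ == "__main__":
    os.environ[DISABLE_ENV] = "1"  # offline demo: exercise the kill-switch path
    print(count_rows([1, None, 3]))
```

With the switch set, the body runs unchanged and `run` is `None`, which is why callers in this sketch guard every MLflow call on `run is not None`. Importing MLflow lazily is one possible design choice: it keeps the disabled path usable on machines where MLflow is not installed.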
README.md CHANGED
@@ -13,6 +13,7 @@ and Docker shipping.
  | 1 | Tabular (BBB / molecules) | [`bbb_pipeline.py`](src/pipelines/bbb_pipeline.py) | Shipped — 30 tests green |
  | 2 | Signal (EEG) | [`eeg_pipeline.py`](src/pipelines/eeg_pipeline.py) | Shipped — 67 tests green |
  | 3 | Image (MRI / fMRI) | [`mri_pipeline.py`](src/pipelines/mri_pipeline.py) | Shipped — 106 tests green |
+ | 4 | API + MLOps + Frontend | FastAPI + MLflow + Streamlit + Docker | Shipped — 142 tests green |

  ## Quick Start

@@ -58,6 +59,19 @@ Result lives at `data/processed/mri_features.parquet` (48 ROI features per subje
  > [Kaggle](https://www.kaggle.com/datasets/priyanagda/bbbp) or
  > [MoleculeNet](https://moleculenet.org/datasets-1); place as `data/raw/bbbp.csv`.

+ ### Run the full stack with Docker
+
+ ```bash
+ docker compose up
+ ```
+
+ Then browse to:
+ - **FastAPI Swagger** — <http://localhost:8000/docs>
+ - **Streamlit dashboard** — `streamlit run src/frontend/app.py` (port 8501; not in compose by default)
+ - **MLflow UI** — <http://localhost:5000>
+
+ Live-demo robustness: if the MLflow service is unreachable, set `NEUROBRIDGE_DISABLE_MLFLOW=1` to make the pipelines run without tracking.
+
  ## Repository Layout

  ```text
@@ -139,8 +153,7 @@ finishes in under 4 seconds on a 2024 laptop.

  - **Day 2 (shipped):** `eeg_pipeline.py` — bandpass + MNE ICA artifact removal + PSD + statistical features → Parquet.
  - **Day 3 (shipped):** `mri_pipeline.py` — NIfTI volume loading, brain masking, ROI feature extraction, ComBat harmonization (`neuroHarmonize`) for site-level domain shift → Parquet (48 features, 106 tests green).
- - **Day 4+:** FastAPI surface in `src/api/`, MLflow experiment tracking, Docker images,
- CI.
+ - **Day 4 (shipped):** FastAPI surface in `src/api/` (POST `/pipeline/{bbb,eeg,mri}` + `/health`), MLflow experiment tracking via `src.core.tracking` (see AGENTS.md §7), Streamlit dashboard at `src/frontend/app.py`, and Docker / `docker-compose.yml` for the api + MLflow stack — 142 tests green.

  ## Where to Look

@@ -152,3 +165,9 @@ finishes in under 4 seconds on a 2024 laptop.
  - **EEG pipeline:** [`src/pipelines/eeg_pipeline.py`](src/pipelines/eeg_pipeline.py) + [`tests/pipelines/test_eeg_pipeline.py`](tests/pipelines/test_eeg_pipeline.py)
  - **Day-3 plan (full TDD task breakdown):** [`docs/superpowers/plans/2026-05-01-day3-mri-combat-pipeline.md`](docs/superpowers/plans/2026-05-01-day3-mri-combat-pipeline.md)
  - **MRI pipeline:** [`src/pipelines/mri_pipeline.py`](src/pipelines/mri_pipeline.py) + [`tests/pipelines/test_mri_pipeline.py`](tests/pipelines/test_mri_pipeline.py)
+ - **Day-4 plan (full TDD task breakdown):** [`docs/superpowers/plans/2026-05-02-day4-api-mlops-frontend.md`](docs/superpowers/plans/2026-05-02-day4-api-mlops-frontend.md)
+ - **Shared core helpers:** [`src/core/determinism.py`](src/core/determinism.py), [`src/core/storage.py`](src/core/storage.py), [`src/core/tracking.py`](src/core/tracking.py)
+ - **FastAPI surface:** [`src/api/main.py`](src/api/main.py), [`src/api/routes.py`](src/api/routes.py), [`src/api/schemas.py`](src/api/schemas.py)
+ - **Streamlit dashboard:** [`src/frontend/app.py`](src/frontend/app.py)
+ - **Container stack:** [`Dockerfile`](Dockerfile), [`docker-compose.yml`](docker-compose.yml)
+ - **Day-4 tests:** [`tests/api/`](tests/api/), [`tests/frontend/`](tests/frontend/), [`tests/pipelines/test_cross_pipeline_smoke.py`](tests/pipelines/test_cross_pipeline_smoke.py)