mekosotto committed
Commit 582bce2 · 1 Parent(s): 3acc658

docs(plan): orchestrator agent + RAG feedback implementation plan

docs/superpowers/plans/2026-05-02-orchestrator-agent-rag.md ADDED
# Orchestrator Agent + RAG Feedback Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Wrap the three modality pipelines as function-calling tools, add an orchestrator agent that picks the right pipeline for each input, and feed the pipeline output through a RAG retrieval tool so the final response is grounded in user-curated reference documents.

**Architecture:** Single orchestrator agent (OpenAI-SDK function-calling loop, no framework) holds 4 tools — `run_bbb_pipeline`, `run_eeg_pipeline`, `run_mri_pipeline`, `retrieve_context`. Pipelines stay deterministic (already 184 tests green); only the wrapper layer is new. RAG uses `fastembed` for embeddings (lightweight ONNX, no torch) + `faiss-cpu` for vector search. Knowledge base is markdown / PDF files in `data/knowledge_base/` ingested at Docker build time. Streamlit gets a new "🤖 Agent" tab that surfaces the agent's tool-call trace as evidence.

**Tech Stack:** `openai==1.51.0` (existing — function calling), `fastembed==0.4.2` (embeddings, ~50MB), `faiss-cpu==1.8.0` (vector store), `pypdf==5.0.1` (PDF loader). Reuses the project's existing `get_logger`, Pydantic patterns, and `src/llm/explainer.py` model fallback discipline.
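The "wrapper layer" the architecture describes can be sketched up front. The following is a minimal, hypothetical shape for the tool registry and dispatch step the orchestrator will run — names like `Tool`, `REGISTRY`, and `dispatch`, and the stand-in handler, are illustrative only; the real versions land in `src/agents/tools.py` and `src/agents/orchestrator.py`:

```python
# Hypothetical sketch of the tool-registry/dispatch shape; the real version
# wires handlers to the deterministic pipelines and feeds results back to
# the model as `tool` messages in the function-calling loop.
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Tool:
    name: str
    description: str
    handler: Callable[..., dict[str, Any]]


REGISTRY: dict[str, Tool] = {}


def register(tool: Tool) -> None:
    REGISTRY[tool.name] = tool


def dispatch(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Resolve a model-requested tool call to a deterministic handler."""
    if name not in REGISTRY:
        return {"error": f"unknown tool: {name}"}
    return REGISTRY[name].handler(**arguments)


# Stand-in for run_bbb_pipeline (illustrative handler, not the real pipeline)
register(Tool(
    name="run_bbb_pipeline",
    description="Predict BBB permeability for a SMILES string.",
    handler=lambda smiles: {"smiles": smiles, "permeable": True},
))
```

With this shape, `dispatch("run_bbb_pipeline", {"smiles": "CCO"})` returns the handler's dict, which the loop would serialize back to the model as the tool-call result.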
10
+
11
+ ---
12
+
13
+ ## File Structure
14
+
15
+ **New packages:**
16
+
17
+ ```
18
+ src/agents/
19
+ ├── __init__.py
20
+ ├── schemas.py # Pydantic I/O for each tool + AgentResult
21
+ ├── tools.py # Tool dataclass + registry + 4 tool implementations
22
+ ├── orchestrator.py # Orchestrator class (LLM loop + dispatch + trace)
23
+ └── prompts.py # ORCHESTRATOR_SYSTEM_PROMPT + helpers
24
+
25
+ src/rag/
26
+ ├── __init__.py
27
+ ├── chunker.py # Recursive character splitter
28
+ ├── embed.py # Embedder (fastembed wrapper)
29
+ ├── store.py # FAISSStore (load/save/add/search)
30
+ ├── retrieve.py # RAGRetriever (embed query → top-k chunks)
31
+ └── ingest.py # CLI: walk data/knowledge_base/ → embed → persist
32
+
33
+ data/knowledge_base/ # NEW (gitignored, user drops .pdf / .md here)
34
+ ├── README.md # explains what to drop, format expectations
35
+ └── .gitkeep
36
+
37
+ data/processed/faiss_index/ # NEW (built at runtime / Dockerfile RUN)
38
+ ├── index.bin
39
+ └── chunks.json
40
+
41
+ tests/agents/
42
+ ├── test_schemas.py
43
+ ├── test_tools.py
44
+ ├── test_orchestrator.py
45
+ └── test_orchestrator_live.py # network-gated, slow-marked
46
+
47
+ tests/rag/
48
+ ├── test_chunker.py
49
+ ├── test_embed.py
50
+ ├── test_store.py
51
+ ├── test_retrieve.py
52
+ └── test_ingest.py
53
+
54
+ tests/fixtures/kb_sample/ # NEW
55
+ ├── lipinski_rule_of_five.md
56
+ ├── combat_harmonization_primer.md
57
+ └── mne_ica_basics.md
58
+ ```
59
+
60
+ **Modified:**
61
+
62
+ ```
63
+ requirements.txt # +fastembed, +faiss-cpu, +pypdf
64
+ .gitignore # +data/knowledge_base/*.pdf, +data/processed/faiss_index/
65
+ src/api/routes.py # +agent_router, POST /agent/run
66
+ src/api/schemas.py # +AgentRunRequest, +AgentRunResponse, +ToolTraceItem
67
+ src/api/main.py # mount agent_router
68
+ src/frontend/app.py # +"🤖 Agent" tab
69
+ Dockerfile # RUN python -m src.rag.ingest at build
70
+ Dockerfile.hf # same
71
+ AGENTS.md # +§15 Agent surface + §16 RAG surface
72
+ ```
73
+
74
+ ---
75
+
## Task 1: Add RAG dependencies

**Files:**
- Modify: `requirements.txt`
- Modify: `.gitignore`

- [ ] **Step 1: Add deps to requirements.txt**

Open `requirements.txt`, find the section after `# --- Tooling / tests ---` (around `httpx==0.27.2`) and insert before `# --- Frontend (B2B dashboard) ---`:

```
# --- RAG (knowledge retrieval for agent feedback loop) ---
fastembed==0.4.2    # ONNX-based embeddings, no torch dep
faiss-cpu==1.8.0    # vector store
pypdf==5.0.1        # PDF text extraction
```

- [ ] **Step 2: Update .gitignore**

Append to `.gitignore`:

```
# RAG knowledge base (user-supplied PDFs/MD; not source-controlled)
data/knowledge_base/*.pdf
data/knowledge_base/*.PDF

# RAG built artifacts
data/processed/faiss_index/
```

- [ ] **Step 3: Install deps + verify**

Run: `pip install fastembed==0.4.2 faiss-cpu==1.8.0 pypdf==5.0.1`

Expected: install succeeds. Then verify import:

```bash
python -c "from fastembed import TextEmbedding; import faiss; import pypdf; print('ok')"
```

Expected: `ok`

- [ ] **Step 4: Commit**

```bash
git add requirements.txt .gitignore
git commit -m "feat(rag): add fastembed/faiss-cpu/pypdf for retrieval layer"
```

---

## Task 2: RAG document chunker

**Files:**
- Create: `src/rag/__init__.py`
- Create: `src/rag/chunker.py`
- Create: `tests/rag/__init__.py`
- Create: `tests/rag/test_chunker.py`

- [ ] **Step 1: Create empty package markers**

```bash
mkdir -p src/rag tests/rag
touch src/rag/__init__.py tests/rag/__init__.py
```

- [ ] **Step 2: Write the failing test**

Create `tests/rag/test_chunker.py`:

```python
"""Tests for src.rag.chunker — paragraph-aware character splitter."""
from __future__ import annotations

import pytest

from src.rag.chunker import chunk_text


class TestChunkText:
    def test_short_text_returns_single_chunk(self) -> None:
        out = chunk_text("hello world", max_chars=100, overlap=10)
        assert out == ["hello world"]

    def test_empty_text_returns_empty_list(self) -> None:
        assert chunk_text("", max_chars=100, overlap=10) == []
        assert chunk_text("  \n\n  ", max_chars=100, overlap=10) == []

    def test_long_text_splits_into_multiple_chunks(self) -> None:
        text = "a" * 250
        out = chunk_text(text, max_chars=100, overlap=10)
        assert len(out) >= 3
        # every chunk respects max_chars
        for c in out:
            assert len(c) <= 100

    def test_overlap_between_chunks(self) -> None:
        text = "abcdefghij" * 30  # 300 chars, no natural break
        out = chunk_text(text, max_chars=100, overlap=20)
        # consecutive chunks share at least some characters
        for i in range(len(out) - 1):
            assert out[i][-10:] in out[i + 1] or out[i + 1][:10] in out[i]

    def test_paragraph_boundary_preferred(self) -> None:
        # First paragraph fits, second doesn't — split at \n\n
        para_a = "First paragraph content."
        para_b = "Second paragraph content " * 10
        text = f"{para_a}\n\n{para_b}"
        out = chunk_text(text, max_chars=100, overlap=10)
        # first chunk should end at the paragraph boundary, not mid-word
        assert para_a in out[0]
```

- [ ] **Step 3: Run test to verify it fails**

Run: `pytest tests/rag/test_chunker.py -v`

Expected: FAIL with `ModuleNotFoundError: No module named 'src.rag.chunker'`

- [ ] **Step 4: Implement the chunker**

Create `src/rag/chunker.py`:

```python
"""Paragraph-aware recursive character splitter for RAG ingestion.

Public entry: `chunk_text(text, max_chars, overlap)`. Splits on the first
of [paragraph break, sentence end, newline, space] that fits inside the
window. Empty / whitespace-only inputs return [].
"""
from __future__ import annotations


_SEPARATORS: tuple[str, ...] = ("\n\n", ". ", "\n", " ")


def chunk_text(text: str, max_chars: int = 600, overlap: int = 80) -> list[str]:
    """Split `text` into chunks of at most `max_chars`, with `overlap` carry-over."""
    text = text.strip()
    if not text:
        return []
    if len(text) <= max_chars:
        return [text]

    chunks: list[str] = []
    start = 0
    n = len(text)
    while start < n:
        end = min(start + max_chars, n)
        if end < n:
            # try to land on a clean boundary inside [start, end]
            for sep in _SEPARATORS:
                last = text.rfind(sep, start, end)
                if last > start:
                    end = last + len(sep)
                    break
        chunk = text[start:end].strip()
        if chunk:
            chunks.append(chunk)
        if end >= n:
            break
        start = max(start + 1, end - overlap)
    return chunks
```
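The boundary choice above hinges on `str.rfind(sep, start, end)` finding the rightmost separator inside the window. A standalone restatement of just that rule (not imported from `src.rag.chunker`; `split_point` is an illustrative name):

```python
# Restatement of the boundary rule inside chunk_text: scan separators in
# priority order, take the last occurrence inside the window, else hard-cut.
SEPARATORS = ("\n\n", ". ", "\n", " ")


def split_point(text: str, start: int, end: int) -> int:
    """End index for the chunk starting at `start`, preferring clean breaks."""
    for sep in SEPARATORS:
        last = text.rfind(sep, start, end)  # rightmost match within [start, end)
        if last > start:
            return last + len(sep)  # the separator stays with the left chunk
    return end  # no boundary in the window: hard cut mid-token


text = "First paragraph.\n\nSecond paragraph continues for a while."
assert split_point(text, 0, 30) == 18      # cuts right after the "\n\n" at index 16
assert split_point("x" * 50, 0, 30) == 30  # no separator anywhere: hard cut
```

Because the paragraph break outranks `". "` in the cascade, the first chunk ends cleanly at `"First paragraph.\n\n"` rather than mid-sentence.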

- [ ] **Step 5: Run test to verify it passes**

Run: `pytest tests/rag/test_chunker.py -v`

Expected: 5 passed

- [ ] **Step 6: Commit**

```bash
git add src/rag/__init__.py src/rag/chunker.py tests/rag/__init__.py tests/rag/test_chunker.py
git commit -m "feat(rag): paragraph-aware chunker (chunk_text)"
```

---

## Task 3: RAG embedder

**Files:**
- Create: `src/rag/embed.py`
- Create: `tests/rag/test_embed.py`

- [ ] **Step 1: Write the failing test**

Create `tests/rag/test_embed.py`:

```python
"""Tests for src.rag.embed — fastembed wrapper."""
from __future__ import annotations

import numpy as np
import pytest

from src.rag.embed import Embedder, EMBEDDING_DIM


class TestEmbedder:
    @pytest.fixture(scope="class")
    def embedder(self) -> Embedder:
        return Embedder()

    def test_dim_constant_matches_model(self, embedder: Embedder) -> None:
        out = embedder.encode(["hello"])
        assert out.shape == (1, EMBEDDING_DIM)

    def test_batch_encoding(self, embedder: Embedder) -> None:
        out = embedder.encode(["hello", "world", "blood-brain barrier"])
        assert out.shape == (3, EMBEDDING_DIM)
        assert out.dtype == np.float32

    def test_empty_list_returns_empty_array(self, embedder: Embedder) -> None:
        out = embedder.encode([])
        assert out.shape == (0, EMBEDDING_DIM)

    def test_similar_strings_have_higher_similarity_than_dissimilar(
        self, embedder: Embedder
    ) -> None:
        vecs = embedder.encode([
            "blood-brain barrier permeability",
            "BBB drug penetration",
            "MRI multi-site harmonization",
        ])
        # cosine similarity (vectors should be normalized for stable comparison)
        from numpy.linalg import norm

        def cos(a, b):
            return float(np.dot(a, b) / (norm(a) * norm(b)))

        sim_ab = cos(vecs[0], vecs[1])
        sim_ac = cos(vecs[0], vecs[2])
        assert sim_ab > sim_ac, f"Expected BBB-related strings closer; got {sim_ab=} vs {sim_ac=}"
```

- [ ] **Step 2: Run test to verify it fails**

Run: `pytest tests/rag/test_embed.py -v`

Expected: FAIL with `ModuleNotFoundError: No module named 'src.rag.embed'`

- [ ] **Step 3: Implement the embedder**

Create `src/rag/embed.py`:

```python
"""Fastembed wrapper — ONNX-based, CPU-only, no torch dep.

Public entry: `Embedder().encode(texts) -> np.ndarray[N, D]`. Model is
loaded lazily on first call. Output is float32 to match FAISS's expected
input dtype.
"""
from __future__ import annotations

import numpy as np

from src.core.logger import get_logger

logger = get_logger(__name__)


# bge-small-en-v1.5: 384-dim, ~33MB ONNX, MTEB top-tier for size class.
_MODEL_NAME = "BAAI/bge-small-en-v1.5"
EMBEDDING_DIM = 384


class Embedder:
    """Lazy-loaded fastembed wrapper. One instance per process is enough."""

    def __init__(self, model_name: str = _MODEL_NAME) -> None:
        self._model_name = model_name
        self._model = None  # lazy-loaded on first encode()

    def _ensure_model(self) -> None:
        if self._model is None:
            from fastembed import TextEmbedding

            logger.info("Loading fastembed model %s (one-time)", self._model_name)
            self._model = TextEmbedding(model_name=self._model_name)

    def encode(self, texts: list[str]) -> np.ndarray:
        if not texts:
            return np.zeros((0, EMBEDDING_DIM), dtype=np.float32)
        self._ensure_model()
        embeddings = list(self._model.embed(texts))
        return np.array(embeddings, dtype=np.float32)
```

- [ ] **Step 4: Run test to verify it passes**

Run: `pytest tests/rag/test_embed.py -v`

Expected: 4 passed (first run downloads ~33MB model, ~30s; subsequent runs cached).

- [ ] **Step 5: Commit**

```bash
git add src/rag/embed.py tests/rag/test_embed.py
git commit -m "feat(rag): fastembed wrapper (Embedder, bge-small-en-v1.5, 384-dim)"
```

---

## Task 4: FAISS store

**Files:**
- Create: `src/rag/store.py`
- Create: `tests/rag/test_store.py`

- [ ] **Step 1: Write the failing test**

Create `tests/rag/test_store.py`:

```python
"""Tests for src.rag.store — FAISS vector store with metadata."""
from __future__ import annotations

from pathlib import Path

import numpy as np
import pytest

from src.rag.store import FAISSStore


def _rand_vecs(n: int, d: int = 4, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n, d), dtype=np.float32)


class TestFAISSStore:
    def test_add_then_search(self) -> None:
        store = FAISSStore(dim=4)
        vecs = _rand_vecs(3)
        chunks = [{"text": f"chunk-{i}", "source": "test.md"} for i in range(3)]
        store.add(vecs, chunks)
        results = store.search(vecs[0], k=2)
        assert len(results) == 2
        # the closest hit is the chunk we used as the query (cosine ~1.0)
        top_chunk, top_score = results[0]
        assert top_chunk["text"] == "chunk-0"
        assert top_score > 0.99

    def test_add_size_mismatch_raises(self) -> None:
        store = FAISSStore(dim=4)
        with pytest.raises(ValueError, match="size mismatch"):
            store.add(_rand_vecs(3), [{"text": "only-one"}])

    def test_search_k_larger_than_corpus(self) -> None:
        store = FAISSStore(dim=4)
        store.add(_rand_vecs(2), [{"text": f"c{i}"} for i in range(2)])
        results = store.search(_rand_vecs(1)[0], k=10)
        assert len(results) == 2

    def test_save_load_roundtrip(self, tmp_path: Path) -> None:
        store = FAISSStore(dim=4)
        vecs = _rand_vecs(3)
        chunks = [{"text": f"chunk-{i}", "source": "test.md"} for i in range(3)]
        store.add(vecs, chunks)
        store.save(tmp_path / "idx")

        restored = FAISSStore.load(tmp_path / "idx", dim=4)
        results = restored.search(vecs[0], k=1)
        assert results[0][0]["text"] == "chunk-0"

    def test_search_on_empty_store_returns_empty(self) -> None:
        store = FAISSStore(dim=4)
        assert store.search(_rand_vecs(1)[0], k=5) == []
```

- [ ] **Step 2: Run test to verify it fails**

Run: `pytest tests/rag/test_store.py -v`

Expected: FAIL with `ModuleNotFoundError: No module named 'src.rag.store'`

- [ ] **Step 3: Implement the store**

Create `src/rag/store.py`:

```python
"""FAISS vector store with parallel chunk metadata.

Public entry: `FAISSStore(dim)`. Vectors are L2-normalized on add and
search so inner-product == cosine similarity. Chunks are arbitrary dicts;
`text` and `source` keys are recommended but not enforced.
"""
from __future__ import annotations

import json
from pathlib import Path
from typing import Any

import faiss
import numpy as np


class FAISSStore:
    """Inner-product (cosine after L2-norm) FAISS store with chunk metadata."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self._index: faiss.Index = faiss.IndexFlatIP(dim)
        self._chunks: list[dict[str, Any]] = []

    def __len__(self) -> int:
        return len(self._chunks)

    def add(self, vectors: np.ndarray, chunks: list[dict[str, Any]]) -> None:
        if vectors.shape[0] != len(chunks):
            raise ValueError(
                f"size mismatch: {vectors.shape[0]} vectors vs {len(chunks)} chunks"
            )
        if vectors.shape[0] == 0:
            return
        # copy before normalizing: faiss.normalize_L2 mutates its argument in place
        v = np.array(vectors, dtype=np.float32, copy=True)
        faiss.normalize_L2(v)
        self._index.add(v)
        self._chunks.extend(chunks)

    def search(self, query: np.ndarray, k: int = 5) -> list[tuple[dict[str, Any], float]]:
        if len(self._chunks) == 0:
            return []
        # copy for the same reason as add(): don't normalize the caller's array
        q = np.array(query, dtype=np.float32, copy=True)
        if q.ndim == 1:
            q = q[np.newaxis, :]
        faiss.normalize_L2(q)
        k = min(k, len(self._chunks))
        scores, idx = self._index.search(q, k)
        out: list[tuple[dict[str, Any], float]] = []
        for i, s in zip(idx[0], scores[0]):
            if i == -1:
                continue
            out.append((self._chunks[int(i)], float(s)))
        return out

    def save(self, dir_path: Path) -> None:
        dir_path.mkdir(parents=True, exist_ok=True)
        faiss.write_index(self._index, str(dir_path / "index.bin"))
        (dir_path / "chunks.json").write_text(json.dumps(self._chunks, indent=2))

    @classmethod
    def load(cls, dir_path: Path, dim: int) -> "FAISSStore":
        store = cls(dim=dim)
        store._index = faiss.read_index(str(dir_path / "index.bin"))
        store._chunks = json.loads((dir_path / "chunks.json").read_text())
        return store
```

- [ ] **Step 4: Run test to verify it passes**

Run: `pytest tests/rag/test_store.py -v`

Expected: 5 passed
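The docstring's claim that inner product after L2-normalization equals cosine similarity is easy to verify with a dependency-free check (helper names here are illustrative, not part of the store's API):

```python
# Stdlib-only check: dot(normalize(a), normalize(b)) == cosine(a, b).
import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: list[float]) -> list[float]:
    n = math.sqrt(dot(v, v))  # L2 norm
    return [x / n for x in v]


a, b = [3.0, 4.0, 0.0], [1.0, 2.0, 2.0]
cosine = dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
assert math.isclose(cosine, dot(normalize(a), normalize(b)))  # both = 11/15
```

This is why `IndexFlatIP` plus `normalize_L2` on both sides gives the cosine scores the tests assert on (`top_score > 0.99` for a self-match).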

- [ ] **Step 5: Commit**

```bash
git add src/rag/store.py tests/rag/test_store.py
git commit -m "feat(rag): FAISS inner-product store with chunk metadata + roundtrip"
```

---

## Task 5: RAG ingest CLI

**Files:**
- Create: `src/rag/ingest.py`
- Create: `tests/fixtures/kb_sample/lipinski_rule_of_five.md`
- Create: `tests/fixtures/kb_sample/combat_harmonization_primer.md`
- Create: `tests/fixtures/kb_sample/mne_ica_basics.md`
- Create: `tests/rag/test_ingest.py`

- [ ] **Step 1: Create the sample knowledge-base fixtures**

Create `tests/fixtures/kb_sample/lipinski_rule_of_five.md`:

```markdown
# Lipinski's Rule of Five — BBB Permeability Heuristic

Lipinski's Rule of Five (Lipinski 1997, 2001) is the foundational
medicinal-chemistry rule for predicting whether a small molecule will
cross the blood-brain barrier (BBB) by passive diffusion.

## The four criteria

A molecule is likely BBB-permeable if it satisfies all four:

1. Molecular weight (MW) <= 500 Daltons
2. Octanol-water partition coefficient (logP) <= 5
3. Hydrogen-bond donors <= 5
4. Hydrogen-bond acceptors <= 10

Molecules violating two or more criteria are typically poorly absorbed
or impermeant.

## Why ethanol crosses

Ethanol (CCO) has MW=46 Da, logP=-0.31, 1 H-bond donor, 1 H-bond
acceptor — well within all four thresholds. This explains its rapid
CNS penetration despite hydrophilicity.

## SHAP attribution interpretation

When a Random Forest BBB classifier flags Morgan fingerprint bits with
positive SHAP values toward a "permeable" label, the bit usually
corresponds to a small lipophilic substructure (CH3-, -OCH3-, aromatic
ring) consistent with Lipinski compliance.
```

Create `tests/fixtures/kb_sample/combat_harmonization_primer.md`:

```markdown
# ComBat Harmonization for Multi-Site Neuroimaging

ComBat (Johnson et al. 2007, adapted to MRI by Fortin et al. 2017, 2018)
is the de-facto standard for removing scanner / acquisition-site bias
from multi-center neuroimaging studies.

## How it works

ComBat models per-site location (mean) and scale (variance) parameters
using an empirical-Bayes hierarchical framework. It estimates these
parameters jointly across all sites and shrinks them toward a global
prior — small-N sites are pulled toward the global mean, preventing
overfitting.

## Site-gap reduction

A typical demonstration: the per-site mean of a hippocampus volume
feature can vary by 5+ standard deviations across hospitals. ComBat
typically collapses this gap to <0.005 — a 1000x+ reduction — while
preserving within-site biological variance (age, sex, diagnosis).

## When it fails

ComBat requires at least 2 sites with overlapping covariate
distributions. Single-site data, or sites with completely disjoint
populations (e.g., one site only-pediatric, another only-elderly),
produce unreliable harmonization.
```

Create `tests/fixtures/kb_sample/mne_ica_basics.md`:

```markdown
# MNE-Python ICA for EEG Artifact Removal

Independent Component Analysis (ICA, Hyvärinen 1999) decomposes a
multi-channel EEG recording into statistically independent source
components. It is the de-facto method for removing eye-blink and
heartbeat artifacts before downstream analysis.

## Why ICA, not PCA

PCA decomposes signals into orthogonal components — but neural sources
are not orthogonal in scalp space; they are statistically independent.
ICA's independence assumption matches the physics: the eye, the heart,
and cortical sources fire on uncorrelated schedules.

## The standard workflow

1. Bandpass the raw recording at 0.5-40 Hz to remove DC drift and line
   noise (50/60 Hz).
2. Fit ICA with N components (typically 15-30, less than channel count).
3. Identify artifact components by correlating each ICA source with the
   EOG (eye) channel; reject components with |correlation| > 0.5.
4. Reconstruct the cleaned signal by zeroing out the rejected
   components and inverse-transforming.

## Quality check

Post-ICA, the EOG channel should show minimal residual correlation
with frontal channels (Fp1/Fp2). If it doesn't, the ICA fit was likely
unstable — re-run with a different random seed or more components.
```
650
+
651
+ - [ ] **Step 2: Write the failing test**
652
+
653
+ Create `tests/rag/test_ingest.py`:
654
+
655
+ ```python
656
+ """Tests for src.rag.ingest — walk a directory, chunk, embed, persist."""
657
+ from __future__ import annotations
658
+
659
+ import shutil
660
+ from pathlib import Path
661
+
662
+ import pytest
663
+
664
+ from src.rag.ingest import ingest_directory
665
+ from src.rag.store import FAISSStore
666
+
667
+
668
+ _FIXTURE_KB = Path(__file__).parent.parent / "fixtures" / "kb_sample"
669
+
670
+
671
+ class TestIngestDirectory:
672
+ def test_ingests_markdown_files(self, tmp_path: Path) -> None:
673
+ out_dir = tmp_path / "idx"
674
+ n = ingest_directory(_FIXTURE_KB, out_dir)
675
+ assert n > 0 # at least one chunk per fixture file
676
+ assert (out_dir / "index.bin").exists()
677
+ assert (out_dir / "chunks.json").exists()
678
+
679
+ def test_loaded_store_is_searchable(self, tmp_path: Path) -> None:
680
+ out_dir = tmp_path / "idx"
681
+ ingest_directory(_FIXTURE_KB, out_dir)
682
+ from src.rag.embed import EMBEDDING_DIM
683
+ store = FAISSStore.load(out_dir, dim=EMBEDDING_DIM)
684
+ assert len(store) > 0
685
+ # chunks have source metadata
686
+ assert all("source" in c for c in store._chunks)
687
+ assert all("text" in c for c in store._chunks)
688
+
689
+ def test_empty_directory_creates_empty_index(self, tmp_path: Path) -> None:
690
+ empty = tmp_path / "empty_kb"
691
+ empty.mkdir()
692
+ out_dir = tmp_path / "idx"
693
+ n = ingest_directory(empty, out_dir)
694
+ assert n == 0
695
+ assert (out_dir / "index.bin").exists()
696
+ ```
697
+
698
+ - [ ] **Step 3: Run test to verify it fails**
699
+
700
+ Run: `pytest tests/rag/test_ingest.py -v`
701
+
702
+ Expected: FAIL with `ModuleNotFoundError: No module named 'src.rag.ingest'`
703
+
704
+ - [ ] **Step 4: Implement the ingest CLI**
705
+
706
+ Create `src/rag/ingest.py`:
707
+
708
+ ```python
709
+ """Walk a knowledge-base directory, chunk each file, embed, persist FAISS index.
710
+
711
+ CLI entry point: `python -m src.rag.ingest [<input_dir> [<output_dir>]]`.
712
+ Defaults: input=`data/knowledge_base/`, output=`data/processed/faiss_index/`.
713
+
714
+ Supported file types: `.md`, `.txt`, `.pdf`. Other extensions are ignored
715
+ with a logged WARNING.
716
+ """
717
+ from __future__ import annotations
718
+
719
+ import sys
720
+ from pathlib import Path
721
+
722
+ from src.core.logger import get_logger
723
+ from src.rag.chunker import chunk_text
724
+ from src.rag.embed import EMBEDDING_DIM, Embedder
725
+ from src.rag.store import FAISSStore
726
+
727
+ logger = get_logger(__name__)
728
+
729
+
730
+ _DEFAULT_INPUT = Path("data/knowledge_base")
731
+ _DEFAULT_OUTPUT = Path("data/processed/faiss_index")
732
+ _SUPPORTED = {".md", ".txt", ".pdf"}
733
+
734
+
735
+ def _read_pdf(path: Path) -> str:
736
+ from pypdf import PdfReader
737
+ reader = PdfReader(str(path))
738
+ return "\n\n".join(page.extract_text() or "" for page in reader.pages)
739
+
740
+
741
+ def _read_file(path: Path) -> str:
742
+ suffix = path.suffix.lower()
743
+ if suffix == ".pdf":
744
+ return _read_pdf(path)
745
+ return path.read_text(encoding="utf-8", errors="replace")
746
+
747
+
748
+ def ingest_directory(input_dir: Path, output_dir: Path) -> int:
749
+ """Ingest every supported file in `input_dir` into a FAISS index at `output_dir`.
750
+
751
+ Returns the total number of chunks indexed.
752
+ """
753
+ input_dir = Path(input_dir)
754
+ output_dir = Path(output_dir)
755
+
756
+ files = sorted(p for p in input_dir.rglob("*") if p.suffix.lower() in _SUPPORTED)
757
+ logger.info("Ingesting %d file(s) from %s", len(files), input_dir)
758
+
759
+ all_chunks: list[dict] = []
760
+ for path in files:
761
+ try:
762
+ text = _read_file(path)
763
+ except Exception as e:
764
+ logger.warning("Skipping %s (read failed: %s)", path, e)
765
+ continue
766
+ for i, ch in enumerate(chunk_text(text)):
767
+ all_chunks.append({
768
+ "text": ch,
769
+ "source": str(path.relative_to(input_dir)),
770
+ "chunk_index": i,
771
+ })
772
+
773
+ store = FAISSStore(dim=EMBEDDING_DIM)
774
+ if all_chunks:
775
+ embedder = Embedder()
776
+ vectors = embedder.encode([c["text"] for c in all_chunks])
777
+ store.add(vectors, all_chunks)
778
+
779
+ store.save(output_dir)
780
+ logger.info("Indexed %d chunk(s) → %s", len(all_chunks), output_dir)
781
+ return len(all_chunks)
782
+
783
+
784
+ def main() -> None:
785
+ args = sys.argv[1:]
786
+ inp = Path(args[0]) if len(args) >= 1 else _DEFAULT_INPUT
787
+ out = Path(args[1]) if len(args) >= 2 else _DEFAULT_OUTPUT
788
+ n = ingest_directory(inp, out)
789
+ print(f"Indexed {n} chunks into {out}")
790
+
791
+
792
+ if __name__ == "__main__":
793
+ main()
794
+ ```
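The discovery step in `ingest_directory` (recursive glob filtered by a suffix set) can be sanity-checked in isolation. A minimal stdlib sketch with made-up file names; lowercasing the suffix is what makes the filter case-insensitive:

```python
import tempfile
from pathlib import Path

SUPPORTED = {".md", ".txt", ".pdf"}

def discover(input_dir: Path) -> list[Path]:
    # Mirror ingest_directory's discovery: recursive walk, keep only
    # regular files whose lowercased suffix is in the supported set.
    return sorted(
        p for p in input_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED
    )

with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "notes").mkdir()
    (root / "notes" / "a.md").write_text("alpha")
    (root / "b.TXT").write_text("bravo")       # uppercase extension still matches
    (root / "c.csv").write_text("ignored")     # unsupported, silently dropped
    found = discover(root)
    print([p.name for p in found])  # → ['b.TXT', 'a.md']
```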
795
+
796
+ - [ ] **Step 5: Run test to verify it passes**
797
+
798
+ Run: `pytest tests/rag/test_ingest.py -v`
799
+
800
+ Expected: 3 passed (first run may download embedding model if not cached from Task 3)
801
+
802
+ - [ ] **Step 6: Commit**
803
+
804
+ ```bash
805
+ git add src/rag/ingest.py tests/rag/test_ingest.py tests/fixtures/kb_sample/
806
+ git commit -m "feat(rag): ingest CLI (markdown/PDF → chunks → FAISS) + sample KB fixtures"
807
+ ```
808
+
809
+ ---
810
+
811
+ ## Task 6: RAG retriever
812
+
813
+ **Files:**
814
+ - Create: `src/rag/retrieve.py`
815
+ - Create: `tests/rag/test_retrieve.py`
816
+
817
+ - [ ] **Step 1: Write the failing test**
818
+
819
+ Create `tests/rag/test_retrieve.py`:
820
+
821
+ ```python
822
+ """Tests for src.rag.retrieve — query → top-k chunks."""
823
+ from __future__ import annotations
824
+
825
+ from pathlib import Path
826
+
827
+ import pytest
828
+
829
+ from src.rag.ingest import ingest_directory
830
+ from src.rag.retrieve import RAGRetriever
831
+
832
+
833
+ _FIXTURE_KB = Path(__file__).parent.parent / "fixtures" / "kb_sample"
834
+
835
+
836
+ class TestRAGRetriever:
837
+ @pytest.fixture(scope="class")
838
+ def retriever(self, tmp_path_factory: pytest.TempPathFactory) -> RAGRetriever:
839
+ idx_dir = tmp_path_factory.mktemp("rag_idx")
840
+ ingest_directory(_FIXTURE_KB, idx_dir)
841
+ return RAGRetriever.load(idx_dir)
842
+
843
+ def test_bbb_query_returns_lipinski_chunk(self, retriever: RAGRetriever) -> None:
844
+ hits = retriever.search("Why does ethanol cross the blood-brain barrier?", k=3)
845
+ assert len(hits) == 3
846
+ sources = [h["source"] for h in hits]
847
+ assert "lipinski_rule_of_five.md" in sources
848
+ # top hit should be from lipinski
849
+ assert hits[0]["source"] == "lipinski_rule_of_five.md"
850
+
851
+ def test_combat_query_returns_combat_chunk(self, retriever: RAGRetriever) -> None:
852
+ hits = retriever.search("How does ComBat remove scanner bias from MRI data?", k=2)
853
+ assert hits[0]["source"] == "combat_harmonization_primer.md"
854
+
855
+ def test_eeg_query_returns_ica_chunk(self, retriever: RAGRetriever) -> None:
856
+ hits = retriever.search("How do you remove eye blink artifacts from EEG?", k=2)
857
+ assert hits[0]["source"] == "mne_ica_basics.md"
858
+
859
+ def test_search_includes_score_and_text(self, retriever: RAGRetriever) -> None:
860
+ hits = retriever.search("BBB permeability", k=1)
861
+ h = hits[0]
862
+ assert "text" in h
863
+ assert "source" in h
864
+ assert "score" in h
865
+ assert isinstance(h["score"], float)
866
+ assert 0.0 <= h["score"] <= 1.0
867
+ ```
868
+
869
+ - [ ] **Step 2: Run test to verify it fails**
870
+
871
+ Run: `pytest tests/rag/test_retrieve.py -v`
872
+
873
+ Expected: FAIL with `ModuleNotFoundError: No module named 'src.rag.retrieve'`
874
+
875
+ - [ ] **Step 3: Implement the retriever**
876
+
877
+ Create `src/rag/retrieve.py`:
878
+
879
+ ```python
880
+ """Query → top-k chunks. Encapsulates the embedder + store pair so callers
881
+ don't have to assemble both. Construct via `RAGRetriever.load(index_dir)`.
882
+ """
883
+ from __future__ import annotations
884
+
885
+ from pathlib import Path
886
+
887
+ from src.core.logger import get_logger
888
+ from src.rag.embed import EMBEDDING_DIM, Embedder
889
+ from src.rag.store import FAISSStore
890
+
891
+ logger = get_logger(__name__)
892
+
893
+
894
+ class RAGRetriever:
895
+ """Bundle (embedder, store). Use `RAGRetriever.load(dir)` to construct."""
896
+
897
+ def __init__(self, store: FAISSStore, embedder: Embedder) -> None:
898
+ self._store = store
899
+ self._embedder = embedder
900
+
901
+ @classmethod
902
+ def load(cls, index_dir: Path) -> "RAGRetriever":
903
+ store = FAISSStore.load(Path(index_dir), dim=EMBEDDING_DIM)
904
+ return cls(store=store, embedder=Embedder())
905
+
906
+ def __len__(self) -> int:
907
+ return len(self._store)
908
+
909
+ def search(self, query: str, k: int = 5) -> list[dict]:
910
+ """Return up to `k` chunks most relevant to `query`, sorted by score desc.
911
+
912
+ Each chunk dict carries `text`, `source`, `chunk_index`, `score`.
913
+ Returns [] for empty query or empty store.
914
+ """
915
+ if not query.strip() or len(self._store) == 0:
916
+ return []
917
+ vec = self._embedder.encode([query])
918
+ hits = self._store.search(vec[0], k=k)
919
+ return [{**chunk, "score": score} for chunk, score in hits]
920
+ ```
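`test_search_includes_score_and_text` asserts scores land in [0, 1]. Whether `FAISSStore.search` already returns scores in that range is decided by the Task 4 implementation; if it returns raw cosine similarity over normalized embeddings, one common mapping is the affine shift below (a sketch of the idea, not the store's actual code):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Plain cosine similarity; embedding libraries usually do this internally.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def to_unit_interval(sim: float) -> float:
    # Map cosine similarity from [-1, 1] onto [0, 1].
    return (sim + 1.0) / 2.0

score = to_unit_interval(cosine([1.0, 0.0], [0.0, 1.0]))
print(score)  # orthogonal vectors → 0.5
```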
921
+
922
+ - [ ] **Step 4: Run test to verify it passes**
923
+
924
+ Run: `pytest tests/rag/test_retrieve.py -v`
925
+
926
+ Expected: 4 passed
927
+
928
+ - [ ] **Step 5: Commit**
929
+
930
+ ```bash
931
+ git add src/rag/retrieve.py tests/rag/test_retrieve.py
932
+ git commit -m "feat(rag): RAGRetriever (load + search → chunks with scores)"
933
+ ```
934
+
935
+ ---
936
+
937
+ ## Task 7: Tool schemas + registry
938
+
939
+ **Files:**
940
+ - Create: `src/agents/__init__.py`
941
+ - Create: `src/agents/schemas.py`
942
+ - Create: `src/agents/tools.py`
943
+ - Create: `tests/agents/__init__.py`
944
+ - Create: `tests/agents/test_tools.py`
945
+
946
+ - [ ] **Step 1: Create empty package markers**
947
+
948
+ ```bash
949
+ mkdir -p src/agents tests/agents
950
+ touch src/agents/__init__.py tests/agents/__init__.py
951
+ ```
952
+
953
+ - [ ] **Step 2: Write the failing test**
954
+
955
+ Create `tests/agents/test_tools.py`:
956
+
957
+ ```python
958
+ """Tests for src.agents.tools — Tool dataclass + registry + 4 tool wrappers."""
959
+ from __future__ import annotations
960
+
961
+ from pathlib import Path
962
+
963
+ import pytest
964
+ from pydantic import BaseModel
965
+
966
+ from src.agents.tools import (
967
+ Tool,
968
+ build_default_tools,
969
+ BBBPipelineInput,
970
+ EEGPipelineInput,
971
+ MRIPipelineInput,
972
+ RetrieveContextInput,
973
+ )
974
+
975
+
976
+ class _DummyInput(BaseModel):
977
+ x: int
978
+ y: str = "default"
979
+
980
+
981
+ class _DummyOutput(BaseModel):
982
+ result: int
983
+
984
+
985
+ class TestTool:
986
+ def test_openai_schema_shape(self) -> None:
987
+ tool = Tool(
988
+ name="dummy",
989
+ description="A dummy tool",
990
+ input_model=_DummyInput,
991
+ output_model=_DummyOutput,
992
+ execute=lambda inp: _DummyOutput(result=inp.x * 2),
993
+ )
994
+ schema = tool.openai_schema()
995
+ assert schema["type"] == "function"
996
+ assert schema["function"]["name"] == "dummy"
997
+ assert schema["function"]["description"] == "A dummy tool"
998
+ params = schema["function"]["parameters"]
999
+ assert params["type"] == "object"
1000
+ assert "x" in params["properties"]
1001
+ assert "x" in params["required"]
1002
+ assert "y" not in params["required"] # has default
1003
+
1004
+ def test_invoke_validates_and_returns_dict(self) -> None:
1005
+ tool = Tool(
1006
+ name="dummy",
1007
+ description="d",
1008
+ input_model=_DummyInput,
1009
+ output_model=_DummyOutput,
1010
+ execute=lambda inp: _DummyOutput(result=inp.x * 2),
1011
+ )
1012
+ out = tool.invoke({"x": 5})
1013
+ assert out == {"result": 10}
1014
+
1015
+ def test_invoke_invalid_input_raises(self) -> None:
1016
+ tool = Tool(
1017
+ name="dummy",
1018
+ description="d",
1019
+ input_model=_DummyInput,
1020
+ output_model=_DummyOutput,
1021
+ execute=lambda inp: _DummyOutput(result=inp.x * 2),
1022
+ )
1023
+ with pytest.raises(ValueError, match="invalid input"):
1024
+ tool.invoke({"y": "missing-x"})
1025
+
1026
+
1027
+ class TestBuildDefaultTools:
1028
+ def test_default_set_has_four_tools(self) -> None:
1029
+ # build with placeholder paths; tools won't be invoked here
1030
+ tools = build_default_tools(rag_index_dir=None)
1031
+ names = {t.name for t in tools}
1032
+ assert names == {
1033
+ "run_bbb_pipeline",
1034
+ "run_eeg_pipeline",
1035
+ "run_mri_pipeline",
1036
+ "retrieve_context",
1037
+ }
1038
+
1039
+ def test_each_tool_has_pydantic_input_model(self) -> None:
1040
+ tools = build_default_tools(rag_index_dir=None)
1041
+ for t in tools:
1042
+ assert issubclass(t.input_model, BaseModel)
1043
+ assert issubclass(t.output_model, BaseModel)
1044
+
1045
+ def test_input_models_have_smiles_paths(self) -> None:
1046
+ # verify the field names downstream system prompt depends on
1047
+ assert "smiles" in BBBPipelineInput.model_fields
1048
+ assert "input_path" in EEGPipelineInput.model_fields
1049
+ assert "input_dir" in MRIPipelineInput.model_fields
1050
+ assert "sites_csv" in MRIPipelineInput.model_fields
1051
+ assert "query" in RetrieveContextInput.model_fields
1052
+ assert "k" in RetrieveContextInput.model_fields
1053
+ ```
1054
+
1055
+ - [ ] **Step 3: Run test to verify it fails**
1056
+
1057
+ Run: `pytest tests/agents/test_tools.py -v`
1058
+
1059
+ Expected: FAIL with `ModuleNotFoundError: No module named 'src.agents.tools'`
1060
+
1061
+ - [ ] **Step 4: Implement schemas**
1062
+
1063
+ Create `src/agents/schemas.py`:
1064
+
1065
+ ```python
1066
+ """Pydantic input/output schemas for orchestrator tools and the agent result.
1067
+
1068
+ These schemas double as OpenAI function-calling parameter definitions
1069
+ (via `model_json_schema()`) and as runtime validation gates. Keep field
1070
+ names lowercase + snake_case so prompts and JSON outputs align.
1071
+ """
1072
+ from __future__ import annotations
1073
+
1074
+ from typing import Any
1075
+
1076
+ from pydantic import BaseModel, Field
1077
+
1078
+
1079
+ # --- Pipeline tool inputs ---------------------------------------------------
1080
+
1081
+ class BBBPipelineInput(BaseModel):
1082
+ """Input for `run_bbb_pipeline` — a single SMILES string."""
1083
+ smiles: str = Field(..., description="A single molecular SMILES string, e.g. 'CCO'")
1084
+ top_k: int = Field(5, ge=1, le=20, description="Top-k SHAP attributions to return")
1085
+
1086
+
1087
+ class EEGPipelineInput(BaseModel):
1088
+ """Input for `run_eeg_pipeline` — path to an EEG file (.fif or .edf)."""
1089
+ input_path: str = Field(..., description="Path to EEG recording file (.fif or .edf)")
1090
+ epoch_duration_s: float = Field(2.0, gt=0.1, le=60.0)
1091
+
1092
+
1093
+ class MRIPipelineInput(BaseModel):
1094
+ """Input for `run_mri_pipeline` — directory of NIfTI files + sites CSV."""
1095
+ input_dir: str = Field(..., description="Directory containing .nii.gz volumes")
1096
+ sites_csv: str = Field(..., description="CSV mapping subject_id → site")
1097
+
1098
+
1099
+ class RetrieveContextInput(BaseModel):
1100
+ """Input for `retrieve_context` — natural-language query into the KB."""
1101
+ query: str = Field(..., min_length=2, description="Search query for the knowledge base")
1102
+ k: int = Field(4, ge=1, le=10, description="Number of chunks to return")
1103
+
1104
+
1105
+ # --- Pipeline tool outputs --------------------------------------------------
1106
+
1107
+ class BBBPipelineOutput(BaseModel):
1108
+ smiles: str
1109
+ label: int
1110
+ label_text: str
1111
+ confidence: float
1112
+ top_features: list[dict[str, Any]]
1113
+ drift_z: float | None = None
1114
+
1115
+
1116
+ class EEGPipelineOutput(BaseModel):
1117
+ input_path: str
1118
+ output_path: str
1119
+ rows: int
1120
+ columns: int
1121
+ duration_sec: float
1122
+
1123
+
1124
+ class MRIPipelineOutput(BaseModel):
1125
+ input_dir: str
1126
+ output_path: str
1127
+ rows: int
1128
+ columns: int
1129
+ duration_sec: float
1130
+
1131
+
1132
+ class RetrieveContextOutput(BaseModel):
1133
+ query: str
1134
+ chunks: list[dict[str, Any]]
1135
+
1136
+
1137
+ # --- Agent result -----------------------------------------------------------
1138
+
1139
+ class ToolTraceItem(BaseModel):
1140
+ """One step in the orchestrator's tool-call trace."""
1141
+ name: str
1142
+ args: dict[str, Any]
1143
+ result: dict[str, Any] | None = None
1144
+ error: str | None = None
1145
+
1146
+
1147
+ class AgentResult(BaseModel):
1148
+ """Final orchestrator response: synthesized text + full trace."""
1149
+ text: str
1150
+ trace: list[ToolTraceItem] = Field(default_factory=list)
1151
+ model: str | None = None
1152
+ finish_reason: str = "complete" # complete | max_steps | error
1153
+ ```
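For orientation, this is roughly the parameter schema `RetrieveContextInput` should yield once the registry strips the cosmetic keys. The dict below is hand-written to illustrate the shape (exact pydantic output may differ in detail); the key point is that `k`, having a default, stays out of `required`:

```python
# Hand-written approximation of the cleaned function-calling parameters —
# NOT actual model_json_schema() output.
retrieve_context_params = {
    "type": "object",
    "properties": {
        "query": {"type": "string", "minLength": 2},
        "k": {"type": "integer", "minimum": 1, "maximum": 10, "default": 4},
    },
    "required": ["query"],  # k has a default, so it is optional
}

tool_schema = {
    "type": "function",
    "function": {
        "name": "retrieve_context",
        "description": "Retrieve passages from the knowledge base.",
        "parameters": retrieve_context_params,
    },
}
print(tool_schema["function"]["parameters"]["required"])  # → ['query']
```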
1154
+
1155
+ - [ ] **Step 5: Implement Tool dataclass + registry**
1156
+
1157
+ Create `src/agents/tools.py`:
1158
+
1159
+ ```python
1160
+ """Tool dataclass + registry. Wraps each pipeline + the RAG retriever as a
1161
+ function-callable tool the orchestrator can invoke.
1162
+
1163
+ Public entry: `build_default_tools(rag_index_dir)` returns the 4 tools.
1164
+ """
1165
+ from __future__ import annotations
1166
+
1167
+ from dataclasses import dataclass
1168
+ from pathlib import Path
1169
+ from typing import Any, Callable
1170
+
1171
+ from pydantic import BaseModel, ValidationError
1172
+
1173
+ from src.agents.schemas import (
1174
+ BBBPipelineInput,
1175
+ BBBPipelineOutput,
1176
+ EEGPipelineInput,
1177
+ EEGPipelineOutput,
1178
+ MRIPipelineInput,
1179
+ MRIPipelineOutput,
1180
+ RetrieveContextInput,
1181
+ RetrieveContextOutput,
1182
+ )
1183
+ from src.core.logger import get_logger
1184
+
1185
+ logger = get_logger(__name__)
1186
+
1187
+
1188
+ @dataclass
1189
+ class Tool:
1190
+ """One callable tool exposed to the orchestrator.
1191
+
1192
+ `execute(input_model_instance) -> output_model_instance` is the contract.
1193
+ `invoke(args_dict)` validates the dict, runs execute, returns a plain dict.
1194
+ """
1195
+ name: str
1196
+ description: str
1197
+ input_model: type[BaseModel]
1198
+ output_model: type[BaseModel]
1199
+ execute: Callable[[Any], BaseModel]
1200
+
1201
+ def openai_schema(self) -> dict[str, Any]:
1202
+ """OpenAI/OpenRouter function-calling schema for this tool."""
1203
+ params = self.input_model.model_json_schema()
1204
+ # Some OpenAI-compatible clients reject extra top-level keys like
+ # $defs/title; keep only type/properties/required.
1206
+ cleaned = {
1207
+ "type": "object",
1208
+ "properties": params.get("properties", {}),
1209
+ "required": params.get("required", []),
1210
+ }
1211
+ return {
1212
+ "type": "function",
1213
+ "function": {
1214
+ "name": self.name,
1215
+ "description": self.description,
1216
+ "parameters": cleaned,
1217
+ },
1218
+ }
1219
+
1220
+ def invoke(self, args: dict[str, Any]) -> dict[str, Any]:
1221
+ try:
1222
+ inp = self.input_model.model_validate(args)
1223
+ except ValidationError as e:
1224
+ raise ValueError(f"invalid input for {self.name}: {e}") from e
1225
+ out = self.execute(inp)
1226
+ return out.model_dump()
1227
+
1228
+
1229
+ # ---------------------------------------------------------------------------
1230
+ # Tool implementations — thin wrappers around existing pipelines + RAG.
1231
+ # Heavy work stays in the underlying modules; these only adapt I/O.
1232
+ # ---------------------------------------------------------------------------
1233
+
1234
+
1235
+ def _execute_bbb(inp: BBBPipelineInput) -> BBBPipelineOutput:
1236
+ """Predict + SHAP for a single SMILES, reusing the existing model surface."""
1237
+ from src.api import routes as api_routes
1238
+ from src.api.schemas import BBBPredictRequest
1239
+
1240
+ response = api_routes.predict_bbb(
1241
+ BBBPredictRequest(smiles=inp.smiles, top_k=inp.top_k)
1242
+ )
1243
+ return BBBPipelineOutput(
1244
+ smiles=inp.smiles,
1245
+ label=response.label,
1246
+ label_text=response.label_text,
1247
+ confidence=response.confidence,
1248
+ top_features=[f.model_dump() for f in response.top_features],
1249
+ drift_z=response.drift_z,
1250
+ )
1251
+
1252
+
1253
+ def _execute_eeg(inp: EEGPipelineInput) -> EEGPipelineOutput:
1254
+ from src.api.schemas import EEGRequest
1255
+ from src.api import routes as api_routes
1256
+
1257
+ out_path = Path("data/processed/eeg_features.parquet")
1258
+ response = api_routes.run_eeg_pipeline_route(
1259
+ EEGRequest(
1260
+ input_path=inp.input_path,
1261
+ output_path=str(out_path),
1262
+ epoch_duration_s=inp.epoch_duration_s,
1263
+ )
1264
+ )
1265
+ return EEGPipelineOutput(
1266
+ input_path=inp.input_path,
1267
+ output_path=response.output_path,
1268
+ rows=response.rows,
1269
+ columns=response.columns,
1270
+ duration_sec=response.duration_sec,
1271
+ )
1272
+
1273
+
1274
+ def _execute_mri(inp: MRIPipelineInput) -> MRIPipelineOutput:
1275
+ from src.api.schemas import MRIRequest
1276
+ from src.api import routes as api_routes
1277
+
1278
+ out_path = Path("data/processed/mri_features.parquet")
1279
+ response = api_routes.run_mri_pipeline_route(
1280
+ MRIRequest(
1281
+ input_dir=inp.input_dir,
1282
+ sites_csv=inp.sites_csv,
1283
+ output_path=str(out_path),
1284
+ )
1285
+ )
1286
+ return MRIPipelineOutput(
1287
+ input_dir=inp.input_dir,
1288
+ output_path=response.output_path,
1289
+ rows=response.rows,
1290
+ columns=response.columns,
1291
+ duration_sec=response.duration_sec,
1292
+ )
1293
+
1294
+
1295
+ def _make_retrieve_executor(rag_index_dir: Path | None) -> Callable[[RetrieveContextInput], RetrieveContextOutput]:
1296
+ """Closure: capture the index dir; lazy-load the retriever on first call."""
1297
+ state: dict[str, Any] = {"retriever": None}
1298
+
1299
+ def execute(inp: RetrieveContextInput) -> RetrieveContextOutput:
1300
+ if rag_index_dir is None or not (rag_index_dir / "index.bin").exists():
1301
+ return RetrieveContextOutput(query=inp.query, chunks=[])
1302
+ if state["retriever"] is None:
1303
+ from src.rag.retrieve import RAGRetriever
1304
+ state["retriever"] = RAGRetriever.load(rag_index_dir)
1305
+ hits = state["retriever"].search(inp.query, k=inp.k)
1306
+ return RetrieveContextOutput(query=inp.query, chunks=hits)
1307
+
1308
+ return execute
1309
+
1310
+
1311
+ def build_default_tools(rag_index_dir: Path | None) -> list[Tool]:
1312
+ """Return the 4 tools the orchestrator gets by default."""
1313
+ return [
1314
+ Tool(
1315
+ name="run_bbb_pipeline",
1316
+ description=(
1317
+ "Predict blood-brain-barrier permeability for a SINGLE SMILES "
1318
+ "string. Use this when the user input looks like a molecule "
1319
+ "(short alphanumeric string with no file extension, e.g. 'CCO', "
1320
+ "'c1ccccc1'). Returns label, confidence, top SHAP features, drift."
1321
+ ),
1322
+ input_model=BBBPipelineInput,
1323
+ output_model=BBBPipelineOutput,
1324
+ execute=_execute_bbb,
1325
+ ),
1326
+ Tool(
1327
+ name="run_eeg_pipeline",
1328
+ description=(
1329
+ "Run the EEG signal-processing pipeline (bandpass + ICA + "
1330
+ "epoching + feature extraction) on an EEG recording file. Use "
1331
+ "when input_path ends in .fif or .edf. Returns row/column "
1332
+ "counts + duration."
1333
+ ),
1334
+ input_model=EEGPipelineInput,
1335
+ output_model=EEGPipelineOutput,
1336
+ execute=_execute_eeg,
1337
+ ),
1338
+ Tool(
1339
+ name="run_mri_pipeline",
1340
+ description=(
1341
+ "Run the multi-site MRI ComBat-harmonization pipeline. Use "
1342
+ "when input is a directory containing .nii.gz volumes paired "
1343
+ "with a sites.csv. Returns row/column counts + duration."
1344
+ ),
1345
+ input_model=MRIPipelineInput,
1346
+ output_model=MRIPipelineOutput,
1347
+ execute=_execute_mri,
1348
+ ),
1349
+ Tool(
1350
+ name="retrieve_context",
1351
+ description=(
1352
+ "Retrieve up to k passages from the curated reference knowledge "
1353
+ "base. Use AFTER a pipeline tool returns, to ground your final "
1354
+ "synthesis in cited literature. Formulate a focused query "
1355
+ "based on the pipeline output (e.g., 'BBB permeability of "
1356
+ "small lipophilic molecules' or 'ComBat site harmonization')."
1357
+ ),
1358
+ input_model=RetrieveContextInput,
1359
+ output_model=RetrieveContextOutput,
1360
+ execute=_make_retrieve_executor(rag_index_dir),
1361
+ ),
1362
+ ]
1363
+ ```
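The cleaning step inside `Tool.openai_schema` reads as a small pure function. A standalone sketch, using a hand-written stand-in for a pydantic schema (not actual `model_json_schema()` output):

```python
from typing import Any

def clean_parameters(raw: dict[str, Any]) -> dict[str, Any]:
    # Keep only the keys function calling needs; drop cosmetic ones
    # pydantic emits at the top level (title, $defs).
    return {
        "type": "object",
        "properties": raw.get("properties", {}),
        "required": raw.get("required", []),
    }

raw = {
    "title": "BBBPipelineInput",  # stand-in for pydantic output
    "type": "object",
    "properties": {
        "smiles": {"type": "string"},
        "top_k": {"type": "integer", "default": 5},
    },
    "required": ["smiles"],
    "$defs": {},
}
cleaned = clean_parameters(raw)
print(sorted(cleaned))  # → ['properties', 'required', 'type']
```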
1364
+
1365
+ - [ ] **Step 6: Run test to verify it passes**
1366
+
1367
+ Run: `pytest tests/agents/test_tools.py -v`
1368
+
1369
+ Expected: 6 passed
1370
+
1371
+ - [ ] **Step 7: Commit**
1372
+
1373
+ ```bash
1374
+ git add src/agents/__init__.py src/agents/schemas.py src/agents/tools.py tests/agents/__init__.py tests/agents/test_tools.py
1375
+ git commit -m "feat(agents): Tool dataclass + registry + 4 tool wrappers (3 pipelines + RAG)"
1376
+ ```
1377
+
1378
+ ---
1379
+
1380
+ ## Task 8: Orchestrator agent loop
1381
+
1382
+ **Files:**
1383
+ - Create: `src/agents/prompts.py`
1384
+ - Create: `src/agents/orchestrator.py`
1385
+ - Create: `tests/agents/test_orchestrator.py`
1386
+
1387
+ - [ ] **Step 1: Create the system prompt module**
1388
+
1389
+ Create `src/agents/prompts.py`:
1390
+
1391
+ ```python
1392
+ """System prompts for the orchestrator agent.
1393
+
1394
+ Kept in a dedicated module so prompt edits are diff-readable and reviewable
1395
+ in isolation from the orchestrator loop.
1396
+ """
1397
+ from __future__ import annotations
1398
+
1399
+
1400
+ ORCHESTRATOR_SYSTEM_PROMPT = """\
1401
+ You are the NeuroBridge clinical-ML orchestrator. You have four tools:
1402
+
1403
+ - run_bbb_pipeline(smiles, top_k=5) → for a SMILES molecular string
1404
+ - run_eeg_pipeline(input_path) → for a .fif or .edf EEG file path
1405
+ - run_mri_pipeline(input_dir, sites_csv) → for a directory of NIfTI MRI files
1406
+ - retrieve_context(query, k=4) → for grounding chunks from the knowledge base
1407
+
1408
+ Workflow — follow exactly:
1409
+
1410
+ 1. Look at the user input. Decide which ONE pipeline tool fits:
1411
+ - SMILES (short, all-letters/digits, no slashes, no .ext) → run_bbb_pipeline
1412
+ - Path ending in .fif or .edf → run_eeg_pipeline
1413
+ - Path that is a directory (no file extension at the tail) → run_mri_pipeline
1414
+ If ambiguous, prefer SMILES if it parses; otherwise return:
1415
+ "Cannot identify modality. Provide a SMILES, .fif/.edf path, or NIfTI directory."
1416
+
1417
+ 2. Call the chosen pipeline tool exactly once with the user input.
1418
+
1419
+ 3. After the pipeline returns, formulate ONE focused retrieval query that
1420
+ captures the scientific concept behind the prediction (NOT the raw input).
1421
+ Examples of good queries:
1422
+ - "BBB permeability of small lipophilic molecules" (after BBB predict)
1423
+ - "ICA artifact removal in multi-channel EEG" (after EEG run)
1424
+ - "ComBat scanner site harmonization in multi-center MRI" (after MRI run)
1425
+ Then call retrieve_context with that query.
1426
+
1427
+ 4. Synthesize a final response in 3-5 sentences:
1428
+ - State the concrete pipeline result (label, confidence, key numbers).
1429
+ - Cite at least one specific fact from the retrieved chunks (mention the
1430
+ source file in parentheses, e.g. "(lipinski_rule_of_five.md)").
1431
+ - Match the user's question language: Turkish in → Turkish out, etc.
1432
+ - If retrieve_context returned 0 chunks, say so explicitly and answer
1433
+ using only the pipeline result.
1434
+
1435
+ Hard constraints:
1436
+ - Call exactly ONE pipeline tool, then exactly ONE retrieve_context, then stop.
1437
+ - Do NOT invent facts. Only use numbers from the pipeline tool output and
1438
+ text from the retrieved chunks.
1439
+ - No preamble, no apologies, no meta-commentary about being an AI.
1440
+ """
1441
+ ```
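Step 1 of the workflow is executed by the model, not by code, but the routing rules can be illustrated deterministically. A rough stdlib sketch of the same heuristics (intentionally simplistic: SMILES with brackets or bond symbols, e.g. 'CC(=O)O', would fall through to 'unknown' here):

```python
def classify_input(user_input: str) -> str:
    """Rough, non-authoritative mirror of the prompt's routing rules."""
    s = user_input.strip()
    if s.endswith((".fif", ".edf")):
        return "run_eeg_pipeline"
    if "/" in s or "\\" in s:
        # path-like with no recognized file extension at the tail → directory
        return "run_mri_pipeline"
    if s and s.isalnum():
        # short alphanumeric string, no slashes, no extension → treat as SMILES
        return "run_bbb_pipeline"
    return "unknown"

print(classify_input("CCO"))                     # → run_bbb_pipeline
print(classify_input("data/raw/subject01.fif"))  # → run_eeg_pipeline
print(classify_input("data/raw/mri_site_a/"))    # → run_mri_pipeline
```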
1442
+
1443
+ - [ ] **Step 2: Write the failing test**
1444
+
1445
+ Create `tests/agents/test_orchestrator.py`:
1446
+
1447
+ ```python
1448
+ """Tests for src.agents.orchestrator — agent loop with stubbed LLM client.
1449
+
1450
+ We do NOT hit OpenRouter here. We construct a fake client that returns
1451
+ scripted tool-call responses, then verify the orchestrator dispatches
1452
+ tools and assembles the trace correctly.
1453
+ """
1454
+ from __future__ import annotations
1455
+
1456
+ import json
1457
+ from typing import Any
1458
+ from unittest.mock import MagicMock
1459
+
1460
+ import pytest
1461
+ from pydantic import BaseModel
1462
+
1463
+ from src.agents.orchestrator import Orchestrator
1464
+ from src.agents.tools import Tool
1465
+
1466
+
1467
+ # --- Helpers ----------------------------------------------------------------
1468
+
1469
+
1470
+ def _fake_choice_with_tool_call(name: str, args: dict[str, Any], call_id: str = "c1") -> Any:
1471
+ msg = MagicMock()
1472
+ msg.content = None
1473
+ tc = MagicMock()
1474
+ tc.id = call_id
1475
+ tc.function.name = name
1476
+ tc.function.arguments = json.dumps(args)
1477
+ tc.model_dump = MagicMock(return_value={"id": call_id, "type": "function",
1478
+ "function": {"name": name,
1479
+ "arguments": json.dumps(args)}})
1480
+ msg.tool_calls = [tc]
1481
+ choice = MagicMock()
1482
+ choice.message = msg
1483
+ response = MagicMock()
1484
+ response.choices = [choice]
1485
+ return response
1486
+
1487
+
1488
+ def _fake_choice_with_text(text: str) -> Any:
1489
+ msg = MagicMock()
1490
+ msg.content = text
1491
+ msg.tool_calls = None
1492
+ choice = MagicMock()
1493
+ choice.message = msg
1494
+ response = MagicMock()
1495
+ response.choices = [choice]
1496
+ return response
1497
+
1498
+
1499
+ class _PingInput(BaseModel):
1500
+ msg: str
1501
+
1502
+
1503
+ class _PingOutput(BaseModel):
1504
+ echo: str
1505
+
1506
+
1507
+ def _make_ping_tool() -> Tool:
1508
+ return Tool(
1509
+ name="ping",
1510
+ description="Echo a string back.",
1511
+ input_model=_PingInput,
1512
+ output_model=_PingOutput,
1513
+ execute=lambda inp: _PingOutput(echo=f"pong:{inp.msg}"),
1514
+ )
1515
+
1516
+
1517
+ # --- Tests ------------------------------------------------------------------
1518
+
1519
+
1520
+ class TestOrchestrator:
1521
+ def test_single_tool_then_text_response(self) -> None:
1522
+ client = MagicMock()
1523
+ client.chat.completions.create.side_effect = [
1524
+ _fake_choice_with_tool_call("ping", {"msg": "hello"}),
1525
+ _fake_choice_with_text("All done."),
1526
+ ]
1527
+ orch = Orchestrator(
1528
+ llm_client=client,
1529
+ tools=[_make_ping_tool()],
1530
+ system_prompt="sys",
1531
+ model="stub-model",
1532
+ max_steps=4,
1533
+ )
1534
+ result = orch.run("test input")
1535
+ assert result.text == "All done."
1536
+ assert result.finish_reason == "complete"
1537
+ assert len(result.trace) == 1
1538
+ assert result.trace[0].name == "ping"
1539
+ assert result.trace[0].args == {"msg": "hello"}
1540
+ assert result.trace[0].result == {"echo": "pong:hello"}
1541
+
1542
+ def test_unknown_tool_recorded_as_error(self) -> None:
1543
+ client = MagicMock()
1544
+ client.chat.completions.create.side_effect = [
1545
+ _fake_choice_with_tool_call("nonexistent_tool", {"x": 1}),
1546
+ _fake_choice_with_text("Done."),
1547
+ ]
1548
+ orch = Orchestrator(
1549
+ llm_client=client,
1550
+ tools=[_make_ping_tool()],
1551
+ system_prompt="sys",
1552
+ model="stub-model",
1553
+ max_steps=4,
1554
+ )
1555
+ result = orch.run("test")
1556
+ assert result.trace[0].error is not None
1557
+ assert "unknown tool" in result.trace[0].error
1558
+ assert result.text == "Done."
1559
+
1560
+ def test_invalid_tool_args_recorded_as_error(self) -> None:
1561
+ client = MagicMock()
1562
+ client.chat.completions.create.side_effect = [
1563
+ _fake_choice_with_tool_call("ping", {"wrong_field": "x"}),
1564
+ _fake_choice_with_text("Recovered."),
1565
+ ]
1566
+ orch = Orchestrator(
1567
+ llm_client=client,
1568
+ tools=[_make_ping_tool()],
1569
+ system_prompt="sys",
1570
+ model="stub-model",
1571
+ max_steps=4,
1572
+ )
1573
+ result = orch.run("test")
1574
+ assert result.trace[0].error is not None
1575
+ assert result.text == "Recovered."
1576
+
1577
+ def test_max_steps_exhausted_returns_finish_reason(self) -> None:
1578
+ client = MagicMock()
1579
+ # Always return another tool call — never terminates with text
1580
+ client.chat.completions.create.side_effect = [
1581
+ _fake_choice_with_tool_call("ping", {"msg": f"{i}"}, call_id=f"c{i}")
1582
+ for i in range(10)
1583
+ ]
1584
+ orch = Orchestrator(
1585
+ llm_client=client,
1586
+ tools=[_make_ping_tool()],
1587
+ system_prompt="sys",
1588
+ model="stub-model",
1589
+ max_steps=3,
1590
+ )
1591
+ result = orch.run("test")
1592
+ assert result.finish_reason == "max_steps"
1593
+ assert len(result.trace) == 3
1594
+
1595
+ def test_first_response_is_text_no_tools(self) -> None:
1596
+ client = MagicMock()
1597
+ client.chat.completions.create.side_effect = [
1598
+ _fake_choice_with_text("Direct answer."),
1599
+ ]
1600
+ orch = Orchestrator(
1601
+ llm_client=client,
1602
+ tools=[_make_ping_tool()],
1603
+ system_prompt="sys",
1604
+ model="stub-model",
1605
+ )
1606
+ result = orch.run("trivial input")
1607
+ assert result.text == "Direct answer."
1608
+ assert result.trace == []
1609
+ ```
1610
+
1611
+ - [ ] **Step 3: Run test to verify it fails**
1612
+
1613
+ Run: `pytest tests/agents/test_orchestrator.py -v`
1614
+
1615
+ Expected: FAIL with `ModuleNotFoundError: No module named 'src.agents.orchestrator'`
1616
+
1617
+ - [ ] **Step 4: Implement the orchestrator**
1618
+
1619
+ Create `src/agents/orchestrator.py`:
1620
+
1621
+ ```python
1622
+ """Orchestrator agent: function-calling loop over a list of Tools.
1623
+
1624
+ No agent framework — uses the openai SDK's chat-completions function-calling
1625
+ interface directly. This is the same SDK already used by src/llm/explainer.py,
1626
+ keeping the dependency surface minimal.
1627
+
1628
+ Public entry: `Orchestrator(llm_client, tools, system_prompt, model).run(user_input)`.
1629
+ Returns an `AgentResult` with synthesized text + full tool-call trace.
1630
+ """
1631
+ from __future__ import annotations
1632
+
1633
+ import json
1634
+ from typing import Any
1635
+
1636
+ from src.agents.schemas import AgentResult, ToolTraceItem
1637
from src.agents.tools import Tool
from src.core.logger import get_logger

logger = get_logger(__name__)


class Orchestrator:
    """Single-agent function-calling loop. Stops on (a) a text response, (b) max steps."""

    def __init__(
        self,
        llm_client: Any,
        tools: list[Tool],
        system_prompt: str,
        model: str,
        max_steps: int = 5,
        temperature: float = 0.0,
    ) -> None:
        self._client = llm_client
        self._tools_by_name = {t.name: t for t in tools}
        self._tool_schemas = [t.openai_schema() for t in tools]
        self._system_prompt = system_prompt
        self._model = model
        self._max_steps = max_steps
        self._temperature = temperature

    def run(self, user_input: str) -> AgentResult:
        messages: list[dict[str, Any]] = [
            {"role": "system", "content": self._system_prompt},
            {"role": "user", "content": user_input},
        ]
        trace: list[ToolTraceItem] = []

        for _step in range(self._max_steps):
            response = self._client.chat.completions.create(
                model=self._model,
                messages=messages,
                tools=self._tool_schemas,
                tool_choice="auto",
                temperature=self._temperature,
            )
            msg = response.choices[0].message

            if not getattr(msg, "tool_calls", None):
                return AgentResult(
                    text=(msg.content or "").strip(),
                    trace=trace,
                    model=self._model,
                    finish_reason="complete",
                )

            messages.append({
                "role": "assistant",
                "content": msg.content,
                "tool_calls": [tc.model_dump() for tc in msg.tool_calls],
            })

            for tc in msg.tool_calls:
                name = tc.function.name
                tool = self._tools_by_name.get(name)
                if tool is None:
                    err = f"unknown tool: {name}"
                    trace.append(ToolTraceItem(name=name, args={}, error=err))
                    messages.append({
                        "role": "tool",
                        "tool_call_id": tc.id,
                        "content": json.dumps({"error": err}),
                    })
                    continue
                args: dict[str, Any] = {}
                try:
                    args = json.loads(tc.function.arguments or "{}")
                    result = tool.invoke(args)
                    trace.append(ToolTraceItem(name=name, args=args, result=result))
                    messages.append({
                        "role": "tool",
                        "tool_call_id": tc.id,
                        "content": json.dumps({"result": result}, default=str),
                    })
                except Exception as e:
                    err = str(e)
                    logger.warning("tool %s failed: %s", name, err)
                    # Keep the parsed args (when parsing succeeded) so the trace
                    # shows what the failing call was given.
                    trace.append(ToolTraceItem(name=name, args=args, error=err))
                    messages.append({
                        "role": "tool",
                        "tool_call_id": tc.id,
                        "content": json.dumps({"error": err}),
                    })

        return AgentResult(
            text="Max steps reached without a final answer.",
            trace=trace,
            model=self._model,
            finish_reason="max_steps",
        )
```
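
The loop's two exit paths can be sanity-checked without any project imports. A minimal stdlib-only sketch (the `fake_response` / `run_loop` names are illustrative stand-ins, not part of the plan's code):

```python
from types import SimpleNamespace

# Stand-in for the OpenAI chat-completions response shape used above.
def fake_response(content=None, tool_calls=None):
    msg = SimpleNamespace(content=content, tool_calls=tool_calls)
    return SimpleNamespace(choices=[SimpleNamespace(message=msg)])

# Mirrors the Orchestrator's termination logic: (a) a plain text reply
# ends the loop; (b) exhausting max_steps returns the fallback.
def run_loop(responses, max_steps=5):
    for _step, resp in zip(range(max_steps), responses):
        msg = resp.choices[0].message
        if not getattr(msg, "tool_calls", None):
            return ("complete", (msg.content or "").strip())
    return ("max_steps", "Max steps reached without a final answer.")

print(run_loop([fake_response(content="ethanol is permeable")]))
# → ('complete', 'ethanol is permeable')
print(run_loop([fake_response(tool_calls=[object()])] * 3, max_steps=2))
# → ('max_steps', 'Max steps reached without a final answer.')
```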

- [ ] **Step 5: Run test to verify it passes**

Run: `pytest tests/agents/test_orchestrator.py -v`

Expected: 5 passed

- [ ] **Step 6: Commit**

```bash
git add src/agents/prompts.py src/agents/orchestrator.py tests/agents/test_orchestrator.py
git commit -m "feat(agents): orchestrator loop (function-calling + tool trace + max-steps gate)"
```

---

## Task 9: FastAPI /agent/run endpoint

**Files:**
- Modify: `src/api/schemas.py`
- Modify: `src/api/routes.py`
- Modify: `src/api/main.py`
- Create: `tests/agents/test_agent_route.py`

- [ ] **Step 1: Add request/response schemas**

Append to `src/api/schemas.py`:

```python


# --- Agent surface (orchestrator + RAG) ------------------------------------

class AgentRunRequest(BaseModel):
    """User input to the orchestrator."""
    user_input: str = Field(..., min_length=1, description="SMILES, file path, or directory path")
    user_question: str | None = Field(
        None, description="Optional natural-language question to language-match the response"
    )


class AgentToolTraceItem(BaseModel):
    name: str
    args: dict = Field(default_factory=dict)
    result: dict | None = None
    error: str | None = None


class AgentRunResponse(BaseModel):
    text: str
    trace: list[AgentToolTraceItem] = Field(default_factory=list)
    model: str | None = None
    finish_reason: str = "complete"
```

- [ ] **Step 2: Write the failing test**

Create `tests/agents/test_agent_route.py`:

```python
"""Tests for POST /agent/run — uses a stub orchestrator factory."""
from __future__ import annotations

from typing import Any
from unittest.mock import patch

import pytest
from fastapi.testclient import TestClient

from src.agents.schemas import AgentResult, ToolTraceItem
from src.api.main import app


client = TestClient(app)


class _FakeOrchestrator:
    """Returns a canned AgentResult; ignores input."""
    def __init__(self, *args: Any, **kwargs: Any) -> None:
        pass

    def run(self, user_input: str) -> AgentResult:
        return AgentResult(
            text=f"Synthesized answer for: {user_input}",
            trace=[
                ToolTraceItem(name="run_bbb_pipeline", args={"smiles": user_input},
                              result={"label": 1, "label_text": "permeable"}),
                ToolTraceItem(name="retrieve_context", args={"query": "BBB"},
                              result={"chunks": []}),
            ],
            model="stub-model",
            finish_reason="complete",
        )


class TestAgentRoute:
    def test_post_returns_synthesized_text_and_trace(self) -> None:
        with patch("src.api.routes._build_orchestrator", return_value=_FakeOrchestrator()):
            r = client.post("/agent/run", json={"user_input": "CCO"})
        assert r.status_code == 200
        body = r.json()
        assert "Synthesized answer for: CCO" in body["text"]
        assert len(body["trace"]) == 2
        assert body["trace"][0]["name"] == "run_bbb_pipeline"
        assert body["model"] == "stub-model"
        assert body["finish_reason"] == "complete"

    def test_empty_user_input_422(self) -> None:
        r = client.post("/agent/run", json={"user_input": ""})
        assert r.status_code == 422

    def test_missing_user_input_422(self) -> None:
        r = client.post("/agent/run", json={})
        assert r.status_code == 422
```
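
The stub pattern in this test hinges on `unittest.mock.patch(..., return_value=...)`: the factory itself is replaced, so every call yields the canned instance instead of executing the real body. A stdlib-only sketch of the same mechanic (all names here are illustrative):

```python
from unittest.mock import patch

class Canned:
    def run(self, user_input):
        return f"answer for {user_input}"

def factory():
    raise RuntimeError("would hit the real OpenRouter client")

# return_value=Canned(): while patched, factory() returns this one instance.
with patch(f"{__name__}.factory", return_value=Canned()):
    print(factory().run("CCO"))
# → answer for CCO
```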

- [ ] **Step 3: Run test to verify it fails**

Run: `pytest tests/agents/test_agent_route.py -v`

Expected: FAIL with `404` or an import error referencing `_build_orchestrator`.

- [ ] **Step 4: Wire up the route**

In `src/api/routes.py`, add to the imports block (alongside the existing `from src.api.schemas import ...`):

```python
from src.api.schemas import (
    AgentRunRequest,
    AgentRunResponse,
    AgentToolTraceItem,
    # ... existing imports continue ...
)
```

(Add `AgentRunRequest`, `AgentRunResponse`, `AgentToolTraceItem` to the alphabetized import block at the top.)

Append at the bottom of `src/api/routes.py`:

```python


# --- Agent router ----------------------------------------------------------

agent_router = APIRouter(prefix="/agent")


_DEFAULT_RAG_INDEX_DIR = Path("data/processed/faiss_index")
_AGENT_MODEL_ENV = "NEUROBRIDGE_AGENT_MODEL"
_AGENT_DEFAULT_MODEL = "google/gemini-2.0-flash-exp:free"


def _build_orchestrator():
    """Construct the default orchestrator. Patchable in tests."""
    from openai import OpenAI

    from src.agents.orchestrator import Orchestrator
    from src.agents.prompts import ORCHESTRATOR_SYSTEM_PROMPT
    from src.agents.tools import build_default_tools

    api_key = os.environ.get("OPENROUTER_API_KEY")
    if not api_key:
        raise HTTPException(
            status_code=503,
            detail="OPENROUTER_API_KEY not set; agent surface unavailable.",
        )
    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key=api_key,
        timeout=30.0,
    )
    rag_dir = _DEFAULT_RAG_INDEX_DIR if _DEFAULT_RAG_INDEX_DIR.exists() else None
    tools = build_default_tools(rag_index_dir=rag_dir)
    model = os.environ.get(_AGENT_MODEL_ENV, _AGENT_DEFAULT_MODEL)
    return Orchestrator(
        llm_client=client,
        tools=tools,
        system_prompt=ORCHESTRATOR_SYSTEM_PROMPT,
        model=model,
        max_steps=5,
    )


@agent_router.post("/run", response_model=AgentRunResponse)
def run_agent(req: AgentRunRequest) -> AgentRunResponse:
    """Run the orchestrator on `user_input`. Picks a pipeline + grounds via RAG."""
    orch = _build_orchestrator()
    user_text = req.user_input
    if req.user_question:
        user_text = f"{req.user_input}\n\nUser question: {req.user_question}"
    result = orch.run(user_text)
    return AgentRunResponse(
        text=result.text,
        trace=[
            AgentToolTraceItem(name=t.name, args=t.args, result=t.result, error=t.error)
            for t in result.trace
        ],
        model=result.model,
        finish_reason=result.finish_reason,
    )
```
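
Once the router is mounted (next step), the endpoint can be exercised by hand. A sketch of the request/response shape using only the stdlib (the `localhost:8000` host in the comment is an assumption about your local uvicorn setup):

```python
import json

# Body for: curl -X POST http://localhost:8000/agent/run (assumed local host/port)
payload = {"user_input": "CCO", "user_question": "Is ethanol brain-permeable?"}
body = json.dumps(payload)

# A successful response mirrors AgentRunResponse's four fields.
sample_response = {
    "text": "Ethanol (CCO) is predicted permeable...",
    "trace": [{"name": "run_bbb_pipeline", "args": {}, "result": None, "error": None}],
    "model": "google/gemini-2.0-flash-exp:free",
    "finish_reason": "complete",
}
assert set(sample_response) == {"text", "trace", "model", "finish_reason"}
print(json.loads(body)["user_input"])
# → CCO
```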

- [ ] **Step 5: Mount the router**

Modify `src/api/main.py`:

```python
from src.api.routes import (
    router as pipeline_router,
    predict_router,
    explain_router,
    experiments_router,
    agent_router,
)
```

And add the include line:

```python
app.include_router(experiments_router)
app.include_router(agent_router)
```

- [ ] **Step 6: Run test to verify it passes**

Run: `pytest tests/agents/test_agent_route.py -v`

Expected: 3 passed

- [ ] **Step 7: Run the full test suite to verify no regressions**

Run: `pytest -q`

Expected: All previously-passing tests still pass.

- [ ] **Step 8: Commit**

```bash
git add src/api/schemas.py src/api/routes.py src/api/main.py tests/agents/test_agent_route.py
git commit -m "feat(api): POST /agent/run endpoint (orchestrator + RAG, stub-injectable)"
```

---

## Task 10: Streamlit Agent tab + decision trace UI

**Files:**
- Modify: `src/frontend/app.py`

- [ ] **Step 1: Locate the existing tabs declaration**

Open `src/frontend/app.py` and find the line containing `bbb_tab, eeg_tab, mri_tab, assistant_tab, experiments_tab = st.tabs([` (around line 1755).

- [ ] **Step 2: Add a new "🤖 Agent" tab**

Replace the tabs declaration:

```python
bbb_tab, eeg_tab, mri_tab, assistant_tab, experiments_tab, agent_tab = st.tabs([
    "🧪 Molecule",
    "🌊 Signal",
    "🧠 Image",
    "🤝 AI Assistant",
    "🔬 Experiments",
    "🤖 Agent",
])
```

(Match the existing emoji + label style for the first five tabs — the exact strings may differ in your repo; only add the 6th tuple element and the 6th list element.)

- [ ] **Step 3: Implement the Agent tab body**

Find the end of the `experiments_tab:` block. After it (at the same indentation as the other `with X_tab:` blocks), add:

```python
with agent_tab:
    st.markdown("### Orchestrator Agent")
    st.caption(
        "Pick the pipeline automatically, run it, then ground the response "
        "in curated reference docs (RAG)."
    )

    with st.form("agent_form"):
        agent_input = st.text_input(
            "Input",
            value="CCO",
            help="SMILES (e.g., CCO), .fif/.edf path, or NIfTI directory path",
        )
        agent_question = st.text_input(
            "Question (optional)",
            value="",
            help="Ask in any language — the agent will mirror it in the response",
        )
        submitted = st.form_submit_button("Run agent")

    if submitted and agent_input:
        with st.spinner("Agent is reasoning..."):
            try:
                payload: dict = {"user_input": agent_input}
                if agent_question:
                    payload["user_question"] = agent_question
                response = _post("/agent/run", payload, timeout=120.0)
            except Exception as e:
                st.error(f"Agent run failed: {e}")
            else:
                st.markdown("#### Response")
                st.write(response.get("text", ""))
                st.caption(
                    f"model: `{response.get('model', '?')}` · "
                    f"finish: `{response.get('finish_reason', '?')}`"
                )
                trace = response.get("trace", [])
                n_steps = len(trace)
                with st.expander(
                    f"🧠 Decision trace ({n_steps} step{'s' if n_steps != 1 else ''})",
                    expanded=True,
                ):
                    if not trace:
                        st.write("_(no tool calls)_")
                    for i, step in enumerate(trace, start=1):
                        st.markdown(f"**{i}. `{step['name']}`**")
                        if step.get("error"):
                            st.error(step["error"])
                        else:
                            st.json(step.get("args", {}))
                            st.json(step.get("result", {}))
```

- [ ] **Step 4: Verify the file imports / `_post` helper**

`_post` is the existing helper used by other tabs. If your version doesn't accept a `timeout` kwarg, add it. Search for `def _post`:

```bash
grep -n "def _post" src/frontend/app.py
```

If `_post` lacks a timeout parameter, modify its signature. If it already accepts one, no change is needed.

- [ ] **Step 5: Smoke-test the import**

Run: `python -c "import importlib.util; spec = importlib.util.spec_from_file_location('app', 'src/frontend/app.py'); mod = importlib.util.module_from_spec(spec); spec.loader.exec_module(mod); print('imported ok')"`

Expected: `imported ok` (no syntax errors).

- [ ] **Step 6: Run the existing frontend smoke test**

Run: `pytest tests/frontend/ -v`

Expected: all green (the existing import test still passes).

- [ ] **Step 7: Commit**

```bash
git add src/frontend/app.py
git commit -m "feat(frontend): Agent tab with decision-trace expander"
```

---

## Task 11: Knowledge base seed + Dockerfile RAG ingest

**Files:**
- Create: `data/knowledge_base/README.md`
- Create: `data/knowledge_base/.gitkeep`
- Modify: `Dockerfile`
- Modify: `Dockerfile.hf`

- [ ] **Step 1: Create the knowledge-base directory + README**

```bash
mkdir -p data/knowledge_base
touch data/knowledge_base/.gitkeep
```

Create `data/knowledge_base/README.md`:

```markdown
# RAG Knowledge Base

Drop reference documents here (`.md`, `.txt`, or `.pdf`). They will be
ingested by `python -m src.rag.ingest` at Docker build time and surfaced
to the orchestrator agent via the `retrieve_context` tool.

## Recommended seed set

For a clinical-ML / NeuroBridge demo:

- **BBB / molecules**: Lipinski's Rule of Five (1997, 2001), Pajouhesh & Lenz
  CNS multiparameter optimization (2005)
- **MRI / harmonization**: Fortin et al. ComBat for cortical thickness (2017),
  Fortin et al. ComBat for diffusion (2018), Johnson et al. original ComBat
  (2007, gene expression)
- **EEG / artifacts**: Hyvärinen ICA primer (1999), MNE-Python overview
  (Gramfort 2013)

## Format notes

- PDFs work via `pypdf`. Image-only (scanned) PDFs won't extract text;
  pre-OCR them first.
- Markdown is preferred — full text + headers chunk cleanly.
- Files are gitignored by default. Mount them via Docker volume in
  production, or COPY them in via a sub-path before the `RUN` ingest line.

## Re-indexing

After adding or removing files, re-run:

    python -m src.rag.ingest

This rewrites `data/processed/faiss_index/` from scratch (no incremental
update — the index is small enough to rebuild in seconds).
```
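
For reference, a seed file can be as small as a few lines of markdown. An illustrative example (content paraphrased for the demo, not a citation-grade summary):

```markdown
# Lipinski's Rule of Five

An orally active compound generally violates no more than one of:
molecular weight ≤ 500 Da, logP ≤ 5, H-bond donors ≤ 5,
H-bond acceptors ≤ 10. CNS-penetrant drugs tend to sit well inside
these bounds (lower molecular weight, fewer H-bond donors).
```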

- [ ] **Step 2: Add the ingest step to Dockerfile**

Open `Dockerfile`. Find the existing `RUN mkdir -p data/raw data/processed && ...` block (around line 38). At the end of that block (before the `EXPOSE` line), append a new RUN step:

```dockerfile
# --- RAG knowledge base ingest ---
# Build the FAISS index from any seed docs in tests/fixtures/kb_sample/
# (always present) plus data/knowledge_base/ (optional, user-supplied via an
# additional COPY layer or volume mount). Empty KB → empty index; the agent
# still functions, retrieve_context just returns no chunks.
COPY tests/fixtures/kb_sample/ ./data/knowledge_base/seed/
RUN python -m src.rag.ingest data/knowledge_base data/processed/faiss_index
```

(Place this after the existing pipeline-seed block and before `EXPOSE 7860`.)

- [ ] **Step 3: Mirror the change in Dockerfile.hf**

Apply the exact same edit to `Dockerfile.hf` (it is currently identical to `Dockerfile` per the readback).

- [ ] **Step 4: Verify the Dockerfile builds**

Run: `docker build --no-cache -f Dockerfile -t neurobridge-test . 2>&1 | tail -30`

Expected: the build succeeds and the `python -m src.rag.ingest` step logs `Indexed N chunks → data/processed/faiss_index`.

(If Docker isn't available locally, skip this and verify on the next HF push instead — note the assumption in the commit.)

- [ ] **Step 5: Commit**

```bash
git add data/knowledge_base/README.md data/knowledge_base/.gitkeep Dockerfile Dockerfile.hf
git commit -m "feat(deploy): build RAG index at Docker build time + KB seed dir"
```

---

## Task 12: Live OpenRouter integration test + diag endpoint

**Files:**
- Create: `tests/agents/test_orchestrator_live.py`
- Modify: `src/api/main.py`

- [ ] **Step 1: Write the network-gated live test**

Create `tests/agents/test_orchestrator_live.py`:

```python
"""Live integration test — hits real OpenRouter, picks a pipeline, retrieves chunks.

Skipped unless OPENROUTER_API_KEY is set. Marked `slow` (network round-trips).
"""
from __future__ import annotations

import os
from pathlib import Path

import pytest
from openai import OpenAI

from src.agents.orchestrator import Orchestrator
from src.agents.prompts import ORCHESTRATOR_SYSTEM_PROMPT
from src.agents.tools import build_default_tools
from src.rag.ingest import ingest_directory


_FIXTURE_KB = Path(__file__).parent.parent / "fixtures" / "kb_sample"
_DEFAULT_MODEL = "google/gemini-2.0-flash-exp:free"
_FALLBACK_MODEL = "anthropic/claude-haiku-4-5"


@pytest.mark.slow
@pytest.mark.skipif(
    not os.environ.get("OPENROUTER_API_KEY"),
    reason="OPENROUTER_API_KEY not set",
)
class TestOrchestratorLive:
    @pytest.fixture(scope="class")
    def rag_dir(self, tmp_path_factory: pytest.TempPathFactory) -> Path:
        d = tmp_path_factory.mktemp("rag_live")
        ingest_directory(_FIXTURE_KB, d)
        return d

    @pytest.fixture(scope="class")
    def client(self) -> OpenAI:
        return OpenAI(
            base_url="https://openrouter.ai/api/v1",
            api_key=os.environ["OPENROUTER_API_KEY"],
            timeout=30.0,
        )

    def test_smiles_input_picks_bbb_then_retrieves(self, client: OpenAI, rag_dir: Path) -> None:
        tools = build_default_tools(rag_index_dir=rag_dir)
        orch = Orchestrator(
            llm_client=client,
            tools=tools,
            system_prompt=ORCHESTRATOR_SYSTEM_PROMPT,
            model=os.environ.get("NEUROBRIDGE_AGENT_MODEL", _DEFAULT_MODEL),
            max_steps=5,
        )
        result = orch.run("CCO")
        # Soft assertions — model behavior varies but the workflow shape is fixed.
        assert result.finish_reason == "complete", f"got {result.finish_reason}, trace={result.trace}"
        tool_names = [t.name for t in result.trace]
        assert "run_bbb_pipeline" in tool_names, f"BBB pipeline not called; trace={tool_names}"
        assert "retrieve_context" in tool_names, f"RAG not called; trace={tool_names}"
        assert result.text, "empty final text"
```

- [ ] **Step 2: Run the test (live, requires key)**

Run: `OPENROUTER_API_KEY=$OPENROUTER_API_KEY pytest tests/agents/test_orchestrator_live.py -v -m slow`

Expected: 1 passed. If the BBB pipeline tool fails because the model artifact isn't present, that is a separate setup issue — the test still validates the orchestration shape.

- [ ] **Step 3: Add the /diag/agent endpoint**

In `src/api/main.py`, after the existing `diag_openrouter` function, append:

```python


@app.get("/diag/agent")
def diag_agent() -> dict:
    """Reachability probe for the orchestrator agent surface.

    Reports key presence (length + 12-char prefix only — never the full
    secret), the configured agent model, knowledge-base index status,
    and the registered tool names.
    """
    import os as _os
    from pathlib import Path as _Path

    from src.agents.tools import build_default_tools

    key = _os.environ.get("OPENROUTER_API_KEY") or ""
    model = _os.environ.get("NEUROBRIDGE_AGENT_MODEL", "google/gemini-2.0-flash-exp:free")

    rag_dir = _Path("data/processed/faiss_index")
    rag_status: dict = {"index_dir": str(rag_dir), "exists": False, "chunk_count": 0}
    if (rag_dir / "index.bin").exists() and (rag_dir / "chunks.json").exists():
        rag_status["exists"] = True
        try:
            import json as _json
            rag_status["chunk_count"] = len(_json.loads((rag_dir / "chunks.json").read_text()))
        except Exception as e:
            rag_status["error"] = f"chunks.json unreadable: {e}"

    tools = build_default_tools(rag_index_dir=rag_dir if rag_status["exists"] else None)
    return {
        "has_key": bool(key),
        "key_len": len(key),
        "key_prefix": key[:12] if key else None,
        "agent_model": model,
        "rag": rag_status,
        "tool_names": [t.name for t in tools],
    }
```
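
The chunk-count probe above just measures the length of the JSON list in `chunks.json`. The same read can be checked in isolation (the file layout is taken from the diag code; real chunks may carry more fields than `text`):

```python
import json
import tempfile
from pathlib import Path

# Simulate an ingested index dir with a two-chunk chunks.json.
rag_dir = Path(tempfile.mkdtemp())
(rag_dir / "chunks.json").write_text(
    json.dumps([{"text": "Lipinski..."}, {"text": "ComBat..."}])
)

# Same read the endpoint performs.
chunk_count = len(json.loads((rag_dir / "chunks.json").read_text()))
print(chunk_count)
# → 2
```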

- [ ] **Step 4: Smoke-test the diag endpoint**

Start the API in one shell:

```bash
uvicorn src.api.main:app --port 8000 &
sleep 3
curl -s http://localhost:8000/diag/agent | python3 -m json.tool
kill %1
```

Expected: JSON with `has_key`, `agent_model`, `rag.exists` (true if you ran the ingest CLI locally), and a `tool_names` list of 4 tool names.

- [ ] **Step 5: Commit**

```bash
git add tests/agents/test_orchestrator_live.py src/api/main.py
git commit -m "feat(agents): live OpenRouter integration test (slow) + GET /diag/agent"
```

---

## Task 13: Documentation update

**Files:**
- Modify: `AGENTS.md`
- Modify: `README.md`

- [ ] **Step 1: Add §15 + §16 to AGENTS.md**

Append to `AGENTS.md`:

```markdown

## 15. Orchestrator Agent Surface

`src/agents/orchestrator.py` exposes a single-agent function-calling
loop over the openai SDK (no LangChain / framework dependency). The agent
holds 4 tools, defined in `src/agents/tools.py`:

- `run_bbb_pipeline(smiles, top_k)` — wraps `POST /predict/bbb`
- `run_eeg_pipeline(input_path)` — wraps `POST /pipeline/eeg`
- `run_mri_pipeline(input_dir, sites_csv)` — wraps `POST /pipeline/mri`
- `retrieve_context(query, k)` — wraps `src/rag/retrieve.py`

The system prompt (`src/agents/prompts.py:ORCHESTRATOR_SYSTEM_PROMPT`)
locks the workflow: pick exactly one pipeline → run it → formulate a
focused retrieval query → call `retrieve_context` → synthesize a
3-5 sentence response that cites at least one chunk. The language of the
final response mirrors the user's question.

`POST /agent/run` is the public surface. The default model is
`google/gemini-2.0-flash-exp:free` on OpenRouter (function-calling
support verified). Override it via the `NEUROBRIDGE_AGENT_MODEL` env var.
Returns 503 when `OPENROUTER_API_KEY` is unset.

Diagnostics: `GET /diag/agent` returns key presence, the configured model,
RAG index status (chunk count), and the registered tool names.

## 16. RAG Surface

`src/rag/` is the retrieval layer. Stack: `fastembed`
(`BAAI/bge-small-en-v1.5`, 384-dim, ONNX, no torch dependency) for
embeddings + `faiss-cpu` (`IndexFlatIP` after L2-norm = cosine) for
vector search.

The knowledge base lives at `data/knowledge_base/` (gitignored;
user-supplied `.md` / `.txt` / `.pdf`). Build the FAISS index with:

    python -m src.rag.ingest [<input_dir> [<output_dir>]]

Defaults: input=`data/knowledge_base/`, output=`data/processed/faiss_index/`.
The Dockerfile runs this at build time so deployed Spaces start with
a populated index. Empty KB → empty index → `retrieve_context`
returns 0 chunks; the agent surfaces this and answers from the
pipeline result alone.

`tests/fixtures/kb_sample/` ships 3 seed markdown files (Lipinski,
ComBat, MNE+ICA) — these double as test fixtures and as the demo
seed if no user-supplied PDFs are added.
```
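
The `IndexFlatIP` + L2-normalization = cosine claim in §16 is easy to verify without faiss: the inner product of unit vectors equals their cosine similarity. A pure-Python check:

```python
import math

def l2_normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

a, b = [3.0, 4.0], [4.0, 3.0]
cosine = dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
inner_product_normed = dot(l2_normalize(a), l2_normalize(b))

assert abs(cosine - inner_product_normed) < 1e-12
print(round(inner_product_normed, 4))
# → 0.96
```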

- [ ] **Step 2: Add agent + RAG bullets to README.md "Where to Look"**

In `README.md`, find the "Where to Look" list. Append:

```markdown
- **Orchestrator agent (Task 13):** [`src/agents/orchestrator.py`](src/agents/orchestrator.py), [`src/agents/tools.py`](src/agents/tools.py), [`src/agents/prompts.py`](src/agents/prompts.py)
- **RAG layer:** [`src/rag/`](src/rag/) — chunker, embedder (fastembed), FAISS store, retriever, ingest CLI
- **Agent endpoint:** `POST /agent/run` (orchestrator + RAG); diagnostic at `GET /diag/agent`
- **Streamlit Agent tab:** "🤖 Agent" tab in [`src/frontend/app.py`](src/frontend/app.py) — input box + decision-trace expander
- **RAG knowledge base:** drop `.md`/`.pdf` into [`data/knowledge_base/`](data/knowledge_base/) — see its README
```

- [ ] **Step 3: Commit**

```bash
git add AGENTS.md README.md
git commit -m "docs: §15 orchestrator agent + §16 RAG surface (AGENTS.md + README pointers)"
```

---

## Self-Review

**1. Spec coverage:** Walked through the user's spec — pipelines as tools (Tasks 6, 7), orchestrator at the front (Tasks 7, 8, 9), RAG feedback (Tasks 2-6, 7), modularity (separate `src/agents/` and `src/rag/` packages with single-responsibility files), user-supplied KB files (Task 11). All covered.

**2. Placeholder scan:** No "TODO", "implement later", "fill in details", or "similar to Task N" in the body. Each step has full code.

**3. Type consistency:**
- `BBBPipelineInput.smiles` (str) used in Task 7 schemas, Task 7 tool `_execute_bbb`, Task 8 stub test, Task 12 live test ✓
- `RetrieveContextInput.query` + `k` used consistently in Task 7 schema, Task 7 tool, Task 8 prompt ✓
- `Tool.invoke(args: dict)` returns a dict — used in Task 8 orchestrator ✓
- `AgentResult` / `ToolTraceItem` schemas used in Task 7 (define), Task 8 (build), Task 9 (route response model) ✓
- `Orchestrator.__init__(llm_client, tools, system_prompt, model, max_steps, temperature)` matches usage in Task 9 `_build_orchestrator` and Task 12 live test ✓
- Pipeline call paths: Task 7's `_execute_bbb` references `api_routes.predict_bbb` — verify this name matches `src/api/routes.py`. **Note for implementer:** if the actual function name differs (e.g., `predict_bbb_endpoint`), adapt the call site; the test in Task 8 uses a stub, so it won't catch this. Same for `run_eeg_pipeline_route` / `run_mri_pipeline_route`.

---

## Execution Handoff

Plan complete and saved to `docs/superpowers/plans/2026-05-02-orchestrator-agent-rag.md`. Two execution options:

**1. Subagent-Driven (recommended)** — dispatch a fresh subagent per task, with review between tasks; fast iteration

**2. Inline Execution** — execute the tasks in this session using executing-plans; batch execution with checkpoints

Which approach?