mekosotto commited on
Commit
0b5f569
·
2 Parent(s): 582bce2150cf3b

Merge feat/orchestrator-rag: orchestrator agent + RAG feedback layer

Browse files

13 tasks delivered via subagent-driven-development:
- src/rag/ (chunker, fastembed, FAISS store, retriever, ingest CLI)
- src/agents/ (Tool dataclass + 4 wrappers, function-calling orchestrator loop)
- POST /agent/run + GET /diag/agent endpoints
- Streamlit '🤖 Agent' tab with decision-trace expander
- 3 seed KB markdown fixtures (Lipinski, ComBat, MNE+ICA)
- Dockerfile + Dockerfile.hf build-time RAG ingest
- AGENTS.md §15 + §16, README pointers

233 tests pass, plus 1 live test gated on OPENROUTER_API_KEY and the BBB model artifact being present.

.gitignore CHANGED
@@ -34,3 +34,11 @@ mlartifacts/
34
  .idea/
35
  .vscode/
36
  .DS_Store
 
 
 
 
 
 
 
 
 
34
  .idea/
35
  .vscode/
36
  .DS_Store
37
+
38
+ # RAG knowledge base — ignore user-supplied content; allow only README/.gitkeep
39
+ data/knowledge_base/*
40
+ !data/knowledge_base/README.md
41
+ !data/knowledge_base/.gitkeep
42
+
43
+ # RAG built artifacts
44
+ data/processed/faiss_index/
AGENTS.md CHANGED
@@ -305,3 +305,50 @@ deterministic template path for a fully-reproducible demo.
305
 
306
  The README's YAML front-matter declares the Space metadata
307
  (SDK=docker, port=7860, app_file=src/frontend/app.py).
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
305
 
306
  The README's YAML front-matter declares the Space metadata
307
  (SDK=docker, port=7860, app_file=src/frontend/app.py).
308
+
309
+ ## 15. Orchestrator Agent Surface
310
+
311
+ `src/agents/orchestrator.py` exposes a single-agent function-calling
312
+ loop over the openai SDK (no LangChain / framework dep). The agent
313
+ holds 4 tools, defined in `src/agents/tools.py`:
314
+
315
+ - `run_bbb_pipeline(smiles, top_k)` — wraps `POST /predict/bbb`
316
+ - `run_eeg_pipeline(input_path)` — wraps `POST /pipeline/eeg`
317
+ - `run_mri_pipeline(input_dir, sites_csv)` — wraps `POST /pipeline/mri`
318
+ - `retrieve_context(query, k)` — wraps `src/rag/retrieve.py`
319
+
320
+ The system prompt (`src/agents/prompts.py:ORCHESTRATOR_SYSTEM_PROMPT`)
321
+ locks the workflow: pick exactly one pipeline → run it → formulate a
322
+ focused retrieval query → call retrieve_context → synthesize a
323
+ 3-5 sentence response that cites at least one chunk. Language of the
324
+ final response is mirrored from the user's question.
325
+
326
+ `POST /agent/run` is the public surface. Default model is
327
+ `google/gemini-2.0-flash-exp:free` on OpenRouter (function-calling
328
+ support verified). Override via `NEUROBRIDGE_AGENT_MODEL` env var.
329
+ Returns 503 when `OPENROUTER_API_KEY` is unset.
330
+
331
+ Diagnostics: `GET /diag/agent` returns key presence, configured model,
332
+ RAG index status (chunk count), and the registered tool names.
333
+
334
+ ## 16. RAG Surface
335
+
336
+ `src/rag/` is the retrieval layer. Stack: `fastembed`
337
+ (`BAAI/bge-small-en-v1.5`, 384-dim, ONNX, no torch dep) for
338
+ embeddings + `faiss-cpu` (`IndexFlatIP` after L2-norm = cosine) for
339
+ vector search.
340
+
341
+ Knowledge base lives at `data/knowledge_base/` (gitignored;
342
+ user-supplied `.md` / `.txt` / `.pdf`). Build the FAISS index with:
343
+
344
+ python -m src.rag.ingest [<input_dir> [<output_dir>]]
345
+
346
+ Defaults: input=`data/knowledge_base/`, output=`data/processed/faiss_index/`.
347
+ The Dockerfile runs this at build time so deployed Spaces start with
348
+ a populated index. Empty KB → empty index → `retrieve_context`
349
+ returns 0 chunks; the agent surfaces this and answers from the
350
+ pipeline result alone.
351
+
352
+ `tests/fixtures/kb_sample/` ships 3 seed markdown files (Lipinski,
353
+ ComBat, MNE+ICA) — these double as test fixtures and as the demo
354
+ seed if no user-supplied PDFs are added.
Dockerfile CHANGED
@@ -43,6 +43,14 @@ RUN mkdir -p data/raw data/processed && \
43
  python -c "from pathlib import Path; from src.pipelines.eeg_pipeline import run_pipeline; run_pipeline(input_path=Path('tests/fixtures/eeg_sample.fif'), output_path=Path('data/processed/eeg_features.parquet'))" && \
44
  python -c "from pathlib import Path; from src.pipelines.mri_pipeline import run_pipeline; run_pipeline(input_dir=Path('tests/fixtures/mri_sample'), sites_csv=Path('tests/fixtures/mri_sample/sites.csv'), output_path=Path('data/processed/mri_features.parquet'))"
45
 
 
 
 
 
 
 
 
 
46
  # --- HF Spaces convention ---
47
  EXPOSE 7860
48
 
 
43
  python -c "from pathlib import Path; from src.pipelines.eeg_pipeline import run_pipeline; run_pipeline(input_path=Path('tests/fixtures/eeg_sample.fif'), output_path=Path('data/processed/eeg_features.parquet'))" && \
44
  python -c "from pathlib import Path; from src.pipelines.mri_pipeline import run_pipeline; run_pipeline(input_dir=Path('tests/fixtures/mri_sample'), sites_csv=Path('tests/fixtures/mri_sample/sites.csv'), output_path=Path('data/processed/mri_features.parquet'))"
45
 
46
+ # --- RAG knowledge base ingest ---
47
+ # Build the FAISS index from any seed docs in tests/fixtures/kb_sample/
48
+ # (always present) plus data/knowledge_base/ (optional, user-supplied via
49
+ # additional COPY layer or volume mount). Empty KB → empty index, agent
50
+ # still functions, retrieve_context just returns no chunks.
51
+ COPY tests/fixtures/kb_sample/ ./data/knowledge_base/seed/
52
+ RUN python -m src.rag.ingest data/knowledge_base data/processed/faiss_index
53
+
54
  # --- HF Spaces convention ---
55
  EXPOSE 7860
56
 
Dockerfile.hf CHANGED
@@ -43,6 +43,14 @@ RUN mkdir -p data/raw data/processed && \
43
  python -c "from pathlib import Path; from src.pipelines.eeg_pipeline import run_pipeline; run_pipeline(input_path=Path('tests/fixtures/eeg_sample.fif'), output_path=Path('data/processed/eeg_features.parquet'))" && \
44
  python -c "from pathlib import Path; from src.pipelines.mri_pipeline import run_pipeline; run_pipeline(input_dir=Path('tests/fixtures/mri_sample'), sites_csv=Path('tests/fixtures/mri_sample/sites.csv'), output_path=Path('data/processed/mri_features.parquet'))"
45
 
 
 
 
 
 
 
 
 
46
  # --- HF Spaces convention ---
47
  EXPOSE 7860
48
 
 
43
  python -c "from pathlib import Path; from src.pipelines.eeg_pipeline import run_pipeline; run_pipeline(input_path=Path('tests/fixtures/eeg_sample.fif'), output_path=Path('data/processed/eeg_features.parquet'))" && \
44
  python -c "from pathlib import Path; from src.pipelines.mri_pipeline import run_pipeline; run_pipeline(input_dir=Path('tests/fixtures/mri_sample'), sites_csv=Path('tests/fixtures/mri_sample/sites.csv'), output_path=Path('data/processed/mri_features.parquet'))"
45
 
46
+ # --- RAG knowledge base ingest ---
47
+ # Build the FAISS index from any seed docs in tests/fixtures/kb_sample/
48
+ # (always present) plus data/knowledge_base/ (optional, user-supplied via
49
+ # additional COPY layer or volume mount). Empty KB → empty index, agent
50
+ # still functions, retrieve_context just returns no chunks.
51
+ COPY tests/fixtures/kb_sample/ ./data/knowledge_base/seed/
52
+ RUN python -m src.rag.ingest data/knowledge_base data/processed/faiss_index
53
+
54
  # --- HF Spaces convention ---
55
  EXPOSE 7860
56
 
README.md CHANGED
@@ -225,6 +225,11 @@ finishes in under 4 seconds on a 2024 laptop.
225
  - **New surfaces:** `POST /explain/eeg`, `POST /explain/mri`, `GET /experiments/runs`, `POST /experiments/diff`
226
  - **New deploy artifacts:** `Dockerfile.hf`, `supervisord.conf`
227
  - **LLM hardening (post-Day 8):** real OpenRouter LLM is now the default in deployed Spaces — `Dockerfile`/`Dockerfile.hf` no longer hard-code `NEUROBRIDGE_DISABLE_LLM=1`. Free-tier fallback chain (10 models, smartest → smallest) in [`src/llm/explainer.py`](src/llm/explainer.py), 401/400 status classification, and language-matching / intent-split prompt. Diagnostic endpoint `GET /diag/openrouter` ([`src/api/main.py`](src/api/main.py)) + Streamlit sidebar "🔧 Diagnose LLM" button. Live verification helper: [`scripts/diagnose_openrouter.py`](scripts/diagnose_openrouter.py).
 
 
 
 
 
228
 
229
  ## Day 7 — Demo Recipe
230
 
 
225
  - **New surfaces:** `POST /explain/eeg`, `POST /explain/mri`, `GET /experiments/runs`, `POST /experiments/diff`
226
  - **New deploy artifacts:** `Dockerfile.hf`, `supervisord.conf`
227
  - **LLM hardening (post-Day 8):** real OpenRouter LLM is now the default in deployed Spaces — `Dockerfile`/`Dockerfile.hf` no longer hard-code `NEUROBRIDGE_DISABLE_LLM=1`. Free-tier fallback chain (10 models, smartest → smallest) in [`src/llm/explainer.py`](src/llm/explainer.py), 401/400 status classification, and language-matching / intent-split prompt. Diagnostic endpoint `GET /diag/openrouter` ([`src/api/main.py`](src/api/main.py)) + Streamlit sidebar "🔧 Diagnose LLM" button. Live verification helper: [`scripts/diagnose_openrouter.py`](scripts/diagnose_openrouter.py).
228
+ - **Orchestrator agent (Task 13):** [`src/agents/orchestrator.py`](src/agents/orchestrator.py), [`src/agents/tools.py`](src/agents/tools.py), [`src/agents/prompts.py`](src/agents/prompts.py)
229
+ - **RAG layer:** [`src/rag/`](src/rag/) — chunker, embedder (fastembed), FAISS store, retriever, ingest CLI
230
+ - **Agent endpoint:** `POST /agent/run` (orchestrator + RAG); diagnostic at `GET /diag/agent`
231
+ - **Streamlit Agent tab:** "🤖 Agent" tab in [`src/frontend/app.py`](src/frontend/app.py) — input box + decision-trace expander
232
+ - **RAG knowledge base:** drop `.md`/`.pdf` into [`data/knowledge_base/`](data/knowledge_base/) — see its README
233
 
234
  ## Day 7 — Demo Recipe
235
 
data/knowledge_base/.gitkeep ADDED
File without changes
data/knowledge_base/README.md ADDED
@@ -0,0 +1,34 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # RAG Knowledge Base
2
+
3
+ Drop reference documents here (`.md`, `.txt`, or `.pdf`). They will be
4
+ ingested by `python -m src.rag.ingest` at Docker build time and surfaced
5
+ to the orchestrator agent via the `retrieve_context` tool.
6
+
7
+ ## Recommended seed set
8
+
9
+ For a clinical-ML / NeuroBridge demo:
10
+
11
+ - **BBB / molecules**: Lipinski's Rule of Five (1997, 2001), Pajouhesh & Lenz
12
+ CNS multiparameter optimization (2005)
13
+ - **MRI / harmonization**: Fortin et al. ComBat for cortical thickness (2017),
14
+ Fortin et al. ComBat for diffusion (2018), Johnson et al. original ComBat
15
+ (2007, gene expression)
16
+ - **EEG / artifacts**: Hyvärinen ICA primer (1999), MNE-Python overview
17
+ (Gramfort 2013)
18
+
19
+ ## Format notes
20
+
21
+ - PDFs work via `pypdf`. OCR-only PDFs (scanned images) won't extract text;
22
+ pre-OCR them first.
23
+ - Markdown is preferred — full text + headers chunk cleanly.
24
+ - Files are gitignored by default. Mount them via Docker volume in
25
+ production, or COPY them in via a sub-path before the `RUN` ingest line.
26
+
27
+ ## Re-indexing
28
+
29
+ After adding/removing files, re-run:
30
+
31
+ python -m src.rag.ingest
32
+
33
+ This rewrites `data/processed/faiss_index/` from scratch (no incremental
34
+ update — the index is small enough to rebuild in seconds).
requirements.txt CHANGED
@@ -37,6 +37,11 @@ pytest==8.3.3
37
  pytest-cov==5.0.0
38
  httpx==0.27.2 # FastAPI test client
39
 
 
 
 
 
 
40
  # --- Frontend (B2B dashboard) ---
41
  streamlit==1.39.0
42
 
 
37
  pytest-cov==5.0.0
38
  httpx==0.27.2 # FastAPI test client
39
 
40
+ # --- RAG (knowledge retrieval for agent feedback loop) ---
41
+ fastembed==0.4.2 # ONNX-based embeddings, no torch dep
42
+ faiss-cpu==1.8.0 # vector store
43
+ pypdf==5.0.1 # PDF text extraction
44
+
45
  # --- Frontend (B2B dashboard) ---
46
  streamlit==1.39.0
47
 
src/agents/__init__.py ADDED
File without changes
src/agents/orchestrator.py ADDED
@@ -0,0 +1,108 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Orchestrator agent: function-calling loop over a list of Tools.
2
+
3
+ No agent framework — uses the openai SDK's chat-completions function-calling
4
+ interface directly. This is the same SDK already used by src/llm/explainer.py,
5
+ keeping the dependency surface minimal.
6
+
7
+ Public entry: `Orchestrator(llm_client, tools, system_prompt, model).run(user_input)`.
8
+ Returns an `AgentResult` with synthesized text + full tool-call trace.
9
+ """
10
+ from __future__ import annotations
11
+
12
+ import json
13
+ from typing import Any
14
+
15
+ from src.agents.schemas import AgentResult, ToolTraceItem
16
+ from src.agents.tools import Tool
17
+ from src.core.logger import get_logger
18
+
19
+ logger = get_logger(__name__)
20
+
21
+
22
+ class Orchestrator:
23
+ """Single-agent function-calling loop. Stops on (a) text response, (b) max steps."""
24
+
25
+ def __init__(
26
+ self,
27
+ llm_client: Any,
28
+ tools: list[Tool],
29
+ system_prompt: str,
30
+ model: str,
31
+ max_steps: int = 5,
32
+ temperature: float = 0.0,
33
+ ) -> None:
34
+ self._client = llm_client
35
+ self._tools_by_name = {t.name: t for t in tools}
36
+ self._tool_schemas = [t.openai_schema() for t in tools]
37
+ self._system_prompt = system_prompt
38
+ self._model = model
39
+ self._max_steps = max_steps
40
+ self._temperature = temperature
41
+
42
+ def run(self, user_input: str) -> AgentResult:
43
+ messages: list[dict[str, Any]] = [
44
+ {"role": "system", "content": self._system_prompt},
45
+ {"role": "user", "content": user_input},
46
+ ]
47
+ trace: list[ToolTraceItem] = []
48
+
49
+ for _step in range(self._max_steps):
50
+ response = self._client.chat.completions.create(
51
+ model=self._model,
52
+ messages=messages,
53
+ tools=self._tool_schemas,
54
+ tool_choice="auto",
55
+ temperature=self._temperature,
56
+ )
57
+ msg = response.choices[0].message
58
+
59
+ if not getattr(msg, "tool_calls", None):
60
+ return AgentResult(
61
+ text=(msg.content or "").strip(),
62
+ trace=trace,
63
+ model=self._model,
64
+ finish_reason="complete",
65
+ )
66
+
67
+ messages.append({
68
+ "role": "assistant",
69
+ "content": msg.content,
70
+ "tool_calls": [tc.model_dump() for tc in msg.tool_calls],
71
+ })
72
+
73
+ for tc in msg.tool_calls:
74
+ name = tc.function.name
75
+ tool = self._tools_by_name.get(name)
76
+ if tool is None:
77
+ err = f"unknown tool: {name}"
78
+ trace.append(ToolTraceItem(name=name, args={}, error=err))
79
+ messages.append({
80
+ "role": "tool",
81
+ "tool_call_id": tc.id,
82
+ "content": json.dumps({"error": err}),
83
+ })
84
+ continue
85
+ try:
86
+ args = json.loads(tc.function.arguments or "{}")
87
+ result = tool.invoke(args)
88
+ trace.append(ToolTraceItem(name=name, args=args, result=result))
89
+ messages.append({
90
+ "role": "tool",
91
+ "tool_call_id": tc.id,
92
+ "content": json.dumps({"result": result}, default=str),
93
+ })
94
+ except Exception as e:
95
+ err = str(e)
96
+ trace.append(ToolTraceItem(name=name, args={}, error=err))
97
+ messages.append({
98
+ "role": "tool",
99
+ "tool_call_id": tc.id,
100
+ "content": json.dumps({"error": err}),
101
+ })
102
+
103
+ return AgentResult(
104
+ text="Max steps reached without a final answer.",
105
+ trace=trace,
106
+ model=self._model,
107
+ finish_reason="max_steps",
108
+ )
src/agents/prompts.py ADDED
@@ -0,0 +1,49 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """System prompts for the orchestrator agent.
2
+
3
+ Kept in a dedicated module so prompt edits are diff-readable and reviewable
4
+ in isolation from the orchestrator loop.
5
+ """
6
+ from __future__ import annotations
7
+
8
+
9
+ ORCHESTRATOR_SYSTEM_PROMPT = """\
10
+ You are the NeuroBridge clinical-ML orchestrator. You have four tools:
11
+
12
+ - run_bbb_pipeline(smiles, top_k=5) → for a SMILES molecular string
13
+ - run_eeg_pipeline(input_path) → for a .fif or .edf EEG file path
14
+ - run_mri_pipeline(input_dir, sites_csv) → for a directory of NIfTI MRI files
15
+ - retrieve_context(query, k=4) → for grounding chunks from the knowledge base
16
+
17
+ Workflow — follow exactly:
18
+
19
+ 1. Look at the user input. Decide which ONE pipeline tool fits:
20
+ - SMILES (short, all-letters/digits, no slashes, no .ext) → run_bbb_pipeline
21
+ - Path ending in .fif or .edf → run_eeg_pipeline
22
+ - Path that is a directory (no file extension at the tail) → run_mri_pipeline
23
+ If ambiguous, prefer SMILES if it parses; otherwise return:
24
+ "Cannot identify modality. Provide a SMILES, .fif/.edf path, or NIfTI directory."
25
+
26
+ 2. Call the chosen pipeline tool exactly once with the user input.
27
+
28
+ 3. After the pipeline returns, formulate ONE focused retrieval query that
29
+ captures the scientific concept behind the prediction (NOT the raw input).
30
+ Examples of good queries:
31
+ - "BBB permeability of small lipophilic molecules" (after BBB predict)
32
+ - "ICA artifact removal in multi-channel EEG" (after EEG run)
33
+ - "ComBat scanner site harmonization in multi-center MRI" (after MRI run)
34
+ Then call retrieve_context with that query.
35
+
36
+ 4. Synthesize a final response in 3-5 sentences:
37
+ - State the concrete pipeline result (label, confidence, key numbers).
38
+ - Cite at least one specific fact from the retrieved chunks (mention the
39
+ source file in parentheses, e.g. "(lipinski_rule_of_five.md)").
40
+ - Match the user's question language: Turkish in → Turkish out, etc.
41
+ - If retrieve_context returned 0 chunks, say so explicitly and answer
42
+ using only the pipeline result.
43
+
44
+ Hard constraints:
45
+ - Call exactly ONE pipeline tool, then exactly ONE retrieve_context, then stop.
46
+ - Do NOT invent facts. Only use numbers from the pipeline tool output and
47
+ text from the retrieved chunks.
48
+ - No preamble, no apologies, no meta-commentary about being an AI.
49
+ """
src/agents/schemas.py ADDED
@@ -0,0 +1,87 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Pydantic input/output schemas for orchestrator tools and the agent result.
2
+
3
+ These schemas double as OpenAI function-calling parameter definitions
4
+ (via `model_json_schema()`) and as runtime validation gates. Keep field
5
+ names lowercase + snake_case so prompts and JSON outputs align.
6
+ """
7
+ from __future__ import annotations
8
+
9
+ from typing import Any
10
+
11
+ from pydantic import BaseModel, Field
12
+
13
+
14
# --- Pipeline tool inputs ---------------------------------------------------
# One input model per tool; Field constraints double as the OpenAI
# function-calling parameter schema (see Tool.openai_schema in tools.py).

class BBBPipelineInput(BaseModel):
    """Input for `run_bbb_pipeline` — a single SMILES string."""
    smiles: str = Field(..., description="A single molecular SMILES string, e.g. 'CCO'")
    top_k: int = Field(5, ge=1, le=20, description="Top-k SHAP attributions to return")


class EEGPipelineInput(BaseModel):
    """Input for `run_eeg_pipeline` — path to an EEG file (.fif or .edf)."""
    input_path: str = Field(..., description="Path to EEG recording file (.fif or .edf)")
    # Epoch window length in seconds; bounds keep pipeline runtime sane.
    epoch_duration_s: float = Field(2.0, gt=0.1, le=60.0)


class MRIPipelineInput(BaseModel):
    """Input for `run_mri_pipeline` — directory of NIfTI files + sites CSV."""
    input_dir: str = Field(..., description="Directory containing .nii.gz volumes")
    sites_csv: str = Field(..., description="CSV mapping subject_id → site")


class RetrieveContextInput(BaseModel):
    """Input for `retrieve_context` — natural-language query into the KB."""
    query: str = Field(..., min_length=2, description="Search query for the knowledge base")
    k: int = Field(4, ge=1, le=10, description="Number of chunks to return")


# --- Pipeline tool outputs --------------------------------------------------
# Populated by the executor closures in tools.py, which copy fields off the
# corresponding API route responses.

class BBBPipelineOutput(BaseModel):
    """Output of `run_bbb_pipeline` (fields copied from the /predict/bbb response)."""
    smiles: str
    label: int
    label_text: str
    confidence: float
    top_features: list[dict[str, Any]]
    drift_z: float | None = None


class EEGPipelineOutput(BaseModel):
    """Output of `run_eeg_pipeline` (fields copied from the /pipeline/eeg response)."""
    input_path: str
    output_path: str
    rows: int
    columns: int
    duration_sec: float


class MRIPipelineOutput(BaseModel):
    """Output of `run_mri_pipeline` (fields copied from the /pipeline/mri response)."""
    input_dir: str
    output_path: str
    rows: int
    columns: int
    duration_sec: float


class RetrieveContextOutput(BaseModel):
    """Output of `retrieve_context` — the query plus raw retriever hits."""
    query: str
    chunks: list[dict[str, Any]]


# --- Agent result -----------------------------------------------------------

class ToolTraceItem(BaseModel):
    """One step in the orchestrator's tool-call trace."""
    # Exactly one of `result` / `error` is set by the orchestrator
    # (see Orchestrator in orchestrator.py).
    name: str
    args: dict[str, Any]
    result: dict[str, Any] | None = None
    error: str | None = None


class AgentResult(BaseModel):
    """Final orchestrator response: synthesized text + full trace."""
    text: str
    trace: list[ToolTraceItem] = Field(default_factory=list)
    model: str | None = None
    finish_reason: str = "complete"  # complete | max_steps | error
src/agents/tools.py ADDED
@@ -0,0 +1,223 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Tool dataclass + registry. Wraps each pipeline + the RAG retriever as a
2
+ function-callable tool the orchestrator can invoke.
3
+
4
+ Public entry: `build_default_tools(rag_index_dir)` returns the 4 tools.
5
+ """
6
+ from __future__ import annotations
7
+
8
+ from dataclasses import dataclass
9
+ from pathlib import Path
10
+ from typing import Any, Callable
11
+
12
+ from pydantic import BaseModel, ValidationError
13
+
14
+ from src.agents.schemas import (
15
+ BBBPipelineInput,
16
+ BBBPipelineOutput,
17
+ EEGPipelineInput,
18
+ EEGPipelineOutput,
19
+ MRIPipelineInput,
20
+ MRIPipelineOutput,
21
+ RetrieveContextInput,
22
+ RetrieveContextOutput,
23
+ )
24
+ from src.core.logger import get_logger
25
+
26
+ logger = get_logger(__name__)
27
+
28
+
29
@dataclass
class Tool:
    """One callable tool exposed to the orchestrator.

    Contract: `execute(input_model_instance) -> output_model_instance`.
    `invoke(args_dict)` validates the raw dict, runs `execute`, and
    returns the output as a plain dict.
    """
    name: str
    description: str
    input_model: type[BaseModel]
    output_model: type[BaseModel]
    execute: Callable[[Any], BaseModel]

    def openai_schema(self) -> dict[str, Any]:
        """OpenAI/OpenRouter function-calling schema for this tool."""
        raw = self.input_model.model_json_schema()
        # Some clients reject cosmetic top-level keys ($defs / title) —
        # keep only the structural ones: type / properties / required.
        parameters: dict[str, Any] = {
            "type": "object",
            "properties": raw.get("properties", {}),
            "required": raw.get("required", []),
        }
        function_spec: dict[str, Any] = {
            "name": self.name,
            "description": self.description,
            "parameters": parameters,
        }
        return {"type": "function", "function": function_spec}

    def invoke(self, args: dict[str, Any]) -> dict[str, Any]:
        """Validate `args` against `input_model`, execute, dump to a dict.

        Raises ValueError (wrapping the pydantic ValidationError) on
        invalid input so the orchestrator's error path stays uniform.
        """
        try:
            validated = self.input_model.model_validate(args)
        except ValidationError as exc:
            raise ValueError(f"invalid input for {self.name}: {exc}") from exc
        return self.execute(validated).model_dump()
68
+
69
+
70
+ # ---------------------------------------------------------------------------
71
+ # Tool implementations — thin wrappers around existing pipelines + RAG.
72
+ # Heavy work stays in the underlying modules; these only adapt I/O.
73
+ # ---------------------------------------------------------------------------
74
+
75
+
76
def _make_bbb_executor() -> Callable[[BBBPipelineInput], BBBPipelineOutput]:
    """Closure factory: BBB permeability prediction + SHAP, translates HTTPException."""

    def execute(inp: BBBPipelineInput) -> BBBPipelineOutput:
        # Deferred imports: keep module import cheap and avoid API cycles.
        from fastapi import HTTPException

        from src.api import routes as api_routes
        from src.api.schemas import BBBPredictRequest

        request = BBBPredictRequest(smiles=inp.smiles, top_k=inp.top_k)
        try:
            resp = api_routes.predict_bbb(request)
        except HTTPException as exc:
            # Re-raise as ValueError so Tool.invoke's caller sees one error type.
            raise ValueError(f"bbb tool failed: {exc.detail}") from exc
        return BBBPipelineOutput(
            smiles=inp.smiles,
            label=resp.label,
            label_text=resp.label_text,
            confidence=resp.confidence,
            top_features=[feat.model_dump() for feat in resp.top_features],
            drift_z=resp.drift_z,
        )

    return execute
97
+
98
+
99
def _make_eeg_executor(processed_dir: Path) -> Callable[[EEGPipelineInput], EEGPipelineOutput]:
    """Closure factory: EEG pipeline, writes output under processed_dir."""

    def execute(inp: EEGPipelineInput) -> EEGPipelineOutput:
        # Deferred imports: keep module import cheap and avoid API cycles.
        from fastapi import HTTPException

        from src.api import routes as api_routes
        from src.api.schemas import EEGRequest

        target = processed_dir / "eeg_features.parquet"
        request = EEGRequest(
            input_path=inp.input_path,
            output_path=str(target),
            epoch_duration_s=inp.epoch_duration_s,
        )
        try:
            resp = api_routes.run_eeg(request)
        except HTTPException as exc:
            # Re-raise as ValueError so Tool.invoke's caller sees one error type.
            raise ValueError(f"eeg tool failed: {exc.detail}") from exc
        return EEGPipelineOutput(
            input_path=inp.input_path,
            output_path=resp.output_path,
            rows=resp.rows,
            columns=resp.columns,
            duration_sec=resp.duration_sec,
        )

    return execute
124
+
125
+
126
def _make_mri_executor(processed_dir: Path) -> Callable[[MRIPipelineInput], MRIPipelineOutput]:
    """Closure factory: MRI pipeline, writes output under processed_dir."""

    def execute(inp: MRIPipelineInput) -> MRIPipelineOutput:
        # Deferred imports: keep module import cheap and avoid API cycles.
        from fastapi import HTTPException

        from src.api import routes as api_routes
        from src.api.schemas import MRIRequest

        target = processed_dir / "mri_features.parquet"
        request = MRIRequest(
            input_dir=inp.input_dir,
            sites_csv=inp.sites_csv,
            output_path=str(target),
        )
        try:
            resp = api_routes.run_mri(request)
        except HTTPException as exc:
            # Re-raise as ValueError so Tool.invoke's caller sees one error type.
            raise ValueError(f"mri tool failed: {exc.detail}") from exc
        return MRIPipelineOutput(
            input_dir=inp.input_dir,
            output_path=resp.output_path,
            rows=resp.rows,
            columns=resp.columns,
            duration_sec=resp.duration_sec,
        )

    return execute
151
+
152
+
153
def _make_retrieve_executor(rag_index_dir: Path | None) -> Callable[[RetrieveContextInput], RetrieveContextOutput]:
    """Closure: capture the index dir; lazy-load the retriever on first call."""
    retriever: Any = None

    def execute(inp: RetrieveContextInput) -> RetrieveContextOutput:
        nonlocal retriever
        # No configured dir, or no built index yet → empty result, never an error.
        if rag_index_dir is None or not (rag_index_dir / "index.bin").exists():
            return RetrieveContextOutput(query=inp.query, chunks=[])
        if retriever is None:
            from src.rag.retrieve import RAGRetriever
            retriever = RAGRetriever.load(rag_index_dir)
        hits = retriever.search(inp.query, k=inp.k)
        return RetrieveContextOutput(query=inp.query, chunks=hits)

    return execute
167
+
168
+
169
def build_default_tools(
    rag_index_dir: Path | None,
    processed_dir: Path = Path("data/processed"),
) -> list[Tool]:
    """Return the 4 tools the orchestrator gets by default."""
    bbb_tool = Tool(
        name="run_bbb_pipeline",
        description=(
            "Predict blood-brain-barrier permeability for a SINGLE SMILES "
            "string. Use this when the user input looks like a molecule "
            "(short alphanumeric string with no file extension, e.g. 'CCO', "
            "'c1ccccc1'). Returns label, confidence, top SHAP features, drift."
        ),
        input_model=BBBPipelineInput,
        output_model=BBBPipelineOutput,
        execute=_make_bbb_executor(),
    )
    eeg_tool = Tool(
        name="run_eeg_pipeline",
        description=(
            "Run the EEG signal-processing pipeline (bandpass + ICA + "
            "epoching + feature extraction) on an EEG recording file. Use "
            "when input_path ends in .fif or .edf. Returns row/column "
            "counts + duration."
        ),
        input_model=EEGPipelineInput,
        output_model=EEGPipelineOutput,
        execute=_make_eeg_executor(processed_dir),
    )
    mri_tool = Tool(
        name="run_mri_pipeline",
        description=(
            "Run the multi-site MRI ComBat-harmonization pipeline. Use "
            "when input is a directory containing .nii.gz volumes paired "
            "with a sites.csv. Returns row/column counts + duration."
        ),
        input_model=MRIPipelineInput,
        output_model=MRIPipelineOutput,
        execute=_make_mri_executor(processed_dir),
    )
    retrieve_tool = Tool(
        name="retrieve_context",
        description=(
            "Retrieve up to k passages from the curated reference knowledge "
            "base. Use AFTER a pipeline tool returns, to ground your final "
            "synthesis in cited literature. Formulate a focused query "
            "based on the pipeline output (e.g., 'BBB permeability of "
            "small lipophilic molecules' or 'ComBat site harmonization')."
        ),
        input_model=RetrieveContextInput,
        output_model=RetrieveContextOutput,
        execute=_make_retrieve_executor(rag_index_dir),
    )
    return [bbb_tool, eeg_tool, mri_tool, retrieve_tool]
src/api/main.py CHANGED
@@ -11,6 +11,7 @@ from src.api.routes import (
11
  predict_router,
12
  explain_router,
13
  experiments_router,
 
14
  )
15
  from src.api.schemas import HealthResponse
16
 
@@ -24,6 +25,7 @@ app.include_router(pipeline_router)
24
  app.include_router(predict_router)
25
  app.include_router(explain_router)
26
  app.include_router(experiments_router)
 
27
 
28
 
29
  @app.get("/health", response_model=HealthResponse)
@@ -100,3 +102,40 @@ def diag_openrouter() -> dict:
100
  out["probe"] = {"status": "ERR", "exception": type(e).__name__, "message": str(e)[:200]}
101
 
102
  return out
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
11
  predict_router,
12
  explain_router,
13
  experiments_router,
14
+ agent_router,
15
  )
16
  from src.api.schemas import HealthResponse
17
 
 
25
  app.include_router(predict_router)
26
  app.include_router(explain_router)
27
  app.include_router(experiments_router)
28
+ app.include_router(agent_router)
29
 
30
 
31
  @app.get("/health", response_model=HealthResponse)
 
102
  out["probe"] = {"status": "ERR", "exception": type(e).__name__, "message": str(e)[:200]}
103
 
104
  return out
105
+
106
+
107
@app.get("/diag/agent")
def diag_agent() -> dict:
    """Reachability probe for the orchestrator agent surface.

    Reports key presence (length + 12-char prefix only — never the full
    secret), the configured agent model, knowledge-base index status,
    and the registered tool names.
    """
    import json
    import os
    from pathlib import Path

    from src.agents.tools import build_default_tools

    api_key = os.environ.get("OPENROUTER_API_KEY") or ""
    agent_model = os.environ.get("NEUROBRIDGE_AGENT_MODEL", "google/gemini-2.0-flash-exp:free")

    index_dir = Path("data/processed/faiss_index")
    rag_status: dict = {"index_dir": str(index_dir), "exists": False, "chunk_count": 0}
    # The index is usable only when both the FAISS file and its chunk
    # metadata are present.
    if (index_dir / "index.bin").exists() and (index_dir / "chunks.json").exists():
        rag_status["exists"] = True
        try:
            chunk_meta = json.loads((index_dir / "chunks.json").read_text())
            rag_status["chunk_count"] = len(chunk_meta)
        except Exception as exc:
            rag_status["error"] = f"chunks.json unreadable: {exc}"

    tools = build_default_tools(rag_index_dir=index_dir if rag_status["exists"] else None)
    return {
        "has_key": bool(api_key),
        "key_len": len(api_key),
        "key_prefix": api_key[:12] if api_key else None,
        "agent_model": agent_model,
        "rag": rag_status,
        "tool_names": [tool.name for tool in tools],
    }
src/api/routes.py CHANGED
@@ -18,6 +18,9 @@ import pandas as pd
18
  from fastapi import APIRouter, HTTPException
19
 
20
  from src.api.schemas import (
 
 
 
21
  BBBExplainRequest,
22
  BBBExplainResponse,
23
  BBBPredictRequest,
@@ -500,3 +503,63 @@ def diff_runs(req: RunDiffRequest) -> RunDiffResponse:
500
  )
501
  )
502
  return RunDiffResponse(rows=rows)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
18
  from fastapi import APIRouter, HTTPException
19
 
20
  from src.api.schemas import (
21
+ AgentRunRequest,
22
+ AgentRunResponse,
23
+ AgentToolTraceItem,
24
  BBBExplainRequest,
25
  BBBExplainResponse,
26
  BBBPredictRequest,
 
503
  )
504
  )
505
  return RunDiffResponse(rows=rows)
506
+
507
+
508
+ # --- Agent router ----------------------------------------------------------
509
+
510
+ agent_router = APIRouter(prefix="/agent")
511
+
512
+
513
+ _DEFAULT_RAG_INDEX_DIR = Path("data/processed/faiss_index")
514
+ _AGENT_MODEL_ENV = "NEUROBRIDGE_AGENT_MODEL"
515
+ _AGENT_DEFAULT_MODEL = "google/gemini-2.0-flash-exp:free"
516
+
517
+
518
def _build_orchestrator():
    """Construct the default orchestrator. Patchable in tests.

    Raises HTTPException(503) when OPENROUTER_API_KEY is not configured.
    The RAG index directory is handed to the tool registry only when both
    persisted artifacts (index.bin + chunks.json) are present — an existing
    but empty/partial directory would otherwise fail at retrieval time.
    This mirrors the readiness check performed by GET /diag/agent.
    """
    from openai import OpenAI

    from src.agents.orchestrator import Orchestrator
    from src.agents.prompts import ORCHESTRATOR_SYSTEM_PROMPT
    from src.agents.tools import build_default_tools

    api_key = os.environ.get("OPENROUTER_API_KEY")
    if not api_key:
        raise HTTPException(
            status_code=503,
            detail="OPENROUTER_API_KEY not set; agent surface unavailable.",
        )
    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key=api_key,
        timeout=30.0,
    )
    index_ready = (
        (_DEFAULT_RAG_INDEX_DIR / "index.bin").exists()
        and (_DEFAULT_RAG_INDEX_DIR / "chunks.json").exists()
    )
    tools = build_default_tools(rag_index_dir=_DEFAULT_RAG_INDEX_DIR if index_ready else None)
    model = os.environ.get(_AGENT_MODEL_ENV, _AGENT_DEFAULT_MODEL)
    return Orchestrator(
        llm_client=client,
        tools=tools,
        system_prompt=ORCHESTRATOR_SYSTEM_PROMPT,
        model=model,
        max_steps=5,
    )
547
+
548
+
549
@agent_router.post("/run", response_model=AgentRunResponse)
def run_agent(req: AgentRunRequest) -> AgentRunResponse:
    """Run the orchestrator on `user_input`. Picks a pipeline + grounds via RAG."""
    orchestrator = _build_orchestrator()
    # Append the optional question so the model can language-match its answer.
    prompt = (
        f"{req.user_input}\n\nUser question: {req.user_question}"
        if req.user_question
        else req.user_input
    )
    outcome = orchestrator.run(prompt)
    trace_items = [
        AgentToolTraceItem(name=step.name, args=step.args, result=step.result, error=step.error)
        for step in outcome.trace
    ]
    return AgentRunResponse(
        text=outcome.text,
        trace=trace_items,
        model=outcome.model,
        finish_reason=outcome.finish_reason,
    )
src/api/schemas.py CHANGED
@@ -228,3 +228,27 @@ class RunDiffRow(BaseModel):
228
  class RunDiffResponse(BaseModel):
229
  """Response for POST /experiments/diff: side-by-side metric/param diff."""
230
  rows: list[RunDiffRow]
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
228
  class RunDiffResponse(BaseModel):
229
  """Response for POST /experiments/diff: side-by-side metric/param diff."""
230
  rows: list[RunDiffRow]
231
+
232
+
233
+ # --- Agent surface (orchestrator + RAG) ------------------------------------
234
+
235
class AgentRunRequest(BaseModel):
    """User input to the orchestrator (request body for POST /agent/run)."""

    user_input: str = Field(
        ...,
        min_length=1,
        description="SMILES, file path, or directory path",
    )
    user_question: str | None = Field(
        None,
        description="Optional natural-language question to language-match the response",
    )
+
242
+
243
class AgentToolTraceItem(BaseModel):
    """One tool invocation in the agent's decision trace (args, result or error)."""
    name: str
    args: dict = Field(default_factory=dict)
    result: dict | None = None
    error: str | None = None
+
249
+
250
class AgentRunResponse(BaseModel):
    """Response for POST /agent/run: synthesized text plus the tool-call trace."""
    text: str
    trace: list[AgentToolTraceItem] = Field(default_factory=list)
    model: str | None = None
    finish_reason: str = "complete"
src/frontend/app.py CHANGED
@@ -935,9 +935,9 @@ def _check_api_health() -> tuple[bool, str]:
935
  return False, type(e).__name__.lower()
936
 
937
 
938
- def _post(endpoint: str, payload: dict) -> dict:
939
  """POST to the FastAPI surface; let httpx raise on non-2xx."""
940
- resp = httpx.post(f"{_API_URL}{endpoint}", json=payload, timeout=120.0)
941
  resp.raise_for_status()
942
  return resp.json()
943
 
@@ -1752,12 +1752,13 @@ def main() -> None:
1752
  "Run `uvicorn src.api.main:app --port 8000` or `docker compose up`."
1753
  )
1754
 
1755
- bbb_tab, eeg_tab, mri_tab, assistant_tab, experiments_tab = st.tabs([
1756
  "Molecule",
1757
  "Signal",
1758
  "Image",
1759
  "AI Assistant",
1760
  "Experiments",
 
1761
  ])
1762
 
1763
  with bbb_tab:
@@ -1771,6 +1772,55 @@ def main() -> None:
1771
  with experiments_tab:
1772
  _render_experiments_tab()
1773
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1774
 
1775
  if __name__ == "__main__":
1776
  main()
 
935
  return False, type(e).__name__.lower()
936
 
937
 
938
def _post(endpoint: str, payload: dict, timeout: float = 120.0) -> dict:
    """POST `payload` to the FastAPI surface; httpx raises on non-2xx."""
    response = httpx.post(f"{_API_URL}{endpoint}", json=payload, timeout=timeout)
    response.raise_for_status()
    return response.json()
943
 
 
1752
  "Run `uvicorn src.api.main:app --port 8000` or `docker compose up`."
1753
  )
1754
 
1755
+ bbb_tab, eeg_tab, mri_tab, assistant_tab, experiments_tab, agent_tab = st.tabs([
1756
  "Molecule",
1757
  "Signal",
1758
  "Image",
1759
  "AI Assistant",
1760
  "Experiments",
1761
+ "🤖 Agent",
1762
  ])
1763
 
1764
  with bbb_tab:
 
1772
  with experiments_tab:
1773
  _render_experiments_tab()
1774
 
1775
+ with agent_tab:
1776
+ st.markdown("### Orchestrator Agent")
1777
+ st.caption(
1778
+ "Pick the pipeline automatically, run it, then ground the response "
1779
+ "in curated reference docs (RAG)."
1780
+ )
1781
+
1782
+ with st.form("agent_form"):
1783
+ agent_input = st.text_input(
1784
+ "Input",
1785
+ value="CCO",
1786
+ help="SMILES (e.g., CCO), .fif/.edf path, or NIfTI directory path",
1787
+ )
1788
+ agent_question = st.text_input(
1789
+ "Question (optional)",
1790
+ value="",
1791
+ help="Ask in any language — the agent will mirror it in the response",
1792
+ )
1793
+ submitted = st.form_submit_button("Run agent")
1794
+
1795
+ if submitted and agent_input:
1796
+ with st.spinner("Agent is reasoning..."):
1797
+ try:
1798
+ payload: dict = {"user_input": agent_input}
1799
+ if agent_question:
1800
+ payload["user_question"] = agent_question
1801
+ response = _post("/agent/run", payload, timeout=120.0)
1802
+ except Exception as e:
1803
+ st.error(f"Agent run failed: {e}")
1804
+ else:
1805
+ st.markdown("#### Response")
1806
+ st.write(response.get("text", ""))
1807
+ st.caption(
1808
+ f"model: `{response.get('model', '?')}` · "
1809
+ f"finish: `{response.get('finish_reason', '?')}`"
1810
+ )
1811
+ trace = response.get("trace", [])
1812
+ expander_title = f"🧠 Decision trace ({len(trace)} step{'s' if len(trace) != 1 else ''})"
1813
+ with st.expander(expander_title, expanded=True):
1814
+ if not trace:
1815
+ st.write("_(no tool calls)_")
1816
+ for i, step in enumerate(trace, start=1):
1817
+ st.markdown(f"**{i}. `{step['name']}`**")
1818
+ if step.get("error"):
1819
+ st.error(step["error"])
1820
+ else:
1821
+ st.json(step.get("args", {}))
1822
+ st.json(step.get("result", {}))
1823
+
1824
 
1825
  if __name__ == "__main__":
1826
  main()
src/rag/__init__.py ADDED
File without changes
src/rag/chunker.py ADDED
@@ -0,0 +1,39 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Paragraph-aware recursive character splitter for RAG ingestion.
2
+
3
+ Public entry: `chunk_text(text, max_chars, overlap)`. Splits on the first
4
+ of [paragraph break, sentence end, newline, space] that fits inside the
5
+ window. Empty / whitespace-only inputs return [].
6
+ """
7
+ from __future__ import annotations
8
+
9
+
10
# Boundary candidates in priority order: paragraph break, sentence end,
# newline, then plain space.
_SEPARATORS: tuple[str, ...] = ("\n\n", ". ", "\n", " ")


def chunk_text(text: str, max_chars: int = 600, overlap: int = 80) -> list[str]:
    """Split `text` into chunks of at most `max_chars`, carrying `overlap`
    characters between consecutive windows.

    Each window end is pulled back to the last separator that fits fully
    inside the window (highest-priority separator wins), so chunks tend to
    end on clean boundaries. Empty / whitespace-only input returns [].
    """
    stripped = text.strip()
    if not stripped:
        return []
    total = len(stripped)
    if total <= max_chars:
        return [stripped]

    pieces: list[str] = []
    cursor = 0
    while cursor < total:
        window_end = min(cursor + max_chars, total)
        if window_end < total:
            # Land on the cleanest boundary inside [cursor, window_end].
            for sep in _SEPARATORS:
                cut = stripped.rfind(sep, cursor, window_end)
                if cut > cursor:
                    window_end = cut + len(sep)
                    break
        piece = stripped[cursor:window_end].strip()
        if piece:
            pieces.append(piece)
        if window_end >= total:
            break
        # Advance by at least one character so the loop always terminates,
        # even when overlap >= the distance covered.
        cursor = max(cursor + 1, window_end - overlap)
    return pieces
src/rag/embed.py ADDED
@@ -0,0 +1,39 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Fastembed wrapper — ONNX-based, CPU-only, no torch dep.
2
+
3
+ Public entry: `Embedder().encode(texts) -> np.ndarray[N, D]`. Model is
4
+ loaded lazily on first call. Output is float32 to match FAISS's expected
5
+ input dtype.
6
+ """
7
+ from __future__ import annotations
8
+
9
+ import numpy as np
10
+
11
+ from src.core.logger import get_logger
12
+
13
+ logger = get_logger(__name__)
14
+
15
+
16
# bge-small-en-v1.5: 384-dim, ~33MB ONNX, MTEB top-tier for size class.
_MODEL_NAME = "BAAI/bge-small-en-v1.5"
EMBEDDING_DIM = 384


class Embedder:
    """Lazy-loaded fastembed wrapper. One instance per process is enough."""

    def __init__(self, model_name: str = _MODEL_NAME) -> None:
        self._model_name = model_name
        # Deferred until the first encode() so module import stays cheap.
        self._model = None

    def _ensure_model(self) -> None:
        """Load the ONNX model on first use; no-op afterwards."""
        if self._model is not None:
            return
        from fastembed import TextEmbedding
        logger.info("Loading fastembed model %s (one-time)", self._model_name)
        self._model = TextEmbedding(model_name=self._model_name)

    def encode(self, texts: list[str]) -> np.ndarray:
        """Embed `texts` into a float32 (N, 384) array; [] yields (0, 384)."""
        if not texts:
            return np.zeros((0, EMBEDDING_DIM), dtype=np.float32)
        self._ensure_model()
        vectors = list(self._model.embed(texts))
        return np.array(vectors, dtype=np.float32)
src/rag/ingest.py ADDED
@@ -0,0 +1,85 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Walk a knowledge-base directory, chunk each file, embed, persist FAISS index.
2
+
3
+ CLI entry point: `python -m src.rag.ingest [<input_dir> [<output_dir>]]`.
4
+ Defaults: input=`data/knowledge_base/`, output=`data/processed/faiss_index/`.
5
+
6
+ Supported file types: `.md`, `.txt`, `.pdf`. Other extensions are ignored
7
+ with a logged WARNING.
8
+ """
9
+ from __future__ import annotations
10
+
11
+ import sys
12
+ from pathlib import Path
13
+
14
+ from src.core.logger import get_logger
15
+ from src.rag.chunker import chunk_text
16
+ from src.rag.embed import EMBEDDING_DIM, Embedder
17
+ from src.rag.store import FAISSStore
18
+
19
+ logger = get_logger(__name__)
20
+
21
+
22
+ _DEFAULT_INPUT = Path("data/knowledge_base")
23
+ _DEFAULT_OUTPUT = Path("data/processed/faiss_index")
24
+ _SUPPORTED = {".md", ".txt", ".pdf"}
25
+
26
+
27
def _read_pdf(path: Path) -> str:
    """Extract text from every PDF page, pages joined by blank lines."""
    from pypdf import PdfReader
    pages = PdfReader(str(path)).pages
    return "\n\n".join(page.extract_text() or "" for page in pages)
31
+
32
+
33
+ def _read_file(path: Path) -> str:
34
+ suffix = path.suffix.lower()
35
+ if suffix == ".pdf":
36
+ return _read_pdf(path)
37
+ return path.read_text(encoding="utf-8", errors="replace")
38
+
39
+
40
def ingest_directory(input_dir: Path, output_dir: Path) -> int:
    """Ingest every supported file in `input_dir` into a FAISS index at `output_dir`.

    Unsupported extensions are skipped silently by the glob filter; files
    that fail to read are skipped with a WARNING. Returns the total number
    of chunks indexed. An empty index is still persisted.
    """
    input_dir = Path(input_dir)
    output_dir = Path(output_dir)

    sources = sorted(p for p in input_dir.rglob("*") if p.suffix.lower() in _SUPPORTED)
    logger.info("Ingesting %d file(s) from %s", len(sources), input_dir)

    records: list[dict] = []
    for src in sources:
        try:
            body = _read_file(src)
        except Exception as e:
            logger.warning("Skipping %s (read failed: %s)", src, e)
            continue
        records.extend(
            {
                "text": piece,
                "source": str(src.relative_to(input_dir)),
                "chunk_index": i,
            }
            for i, piece in enumerate(chunk_text(body))
        )

    store = FAISSStore(dim=EMBEDDING_DIM)
    if records:
        vectors = Embedder().encode([r["text"] for r in records])
        store.add(vectors, records)

    store.save(output_dir)
    logger.info("Indexed %d chunk(s) → %s", len(records), output_dir)
    return len(records)
74
+
75
+
76
def main() -> None:
    """CLI entry: `python -m src.rag.ingest [<input_dir> [<output_dir>]]`."""
    argv = sys.argv[1:]
    source = Path(argv[0]) if argv else _DEFAULT_INPUT
    target = Path(argv[1]) if len(argv) > 1 else _DEFAULT_OUTPUT
    # Summary is logged at INFO inside ingest_directory; no print() in src/.
    ingest_directory(source, target)
82
+
83
+
84
+ if __name__ == "__main__":
85
+ main()
src/rag/retrieve.py ADDED
@@ -0,0 +1,40 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Query → top-k chunks. Encapsulates the embedder + store pair so callers
2
+ don't have to assemble both. Loads from disk lazily.
3
+ """
4
+ from __future__ import annotations
5
+
6
+ from pathlib import Path
7
+
8
+ from src.core.logger import get_logger
9
+ from src.rag.embed import EMBEDDING_DIM, Embedder
10
+ from src.rag.store import FAISSStore
11
+
12
+ logger = get_logger(__name__)
13
+
14
+
15
class RAGRetriever:
    """Bundle (embedder, store). Use `RAGRetriever.load(dir)` to construct."""

    def __init__(self, store: FAISSStore, embedder: Embedder) -> None:
        self._store = store
        self._embedder = embedder

    @classmethod
    def load(cls, index_dir: Path) -> "RAGRetriever":
        """Load a persisted index from `index_dir` and pair it with a fresh embedder."""
        persisted = FAISSStore.load(Path(index_dir), dim=EMBEDDING_DIM)
        return cls(store=persisted, embedder=Embedder())

    def __len__(self) -> int:
        return len(self._store)

    def search(self, query: str, k: int = 5) -> list[dict]:
        """Return up to `k` chunks most relevant to `query`, sorted by score desc.

        Each chunk dict carries `text`, `source`, `chunk_index`, `score`.
        Returns [] for empty query or empty store.
        """
        if not query.strip() or len(self._store) == 0:
            return []
        query_vec = self._embedder.encode([query])[0]
        return [
            {**chunk, "score": score}
            for chunk, score in self._store.search(query_vec, k=k)
        ]
src/rag/store.py ADDED
@@ -0,0 +1,66 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """FAISS vector store with parallel chunk metadata.
2
+
3
+ Public entry: `FAISSStore(dim)`. Vectors are L2-normalized on add and
4
+ search so inner-product == cosine similarity. Chunks are arbitrary dicts;
5
+ `text` and `source` keys are recommended but not enforced.
6
+ """
7
+ from __future__ import annotations
8
+
9
+ import json
10
+ from pathlib import Path
11
+ from typing import Any
12
+
13
+ import faiss
14
+ import numpy as np
15
+
16
+
17
class FAISSStore:
    """Inner-product (cosine after L2-norm) FAISS store with chunk metadata."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self._index: faiss.Index = faiss.IndexFlatIP(dim)
        self._chunks: list[dict[str, Any]] = []

    def __len__(self) -> int:
        return len(self._chunks)

    def add(self, vectors: np.ndarray, chunks: list[dict[str, Any]]) -> None:
        """Append `vectors` with parallel `chunks`; vectors are normalized in a copy."""
        n_vec = vectors.shape[0]
        if n_vec != len(chunks):
            raise ValueError(
                f"size mismatch: {n_vec} vectors vs {len(chunks)} chunks"
            )
        if n_vec == 0:
            return
        # Copy before normalize_L2 — it mutates its argument in place.
        normalized = np.array(vectors, dtype=np.float32, copy=True)
        faiss.normalize_L2(normalized)
        self._index.add(normalized)
        self._chunks.extend(chunks)

    def search(self, query: np.ndarray, k: int = 5) -> list[tuple[dict[str, Any], float]]:
        """Top-`k` (chunk, score) pairs by cosine similarity; [] when empty."""
        if not self._chunks:
            return []
        q = np.array(query, dtype=np.float32, copy=True)
        if q.ndim == 1:
            q = q[np.newaxis, :]
        faiss.normalize_L2(q)
        scores, ids = self._index.search(q, min(k, len(self._chunks)))
        # FAISS pads missing results with -1 indices; drop them.
        return [
            (self._chunks[int(i)], float(s))
            for i, s in zip(ids[0], scores[0])
            if i != -1
        ]

    def save(self, dir_path: Path) -> None:
        """Persist index.bin + chunks.json under `dir_path` (created if missing)."""
        dir_path.mkdir(parents=True, exist_ok=True)
        faiss.write_index(self._index, str(dir_path / "index.bin"))
        (dir_path / "chunks.json").write_text(json.dumps(self._chunks, indent=2))

    @classmethod
    def load(cls, dir_path: Path, dim: int) -> "FAISSStore":
        """Rehydrate a store previously written by `save`."""
        store = cls(dim=dim)
        store._index = faiss.read_index(str(dir_path / "index.bin"))
        store._chunks = json.loads((dir_path / "chunks.json").read_text())
        return store
tests/agents/__init__.py ADDED
File without changes
tests/agents/test_agent_route.py ADDED
@@ -0,0 +1,54 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Tests for POST /agent/run — uses a stub orchestrator factory."""
2
+ from __future__ import annotations
3
+
4
+ from typing import Any
5
+ from unittest.mock import patch
6
+
7
+ import pytest
8
+ from fastapi.testclient import TestClient
9
+
10
+ from src.agents.schemas import AgentResult, ToolTraceItem
11
+ from src.api.main import app
12
+
13
+
14
+ client = TestClient(app)
15
+
16
+
17
class _FakeOrchestrator:
    """Returns a canned AgentResult; ignores input."""

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        pass

    def run(self, user_input: str) -> AgentResult:
        bbb_step = ToolTraceItem(
            name="run_bbb_pipeline",
            args={"smiles": user_input},
            result={"label": 1, "label_text": "permeable"},
        )
        rag_step = ToolTraceItem(
            name="retrieve_context",
            args={"query": "BBB"},
            result={"chunks": []},
        )
        return AgentResult(
            text=f"Synthesized answer for: {user_input}",
            trace=[bbb_step, rag_step],
            model="stub-model",
            finish_reason="complete",
        )
34
+
35
+
36
class TestAgentRoute:
    def test_post_returns_synthesized_text_and_trace(self) -> None:
        with patch("src.api.routes._build_orchestrator", return_value=_FakeOrchestrator()):
            resp = client.post("/agent/run", json={"user_input": "CCO"})
        assert resp.status_code == 200
        payload = resp.json()
        assert "Synthesized answer for: CCO" in payload["text"]
        assert len(payload["trace"]) == 2
        assert payload["trace"][0]["name"] == "run_bbb_pipeline"
        assert payload["model"] == "stub-model"
        assert payload["finish_reason"] == "complete"

    def test_empty_user_input_422(self) -> None:
        resp = client.post("/agent/run", json={"user_input": ""})
        assert resp.status_code == 422

    def test_missing_user_input_422(self) -> None:
        resp = client.post("/agent/run", json={})
        assert resp.status_code == 422
tests/agents/test_orchestrator.py ADDED
@@ -0,0 +1,161 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Tests for src.agents.orchestrator — agent loop with stubbed LLM client.
2
+
3
+ We do NOT hit OpenRouter here. We construct a fake client that returns
4
+ scripted tool-call responses, then verify the orchestrator dispatches
5
+ tools and assembles the trace correctly.
6
+ """
7
+ from __future__ import annotations
8
+
9
+ import json
10
+ from typing import Any
11
+ from unittest.mock import MagicMock
12
+
13
+ import pytest
14
+ from pydantic import BaseModel
15
+
16
+ from src.agents.orchestrator import Orchestrator
17
+ from src.agents.tools import Tool
18
+
19
+
20
+ # --- Helpers ----------------------------------------------------------------
21
+
22
+
23
+ def _fake_choice_with_tool_call(name: str, args: dict[str, Any], call_id: str = "c1") -> Any:
24
+ msg = MagicMock()
25
+ msg.content = None
26
+ tc = MagicMock()
27
+ tc.id = call_id
28
+ tc.function.name = name
29
+ tc.function.arguments = json.dumps(args)
30
+ tc.model_dump = MagicMock(return_value={"id": call_id, "type": "function",
31
+ "function": {"name": name,
32
+ "arguments": json.dumps(args)}})
33
+ msg.tool_calls = [tc]
34
+ choice = MagicMock()
35
+ choice.message = msg
36
+ response = MagicMock()
37
+ response.choices = [choice]
38
+ return response
39
+
40
+
41
+ def _fake_choice_with_text(text: str) -> Any:
42
+ msg = MagicMock()
43
+ msg.content = text
44
+ msg.tool_calls = None
45
+ choice = MagicMock()
46
+ choice.message = msg
47
+ response = MagicMock()
48
+ response.choices = [choice]
49
+ return response
50
+
51
+
52
class _PingInput(BaseModel):
    """Input schema for the test-only ping tool."""
    msg: str


class _PingOutput(BaseModel):
    """Output schema for the test-only ping tool."""
    echo: str
58
+
59
+
60
def _make_ping_tool() -> Tool:
    """A trivial echo tool used to exercise the dispatch path."""
    def _execute(inp: _PingInput) -> _PingOutput:
        return _PingOutput(echo=f"pong:{inp.msg}")

    return Tool(
        name="ping",
        description="Echo a string back.",
        input_model=_PingInput,
        output_model=_PingOutput,
        execute=_execute,
    )
68
+
69
+
70
+ # --- Tests ------------------------------------------------------------------
71
+
72
+
73
class TestOrchestrator:
    @staticmethod
    def _build(llm: Any, **extra: Any):
        """Assemble an orchestrator over the stubbed client and the ping tool."""
        return Orchestrator(
            llm_client=llm,
            tools=[_make_ping_tool()],
            system_prompt="sys",
            model="stub-model",
            **extra,
        )

    def test_single_tool_then_text_response(self) -> None:
        llm = MagicMock()
        llm.chat.completions.create.side_effect = [
            _fake_choice_with_tool_call("ping", {"msg": "hello"}),
            _fake_choice_with_text("All done."),
        ]
        outcome = self._build(llm, max_steps=4).run("test input")
        assert outcome.text == "All done."
        assert outcome.finish_reason == "complete"
        assert len(outcome.trace) == 1
        step = outcome.trace[0]
        assert step.name == "ping"
        assert step.args == {"msg": "hello"}
        assert step.result == {"echo": "pong:hello"}

    def test_unknown_tool_recorded_as_error(self) -> None:
        llm = MagicMock()
        llm.chat.completions.create.side_effect = [
            _fake_choice_with_tool_call("nonexistent_tool", {"x": 1}),
            _fake_choice_with_text("Done."),
        ]
        outcome = self._build(llm, max_steps=4).run("test")
        assert outcome.trace[0].error is not None
        assert "unknown tool" in outcome.trace[0].error
        assert outcome.text == "Done."

    def test_invalid_tool_args_recorded_as_error(self) -> None:
        llm = MagicMock()
        llm.chat.completions.create.side_effect = [
            _fake_choice_with_tool_call("ping", {"wrong_field": "x"}),
            _fake_choice_with_text("Recovered."),
        ]
        outcome = self._build(llm, max_steps=4).run("test")
        assert outcome.trace[0].error is not None
        assert outcome.text == "Recovered."

    def test_max_steps_exhausted_returns_finish_reason(self) -> None:
        llm = MagicMock()
        # The stub never yields a text turn, so the loop must hit the cap.
        llm.chat.completions.create.side_effect = [
            _fake_choice_with_tool_call("ping", {"msg": f"{i}"}, call_id=f"c{i}")
            for i in range(10)
        ]
        outcome = self._build(llm, max_steps=3).run("test")
        assert outcome.finish_reason == "max_steps"
        assert len(outcome.trace) == 3

    def test_first_response_is_text_no_tools(self) -> None:
        llm = MagicMock()
        llm.chat.completions.create.side_effect = [
            _fake_choice_with_text("Direct answer."),
        ]
        # No max_steps override — exercises the orchestrator's default.
        outcome = self._build(llm).run("trivial input")
        assert outcome.text == "Direct answer."
        assert outcome.trace == []
tests/agents/test_orchestrator_live.py ADDED
@@ -0,0 +1,74 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Live integration test — hits real OpenRouter, picks pipeline, retrieves chunks.
2
+
3
+ Skipped unless BOTH OPENROUTER_API_KEY is set AND the BBB model artifact
4
+ is built (the `run_bbb_pipeline` tool can't run without it). Marked `slow`
5
+ (network round-trips).
6
+
7
+ The dual gate matters because src/llm/explainer.py auto-loads .env at
8
+ import time; without the model-artifact gate, this test would attempt a
9
+ real OpenRouter call in CI/dev and then fail because the BBB tool can't
10
+ execute. In the deployed Docker image both conditions are satisfied
11
+ (secret + build-time training).
12
+ """
13
+ from __future__ import annotations
14
+
15
+ import os
16
+ from pathlib import Path
17
+
18
+ import pytest
19
+ from openai import OpenAI
20
+
21
+ from src.agents.orchestrator import Orchestrator
22
+ from src.agents.prompts import ORCHESTRATOR_SYSTEM_PROMPT
23
+ from src.agents.tools import build_default_tools
24
+ from src.rag.ingest import ingest_directory
25
+
26
+
27
+ _FIXTURE_KB = Path(__file__).parent.parent / "fixtures" / "kb_sample"
28
+ _DEFAULT_MODEL = "google/gemini-2.0-flash-exp:free"
29
+ _FALLBACK_MODEL = "anthropic/claude-haiku-4-5"
30
+ _BBB_MODEL_PATH = Path(
31
+ os.environ.get("BBB_MODEL_PATH", "data/processed/bbb_model.joblib")
32
+ )
33
+
34
+
35
@pytest.mark.slow
@pytest.mark.skipif(
    not os.environ.get("OPENROUTER_API_KEY"),
    reason="OPENROUTER_API_KEY not set",
)
@pytest.mark.skipif(
    not _BBB_MODEL_PATH.exists(),
    reason=f"BBB model artifact missing at {_BBB_MODEL_PATH} — run python -m src.models.bbb_model",
)
class TestOrchestratorLive:
    @pytest.fixture(scope="class")
    def rag_dir(self, tmp_path_factory: pytest.TempPathFactory) -> Path:
        """Build a throwaway FAISS index from the seed KB fixtures."""
        index_dir = tmp_path_factory.mktemp("rag_live")
        ingest_directory(_FIXTURE_KB, index_dir)
        return index_dir

    @pytest.fixture(scope="class")
    def client(self) -> OpenAI:
        """Real OpenRouter client; only constructed when the key gate passes."""
        return OpenAI(
            base_url="https://openrouter.ai/api/v1",
            api_key=os.environ["OPENROUTER_API_KEY"],
            timeout=30.0,
        )

    def test_smiles_input_picks_bbb_then_retrieves(self, client: OpenAI, rag_dir: Path) -> None:
        orchestrator = Orchestrator(
            llm_client=client,
            tools=build_default_tools(rag_index_dir=rag_dir),
            system_prompt=ORCHESTRATOR_SYSTEM_PROMPT,
            model=os.environ.get("NEUROBRIDGE_AGENT_MODEL", _DEFAULT_MODEL),
            max_steps=5,
        )
        outcome = orchestrator.run("CCO")
        # Soft assertions — model behavior varies but the workflow shape is fixed.
        assert outcome.finish_reason == "complete", f"got {outcome.finish_reason}, trace={outcome.trace}"
        called = [step.name for step in outcome.trace]
        assert "run_bbb_pipeline" in called, f"BBB pipeline not called; trace={called}"
        assert "retrieve_context" in called, f"RAG not called; trace={called}"
        assert outcome.text, "empty final text"
tests/agents/test_tools.py ADDED
@@ -0,0 +1,128 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Tests for src.agents.tools — Tool dataclass + registry + 4 tool wrappers."""
2
+ from __future__ import annotations
3
+
4
+ from pathlib import Path
5
+
6
+ import pytest
7
+ from pydantic import BaseModel
8
+
9
+ from src.agents.tools import (
10
+ Tool,
11
+ build_default_tools,
12
+ BBBPipelineInput,
13
+ EEGPipelineInput,
14
+ MRIPipelineInput,
15
+ RetrieveContextInput,
16
+ )
17
+
18
+
19
class _DummyInput(BaseModel):
    """Minimal input schema: one required field, one with a default."""
    x: int
    y: str = "default"


class _DummyOutput(BaseModel):
    """Minimal output schema wrapping a single int."""
    result: int
26
+
27
+
28
class TestTool:
    @staticmethod
    def _dummy_tool(description: str = "A dummy tool"):
        """Doubling tool over the dummy schemas."""
        return Tool(
            name="dummy",
            description=description,
            input_model=_DummyInput,
            output_model=_DummyOutput,
            execute=lambda inp: _DummyOutput(result=inp.x * 2),
        )

    def test_openai_schema_shape(self) -> None:
        schema = self._dummy_tool().openai_schema()
        assert schema["type"] == "function"
        assert schema["function"]["name"] == "dummy"
        assert schema["function"]["description"] == "A dummy tool"
        params = schema["function"]["parameters"]
        assert params["type"] == "object"
        assert "x" in params["properties"]
        assert "x" in params["required"]
        assert "y" not in params["required"]  # has default

    def test_invoke_validates_and_returns_dict(self) -> None:
        assert self._dummy_tool("d").invoke({"x": 5}) == {"result": 10}

    def test_invoke_invalid_input_raises(self) -> None:
        with pytest.raises(ValueError, match="invalid input"):
            self._dummy_tool("d").invoke({"y": "missing-x"})
68
+
69
+
70
class TestBuildDefaultTools:
    def test_default_set_has_four_tools(self, tmp_path: Path) -> None:
        # build with placeholder paths; tools won't be invoked here
        built = build_default_tools(rag_index_dir=None)
        assert {t.name for t in built} == {
            "run_bbb_pipeline",
            "run_eeg_pipeline",
            "run_mri_pipeline",
            "retrieve_context",
        }

    def test_each_tool_has_pydantic_input_model(self) -> None:
        for tool in build_default_tools(rag_index_dir=None):
            assert issubclass(tool.input_model, BaseModel)
            assert issubclass(tool.output_model, BaseModel)

    def test_input_models_have_smiles_paths(self) -> None:
        # verify the field names the downstream system prompt depends on
        expectations = [
            (BBBPipelineInput, "smiles"),
            (EEGPipelineInput, "input_path"),
            (MRIPipelineInput, "input_dir"),
            (MRIPipelineInput, "sites_csv"),
            (RetrieveContextInput, "query"),
            (RetrieveContextInput, "k"),
        ]
        for model_cls, field_name in expectations:
            assert field_name in model_cls.model_fields

    def test_retrieve_context_short_circuits_when_no_index(self) -> None:
        retrieve = next(
            t for t in build_default_tools(rag_index_dir=None) if t.name == "retrieve_context"
        )
        assert retrieve.invoke({"query": "anything", "k": 3}) == {"query": "anything", "chunks": []}

    def test_processed_dir_parameter_threads_to_executors(self, tmp_path: Path) -> None:
        # build_default_tools should accept processed_dir; executors should
        # eventually write under it (we don't invoke the pipelines here, just
        # verify the parameter is accepted and tools are built).
        names = {t.name for t in build_default_tools(rag_index_dir=None, processed_dir=tmp_path)}
        assert "run_eeg_pipeline" in names
        assert "run_mri_pipeline" in names

    def test_default_processed_dir_when_omitted(self) -> None:
        # backwards-compat: omitting processed_dir keeps existing behavior
        assert len(build_default_tools(rag_index_dir=None)) == 4

    def test_bbb_executor_translates_httpexception_to_valueerror(self) -> None:
        from unittest.mock import patch
        from fastapi import HTTPException

        bbb = next(
            t for t in build_default_tools(rag_index_dir=None) if t.name == "run_bbb_pipeline"
        )
        with patch("src.api.routes.predict_bbb",
                   side_effect=HTTPException(status_code=503, detail="model missing")):
            with pytest.raises(ValueError, match="bbb tool failed"):
                bbb.invoke({"smiles": "CCO"})
tests/fixtures/kb_sample/combat_harmonization_primer.md ADDED
@@ -0,0 +1,27 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # ComBat Harmonization for Multi-Site Neuroimaging
2
+
3
+ ComBat (Johnson et al. 2007, adapted to MRI by Fortin et al. 2017, 2018)
4
+ is the de-facto standard for removing scanner / acquisition-site bias
5
+ from multi-center neuroimaging studies.
6
+
7
+ ## How it works
8
+
9
+ ComBat models per-site location (mean) and scale (variance) parameters
10
+ using an empirical-Bayes hierarchical framework. It estimates these
11
+ parameters jointly across all sites and shrinks them toward a global
12
+ prior — small-N sites are pulled toward the global mean, preventing
13
+ overfitting.
14
+
15
+ ## Site-gap reduction
16
+
17
+ A typical demonstration: the per-site mean of a hippocampus volume
18
+ feature can vary by 5+ standard deviations across hospitals. ComBat
19
+ typically collapses this gap to <0.005 — a 1000x+ reduction — while
20
+ preserving within-site biological variance (age, sex, diagnosis).
21
+
22
+ ## When it fails
23
+
24
+ ComBat requires at least 2 sites with overlapping covariate
25
+ distributions. Single-site data, or sites with completely disjoint
26
+ populations (e.g., one site pediatric-only, another elderly-only),
27
+ produce unreliable harmonization.
tests/fixtures/kb_sample/lipinski_rule_of_five.md ADDED
@@ -0,0 +1,30 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Lipinski's Rule of Five — BBB Permeability Heuristic
2
+
3
+ Lipinski's Rule of Five (Lipinski 1997, 2001) is the foundational
4
+ medicinal-chemistry rule for predicting whether a small molecule will
5
+ cross the blood-brain barrier (BBB) by passive diffusion.
6
+
7
+ ## The four criteria
8
+
9
+ A molecule is likely BBB-permeable if it satisfies all four:
10
+
11
+ 1. Molecular weight (MW) <= 500 Daltons
12
+ 2. Octanol-water partition coefficient (logP) <= 5
13
+ 3. Hydrogen-bond donors <= 5
14
+ 4. Hydrogen-bond acceptors <= 10
15
+
16
+ Molecules violating two or more criteria are typically poorly absorbed
17
+ or impermeant.
18
+
19
+ ## Why ethanol crosses
20
+
21
+ Ethanol (CCO) has MW=46 Da, logP=-0.31, 1 H-bond donor, 1 H-bond
22
+ acceptor — well within all four thresholds. This explains its rapid
23
+ CNS penetration despite hydrophilicity.
24
+
25
+ ## SHAP attribution interpretation
26
+
27
+ When a Random Forest BBB classifier flags Morgan fingerprint bits with
28
+ positive SHAP values toward a "permeable" label, the bit usually
29
+ corresponds to a small lipophilic substructure (CH3-, -OCH3-, aromatic
30
+ ring) consistent with Lipinski compliance.
tests/fixtures/kb_sample/mne_ica_basics.md ADDED
@@ -0,0 +1,29 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # MNE-Python ICA for EEG Artifact Removal
2
+
3
+ Independent Component Analysis (ICA, Hyvärinen 1999) decomposes a
4
+ multi-channel EEG recording into statistically independent source
5
+ components. It is the de-facto method for removing eye-blink and
6
+ heartbeat artifacts before downstream analysis.
7
+
8
+ ## Why ICA, not PCA
9
+
10
+ PCA decomposes signals into orthogonal components — but neural sources
11
+ are not orthogonal in scalp space, they are statistically independent.
12
+ ICA's independence assumption matches the physics: the eye, the heart,
13
+ and cortical sources fire on uncorrelated schedules.
14
+
15
+ ## The standard workflow
16
+
17
+ 1. Bandpass the raw recording at 0.5-40 Hz to remove DC drift and line
18
+ noise (50/60 Hz).
19
+ 2. Fit ICA with N components (typically 15-30, less than channel count).
20
+ 3. Identify artifact components by correlating each ICA source with the
21
+ EOG (eye) channel; reject components with |correlation| > 0.5.
22
+ 4. Reconstruct the cleaned signal by zeroing out the rejected
23
+ components and inverse-transforming.
24
+
25
+ ## Quality check
26
+
27
+ Post-ICA, the EOG channel should show minimal residual correlation
28
+ with frontal channels (Fp1/Fp2). If it doesn't, the ICA fit was likely
29
+ unstable — re-run with a different random seed or more components.
tests/rag/__init__.py ADDED
File without changes
tests/rag/test_chunker.py ADDED
@@ -0,0 +1,40 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Tests for src.rag.chunker — paragraph-aware character splitter."""
2
+ from __future__ import annotations
3
+
4
+ import pytest
5
+
6
+ from src.rag.chunker import chunk_text
7
+
8
+
9
class TestChunkText:
    """Behavioral tests for the paragraph-aware character splitter."""

    def test_short_text_returns_single_chunk(self) -> None:
        """Text already under max_chars comes back untouched as one chunk."""
        assert chunk_text("hello world", max_chars=100, overlap=10) == ["hello world"]

    def test_empty_text_returns_empty_list(self) -> None:
        """Empty or whitespace-only input yields no chunks at all."""
        for blank in ("", " \n\n "):
            assert chunk_text(blank, max_chars=100, overlap=10) == []

    def test_long_text_splits_into_multiple_chunks(self) -> None:
        """250 chars at max_chars=100 must split into 3+ pieces, all in budget."""
        pieces = chunk_text("a" * 250, max_chars=100, overlap=10)
        assert len(pieces) >= 3
        # every chunk respects max_chars
        assert all(len(piece) <= 100 for piece in pieces)

    def test_overlap_between_chunks(self) -> None:
        """Consecutive chunks share characters when overlap > 0."""
        unbroken = "abcdefghij" * 30  # 300 chars, no natural break
        pieces = chunk_text(unbroken, max_chars=100, overlap=20)
        for left, right in zip(pieces, pieces[1:]):
            assert left[-10:] in right or right[:10] in left

    def test_paragraph_boundary_preferred(self) -> None:
        """When the first paragraph fits and the second doesn't, split at \\n\\n."""
        para_a = "First paragraph content."
        para_b = "Second paragraph content " * 10
        pieces = chunk_text(f"{para_a}\n\n{para_b}", max_chars=100, overlap=10)
        # the first chunk should end at the paragraph boundary, not mid-word
        assert para_a in pieces[0]
tests/rag/test_embed.py ADDED
@@ -0,0 +1,42 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Tests for src.rag.embed — fastembed wrapper."""
2
+ from __future__ import annotations
3
+
4
+ import numpy as np
5
+ import pytest
6
+
7
+ from src.rag.embed import Embedder, EMBEDDING_DIM
8
+
9
+
10
class TestEmbedder:
    """Shape / dtype / semantic-similarity checks for the fastembed wrapper."""

    @pytest.fixture(scope="class")
    def embedder(self) -> Embedder:
        # One model load shared across the class — loading is the slow part.
        return Embedder()

    def test_dim_constant_matches_model(self, embedder: Embedder) -> None:
        """A single string must encode to a (1, EMBEDDING_DIM) matrix."""
        assert embedder.encode(["hello"]).shape == (1, EMBEDDING_DIM)

    def test_batch_encoding(self, embedder: Embedder) -> None:
        """Batches encode one row per input, as float32."""
        batch = embedder.encode(["hello", "world", "blood-brain barrier"])
        assert batch.shape == (3, EMBEDDING_DIM)
        assert batch.dtype == np.float32

    def test_empty_list_returns_empty_array(self, embedder: Embedder) -> None:
        """Encoding nothing yields a (0, EMBEDDING_DIM) array, not an error."""
        assert embedder.encode([]).shape == (0, EMBEDDING_DIM)

    def test_similar_strings_have_higher_similarity_than_dissimilar(
        self, embedder: Embedder
    ) -> None:
        """Two BBB-themed strings must be closer than a BBB/MRI pair."""
        vecs = embedder.encode([
            "blood-brain barrier permeability",
            "BBB drug penetration",
            "MRI multi-site harmonization",
        ])
        # cosine similarity (vectors should be normalized for stable comparison)
        from numpy.linalg import norm

        def cosine(a, b) -> float:
            return float(np.dot(a, b) / (norm(a) * norm(b)))

        sim_ab = cosine(vecs[0], vecs[1])
        sim_ac = cosine(vecs[0], vecs[2])
        assert sim_ab > sim_ac, f"Expected BBB-related strings closer; got {sim_ab=} vs {sim_ac=}"
tests/rag/test_ingest.py ADDED
@@ -0,0 +1,40 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Tests for src.rag.ingest — walk a directory, chunk, embed, persist."""
2
+ from __future__ import annotations
3
+
4
+ import shutil
5
+ from pathlib import Path
6
+
7
+ import pytest
8
+
9
+ from src.rag.ingest import ingest_directory
10
+ from src.rag.store import FAISSStore
11
+
12
+
13
+ _FIXTURE_KB = Path(__file__).parent.parent / "fixtures" / "kb_sample"
14
+
15
+
16
class TestIngestDirectory:
    """End-to-end tests: walk a KB directory, chunk, embed, persist."""

    def test_ingests_markdown_files(self, tmp_path: Path) -> None:
        """Ingesting the fixture KB writes a non-empty index + chunk store."""
        target = tmp_path / "idx"
        chunk_count = ingest_directory(_FIXTURE_KB, target)
        # at least one chunk per fixture file
        assert chunk_count > 0
        for artifact in ("index.bin", "chunks.json"):
            assert (target / artifact).exists()

    def test_loaded_store_is_searchable(self, tmp_path: Path) -> None:
        """A freshly ingested index reloads with chunk metadata intact."""
        target = tmp_path / "idx"
        ingest_directory(_FIXTURE_KB, target)
        from src.rag.embed import EMBEDDING_DIM

        store = FAISSStore.load(target, dim=EMBEDDING_DIM)
        assert len(store) > 0
        # NOTE(review): reaches into the private _chunks attribute — fine for
        # a white-box test, but it will break if the store's internals change.
        assert all("source" in chunk for chunk in store._chunks)
        assert all("text" in chunk for chunk in store._chunks)

    def test_empty_directory_creates_empty_index(self, tmp_path: Path) -> None:
        """An empty KB still produces an index file, just with zero chunks."""
        bare_kb = tmp_path / "empty_kb"
        bare_kb.mkdir()
        target = tmp_path / "idx"
        assert ingest_directory(bare_kb, target) == 0
        assert (target / "index.bin").exists()
tests/rag/test_retrieve.py ADDED
@@ -0,0 +1,45 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Tests for src.rag.retrieve — query → top-k chunks."""
2
+ from __future__ import annotations
3
+
4
+ from pathlib import Path
5
+
6
+ import pytest
7
+
8
+ from src.rag.ingest import ingest_directory
9
+ from src.rag.retrieve import RAGRetriever
10
+
11
+
12
+ _FIXTURE_KB = Path(__file__).parent.parent / "fixtures" / "kb_sample"
13
+
14
+
15
class TestRAGRetriever:
    """Query → top-k chunk retrieval against the fixture knowledge base."""

    @pytest.fixture(scope="class")
    def retriever(self, tmp_path_factory: pytest.TempPathFactory) -> RAGRetriever:
        # Build one shared index for the whole class — ingest is the slow part.
        index_dir = tmp_path_factory.mktemp("rag_idx")
        ingest_directory(_FIXTURE_KB, index_dir)
        return RAGRetriever.load(index_dir)

    def test_bbb_query_returns_lipinski_chunk(self, retriever: RAGRetriever) -> None:
        """A BBB question surfaces the Lipinski doc, and as the top hit."""
        hits = retriever.search("Why does ethanol cross the blood-brain barrier?", k=3)
        assert len(hits) == 3
        assert "lipinski_rule_of_five.md" in [hit["source"] for hit in hits]
        # the best-scoring hit should come from the Lipinski doc
        assert hits[0]["source"] == "lipinski_rule_of_five.md"

    def test_combat_query_returns_combat_chunk(self, retriever: RAGRetriever) -> None:
        """A harmonization question ranks the ComBat primer first."""
        top = retriever.search("How does ComBat remove scanner bias from MRI data?", k=2)[0]
        assert top["source"] == "combat_harmonization_primer.md"

    def test_eeg_query_returns_ica_chunk(self, retriever: RAGRetriever) -> None:
        """An artifact-removal question ranks the MNE/ICA doc first."""
        top = retriever.search("How do you remove eye blink artifacts from EEG?", k=2)[0]
        assert top["source"] == "mne_ica_basics.md"

    def test_search_includes_score_and_text(self, retriever: RAGRetriever) -> None:
        """Every hit carries text, source, and a float score in [0, 1]."""
        hit = retriever.search("BBB permeability", k=1)[0]
        for key in ("text", "source", "score"):
            assert key in hit
        assert isinstance(hit["score"], float)
        assert 0.0 <= hit["score"] <= 1.0
tests/rag/test_store.py ADDED
@@ -0,0 +1,70 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Tests for src.rag.store — FAISS vector store with metadata."""
2
+ from __future__ import annotations
3
+
4
+ from pathlib import Path
5
+
6
+ import numpy as np
7
+ import pytest
8
+
9
+ from src.rag.store import FAISSStore
10
+
11
+
12
+ def _rand_vecs(n: int, d: int = 4, seed: int = 0) -> np.ndarray:
13
+ rng = np.random.default_rng(seed)
14
+ return rng.standard_normal((n, d), dtype=np.float32)
15
+
16
+
17
+ class TestFAISSStore:
18
+ def test_add_then_search(self) -> None:
19
+ store = FAISSStore(dim=4)
20
+ vecs = _rand_vecs(3)
21
+ chunks = [{"text": f"chunk-{i}", "source": "test.md"} for i in range(3)]
22
+ store.add(vecs, chunks)
23
+ results = store.search(vecs[0], k=2)
24
+ assert len(results) == 2
25
+ # the closest hit is the chunk we used as the query (cosine ~1.0)
26
+ top_chunk, top_score = results[0]
27
+ assert top_chunk["text"] == "chunk-0"
28
+ assert top_score > 0.99
29
+
30
+ def test_add_size_mismatch_raises(self) -> None:
31
+ store = FAISSStore(dim=4)
32
+ with pytest.raises(ValueError, match="size mismatch"):
33
+ store.add(_rand_vecs(3), [{"text": "only-one"}])
34
+
35
+ def test_search_k_larger_than_corpus(self) -> None:
36
+ store = FAISSStore(dim=4)
37
+ store.add(_rand_vecs(2), [{"text": f"c{i}"} for i in range(2)])
38
+ results = store.search(_rand_vecs(1)[0], k=10)
39
+ assert len(results) == 2
40
+
41
+ def test_save_load_roundtrip(self, tmp_path: Path) -> None:
42
+ store = FAISSStore(dim=4)
43
+ vecs = _rand_vecs(3)
44
+ chunks = [{"text": f"chunk-{i}", "source": "test.md"} for i in range(3)]
45
+ store.add(vecs, chunks)
46
+ store.save(tmp_path / "idx")
47
+
48
+ restored = FAISSStore.load(tmp_path / "idx", dim=4)
49
+ results = restored.search(vecs[0], k=1)
50
+ assert results[0][0]["text"] == "chunk-0"
51
+
52
+ def test_search_on_empty_store_returns_empty(self) -> None:
53
+ store = FAISSStore(dim=4)
54
+ assert store.search(_rand_vecs(1)[0], k=5) == []
55
+
56
+ def test_add_does_not_mutate_caller_vectors(self) -> None:
57
+ store = FAISSStore(dim=4)
58
+ vecs = _rand_vecs(3)
59
+ original = vecs.copy()
60
+ store.add(vecs, [{"text": f"c{i}"} for i in range(3)])
61
+ # Caller's array must be unchanged after add() (faiss.normalize_L2 is in-place)
62
+ assert np.allclose(vecs, original), "store.add() mutated caller's vectors"
63
+
64
+ def test_search_does_not_mutate_caller_query(self) -> None:
65
+ store = FAISSStore(dim=4)
66
+ store.add(_rand_vecs(3), [{"text": f"c{i}"} for i in range(3)])
67
+ query = _rand_vecs(1)[0]
68
+ original_query = query.copy()
69
+ store.search(query, k=2)
70
+ assert np.allclose(query, original_query), "store.search() mutated caller's query"