chore: add real-llm-rationale plan + ignore .worktrees/
Plan covers dropping the template-only fallback in favor of real
OpenRouter LLM calls, with the template kept as an outage fallback only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
.gitignore (CHANGED)

```diff
@@ -28,6 +28,7 @@ mlartifacts/
 
 # Claude Code / agent tooling
 .sixth/
+.worktrees/
 
 # IDE
 .idea/
```
docs/superpowers/plans/2026-05-02-real-llm-rationale.md (ADDED, +634 lines)
# Real-LLM Rationale (drop the template-only fallback) Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Make `POST /explain/{bbb,eeg,mri}` return `source="llm"` end-to-end against OpenRouter, instead of the deterministic template — without removing the template (it stays as a true outage fallback, not the everyday path).

**Architecture:** The explainer at [src/llm/explainer.py](src/llm/explainer.py) already has both paths; the LLM path is silently failing because (a) the configured `OPENROUTER_API_KEY` returns **401 on every model** today, (b) the `_DEFAULT_FREE_MODEL_CHAIN` lists a mix of speculative IDs that may not exist on OpenRouter, and (c) 401 (auth) is not classified as a fatal, non-recoverable error — it is swept into the generic "fall back to template" branch, so a tester would never know auth was the cause. We fix auth → trim the chain to verified-live free models → add an explicit 401 short-circuit (plus a 400 skip-to-next-model branch) → add one network-gated integration test that proves an end-to-end LLM call returns `source="llm"`.
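
For orientation, here is the contract every later assertion leans on; the shape is inferred from this plan's own tests (Tasks 2 and 5), and the authoritative definition lives in `src/llm/explainer.py`:

```python
# Shape inferred from the plan's tests, not copied from the module:
# explain() degrades to the template rather than raising on provider errors.
from src.llm import explainer as ex

result = ex.explain(_FIXTURE_PAYLOAD_BBB, modality="bbb")

assert result["source"] in ("llm", "template")  # "llm" on a live OpenRouter answer
assert result["model"] is None or result["model"].endswith(":free")
assert result["rationale"]  # never empty, whichever path produced it
```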

**Tech Stack:** Python 3.12, `openai==1.51.0` (OpenRouter-compatible), pytest 8.x, FastAPI 0.115, Streamlit 1.x.

---

## Pre-flight

- This plan modifies a single module ([src/llm/explainer.py](src/llm/explainer.py)) plus its test file ([tests/llm/test_explainer.py](tests/llm/test_explainer.py)) and adds a one-shot diagnostic script. Blast radius is small; a feature branch in the current working tree is sufficient. Worktree isolation (`superpowers:using-git-worktrees`) is **not required** unless the engineer wants to keep `main` clean while polling OpenRouter for live model IDs.
- The user has indicated `.env` already holds an `OPENROUTER_API_KEY`. Task 1 verifies whether that key still works against any model. If every probe returns 401, the engineer must check with the user before proceeding past Task 1 — code changes can't fix an unauthorized key.
- Test discipline: the deterministic template path is the **source of truth** for unit tests (per the existing module docstring). The LLM path is exercised by **one** opt-in network-gated test that auto-skips when `OPENROUTER_API_KEY` is missing. Do not mock the OpenAI SDK at the unit-test layer for the new integration test — that defeats its purpose.

---

## File Structure

| File | Status | Responsibility |
|---|---|---|
| [src/llm/explainer.py](src/llm/explainer.py) | Modify | Trim model chain; classify 401/400 explicitly; surface auth failure at WARNING with an actionable hint |
| [tests/llm/test_explainer.py](tests/llm/test_explainer.py) | Modify | Add unit tests for the new 401/400 classifier + one network-gated end-to-end LLM test |
| `scripts/diagnose_openrouter.py` | Create | One-shot probe that lists which free model IDs respond OK vs 401/404 — used in Task 1 and again in Task 4 to re-confirm the chain |

---

### Task 1: Diagnose the live OpenRouter free-tier surface

**Files:**
- Create: `scripts/diagnose_openrouter.py`

This task produces the empirical evidence later tasks depend on. **Do not skip** — the current chain in [src/llm/explainer.py:62-73](src/llm/explainer.py#L62-L73) lists IDs like `inclusionai/ling-2.6-1t:free` that may never have existed on OpenRouter. We need the real list before we can trim.

- [ ] **Step 1: Create the diagnostic script**

```python
# scripts/diagnose_openrouter.py
"""Probe OpenRouter for which free-tier model IDs are reachable today.

Reads OPENROUTER_API_KEY from .env (or process env). Issues a single
8-token chat completion against a candidate list and prints one line per
model: status (OK / HTTP-code / exception name) + a 30-char preview of
the response when OK.

Use:
    python scripts/diagnose_openrouter.py

Or to probe a custom list:
    python scripts/diagnose_openrouter.py google/gemma-2-9b-it:free meta-llama/llama-3.2-3b-instruct:free
"""
from __future__ import annotations

import os
import sys
from pathlib import Path

# Manually parse .env without python-dotenv (some envs choke on its
# frame-introspection in heredocs / non-stack-rooted callers).
_env_path = Path(__file__).resolve().parent.parent / ".env"
if _env_path.exists():
    for raw in _env_path.read_text().splitlines():
        s = raw.strip()
        if not s or s.startswith("#") or "=" not in s:
            continue
        k, v = s.split("=", 1)
        # Strip optional surrounding quotes that .env files commonly carry.
        os.environ.setdefault(k.strip(), v.strip().strip("'\""))

if not os.environ.get("OPENROUTER_API_KEY"):
    sys.exit("OPENROUTER_API_KEY not set (looked in env and .env)")

# Candidate list: well-known stable free-tier IDs as of 2026-Q2.
# Update by replacing this list — the script is a probe, not a config source.
DEFAULT_CANDIDATES = [
    "google/gemma-2-9b-it:free",
    "google/gemini-2.0-flash-exp:free",
    "meta-llama/llama-3.2-3b-instruct:free",
    "meta-llama/llama-3.3-70b-instruct:free",
    "mistralai/mistral-7b-instruct:free",
    "qwen/qwen-2.5-72b-instruct:free",
    "deepseek/deepseek-r1:free",
    "deepseek/deepseek-chat:free",
    "nousresearch/hermes-3-llama-3.1-405b:free",
    "microsoft/phi-3-mini-128k-instruct:free",
]

candidates = sys.argv[1:] or DEFAULT_CANDIDATES

from openai import (  # noqa: E402 (after env load)
    OpenAI, APIStatusError, APIConnectionError, RateLimitError, APITimeoutError,
)

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
    timeout=15.0,
)

for m in candidates:
    try:
        c = client.chat.completions.create(
            model=m,
            messages=[{"role": "user", "content": "Reply with the single word OK."}],
            max_tokens=8,
            temperature=0,
        )
        text = (c.choices[0].message.content or "").strip()
        print(f" OK    {m} → {text[:30]!r}")
    except RateLimitError:
        # Must come before APIStatusError: RateLimitError subclasses it,
        # so the generic branch would otherwise shadow this one.
        print(f" 429   {m} (rate-limited)")
    except APIStatusError as e:
        code = getattr(e, "status_code", "?")
        print(f" {code:<5} {m}")
    except (APIConnectionError, APITimeoutError) as e:
        print(f" CONN  {m} ({type(e).__name__})")
    except Exception as e:  # noqa: BLE001 — a probe should never crash mid-list
        print(f" ERR   {m} ({type(e).__name__}: {e})")
```

- [ ] **Step 2: Run the diagnostic**

```bash
python scripts/diagnose_openrouter.py
```

Expected output: one line per candidate, status code + preview.

- [ ] **Step 3: Branch on the result**

- If **at least one** model returns `OK ... → 'OK'` (or any non-empty text):
  - Record the OK model IDs — they become the new chain in Task 4.
  - Continue to Task 2.
- If **every** line shows `401`:
  - The API key is unauthorized. **Stop and check with the user.** Likely causes: key revoked, wrong account, missing free-tier opt-in at https://openrouter.ai/settings/privacy (some free models require enabling "free model training" data sharing). Do not edit code — the chain doesn't matter while auth is broken.
- If lines show a mix of 401 and 404:
  - 401 = auth failure (still blocking). Treat it the same as the all-401 case above.
- If lines show `404` for all:
  - The chain candidates are all retired. Replace `DEFAULT_CANDIDATES` with fresh IDs from the OpenRouter models endpoint (one-liner expanded below) and re-run.
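
The refresh one-liner from that last branch, spelled out. This assumes the public `/api/v1/models` response keeps its current shape, with `pricing.prompt` serialized as a string:

```bash
# List currently-free model IDs (pricing.prompt == "0" marks the free tier).
curl -s https://openrouter.ai/api/v1/models \
  | jq -r '.data[] | select(.pricing.prompt == "0") | .id' \
  | sort
```

Paste the survivors into `DEFAULT_CANDIDATES` and re-run the probe; the probe, not this listing, is the source of truth for which IDs actually answer.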

- [ ] **Step 4: Commit the diagnostic script**

```bash
git add scripts/diagnose_openrouter.py
git commit -m "chore(llm): one-shot OpenRouter free-tier reachability probe"
```

---

### Task 2: Lock current behavior with unit tests for the new 401 classifier

**Files:**
- Modify: `tests/llm/test_explainer.py` — add failing tests before Task 3 changes the production code

This is TDD discipline: write the tests that prove the new behavior **before** writing the code. The key test asserts that an unauthorized response (401) classifies as fatal-no-retry — `_llm_explain` returns `None` immediately after one model attempt instead of trying every model in the chain.

- [ ] **Step 1: Read the existing test file structure**

```bash
sed -n '1,40p' tests/llm/test_explainer.py
```

Expected: confirm the file uses pytest classes + `monkeypatch.setenv`, and that an `_FIXTURE_PAYLOAD_BBB` (or similar) is defined.

- [ ] **Step 2: Write the failing tests**

Append the following at the bottom of [tests/llm/test_explainer.py](tests/llm/test_explainer.py). The fixture name in the existing file is `_FIXTURE_PAYLOAD_BBB` — confirm by grep before pasting; if it differs, swap in whatever the file already exports.

> **Monkeypatch target subtlety:** `src/llm/explainer.py` does `from openai import OpenAI` **inside** `_llm_explain` (the import is local to the function), so `monkeypatch.setattr(ex, "OpenAI", ...)` would silently no-op (the module-level attribute doesn't exist and the function rebinds locally each call). We must patch the `openai` module itself: `monkeypatch.setattr("openai.OpenAI", factory)`. The local `from openai import OpenAI` then resolves to our stub.

```python
class TestAuthFailureShortCircuits:
    """A 401 from OpenRouter means the key is unauthorized — every model
    in the chain will fail the same way, so we must short-circuit instead
    of burning the full chain on every request."""

    def test_401_short_circuits_to_template_after_one_attempt(self, monkeypatch):
        from src.llm import explainer as ex
        from openai import APIStatusError
        import httpx

        monkeypatch.delenv("NEUROBRIDGE_DISABLE_LLM", raising=False)
        monkeypatch.setenv("OPENROUTER_API_KEY", "sk-or-v1-deliberately-bad")

        attempts: list[str] = []

        def _raise_401(**kwargs):
            attempts.append(kwargs["model"])
            req = httpx.Request("POST", "https://openrouter.ai/api/v1/chat/completions")
            resp = httpx.Response(status_code=401, request=req)
            raise APIStatusError(message="No auth credentials found", response=resp, body={})

        class _StubCompletions:
            create = staticmethod(_raise_401)

        class _StubChat:
            completions = _StubCompletions()

        class _StubClient:
            chat = _StubChat()

            def __init__(self, **kwargs):
                pass

        # Must patch on the `openai` module — the explainer does
        # `from openai import OpenAI` *inside* the function (see
        # src/llm/explainer.py:269-275), so any module-level attribute
        # on `src.llm.explainer` would be a no-op.
        monkeypatch.setattr("openai.OpenAI", _StubClient)

        out = ex._llm_explain(_FIXTURE_PAYLOAD_BBB, modality="bbb")

        assert out is None, "401 must surface as a None return (caller falls back to template)"
        assert len(attempts) == 1, f"401 must short-circuit; tried {len(attempts)} models: {attempts}"

    def test_explain_returns_template_source_on_401(self, monkeypatch):
        from src.llm import explainer as ex
        from openai import APIStatusError
        import httpx

        monkeypatch.delenv("NEUROBRIDGE_DISABLE_LLM", raising=False)
        monkeypatch.setenv("OPENROUTER_API_KEY", "sk-or-v1-deliberately-bad")

        def _raise_401(**kwargs):
            req = httpx.Request("POST", "https://openrouter.ai/api/v1/chat/completions")
            raise APIStatusError(
                message="auth",
                response=httpx.Response(401, request=req),
                body={},
            )

        class _Comp:
            create = staticmethod(_raise_401)

        class _Chat:
            completions = _Comp()

        class _Client:
            chat = _Chat()

            def __init__(self, **kwargs):
                pass

        monkeypatch.setattr("openai.OpenAI", _Client)

        result = ex.explain(_FIXTURE_PAYLOAD_BBB, modality="bbb")

        assert result["source"] == "template"
        assert result["model"] is None
        assert result["rationale"], "rationale must never be empty"

    def test_400_advances_to_next_model_instead_of_short_circuiting(self, monkeypatch):
        """A 400 from one model is a prompt-shape mismatch with THAT model
        (some models reject system roles, etc.) — try the next, don't give up."""
        from src.llm import explainer as ex
        from openai import APIStatusError
        import httpx

        monkeypatch.delenv("NEUROBRIDGE_DISABLE_LLM", raising=False)
        monkeypatch.setenv("OPENROUTER_API_KEY", "sk-or-v1-anything")

        attempts: list[str] = []
        # Force a known multi-model chain so we can count attempts deterministically
        monkeypatch.setenv("OPENROUTER_FREE_MODELS", "model-a:free,model-b:free,model-c:free")

        def _raise_400(**kwargs):
            attempts.append(kwargs["model"])
            req = httpx.Request("POST", "https://openrouter.ai/api/v1/chat/completions")
            raise APIStatusError(
                message="bad request",
                response=httpx.Response(400, request=req),
                body={},
            )

        class _Comp:
            create = staticmethod(_raise_400)

        class _Chat:
            completions = _Comp()

        class _Client:
            chat = _Chat()

            def __init__(self, **kwargs):
                pass

        monkeypatch.setattr("openai.OpenAI", _Client)

        out = ex._llm_explain(_FIXTURE_PAYLOAD_BBB, modality="bbb")

        assert out is None, "all models 400'd → must return None for template fallback"
        assert attempts == ["model-a:free", "model-b:free", "model-c:free"], (
            f"400 must advance to next model; got attempts={attempts}"
        )
```

- [ ] **Step 3: Run the new tests — at least one MUST fail**

```bash
pytest tests/llm/test_explainer.py::TestAuthFailureShortCircuits -v
```

Expected:
- `test_400_advances_to_next_model_instead_of_short_circuiting` → **FAIL** (current code at [src/llm/explainer.py:303-310](src/llm/explainer.py#L303-L310) treats 400 as fatal and returns `None` after the first model, so `attempts` will equal `["model-a:free"]`, not the full chain).
- The two 401 tests may pass by accident with current code (the catch-all `return None` already short-circuits on any unclassified status). They stay as regression guards — Task 3 explicitly classifies 401 with an actionable log message.

This is the correct TDD red: at least one test fails on a behavior we are about to implement.

- [ ] **Step 4: Commit the failing tests**

```bash
git add tests/llm/test_explainer.py
git commit -m "test(llm): pin 401 short-circuit + 400 try-next-model behavior (red)"
```

---

### Task 3: Add explicit 401/400 classification with an actionable WARNING

**Files:**
- Modify: `src/llm/explainer.py:303-317`

The current code lumps "real auth failure" with "transient model error" in one branch. We split them so the logs make the diagnosis obvious.

- [ ] **Step 1: Re-read the current except block to make the edit precise**

```bash
sed -n '292,320p' src/llm/explainer.py
```

- [ ] **Step 2: Replace the `APIStatusError` block with explicit classification**

Apply this edit to [src/llm/explainer.py](src/llm/explainer.py). Match the existing `except APIStatusError as e:` block (currently at line 303) exactly:

```python
except APIStatusError as e:
    status = getattr(e, "status_code", None)
    # 401 = unauthorized — the key is bad, no model in this chain
    # will succeed. Surface a loud, actionable hint and bail.
    if status == 401:
        logger.warning(
            "OpenRouter 401 unauthorized on %s. The OPENROUTER_API_KEY "
            "is rejected — verify it is current at "
            "https://openrouter.ai/keys and that free-model data-sharing "
            "is enabled at https://openrouter.ai/settings/privacy. "
            "Falling back to deterministic template.",
            model,
        )
        return None
    # 400 = malformed prompt for this specific model (e.g. it
    # rejected our system role). Skip this model, try the next.
    if status == 400:
        logger.info(
            "OpenRouter 400 on %s (likely prompt-shape mismatch); "
            "advancing to next free model.", model,
        )
        continue
    # 402 credits / 403 access / 404 retired-id / 429 quota (if not
    # already caught as RateLimitError) / 5xx upstream → next model.
    if status in (402, 403, 404, 429) or (status is not None and 500 <= status < 600):
        logger.info("OpenRouter %s on %s; advancing to next free model.", status, model)
        continue
    logger.warning("LLM call failed on %s (%s); falling back to template.", model, e)
    return None
```

- [ ] **Step 3: Run the new tests — they MUST pass now**

```bash
pytest tests/llm/test_explainer.py::TestAuthFailureShortCircuits -v
```

Expected: all three PASS.

- [ ] **Step 4: Run the full LLM-explainer test suite to confirm no regressions**

```bash
pytest tests/llm/ -v
```

Expected: all template-path tests still pass (they should — they're gated behind `NEUROBRIDGE_DISABLE_LLM=1` and untouched).

- [ ] **Step 5: Commit**

```bash
git add src/llm/explainer.py
git commit -m "feat(llm): classify 401 as fatal+actionable, 400 as skip-this-model"
```

---

### Task 4: Refresh `_DEFAULT_FREE_MODEL_CHAIN` with verified-live IDs

**Files:**
- Modify: `src/llm/explainer.py:62-73`

Use the OK list from Task 1's diagnostic. The chain should be ordered **smartest → smallest** so the best model is tried first; quota-exhausted models advance to the next.

- [ ] **Step 1: Re-run the diagnostic to confirm the chain is still live**

```bash
python scripts/diagnose_openrouter.py
```

Expected: at least 3 lines marked `OK`. Capture them.

- [ ] **Step 2: Replace the chain in [src/llm/explainer.py:62-73](src/llm/explainer.py#L62-L73)**

The exact replacement depends on Task 1's results. Example assuming Step 1 confirms `gemma-2-9b-it`, `llama-3.3-70b-instruct`, `mistral-7b-instruct`, and `llama-3.2-3b-instruct` are OK:

```python
# Free-tier fallback chain, smartest → smallest. When a model returns 429
# (rate-limit / daily-quota exhausted), 402 (credits), 404 (id retired) or
# 5xx (upstream), we advance to the next model. Network/timeout errors fall
# straight to the deterministic template — switching models won't help.
# Override at runtime via OPENROUTER_FREE_MODELS (comma-separated). Model
# availability on OpenRouter churns; verify with scripts/diagnose_openrouter.py.
_DEFAULT_FREE_MODEL_CHAIN: tuple[str, ...] = (
    "meta-llama/llama-3.3-70b-instruct:free",  # 70B reasoning-capable
    "google/gemma-2-9b-it:free",               # 9B instruct, fast
    "mistralai/mistral-7b-instruct:free",      # 7B last-resort
    "meta-llama/llama-3.2-3b-instruct:free",   # 3B emergency
)
```

If Task 1 returned different OK IDs, substitute them; preserve the smartest-first ordering.
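
For reference, a hypothetical sketch of how the `OPENROUTER_FREE_MODELS` override mentioned in the comment could resolve; the helper name is invented here, and the real parsing already lives in `src/llm/explainer.py`:

```python
import os

def _resolve_model_chain() -> tuple[str, ...]:
    # Hypothetical helper name; the real resolution lives in src/llm/explainer.py.
    # OPENROUTER_FREE_MODELS="a:free,b:free" overrides the default chain,
    # which is exactly what Task 2's 400-advances test relies on.
    raw = os.environ.get("OPENROUTER_FREE_MODELS", "")
    override = tuple(m.strip() for m in raw.split(",") if m.strip())
    return override or _DEFAULT_FREE_MODEL_CHAIN
```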
| 425 |
+
|
| 426 |
+
- [ ] **Step 3: Re-run the unit suite — must still pass**
|
| 427 |
+
|
| 428 |
+
```bash
|
| 429 |
+
pytest tests/llm/ -v
|
| 430 |
+
```
|
| 431 |
+
|
| 432 |
+
Expected: all green. The chain change is semantic-only (no test asserts specific model IDs).
|
| 433 |
+
|
| 434 |
+
- [ ] **Step 4: Commit**
|
| 435 |
+
|
| 436 |
+
```bash
|
| 437 |
+
git add src/llm/explainer.py
|
| 438 |
+
git commit -m "feat(llm): refresh free-tier chain with verified-live OpenRouter IDs"
|
| 439 |
+
```
|
| 440 |
+
|
| 441 |
+
---
|
| 442 |
+
|
| 443 |
+
### Task 5: Add one network-gated end-to-end LLM integration test
|
| 444 |
+
|
| 445 |
+
**Files:**
|
| 446 |
+
- Modify: `tests/llm/test_explainer.py` — append a new class
|
| 447 |
+
|
| 448 |
+
The unit suite proves classifier behavior with mocked errors. This test proves the **real** path: with a working key, `explain()` returns `source="llm"` and a non-empty rationale. It auto-skips when the key is missing so CI without secrets stays green.
|
| 449 |
+
|
| 450 |
+
- [ ] **Step 1: Append the integration test**
|
| 451 |
+
|
| 452 |
+
Add at the bottom of [tests/llm/test_explainer.py](tests/llm/test_explainer.py):
|
| 453 |
+
|
| 454 |
+
```python
|
| 455 |
+
import os as _os
|
| 456 |
+
|
| 457 |
+
import pytest as _pytest
|
| 458 |
+
|
| 459 |
+
|
| 460 |
+
@_pytest.mark.skipif(
|
| 461 |
+
not _os.environ.get("OPENROUTER_API_KEY"),
|
| 462 |
+
reason="OPENROUTER_API_KEY not set — skipping live LLM integration test",
|
| 463 |
+
)
|
| 464 |
+
@_pytest.mark.skipif(
|
| 465 |
+
_os.environ.get("NEUROBRIDGE_DISABLE_LLM") == "1",
|
| 466 |
+
reason="NEUROBRIDGE_DISABLE_LLM=1 — skipping live LLM integration test",
|
| 467 |
+
)
|
| 468 |
+
class TestLiveOpenRouterLLM:
|
| 469 |
+
"""End-to-end: hit a real OpenRouter free-tier model and assert
|
| 470 |
+
`explain()` returns source='llm' with non-empty content. Skipped
|
| 471 |
+
when no key is set or the kill-switch is on."""
|
| 472 |
+
|
| 473 |
+
def test_bbb_explain_returns_llm_source_with_real_key(self):
|
| 474 |
+
from src.llm import explainer as ex
|
| 475 |
+
|
| 476 |
+
result = ex.explain(_FIXTURE_PAYLOAD_BBB, modality="bbb")
|
| 477 |
+
|
| 478 |
+
# If every model in the chain is rate-limited or unreachable RIGHT NOW
|
| 479 |
+
# the result will fall back to template — that's a flaky-network
|
| 480 |
+
# condition, not a code bug. Surface it as an XFAIL-style assertion
|
| 481 |
+
# message instead of a hard failure.
|
| 482 |
+
if result["source"] == "template":
|
| 483 |
+
_pytest.skip(
|
| 484 |
+
"All free models in the chain were rate-limited or unreachable "
|
| 485 |
+
"at test time. Re-run later or run scripts/diagnose_openrouter.py."
|
| 486 |
+
)
|
| 487 |
+
|
| 488 |
+
assert result["source"] == "llm"
|
| 489 |
+
assert result["model"] is not None and result["model"].endswith(":free")
|
| 490 |
+
assert result["rationale"].strip(), "LLM returned empty rationale"
|
| 491 |
+
# Sanity: the rationale should mention SOMETHING about the prediction.
|
| 492 |
+
# We do not assert on exact model wording (non-deterministic), but
|
| 493 |
+
# we do assert it isn't a generic refusal/safety-filter response.
|
| 494 |
+
lowered = result["rationale"].lower()
|
| 495 |
+
assert not lowered.startswith("i cannot"), f"LLM refused: {result['rationale']!r}"
|
| 496 |
+
```
|
| 497 |
+
|
| 498 |
+
- [ ] **Step 2: Run the integration test**
|
| 499 |
+
|
| 500 |
+
```bash
|
| 501 |
+
pytest tests/llm/test_explainer.py::TestLiveOpenRouterLLM -v -s
|
| 502 |
+
```
|
| 503 |
+
|
| 504 |
+
Expected (with a working key, post-Task 1 fix): PASS, with `-s` showing OpenRouter response in the WARNING/INFO logs if any.
|
| 505 |
+
|
| 506 |
+
If it skips with "rate-limited or unreachable": wait 60s and retry. If it skips with "OPENROUTER_API_KEY not set": Task 1's auth issue is unresolved — go back to Task 1 Step 3.
|
| 507 |
+
|
| 508 |
+
- [ ] **Step 3: Run the FULL test suite to confirm 188 → 190 (or higher)**
|
| 509 |
+
|
| 510 |
+
```bash
|
| 511 |
+
pytest -q --tb=line
|
| 512 |
+
```
|
| 513 |
+
|
| 514 |
+
Expected: previous count + 2 new passing unit tests + 1 new (passing or skipping) integration test. **Zero failures.**
|
| 515 |
+
|
| 516 |
+
- [ ] **Step 4: Commit**
|
| 517 |
+
|
| 518 |
+
```bash
|
| 519 |
+
git add tests/llm/test_explainer.py
|
| 520 |
+
git commit -m "test(llm): add network-gated end-to-end OpenRouter integration test"
|
| 521 |
+
```
|
| 522 |
+
|
| 523 |
+
---
|
| 524 |
+
|
| 525 |
+
### Task 6: End-to-end live verification through FastAPI + Streamlit
|
| 526 |
+
|
| 527 |
+
**Files:** none (verification only)
|
| 528 |
+
|
| 529 |
+
Confirm the wiring works the same way the user's UI smoke-test did, but with LLM **enabled**.
|
| 530 |
+
|
| 531 |
+
- [ ] **Step 1: Start FastAPI WITHOUT the kill-switch**
|
| 532 |
+
|
| 533 |
+
```bash
|
| 534 |
+
NEUROBRIDGE_DISABLE_MLFLOW=1 \
|
| 535 |
+
uvicorn src.api.main:app --host 127.0.0.1 --port 8000 --log-level info &
|
| 536 |
+
sleep 4
|
| 537 |
+
curl -s http://127.0.0.1:8000/health | python -m json.tool
|
| 538 |
+
```
|
| 539 |
+
|
| 540 |
+
Expected: `{"status":"ok","pipelines":["bbb","eeg","mri"]}`. **Note the absence of `NEUROBRIDGE_DISABLE_LLM`** — that's the whole point.
|
| 541 |
+
|
| 542 |
+
- [ ] **Step 2: Hit /explain/bbb with a real prediction payload**
|
| 543 |
+
|
| 544 |
+
```bash
|
| 545 |
+
curl -s -X POST http://127.0.0.1:8000/explain/bbb \
|
| 546 |
+
-H 'Content-Type: application/json' \
|
| 547 |
+
-d '{
|
| 548 |
+
"smiles": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",
|
| 549 |
+
"label": 1,
|
| 550 |
+
"label_text": "permeable",
|
| 551 |
+
"confidence": 0.98,
|
| 552 |
+
"top_features": [
|
| 553 |
+
{"feature":"fp_1822","shap_value":0.0796},
|
| 554 |
+
{"feature":"fp_1224","shap_value":0.0637},
|
| 555 |
+
{"feature":"fp_1323","shap_value":0.0570}
|
| 556 |
+
]
|
| 557 |
+
}' | python -m json.tool
|
| 558 |
+
```
|
| 559 |
+
|
| 560 |
+
Expected JSON: `"source": "llm"`, `"model": "<one of the chain ids>"`, `"rationale": "<2-4 free-form sentences mentioning caffeine / permeability / SHAP>"`. **Not** `"source": "template"`.
|
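
Illustrative shape only; the model id depends on which chain entry answered, and the rationale text is non-deterministic:

```json
{
    "source": "llm",
    "model": "meta-llama/llama-3.3-70b-instruct:free",
    "rationale": "Caffeine is predicted permeable with 98% confidence; the top SHAP fingerprint bits point to small, lipophilic substructures consistent with BBB penetration."
}
```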

If `"source": "template"` comes back instead: check the uvicorn log for the WARNING line added in Task 3 — it will tell you whether the cause was a 401 (key issue), all models exhausted (quota/network), or something else.

- [ ] **Step 3: Hit /explain/eeg and /explain/mri**

```bash
curl -s -X POST http://127.0.0.1:8000/explain/eeg \
  -H 'Content-Type: application/json' \
  -d '{"rows": 62, "columns": 640, "duration_sec": 1.86, "mlflow_run_id": "test"}' \
  | python -m json.tool

curl -s -X POST http://127.0.0.1:8000/explain/mri \
  -H 'Content-Type: application/json' \
  -d '{"site_gap_pre": 8975.3, "site_gap_post": 3057.6, "reduction_factor": 3, "n_subjects": 6}' \
  | python -m json.tool
```

Expected: both return `"source": "llm"` with modality-appropriate prose.

- [ ] **Step 4: Start Streamlit and load the UI**

```bash
NEUROBRIDGE_API_URL=http://127.0.0.1:8000 \
NEUROBRIDGE_DISABLE_MLFLOW=1 \
streamlit run src/frontend/app.py --server.port 8501 \
  --server.headless true --browser.gatherUsageStats false &
sleep 5
curl -s -o /dev/null -w "HTTP %{http_code}\n" http://127.0.0.1:8501/
```

Expected: HTTP 200.

- [ ] **Step 5: Manually verify the UI status badge flipped**

Open http://127.0.0.1:8501 in a browser. The Molecule (BBB) tab header should show `explainer · llm online` (green dot), **not** `explainer · template only` (muted). The status-line render is at [src/frontend/app.py:961-977](src/frontend/app.py#L961-L977) and depends on `_LLM_DISABLED`, which reads `NEUROBRIDGE_DISABLE_LLM` at import time — since we did not set it, it should be False.
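
Assumed shape of that flag (the real definition lives in `src/frontend/app.py`; this plan only confirms the env-var name and the import-time read):

```python
import os

# Read once at import time, which is why a leaked NEUROBRIDGE_DISABLE_LLM
# requires restarting Streamlit rather than unsetting it mid-session.
_LLM_DISABLED = os.environ.get("NEUROBRIDGE_DISABLE_LLM") == "1"
```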
| 596 |
+
|
| 597 |
+
Then: predict a SMILES (e.g. caffeine `CN1C=NC2=C1C(=O)N(C(=O)N2C)C`), click the AI Assistant tab, generate a rationale. The rationale text should be free-form prose (not the templated "Predicted **X** with N% confidence." sentence pattern). The AI Assistant tab status indicator at [src/frontend/app.py:1056-1062](src/frontend/app.py#L1056-L1062) should also read `llm · online`.
|
| 598 |
+
|
| 599 |
+
If the badge still says `template only`: the env var leaked from a parent shell. `unset NEUROBRIDGE_DISABLE_LLM` and restart Streamlit.
|
| 600 |
+
|
| 601 |
+
- [ ] **Step 6: Tear down**
|
| 602 |
+
|
| 603 |
+
```bash
|
| 604 |
+
pkill -f "uvicorn src.api.main"
|
| 605 |
+
pkill -f "streamlit run src/frontend"
|
| 606 |
+
sleep 2
|
| 607 |
+
lsof -iTCP:8000 -sTCP:LISTEN 2>/dev/null
|
| 608 |
+
lsof -iTCP:8501 -sTCP:LISTEN 2>/dev/null
|
| 609 |
+
echo "(both empty = down)"
|
| 610 |
+
```
|
| 611 |
+
|
| 612 |
+
- [ ] **Step 7: No commit (verification-only task)**
|
| 613 |
+
|
| 614 |
+
If Step 2 or Step 5 surfaced any issue, fix it in the relevant earlier task and re-run from Step 1. Do not paper over a `source: "template"` response with a follow-up commit — root-cause it.
|
| 615 |
+
|
| 616 |
+
---
|
| 617 |
+
|
| 618 |
+
## Self-Review Checklist (run before declaring done)
|
| 619 |
+
|
| 620 |
+
- [ ] `pytest -q` reports the previous baseline + 2 new passing unit tests + 1 new passing-or-skipping integration test, zero failures.
|
| 621 |
+
- [ ] `python scripts/diagnose_openrouter.py` lists ≥1 OK model among the IDs hard-coded in `_DEFAULT_FREE_MODEL_CHAIN`.
|
| 622 |
+
- [ ] `curl /explain/bbb` with a real payload returns `"source": "llm"`.
|
| 623 |
+
- [ ] Streamlit BBB tab badge shows `explainer · llm online`, AI Assistant tab badge shows `llm · online`.
|
| 624 |
+
- [ ] Module docstring at [src/llm/explainer.py:1-10](src/llm/explainer.py#L1-L10) is still accurate (template = source of truth for unit tests, LLM = primary path in production).
|
| 625 |
+
- [ ] `NEUROBRIDGE_DISABLE_LLM=1` still forces template (existing test `test_disable_flag_forces_template_even_with_key_set` still passes — kill-switch preserved).
|
| 626 |
+
|
| 627 |
+
---
|
| 628 |
+
|
| 629 |
+
## Out of Scope (explicit non-goals)
|
| 630 |
+
|
| 631 |
+
- **Removing the template entirely.** Template stays as the outage fallback. The user said "remove from template" not "remove the template" — and even if they meant the latter, removing the template would mean a network blip = HTTP 500 from `/explain/*`, which the system-reliability shape of the project explicitly avoids (see [src/llm/explainer.py:1-10](src/llm/explainer.py#L1-L10) and the existing `test_disable_flag_forces_template_even_with_key_set` test).
|
| 632 |
+
- **Switching to a paid model / different provider.** The free-tier story is part of the hackathon narrative ("public-deployable on HF Spaces with one push"). Anthropic / OpenAI direct integration is a separate plan.
|
| 633 |
+
- **Streaming responses.** OpenRouter supports SSE streaming but neither the current API contract (`BBBExplainResponse` is a single string) nor the Streamlit UI ask for it.
|
| 634 |
+
- **Caching identical (payload, model) pairs.** Could halve latency for repeat clicks but adds a cache-invalidation surface; defer until a user actually complains about latency.
|