mekosotto committed on
Commit
3acc658
·
1 Parent(s): decc9ff

md updates

Files changed (3)
  1. AGENTS.md +36 -7
  2. PROJECT_OVERVIEW.md +14 -7
  3. README.md +19 -7
AGENTS.md CHANGED
@@ -214,24 +214,51 @@ renders a one-line caption with a magnitude tag (in-band, mild,
  significant). Worker restart clears the deque; this is acceptable for
  demo and removes the audit-trail concern.

- ## 11. LLM Explainer Surface (Day 7)
+ ## 11. LLM Explainer Surface (Day 7 + 9)

  `src/llm/explainer.py` is the single entry point for natural-language
  rationales. `explain(payload)` always returns `{rationale, source,
  model}`. The deterministic template path is the source of truth for
- tests; the LLM path is OpenRouter via the `openai==1.51.0` SDK using
- `meta-llama/llama-3.2-3b-instruct:free`. Two env knobs control the
- behavior:
+ tests; the LLM path is OpenRouter via the `openai==1.51.0` SDK and
+ walks a **smartest-to-smallest free-tier fallback chain**
+ (`_DEFAULT_FREE_MODEL_CHAIN`, 10 ids — head: `inclusionai/ling-2.6-1t:free`).
+ The chain is overridable at runtime via `OPENROUTER_FREE_MODELS`
+ (comma-separated). Status-code classification:
+
+ - `401` → the key is bad → bail to template + actionable WARNING (rotate at
+   https://openrouter.ai/keys, enable free-model data-sharing at
+   https://openrouter.ai/settings/privacy).
+ - `400` → prompt-shape mismatch on this model → advance to the next.
+ - `402 / 403 / 404 / 429 / 5xx` → advance to the next.
+ - Network/timeout → bail to template (switching models won't help).
+
+ Two env knobs control the gate:

  - `OPENROUTER_API_KEY` — when absent, fall back to template.
  - `NEUROBRIDGE_DISABLE_LLM=1` — hard kill-switch; force template even
    if a key is set. Use this for demo days when you want fully
    deterministic, reproducible rationales.

+ **Prompt design** (`_build_llm_prompt`): two intent modes. When the
+ caller supplies `user_question`, the model is instructed to
+ language-match (Turkish question → Turkish answer), answer the
+ question directly (not a canned paper-style summary), and respond
+ conversationally to off-topic / greeting questions. When no
+ `user_question` is supplied, it falls back to the original 2-4 sentence
+ paper-style rationale.
+
  The `POST /explain/bbb` endpoint mirrors this contract. Pydantic
  enforces a non-empty `top_features` list (422 on empty); every other
  failure mode degrades to template + WARNING log + `source="template"`.

+ **Diagnostics**: `GET /diag/openrouter` (`src/api/main.py`) returns
+ key presence (length + 12-char prefix only), kill-switch state, chain
+ length, first model id, and the result of an 8-token probe call
+ against that model. Surfaced in Streamlit as the sidebar "🔧 Diagnose
+ LLM" button. Use it when the deployed Space shows `source="template"`
+ unexpectedly — the most common causes are a missing/misnamed
+ `OPENROUTER_API_KEY` Space secret or a revoked key.
+
  ## 12. Multi-Modal Explainer (Day 8)

  `src/llm/explainer.py` exposes `explain(payload, modality)` where
@@ -270,9 +297,11 @@ bakes the model artifact into the image so the first `/predict/bbb`
  call is instant on cold start.

  Default environment: `DEPLOY_ENV=hf_spaces`,
- `NEUROBRIDGE_DISABLE_MLFLOW=1`, `NEUROBRIDGE_DISABLE_LLM=1`.
- Operators can opt back into LLM by setting `OPENROUTER_API_KEY` in
- the HF Space's Secrets panel and unsetting the disable flag.
+ `NEUROBRIDGE_DISABLE_MLFLOW=1`. The LLM kill-switch is **not** set —
+ deployed Spaces use the real OpenRouter free-tier chain (§11) when
+ `OPENROUTER_API_KEY` is configured in the Space's Secrets panel. Set
+ `NEUROBRIDGE_DISABLE_LLM=1` only when you want to force the
+ deterministic template path for a fully reproducible demo.

  The README's YAML front-matter declares the Space metadata
  (SDK=docker, port=7860, app_file=src/frontend/app.py).
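The §11 chain walk and status-code classification can be sketched in a few lines. This is a minimal sketch, not the real `src/llm/explainer.py`: `DEFAULT_CHAIN`, `active_chain`, `walk_chain`, and the `call(model) -> (status, text)` shape are illustrative stand-ins (the real chain holds 10 ids and goes through the `openai` SDK):

```python
import logging
import os

logger = logging.getLogger("explainer")

# Illustrative stand-in: the real _DEFAULT_FREE_MODEL_CHAIN holds 10 ids.
DEFAULT_CHAIN = ["inclusionai/ling-2.6-1t:free", "poolside/laguna-xs.2:free"]


def active_chain():
    """OPENROUTER_FREE_MODELS (comma-separated) overrides the default chain."""
    raw = os.getenv("OPENROUTER_FREE_MODELS", "")
    override = [m.strip() for m in raw.split(",") if m.strip()]
    return override or list(DEFAULT_CHAIN)


def walk_chain(call, chain=None):
    """Try each model in order. `call(model)` returns (status, text):
    status 0 = success, -1 = network/timeout, else an HTTP status code."""
    for model in (chain if chain is not None else active_chain()):
        status, text = call(model)
        if status == 0:
            return {"rationale": text, "source": "llm", "model": model}
        if status == 401:
            # Bad key: no model in the chain will help -> template + WARNING.
            logger.warning("OpenRouter 401 - rotate key at https://openrouter.ai/keys")
            break
        if status == -1:
            # Network/timeout: switching models won't help -> template.
            break
        # 400 (prompt-shape mismatch), 402/403/404/429/5xx: advance to next.
    return {"rationale": "deterministic template rationale",
            "source": "template", "model": None}
```

A `401` or a network error abandons the chain immediately; every other failure code just advances to the next model, and exhausting the chain degrades to the template contract (`source="template"`, `model` null).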
PROJECT_OVERVIEW.md CHANGED
@@ -120,7 +120,7 @@ When you make a BBB prediction you see 7 separate signals on the decision card. Each
  | 4 | **Calibration caption** | "Predictions at ≥75% confidence hit 92% precision on the hold-out test (n=18)" | 80/20 split at train time, 6 threshold bins, lookup |
  | 5 | **Drift caption** | "trailing-100 confidence median +0.42σ from train (within range)" | Module-level `deque(maxlen=100)` + train-time median/std |
  | 6 | **Top SHAP attributions bar chart** | Which fingerprint bits pushed the decision? | `shap.TreeExplainer(model).shap_values(X)` exact, 3-branch dispatch (sklearn version compat) |
- | 7 | **AI Assistant rationale** (optional) | "Predicted **permeable** with 82% confidence. Top SHAP attributions toward this label..." | OpenRouter LLM (`llama-3.2-3b-instruct:free`) or deterministic template, hybrid fallback |
+ | 7 | **AI Assistant rationale** (optional) | "Predicted **permeable** with 82% confidence. Top SHAP attributions toward this label..." | OpenRouter free-tier fallback chain (10 models, head: `inclusionai/ling-2.6-1t:free`), matches the language of `user_question`; deterministic template when there is no key or the chain is exhausted |

  Together these seven layers debunk the "Black-Box AI ≠ Trust" myth: not a black box but a **glass box**.

@@ -215,12 +215,19 @@ Inside the container, Streamlit reaches FastAPI via `httpx.post("http://127.0.0.1:8000/...")`
  - **Biology-preserving:** Conditioned on covariates (age, sex, diagnosis); removes site bias while keeping biological variance
  - **Why not the alternative:** Z-score normalization only corrects location, the scale difference remains; CycleGAN attempts exist, but training time and hyperparameter complexity don't fit a hackathon

- ### 6.6 LLM Provider: OpenRouter (free tier llama-3.2-3b)
+ ### 6.6 LLM Provider: OpenRouter (free-tier fallback chain, 10 models)

  - **Free tier:** No cost during the hackathon, and no cost risk in the jury demo
  - **OpenAI-compatible:** the `openai==1.51.0` SDK works out of the box, no custom HTTP client
- - **Multiple-model fallback:** the free tier offers llama, gemini, and qwen options
- - **Hybrid contract:** with no API key, or on an HTTP/network error, it drops to the **deterministic template path** — it **never crashes** on demo day
+ - **Smartest-to-smallest fallback chain:** `_DEFAULT_FREE_MODEL_CHAIN` in `src/llm/explainer.py` holds 10 free-tier ids (head: `inclusionai/ling-2.6-1t:free`, a ~1T flagship → tail: `poolside/laguna-xs.2:free`). Status-code classification:
+   - `429 / 402 / 403 / 404 / 5xx` → advance to the next model
+   - `400` → a prompt-shape mismatch for that model, advance to the next
+   - `401` → the key is bad, hopeless for the whole chain, **drop to template + actionable WARNING** (rotate the key at https://openrouter.ai/keys)
+   - Network/timeout → switching models won't help → template
+ - **Runtime override:** the `OPENROUTER_FREE_MODELS="modelA,modelB"` env variable replaces the chain (model availability on OpenRouter changes weekly; verify with `python scripts/diagnose_openrouter.py`)
+ - **Hybrid contract:** with no API key, the kill-switch on, or the whole chain exhausted, it drops to the **deterministic template path** — it **never crashes** on demo day
+ - **Prompt design (intent-split):** if the caller supplies `user_question`, the model is told to answer in the question's language, answer the question directly, and stay short and conversational when it is off-topic; otherwise the default 2-4 sentence paper-style rationale kicks in
+ - **Diagnostics:** `GET /diag/openrouter` (FastAPI) returns key presence (length + 12-char prefix only), kill-switch state, chain head, and the result of an 8-token probe; surfaced in Streamlit as the sidebar "🔧 Diagnose LLM" button
  - **Why not the alternative:** the direct OpenAI API is paid, there was no Anthropic API key, and local Ollama risks downloading/loading even a 1B model on demo day

  ### 6.7 Tracking: MLflow
@@ -326,7 +333,7 @@ Everything can go wrong on demo day. Three env variables cover every disaster scenario
  | `NEUROBRIDGE_DISABLE_MLFLOW=1` | No MLflow lookup; the provenance badge shows "—"; the system keeps running |
  | `BBB_MODEL_PATH=...` | A different path instead of the default `data/processed/bbb_model.joblib` |

- In the HF Spaces deploy the first two default to `=1`. To activate OpenRouter, add `OPENROUTER_API_KEY` under Settings → Variables and Secrets and set `NEUROBRIDGE_DISABLE_LLM=0`.
+ In the HF Spaces deploy **only** `NEUROBRIDGE_DISABLE_MLFLOW=1` is set (read-only filesystem edge case). The LLM is **on by default** — `Dockerfile` and `Dockerfile.hf` no longer hard-code `NEUROBRIDGE_DISABLE_LLM`; a deployed Space uses the free-tier chain when the `OPENROUTER_API_KEY` Secret is present and falls back to the template otherwise. To make the LLM 100% deterministic for the jury demo, add `NEUROBRIDGE_DISABLE_LLM=1` under Space → Settings → Variables. To see the LLM's live state, the sidebar "🔧 Diagnose LLM" button (which hits `GET /diag/openrouter`) returns key presence + chain head + an 8-token probe.

  ### 8.4 Drift detection

@@ -472,7 +479,7 @@ Z-score only pulls the **mean** to zero. ComBat corrects both the mean and the **scale** (vari
  It could have been, but for **tree ensembles like Random Forest**, SHAP's TreeExplainer has an **exact** solution (Lundberg & Lee 2018). LIME is a local linear approximation and can extrapolate incorrectly at tree boundaries. Hackathon juries like numerical exactness.

  ### "Why OpenRouter, why not the ChatGPT API?"
- The OpenAI API is paid and we had no key. OpenRouter's free tier offers llama-3.2-3b and gemini-flash. Thanks to the **hybrid template fallback**, the system keeps running even without a key or with the API down — critical on demo day.
+ The OpenAI API is paid and we had no key. OpenRouter's free tier offers a 10-model **smartest → smallest fallback chain** from a 1T flagship (`inclusionai/ling-2.6-1t:free`) down to 30B reasoning (`nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free`); if one returns 429 / 4xx / 5xx we automatically advance to the next. Thanks to the **hybrid template fallback**, the system keeps running even without a key, with the chain exhausted, or with the API down — critical on demo day.

  ### "Why Streamlit, why not React/Next.js?"
  The hackathon ran 8 days. A Python-only stack gave 4 days to implementation, 2 to testing, 1 to polish, and 1 to deploy. The hackathon would have ended while we were still learning React. Streamlit's "fast iteration" value proposition fits us exactly.
@@ -643,7 +650,7 @@ Plain-language equivalents of the technical terms used in the sections above
  ### 16.5 LLM / Explainability

  - **LLM (Large Language Model):** Large language models such as GPT, Llama, Gemini, Claude.
- - **OpenRouter:** A service that aggregates many LLM providers behind a single API. Free-tier options include `llama-3.2-3b`, `gemini-flash`, `qwen`.
+ - **OpenRouter:** A service that aggregates many LLM providers behind a single API. Free-tier options include `inclusionai/ling-2.6-1t`, `nvidia/nemotron-3-super-120b`, `google/gemma-4-31b`, `qwen3-next-80b` — we try each in turn via the smartest → smallest 10-model fallback chain (`_DEFAULT_FREE_MODEL_CHAIN`).
  - **API Key:** A personal token for accessing a service. Never committed to the repo (entered under HF Spaces "Variables and Secrets").
  - **Rationale:** A natural-language explanation of the model's prediction (e.g. "Predicted permeable with 82% confidence; SHAP attributions toward this label include bits 532 and 1024…").
  - **Source Label:** A label showing which source the answer came from. For us, `source: "llm"` or `source: "template"` — for auditability.
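The intent-split prompt described in §6.6 can be sketched roughly as follows. `build_prompt` and its payload field names (`label`, `confidence`, `top_features`) are hypothetical stand-ins for illustration — the real `_build_llm_prompt` in `src/llm/explainer.py` may differ in wording and shape:

```python
def build_prompt(payload, user_question=None):
    """Sketch of the two intent modes: question-driven vs. paper-style."""
    facts = (
        f"Prediction: {payload['label']} at {payload['confidence']:.0%} confidence. "
        f"Top SHAP features: {', '.join(payload['top_features'])}."
    )
    if user_question:
        # Question mode: mirror the question's language, answer it directly,
        # stay conversational if it is off-topic or a greeting.
        return (
            f"{facts}\nUser question: {user_question}\n"
            "Answer the question directly, in the same language as the question. "
            "If it is off-topic or a greeting, reply briefly and conversationally."
        )
    # No question supplied: fall back to the paper-style rationale.
    return f"{facts}\nWrite a 2-4 sentence paper-style rationale for this prediction."
```

The split keeps the default path byte-for-byte stable (good for the deterministic template tests) while letting a Turkish `user_question` steer the answer into Turkish.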
README.md CHANGED
@@ -224,15 +224,22 @@ finishes in under 4 seconds on a 2024 laptop.
  - **Day-8 plan (full TDD task breakdown):** [`docs/superpowers/plans/2026-05-06-day8-grand-finale.md`](docs/superpowers/plans/2026-05-06-day8-grand-finale.md)
  - **New surfaces:** `POST /explain/eeg`, `POST /explain/mri`, `GET /experiments/runs`, `POST /experiments/diff`
  - **New deploy artifacts:** `Dockerfile.hf`, `supervisord.conf`
+ - **LLM hardening (post-Day 8):** real OpenRouter LLM is now the default in deployed Spaces — `Dockerfile`/`Dockerfile.hf` no longer hard-code `NEUROBRIDGE_DISABLE_LLM=1`. Free-tier fallback chain (10 models, smartest → smallest) in [`src/llm/explainer.py`](src/llm/explainer.py), 401/400 status classification, and a language-matching / intent-split prompt. Diagnostic endpoint `GET /diag/openrouter` ([`src/api/main.py`](src/api/main.py)) + Streamlit sidebar "🔧 Diagnose LLM" button. Live verification helper: [`scripts/diagnose_openrouter.py`](scripts/diagnose_openrouter.py).

  ## Day 7 — Demo Recipe

  Pre-flight (one terminal):

  ```bash
- # Start API with deterministic explainer (no LLM key needed)
- NEUROBRIDGE_DISABLE_LLM=1 BBB_MODEL_PATH=data/processed/bbb_model.joblib \
+ # Start API. With OPENROUTER_API_KEY set in your shell or .env,
+ # /explain/* hits the real LLM via the free-tier fallback chain
+ # (10 models, smartest → smallest — see AGENTS.md §11). Without
+ # a key, it falls back to the deterministic template.
+ BBB_MODEL_PATH=data/processed/bbb_model.joblib \
  uvicorn src.api.main:app --port 8000
+
+ # Force the deterministic template path (no network, fully reproducible):
+ # NEUROBRIDGE_DISABLE_LLM=1 BBB_MODEL_PATH=... uvicorn ...
  ```

  Predict + explain (other terminal):
@@ -243,7 +250,10 @@ curl -s -X POST http://localhost:8000/predict/bbb \
    -H "Content-Type: application/json" \
    -d '{"smiles": "CCO", "top_k": 5}' | jq

- # 2) Explain — feed the predict response back as the explain payload
+ # 2) Explain — feed the predict response back as the explain payload.
+ #    user_question drives the prompt: the question's language is mirrored
+ #    (Turkish question → Turkish answer), and the model answers the
+ #    question directly instead of returning a canned paper summary.
  curl -s -X POST http://localhost:8000/explain/bbb \
    -H "Content-Type: application/json" \
    -d '{
@@ -258,11 +268,13 @@ curl -s -X POST http://localhost:8000/explain/bbb \
    "drift_z": 0.42,
    "user_question": "Why permeable?"
  }' | jq
+ # With a valid key: expect "source": "llm" + a model id from the chain.
+ # Without: expect "source": "template" + "model": null.

- # 3) Same call but with LLM enabled (set the key first)
- unset NEUROBRIDGE_DISABLE_LLM
- export OPENROUTER_API_KEY="sk-or-v1-…"
- # Repeat the curl above; expect "source": "llm" and a model name.
+ # 3) Diagnose OpenRouter reachability from inside the running API
+ #    (key presence, chain head, 8-token probe). Surfaced in Streamlit
+ #    as the sidebar "🔧 Diagnose LLM" button.
+ curl -s http://localhost:8000/diag/openrouter | jq
  ```

  Streamlit demo: `streamlit run src/frontend/app.py` → BBB tab → Predict → AI Assistant tab → ask a preset question.
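The redaction rule behind `GET /diag/openrouter` — report key presence as length + 12-char prefix, never the full secret — can be sketched like this. The field names and the omitted probe call are hypothetical; the real handler lives in `src/api/main.py`:

```python
import os


def diag_snapshot():
    """Sketch of the /diag/openrouter payload (before the 8-token probe).
    The key itself is never echoed: length + 12-char prefix only."""
    key = os.getenv("OPENROUTER_API_KEY", "")
    chain = ["inclusionai/ling-2.6-1t:free"]  # head of the real 10-id chain
    return {
        "key_present": bool(key),
        "key_length": len(key),
        "key_prefix": key[:12],  # safe to surface; never the full secret
        "llm_disabled": os.getenv("NEUROBRIDGE_DISABLE_LLM") == "1",
        "chain_length": len(chain),
        "first_model": chain[0],
    }
```

A 12-char prefix like `sk-or-v1-abc` is enough to spot a misnamed or truncated Space secret without leaking anything usable.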