mekosotto committed on
Commit
3acc658
·
1 Parent(s): decc9ff

md updates

Files changed (3)
  1. AGENTS.md +36 -7
  2. PROJECT_OVERVIEW.md +14 -7
  3. README.md +19 -7
AGENTS.md CHANGED
@@ -214,24 +214,51 @@ renders a one-line caption with a magnitude tag (in-band, mild,
  significant). Worker restart clears the deque; this is acceptable for
  demo and removes the audit-trail concern.

- ## 11. LLM Explainer Surface (Day 7)
+ ## 11. LLM Explainer Surface (Day 7 + 9)

  `src/llm/explainer.py` is the single entry point for natural-language
  rationales. `explain(payload)` always returns `{rationale, source,
  model}`. The deterministic template path is the source of truth for
- tests; the LLM path is OpenRouter via the `openai==1.51.0` SDK using
- `meta-llama/llama-3.2-3b-instruct:free`. Two env knobs control the
- behavior:
+ tests; the LLM path is OpenRouter via the `openai==1.51.0` SDK and
+ walks a **smartest-to-smallest free-tier fallback chain**
+ (`_DEFAULT_FREE_MODEL_CHAIN`, 10 ids — head: `inclusionai/ling-2.6-1t:free`).
+ The chain is overridable at runtime via `OPENROUTER_FREE_MODELS`
+ (comma-separated). Status-code classification:
+
+ - `401` → the key is bad → bail to template + actionable WARNING (rotate at
+   https://openrouter.ai/keys, enable free-model data-sharing at
+   https://openrouter.ai/settings/privacy).
+ - `400` → prompt-shape mismatch on this model → advance to the next.
+ - `402 / 403 / 404 / 429 / 5xx` → advance to the next.
+ - Network/timeout → bail to template (switching models won't help).
+
+ Two env knobs control the gate:

  - `OPENROUTER_API_KEY` — when absent, fall back to template.
  - `NEUROBRIDGE_DISABLE_LLM=1` — hard kill-switch; force template even
    if a key is set. Use this for demo days when you want fully
    deterministic, reproducible rationales.

+ **Prompt design** (`_build_llm_prompt`): two intent modes. When the
+ caller supplies `user_question`, the model is instructed to
+ language-match (Turkish question → Turkish answer), answer the
+ question directly (not a canned paper-style summary), and respond
+ conversationally to off-topic / greeting questions. When no
+ `user_question` is supplied, it falls back to the original 2-4 sentence
+ paper-style rationale.
+
  The `POST /explain/bbb` endpoint mirrors this contract. Pydantic
  enforces a non-empty `top_features` list (422 on empty); every other
  failure mode degrades to template + WARNING log + `source="template"`.

+ **Diagnostics**: `GET /diag/openrouter` (`src/api/main.py`) returns
+ key presence (length + 12-char prefix only), kill-switch state, chain
+ length, first model id, and the result of an 8-token probe call
+ against that model. Surfaced in Streamlit as the sidebar "🔧 Diagnose
+ LLM" button. Use it when the deployed Space shows `source="template"`
+ unexpectedly — the most common causes are a missing/misnamed
+ `OPENROUTER_API_KEY` Space secret or a revoked key.
+
  ## 12. Multi-Modal Explainer (Day 8)

  `src/llm/explainer.py` exposes `explain(payload, modality)` where
@@ -270,9 +297,11 @@ bakes the model artifact into the image so the first `/predict/bbb`
  call is instant on cold start.

  Default environment: `DEPLOY_ENV=hf_spaces`,
- `NEUROBRIDGE_DISABLE_MLFLOW=1`, `NEUROBRIDGE_DISABLE_LLM=1`.
- Operators can opt back into LLM by setting `OPENROUTER_API_KEY` in
- the HF Space's Secrets panel and unsetting the disable flag.
+ `NEUROBRIDGE_DISABLE_MLFLOW=1`. The LLM kill-switch is **not** set —
+ deployed Spaces use the real OpenRouter free-tier chain (§11) when
+ `OPENROUTER_API_KEY` is configured in the Space's Secrets panel. Set
+ `NEUROBRIDGE_DISABLE_LLM=1` only when you want to force the
+ deterministic template path for a fully reproducible demo.

  The README's YAML front-matter declares the Space metadata
  (SDK=docker, port=7860, app_file=src/frontend/app.py).
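The §11 chain walk and status-code classification can be sketched in a few lines. This is a minimal sketch, not the real `src/llm/explainer.py`: `DEFAULT_CHAIN`, `active_chain`, `walk_chain`, and the `call(model) -> (status, text)` shape are illustrative stand-ins (the real chain holds 10 ids and goes through the `openai` SDK):

```python
import logging
import os

logger = logging.getLogger("explainer")

# Illustrative stand-in: the real _DEFAULT_FREE_MODEL_CHAIN holds 10 ids.
DEFAULT_CHAIN = ["inclusionai/ling-2.6-1t:free", "poolside/laguna-xs.2:free"]


def active_chain():
    """OPENROUTER_FREE_MODELS (comma-separated) overrides the default chain."""
    raw = os.getenv("OPENROUTER_FREE_MODELS", "")
    override = [m.strip() for m in raw.split(",") if m.strip()]
    return override or list(DEFAULT_CHAIN)


def walk_chain(call, chain=None):
    """Try each model in order. `call(model)` returns (status, text):
    status 0 = success, -1 = network/timeout, else an HTTP status code."""
    for model in (chain if chain is not None else active_chain()):
        status, text = call(model)
        if status == 0:
            return {"rationale": text, "source": "llm", "model": model}
        if status == 401:
            # Bad key: no model in the chain will help -> template + WARNING.
            logger.warning("OpenRouter 401 - rotate key at https://openrouter.ai/keys")
            break
        if status == -1:
            # Network/timeout: switching models won't help -> template.
            break
        # 400 (prompt-shape mismatch), 402/403/404/429/5xx: advance to next.
    return {"rationale": "deterministic template rationale",
            "source": "template", "model": None}
```

A `401` or a network error abandons the chain immediately; every other failure code just advances to the next model, and exhausting the chain degrades to the template contract (`source="template"`, `model` null).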
PROJECT_OVERVIEW.md CHANGED
@@ -120,7 +120,7 @@ When you make a BBB prediction you see 7 separate signals on the decision card. Each
  | 4 | **Calibration caption** | "Predictions at ≥75% confidence hit 92% precision on the hold-out test (n=18)" | 80/20 split at train time, 6 threshold bins, lookup |
  | 5 | **Drift caption** | "trailing-100 confidence median +0.42σ from train (within range)" | Module-level `deque(maxlen=100)` + train-time median/std |
  | 6 | **Top SHAP attributions bar chart** | Which fingerprint bits pushed the decision? | `shap.TreeExplainer(model).shap_values(X)` exact, 3-branch dispatch (sklearn version compat) |
- | 7 | **AI Assistant rationale** (optional) | "Predicted **permeable** with 82% confidence. Top SHAP attributions toward this label..." | OpenRouter LLM (`llama-3.2-3b-instruct:free`) or deterministic template, hybrid fallback |
+ | 7 | **AI Assistant rationale** (optional) | "Predicted **permeable** with 82% confidence. Top SHAP attributions toward this label..." | OpenRouter free-tier fallback chain (10 models, head: `inclusionai/ling-2.6-1t:free`), matches the language of `user_question`; deterministic template when there is no key or the chain is exhausted |

  Together these seven layers debunk the "Black-Box AI ≠ Trust" myth: not a black box but a **glass box**.

@@ -215,12 +215,19 @@ Inside the container, Streamlit reaches FastAPI via `httpx.post("http://127.0.0.1:8000/...")`
  - **Biology-preserving:** Conditioned on covariates (age, sex, diagnosis); removes site bias while keeping biological variance
  - **Why not the alternative:** Z-score normalization only corrects location, the scale difference remains; CycleGAN attempts exist, but training time and hyperparameter complexity don't fit a hackathon

- ### 6.6 LLM Provider: OpenRouter (free tier llama-3.2-3b)
+ ### 6.6 LLM Provider: OpenRouter (free-tier fallback chain, 10 models)

  - **Free tier:** No cost during the hackathon, and no cost risk in the jury demo
  - **OpenAI-compatible:** the `openai==1.51.0` SDK works out of the box, no custom HTTP client
- - **Multiple-model fallback:** the free tier offers llama, gemini, and qwen options
- - **Hybrid contract:** with no API key, or on an HTTP/network error, it drops to the **deterministic template path** — it **never crashes** on demo day
+ - **Smartest-to-smallest fallback chain:** `_DEFAULT_FREE_MODEL_CHAIN` in `src/llm/explainer.py` holds 10 free-tier ids (head: `inclusionai/ling-2.6-1t:free`, a ~1T flagship → tail: `poolside/laguna-xs.2:free`). Status-code classification:
+   - `429 / 402 / 403 / 404 / 5xx` → advance to the next model
+   - `400` → a prompt-shape mismatch for that model, advance to the next
+   - `401` → the key is bad, hopeless for the whole chain, **drop to template + actionable WARNING** (rotate the key at https://openrouter.ai/keys)
+   - Network/timeout → switching models won't help → template
+ - **Runtime override:** the `OPENROUTER_FREE_MODELS="modelA,modelB"` env variable replaces the chain (model availability on OpenRouter changes weekly; verify with `python scripts/diagnose_openrouter.py`)
+ - **Hybrid contract:** with no API key, the kill-switch on, or the whole chain exhausted, it drops to the **deterministic template path** — it **never crashes** on demo day
+ - **Prompt design (intent-split):** if the caller supplies `user_question`, the model is told to answer in the question's language, answer the question directly, and stay short and conversational when it is off-topic; otherwise the default 2-4 sentence paper-style rationale kicks in
+ - **Diagnostics:** `GET /diag/openrouter` (FastAPI) returns key presence (length + 12-char prefix only), kill-switch state, chain head, and the result of an 8-token probe; surfaced in Streamlit as the sidebar "🔧 Diagnose LLM" button
  - **Why not the alternative:** the direct OpenAI API is paid, there was no Anthropic API key, and local Ollama risks downloading/loading even a 1B model on demo day

  ### 6.7 Tracking: MLflow
@@ -326,7 +333,7 @@ Everything can go wrong on demo day. Three env variables cover every disaster scenario
  | `NEUROBRIDGE_DISABLE_MLFLOW=1` | No MLflow lookup; the provenance badge shows "—"; the system keeps running |
  | `BBB_MODEL_PATH=...` | A different path instead of the default `data/processed/bbb_model.joblib` |

- In the HF Spaces deploy the first two default to `=1`. To activate OpenRouter, add `OPENROUTER_API_KEY` under Settings → Variables and Secrets and set `NEUROBRIDGE_DISABLE_LLM=0`.
+ In the HF Spaces deploy **only** `NEUROBRIDGE_DISABLE_MLFLOW=1` is set (read-only filesystem edge case). The LLM is **on by default** — `Dockerfile` and `Dockerfile.hf` no longer hard-code `NEUROBRIDGE_DISABLE_LLM`; a deployed Space uses the free-tier chain when the `OPENROUTER_API_KEY` Secret is present and falls back to the template otherwise. To make the LLM 100% deterministic for the jury demo, add `NEUROBRIDGE_DISABLE_LLM=1` under Space → Settings → Variables. To see the LLM's live state, the sidebar "🔧 Diagnose LLM" button (which hits `GET /diag/openrouter`) returns key presence + chain head + an 8-token probe.

  ### 8.4 Drift detection

@@ -472,7 +479,7 @@ Z-score only pulls the **mean** to zero. ComBat corrects both the mean and the **scale** (vari
  It could have been, but for **tree ensembles like Random Forest**, SHAP's TreeExplainer has an **exact** solution (Lundberg & Lee 2018). LIME is a local linear approximation and can extrapolate incorrectly at tree boundaries. Hackathon juries like numerical exactness.

  ### "Why OpenRouter, why not the ChatGPT API?"
- The OpenAI API is paid and we had no key. OpenRouter's free tier offers llama-3.2-3b and gemini-flash. Thanks to the **hybrid template fallback**, the system keeps running even without a key or with the API down — critical on demo day.
+ The OpenAI API is paid and we had no key. OpenRouter's free tier offers a 10-model **smartest → smallest fallback chain** from a 1T flagship (`inclusionai/ling-2.6-1t:free`) down to 30B reasoning (`nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free`); if one returns 429 / 4xx / 5xx we automatically advance to the next. Thanks to the **hybrid template fallback**, the system keeps running even without a key, with the chain exhausted, or with the API down — critical on demo day.

  ### "Why Streamlit, why not React/Next.js?"
  The hackathon ran 8 days. A Python-only stack gave 4 days to implementation, 2 to testing, 1 to polish, and 1 to deploy. The hackathon would have ended while we were still learning React. Streamlit's "fast iteration" value proposition fits us exactly.
@@ -643,7 +650,7 @@ Plain-language equivalents of the technical terms used in the sections above
  ### 16.5 LLM / Explainability

  - **LLM (Large Language Model):** Large language models such as GPT, Llama, Gemini, Claude.
- - **OpenRouter:** A service that aggregates many LLM providers behind a single API. Free-tier options include `llama-3.2-3b`, `gemini-flash`, `qwen`.
+ - **OpenRouter:** A service that aggregates many LLM providers behind a single API. Free-tier options include `inclusionai/ling-2.6-1t`, `nvidia/nemotron-3-super-120b`, `google/gemma-4-31b`, `qwen3-next-80b` — we try each in turn via the smartest → smallest 10-model fallback chain (`_DEFAULT_FREE_MODEL_CHAIN`).
  - **API Key:** A personal token for accessing a service. Never committed to the repo (entered under HF Spaces "Variables and Secrets").
  - **Rationale:** A natural-language explanation of the model's prediction (e.g. "Predicted permeable with 82% confidence; SHAP attributions toward this label include bits 532 and 1024…").
  - **Source Label:** A label showing which source the answer came from. For us, `source: "llm"` or `source: "template"` — for auditability.
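The intent-split prompt described in §6.6 can be sketched roughly as follows. `build_prompt` and its payload field names (`label`, `confidence`, `top_features`) are hypothetical stand-ins for illustration — the real `_build_llm_prompt` in `src/llm/explainer.py` may differ in wording and shape:

```python
def build_prompt(payload, user_question=None):
    """Sketch of the two intent modes: question-driven vs. paper-style."""
    facts = (
        f"Prediction: {payload['label']} at {payload['confidence']:.0%} confidence. "
        f"Top SHAP features: {', '.join(payload['top_features'])}."
    )
    if user_question:
        # Question mode: mirror the question's language, answer it directly,
        # stay conversational if it is off-topic or a greeting.
        return (
            f"{facts}\nUser question: {user_question}\n"
            "Answer the question directly, in the same language as the question. "
            "If it is off-topic or a greeting, reply briefly and conversationally."
        )
    # No question supplied: fall back to the paper-style rationale.
    return f"{facts}\nWrite a 2-4 sentence paper-style rationale for this prediction."
```

The split keeps the default path byte-for-byte stable (good for the deterministic template tests) while letting a Turkish `user_question` steer the answer into Turkish.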
README.md CHANGED
@@ -224,15 +224,22 @@ finishes in under 4 seconds on a 2024 laptop.
  - **Day-8 plan (full TDD task breakdown):** [`docs/superpowers/plans/2026-05-06-day8-grand-finale.md`](docs/superpowers/plans/2026-05-06-day8-grand-finale.md)
  - **New surfaces:** `POST /explain/eeg`, `POST /explain/mri`, `GET /experiments/runs`, `POST /experiments/diff`
  - **New deploy artifacts:** `Dockerfile.hf`, `supervisord.conf`
+ - **LLM hardening (post-Day 8):** real OpenRouter LLM is now the default in deployed Spaces — `Dockerfile`/`Dockerfile.hf` no longer hard-code `NEUROBRIDGE_DISABLE_LLM=1`. Free-tier fallback chain (10 models, smartest → smallest) in [`src/llm/explainer.py`](src/llm/explainer.py), 401/400 status classification, and a language-matching / intent-split prompt. Diagnostic endpoint `GET /diag/openrouter` ([`src/api/main.py`](src/api/main.py)) + Streamlit sidebar "🔧 Diagnose LLM" button. Live verification helper: [`scripts/diagnose_openrouter.py`](scripts/diagnose_openrouter.py).

  ## Day 7 — Demo Recipe

  Pre-flight (one terminal):

  ```bash
- # Start API with deterministic explainer (no LLM key needed)
- NEUROBRIDGE_DISABLE_LLM=1 BBB_MODEL_PATH=data/processed/bbb_model.joblib \
+ # Start API. With OPENROUTER_API_KEY set in your shell or .env,
+ # /explain/* hits the real LLM via the free-tier fallback chain
+ # (10 models, smartest → smallest — see AGENTS.md §11). Without
+ # a key, it falls back to the deterministic template.
+ BBB_MODEL_PATH=data/processed/bbb_model.joblib \
  uvicorn src.api.main:app --port 8000
+
+ # Force the deterministic template path (no network, fully reproducible):
+ # NEUROBRIDGE_DISABLE_LLM=1 BBB_MODEL_PATH=... uvicorn ...
  ```

  Predict + explain (other terminal):
@@ -243,7 +250,10 @@ curl -s -X POST http://localhost:8000/predict/bbb \
    -H "Content-Type: application/json" \
    -d '{"smiles": "CCO", "top_k": 5}' | jq

- # 2) Explain — feed the predict response back as the explain payload
+ # 2) Explain — feed the predict response back as the explain payload.
+ #    user_question drives the prompt: the question's language is mirrored
+ #    (Turkish question → Turkish answer), and the model answers the
+ #    question directly instead of returning a canned paper summary.
  curl -s -X POST http://localhost:8000/explain/bbb \
    -H "Content-Type: application/json" \
    -d '{
@@ -258,11 +268,13 @@ curl -s -X POST http://localhost:8000/explain/bbb \
    "drift_z": 0.42,
    "user_question": "Why permeable?"
  }' | jq
+ # With a valid key: expect "source": "llm" + a model id from the chain.
+ # Without: expect "source": "template" + "model": null.

- # 3) Same call but with LLM enabled (set the key first)
- unset NEUROBRIDGE_DISABLE_LLM
- export OPENROUTER_API_KEY="sk-or-v1-…"
- # Repeat the curl above; expect "source": "llm" and a model name.
+ # 3) Diagnose OpenRouter reachability from inside the running API
+ #    (key presence, chain head, 8-token probe). Surfaced in Streamlit
+ #    as the sidebar "🔧 Diagnose LLM" button.
+ curl -s http://localhost:8000/diag/openrouter | jq
  ```

  Streamlit demo: `streamlit run src/frontend/app.py` → BBB tab → Predict → AI Assistant tab → ask a preset question.
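The redaction rule behind `GET /diag/openrouter` — report key presence as length + 12-char prefix, never the full secret — can be sketched like this. The field names and the omitted probe call are hypothetical; the real handler lives in `src/api/main.py`:

```python
import os


def diag_snapshot():
    """Sketch of the /diag/openrouter payload (before the 8-token probe).
    The key itself is never echoed: length + 12-char prefix only."""
    key = os.getenv("OPENROUTER_API_KEY", "")
    chain = ["inclusionai/ling-2.6-1t:free"]  # head of the real 10-id chain
    return {
        "key_present": bool(key),
        "key_length": len(key),
        "key_prefix": key[:12],  # safe to surface; never the full secret
        "llm_disabled": os.getenv("NEUROBRIDGE_DISABLE_LLM") == "1",
        "chain_length": len(chain),
        "first_model": chain[0],
    }
```

A 12-char prefix like `sk-or-v1-abc` is enough to spot a misnamed or truncated Space secret without leaking anything usable.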