ub-aac-chatbot/aac-chatbot

Shwetangi committed on Apr 15

Commit

fd77577

2 Parent(s): 626c0b8 e7cf650

Merge pull request #3 from akashkolte/akash/v1

Files changed (37)

.env.example +36 -0
.gitignore +46 -0
CLAUDE.md +106 -0
README.md +236 -53
api/__init__.py +0 -0
api/main.py +167 -0
config/__init__.py +3 -0
config/settings.py +79 -0
data/generate_users.py +186 -0
data/memories/arjun_mehta.json +50 -0
data/memories/gerald_okafor.json +49 -0
data/memories/mia_chen.json +49 -0
data/users.json +25 -0
generation/__init__.py +0 -0
generation/llm_client.py +147 -0
guardrails/__init__.py +0 -0
guardrails/checks.py +98 -0
main.py +204 -0
pipeline/__init__.py +0 -0
pipeline/graph.py +71 -0
pipeline/nodes/__init__.py +0 -0
pipeline/nodes/feedback.py +98 -0
pipeline/nodes/intent.py +170 -0
pipeline/nodes/planner.py +196 -0
pipeline/nodes/retrieval.py +90 -0
pipeline/state.py +98 -0
requirements.txt +39 -0
retrieval/__init__.py +0 -0
retrieval/bucket_priors.py +52 -0
retrieval/clustering.py +111 -0
retrieval/vector_store.py +168 -0
sensing/__init__.py +0 -0
sensing/air_writing.py +176 -0
sensing/face_mesh.py +166 -0
sensing/gaze.py +113 -0
sensing/gesture.py +124 -0
ui/app.py +153 -0

.env.example ADDED

	@@ -0,0 +1,36 @@

+# Copy this file to .env and fill in your values.
+# Settings here override the defaults in config/settings.py.
+# ── Active LLM tier ────────────────────────────────────────────────────────────
+# "local"   → Ollama on MacBook M2  (dev, no GPU needed)
+# "primary" → Qwen3-30B-A3B on GCP A100/T4 via vLLM
+# "fallback" → Qwen3-8B on same vLLM server
+ACTIVE_LLM_TIER=local
+# ── Primary vLLM server (GCP) ─────────────────────────────────────────────────
+PRIMARY_BASE_URL=http://<GCP_IP>:8000/v1
+PRIMARY_API_KEY=token-abc
+PRIMARY_MODEL=Qwen/Qwen3-30B-A3B
+# ── Fallback model (same vLLM server) ─────────────────────────────────────────
+FALLBACK_MODEL=Qwen/Qwen3-8B
+FALLBACK_BASE_URL=http://<GCP_IP>:8000/v1
+# ── Local Ollama (dev) ────────────────────────────────────────────────────────
+LOCAL_BASE_URL=http://localhost:11434/v1
+LOCAL_MODEL=gemma4:31b-cloud     # qwen3:8b qwen3.5:397b-cloud
+# ── MLflow ────────────────────────────────────────────────────────────────────
+MLFLOW_TRACKING_URI=mlruns
+MLFLOW_EXPERIMENT=aac-chatbot
+# ── Thinking mode ─────────────────────────────────────────────────────────────
+# "off"   — suppress thinking (fastest, best for latency-sensitive AAC)
+# "strip" — let model think, but strip <think> tags from output
+# "full"  — return raw response including <think> blocks
+THINKING_MODE=off
+# Extra tokens added when thinking is enabled (strip/full). Ignored when off.
+THINKING_TOKEN_BUDGET=4096
+# ── Latency fallback threshold (seconds) ──────────────────────────────────────
+FALLBACK_LATENCY_THRESHOLD=3.5
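
Note on `THINKING_MODE` and `THINKING_TOKEN_BUDGET`: the comments above describe three modes and a token budget that applies only to `strip` and `full`. A minimal sketch of how a client might honour them follows. The helper names `apply_thinking_mode` and `token_limit` are hypothetical and are not taken from `generation/llm_client.py`; stripping tags in `off` mode as well is an assumption, made so that stray `<think>` blocks never reach the user.

```python
import re

# Hypothetical helpers for illustration; not the project's llm_client code.
_THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)


def apply_thinking_mode(raw: str, mode: str = "off") -> str:
    """Post-process a model reply according to THINKING_MODE."""
    if mode == "full":
        # Return the raw response, <think> blocks included.
        return raw
    # "strip" removes the blocks; "off" does the same defensively (assumption).
    return _THINK_BLOCK.sub("", raw).strip()


def token_limit(max_tokens: int, mode: str, budget: int = 4096) -> int:
    """Add THINKING_TOKEN_BUDGET only when thinking is enabled."""
    return max_tokens + budget if mode in ("strip", "full") else max_tokens


reply = "<think>plan the answer</think>Green curry, always."
visible = apply_thinking_mode(reply, "strip")
```

With `mode="off"` the budget is ignored, which matches the comment on `THINKING_TOKEN_BUDGET`.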

.gitignore ADDED

	@@ -0,0 +1,46 @@

+# Python
+__pycache__/
+*.py[cod]
+*.pyo
+*.pyd
+.Python
+*.egg-info/
+dist/
+build/
+# Virtual environment
+.venv/
+venv/
+env/
+# Environment secrets
+.env
+# Data — indexes are rebuilt from source; do NOT commit binaries
+data/faiss_store/
+# Air-writing templates (large numpy files, track separately if needed)
+data/air_write_templates/
+# MLflow run artifacts
+mlruns/
+# Latency logs
+timings.csv
+*.csv
+# IDE
+.vscode/
+.idea/
+*.swp
+# Claude Code — local settings and generated knowledge graph
+.claude/
+.code-review-graph/
+# macOS
+.DS_Store
+# Jupyter
+.ipynb_checkpoints/
+*.ipynb

CLAUDE.md ADDED

	@@ -0,0 +1,106 @@

+# Multimodal AAC Chatbot — Project Guide
+## What This Project Does
+An AI chatbot that **speaks as an AAC user**, not to them. Given a user persona
+(Mia, Gerald, or Arjun), it fuses real-time multimodal non-verbal signals with
+personal memory retrieval to generate responses in that person's authentic voice.
+Orchestrated as a **LangGraph stateful directed graph** across five layers.
+---
+## Architecture
+```
+main.py  /  api/main.py  /  ui/app.py
+  └── pipeline/graph.py              ← LangGraph StateGraph (5 nodes + cond. edges)
+        ├── pipeline/nodes/intent.py      L2 — LLM + Pydantic intent routing
+        ├── pipeline/nodes/retrieval.py   L3 — FAISS + BGE retrieval (fast / full)
+        ├── pipeline/nodes/planner.py     L4 — expression-conditioned generation
+        └── pipeline/nodes/feedback.py    L5 — MLflow logging + Bayesian priors
+sensing/          L1 — MediaPipe face mesh, gesture, gaze, air writing
+retrieval/        FAISS ops, HDBSCAN clustering, Bayesian bucket priors
+generation/       Multi-tier LLM client (vLLM primary / fallback / Ollama local)
+guardrails/       Input + output safety checks
+config/           Pydantic BaseSettings — all config in one place
+```
+## Key Design Decisions
+- **LangGraph** orchestrates the pipeline as a stateful directed graph with
+  conditional edges (affect → fast/full retrieval; latency → primary/fallback LLM)
+- **BGE-small-en-v1.5** for embeddings (beats MiniLM on MTEB at same speed)
+- **BGE-reranker-v2-m3** cross-encoder — multilingual, handles Arjun's Hindi
+- **FAISS IndexFlatIP** with L2-normalised vectors (inner product = cosine sim)
+- **Qwen3-30B-A3B** MoE via vLLM — 3B active params/token, sub-3s on T4
+- **Three-tier LLM fallback**: primary (vLLM GCP) → fallback (Qwen3-8B) → local (Ollama)
+- **Pydantic-validated** LLM routing output — LangGraph retries on schema failures
+- **Expression-conditioned response shaping** — affect steers tone, retrieval depth,
+  and candidate ranking (not just metadata annotation)
+- **Bayesian bucket priors** — session-level P(bucket) updated after each accepted turn
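
Note on the `IndexFlatIP` decision: the claim that inner product equals cosine similarity once vectors are L2-normalised can be checked without FAISS. The sketch below uses plain Python lists and hypothetical helper names; it illustrates the identity only and does not reproduce `retrieval/vector_store.py`.

```python
import math


def l2_normalise(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in v]


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Textbook cosine similarity on the raw (unnormalised) vectors."""
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )


query = [3.0, 4.0]   # toy stand-ins for embedding vectors
chunk = [4.0, 3.0]

# What an inner-product index would score after normalising both sides.
score = inner_product(l2_normalise(query), l2_normalise(chunk))
```

Because the two values agree, an inner-product index over normalised embeddings ranks chunks exactly as cosine similarity would.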
+---
+## Personas
+| ID | Name | Condition | Access |
+|----|------|-----------|--------|
+| `mia_chen` | Mia Chen, 28 | Cerebral palsy | Webcam head-tracking |
+| `gerald_okafor` | Gerald Okafor, 61 | ALS (early-mid) | Eye-gaze device |
+| `arjun_mehta` | Arjun Mehta, 17 | Autism (non-verbal) | Tablet touch grid |
+25 memory chunks each (5 buckets × 5 memories). Arjun code-switches Hindi/English.
+---
+## How to Run
+```bash
+# One-time setup: rebuild FAISS indexes with BGE embedder
+python -m retrieval.vector_store
+# CLI (local Ollama tier, set ACTIVE_LLM_TIER=local in .env)
+python main.py --debug
+# Full stack
+uvicorn api.main:app --reload        # FastAPI on :8000
+streamlit run ui/app.py              # Streamlit on :8501
+```
+---
+## Configuration
+All config lives in [config/settings.py](config/settings.py) as Pydantic `BaseSettings`.
+Copy `.env.example` → `.env` and set:
+- `ACTIVE_LLM_TIER` — `local` (dev) | `primary` (GCP A100) | `fallback` (Qwen3-8B)
+- `PRIMARY_BASE_URL` — vLLM server address on GCP
+- `MLFLOW_TRACKING_URI` — where MLflow stores runs (default: `mlruns/`)
+---
+## Data Files
+| Path | Purpose |
+|------|---------|
+| `data/users.json` | Flat user index (id, name, condition, style) |
+| `data/memories/<uid>.json` | Full persona JSON with bucketed memories |
+| `data/faiss_store/<uid>/` | FAISS index + metadata — **rebuild after any persona edit** |
+| `data/generate_users.py` | Regenerates memories + users.json |
+---
+## Development Notes
+- **Adding a persona**: add to `PERSONAS` in `data/generate_users.py`, re-run it,
+  then `python -m retrieval.vector_store` to rebuild indexes
+- **Changing LLM**: set `ACTIVE_LLM_TIER` in `.env` — no code changes needed
+- **Extending sensing**: add module under `sensing/`, wire output into
+  `PipelineState` fields in `pipeline/state.py`
+- **Guardrail tuning**: edit signal lists in `guardrails/checks.py`
+- **Affect → generation mapping**: `_AFFECT_CONFIG` in `pipeline/nodes/intent.py`
+  and `_PERSONA_TONE_OVERRIDES` in `pipeline/nodes/planner.py`
+- The `.venv/` directory is local — do not read or modify files inside it
+- FAISS indexes in `data/faiss_store/` are gitignored — rebuilt from source JSONs

README.md CHANGED

@@ -1,74 +1,206 @@
 # Multimodal AAC Chatbot
-A multimodal chatbot designed to empower **Augmentative and Alternative Communication (AAC)** users — enabling more natural, accessible, and expressive conversations through the power of AI.
 ---
-## What is AAC?
-**Augmentative and Alternative Communication (AAC)** refers to tools, strategies, and technologies that help people who have difficulty with spoken or written communication. AAC users may include individuals with:
-- Autism Spectrum Disorder (ASD)
-- Cerebral Palsy
-- ALS / Motor Neurone Disease
-- Aphasia
-- Down Syndrome
-- Or any other condition that impacts verbal communication
-AAC tools range from low-tech picture boards to high-tech speech-generating devices. This project brings the power of modern AI chatbots to the AAC community.
 ---
-## About This Project
-The **Multimodal AAC Chatbot** is an AI-powered conversational assistant built with AAC users in mind. It accepts multiple input modalities — such as text, images, and symbols — and generates clear, accessible responses to support communication.
-### Key Features
-- 🗣️ **Multimodal Input** — Communicate using text, images, symbols, or a combination of all three
-- 🤖 **AI-Powered Responses** — Leverages large language models (LLMs) to generate natural and context-aware replies
-- ♿ **Accessibility First** — Designed from the ground up for users with communication challenges
-- 🧩 **AAC-Friendly Interface** — Supports common AAC workflows and symbol-based communication
-- 💬 **Conversational Context** — Maintains conversation history for more coherent, multi-turn dialogues
 ---
-## Getting Started
-### Prerequisites
-- Python 3.8 or higher
-- pip
-### Installation
-1. **Clone the repository**
-   ```bash
-   git clone https://github.com/akashkolte/multimodal_aac_chatbot.git
-   cd multimodal_aac_chatbot
-   ```
-2. **Install dependencies**
-   ```bash
-   pip install -r requirements.txt
-   ```
-3. **Run the chatbot**
-   ```bash
-   python app.py
-   ```
 ---
-## Usage
-Once running, users can interact with the chatbot by:
-- Typing a text message
-- Uploading an image or symbol to describe their intent
-- Combining symbols and short text phrases as AAC users typically do
-The chatbot will interpret the input and respond in a clear, friendly manner.
 ---
@@ -76,28 +208,79 @@ The chatbot will interpret the input and respond in a clear, friendly manner.
 ```
 multimodal_aac_chatbot/
-├── app.py               # Main application entry point
-├── requirements.txt     # Python dependencies
-├── README.md            # Project documentation
-└── LICENSE              # License information
 ```
 ---
-## Contributing
-This project is currently under active development. Feedback and suggestions from the AAC community and researchers are very welcome — please open an issue to share your thoughts.
-> **Note:** This software is proprietary. All rights are reserved. Any use, copying, modification, or distribution requires explicit written permission from the author.
 ---
-## License
-All rights reserved. No permission is granted to use, copy, modify, or distribute this software. See the [LICENSE](LICENSE) file for details.
 ---
-## Acknowledgements
-This project is dedicated to the AAC community and the researchers, caregivers, and developers working to make communication more accessible for everyone.

 # Multimodal AAC Chatbot
+An AI chatbot that **speaks as an AAC user**, not to them. Given a persona (Mia, Gerald, or Arjun),
+it fuses real-time multimodal non-verbal signals — facial expressions, hand gestures, gaze, and
+air writing — with personal memory retrieval to generate responses in that person's authentic voice.
+Built as a training-free, agentic RAG pipeline orchestrated via **LangGraph**.
 ---
+## Table of Contents
+- [What is AAC?](#what-is-aac)
+- [System Architecture](#system-architecture)
+- [Prerequisites](#prerequisites)
+- [Setup](#setup)
+- [Configuration](#configuration)
+- [Running the Project](#running-the-project)
+- [Project Structure](#project-structure)
+- [Personas](#personas)
+- [Team](#team)
+---
+## What is AAC?
+**Augmentative and Alternative Communication (AAC)** refers to tools and technologies that help
+people who have difficulty with spoken or written communication — including individuals with
+Cerebral Palsy, ALS, Autism Spectrum Disorder, and other conditions. This project gives AAC users
+a personalized digital twin that communicates on their behalf.
 ---
+## System Architecture
+```
+Webcam (L1: sensing) → Intent Decomposition (L2) → Retrieval (L3) → Generation (L4) → Feedback (L5)
+```
+| Layer | Module | What it does |
+|-------|--------|-------------|
+| L1 | `sensing/` | MediaPipe face mesh, hand gestures, gaze tracking, air writing |
+| L2 | `pipeline/nodes/intent.py` | LLM + Pydantic-validated intent routing |
+| L3 | `pipeline/nodes/retrieval.py` | FAISS + BGE embeddings + cross-encoder reranking |
+| L4 | `pipeline/nodes/planner.py` | Expression-conditioned response generation (Qwen3) |
+| L5 | `pipeline/nodes/feedback.py` | MLflow tracking + Bayesian bucket prior update |
+The pipeline runs as a **LangGraph stateful directed graph** with conditional edges:
+- FRUSTRATED affect → fast retrieval path (k=2, no reranker)
+- Latency > 3.5s → fallback to smaller Qwen3-8B model
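
Note on the two conditional edges: the routing described in the bullets above can be sketched as two small decision functions. The function names are hypothetical and do not come from `pipeline/graph.py`; the full-path values (`k=5`, reranker on) are taken from `retrieval_top_k` in `config/settings.py`, and where exactly the elapsed time is measured is an assumption.

```python
FAST_K = 2                        # README: fast path uses k=2, no reranker
FULL_K = 5                        # settings.retrieval_top_k
FALLBACK_LATENCY_THRESHOLD = 3.5  # seconds


def choose_retrieval(emotion: str) -> dict:
    """Pick the retrieval path from the detected affect."""
    if emotion == "FRUSTRATED":
        return {"mode": "fast", "k": FAST_K, "rerank": False}
    return {"mode": "full", "k": FULL_K, "rerank": True}


def choose_llm_tier(elapsed_s: float,
                    threshold_s: float = FALLBACK_LATENCY_THRESHOLD) -> str:
    """Switch to the smaller model once the turn exceeds the threshold."""
    return "fallback" if elapsed_s > threshold_s else "primary"


plan = choose_retrieval("FRUSTRATED")
tier = choose_llm_tier(4.2)
```

In a LangGraph graph, functions of this shape would typically serve as the router callables for conditional edges.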
+---
+## Prerequisites
+- Python **3.10 – 3.12** (Python 3.14 has a known Pydantic v1 incompatibility warning — functional but noisy)
+- [Ollama](https://ollama.com) installed locally for the `local` LLM tier
+- A webcam (required for the live sensing layer; optional for CLI mode)
+- Git
 ---
+## Setup
+### 1. Clone the repository
+```bash
+git clone https://github.com/akashkolte/multimodal_aac_chatbot.git
+cd multimodal_aac_chatbot
+```
+### 2. Check out the active branch
+```bash
+git checkout akash/v1
+```
+### 3. Create and activate a virtual environment
+```bash
+python3 -m venv .venv
+source .venv/bin/activate        # macOS / Linux
+# .venv\Scripts\activate         # Windows
+```
+### 4. Install dependencies
+```bash
+pip install -r requirements.txt
+```
+> This installs LangGraph, FAISS, sentence-transformers (BGE), FastAPI, Streamlit, MLflow,
+> MediaPipe, and all other dependencies.
+### 5. Configure environment variables
+```bash
+cp .env.example .env
+```
+Open `.env` and set at minimum:
+```env
+ACTIVE_LLM_TIER=local          # use Ollama on your machine for dev
+```
+See [Configuration](#configuration) for all options.
+### 6. Pull the local LLM model (Ollama)
+```bash
+ollama pull qwen3:8b
+```
+> Make sure Ollama is running (`ollama serve`) before starting the chatbot.
+### 7. Build FAISS indexes
+The persona memory indexes must be built once with the BGE embedder before first run:
+```bash
+python -m retrieval.vector_store
+```
+Expected output:
+```
+Building index for arjun_mehta … Saved 25 chunks
+Building index for gerald_okafor … Saved 25 chunks
+Building index for mia_chen … Saved 25 chunks
+All indexes built.
+```
+> You must re-run this step whenever you add or edit persona memory files.
+---
+## Configuration
+All settings live in [config/settings.py](config/settings.py) and can be overridden via `.env`.
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `ACTIVE_LLM_TIER` | `local` | `local` (Ollama) \| `primary` (vLLM GCP) \| `fallback` (Qwen3-8B) |
+| `LOCAL_MODEL` | `qwen3:8b` | Ollama model name for local dev |
+| `LOCAL_BASE_URL` | `http://localhost:11434/v1` | Ollama OpenAI-compatible endpoint |
+| `PRIMARY_BASE_URL` | *(GCP IP)* | vLLM server URL on GCP (set when using cloud tier) |
+| `PRIMARY_MODEL` | `Qwen/Qwen3-30B-A3B` | Primary MoE model served via vLLM |
+| `FALLBACK_LATENCY_THRESHOLD` | `3.5` | Seconds before falling back to smaller model |
+| `MLFLOW_TRACKING_URI` | `mlruns` | Local MLflow storage path |
+| `MLFLOW_EXPERIMENT` | `aac-chatbot` | MLflow experiment name |
 ---
+## Running the Project
+### Option A — CLI (simplest, no webcam needed)
+```bash
+python main.py
+```
+With debug latency output:
+```bash
+python main.py --debug
+```
+Select a specific persona and LLM tier:
+```bash
+python main.py --user mia_chen --tier local
+```
+### Option B — Full stack (FastAPI + Streamlit UI)
+Start the API server in one terminal:
+```bash
+uvicorn api.main:app --reload --port 8000
+```
+Start the Streamlit frontend in another terminal:
+```bash
+streamlit run ui/app.py
+```
+Then open [http://localhost:8501](http://localhost:8501) in your browser.
+The UI includes:
+- Persona selector
+- Affect override controls (simulate webcam for testing)
+- Live chat interface
+- Per-turn latency breakdown panel
+### Option C — API only (for integration / testing)
+```bash
+uvicorn api.main:app --reload
+```
+Example request:
+```bash
+curl -X POST http://localhost:8000/chat \
+  -H "Content-Type: application/json" \
+  -d '{"user_id": "mia_chen", "query": "What do you like to do on weekends?"}'
+```
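
Note on the example request: the same call can be made from Python with only the standard library. This is a sketch under the assumption that the API is running locally on port 8000 as in Option C; `build_chat_request` and `send` are hypothetical helper names, and `affect_override` is the optional field declared on `ChatRequest` in `api/main.py`.

```python
import json
import urllib.request


def build_chat_request(user_id, query, base_url="http://localhost:8000",
                       affect_override=None):
    """Build (but do not send) a POST /chat request."""
    payload = {"user_id": user_id, "query": query}
    if affect_override is not None:
        payload["affect_override"] = affect_override
    return urllib.request.Request(
        f"{base_url}/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req, timeout=30.0):
    """Send the request and decode the JSON body of the reply."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_chat_request("mia_chen", "What do you like to do on weekends?")
# With the server running:  reply = send(req); print(reply["response"])
```

Keeping the builder separate from `send` lets the payload be inspected in tests without a live server.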
 ---
 ```
 multimodal_aac_chatbot/
+│
+├── config/
+│   └── settings.py            # All config via Pydantic BaseSettings
+│
+├── data/
+│   ├── generate_users.py      # Regenerates persona memories + users.json
+│   ├── users.json             # Flat user index
+│   ├── memories/              # Per-persona memory JSON files
+│   └── faiss_store/           # Built FAISS indexes (gitignored, rebuild locally)
+│
+├── sensing/                   # L1 — multimodal input
+│   ├── face_mesh.py           # MediaPipe affect detection (MAR/EAR/BRI/LCP)
+│   ├── gesture.py             # Hand gesture classifier
+│   ├── gaze.py                # Gaze-based bucket activation (bonus)
+│   └── air_writing.py         # DTW air-writing stroke classifier (bonus)
+│
+├── pipeline/                  # LangGraph orchestration
+│   ├── state.py               # Typed PipelineState (TypedDict)
+│   ├── graph.py               # Graph definition + conditional edges
+│   └── nodes/
+│       ├── intent.py          # L2 — LLM + Pydantic routing
+│       ├── retrieval.py       # L3 — fast + full retrieval paths
+│       ├── planner.py         # L4 — expression-conditioned generation
+│       └── feedback.py        # L5 — MLflow + Bayesian prior update
+│
+├── retrieval/
+│   ├── vector_store.py        # FAISS ops with BGE-small-en-v1.5
+│   ├── clustering.py          # HDBSCAN semantic bucketing
+│   └── bucket_priors.py       # Bayesian session priors
+│
+├── generation/
+│   └── llm_client.py          # 3-tier LLM client (vLLM / Ollama)
+│
+├── guardrails/
+│   └── checks.py              # Input + output safety checks
+│
+├── api/
+│   └── main.py                # FastAPI backend
+│
+├── ui/
+│   └── app.py                 # Streamlit frontend
+│
+├── main.py                    # CLI entry point
+├── requirements.txt           # Python dependencies
+├── .env.example               # Environment variable template
+└── CLAUDE.md                  # Developer notes (AI assistant context)
 ```
 ---
+## Personas
+| ID | Name | Condition | Style | Access |
+|----|------|-----------|-------|--------|
+| `mia_chen` | Mia Chen, 28 | Cerebral palsy | Witty, dry humour, short punchy sentences | Webcam head-tracking |
+| `gerald_okafor` | Gerald Okafor, 61 | ALS (early-mid stage) | Formal, measured, eloquent | Eye-gaze device |
+| `arjun_mehta` | Arjun Mehta, 17 | Autism (non-verbal) | Direct, routine-focused, Hindi-English code-switching | Tablet touch grid |
+Each persona has 25 memory chunks across 5 buckets: `family`, `medical`, `hobbies`, `daily_routine`, `social`.
+To add a new persona, add it to `PERSONAS` in `data/generate_users.py`, re-run that script, then run `python -m retrieval.vector_store` to rebuild the indexes.
 ---
+## Team
+- **Akash Kolte** — akashjag@buffalo.edu
+- **Shwetangi** — shwetang@buffalo.edu
+University at Buffalo, SUNY
 ---
+## License
+All rights reserved. See the [LICENSE](LICENSE) file for details.

api/__init__.py ADDED

File without changes

api/main.py ADDED

	@@ -0,0 +1,167 @@

+"""
+FastAPI backend — exposes the LangGraph pipeline as a REST API.
+Endpoints:
+  POST /chat          — single-turn inference (non-streaming)
+  POST /chat/stream   — streaming token delivery via SSE (not yet implemented in this file)
+  GET  /users         — list available personas
+  POST /session/reset — reset session state for a user
+  GET  /health        — liveness check
+"""
+from __future__ import annotations
+import json
+import time
+from typing import AsyncGenerator
+from fastapi import FastAPI, HTTPException
+from fastapi.middleware.cors import CORSMiddleware
+from fastapi.responses import StreamingResponse
+from pydantic import BaseModel
+from config.settings import settings
+from guardrails.checks import check_input
+from pipeline.graph import aac_graph
+from pipeline.state import PipelineState
+from retrieval.bucket_priors import uniform_priors
+app = FastAPI(
+    title="Multimodal AAC Chatbot API",
+    description="Agentic RAG pipeline for AAC persona communication",
+    version="2.0.0",
+)
+app.add_middleware(
+    CORSMiddleware,
+    allow_origins=["*"],
+    allow_methods=["*"],
+    allow_headers=["*"],
+)
+# ── In-memory session store (replace with Redis for multi-worker deployments) ──
+_sessions: dict[str, dict] = {}
+# ── Request / response schemas ─────────────────────────────────────────────────
+class ChatRequest(BaseModel):
+    user_id: str
+    query: str
+    affect_override: str | None = None   # "HAPPY"|"FRUSTRATED"|"NEUTRAL"|"SURPRISED"
+    gesture_tag: str | None = None
+    gaze_bucket: str | None = None
+class ChatResponse(BaseModel):
+    user_id: str
+    query: str
+    response: str
+    affect: str
+    llm_tier: str
+    retrieval_mode: str
+    latency: dict
+    guardrail_passed: bool
+# ── Helpers ────────────────────────────────────────────────────────────────────
+def _get_or_init_session(user_id: str) -> dict:
+    if user_id not in _sessions:
+        with open(settings.users_json) as f:
+            users = {u["id"]: u for u in json.load(f)["users"]}
+        if user_id not in users:
+            raise HTTPException(status_code=404, detail=f"User '{user_id}' not found")
+        _sessions[user_id] = {
+            "persona_profile": users[user_id],
+            "session_history": [],
+            "bucket_priors": uniform_priors(),
+            "turn_id": 0,
+        }
+    return _sessions[user_id]
+def _build_initial_state(req: ChatRequest, session: dict) -> PipelineState:
+    affect_state = None
+    if req.affect_override:
+        affect_state = {"emotion": req.affect_override, "vector": {}, "smoothed": {}}
+    session["turn_id"] += 1
+    return PipelineState(
+        user_id=req.user_id,
+        persona_profile=session["persona_profile"],
+        session_history=session["session_history"],
+        turn_id=session["turn_id"],
+        affect=affect_state,
+        gesture_tag=req.gesture_tag,
+        gaze_bucket=req.gaze_bucket,
+        air_written_text=None,
+        raw_query=req.query,
+        intent_route=None,
+        generation_config=None,
+        retrieved_chunks=[],
+        bucket_priors=session["bucket_priors"],
+        retrieval_mode_used="",
+        augmented_prompt=None,
+        candidates=[],
+        selected_response=None,
+        llm_tier_used="",
+        latency_log={"t_sensing": 0.0, "t_intent": 0.0, "t_retrieval": 0.0, "t_generation": 0.0, "t_total": 0.0},
+        mlflow_run_id=None,
+        guardrail_passed=True,
+    )
+# ── Routes ─────────────────────────────────────────────────────────────────────
+@app.get("/health")
+def health():
+    return {"status": "ok"}
+@app.get("/users")
+def list_users():
+    with open(settings.users_json) as f:
+        return json.load(f)
+@app.post("/session/reset")
+def reset_session(user_id: str):
+    _sessions.pop(user_id, None)
+    return {"status": "reset", "user_id": user_id}
+@app.post("/chat", response_model=ChatResponse)
+def chat(req: ChatRequest):
+    guard = check_input(req.query)
+    if not guard["allowed"]:
+        return ChatResponse(
+            user_id=req.user_id,
+            query=req.query,
+            response=guard["fallback"],
+            affect="NEUTRAL",
+            llm_tier="none",
+            retrieval_mode="none",
+            latency={},
+            guardrail_passed=False,
+        )
+    session = _get_or_init_session(req.user_id)
+    initial_state = _build_initial_state(req, session)
+    result: PipelineState = aac_graph.invoke(initial_state)
+    # Persist updated session state
+    session["session_history"] = result["session_history"]
+    session["bucket_priors"]   = result["bucket_priors"]
+    return ChatResponse(
+        user_id=req.user_id,
+        query=req.query,
+        response=result["selected_response"] or "",
+        affect=(result.get("affect") or {}).get("emotion", "NEUTRAL"),
+        llm_tier=result.get("llm_tier_used", "unknown"),
+        retrieval_mode=result.get("retrieval_mode_used", "unknown"),
+        latency=result.get("latency_log") or {},
+        guardrail_passed=result.get("guardrail_passed", True),
+    )
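
Note on `bucket_priors`: the session store above seeds each session with `uniform_priors()` and writes the updated priors back after every turn. The body of `retrieval/bucket_priors.py` is not shown in this part of the diff, so the sketch below is a stand-in: a Dirichlet-style pseudo-count update chosen for illustration. The bucket names come from the README; the function names and the `strength` parameter are hypothetical.

```python
BUCKETS = ("family", "medical", "hobbies", "daily_routine", "social")


def uniform_priors_sketch():
    """Start every session with equal probability per bucket."""
    return {b: 1.0 / len(BUCKETS) for b in BUCKETS}


def update_priors(priors, accepted_bucket, strength=1.0):
    """Add a pseudo-count to the accepted bucket, then renormalise."""
    if accepted_bucket not in priors:
        raise KeyError(accepted_bucket)
    bumped = {
        b: p + (strength if b == accepted_bucket else 0.0)
        for b, p in priors.items()
    }
    total = sum(bumped.values())
    return {b: p / total for b, p in bumped.items()}


priors = uniform_priors_sketch()
priors = update_priors(priors, "hobbies")  # one accepted turn about hobbies
```

A smaller `strength` makes the session adapt more slowly; the result always sums to one, so it can be used directly as P(bucket).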

config/__init__.py ADDED

	@@ -0,0 +1,3 @@


+from config.settings import settings
+
+__all__ = ["settings"]

config/settings.py ADDED

	@@ -0,0 +1,79 @@

+from pathlib import Path
+from pydantic_settings import BaseSettings, SettingsConfigDict
+class Settings(BaseSettings):
+    model_config = SettingsConfigDict(env_file=".env", env_file_encoding="utf-8", extra="ignore")
+    # ── Paths ──────────────────────────────────────────────────────────────────
+    data_dir: Path = Path("data")
+    faiss_store_dir: Path = Path("data/faiss_store")
+    memories_dir: Path = Path("data/memories")
+    users_json: Path = Path("data/users.json")
+    # ── Retrieval models ───────────────────────────────────────────────────────
+    embed_model: str = "BAAI/bge-small-en-v1.5"
+    rerank_model: str = "BAAI/bge-reranker-v2-m3"
+    retrieval_top_k: int = 5
+    retrieval_rerank_k: int = 3
+    retrieval_fast_k: int = 2          # used when affect == FRUSTRATED
+    # ── LLM tiers ─────────────────────────────────────────────────────────────
+    # Tier 1 — primary (Qwen3-30B-A3B via vLLM on GCP)
+    primary_model: str = "Qwen/Qwen3-30B-A3B"
+    primary_base_url: str = "http://localhost:8000/v1"
+    primary_api_key: str = "token-abc"          # vLLM default
+    # Tier 2 — fallback dense model (Qwen3-8B via vLLM, same server)
+    fallback_model: str = "Qwen/Qwen3-8B"
+    fallback_base_url: str = "http://localhost:8000/v1"
+    # Tier 3 — local dev (Ollama on MacBook M2)
+    local_model: str = "qwen3:8b"
+    local_base_url: str = "http://localhost:11434/v1"
+    local_api_key: str = "ollama"
+    # Active tier: "primary" | "fallback" | "local"
+    active_llm_tier: str = "local"
+    # Thinking mode: "off" = plain completion, no thinking whatsoever
+    # "strip" = let model think, but strip <think> tags from output
+    # "full" = return raw response including <think> blocks
+    # "suppress" = actively suppress thinking via /no_think (Ollama) or
+    #              chat_template_kwargs (vLLM). Use for models like Qwen3
+    #              that think by default and need explicit suppression.
+    thinking_mode: str = "off"
+    # Extra token budget added on top of max_tokens when thinking is enabled
+    # (thinking_mode = "strip" or "full"). Set to 0 if using a non-thinking model.
+    thinking_token_budget: int = 4096
+    # Wall-clock threshold (seconds) that triggers fallback within a turn
+    fallback_latency_threshold: float = 3.5
+    # ── Generation ────────────────────────────────────────────────────────────
+    max_tokens_happy: int = 150
+    max_tokens_neutral: int = 100
+    max_tokens_frustrated: int = 60
+    max_tokens_surprised: int = 80
+    num_candidates: int = 2            # responses generated per turn for ranking
+    # ── Sensing ───────────────────────────────────────────────────────────────
+    affect_ema_alpha: float = 0.3      # exponential moving average smoothing
+    gaze_dwell_threshold_s: float = 1.5
+    air_write_velocity_start: int = 15  # px/frame — stroke begin threshold
+    air_write_velocity_end: int = 5     # px/frame — stroke end threshold
+    air_write_end_gap_ms: int = 200     # ms of stillness to end a stroke
+    conflict_overlap_ms: int = 500      # audio + gesture co-occurrence window
+    # ── MLflow ────────────────────────────────────────────────────────────────
+    mlflow_tracking_uri: str = "mlruns"
+    mlflow_experiment: str = "aac-chatbot"
+    # ── Candidate ranking weights (Eq. 2 in proposal) ─────────────────────────
+    rank_alpha: float = 0.4            # faithfulness weight
+    rank_beta: float = 0.3             # style similarity weight
+    rank_gamma: float = 0.3            # affect-match weight
+settings = Settings()
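
Note on the ranking weights: `rank_alpha`, `rank_beta`, and `rank_gamma` sum to 1.0, which suggests a weighted sum over the three per-candidate scores. Eq. 2 of the proposal is not reproduced in this diff, so the linear form below is an assumption, as are the function name `rank_score` and the toy score values.

```python
def rank_score(faithfulness, style_similarity, affect_match,
               alpha=0.4, beta=0.3, gamma=0.3):
    """Assumed linear combination of the three candidate scores."""
    return alpha * faithfulness + beta * style_similarity + gamma * affect_match


# Two made-up candidates, matching num_candidates = 2.
candidates = [
    {"text": "A", "faithfulness": 0.9, "style": 0.5, "affect": 0.5},
    {"text": "B", "faithfulness": 0.6, "style": 0.9, "affect": 0.9},
]

best = max(
    candidates,
    key=lambda c: rank_score(c["faithfulness"], c["style"], c["affect"]),
)
```

Under these weights a candidate that is slightly less faithful can still win if it matches the persona's style and the detected affect much better.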

data/generate_users.py ADDED

	@@ -0,0 +1,186 @@

+import json
+import os
+# ── 3 hand-crafted AAC personas ───────────────────────────────────────────────
+# Each has a distinct condition, voice, and bucketed memories.
+# Depth > quantity: 3 rich personas beat 50 generic ones for retrieval quality.
+PERSONAS = [
+    {
+        "profile": {
+            "name":               "Mia Chen",
+            "age":                28,
+            "condition":          "cerebral palsy",
+            "communication_style":"witty, dry humour, short punchy sentences, uses sarcasm",
+            "access_method":      "webcam head-tracking",
+            "languages":          ["English"]
+        },
+        "memory_buckets": {
+            "family": [
+                "My mom calls every Sunday and always asks if I've eaten. I love it but won't admit it.",
+                "My brother Ravi helped me set up this AAC system. He's at Cornell doing CS.",
+                "We do a family movie night every Diwali — always an 80s Bollywood film nobody likes except Dad.",
+                "My parents moved from Chengdu before I was born. We still make dumplings on Chinese New Year.",
+                "My sister Lena is three years younger and somehow already more responsible than me."
+            ],
+            "medical": [
+                "I have a PT session every Tuesday at 2pm with Dr. Sandra Hollis.",
+                "I use a power wheelchair. The joystick is on my left side.",
+                "I'm allergic to penicillin. I have to mention this at every hospital visit.",
+                "My spasticity is worse in cold weather. Winter in Chicago is not my friend.",
+                "I use baclofen for muscle tone. It makes me sleepy if I take it too early."
+            ],
+            "hobbies": [
+                "I follow competitive Smash Bros. I could beat most people if my hands worked differently.",
+                "I've been watching every Studio Ghibli film in order. Currently on Porco Rosso.",
+                "I collect vintage sci-fi paperbacks. Asimov and Le Guin mostly.",
+                "I got really into chess puzzles during lockdown. Still do them before bed.",
+                "I enjoy critiquing bad movie sequels. It's practically a hobby at this point."
+            ],
+            "daily_routine": [
+                "Mornings are slow. I need about 45 minutes before I feel like a person.",
+                "I order from the same Thai place every Friday. Green curry, always.",
+                "I keep a voice memo journal since typing long things is tiring.",
+                "I usually watch one episode of something after dinner to decompress.",
+                "My caregiver Marcus arrives at 8am on weekdays. He makes decent coffee."
+            ],
+            "social": [
+                "My best friend Priya visits on weekends. She narrates everything like a nature documentary.",
+                "I'm part of an online disability advocacy group. We meet on Zoom every other Wednesday.",
+                "I don't love big parties. Small dinners with three or four people are my ideal.",
+                "My neighbour Tom always stops to chat when I'm outside. He's retired and lonely, I think.",
+                "I met most of my close friends through a gaming Discord server."
+            ]
+        }
+    },
+    {
+        "profile": {
+            "name":               "Gerald Okafor",
+            "age":                61,
+            "condition":          "ALS (early-to-mid stage)",
+            "communication_style":"formal, measured, eloquent, longer structured sentences",
+            "access_method":      "eye-gaze device",
+            "languages":          ["English"]
+        },
+        "memory_buckets": {
+            "family": [
+                "My wife Constance and I have been married for 34 years. She is the reason I stay organised.",
+                "My son Emeka is a civil engineer based in Houston. He calls every Thursday evening.",
+                "My daughter Adaeze is doing her residency in paediatrics in Baltimore. I am very proud.",
+                "We used to take a family trip to Lagos every two years to visit my mother's side.",
+                "My youngest grandchild, Tobenna, was born last April. I have not met him in person yet."
+            ],
+            "medical": [
+                "I was diagnosed with ALS in November 2024. I am still adjusting to what that means day to day.",
+                "My speech was the first thing to decline noticeably. That is why I began using AAC.",
+                "I see my neurologist Dr. Patricia Eze at Northwestern every six weeks.",
+                "I take riluzole daily. I have not noticed significant side effects so far.",
+                "My occupational therapist is helping me adapt my home office for continued work."
+            ],
+            "hobbies": [
+                "I taught economics at DePaul University for twenty-two years.",
+                "I have read most of Chinua Achebe's work. Things Fall Apart shaped how I see storytelling.",
+                "I enjoy chess — classical time controls, not blitz. Patience is the point.",
+                "I used to cook elaborate Sunday stews. Constance has taken that over now, which is bittersweet.",
+                "I listen to Fela Kuti when I need to feel grounded. Always have."
+            ],
+            "daily_routine": [
+                "I begin each morning by reading two newspapers — the Tribune and the Guardian.",
+                "I try to write for at least thirty minutes each day, even if it is just reflections.",
+                "Afternoons are for rest. My energy is most reliable in the mornings.",
+                "Constance and I watch the evening news together. We have done this for decades.",
+                "I use the eye-gaze device for most communication now. It takes patience but it works."
+            ],
+            "social": [
+                "My closest friend is Charles Nwosu. We have known each other since secondary school in Enugu.",
+                "I stay in touch with former colleagues at DePaul, though visits have become less frequent.",
+                "My church community at St. Clement has been a source of genuine support since my diagnosis.",
+                "I prefer one-on-one conversations. I find group settings harder to follow now.",
+                "I joined an ALS support group that meets virtually. It helps more than I expected."
+            ]
+        }
+    },
+    {
+        "profile": {
+            "name":               "Arjun Mehta",
+            "age":                17,
+            "condition":          "autism spectrum disorder (non-verbal)",
+            "communication_style":"direct, topic-specific, narrow vocabulary, code-switches Hindi/English, routine-focused",
+            "access_method":      "tablet touch grid + AAC app",
+            "languages":          ["English", "Hindi"]
+        },
+        "memory_buckets": {
+            "family": [
+                "Mummy makes aloo paratha on Sunday mornings. That is my favourite thing.",
+                "Papa works at a software company. He brings home a samosa sometimes on Fridays.",
+                "My dadi lives with us. She watches serials very loudly but I like that she is home.",
+                "My cousin Rohan visits in the summer. We play Minecraft together for many hours.",
+                "Mummy knows what I want even when I cannot say it. She is very good at that."
+            ],
+            "medical": [
+                "I see my therapist Riya didi every Wednesday at 4pm.",
+                "I do not like the occupational therapy exercises but I do them.",
+                "I cannot eat food that has a slimy texture. It makes me feel very bad.",
+                "I take melatonin at night. Without it, sleeping is very hard.",
+                "My school has a support aide named Mr. Fernandez. He is calm and that helps."
+            ],
+            "hobbies": [
+                "I know the complete timetable of all Mumbai Metro lines.",
+                "I like sorting my LEGO bricks by colour and size before building.",
+                "My favourite YouTube channel is about deep sea creatures. Anglerfish are very strange.",
+                "I have watched the same three episodes of Doraemon more than fifty times each.",
+                "I am learning the capitals of every country. I know 142 so far."
+            ],
+            "daily_routine": [
+                "I wake up at 6:47am. Changing this time makes my whole day feel wrong.",
+                "I eat the same breakfast — two rotis with ghee and one glass of milk.",
+                "School starts at 8:30am. I like to arrive before the other students.",
+                "After school I need quiet time for at least one hour. No talking.",
+                "Dinner must be at 7:30pm. If it is late I feel very unsettled."
+            ],
+            "social": [
+                "I have one friend at school named Vivaan. We do not talk much but we sit together.",
+                "I do not like it when people stand too close. One arm's distance is comfortable.",
+                "I prefer typing to speaking when I need to say something important.",
+                "Loud places with many people feel like too much information at once.",
+                "I like it when people tell me exactly what is going to happen next."
+            ]
+        }
+    }
+]
+def main():
+    os.makedirs("memories", exist_ok=True)
+    user_index = []
+    for persona in PERSONAS:
+        uid  = persona["profile"]["name"].lower().replace(" ", "_")
+        path = f"memories/{uid}.json"
+        with open(path, "w", encoding="utf-8") as f:
+            json.dump(persona, f, indent=2, ensure_ascii=False)
+        user_index.append({
+            "id":        uid,
+            "name":      persona["profile"]["name"],
+            "condition": persona["profile"]["condition"],
+            "style":     persona["profile"]["communication_style"],
+            "file":      path
+        })
+        print(f"  Wrote {path}")
+    with open("users.json", "w", encoding="utf-8") as f:
+        json.dump({"users": user_index}, f, indent=2, ensure_ascii=False)
+    print(f"\n Done — {len(PERSONAS)} personas written to memories/")
+    print("  Files:", [u["file"] for u in user_index])
+if __name__ == "__main__":
+    main()
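The persona files written by `main()` above all share one shape: a `profile` block plus a `memory_buckets` map from bucket name to a list of first-person sentences. The following is a minimal sketch of flattening one such file into per-memory records, which is roughly what an indexing step would need. The `flatten_persona` helper and the record fields (`user_id`, `bucket`, `text`) are assumptions made for illustration; they are not taken from `retrieval/vector_store.py`.

```python
import json


def flatten_persona(persona: dict) -> list[dict]:
    """Turn one persona dict into a flat list of memory records (hypothetical helper)."""
    # Same id scheme as generate_users.py: lowercased name, spaces -> underscores.
    uid = persona["profile"]["name"].lower().replace(" ", "_")
    records = []
    for bucket, memories in persona["memory_buckets"].items():
        for text in memories:
            records.append({"user_id": uid, "bucket": bucket, "text": text})
    return records


# Tiny inline sample in the same shape as data/memories/*.json.
sample = json.loads("""
{
  "profile": {"name": "Mia Chen"},
  "memory_buckets": {
    "medical": ["I use a power wheelchair. The joystick is on my left side."],
    "social": ["I met most of my close friends through a gaming Discord server."]
  }
}
""")

records = flatten_persona(sample)
for r in records:
    print(r["user_id"], r["bucket"], r["text"][:30])
```

Keeping the bucket name on every record means a later retrieval step could filter or re-weight by bucket without reopening the source file.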

data/memories/arjun_mehta.json ADDED Viewed

	@@ -0,0 +1,50 @@

+{
+  "profile": {
+    "name": "Arjun Mehta",
+    "age": 17,
+    "condition": "autism spectrum disorder (non-verbal)",
+    "communication_style": "direct, topic-specific, narrow vocabulary, code-switches Hindi/English, routine-focused",
+    "access_method": "tablet touch grid + AAC app",
+    "languages": [
+      "English",
+      "Hindi"
+    ]
+  },
+  "memory_buckets": {
+    "family": [
+      "Mummy makes aloo paratha on Sunday mornings. That is my favourite thing.",
+      "Papa works at a software company. He brings home a samosa sometimes on Fridays.",
+      "My dadi lives with us. She watches serials very loudly but I like that she is home.",
+      "My cousin Rohan visits in the summer. We play Minecraft together for many hours.",
+      "Mummy knows what I want even when I cannot say it. She is very good at that."
+    ],
+    "medical": [
+      "I see my therapist Riya didi every Wednesday at 4pm.",
+      "I do not like the occupational therapy exercises but I do them.",
+      "I cannot eat food that has a slimy texture. It makes me feel very bad.",
+      "I take melatonin at night. Without it, sleeping is very hard.",
+      "My school has a support aide named Mr. Fernandez. He is calm and that helps."
+    ],
+    "hobbies": [
+      "I know the complete timetable of all Mumbai Metro lines.",
+      "I like sorting my LEGO bricks by colour and size before building.",
+      "My favourite YouTube channel is about deep sea creatures. Anglerfish are very strange.",
+      "I have watched the same three episodes of Doraemon more than fifty times each.",
+      "I am learning the capitals of every country. I know 142 so far."
+    ],
+    "daily_routine": [
+      "I wake up at 6:47am. Changing this time makes my whole day feel wrong.",
+      "I eat the same breakfast — two rotis with ghee and one glass of milk.",
+      "School starts at 8:30am. I like to arrive before the other students.",
+      "After school I need quiet time for at least one hour. No talking.",
+      "Dinner must be at 7:30pm. If it is late I feel very unsettled."
+    ],
+    "social": [
+      "I have one friend at school named Vivaan. We do not talk much but we sit together.",
+      "I do not like it when people stand too close. One arm's distance is comfortable.",
+      "I prefer typing to speaking when I need to say something important.",
+      "Loud places with many people feel like too much information at once.",
+      "I like it when people tell me exactly what is going to happen next."
+    ]
+  }
+}

data/memories/gerald_okafor.json ADDED Viewed

	@@ -0,0 +1,49 @@

+{
+  "profile": {
+    "name": "Gerald Okafor",
+    "age": 61,
+    "condition": "ALS (early-to-mid stage)",
+    "communication_style": "formal, measured, eloquent, longer structured sentences",
+    "access_method": "eye-gaze device",
+    "languages": [
+      "English"
+    ]
+  },
+  "memory_buckets": {
+    "family": [
+      "My wife Constance and I have been married for 34 years. She is the reason I stay organised.",
+      "My son Emeka is a civil engineer based in Houston. He calls every Thursday evening.",
+      "My daughter Adaeze is doing her residency in paediatrics in Baltimore. I am very proud.",
+      "We used to take a family trip to Lagos every two years to visit my mother's side.",
+      "My youngest grandchild, Tobenna, was born last April. I have not met him in person yet."
+    ],
+    "medical": [
+      "I was diagnosed with ALS in November 2024. I am still adjusting to what that means day to day.",
+      "My speech was the first thing to decline noticeably. That is why I began using AAC.",
+      "I see my neurologist Dr. Patricia Eze at Northwestern every six weeks.",
+      "I take riluzole daily. I have not noticed significant side effects so far.",
+      "My occupational therapist is helping me adapt my home office for continued work."
+    ],
+    "hobbies": [
+      "I taught economics at DePaul University for twenty-two years.",
+      "I have read most of Chinua Achebe's work. Things Fall Apart shaped how I see storytelling.",
+      "I enjoy chess — classical time controls, not blitz. Patience is the point.",
+      "I used to cook elaborate Sunday stews. Constance has taken that over now, which is bittersweet.",
+      "I listen to Fela Kuti when I need to feel grounded. Always have."
+    ],
+    "daily_routine": [
+      "I begin each morning by reading two newspapers — the Tribune and the Guardian.",
+      "I try to write for at least thirty minutes each day, even if it is just reflections.",
+      "Afternoons are for rest. My energy is most reliable in the mornings.",
+      "Constance and I watch the evening news together. We have done this for decades.",
+      "I use the eye-gaze device for most communication now. It takes patience but it works."
+    ],
+    "social": [
+      "My closest friend is Charles Nwosu. We have known each other since secondary school in Enugu.",
+      "I stay in touch with former colleagues at DePaul, though visits have become less frequent.",
+      "My church community at St. Clement has been a source of genuine support since my diagnosis.",
+      "I prefer one-on-one conversations. I find group settings harder to follow now.",
+      "I joined an ALS support group that meets virtually. It helps more than I expected."
+    ]
+  }
+}

data/memories/mia_chen.json ADDED Viewed

	@@ -0,0 +1,49 @@

+{
+  "profile": {
+    "name": "Mia Chen",
+    "age": 28,
+    "condition": "cerebral palsy",
+    "communication_style": "witty, dry humour, short punchy sentences, uses sarcasm",
+    "access_method": "webcam head-tracking",
+    "languages": [
+      "English"
+    ]
+  },
+  "memory_buckets": {
+    "family": [
+      "My mom calls every Sunday and always asks if I've eaten. I love it but won't admit it.",
+      "My brother Ravi helped me set up this AAC system. He's at Cornell doing CS.",
+      "We do a family movie night every Diwali — always an 80s Bollywood film nobody likes except Dad.",
+      "My parents moved from Chengdu before I was born. We still make dumplings on Chinese New Year.",
+      "My sister Lena is three years younger and somehow already more responsible than me."
+    ],
+    "medical": [
+      "I have a PT session every Tuesday at 2pm with Dr. Sandra Hollis.",
+      "I use a power wheelchair. The joystick is on my left side.",
+      "I'm allergic to penicillin. I have to mention this at every hospital visit.",
+      "My spasticity is worse in cold weather. Winter in Chicago is not my friend.",
+      "I use baclofen for muscle tone. It makes me sleepy if I take it too early."
+    ],
+    "hobbies": [
+      "I follow competitive Smash Bros. I could beat most people if my hands worked differently.",
+      "I've been watching every Studio Ghibli film in order. Currently on Porco Rosso.",
+      "I collect vintage sci-fi paperbacks. Asimov and Le Guin mostly.",
+      "I got really into chess puzzles during lockdown. Still do them before bed.",
+      "I enjoy critiquing bad movie sequels. It's practically a hobby at this point."
+    ],
+    "daily_routine": [
+      "Mornings are slow. I need about 45 minutes before I feel like a person.",
+      "I order from the same Thai place every Friday. Green curry, always.",
+      "I keep a voice memo journal since typing long things is tiring.",
+      "I usually watch one episode of something after dinner to decompress.",
+      "My caregiver Marcus arrives at 8am on weekdays. He makes decent coffee."
+    ],
+    "social": [
+      "My best friend Priya visits on weekends. She narrates everything like a nature documentary.",
+      "I'm part of an online disability advocacy group. We meet on Zoom every other Wednesday.",
+      "I don't love big parties. Small dinners with three or four people are my ideal.",
+      "My neighbour Tom always stops to chat when I'm outside. He's retired and lonely, I think.",
+      "I met most of my close friends through a gaming Discord server."
+    ]
+  }
+}

data/users.json ADDED Viewed

	@@ -0,0 +1,25 @@

+{
+  "users": [
+    {
+      "id": "mia_chen",
+      "name": "Mia Chen",
+      "condition": "cerebral palsy",
+      "style": "witty, dry humour, short punchy sentences, uses sarcasm",
+      "file": "memories/mia_chen.json"
+    },
+    {
+      "id": "gerald_okafor",
+      "name": "Gerald Okafor",
+      "condition": "ALS (early-to-mid stage)",
+      "style": "formal, measured, eloquent, longer structured sentences",
+      "file": "memories/gerald_okafor.json"
+    },
+    {
+      "id": "arjun_mehta",
+      "name": "Arjun Mehta",
+      "condition": "autism spectrum disorder (non-verbal)",
+      "style": "direct, topic-specific, narrow vocabulary, code-switches Hindi/English, routine-focused",
+      "file": "memories/arjun_mehta.json"
+    }
+  ]
+}

generation/__init__.py ADDED Viewed

File without changes

generation/llm_client.py ADDED Viewed

	@@ -0,0 +1,147 @@

+"""
+Multi-tier LLM client (proposal §5.6).
+All three tiers expose the same OpenAI-compatible API, so only the
+base_url + model name change — no code-path differences downstream.
+Tier 1 — primary:  Qwen3-30B-A3B via vLLM on GCP (A100 / T4)
+Tier 2 — fallback: Qwen3-8B via vLLM on same server (latency > 3.5 s)
+Tier 3 — local:    Qwen3-8B via Ollama on MacBook M2 (dev / offline)
+Active tier is controlled by settings.active_llm_tier or the `tier`
+argument passed explicitly by the planner node.
+Thinking mode is controlled by settings.thinking_mode:
+  "suppress" — prepend /no_think (Ollama) or chat_template_kwargs (vLLM)
+  "off"   — inject nothing; any <think>…</think> in the output is still stripped
+  "strip" — let the model think, but strip <think>…</think> from output
+  "full"  — return everything including <think> blocks
+"""
+from __future__ import annotations
+import re
+from functools import lru_cache
+from typing import Any
+from openai import OpenAI
+from config.settings import settings
+@lru_cache(maxsize=3)
+def _build_client(base_url: str, api_key: str) -> OpenAI:
+    """One cached OpenAI client per (base_url, api_key) pair."""
+    return OpenAI(base_url=base_url, api_key=api_key)
+def get_client(tier: str | None = None) -> OpenAI:
+    """
+    Return the OpenAI-compatible client for the requested tier.
+    Args:
+        tier: "primary" | "fallback" | "local" | None (uses settings.active_llm_tier)
+    """
+    resolved = tier or settings.active_llm_tier
+    if resolved == "primary":
+        return _build_client(settings.primary_base_url, settings.primary_api_key)
+    if resolved == "fallback":
+        # Fallback is served by the same vLLM server, so it reuses the primary API key.
+        return _build_client(settings.fallback_base_url, settings.primary_api_key)
+    # local / default
+    return _build_client(settings.local_base_url, settings.local_api_key)
+def active_model(tier: str | None = None) -> str:
+    """Return the model name string for the given tier."""
+    resolved = tier or settings.active_llm_tier
+    return {
+        "primary":  settings.primary_model,
+        "fallback": settings.fallback_model,
+        "local":    settings.local_model,
+    }[resolved]
+def _apply_no_think(messages: list[dict]) -> list[dict]:
+    """
+    Prepend /no_think to the first user message.
+    This is the Ollama-compatible way to suppress thinking mode.
+    """
+    result = list(messages)
+    for i, msg in enumerate(result):
+        if msg.get("role") == "user":
+            result[i] = {**msg, "content": f"/no_think\n\n{msg['content']}"}
+            break
+    return result
+def _strip_think_tags(text: str) -> str:
+    """Remove <think>…</think> blocks from model output."""
+    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
+def chat_complete(
+    messages: list[dict],
+    max_tokens: int,
+    tier: str | None = None,
+    temperature: float = 0.7,
+    **kwargs: Any,
+) -> str:
+    """
+    Model-agnostic chat completion. Returns the response text directly.
+    Thinking mode behaviour is controlled entirely by settings.thinking_mode:
+      "suppress" — suppress thinking via /no_think (Ollama) or extra_body (vLLM)
+      "off"   — inject nothing; any <think> tags in the response are still removed
+      "strip" — allow thinking but remove <think> tags from the response
+      "full"  — return the raw response including any <think> blocks
+    In local dev mode (active_llm_tier="local"), all tier requests are
+    redirected to Ollama — there is no separate fallback server locally.
+    """
+    resolved_tier = tier or settings.active_llm_tier
+    # Local dev: no GCP server available — collapse all tiers to Ollama
+    if settings.active_llm_tier == "local":
+        resolved_tier = "local"
+    model = active_model(resolved_tier)
+    client = get_client(resolved_tier)
+    patched_messages = messages
+    extra_body: dict[str, Any] = kwargs.pop("extra_body", {})
+    # "suppress" = actively inject /no_think or vLLM flag for models
+    # like Qwen3 that think by default and need explicit suppression.
+    if settings.thinking_mode == "suppress":
+        if resolved_tier == "local":
+            patched_messages = _apply_no_think(messages)
+        else:
+            extra_body = {**extra_body, "chat_template_kwargs": {"enable_thinking": False}}
+    # When thinking is enabled (strip/full), add budget so the model
+    # has room to reason without truncating the actual answer.
+    effective_max_tokens = max_tokens
+    if settings.thinking_mode in ("strip", "full"):
+        effective_max_tokens = max_tokens + settings.thinking_token_budget
+    resp = client.chat.completions.create(
+        model=model,
+        messages=patched_messages,
+        max_tokens=effective_max_tokens,
+        temperature=temperature,
+        extra_body=extra_body or None,
+        **kwargs,
+    )
+    raw = resp.choices[0].message.content or ""
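+    # Qwen3 can still emit an empty <think></think> block under /no_think,
+    # so strip in "suppress" mode as well.
+    if settings.thinking_mode in ("off", "suppress", "strip"):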
+        raw = _strip_think_tags(raw)
+    return raw.strip()
+def warmup(tier: str | None = None) -> None:
+    """Send a minimal prompt to pre-load the model and warm KV cache."""
+    chat_complete(
+        messages=[{"role": "user", "content": "hi"}],
+        max_tokens=5,
+        tier=tier,
+        temperature=0.0,
+    )
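To make the two thinking-mode helpers in `generation/llm_client.py` concrete, here is a small standalone sketch that runs them on hand-written data with no LLM call. The helper bodies mirror `_apply_no_think` and `_strip_think_tags` above; the sample messages and the raw model output are invented for illustration.

```python
import re

THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def apply_no_think(messages: list[dict]) -> list[dict]:
    """Return a copy with /no_think prepended to the first user message."""
    result = list(messages)
    for i, msg in enumerate(result):
        if msg.get("role") == "user":
            # Build a new dict so the caller's message is left untouched.
            result[i] = {**msg, "content": f"/no_think\n\n{msg['content']}"}
            break
    return result


def strip_think_tags(text: str) -> str:
    """Remove every <think>...</think> block, including multi-line ones."""
    return THINK_BLOCK.sub("", text).strip()


messages = [
    {"role": "system", "content": "You are Mia."},
    {"role": "user", "content": "How was your day?"},
]
patched = apply_no_think(messages)

# Invented example of a raw completion that contains a reasoning block.
raw = "<think>\nShe asked about my day.\n</think>\n\nSlow morning. Good coffee."
cleaned = strip_think_tags(raw)

print(patched[1]["content"])
print(cleaned)
```

The `re.DOTALL` flag matters here: without it `.` would not match newlines and a multi-line reasoning block would pass through unstripped.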

guardrails/__init__.py ADDED Viewed

File without changes

guardrails/checks.py ADDED Viewed

	@@ -0,0 +1,98 @@

+"""
+Input and output safety guardrails.
+check_input  — runs BEFORE retrieval (blocks out-of-scope requests)
+check_output — runs AFTER generation (catches persona breaks / hallucinations)
+Both return a result dict so the caller decides how to handle failures
+rather than raising exceptions inside pipeline nodes.
+"""
+from __future__ import annotations
+# ── Signal lists ───────────────────────────────────────────────────────────────
+PERSONA_BREAK_SIGNALS = [
+    "as an ai",
+    "i'm an ai",
+    "i am an ai",
+    "as a language model",
+    "i don't have personal",
+    "i cannot have",
+    "i'm not able to",
+    "as your assistant",
+    "i was trained",
+    "my training data",
+]
+OUT_OF_SCOPE_SIGNALS = [
+    "write a poem",
+    "write me a story",
+    "solve this math",
+    "translate this",
+    "summarize this article",
+    "what's the weather",
+    "who won the game",
+    "stock price",
+    "breaking news",
+]
+SAFE_FALLBACK = "I don't know."
+OOS_FALLBACK  = "I'm here to help communicate as this person — that's a bit outside what I do."
+# ── Public API ─────────────────────────────────────────────────────────────────
+def check_input(query: str) -> dict:
+    """
+    Validate the partner's query before retrieval.
+    Returns:
+        {"allowed": bool, "reason": str | None, "fallback": str | None}
+    """
+    q = query.lower().strip()
+    if any(s in q for s in OUT_OF_SCOPE_SIGNALS):
+        return {"allowed": False, "reason": "out_of_scope", "fallback": OOS_FALLBACK}
+    if len(q) < 2:
+        return {"allowed": False, "reason": "empty_query", "fallback": "Could you repeat that?"}
+    return {"allowed": True, "reason": None, "fallback": None}
+def check_output(response: str, memories: list[dict]) -> dict:
+    """
+    Validate the generated response after generation.
+    Checks:
+      1. Persona break — did the model say "as an AI …"?
+      2. Basic hallucination signal — response asserts a fact while no memories were retrieved.
+    Returns:
+        {"passed": bool, "issue": str | None, "fallback": str | None}
+    """
+    r = response.lower()
+    if any(signal in r for signal in PERSONA_BREAK_SIGNALS):
+        return {"passed": False, "issue": "persona_break", "fallback": SAFE_FALLBACK}
+    # Light hallucination check: when retrieval returned no memories at all,
+    # any response that reads like a factual assertion is flagged as unsupported.
+    # (Full NLI-based check is handled in the evaluation pipeline, not here.)
+    if not memories and _makes_factual_claim(response):
+        return {"passed": False, "issue": "unsupported_claim", "fallback": SAFE_FALLBACK}
+    return {"passed": True, "issue": None, "fallback": None}
+# ── Helpers ───────────────────────────────────────────────────────────────────
+_FACTUAL_MARKERS = [
+    " is ", " was ", " has ", " have ", " lives in ",
+    " born in ", " works at ", " studied at ",
+]
+def _makes_factual_claim(text: str) -> bool:
+    """Heuristic: does the text assert a specific fact?"""
+    t = text.lower()
+    return any(marker in t for marker in _FACTUAL_MARKERS)
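Because both checks return a result dict instead of raising, the caller decides what to do on failure. The sketch below shows one way a caller might wrap a generation step with the two checks. It uses trimmed copies of the signal lists so that it runs on its own; `guarded_reply`, the `generate` callable, and the shortened out-of-scope fallback string are hypothetical and not part of the repository.

```python
from typing import Callable

# Trimmed copies of the signal lists above, so the sketch is self-contained.
OUT_OF_SCOPE_SIGNALS = ["write a poem", "stock price"]
PERSONA_BREAK_SIGNALS = ["as an ai", "as a language model"]
SAFE_FALLBACK = "I don't know."
OOS_FALLBACK = "That's a bit outside what I do."  # shortened for the example


def check_input(query: str) -> dict:
    q = query.lower().strip()
    if any(s in q for s in OUT_OF_SCOPE_SIGNALS):
        return {"allowed": False, "reason": "out_of_scope", "fallback": OOS_FALLBACK}
    return {"allowed": True, "reason": None, "fallback": None}


def check_output(response: str) -> dict:
    r = response.lower()
    if any(s in r for s in PERSONA_BREAK_SIGNALS):
        return {"passed": False, "issue": "persona_break", "fallback": SAFE_FALLBACK}
    return {"passed": True, "issue": None, "fallback": None}


def guarded_reply(query: str, generate: Callable[[str], str]) -> str:
    """Hypothetical caller: run both checks around a generate() callable."""
    pre = check_input(query)
    if not pre["allowed"]:
        return pre["fallback"]  # blocked before retrieval or generation
    draft = generate(query)
    post = check_output(draft)
    return draft if post["passed"] else post["fallback"]


print(guarded_reply("Write a poem about trains", lambda q: "unused"))
print(guarded_reply("Do you like chess?", lambda q: "As an AI, I cannot play."))
print(guarded_reply("Do you like chess?", lambda q: "Puzzles before bed, yes."))
```

One limitation of plain substring matching is worth noting: a response containing "was an aide" also contains the signal "as an ai", so it would be flagged as a persona break. Word-boundary matching would avoid that, at the cost of a slightly more involved check.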

main.py ADDED Viewed

	@@ -0,0 +1,204 @@

+"""
+CLI entry point — thin wrapper around the LangGraph pipeline.
+Usage:
+  python main.py                        # interactive chat, local LLM tier
+  python main.py --user mia_chen        # skip persona selection prompt
+  python main.py --debug                # print per-turn latency table
+  python main.py --fast                 # skip LLM intent call (keyword routing),
+                                        # cuts turn time from ~2min → ~45s on M2 Mac
+  python main.py --tier primary         # override LLM tier
+For the full UI, run the FastAPI + Streamlit stack instead:
+  uvicorn api.main:app --reload
+  streamlit run ui/app.py
+"""
+from __future__ import annotations
+import argparse
+import json
+import os
+import sys
+import time
+from config.settings import settings
+from guardrails.checks import check_input
+from pipeline.graph import aac_graph
+from pipeline.state import PipelineState, GenerationConfig
+from retrieval.bucket_priors import uniform_priors
+from retrieval.vector_store import _get_embedder, _get_reranker
+def parse_args() -> argparse.Namespace:
+    p = argparse.ArgumentParser(description="AAC Chatbot CLI")
+    p.add_argument("--user",  type=str, default=None, help="Persona user_id")
+    p.add_argument("--debug", action="store_true",    help="Print latency table each turn")
+    p.add_argument("--fast",  action="store_true",
+                   help="Skip LLM intent call — use keyword routing instead (faster local dev)")
+    p.add_argument("--tier",  type=str, default=None,
+                   choices=["primary", "fallback", "local"],
+                   help="Override LLM tier (default: settings.active_llm_tier)")
+    return p.parse_args()
+# ── Fast keyword-based intent routing (bypasses the slow LLM intent call) ──────
+def _keyword_intent(query: str) -> tuple[dict, GenerationConfig]:
+    """Replicate milestone-1 keyword routing as a fast local-dev shortcut."""
+    q = query.lower()
+    bucket: str | None = None
+    if any(w in q for w in ["medication", "medicine", "doctor", "health", "allergic", "therapy"]):
+        bucket = "medical"
+    elif any(w in q for w in ["family", "mom", "dad", "brother", "sister", "parents"]):
+        bucket = "family"
+    elif any(w in q for w in ["hobby", "like to do", "enjoy", "weekend", "fun"]):
+        bucket = "hobbies"
+    elif any(w in q for w in ["routine", "morning", "wake", "sleep", "daily"]):
+        bucket = "daily_routine"
+    elif any(w in q for w in ["friend", "social", "people", "party", "community"]):
+        bucket = "social"
+    intent_type = "CONTEXTUAL" if any(w in q for w in ["you just said", "earlier", "you mentioned"]) else "PERSONAL"
+    route = {
+        "sub_intents": [{"type": intent_type, "query": query, "bucket_hint": bucket, "priority": "normal"}],
+        "style_constraints": {"tone_tag": "[TONE:DEFAULT]", "max_tokens": 100,
+                              "retrieval_mode": "full", "persona_mod": "baseline"},
+        "affect": "NEUTRAL",
+    }
+    gen_config: GenerationConfig = {
+        "max_tokens": settings.max_tokens_neutral,
+        "tone_tag": "[TONE:DEFAULT]",
+        "retrieval_mode": "full",
+        "persona_mod": "baseline",
+    }
+    return route, gen_config
+def load_users() -> dict[str, dict]:
+    with open(settings.users_json) as f:
+        return {u["id"]: u for u in json.load(f)["users"]}
+def select_user(users: dict[str, dict], user_arg: str | None) -> str:
+    if user_arg:
+        if user_arg not in users:
+            print(f"Unknown user '{user_arg}'. Available: {list(users)}")
+            sys.exit(1)
+        return user_arg
+    print("\nAvailable personas:")
+    for uid, u in users.items():
+        print(f"  {uid:20s} — {u['name']} ({u['condition']})")
+    uid = input("\nSelect user id: ").strip()
+    if uid not in users:
+        print("Invalid id.")
+        sys.exit(1)
+    return uid
+def print_latency(log: dict, turn: int) -> None:
+    fields = ["t_sensing", "t_intent", "t_retrieval", "t_generation", "t_total"]
+    labels = ["sensing",   "intent",   "retrieval",   "generation",   "TOTAL"]
+    vals   = [f"{log.get(f, 0):.3f}s" for f in fields]
+    widths = [max(len(l), len(v)) for l, v in zip(labels, vals)]
+    sep    = " | "
+    print(f"\n[turn {turn} latency]")
+    print(sep.join(l.ljust(w) for l, w in zip(labels, widths)))
+    print(sep.join(v.ljust(w) for v, w in zip(vals, widths)))
+def main() -> None:
+    args = parse_args()
+    # Optionally override the LLM tier at runtime
+    if args.tier:
+        os.environ["ACTIVE_LLM_TIER"] = args.tier
+        settings.active_llm_tier = args.tier
+    users = load_users()
+    user_id = select_user(users, args.user)
+    profile = users[user_id]
+    # Warm up models
+    print(f"\nLoading models for {profile['name']} …", end=" ", flush=True)
+    _get_embedder()
+    _get_reranker()
+    print("ready.\n")
+    session_history: list[dict] = []
+    bucket_priors = uniform_priors()
+    turn_id = 0
+    print(f"Chatting as {profile['name']}. Type 'quit' to exit.\n")
+    while True:
+        try:
+            query = input("Partner: ").strip()
+        except (EOFError, KeyboardInterrupt):
+            print("\nBye.")
+            break
+        if query.lower() in {"quit", "exit", "q"}:
+            break
+        if not query:
+            continue
+        guard = check_input(query)
+        if not guard["allowed"]:
+            print(f"AAC Bot: {guard['fallback']}\n")
+            continue
+        turn_id += 1
+        # --fast: resolve intent via keywords, skip the slow LLM intent node.
+        # The routing call itself is timed, so it runs only once per turn.
+        pre_route, pre_gen_config = None, None
+        t_intent_fast = 0.0
+        if args.fast:
+            t0 = time.perf_counter()
+            pre_route, pre_gen_config = _keyword_intent(query)
+            t_intent_fast = time.perf_counter() - t0
+        state = PipelineState(
+            user_id=user_id,
+            persona_profile=profile,
+            session_history=session_history,
+            turn_id=turn_id,
+            affect=None,
+            gesture_tag=None,
+            gaze_bucket=None,
+            air_written_text=None,
+            raw_query=query,
+            intent_route=pre_route,        # pre-filled → intent node sees it and skips LLM call
+            generation_config=pre_gen_config,
+            retrieved_chunks=[],
+            bucket_priors=bucket_priors,
+            retrieval_mode_used="",
+            augmented_prompt=None,
+            candidates=[],
+            selected_response=None,
+            llm_tier_used="",
+            latency_log={"t_sensing": 0.0, "t_intent": round(t_intent_fast, 4),
+                         "t_retrieval": 0.0, "t_generation": 0.0, "t_total": 0.0},
+            mlflow_run_id=None,
+            guardrail_passed=True,
+        )
+        result: PipelineState = aac_graph.invoke(state)
+        print(f"AAC Bot: {result['selected_response']}\n")
+        session_history = result["session_history"]
+        bucket_priors   = result["bucket_priors"]
+        if args.debug:
+            print_latency(result.get("latency_log") or {}, turn_id)
+            print(f"  tier={result.get('llm_tier_used')} | "
+                  f"retrieval={result.get('retrieval_mode_used')} | "
+                  f"affect={(result.get('affect') or {}).get('emotion','?')}\n")
+if __name__ == "__main__":
+    main()

pipeline/__init__.py ADDED

File without changes

pipeline/graph.py ADDED

	@@ -0,0 +1,71 @@

+"""
+LangGraph stateful directed graph — the five-layer AAC pipeline.
+Topology (see proposal Figure 2):
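+    intent ──► [affect check] ──► fast_retrieval ──► [latency check] ──► primary_gen | fallback_gen ──► feedback
+                              └──► full_retrieval ──► [latency check] ──► primary_gen | fallback_gen ──► feedback
+Either retrieval node can lead to either generator: the latency check picks the tier.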
+"""
+from langgraph.graph import StateGraph, END
+from pipeline.state import PipelineState
+from pipeline.nodes import intent, retrieval, planner, feedback
+def _route_by_affect(state: PipelineState) -> str:
+    """Conditional edge: FRUSTRATED → fast path, otherwise full retrieval."""
+    emotion = (state.get("affect") or {}).get("emotion", "NEUTRAL")
+    return "fast" if emotion == "FRUSTRATED" else "full"
+def _route_by_latency(state: PipelineState) -> str:
+    """Conditional edge: if cumulative latency > threshold, use fallback LLM."""
+    from config.settings import settings
+    log = state.get("latency_log") or {}
+    elapsed = log.get("t_intent", 0.0) + log.get("t_retrieval", 0.0)
+    return "fallback" if elapsed > settings.fallback_latency_threshold else "primary"
+def build_graph():
+    """Build the pipeline and return the compiled, invokable graph (not the StateGraph builder)."""
+    graph = StateGraph(PipelineState)
+    # ── Nodes ──────────────────────────────────────────────────────────────────
+    graph.add_node("intent",        intent.run)
+    graph.add_node("fast_retrieval", retrieval.run_fast)
+    graph.add_node("full_retrieval", retrieval.run_full)
+    graph.add_node("primary_gen",   planner.run_primary)
+    graph.add_node("fallback_gen",  planner.run_fallback)
+    graph.add_node("feedback",      feedback.run)
+    # ── Entry ──────────────────────────────────────────────────────────────────
+    graph.set_entry_point("intent")
+    # ── Affect-aware routing after intent ─────────────────────────────────────
+    graph.add_conditional_edges(
+        "intent",
+        _route_by_affect,
+        {"fast": "fast_retrieval", "full": "full_retrieval"},
+    )
+    # ── Latency-aware routing after retrieval ─────────────────────────────────
+    graph.add_conditional_edges(
+        "fast_retrieval",
+        _route_by_latency,
+        {"primary": "primary_gen", "fallback": "fallback_gen"},
+    )
+    graph.add_conditional_edges(
+        "full_retrieval",
+        _route_by_latency,
+        {"primary": "primary_gen", "fallback": "fallback_gen"},
+    )
+    # ── Feedback loop ─────────────────────────────────────────────────────────
+    graph.add_edge("primary_gen",  "feedback")
+    graph.add_edge("fallback_gen", "feedback")
+    graph.add_edge("feedback",     END)
+    return graph.compile()
+# Module-level compiled graph — import this everywhere
+aac_graph = build_graph()
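
The two conditional edges above are plain functions of the state, so the routing can be checked without LangGraph installed. The following is a minimal sketch, not the project's code: `FALLBACK_LATENCY_THRESHOLD` is a hypothetical stand-in for `settings.fallback_latency_threshold` (its real value is not shown in this diff), and `plan_path` is an illustrative helper that assumes the latency log is already filled in.

```python
# Standalone sketch of the routing decisions in pipeline/graph.py.
# FALLBACK_LATENCY_THRESHOLD is an assumed value, not taken from the repo.
FALLBACK_LATENCY_THRESHOLD = 1.5  # seconds (hypothetical)


def route_by_affect(state: dict) -> str:
    # FRUSTRATED users get the fast retrieval path; everyone else gets full.
    emotion = (state.get("affect") or {}).get("emotion", "NEUTRAL")
    return "fast" if emotion == "FRUSTRATED" else "full"


def route_by_latency(state: dict) -> str:
    # Only intent + retrieval time counts toward the fallback decision.
    log = state.get("latency_log") or {}
    elapsed = log.get("t_intent", 0.0) + log.get("t_retrieval", 0.0)
    return "fallback" if elapsed > FALLBACK_LATENCY_THRESHOLD else "primary"


def plan_path(state: dict) -> list:
    """Hypothetical helper: list the nodes the graph would visit for this state."""
    return [
        "intent",
        f"{route_by_affect(state)}_retrieval",
        f"{route_by_latency(state)}_gen",
        "feedback",
    ]


calm = {"affect": None, "latency_log": None}
slow_and_frustrated = {
    "affect": {"emotion": "FRUSTRATED"},
    "latency_log": {"t_intent": 1.0, "t_retrieval": 0.9},
}
print(plan_path(calm))
print(plan_path(slow_and_frustrated))
```

Note that the affect check and the latency check are independent: a FRUSTRATED turn can still reach `primary_gen` if intent and retrieval finish under the threshold.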

pipeline/nodes/__init__.py ADDED

File without changes

pipeline/nodes/feedback.py ADDED

	@@ -0,0 +1,98 @@

+"""
+L5 — Feedback Loop node.
+After a response is accepted:
+  1. Log the full turn to MLflow (latency, metrics, prompt version, tier used)
+  2. Update session-level Bayesian bucket priors
+  3. Append the accepted turn to session history
+Note: only the selected response is logged here; rejected candidates are not yet logged.
+"""
+from __future__ import annotations
+import json
+import time
+import mlflow
+from config.settings import settings
+from pipeline.state import PipelineState
+from retrieval.bucket_priors import update_priors
+def run(state: PipelineState) -> dict:
+    t0 = time.perf_counter()
+    mlflow_run_id = _log_to_mlflow(state)
+    updated_priors = _update_bucket_priors(state)
+    updated_history = _append_turn_to_history(state)
+    return {
+        "bucket_priors": updated_priors,
+        "session_history": updated_history,
+        "mlflow_run_id": mlflow_run_id,
+    }
+# ── MLflow logging ─────────────────────────────────────────────────────────────
+def _log_to_mlflow(state: PipelineState) -> str:
+    mlflow.set_tracking_uri(settings.mlflow_tracking_uri)
+    mlflow.set_experiment(settings.mlflow_experiment)
+    latency = state.get("latency_log") or {}
+    affect = (state.get("affect") or {}).get("emotion", "UNKNOWN")
+    with mlflow.start_run(run_name=f"turn-{state['turn_id']}") as run:
+        mlflow.log_params({
+            "user_id":         state["user_id"],
+            "turn_id":         state["turn_id"],
+            "llm_tier":        state.get("llm_tier_used", "unknown"),
+            "retrieval_mode":  state.get("retrieval_mode_used", "unknown"),
+            "affect":          affect,
+            "guardrail_passed": state.get("guardrail_passed", True),
+        })
+        mlflow.log_metrics({
+            "t_sensing":    latency.get("t_sensing",    0.0),
+            "t_intent":     latency.get("t_intent",     0.0),
+            "t_retrieval":  latency.get("t_retrieval",  0.0),
+            "t_generation": latency.get("t_generation", 0.0),
+            "t_total":      latency.get("t_total",      0.0),
+            "num_chunks":   float(len(state.get("retrieved_chunks") or [])),
+        })
+        # Log the selected response as artifact text for qualitative review
+        mlflow.log_text(
+            state.get("selected_response") or "",
+            f"responses/turn_{state['turn_id']}.txt",
+        )
+        return run.info.run_id
+# ── Bayesian bucket prior update ───────────────────────────────────────────────
+def _update_bucket_priors(state: PipelineState) -> dict[str, float]:
+    chunks = state.get("retrieved_chunks") or []
+    if not chunks:
+        return state.get("bucket_priors") or {}
+    # Which bucket sourced the accepted response?
+    top_bucket = chunks[0].get("bucket")
+    if not top_bucket:
+        return state.get("bucket_priors") or {}
+    return update_priors(
+        priors=state.get("bucket_priors") or {},
+        accepted_bucket=top_bucket,
+    )
+# ── Session history append ─────────────────────────────────────────────────────
+def _append_turn_to_history(state: PipelineState) -> list[dict]:
+    """Returns this turn's two messages (partner query, AAC reply); LangGraph's Annotated[list, add] appends them."""
+    return [
+        {"role": "partner",  "content": state["raw_query"]},
+        {"role": "aac_user", "content": state.get("selected_response") or ""},
+    ]
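
The feedback node returns only the new messages and relies on the `operator.add` reducer declared on `session_history` to append them. The sketch below models that merge in plain Python; `apply_update` and `REDUCERS` are hypothetical names for illustration and are a simplification, not LangGraph's actual implementation.

```python
import operator

# Keys with a reducer are combined with the old value; all others are overwritten.
REDUCERS = {"session_history": operator.add}


def apply_update(state: dict, update: dict, reducers: dict) -> dict:
    """Merge a node's partial update into a copy of the state."""
    merged = dict(state)
    for key, value in update.items():
        if key in reducers:
            merged[key] = reducers[key](state.get(key, []), value)
        else:
            merged[key] = value
    return merged


state = {"session_history": [{"role": "partner", "content": "Hi"}], "turn_id": 1}
update = {
    "session_history": [
        {"role": "partner", "content": "How was lunch?"},
        {"role": "aac_user", "content": "Good, thanks."},
    ],
    "turn_id": 2,
}
new_state = apply_update(state, update, REDUCERS)
print(len(new_state["session_history"]), new_state["turn_id"])
```

Under this model a node that returned the full history instead of just the new messages would duplicate earlier turns, which is why `_append_turn_to_history` returns only the current exchange.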

pipeline/nodes/intent.py ADDED

	@@ -0,0 +1,170 @@

+"""
+L2 — Agentic Intent Decomposition node.
+Receives the partner query + affect state, calls the controller LLM
+(non-thinking mode, ReAct style; up to three attempts if validation fails),
+and returns a Pydantic-validated IntentRoute that drives all downstream
+routing decisions. If every attempt fails, a single PERSONAL intent is used.
+"""
+from __future__ import annotations
+import re
+import time
+from typing import Literal, Optional
+from pydantic import BaseModel
+from config.settings import settings
+from generation.llm_client import chat_complete
+from pipeline.state import PipelineState, GenerationConfig, IntentRoute
+# ── Pydantic output schemas ────────────────────────────────────────────────────
+BucketType = Literal["family", "medical", "hobbies", "daily_routine", "social"]
+AffectEmotion = Literal["HAPPY", "FRUSTRATED", "NEUTRAL", "SURPRISED"]
+class SubIntentSchema(BaseModel):
+    type: Literal["PERSONAL", "CONTEXTUAL", "OPEN_DOMAIN"]
+    query: str
+    bucket_hint: Optional[BucketType] = None
+    priority: Literal["fast", "normal"] = "normal"
+class StyleConfig(BaseModel):
+    tone_tag: str          # e.g. "[TONE:WITTY_SARCASTIC]"
+    max_tokens: int
+    retrieval_mode: str    # "fast" | "full"
+    persona_mod: str       # "amplify_quirks" | "suppress_humor" | "baseline" | "add_confirmation"
+class IntentRouteSchema(BaseModel):
+    sub_intents: list[SubIntentSchema]
+    style_constraints: StyleConfig
+    affect: AffectEmotion
+# ── Affect → generation config mapping (proposal Table 1) ─────────────────────
+_AFFECT_CONFIG: dict[str, GenerationConfig] = {
+    "HAPPY": {
+        "max_tokens": settings.max_tokens_happy,
+        "tone_tag": "[TONE:WARM]",
+        "retrieval_mode": "full",
+        "persona_mod": "amplify_quirks",
+    },
+    "FRUSTRATED": {
+        "max_tokens": settings.max_tokens_frustrated,
+        "tone_tag": "[TONE:DIRECT_EMPATHETIC]",
+        "retrieval_mode": "fast",
+        "persona_mod": "suppress_humor",
+    },
+    "NEUTRAL": {
+        "max_tokens": settings.max_tokens_neutral,
+        "tone_tag": "[TONE:DEFAULT]",
+        "retrieval_mode": "full",
+        "persona_mod": "baseline",
+    },
+    "SURPRISED": {
+        "max_tokens": settings.max_tokens_surprised,
+        "tone_tag": "[TONE:CLARIFYING]",
+        "retrieval_mode": "full",
+        "persona_mod": "add_confirmation",
+    },
+}
+# ── System prompt ──────────────────────────────────────────────────────────────
+_SYSTEM_PROMPT = """\
+You are the intent decomposition controller for an AAC (Augmentative and \
+Alternative Communication) chatbot. Given a partner's query and the AAC \
+user's current affect state, classify each intent and produce routing \
+instructions in the required JSON format.
+Intent types:
+- PERSONAL: requires autobiographical memory retrieval
+- CONTEXTUAL: answerable from session history
+- OPEN_DOMAIN: answerable from general knowledge (no retrieval needed)
+Bucket hints (only for PERSONAL): family | medical | hobbies | daily_routine | social
+Priority: set "fast" when affect is FRUSTRATED to reduce latency.
+Respond ONLY with valid JSON matching the IntentRoute schema. No extra text.
+"""
+def _build_user_prompt(query: str, affect: str, persona_name: str) -> str:
+    return (
+        f"Persona: {persona_name}\n"
+        f"Affect: {affect}\n"
+        f"Partner query: {query}\n\n"
+        "Produce the IntentRoute JSON:"
+    )
+# ── Node entry point ───────────────────────────────────────────────────────────
+def run(state: PipelineState) -> dict:
+    """LangGraph node: intent decomposition."""
+    t0 = time.perf_counter()
+    # --fast mode: intent_route already resolved by keyword routing in main.py
+    if state.get("intent_route") and state.get("generation_config"):
+        return {}   # nothing to update — downstream nodes use the pre-filled values
+    affect_state = state.get("affect") or {}
+    emotion: str = affect_state.get("emotion", "NEUTRAL")
+    query: str = state["raw_query"]
+    persona_name: str = state["persona_profile"].get("name", "unknown")
+    gen_config = _AFFECT_CONFIG.get(emotion, _AFFECT_CONFIG["NEUTRAL"])
+    route: IntentRoute | None = None
+    last_error: str = ""
+    for attempt in range(3):  # manual retry loop: 1 attempt + up to 2 retries
+        messages = [
+            {"role": "system", "content": _SYSTEM_PROMPT},
+            {"role": "user", "content": _build_user_prompt(query, emotion, persona_name)},
+        ]
+        if attempt > 0:
+            messages.append({"role": "user", "content": f"Validation error: {last_error}. Fix and retry."})
+        raw = chat_complete(
+            messages=messages,
+            max_tokens=512,
+            temperature=0.0,
+        )
+        try:
+            # Strip markdown fences (```json ... ```) that many models add
+            cleaned = re.sub(r"^```(?:json)?\s*", "", raw.strip())
+            cleaned = re.sub(r"\s*```$", "", cleaned.strip())
+            parsed = IntentRouteSchema.model_validate_json(cleaned)
+            route = {
+                "sub_intents": [si.model_dump() for si in parsed.sub_intents],
+                "style_constraints": parsed.style_constraints.model_dump(),
+                "affect": parsed.affect,
+            }
+            break
+        except Exception as exc:
+            last_error = str(exc)
+    if route is None:
+        # Hard fallback: treat as a single PERSONAL intent, full retrieval
+        route = {
+            "sub_intents": [{"type": "PERSONAL", "query": query, "bucket_hint": None, "priority": "normal"}],
+            "style_constraints": gen_config,
+            "affect": emotion,
+        }
+    t_intent = time.perf_counter() - t0
+    latency_log = dict(state.get("latency_log") or {})
+    latency_log["t_intent"] = round(t_intent, 4)
+    return {
+        "intent_route": route,
+        "generation_config": gen_config,
+        "latency_log": latency_log,
+    }
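
The parse-and-retry loop in the intent node can be exercised offline with a fake model. This is a minimal sketch under stated assumptions: it swaps the Pydantic schema for a plain `json.loads` plus a key check so it runs on the standard library alone, and `parse_with_retry`, `fake_model`, and `FENCE` are hypothetical names not present in the repo.

```python
import json
import re

FENCE = "`" * 3  # built programmatically so the example nests cleanly here


def strip_fences(raw: str) -> str:
    """Remove a leading fence (optionally tagged 'json') and a trailing fence."""
    cleaned = re.sub(rf"^{FENCE}(?:json)?\s*", "", raw.strip())
    return re.sub(rf"\s*{FENCE}$", "", cleaned.strip())


def parse_with_retry(call_model, max_attempts: int = 3):
    """call_model(last_error) -> raw text. Returns a dict, or None if all attempts fail."""
    last_error = ""
    for _ in range(max_attempts):
        raw = call_model(last_error)
        try:
            parsed = json.loads(strip_fences(raw))
            if "sub_intents" not in parsed:
                raise ValueError("missing key: sub_intents")
            return parsed
        except (ValueError, TypeError) as exc:  # JSONDecodeError is a ValueError
            last_error = str(exc)
    return None


# Fake model: first reply is invalid, second is fenced but valid JSON.
replies = iter(["not json", FENCE + 'json\n{"sub_intents": []}\n' + FENCE])
seen_errors = []


def fake_model(last_error: str) -> str:
    seen_errors.append(last_error)
    return next(replies)


result = parse_with_retry(fake_model)
print(result, len(seen_errors))
```

Returning `None` after the last attempt corresponds to the node's hard fallback, where the query is treated as a single PERSONAL intent.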

pipeline/nodes/planner.py ADDED

	@@ -0,0 +1,196 @@

+"""
+L4 — Dialogue Planning & Generation node.
+Expression-conditioned response shaping (proposal §5.5):
+  1. Build augmented prompt (persona profile + retrieved evidence + affect config + style exemplar)
+  2. Generate N candidate responses
+  3. Rank candidates by composite score: α·faithful + β·style + γ·affect_match
+  4. Return the top-ranked response
+Two entry points:
+  run_primary  — Qwen3-30B-A3B (or configured primary tier)
+  run_fallback — Qwen3-8B (faster, triggered by latency threshold)
+"""
+from __future__ import annotations
+import time
+from config.settings import settings
+from generation.llm_client import chat_complete
+from guardrails.checks import check_output
+from pipeline.state import PipelineState
+# ── Persona-specific tone tags (applied on top of affect base tag) ─────────────
+_PERSONA_TONE_OVERRIDES: dict[str, dict[str, str]] = {
+    "mia_chen": {
+        "HAPPY":      "[TONE:WITTY_SARCASTIC]",
+        "FRUSTRATED":  "[TONE:DIRECT_EMPATHETIC]",
+    },
+    "gerald_okafor": {
+        "HAPPY":      "[TONE:WARM_FORMAL]",
+        "FRUSTRATED":  "[TONE:MEASURED_EMPATHETIC]",
+    },
+    "arjun_mehta": {
+        "HAPPY":      "[TONE:DIRECT_WARM]",
+        "FRUSTRATED":  "[TONE:MINIMAL_DIRECT]",
+    },
+}
+def run_primary(state: PipelineState) -> dict:
+    return _run(state, tier="primary")
+def run_fallback(state: PipelineState) -> dict:
+    return _run(state, tier="fallback")
+def route_by_latency(state: PipelineState) -> str:
+    """Conditional edge after retrieval nodes."""
+    log = state.get("latency_log") or {}
+    elapsed = log.get("t_intent", 0.0) + log.get("t_retrieval", 0.0)
+    return "fallback" if elapsed > settings.fallback_latency_threshold else "primary"
+# ── Core implementation ────────────────────────────────────────────────────────
+def _run(state: PipelineState, tier: str) -> dict:
+    t0 = time.perf_counter()
+    profile = state["persona_profile"]
+    user_id = state["user_id"]
+    affect = (state.get("affect") or {}).get("emotion", "NEUTRAL")
+    gen_cfg = state.get("generation_config") or {}
+    chunks = state.get("retrieved_chunks") or []
+    history = (state.get("session_history") or [])[-3:]   # last 3 messages (each turn appends 2)
+    tone_tag = _resolve_tone_tag(user_id, affect, gen_cfg.get("tone_tag", "[TONE:DEFAULT]"))
+    prompt = _build_prompt(profile, chunks, history, state["raw_query"], tone_tag, gen_cfg)
+    candidates: list[str] = []
+    for _ in range(settings.num_candidates):
+        text = chat_complete(
+            messages=[{"role": "user", "content": prompt}],
+            max_tokens=gen_cfg.get("max_tokens", settings.max_tokens_neutral) + 256,
+            temperature=0.7,
+            tier=tier,
+        )
+        candidates.append(text)
+    selected = _rank_candidates(candidates, chunks, affect, profile)
+    # Guardrail — replace with safe fallback if output breaks persona
+    guard = check_output(selected, chunks)
+    if not guard["passed"]:
+        selected = guard["fallback"]
+    t_gen = time.perf_counter() - t0
+    latency_log = dict(state.get("latency_log") or {})
+    latency_log["t_generation"] = round(t_gen, 4)
+    latency_log["t_total"] = round(
+        latency_log.get("t_sensing", 0)
+        + latency_log.get("t_intent", 0)
+        + latency_log.get("t_retrieval", 0)
+        + t_gen,
+        4,
+    )
+    return {
+        "augmented_prompt": prompt,
+        "candidates": candidates,
+        "selected_response": selected,
+        "llm_tier_used": tier,
+        "latency_log": latency_log,
+        "guardrail_passed": guard["passed"],
+    }
+def _resolve_tone_tag(user_id: str, affect: str, default_tag: str) -> str:
+    return _PERSONA_TONE_OVERRIDES.get(user_id, {}).get(affect, default_tag)
+def _build_prompt(
+    profile: dict,
+    chunks: list[dict],
+    history: list[dict],
+    query: str,
+    tone_tag: str,
+    gen_cfg: dict,
+) -> str:
+    memory_block = "\n".join(f"  [{c['bucket']}] {c['text']}" for c in chunks) or "  (no memories retrieved)"
+    history_block = "\n".join(f"  {h.get('role','?')}: {h.get('content','')}" for h in history) or "  (start of session)"
+    style_exemplar = profile.get("style_exemplar", "")
+    persona_mod = gen_cfg.get("persona_mod", "baseline")
+    persona_instruction = {
+        "amplify_quirks":    "Amplify your characteristic style and personality.",
+        "suppress_humor":    "Be direct and supportive. Suppress humor.",
+        "baseline":          "Use your natural communication style.",
+        "add_confirmation":  "Add a clarifying question or confirmation at the end.",
+    }.get(persona_mod, "Use your natural communication style.")
+    return f"""\
+You are {profile['name']}, an AAC device user with {profile['condition']}.
+Communication style: {profile['style']}
+{tone_tag}
+Style exemplar — match this register:
+  {style_exemplar}
+Personal memories (use ONLY these for personal facts):
+{memory_block}
+Recent conversation:
+{history_block}
+Partner says: {query}
+Instructions:
+- Speak in first person as {profile['name']}.
+- {persona_instruction}
+- Keep response to 1-3 sentences.
+- If the answer isn't in your memories, say "I don't know."
+- Do NOT say "As an AI" or break persona.
+Response:"""
+def _rank_candidates(
+    candidates: list[str],
+    chunks: list[dict],
+    affect: str,
+    profile: dict,
+) -> str:
+    """
+    Composite ranking: score = α·faithful + β·style + γ·affect_match
+    Simple heuristic version — replace with NLI + cosine similarity for final eval.
+    """
+    if not candidates:
+        return "I don't know."
+    if len(candidates) == 1:
+        return candidates[0]
+    evidence_words = set(" ".join(c["text"] for c in chunks).lower().split())
+    style_words = set(profile.get("style", "").lower().split())
+    affect_positive_map = {
+        "HAPPY":     ["great", "love", "enjoy", "happy", "fun"],
+        "FRUSTRATED": ["okay", "fine", "sure", "yes", "no"],
+        "NEUTRAL":   [],
+        "SURPRISED": ["really", "oh", "interesting", "wow"],
+    }
+    affect_words = set(affect_positive_map.get(affect, []))
+    def score(c: str) -> float:
+        words = set(c.lower().split())
+        faithful  = len(words & evidence_words) / max(len(words), 1)
+        style_sim = len(words & style_words)    / max(len(words), 1)
+        affect_m  = len(words & affect_words)   / max(len(words), 1)
+        return (
+            settings.rank_alpha * faithful
+            + settings.rank_beta  * style_sim
+            + settings.rank_gamma * affect_m
+        )
+    return max(candidates, key=score)
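
To make the composite score concrete, here is a worked example of the same word-overlap heuristic. The weights are hypothetical placeholders for `settings.rank_alpha`, `settings.rank_beta`, and `settings.rank_gamma`, whose real values are not visible in this part of the diff; the evidence and style strings are invented for illustration.

```python
# Hypothetical weights; the real ones come from settings.rank_alpha/beta/gamma.
ALPHA, BETA, GAMMA = 0.5, 0.3, 0.2


def overlap(words: set, reference: set) -> float:
    # Fraction of the candidate's distinct words that appear in the reference set.
    return len(words & reference) / max(len(words), 1)


def score(candidate: str, evidence: set, style: set, affect: set) -> float:
    words = set(candidate.lower().split())
    return (
        ALPHA * overlap(words, evidence)
        + BETA * overlap(words, style)
        + GAMMA * overlap(words, affect)
    )


evidence = set("i walk my dog biscuit every morning".split())  # invented memory
style = set("warm playful".split())                            # invented style
affect = {"great", "love", "enjoy", "happy", "fun"}            # HAPPY list above

candidates = ["I love my dog Biscuit", "I enjoy many things"]
scores = {c: score(c, evidence, style, affect) for c in candidates}
best = max(candidates, key=scores.get)
print(best, scores)
```

The first candidate shares four of its five words with the evidence and one with the affect list, giving 0.5 · 0.8 + 0.2 · 0.2 = 0.44; the second scores 0.5 · 0.25 + 0.2 · 0.25 = 0.175. Because tokens come from `split()`, punctuation stays attached, so "Biscuit." would not match "biscuit".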

pipeline/nodes/retrieval.py ADDED

	@@ -0,0 +1,90 @@

+"""
+L3 — Semantic Bucketing & Retrieval node.
+Two entry points:
+  run_fast  — FRUSTRATED affect: k=2, single bucket, no reranking
+  run_full  — standard: k=5, optional bucket hint, BGE cross-encoder reranking
+Also exports route_by_affect, a conditional edge function for routing after the
+intent node; graph.py currently uses its own equivalent, _route_by_affect.
+"""
+from __future__ import annotations
+import time
+from config.settings import settings
+from pipeline.state import PipelineState, RetrievedChunk
+from retrieval.vector_store import retrieve
+from retrieval.bucket_priors import update_priors
+def run_fast(state: PipelineState) -> dict:
+    """Fast retrieval path for FRUSTRATED affect (k=2, no reranker)."""
+    t0 = time.perf_counter()
+    bucket_hint = _top_prior_bucket(state["bucket_priors"])
+    chunks = retrieve(
+        query=state["raw_query"],
+        user_id=state["user_id"],
+        top_k=settings.retrieval_fast_k,
+        rerank_k=settings.retrieval_fast_k,
+        bucket_filter=bucket_hint,
+        use_reranker=False,
+    )
+    return _build_return(state, chunks, "fast", t0)
+def run_full(state: PipelineState) -> dict:
+    """Full retrieval path with BGE cross-encoder reranking."""
+    t0 = time.perf_counter()
+    # Prefer gaze hint > intent bucket hint > None
+    route = state.get("intent_route") or {}
+    sub_intents = route.get("sub_intents", [])
+    bucket_hint = (
+        state.get("gaze_bucket")
+        or next((si.get("bucket_hint") for si in sub_intents if si.get("bucket_hint")), None)
+    )
+    chunks = retrieve(
+        query=state["raw_query"],
+        user_id=state["user_id"],
+        top_k=settings.retrieval_top_k,
+        rerank_k=settings.retrieval_rerank_k,
+        bucket_filter=bucket_hint,
+        use_reranker=True,
+    )
+    return _build_return(state, chunks, "full", t0)
+def route_by_affect(state: PipelineState) -> str:
+    """Conditional edge function for use after the intent node (same logic as graph._route_by_affect)."""
+    emotion = (state.get("affect") or {}).get("emotion", "NEUTRAL")
+    return "fast" if emotion == "FRUSTRATED" else "full"
+# ── Helpers ───────────────────────────────────────────────────────────────────
+def _top_prior_bucket(priors: dict[str, float]) -> str | None:
+    if not priors:
+        return None
+    return max(priors, key=priors.get)
+def _build_return(
+    state: PipelineState,
+    chunks: list[RetrievedChunk],
+    mode: str,
+    t0: float,
+) -> dict:
+    t_retrieval = time.perf_counter() - t0
+    latency_log = dict(state.get("latency_log") or {})
+    latency_log["t_retrieval"] = round(t_retrieval, 4)
+    return {
+        "retrieved_chunks": chunks,
+        "retrieval_mode_used": mode,
+        "latency_log": latency_log,
+    }
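
The two retrieval paths choose their bucket filter differently, which the sketch below isolates. `full_path_hint` and `fast_path_hint` are hypothetical names that mirror the inline expression in `run_full` and the `_top_prior_bucket` helper; the uniform priors are assumed to be built in `BUCKETS` order, as `uniform_priors()` does.

```python
from typing import Optional

BUCKETS = ["family", "medical", "hobbies", "daily_routine", "social"]


def full_path_hint(gaze_bucket: Optional[str], sub_intents: list) -> Optional[str]:
    """Full path: gaze hint first, then the first non-empty intent hint, else None."""
    return gaze_bucket or next(
        (si.get("bucket_hint") for si in sub_intents if si.get("bucket_hint")), None
    )


def fast_path_hint(priors: dict) -> Optional[str]:
    """Fast path: bucket with the highest session prior, or None without priors."""
    return max(priors, key=priors.get) if priors else None


intents = [{"bucket_hint": None}, {"bucket_hint": "medical"}, {"bucket_hint": "social"}]
uniform = {b: 1.0 / len(BUCKETS) for b in BUCKETS}

print(full_path_hint("hobbies", intents))
print(full_path_hint(None, intents))
print(fast_path_hint(uniform))
```

One consequence worth knowing: `max` returns the first key on a tie, so with untouched uniform priors the fast path filters on the first bucket in the dict rather than searching all buckets.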

pipeline/state.py ADDED

	@@ -0,0 +1,98 @@

+"""
+Typed state object that flows through every LangGraph node.
+Each node receives the full PipelineState and returns a dict
+containing only the keys it updates — LangGraph merges them.
+"""
+from __future__ import annotations
+from typing import Annotated, Any, Optional
+from typing_extensions import TypedDict
+import operator
+# ── Sub-types ──────────────────────────────────────────────────────────────────
+class AffectVector(TypedDict):
+    MAR: float   # Mouth Aspect Ratio
+    EAR: float   # Eye Aspect Ratio
+    BRI: float   # Brow Raise Index
+    LCP: float   # Lip Corner Pull
+class AffectState(TypedDict):
+    emotion: str          # "HAPPY" | "FRUSTRATED" | "NEUTRAL" | "SURPRISED"
+    vector: AffectVector
+    smoothed: AffectVector  # EMA-smoothed vector
+class RetrievedChunk(TypedDict):
+    text: str
+    bucket: str           # family | medical | hobbies | daily_routine | social
+    user: str
+    score: float          # cross-encoder rerank score
+class SubIntent(TypedDict):
+    type: str             # "PERSONAL" | "CONTEXTUAL" | "OPEN_DOMAIN"
+    query: str
+    bucket_hint: Optional[str]
+    priority: str         # "fast" | "normal"
+class IntentRoute(TypedDict):
+    sub_intents: list[SubIntent]
+    style_constraints: dict[str, Any]   # tone, max_tokens, etc.
+    affect: str
+class GenerationConfig(TypedDict):
+    max_tokens: int
+    tone_tag: str         # e.g. "[TONE:WITTY_SARCASTIC]"
+    retrieval_mode: str   # "fast" | "full"
+    persona_mod: str      # "amplify_quirks" | "suppress_humor" | "baseline" | "add_confirmation"
+class LatencyLog(TypedDict):
+    t_sensing: float
+    t_intent: float
+    t_retrieval: float
+    t_generation: float
+    t_total: float
+# ── Main pipeline state ────────────────────────────────────────────────────────
+class PipelineState(TypedDict):
+    # ── Session context (set at turn start, stable across nodes) ──────────────
+    user_id: str
+    persona_profile: dict[str, Any]          # full profile from users.json
+    session_history: Annotated[list[dict], operator.add]  # auto-appended
+    turn_id: int
+    # ── L1: Sensing outputs ───────────────────────────────────────────────────
+    affect: Optional[AffectState]
+    gesture_tag: Optional[str]               # e.g. "THUMBS_UP"
+    gaze_bucket: Optional[str]               # bucket hinted by gaze fixation
+    air_written_text: Optional[str]          # concatenated air-written chars
+    # ── L2: Intent decomposition outputs ─────────────────────────────────────
+    raw_query: str                           # partner's typed/spoken query
+    intent_route: Optional[IntentRoute]      # Pydantic-validated routing
+    generation_config: Optional[GenerationConfig]
+    # ── L3: Retrieval outputs ─────────────────────────────────────────────────
+    retrieved_chunks: list[RetrievedChunk]
+    bucket_priors: dict[str, float]          # session-level Bayesian priors
+    retrieval_mode_used: str                 # "fast" | "full"
+    # ── L4: Generation outputs ────────────────────────────────────────────────
+    augmented_prompt: Optional[str]
+    candidates: list[str]                    # 2-3 candidate responses
+    selected_response: Optional[str]
+    llm_tier_used: str                       # "primary" | "fallback" | "local"
+    # ── L5: Feedback / tracking ───────────────────────────────────────────────
+    latency_log: Optional[LatencyLog]
+    mlflow_run_id: Optional[str]
+    guardrail_passed: bool

requirements.txt ADDED

	@@ -0,0 +1,39 @@

+# ── Orchestration ──────────────────────────────────────────────────────────────
+langgraph>=1.1
+langchain-core>=0.2
+pydantic>=2.0
+pydantic-settings>=2.0
+# ── LLM clients ────────────────────────────────────────────────────────────────
+openai>=1.0          # OpenAI-compatible client for vLLM + Ollama
+ollama>=0.2          # local dev fallback (direct Ollama SDK)
+# ── Retrieval ──────────────────────────────────────────────────────────────────
+faiss-cpu>=1.7
+sentence-transformers>=3.0
+torch>=2.0
+transformers>=4.40
+numpy>=1.24
+# ── Clustering ─────────────────────────────────────────────────────────────────
+hdbscan>=0.8.29
+scikit-learn>=1.3
+# ── Sensing ────────────────────────────────────────────────────────────────────
+mediapipe>=0.10
+opencv-python>=4.8
+# ── API backend ────────────────────────────────────────────────────────────────
+fastapi>=0.111
+uvicorn[standard]>=0.29
+# ── UI ─────────────────────────────────────────────────────────────────────────
+streamlit>=1.35
+requests>=2.31      # Streamlit → FastAPI calls
+# ── Experiment tracking ────────────────────────────────────────────────────────
+mlflow>=2.13
+# ── Utilities ──────────────────────────────────────────────────────────────────
+python-dotenv>=1.0
+rich>=13.0

retrieval/__init__.py ADDED

File without changes

retrieval/bucket_priors.py ADDED

	@@ -0,0 +1,52 @@

+"""
+Session-level Bayesian bucket priors (proposal §5.4 Bonus).
+Prior P(bucket_i) is initialized uniformly across the 5 buckets.
+After each accepted response, the prior is updated proportionally
+to the historical acceptance rate for that bucket in the session.
+P(bucket_i | accept) ∝ P(accept | bucket_i) · P(bucket_i)
+The updated priors are stored in PipelineState and passed to the
+retrieval node to bias FAISS search toward the most contextually
+likely topic for the session.
+"""
+from __future__ import annotations
+BUCKETS = ["family", "medical", "hobbies", "daily_routine", "social"]
+def uniform_priors() -> dict[str, float]:
+    """Return equal probability mass over all buckets."""
+    p = 1.0 / len(BUCKETS)
+    return {b: p for b in BUCKETS}
+def update_priors(
+    priors: dict[str, float],
+    accepted_bucket: str,
+    smoothing: float = 0.1,
+) -> dict[str, float]:
+    """
+    Additive approximation of a Bayesian update: add `smoothing` to every
+    bucket, add 1.0 to the accepted bucket, then normalise so the priors sum to 1.
+    Args:
+        priors:          Current session priors (must sum to ~1.0).
+        accepted_bucket: Bucket that sourced the accepted response.
+        smoothing:       Additive smoothing constant to prevent zero probabilities.
+    """
+    if not priors:
+        priors = uniform_priors()
+    updated = {b: v + smoothing for b, v in priors.items()}
+    updated[accepted_bucket] = updated.get(accepted_bucket, smoothing) + 1.0
+    total = sum(updated.values())
+    return {b: round(v / total, 6) for b, v in updated.items()}
+def top_bucket(priors: dict[str, float]) -> str:
+    """Return the bucket with the highest prior."""
+    if not priors:
+        return BUCKETS[0]
+    return max(priors, key=priors.get)
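
A short worked run shows how quickly the priors move. The function body below is copied from `update_priors` as it appears in this diff; only the driver lines at the bottom are added for illustration.

```python
BUCKETS = ["family", "medical", "hobbies", "daily_routine", "social"]


def uniform_priors() -> dict:
    p = 1.0 / len(BUCKETS)
    return {b: p for b in BUCKETS}


def update_priors(priors: dict, accepted_bucket: str, smoothing: float = 0.1) -> dict:
    # Same arithmetic as retrieval/bucket_priors.py: smooth, boost, normalise.
    if not priors:
        priors = uniform_priors()
    updated = {b: v + smoothing for b, v in priors.items()}
    updated[accepted_bucket] = updated.get(accepted_bucket, smoothing) + 1.0
    total = sum(updated.values())
    return {b: round(v / total, 6) for b, v in updated.items()}


after_one = update_priors(uniform_priors(), "medical")
after_two = update_priors(after_one, "medical")
print(after_one)
print(after_two)
```

Starting from 0.2 each, one accepted "medical" response gives (0.2 + 0.1 + 1.0) / 2.5 = 0.52 for that bucket and 0.3 / 2.5 = 0.12 for the others; a second acceptance raises it to 1.62 / 2.5 = 0.648. When the priors sum to 1, the normaliser is always 1 + 5 · 0.1 + 1 = 2.5, so a single acceptance carries substantial weight.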

retrieval/clustering.py ADDED

	@@ -0,0 +1,111 @@

+"""
+HDBSCAN-based semantic bucketing over BGE embeddings.
+Used to validate / discover thematic clusters in persona memories,
+and to auto-assign bucket labels when adding new memory chunks.
+The hand-authored bucket labels in the JSON files remain the ground
+truth — this module provides a data-driven cross-check and supports
+future expansion to unlabelled memory stores.
+"""
+from __future__ import annotations
+import json
+from pathlib import Path
+import numpy as np
+from config.settings import settings
+from retrieval.vector_store import _get_embedder
+try:
+    import hdbscan
+    _HDBSCAN_AVAILABLE = True
+except ImportError:
+    _HDBSCAN_AVAILABLE = False
+    print("[clustering] hdbscan not installed — clustering unavailable.")
+BUCKET_LABELS = ["family", "medical", "hobbies", "daily_routine", "social"]
+def cluster_persona_memories(user_id: str) -> dict[str, list[str]]:
+    """
+    Embed all memory chunks for a persona and cluster with HDBSCAN.
+    Returns a dict mapping a cluster key ("cluster_<n>") → list of memory texts.
+    HDBSCAN label -1 (unclustered points) is stored under the key "noise".
+    """
+    if not _HDBSCAN_AVAILABLE:
+        raise RuntimeError("hdbscan package is required. Run: pip install hdbscan")
+    memory_path = settings.memories_dir / f"{user_id}.json"
+    with open(memory_path) as f:
+        persona = json.load(f)
+    texts, true_buckets = [], []
+    for bucket, memories in persona["memory_buckets"].items():
+        for mem in memories:
+            texts.append(mem)
+            true_buckets.append(bucket)
+    embedder = _get_embedder()
+    vecs = embedder.encode(texts, convert_to_numpy=True, normalize_embeddings=True)
+    clusterer = hdbscan.HDBSCAN(
+        min_cluster_size=3,
+        min_samples=2,
+        metric="euclidean",
+    )
+    labels = clusterer.fit_predict(vecs)
+    clusters: dict[str, list[str]] = {}
+    for text, label, true_bucket in zip(texts, labels, true_buckets):
+        key = f"cluster_{label}" if label >= 0 else "noise"
+        clusters.setdefault(key, []).append(text)
+    return clusters
+def evaluate_bucket_alignment(user_id: str) -> dict:
+    """
+    Compare HDBSCAN cluster assignments against hand-authored bucket labels.
+    Returns per-cluster purity scores (fraction of dominant label in each cluster).
+    """
+    if not _HDBSCAN_AVAILABLE:
+        raise RuntimeError("hdbscan package is required.")
+    memory_path = settings.memories_dir / f"{user_id}.json"
+    with open(memory_path) as f:
+        persona = json.load(f)
+    texts, true_buckets = [], []
+    for bucket, memories in persona["memory_buckets"].items():
+        for mem in memories:
+            texts.append(mem)
+            true_buckets.append(bucket)
+    embedder = _get_embedder()
+    vecs = embedder.encode(texts, convert_to_numpy=True, normalize_embeddings=True)
+    clusterer = hdbscan.HDBSCAN(min_cluster_size=3, min_samples=2, metric="euclidean")
+    pred_labels = clusterer.fit_predict(vecs)
+    cluster_bucket_counts: dict[int, dict[str, int]] = {}
+    for pred, true in zip(pred_labels, true_buckets):
+        cluster_bucket_counts.setdefault(pred, {})
+        cluster_bucket_counts[pred][true] = cluster_bucket_counts[pred].get(true, 0) + 1
+    purity_scores = {}
+    for cluster_id, bucket_counts in cluster_bucket_counts.items():
+        total = sum(bucket_counts.values())
+        dominant = max(bucket_counts.values())
+        purity_scores[cluster_id] = round(dominant / total, 3)
+    return {
+        "n_clusters": len([k for k in purity_scores if k >= 0]),
+        "n_noise": sum(cluster_bucket_counts.get(-1, {}).values()),  # number of unclustered points
+        "cluster_purity": purity_scores,
+        "mean_purity": round(
+            np.mean([v for k, v in purity_scores.items() if k >= 0] or [0.0]), 3
+        ),
+    }
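
Note on the purity metric above: the following is a minimal, dependency-free sketch of the same per-cluster purity calculation, run on hand-made labels rather than real HDBSCAN output. The function name `cluster_purity` and the sample labels are hypothetical and are not part of this commit.

```python
from collections import Counter, defaultdict


def cluster_purity(pred_labels, true_buckets):
    """Per-cluster purity: share of the most common true bucket in each cluster."""
    counts = defaultdict(Counter)
    for pred, true in zip(pred_labels, true_buckets):
        counts[pred][true] += 1
    return {
        cluster_id: round(max(c.values()) / sum(c.values()), 3)
        for cluster_id, c in counts.items()
    }


# Hypothetical assignments; -1 is the label HDBSCAN gives to noise points.
pred = [0, 0, 0, 1, 1, 1, -1]
true = ["family", "family", "medical", "hobbies", "hobbies", "hobbies", "social"]

purity = cluster_purity(pred, true)
n_noise = sum(1 for p in pred if p == -1)
print(purity, "noise points:", n_noise)
```

Cluster 0 mixes two buckets, so its purity is below 1, while cluster 1 contains a single bucket. The noise label is kept in the dictionary here but would be excluded from a mean, as in `evaluate_bucket_alignment`.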

retrieval/vector_store.py ADDED Viewed

	@@ -0,0 +1,168 @@

+"""
+FAISS-backed dense retrieval with BGE embeddings and cross-encoder reranking.
+Models are lazy-loaded on first use (safe for FastAPI / LangGraph workers).
+NOTE: The FAISS indexes in data/faiss_store/ must be built with BGE embeddings.
+      Run `python -m retrieval.vector_store` to rebuild all persona indexes.
+"""
+from __future__ import annotations
+import json
+import time
+from functools import lru_cache
+from pathlib import Path
+import faiss
+import numpy as np
+from sentence_transformers import CrossEncoder, SentenceTransformer
+from config.settings import settings
+from pipeline.state import RetrievedChunk
+# ── Lazy model singletons ──────────────────────────────────────────────────────
+@lru_cache(maxsize=1)
+def _get_embedder() -> SentenceTransformer:
+    return SentenceTransformer(settings.embed_model)
+@lru_cache(maxsize=1)
+def _get_reranker() -> CrossEncoder:
+    return CrossEncoder(settings.rerank_model)
+# ── Index cache (one FAISS index per user_id) ─────────────────────────────────
+_index_cache: dict[str, tuple[faiss.Index, list[dict]]] = {}
+def load_index(user_id: str) -> tuple[faiss.Index, list[dict]]:
+    if user_id not in _index_cache:
+        store_path = settings.faiss_store_dir / user_id
+        index = faiss.read_index(str(store_path / "index.faiss"))
+        with open(store_path / "meta.json") as f:
+            meta = json.load(f)
+        _index_cache[user_id] = (index, meta)
+    return _index_cache[user_id]
+# ── Core retrieve function ─────────────────────────────────────────────────────
+def retrieve(
+    query: str,
+    user_id: str,
+    top_k: int = 5,
+    rerank_k: int = 3,
+    bucket_filter: str | None = None,
+    use_reranker: bool = True,
+    debug: bool = False,
+) -> list[RetrievedChunk] | tuple[list[RetrievedChunk], dict[str, float]]:
+    """
+    Two-stage retrieval:
+      1. BGE-small-en-v1.5 bi-encoder → FAISS IndexFlatIP (cosine similarity)
+      2. BGE-reranker-v2-m3 cross-encoder reranking (multilingual, skippable)
+    Args:
+        query:         Partner's text query.
+        user_id:       Persona identifier (e.g. "mia_chen").
+        top_k:         Number of candidates from FAISS before reranking.
+        rerank_k:      Final number of chunks returned after reranking.
+        bucket_filter: If set, restrict candidates to this memory bucket
+                       (falls back to all buckets when nothing matches).
+        use_reranker:  False for the FRUSTRATED fast path.
+        debug:         If True, return `(chunks, timings)` instead of just the chunks.
+    """
+    embedder = _get_embedder()
+    index, meta = load_index(user_id)
+    t0 = time.perf_counter()
+    q_vec = embedder.encode(
+        [query], convert_to_numpy=True, normalize_embeddings=True
+    )
+    t_embed = time.perf_counter() - t0
+    t0 = time.perf_counter()
+    _, idxs = index.search(q_vec, top_k)
+    t_faiss = time.perf_counter() - t0
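+    # FAISS pads missing results with -1 when top_k exceeds the index size;
+    # without the lower bound, meta[-1] would silently return the last chunk.
+    candidates = [meta[i] for i in idxs[0] if 0 <= i < len(meta)]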
+    if bucket_filter:
+        filtered = [c for c in candidates if c["bucket"] == bucket_filter]
+        candidates = filtered if filtered else candidates   # fallback: all buckets
+    t0 = time.perf_counter()
+    if use_reranker and len(candidates) > 1:
+        reranker = _get_reranker()
+        pairs = [(query, c["text"]) for c in candidates]
+        ce_scores = reranker.predict(pairs)
+        ranked = sorted(zip(ce_scores, candidates), key=lambda x: x[0], reverse=True)
+        top = [
+            RetrievedChunk(text=c["text"], bucket=c["bucket"], user=c["user"], score=float(s))
+            for s, c in ranked[:rerank_k]
+        ]
+    else:
+        top = [
+            RetrievedChunk(text=c["text"], bucket=c["bucket"], user=c["user"], score=1.0)
+            for c in candidates[:rerank_k]
+        ]
+    t_rerank = time.perf_counter() - t0
+    if debug:
+        return top, {"t_embed": t_embed, "t_faiss": t_faiss, "t_rerank": t_rerank}
+    return top
+# ── Index builder ──────────────────────────────────────────────────────────────
+def build_index(persona_path: str | Path) -> tuple[faiss.Index, list[dict]]:
+    """Embed all memory chunks for a persona and build a FAISS IndexFlatIP."""
+    with open(persona_path) as f:
+        persona = json.load(f)
+    user_name = persona["profile"]["name"]
+    chunks, meta = [], []
+    for bucket, memories in persona["memory_buckets"].items():
+        for mem in memories:
+            chunks.append(mem)
+            meta.append({"text": mem, "bucket": bucket, "user": user_name})
+    embedder = _get_embedder()
+    vecs = embedder.encode(chunks, convert_to_numpy=True, normalize_embeddings=True)
+    dim = vecs.shape[1]
+    index = faiss.IndexFlatIP(dim)
+    index.add(vecs.astype(np.float32))
+    return index, meta
+def save_index(index: faiss.Index, meta: list[dict], save_dir: str | Path) -> None:
+    p = Path(save_dir)
+    p.mkdir(parents=True, exist_ok=True)
+    faiss.write_index(index, str(p / "index.faiss"))
+    with open(p / "meta.json", "w") as f:
+        json.dump(meta, f, indent=2)
+def build_all(
+    memories_dir: str | Path | None = None,
+    store_dir: str | Path | None = None,
+) -> None:
+    """Rebuild FAISS indexes for all personas using the configured BGE embedder."""
+    memories_dir = Path(memories_dir or settings.memories_dir)
+    store_dir = Path(store_dir or settings.faiss_store_dir)
+    for persona_file in sorted(memories_dir.glob("*.json")):
+        uid = persona_file.stem
+        print(f"  Building index for {uid} …")
+        index, meta = build_index(persona_file)
+        save_index(index, meta, store_dir / uid)
+        print(f"    Saved {len(meta)} chunks → {store_dir / uid}/")
+    print("\nAll indexes built.")
+# ── Entrypoint ────────────────────────────────────────────────────────────────
+if __name__ == "__main__":
+    build_all()
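
Note on stage 1 of `retrieve`: the docstring describes `IndexFlatIP` as cosine similarity. That holds because both the stored vectors and the query are L2-normalised before the inner product is taken. The sketch below illustrates this with plain Python lists instead of FAISS; the vectors and the helper names are made up for illustration.

```python
import math


def l2_normalise(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Reference cosine similarity computed on the raw vectors."""
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )


# Hypothetical 2-D "embeddings".
query = [3.0, 4.0]
docs = {"a": [6.0, 8.0], "b": [4.0, -3.0], "c": [1.0, 0.0]}

q = l2_normalise(query)
scores = {name: inner_product(q, l2_normalise(vec)) for name, vec in docs.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

Document `a` points in the same direction as the query and ranks first regardless of its length, which is the behaviour expected from a cosine-based index.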

sensing/__init__.py ADDED Viewed

File without changes

sensing/air_writing.py ADDED Viewed

	@@ -0,0 +1,176 @@

+"""
+L1 — Air writing recognition via index-finger tip trajectory (proposal §5.2).
+Tracks MediaPipe Hands landmark 8 (index fingertip) across frames.
+Stroke segmentation uses velocity thresholding:
+  - stroke starts when velocity > START_VEL px/frame
+  - stroke ends when velocity < END_VEL px/frame for > GAP_MS ms
+Segmented strokes are classified against a template library using
+Dynamic Time Warping (DTW). Supports:
+  - 26 uppercase English letters (A-Z)
+  - 10 digits (0-9)
+  - 10 most frequent Devanagari characters (for Arjun's Hindi inputs)
+Recognised characters are concatenated and returned as a text string
+to the intent decomposition layer.
+"""
+from __future__ import annotations
+import time
+from collections import deque
+from dataclasses import dataclass, field
+import numpy as np
+from config.settings import settings
+try:
+    import mediapipe as mp
+    _MP_AVAILABLE = True
+except ImportError:
+    _MP_AVAILABLE = False
+# ── Landmark index ─────────────────────────────────────────────────────────────
+_INDEX_TIP = 8
+@dataclass
+class AirWriter:
+    """
+    Stateful air-writing recogniser. Feed frames from a webcam loop.
+    Call `get_text()` to retrieve and clear the current buffer.
+    """
+    _trajectory: list[tuple[float, float]] = field(default_factory=list)
+    _in_stroke: bool = False
+    _stroke_end_time: float = field(default=0.0)
+    _text_buffer: list[str] = field(default_factory=list)
+    _templates: dict[str, np.ndarray] = field(default_factory=dict)
+    def __post_init__(self):
+        if not _MP_AVAILABLE:
+            raise ImportError("mediapipe is required: pip install mediapipe")
+        self._hands = mp.solutions.hands.Hands(
+            static_image_mode=False,
+            max_num_hands=1,
+            min_detection_confidence=0.6,
+            min_tracking_confidence=0.5,
+        )
+        self._prev_pt: tuple[float, float] | None = None
+        self._templates = _load_templates()
+    def process_frame(self, bgr_frame) -> str | None:
+        """
+        Process one frame. Returns a recognised character when a stroke
+        completes, or None otherwise.
+        """
+        import cv2
+        rgb = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2RGB)
+        result = self._hands.process(rgb)
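+        if not result.multi_hand_landmarks:
+            self._prev_pt = None
+            # Hand lost mid-stroke: start the end-gap timer so the stroke can still close.
+            if self._in_stroke and self._stroke_end_time == 0.0:
+                self._stroke_end_time = time.time()
+            return self._check_stroke_end()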
+        h, w = bgr_frame.shape[:2]
+        lm = result.multi_hand_landmarks[0].landmark
+        tip = (lm[_INDEX_TIP].x * w, lm[_INDEX_TIP].y * h)
+        velocity = 0.0
+        if self._prev_pt is not None:
+            velocity = np.linalg.norm(np.array(tip) - np.array(self._prev_pt))
+        self._prev_pt = tip
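+        start_v = settings.air_write_velocity_start
+        end_v   = settings.air_write_velocity_end
+        if velocity > start_v:
+            self._in_stroke = True
+            self._trajectory.append(tip)
+            self._stroke_end_time = 0.0
+        elif self._in_stroke and velocity < end_v:
+            if self._stroke_end_time == 0.0:
+                self._stroke_end_time = time.time()
+            return self._check_stroke_end()
+        elif self._in_stroke:
+            # Between the two thresholds: still part of the stroke, so keep the
+            # point and cancel any pending end-of-stroke timer.
+            self._trajectory.append(tip)
+            self._stroke_end_time = 0.0
+        return None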
+    def _check_stroke_end(self) -> str | None:
+        if not self._in_stroke or self._stroke_end_time == 0.0:
+            return None
+        gap_s = settings.air_write_end_gap_ms / 1000.0
+        if time.time() - self._stroke_end_time >= gap_s:
+            char = self._recognise(self._trajectory)
+            self._trajectory = []
+            self._in_stroke = False
+            self._stroke_end_time = 0.0
+            if char:
+                self._text_buffer.append(char)
+            return char
+        return None
+    def _recognise(self, trajectory: list[tuple[float, float]]) -> str | None:
+        if len(trajectory) < 5 or not self._templates:
+            return None
+        query = _normalise_trajectory(np.array(trajectory))
+        best_char, best_dist = None, float("inf")
+        for char, template in self._templates.items():
+            dist = _dtw_distance(query, template)
+            if dist < best_dist:
+                best_dist = dist
+                best_char = char
+        return best_char
+    def get_text(self) -> str:
+        """Return and clear the accumulated air-written text."""
+        text = "".join(self._text_buffer)
+        self._text_buffer.clear()
+        return text
+    def release(self):
+        self._hands.close()
+# ── DTW helpers ───────────────────────────────────────────────────────────────
+def _normalise_trajectory(pts: np.ndarray) -> np.ndarray:
+    """Scale trajectory to unit bounding box, resample to 32 points."""
+    pts = pts - pts.min(axis=0)
+    scale = pts.max(axis=0) + 1e-6
+    pts = pts / scale
+    # Resample to fixed length via linear interpolation
+    t_old = np.linspace(0, 1, len(pts))
+    t_new = np.linspace(0, 1, 32)
+    return np.column_stack([
+        np.interp(t_new, t_old, pts[:, 0]),
+        np.interp(t_new, t_old, pts[:, 1]),
+    ])
+def _dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
+    """Simple O(n²) DTW — trajectories are short (32 pts), so this is fine."""
+    n, m = len(a), len(b)
+    dtw = np.full((n + 1, m + 1), np.inf)
+    dtw[0, 0] = 0.0
+    for i in range(1, n + 1):
+        for j in range(1, m + 1):
+            cost = np.linalg.norm(a[i - 1] - b[j - 1])
+            dtw[i, j] = cost + min(dtw[i - 1, j], dtw[i, j - 1], dtw[i - 1, j - 1])
+    return float(dtw[n, m])
+def _load_templates() -> dict[str, np.ndarray]:
+    """
+    Load pre-recorded stroke templates from disk.
+    Template files should be numpy arrays of shape (32, 2) stored as .npy.
+    Returns an empty dict if no template directory exists yet.
+    """
+    from pathlib import Path
+    template_dir = Path("data/air_write_templates")
+    if not template_dir.exists():
+        return {}
+    templates = {}
+    for f in template_dir.glob("*.npy"):
+        char = f.stem    # filename = character label
+        templates[char] = np.load(f)
+    return templates
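
Note on the DTW matcher: the sketch below restates the same recurrence as `_dtw_distance` in pure Python and runs it on three tiny hand-made trajectories. It is meant only to show why DTW suits stroke matching (it tolerates differences in drawing speed). The trajectories are hypothetical and much shorter than the 32-point templates used above.

```python
import math


def dtw_distance(a, b):
    """Classic O(n*m) dynamic time warping over sequences of 2-D points."""
    n, m = len(a), len(b)
    inf = float("inf")
    dtw = [[inf] * (m + 1) for _ in range(n + 1)]
    dtw[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            dtw[i][j] = cost + min(dtw[i - 1][j], dtw[i][j - 1], dtw[i - 1][j - 1])
    return dtw[n][m]


# A diagonal stroke, the same stroke drawn with pauses at both ends,
# and a different (horizontal) stroke.
diagonal = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]
diagonal_slow = [(0.0, 0.0), (0.0, 0.0), (0.5, 0.5), (1.0, 1.0), (1.0, 1.0)]
horizontal = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)]

same_shape = dtw_distance(diagonal, diagonal_slow)
other_shape = dtw_distance(diagonal, horizontal)
print(same_shape, other_shape)
```

The paused version of the stroke aligns with zero cost because repeated points can be warped onto a single template point, whereas the horizontal stroke cannot be aligned without cost.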

sensing/face_mesh.py ADDED Viewed

	@@ -0,0 +1,166 @@

+"""
+L1 — Facial affect detection via MediaPipe 2D Face Mesh.
+Extracts 4 geometric features from 478 landmarks at ~10 fps:
+  MAR — Mouth Aspect Ratio     (surprise / speech attempt)
+  EAR — Eye Aspect Ratio       (frustration / blink)
+  BRI — Brow Raise Index       (surprise / questioning)
+  LCP — Lip Corner Pull        (smile vs frown)
+These form the affect vector, which is mapped to one of 4 actionable states:
+HAPPY | FRUSTRATED | NEUTRAL | SURPRISED. The current classifier is rule-based;
+a MobileNetV3-Small affect classifier is intended to replace it for final evaluation.
+EMA smoothing (α=0.3) prevents transient expressions (sneezes, blinks)
+from destabilising the detected state across turns.
+"""
+from __future__ import annotations
+import time
+from dataclasses import dataclass, field
+import numpy as np
+from config.settings import settings
+from pipeline.state import AffectState, AffectVector
+try:
+    import mediapipe as mp
+    _MP_AVAILABLE = True
+except ImportError:
+    _MP_AVAILABLE = False
+try:
+    import cv2
+    _CV2_AVAILABLE = True
+except ImportError:
+    _CV2_AVAILABLE = False
+# ── MediaPipe landmark indices (from proposal §5.2) ───────────────────────────
+# MAR — mouth vertical / horizontal ratio
+_MOUTH_TOP    = 13
+_MOUTH_BOTTOM = 14
+_MOUTH_LEFT   = 61
+_MOUTH_RIGHT  = 291
+# EAR — eye vertical / horizontal ratio (right eye)
+_EYE_TOP    = 159
+_EYE_BOTTOM = 145
+_EYE_LEFT   = 33
+_EYE_RIGHT  = 133
+# BRI — brow vertical displacement relative to eye centre
+_BROW_LEFT  = 70
+_BROW_RIGHT = 300
+# LCP — mouth corner horizontal displacement from neutral baseline
+_CORNER_LEFT  = 61
+_CORNER_RIGHT = 291
+# ── Affect classes ────────────────────────────────────────────────────────────
+AFFECT_CLASSES = ["HAPPY", "FRUSTRATED", "NEUTRAL", "SURPRISED"]
+@dataclass
+class AffectDetector:
+    """
+    Stateful detector that maintains EMA-smoothed affect across frames.
+    Create one instance per session and call `process_frame` each frame.
+    """
+    _smoothed: AffectVector = field(default_factory=lambda: AffectVector(MAR=0.0, EAR=0.3, BRI=0.0, LCP=0.0))
+    _neutral_lcp: float = 0.0          # calibrated at session start
+    _calibrated: bool = False
+    def __post_init__(self):
+        if not _MP_AVAILABLE:
+            raise ImportError("mediapipe is required: pip install mediapipe")
+        if not _CV2_AVAILABLE:
+            raise ImportError("opencv-python is required: pip install opencv-python")
+        self._face_mesh = mp.solutions.face_mesh.FaceMesh(
+            static_image_mode=False,
+            max_num_faces=1,
+            refine_landmarks=True,       # enables iris landmarks (468-477)
+            min_detection_confidence=0.5,
+            min_tracking_confidence=0.5,
+        )
+    def process_frame(self, bgr_frame: np.ndarray) -> AffectState | None:
+        """
+        Process one BGR frame from OpenCV and return the current AffectState,
+        or None if no face is detected.
+        """
+        rgb = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2RGB)
+        result = self._face_mesh.process(rgb)
+        if not result.multi_face_landmarks:
+            return None
+        lm = result.multi_face_landmarks[0].landmark
+        h, w = bgr_frame.shape[:2]
+        def pt(idx):
+            l = lm[idx]
+            return np.array([l.x * w, l.y * h])
+        raw = self._compute_features(pt)
+        if not self._calibrated:
+            self._neutral_lcp = raw["LCP"]
+            self._calibrated = True
+        raw["LCP"] = raw["LCP"] - self._neutral_lcp  # relative to neutral baseline
+        alpha = settings.affect_ema_alpha
+        smoothed = AffectVector(
+            MAR=alpha * raw["MAR"] + (1 - alpha) * self._smoothed["MAR"],
+            EAR=alpha * raw["EAR"] + (1 - alpha) * self._smoothed["EAR"],
+            BRI=alpha * raw["BRI"] + (1 - alpha) * self._smoothed["BRI"],
+            LCP=alpha * raw["LCP"] + (1 - alpha) * self._smoothed["LCP"],
+        )
+        self._smoothed = smoothed
+        emotion = self._classify(smoothed)
+        return AffectState(emotion=emotion, vector=raw, smoothed=smoothed)
+    def _compute_features(self, pt) -> dict:
+        # MAR
+        mouth_v = np.linalg.norm(pt(_MOUTH_TOP) - pt(_MOUTH_BOTTOM))
+        mouth_h = np.linalg.norm(pt(_MOUTH_LEFT) - pt(_MOUTH_RIGHT))
+        MAR = mouth_v / (mouth_h + 1e-6)
+        # EAR
+        eye_v = np.linalg.norm(pt(_EYE_TOP) - pt(_EYE_BOTTOM))
+        eye_h = np.linalg.norm(pt(_EYE_LEFT) - pt(_EYE_RIGHT))
+        EAR = eye_v / (eye_h + 1e-6)
+        # BRI: mean brow height above the right-eye centre, normalised by eye width
+        eye_center = (pt(_EYE_LEFT) + pt(_EYE_RIGHT)) / 2
+        inter_ocular = eye_h  # NOTE: this is the right-eye width, used as a scale proxy
+        brow_mid = (pt(_BROW_LEFT) + pt(_BROW_RIGHT)) / 2
+        BRI = (eye_center[1] - brow_mid[1]) / (inter_ocular + 1e-6)
+        # LCP: mean x-coordinate of the two mouth corners, in pixels.
+        # NOTE: this tracks horizontal mouth position, so it also shifts with head movement.
+        LCP = float((pt(_CORNER_LEFT)[0] + pt(_CORNER_RIGHT)[0]) / 2)
+        return {"MAR": float(MAR), "EAR": float(EAR), "BRI": float(BRI), "LCP": float(LCP)}
+    @staticmethod
+    def _classify(v: AffectVector) -> str:
+        """
+        Rule-based classifier over the 4 geometric features.
+        Replace with MobileNetV3-Small for final evaluation.
+        """
+        if v["BRI"] > 0.25 and v["MAR"] > 0.3:
+            return "SURPRISED"
+        if v["EAR"] < 0.15 and v["LCP"] < -5:
+            return "FRUSTRATED"
+        if v["LCP"] > 5:
+            return "HAPPY"
+        return "NEUTRAL"
+    def release(self):
+        self._face_mesh.close()
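
Note on the EMA smoothing in `process_frame`: the sketch below applies the same update rule, `alpha * raw + (1 - alpha) * previous`, to a single feature so the damping effect is visible. The sample values and the `ema` helper are illustrative only; α = 0.3 is taken from the module docstring.

```python
def ema(values, alpha=0.3, initial=0.0):
    """Exponential moving average; returns the smoothed value after each sample."""
    smoothed = initial
    out = []
    for v in values:
        smoothed = alpha * v + (1 - alpha) * smoothed
        out.append(smoothed)
    return out


# Hypothetical MAR readings: steady at 0.1 with a one-frame spike to 0.9.
raw_mar = [0.1, 0.1, 0.9, 0.1, 0.1]
smoothed_mar = ema(raw_mar, alpha=0.3, initial=0.1)
print([round(v, 3) for v in smoothed_mar])
```

With α = 0.3 a one-frame spike contributes only 30% of its height to the smoothed value and then decays over the following frames. A smaller α damps transients further at the cost of slower reaction to genuine changes.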

sensing/gaze.py ADDED Viewed

	@@ -0,0 +1,113 @@

+"""
+L1 — Gaze-based retrieval activation (Bonus feature, proposal §5.2).
+Uses the MediaPipe iris-centre landmarks (468 and 473, from the 468-477 iris set)
+to estimate gaze position as a 2D point in normalised image coordinates.
+Sustained fixation (> 1.5 s dwell time) on a defined UI region pre-biases
+the retrieval layer toward the corresponding memory bucket.
+UI region → bucket mapping:
+  top-left quadrant     → family
+  top-right quadrant    → medical
+  bottom-left quadrant  → hobbies
+  bottom-right quadrant → daily_routine
+  centre strip          → social
+"""
+from __future__ import annotations
+import time
+from dataclasses import dataclass, field
+import numpy as np
+from config.settings import settings
+try:
+    import mediapipe as mp
+    _MP_AVAILABLE = True
+except ImportError:
+    _MP_AVAILABLE = False
+# ── Iris landmark indices ──────────────────────────────────────────────────────
+# MediaPipe refine_landmarks=True adds iris landmarks 468-477
+_LEFT_IRIS_CENTER  = 468
+_RIGHT_IRIS_CENTER = 473
+# ── Screen region → bucket map ─────────────────────────────────────────────────
+# Defined as (x_min, y_min, x_max, y_max) in normalised [0,1] coords
+_REGION_BUCKET: list[tuple[tuple[float, float, float, float], str]] = [
+    # The centre region is checked first: the four quadrants tile the whole
+    # frame, so a region listed after them could never match.
+    ((0.3, 0.3, 0.7, 0.7), "social"),
+    ((0.0, 0.0, 0.5, 0.5), "family"),
+    ((0.5, 0.0, 1.0, 0.5), "medical"),
+    ((0.0, 0.5, 0.5, 1.0), "hobbies"),
+    ((0.5, 0.5, 1.0, 1.0), "daily_routine"),
+]
+@dataclass
+class GazeTracker:
+    """
+    Stateful gaze tracker. Call `process_frame` each frame.
+    Returns the bucket name when dwell threshold is exceeded, else None.
+    """
+    _dwell_start: float = field(default=0.0)
+    _current_region: str | None = field(default=None)
+    def __post_init__(self):
+        if not _MP_AVAILABLE:
+            raise ImportError("mediapipe is required: pip install mediapipe")
+        self._face_mesh = mp.solutions.face_mesh.FaceMesh(
+            static_image_mode=False,
+            max_num_faces=1,
+            refine_landmarks=True,
+            min_detection_confidence=0.5,
+            min_tracking_confidence=0.5,
+        )
+    def process_frame(self, bgr_frame) -> str | None:
+        """
+        Returns the hinted bucket name once dwell threshold is exceeded,
+        then resets the dwell timer. Returns None otherwise.
+        """
+        import cv2
+        rgb = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2RGB)
+        result = self._face_mesh.process(rgb)
+        if not result.multi_face_landmarks:
+            self._reset()
+            return None
+        lm = result.multi_face_landmarks[0].landmark
+        # Average left + right iris centres for gaze estimate
+        gaze_x = (lm[_LEFT_IRIS_CENTER].x + lm[_RIGHT_IRIS_CENTER].x) / 2
+        gaze_y = (lm[_LEFT_IRIS_CENTER].y + lm[_RIGHT_IRIS_CENTER].y) / 2
+        bucket = self._region_for(gaze_x, gaze_y)
+        if bucket != self._current_region:
+            self._current_region = bucket
+            self._dwell_start = time.time()
+            return None
+        dwell = time.time() - self._dwell_start
+        if dwell >= settings.gaze_dwell_threshold_s and bucket is not None:
+            self._reset()
+            return bucket
+        return None
+    @staticmethod
+    def _region_for(x: float, y: float) -> str | None:
+        for (x0, y0, x1, y1), bucket in _REGION_BUCKET:
+            if x0 <= x <= x1 and y0 <= y <= y1:
+                return bucket
+        return None
+    def _reset(self):
+        self._dwell_start = 0.0
+        self._current_region = None
+    def release(self):
+        self._face_mesh.close()
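
Note on dwell detection: the logic in `GazeTracker.process_frame` can be exercised without a webcam by passing the timestamp in explicitly. The `DwellTimer` class below is a hypothetical, simplified stand-in written for this note (it is not part of the commit) and assumes the 1.5 s threshold mentioned in the module docstring.

```python
class DwellTimer:
    """Fires a region name once the same region has been observed for threshold_s."""

    def __init__(self, threshold_s=1.5):
        self.threshold_s = threshold_s
        self.region = None
        self.start = 0.0

    def update(self, region, now):
        # A change of region restarts the timer.
        if region != self.region:
            self.region, self.start = region, now
            return None
        # Same region as before: fire once the dwell threshold is reached.
        if region is not None and now - self.start >= self.threshold_s:
            self.region, self.start = None, 0.0
            return region
        return None


timer = DwellTimer(threshold_s=1.5)
# (region under gaze, timestamp in seconds): all values are made up.
samples = [("family", 0.0), ("family", 1.0), ("medical", 1.2),
           ("medical", 2.0), ("medical", 2.8)]
events = [timer.update(region, t) for region, t in samples]
print(events)
```

Only the last sample fires: the glance at `family` is interrupted after 1.0 s, and `medical` has been held for 1.6 s by the final timestamp. Injecting the clock this way also makes the behaviour straightforward to unit-test.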

sensing/gesture.py ADDED Viewed

	@@ -0,0 +1,124 @@

+"""
+L1 — Hand gesture recognition via MediaPipe Hands.
+Recognises 4 gestures from 21 3D hand landmarks at ~15 fps using
+normalised joint-angle rules (no ML model needed at this stage):
+  THUMBS_UP    → [TONE:AFFIRMATIVE]
+  THUMBS_DOWN  → [TONE:NEGATIVE]
+  POINTING     → [INTENT:REFERENTIAL]
+  WAVING       → [INTENT:GREETING]
+Each detected gesture is mapped to a stylistic constraint tag that is
+injected into the generation prompt by the planner node.
+"""
+from __future__ import annotations
+import numpy as np
+try:
+    import mediapipe as mp
+    _MP_AVAILABLE = True
+except ImportError:
+    _MP_AVAILABLE = False
+# Gesture → prompt constraint tag mapping
+GESTURE_TO_TAG: dict[str, str] = {
+    "THUMBS_UP":   "[GESTURE:THUMBS_UP][TONE:AFFIRMATIVE]",
+    "THUMBS_DOWN": "[GESTURE:THUMBS_DOWN][TONE:NEGATIVE]",
+    "POINTING":    "[GESTURE:POINTING][INTENT:REFERENTIAL]",
+    "WAVING":      "[GESTURE:WAVING][INTENT:GREETING]",
+}
+class GestureClassifier:
+    """
+    Stateful classifier — create one instance per session.
+    Feed MediaPipe hand landmark results each frame.
+    """
+    def __init__(self):
+        if not _MP_AVAILABLE:
+            raise ImportError("mediapipe is required: pip install mediapipe")
+        self._hands = mp.solutions.hands.Hands(
+            static_image_mode=False,
+            max_num_hands=1,
+            min_detection_confidence=0.6,
+            min_tracking_confidence=0.5,
+        )
+    def process_frame(self, bgr_frame) -> str | None:
+        """
+        Returns a gesture label string or None if no clear gesture is detected.
+        """
+        import cv2
+        rgb = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2RGB)
+        result = self._hands.process(rgb)
+        if not result.multi_hand_landmarks:
+            return None
+        lm = result.multi_hand_landmarks[0].landmark
+        pts = np.array([[l.x, l.y, l.z] for l in lm])
+        return self._classify(pts)
+    def gesture_tag(self, bgr_frame) -> str | None:
+        """Convenience: returns the prompt tag directly, or None."""
+        gesture = self.process_frame(bgr_frame)
+        return GESTURE_TO_TAG.get(gesture) if gesture else None
+    @staticmethod
+    def _classify(pts: np.ndarray) -> str | None:
+        """
+        Rule-based gesture classification over normalised joint positions.
+        MediaPipe hand landmark indices:
+          0=WRIST, 1-4=THUMB, 5-8=INDEX, 9-12=MIDDLE, 13-16=RING, 17-20=PINKY
+        """
+        # Normalise: wrist at origin, scale by palm width
+        wrist = pts[0]
+        palm_width = np.linalg.norm(pts[5] - pts[17]) + 1e-6
+        p = (pts - wrist) / palm_width
+        thumb_tip   = p[4]
+        index_tip   = p[8]
+        middle_tip  = p[12]
+        ring_tip    = p[16]
+        pinky_tip   = p[20]
+        index_mcp   = p[5]   # knuckle
+        # THUMBS_UP: thumb tip above wrist, other fingers curled
+        # (a finger counts as curled when its tip is closer to the wrist than its knuckle)
+        fingers_curled = all(
+            np.linalg.norm(tip) < np.linalg.norm(mcp)
+            for tip, mcp in [(index_tip, p[5]), (middle_tip, p[9]), (ring_tip, p[13])]
+        )
+        if thumb_tip[1] < -0.3 and fingers_curled:
+            return "THUMBS_UP"
+        # THUMBS_DOWN: thumb tip below wrist, other fingers curled
+        if thumb_tip[1] > 0.3 and fingers_curled:
+            return "THUMBS_DOWN"
+        # POINTING: index extended, others curled
+        index_extended = np.linalg.norm(index_tip) > np.linalg.norm(index_mcp) * 1.3
+        others_curled  = all(
+            np.linalg.norm(tip) < 0.5
+            for tip in [middle_tip, ring_tip, pinky_tip]
+        )
+        if index_extended and others_curled:
+            return "POINTING"
+        # WAVING: all four fingers extended (static open-palm check; no motion or orientation test)
+        all_extended = all(
+            np.linalg.norm(tip) > 0.5
+            for tip in [index_tip, middle_tip, ring_tip, pinky_tip]
+        )
+        if all_extended:
+            return "WAVING"
+        return None
+    def release(self):
+        self._hands.close()
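
Note on the landmark normalisation in `_classify`: moving the wrist to the origin and dividing by palm width is what lets fixed thresholds such as `0.3` and `0.5` work regardless of where the hand is in the frame or how close it is to the camera. The sketch below checks that property on synthetic landmarks. The `normalise` helper and the 21 fake points are assumptions made for illustration.

```python
import math


def normalise(points):
    """Wrist (index 0) to origin, scale by palm width (index MCP 5 to pinky MCP 17)."""
    wrist = points[0]
    palm_width = math.dist(points[5], points[17]) + 1e-6
    return [tuple((c - w) / palm_width for c, w in zip(p, wrist)) for p in points]


# 21 synthetic landmarks (not a realistic hand pose).
hand = [(0.01 * i, 0.02 * i, 0.0) for i in range(21)]
# The same hand twice as large and shifted within the frame.
hand_moved = [(2 * x + 0.3, 2 * y + 0.1, 2 * z) for x, y, z in hand]

a = normalise(hand)
b = normalise(hand_moved)
max_diff = max(abs(u - v) for pa, pb in zip(a, b) for u, v in zip(pa, pb))
print(max_diff)
```

The two normalised poses agree up to the small `1e-6` stabiliser, so any rule expressed in these units sees the same input for both hands.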

ui/app.py ADDED Viewed

	@@ -0,0 +1,153 @@

+"""
+Streamlit frontend — webcam + chat + live metrics dashboard.
+Panels:
+  Left sidebar  — persona selector, session controls, live affect display
+  Centre        — chat interface with streaming response
+  Right sidebar — latency breakdown, bucket priors bar chart
+Run: streamlit run ui/app.py
+"""
+from __future__ import annotations
+import json
+import time
+import requests
+import streamlit as st
+# ── Config ─────────────────────────────────────────────────────────────────────
+API_BASE = "http://localhost:8000"
+st.set_page_config(
+    page_title="AAC Chatbot",
+    layout="wide",
+    initial_sidebar_state="expanded",
+)
+# ── Session state init ─────────────────────────────────────────────────────────
+if "user_id" not in st.session_state:
+    st.session_state.user_id = None
+if "messages" not in st.session_state:
+    st.session_state.messages = []
+if "last_latency" not in st.session_state:
+    st.session_state.last_latency = {}
+if "last_affect" not in st.session_state:
+    st.session_state.last_affect = "NEUTRAL"
+if "affect_override" not in st.session_state:
+    st.session_state.affect_override = None
+# ── Sidebar ────────────────────────────────────────────────────────────────────
+with st.sidebar:
+    st.title("AAC Chatbot")
+    # Persona selection
+    try:
+        users_resp = requests.get(f"{API_BASE}/users", timeout=3)
+        users = users_resp.json().get("users", [])
+    except Exception:
+        users = []
+        st.error("API not reachable — start the FastAPI server first.")
+    user_options = {u["id"]: f"{u['name']} ({u['condition']})" for u in users}
+    selected = st.selectbox("Select persona", options=list(user_options.keys()),
+                             format_func=lambda k: user_options.get(k, k))
+    if selected != st.session_state.user_id:
+        st.session_state.user_id = selected
+        st.session_state.messages = []
+        try:
+            requests.post(f"{API_BASE}/session/reset", params={"user_id": selected}, timeout=3)
+        except Exception:
+            pass
+    st.divider()
+    # Affect override (for demo / testing without webcam)
+    st.subheader("Affect Override")
+    st.caption("Simulates webcam affect detection")
+    affect_choice = st.radio(
+        "Current affect",
+        ["Auto (webcam)", "HAPPY", "FRUSTRATED", "NEUTRAL", "SURPRISED"],
+        index=0,
+    )
+    st.session_state.affect_override = None if affect_choice == "Auto (webcam)" else affect_choice
+    st.divider()
+    # Live affect indicator
+    st.subheader("Detected Affect")
+    affect_emoji = {
+        "HAPPY": "😊", "FRUSTRATED": "😤",
+        "NEUTRAL": "😐", "SURPRISED": "😲",
+    }
+    af = st.session_state.last_affect
+    st.markdown(f"### {affect_emoji.get(af, '❓')} {af}")
+    # Webcam placeholder
+    st.divider()
+    st.subheader("Webcam Feed")
+    st.info("Live webcam sensing runs in the sensing client.\nAffect is sent to the API automatically.")
+# ── Main chat area ─────────────────────────────────────────────────────────────
+st.header(f"Talking as: {user_options.get(st.session_state.user_id, '—')}")
+chat_col, metrics_col = st.columns([3, 1])
+with chat_col:
+    for msg in st.session_state.messages:
+        role_label = "Partner" if msg["role"] == "partner" else "AAC User"
+        with st.chat_message("user" if msg["role"] == "partner" else "assistant"):
+            st.markdown(f"**{role_label}:** {msg['content']}")
+    query = st.chat_input("Type as the communication partner…")
+    if query and st.session_state.user_id:
+        st.session_state.messages.append({"role": "partner", "content": query})
+        with st.chat_message("user"):
+            st.markdown(f"**Partner:** {query}")
+        with st.chat_message("assistant"):
+            with st.spinner("Generating response…"):
+                try:
+                    payload = {
+                        "user_id": st.session_state.user_id,
+                        "query": query,
+                        "affect_override": st.session_state.affect_override,
+                    }
+                    resp = requests.post(f"{API_BASE}/chat", json=payload, timeout=15)
+                    resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
+                    data = resp.json()
+                    response_text = data.get("response", "I don't know.")
+                    st.markdown(f"**AAC User:** {response_text}")
+                    st.session_state.messages.append({"role": "aac_user", "content": response_text})
+                    st.session_state.last_affect = data.get("affect", "NEUTRAL")
+                    st.session_state.last_latency = data.get("latency", {})
+                    if not data.get("guardrail_passed", True):
+                        st.warning("⚠ Guardrail triggered — response was sanitised.")
+                except requests.exceptions.Timeout:
+                    st.error("Request timed out. Is the server running?")
+                except Exception as e:
+                    st.error(f"Error: {e}")
+with metrics_col:
+    st.subheader("Turn Latency (s)")
+    lat = st.session_state.last_latency
+    if lat:
+        for key, label in [
+            ("t_sensing",    "Sensing"),
+            ("t_intent",     "Intent"),
+            ("t_retrieval",  "Retrieval"),
+            ("t_generation", "Generation"),
+            ("t_total",      "**Total**"),
+        ]:
+            val = lat.get(key, 0.0)
+            st.metric(label=label, value=f"{val:.3f}s")
+    else:
+        st.caption("No turn yet.")