# Multimodal AAC Chatbot: Project Guide
## What This Project Does
An AI chatbot that **speaks as an AAC (augmentative and alternative
communication) user**, not to them. Given one of 14 personas (nine anchored in
real memoirs and five in canonical fiction), it fuses real-time multimodal
non-verbal signals with personal memory retrieval to generate responses in
that person's authentic voice. The pipeline is orchestrated as a **plain
Python function chain** across five layers, with two conditional branches.
---
## Architecture
```
frontend/                       React + Vite + TypeScript
  src/hooks/useSensing.ts       MediaPipe JS: affect, gesture, gaze, air-writing (browser-side)
  src/components/ChatPanel.tsx  Chat UI → POST /chat with sensing labels
backend/                        Python (conda env: aac-chatbot)
  main.py                       CLI entry point
  api/main.py                   FastAPI REST API
  pipeline/graph.py             run_pipeline(): plain function chain with 2 conditional branches
  pipeline/nodes/intent.py      L2: LLM + Pydantic intent routing
  pipeline/nodes/retrieval.py   L3: BGE embeddings + torch tensor cosine search (fast / full)
  pipeline/nodes/planner.py     L4: expression-conditioned generation
  pipeline/nodes/feedback.py    L5: JSONL turn logging + Bayesian bucket priors
  sensing/labels.py             GESTURE_TO_TAG label map (sensing itself runs in the browser)
  retrieval/                    BGE embeddings (torch), Bayesian bucket priors
  generation/                   Two-tier LLM client (primary / fallback, both Ollama Cloud)
  guardrails/                   Input + output safety checks
  config/                       Pydantic BaseSettings: all config in one place
data/                           Shared data (personas, vector indexes)
logs/                           Per-turn JSONL logs (gitignored)
```
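The retrieval layer ranks memory chunks by cosine similarity. Once embeddings are L2-normalised, cosine similarity is a plain dot product, so scoring a whole index takes one matrix-vector product followed by a top-k. The project does this with torch tensors on the embedder's device. The sketch below spells out the same arithmetic in plain Python so it runs without torch. The `TOP_K` depths for the `fast` and `full` modes are hypothetical.

```python
import math

# Hypothetical retrieval depths for the two modes.
TOP_K = {"fast": 3, "full": 8}


def normalize(vec: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine_top_k(
    query: list[float], index: list[list[float]], mode: str = "fast"
) -> list[tuple[int, float]]:
    """Return (row, score) pairs for the best-matching rows, highest score first."""
    q = normalize(query)
    rows = [normalize(r) for r in index]
    # Dot products of unit vectors are cosine similarities
    # (torch equivalent: index @ query followed by torch.topk).
    scores = [sum(a * b for a, b in zip(row, q)) for row in rows]
    k = min(TOP_K[mode], len(scores))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in ranked[:k]]
```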
## Key Design Decisions
- **Plain function chain** orchestrates the pipeline (`run_pipeline` in
`backend/pipeline/graph.py`): intent → retrieval → planner → feedback,
with two conditional branches (affect picks fast/full retrieval; cumulative
latency picks primary/fallback LLM). No LangGraph / LangChain dependency.
- **BGE-small-en-v1.5** for embeddings (beats MiniLM on MTEB at the same speed)
- **Torch tensor matmul** for vector search on the embedder's device
(mps → cuda → cpu). No FAISS, no separate index format. Stored as
`vectors.pt` per user. Headroom is ~100k vectors before approximate
search (`hnswlib`) becomes worthwhile.
- **No reranker**: the cosine score from BGE-small carries the ranking signal
at current scales. Revisit when per-query `top_k` grows past ~30.
- **Two-tier Ollama Cloud LLM**: `primary` → `fallback` (switches when cumulative
latency exceeds `FALLBACK_LATENCY_THRESHOLD`). Both tiers hit Ollama
Cloud over the OpenAI-compatible endpoint. Both models default to
`gemma4:31b-cloud`; swap one out when a larger cloud model is provisioned.
- **Pydantic-validated** LLM routing output: `intent.py` retries on schema
failures (3 attempts) before falling back to a default route.
- **Expression-conditioned response shaping**: affect steers tone, retrieval depth,
and candidate ranking (not just metadata annotation).
- **Bayesian bucket priors**: a session-level P(bucket) is updated after each accepted turn.
- **Per-turn JSONL logging**: one line per turn is appended to
`logs/turns.jsonl` (no MLflow). Query ad hoc with DuckDB if needed.
- **Browser-side sensing**: MediaPipe JS runs in the React frontend; only the classified
labels (affect, gesture, gaze bucket) are sent to the backend API.
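The guide does not say which Bayesian model sits behind the bucket priors. This sketch assumes one standard choice, a Dirichlet-categorical update. Every bucket starts with a pseudo-count `alpha`, each accepted turn adds one to its bucket, and P(bucket) is the posterior mean. The class name and the `alpha` default are hypothetical. The default buckets are the five listed under Personas, so pass a different tuple for personas with tuned buckets.

```python
from collections import Counter

BUCKETS = ("family", "medical", "hobbies", "daily_routine", "social")


class BucketPrior:
    """Session-level P(bucket) as a Dirichlet-categorical posterior mean."""

    def __init__(self, buckets: tuple[str, ...] = BUCKETS, alpha: float = 1.0) -> None:
        self.alpha = alpha  # hypothetical pseudo-count per bucket (uniform start)
        self.counts: Counter[str] = Counter({b: 0 for b in buckets})

    def update(self, accepted_bucket: str) -> None:
        # Only turns the user accepted move the prior.
        if accepted_bucket not in self.counts:
            raise KeyError(accepted_bucket)
        self.counts[accepted_bucket] += 1

    def probabilities(self) -> dict[str, float]:
        total = sum(self.counts.values()) + self.alpha * len(self.counts)
        return {b: (n + self.alpha) / total for b, n in self.counts.items()}
```

Suppose `alpha=1` and the session has two accepted `family` turns and one accepted `social` turn. Then P(family) = 3/8, P(social) = 2/8, and every untouched bucket sits at 1/8.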
---
## Personas
Fourteen personas ship with the project. Anchored in real memoirs:
| ID | Name | Condition | Access |
|----|------|-----------|--------|
| `stephen_hawking` | Stephen Hawking | ALS (advanced) | Cheek-twitch + ACAT predictive speech |
| `jean_dominique_bauby` | Jean-Dominique Bauby | Locked-in syndrome | Alphabet-blink with amanuensis |
| `michael_j_fox` | Michael J. Fox | Parkinson's | Voice + adaptive keyboard + dictation |
| `gabby_giffords` | Gabby Giffords | Aphasia + right hemiparesis (post-TBI) | Left-hand typing + speech-to-text |
| `jason_becker` | Jason Becker | ALS (fully locked-in) | Eye-gaze + father's letter-code board |
| `tito_mukhopadhyay` | Tito Mukhopadhyay | Non-verbal autism | Letterboard + pencil |
| `christopher_reeve` | Christopher Reeve | C1–C2 spinal cord injury | Dictation to assistants; sip-and-puff |
| `christy_brown` | Christy Brown | Cerebral palsy (spastic quadriplegia) | Left foot typing / writing |
| `wendy_mitchell` | Wendy Mitchell | Early-onset dementia | Laptop/phone typing + "brain-book" |
Canonical fiction:
| ID | Name | Condition | Access |
|----|------|-----------|--------|
| `abed_nadir` | Abed Nadir (*Community*) | Autism (coded); occasional selective mutism | Mostly verbal; text when overloaded |
| `allie_calhoun` | Allie Hamilton Calhoun (*The Notebook*) | Late-stage Alzheimer's | Verbal when lucid; yes/no otherwise |
| `forrest_gump` | Forrest Gump | Intellectual disability (IQ ~75) | Verbal primarily |
| `raymond_babbitt` | Raymond Babbitt (*Rain Man*) | Savant autism | Verbal when calm + visual schedules |
| `walter_jr_white` | Walter "Flynn" White Jr. (*Breaking Bad*) | Cerebral palsy | Verbal + smartphone typing |
Each persona has ~25 memory chunks, sorted into buckets (`family` / `medical` / `hobbies` / `daily_routine` / `social`; buckets are tuned per persona). A short-form push-to-talk voice mic appears only for personas whose modelled access method is verbal; see `VOICE_CAPABLE_PERSONAS` in [frontend/src/lib/voiceEligibility.ts](frontend/src/lib/voiceEligibility.ts).
---
## How to Run
```bash
# One-time setup
bash setup.sh
# CLI
python -m backend.main --debug
# Full stack
uvicorn backend.api.main:app --reload # FastAPI on :8000
pnpm --dir frontend dev # React on :7550
```
---
## Configuration
All config lives in [backend/config/settings.py](backend/config/settings.py) as Pydantic `BaseSettings`.
Copy `.env.example` → `.env` and set:
- `ACTIVE_LLM_TIER`: `primary` | `fallback`
- `PRIMARY_MODEL` / `FALLBACK_MODEL`: Ollama Cloud model identifiers
(e.g. `gemma4:31b-cloud`)
- `LOGS_DIR`: where per-turn JSONL logs are written (default: `logs/`)
---
## Data Files
| Path | Purpose |
|------|---------|
| `data/users.json` | Flat user index (id, name, condition, style) |
| `data/memories/<uid>.json` | Full persona JSON with bucketed memories |
| `data/vector_store/<uid>/` | `vectors.pt` + `meta.json`; **rebuild after any persona edit** |
| `data/generate_users.py` | Regenerates memories + users.json |
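Each `data/vector_store/<uid>/` index is derived from `data/memories/<uid>.json`, so comparing file modification times can reveal a stale index. The helper below is a hypothetical convenience and is not part of the repo. It only reports which users need `python -m backend.retrieval.vector_store` to be rerun.

```python
from pathlib import Path


def stale_indexes(data_dir: Path = Path("data")) -> list[str]:
    """Return user ids whose vectors.pt is missing or older than their memory JSON."""
    stale = []
    for memory_file in sorted((data_dir / "memories").glob("*.json")):
        uid = memory_file.stem
        index_file = data_dir / "vector_store" / uid / "vectors.pt"
        if (
            not index_file.exists()
            or index_file.stat().st_mtime < memory_file.stat().st_mtime
        ):
            stale.append(uid)
    return stale
```

The check ignores edits to `data/users.json`. It also cannot detect changes copied in with preserved timestamps (e.g. `cp -p`). Treat an empty result as a hint, not a guarantee.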
---
## Code Style
- **Keep comments to a minimum.** Only comment what isn't obvious from the
code. No file headers explaining what a module does (the name and code
show that). No section divider banners (`# ── Foo ──`). No restating
what the next line does. Prefer one-line comments when needed.
- **Skip `from __future__ import annotations`.** The project is Python 3.10+
and uses native `X | None` / `list[dict]` syntax, so the import adds nothing.
---
## Development Notes
- **NEVER use local Ollama models** (e.g. `qwen3:8b`, `gemma3:1b`): this machine
is not powerful enough and will break. Always use cloud-backed models like
`gemma4:31b-cloud` via Ollama Cloud.
- **Adding a persona**: add a memory JSON under `data/memories/<uid>.json` and
a matching entry in `data/users.json` (or regenerate both via
`data/generate_users.py` if present), then
`python -m backend.retrieval.vector_store` to rebuild indexes. If the
persona's modelled access method includes live speech, also add their `id`
to `VOICE_CAPABLE_PERSONAS` in `frontend/src/lib/voiceEligibility.ts` so
the mic button surfaces.
- **Changing LLM**: set `ACTIVE_LLM_TIER` in `.env`; no code changes are needed.
- **Extending sensing**: sensing runs in the React frontend
(`frontend/src/hooks/useSensing.ts`); to add a new signal, classify it
there and add a label field to `PipelineState` in
`backend/pipeline/state.py`. Keep purely-data label maps in
`backend/sensing/labels.py`.
- **Guardrail tuning**: edit signal lists in `backend/guardrails/checks.py`
- **Affect → generation mapping**: `_AFFECT_CONFIG` in `backend/pipeline/nodes/intent.py`
and `_PERSONA_TONE_OVERRIDES` in `backend/pipeline/nodes/planner.py`
- Vector indexes in `data/vector_store/` are gitignored; rebuild them from the source JSONs
via `python -m backend.retrieval.vector_store`
- Frontend uses pnpm, Node 22+
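The **Adding a persona** steps touch several files that have to agree. Below is a hypothetical pre-flight check for two of them. It assumes `data/users.json` is a JSON list of objects, each carrying an `id` field. The Data Files table gives the fields but not the container shape. `VOICE_CAPABLE_PERSONAS` lives in TypeScript and is not checked here.

```python
import json
from pathlib import Path


def persona_mismatches(data_dir: Path = Path("data")) -> dict[str, set[str]]:
    """Compare ids in users.json (assumed to be a list of {"id": ...}) with memory files."""
    users = json.loads((data_dir / "users.json").read_text(encoding="utf-8"))
    indexed = {u["id"] for u in users}
    on_disk = {p.stem for p in (data_dir / "memories").glob("*.json")}
    return {
        # Listed in users.json but no data/memories/<uid>.json exists.
        "missing_memory_file": indexed - on_disk,
        # A memory JSON exists but users.json has no matching entry.
        "missing_user_entry": on_disk - indexed,
    }
```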