Multimodal AAC Chatbot: Project Guide

What This Project Does

An AI chatbot that speaks as an AAC user, not to them. Given one of 14 personas (nine anchored in real memoirs and five in canonical fiction), it fuses real-time multimodal non-verbal signals with personal memory retrieval to generate responses in that person's authentic voice. The pipeline is orchestrated as a plain Python function chain across five layers, with two conditional branches.


Architecture

frontend/                         React + Vite + TypeScript
  src/hooks/useSensing.ts         MediaPipe JS: affect, gesture, gaze, air-writing (browser-side)
  src/components/ChatPanel.tsx    Chat UI → POST /chat with sensing labels

backend/                          Python (conda env: aac-chatbot)
  main.py                         CLI entry point
  api/main.py                     FastAPI REST API
  pipeline/graph.py               run_pipeline(): plain function chain with 2 conditional branches
    pipeline/nodes/intent.py        L2: LLM + Pydantic intent routing
    pipeline/nodes/retrieval.py     L3: BGE embeddings + torch tensor cosine search (fast / full)
    pipeline/nodes/planner.py       L4: expression-conditioned generation
    pipeline/nodes/feedback.py      L5: JSONL turn logging + Bayesian bucket priors
  sensing/labels.py               GESTURE_TO_TAG label map (sensing itself runs in browser)
  retrieval/                      BGE embeddings (torch), Bayesian bucket priors
  generation/                     Two-tier LLM client (primary / fallback, both Ollama Cloud)
  guardrails/                     Input + output safety checks
  config/                         Pydantic BaseSettings: all config in one place

data/                             Shared data (personas, vector indexes)
logs/                             Per-turn JSONL logs (gitignored)
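
The orchestration in backend/pipeline/graph.py is described as a plain function chain with two conditional branches. The sketch below shows that shape only. The node bodies, state keys, the set of affect labels that select the fast path, and the threshold value are all hypothetical placeholders, not taken from the project's code.

```python
import time

# Hypothetical value; the real FALLBACK_LATENCY_THRESHOLD comes from the project's settings.
FALLBACK_LATENCY_THRESHOLD = 2.0  # seconds

# Hypothetical affect labels that select the fast retrieval path.
FAST_AFFECTS = {"distressed", "urgent"}


def intent_node(state: dict) -> dict:
    # Stand-in for L2: the real node calls an LLM and validates the route with Pydantic.
    return {**state, "route": "memory_lookup"}


def retrieval_node(state: dict, mode: str) -> dict:
    # Stand-in for L3: "fast" fetches fewer chunks than "full" (the numbers are made up).
    top_k = 3 if mode == "fast" else 10
    return {**state, "retrieval_mode": mode, "top_k": top_k}


def planner_node(state: dict, tier: str) -> dict:
    # Stand-in for L4: the real node generates text with the chosen LLM tier.
    return {**state, "llm_tier": tier, "response": f"[{tier}] reply"}


def feedback_node(state: dict) -> dict:
    # Stand-in for L5: the real node logs the turn and updates bucket priors.
    return {**state, "logged": True}


def run_pipeline(state: dict, clock=time.perf_counter) -> dict:
    start = clock()
    state = intent_node(state)

    # Branch 1: affect picks fast or full retrieval.
    mode = "fast" if state.get("affect") in FAST_AFFECTS else "full"
    state = retrieval_node(state, mode)

    # Branch 2: cumulative latency so far picks the LLM tier.
    elapsed = clock() - start
    tier = "fallback" if elapsed > FALLBACK_LATENCY_THRESHOLD else "primary"
    state = planner_node(state, tier)

    return feedback_node(state)
```

Passing the clock as a parameter is a choice made for this sketch so that the latency branch can be exercised in tests without sleeping.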

Key Design Decisions

  • Plain function chain orchestrates the pipeline (run_pipeline in backend/pipeline/graph.py): intent → retrieval → planner → feedback, with two conditional branches (affect picks fast/full retrieval; cumulative latency picks primary/fallback LLM). No LangGraph / LangChain dependency.
  • BGE-small-en-v1.5 for embeddings (beats MiniLM on MTEB at the same speed)
  • Torch tensor matmul for vector search on the embedder's device (mps → cuda → cpu). No FAISS, no separate index format. Stored as vectors.pt per user. Headroom is ~100k vectors before approximate search (hnswlib) becomes worthwhile.
  • No reranker: the cosine score from BGE-small carries the ranking signal at current scales. Revisit when per-query top_k grows past ~30.
  • Two-tier Ollama Cloud LLM: primary → fallback (when cumulative latency exceeds FALLBACK_LATENCY_THRESHOLD). Both tiers hit Ollama Cloud over the OpenAI-compatible endpoint. Both models default to gemma4:31b-cloud; swap one out when a larger cloud model is provisioned.
  • Pydantic-validated LLM routing output: intent.py retries on schema failures (3 attempts) before falling back to a default route
  • Expression-conditioned response shaping: affect steers tone, retrieval depth, and candidate ranking (not just metadata annotation)
  • Bayesian bucket priors: session-level P(bucket) updated after each accepted turn
  • Per-turn JSONL logging: one line per turn appended to logs/turns.jsonl (no MLflow). Query ad hoc with DuckDB if needed.
  • Browser-side sensing: MediaPipe JS runs in the React frontend; only classified labels (affect, gesture, gaze bucket) are sent to the backend API
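
The bucket-prior decision above can be illustrated with a Dirichlet-style count update. This is a sketch under assumptions, not the project's implementation: the class name, the uniform pseudo-count, and the exact update rule are illustrative, and the real code in backend/retrieval/ may compute P(bucket) differently.

```python
class BucketPriors:
    """Session-level P(bucket) kept as counts plus a pseudo-count (hypothetical helper)."""

    def __init__(self, buckets: list[str], pseudo_count: float = 1.0) -> None:
        # A uniform pseudo-count gives every bucket equal prior mass at session start.
        self.pseudo_count = pseudo_count
        self.counts: dict[str, int] = {b: 0 for b in buckets}

    def update(self, bucket: str) -> None:
        # Called once per accepted turn with the bucket that served the response.
        if bucket not in self.counts:
            raise KeyError(f"unknown bucket: {bucket}")
        self.counts[bucket] += 1

    def probabilities(self) -> dict[str, float]:
        # Posterior mean of a Dirichlet-multinomial:
        # (count + alpha) / (total + K * alpha), where K is the number of buckets.
        total = sum(self.counts.values()) + self.pseudo_count * len(self.counts)
        return {b: (c + self.pseudo_count) / total for b, c in self.counts.items()}
```

With five buckets and two accepted turns from the same bucket, that bucket's probability rises from 1/5 to 3/7 while the others drop to 1/7 each.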

Personas

Fourteen personas shipped. Real-memoir-anchored:

| ID | Name | Condition | Access |
| --- | --- | --- | --- |
| stephen_hawking | Stephen Hawking | ALS (advanced) | Cheek-twitch + ACAT predictive speech |
| jean_dominique_bauby | Jean-Dominique Bauby | Locked-in syndrome | Alphabet-blink with amanuensis |
| michael_j_fox | Michael J. Fox | Parkinson's | Voice + adaptive keyboard + dictation |
| gabby_giffords | Gabby Giffords | Aphasia + right hemiparesis (post-TBI) | Left-hand typing + speech-to-text |
| jason_becker | Jason Becker | ALS (fully locked-in) | Eye-gaze + father's letter-code board |
| tito_mukhopadhyay | Tito Mukhopadhyay | Non-verbal autism | Letterboard + pencil |
| christopher_reeve | Christopher Reeve | C1–C2 spinal cord injury | Dictation to assistants; sip-and-puff |
| christy_brown | Christy Brown | Cerebral palsy (spastic quadriplegia) | Left foot typing / writing |
| wendy_mitchell | Wendy Mitchell | Early-onset dementia | Laptop/phone typing + "brain-book" |

Canonical fiction:

| ID | Name | Condition | Access |
| --- | --- | --- | --- |
| abed_nadir | Abed Nadir (Community) | Autism (coded); occasional selective mutism | Mostly verbal; text when overloaded |
| allie_calhoun | Allie Hamilton Calhoun (The Notebook) | Late-stage Alzheimer's | Verbal when lucid; yes/no otherwise |
| forrest_gump | Forrest Gump | Intellectual disability (IQ ~75) | Verbal primarily |
| raymond_babbitt | Raymond Babbitt (Rain Man) | Savant autism | Verbal when calm + visual schedules |
| walter_jr_white | Walter "Flynn" White Jr. (Breaking Bad) | Cerebral palsy | Verbal + smartphone typing |

~25 bucketed memory chunks per persona (family / medical / hobbies / daily_routine / social; buckets tuned per-persona). A short-form voice push-to-talk mic surfaces only for personas whose modelled access method is verbal; see VOICE_CAPABLE_PERSONAS in frontend/src/lib/voiceEligibility.ts.


How to Run

# One-time setup
bash setup.sh

# CLI
python -m backend.main --debug

# Full stack
uvicorn backend.api.main:app --reload    # FastAPI on :8000
pnpm --dir frontend dev                  # React on :7550

Configuration

All config lives in backend/config/settings.py as Pydantic BaseSettings. Copy .env.example to .env and set:

  • ACTIVE_LLM_TIER: primary | fallback
  • PRIMARY_MODEL / FALLBACK_MODEL: Ollama Cloud model identifiers (e.g. gemma4:31b-cloud)
  • LOGS_DIR: where per-turn JSONL logs are written (default: logs/)
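
The settings listed above map naturally onto typed fields. The sketch below is a dependency-free stand-in that uses a frozen dataclass and os.environ instead of Pydantic BaseSettings, so the field set follows the list while the loading mechanism is an assumption. The model defaults and the logs/ default come from this guide; the "primary" default for ACTIVE_LLM_TIER and the function name load_settings are guesses, and the actual settings.py may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    active_llm_tier: str = "primary"  # assumed default
    primary_model: str = "gemma4:31b-cloud"
    fallback_model: str = "gemma4:31b-cloud"
    logs_dir: str = "logs/"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    # Read each variable from the environment, falling back to the dataclass default.
    defaults = Settings()
    tier = env.get("ACTIVE_LLM_TIER", defaults.active_llm_tier)
    if tier not in ("primary", "fallback"):
        raise ValueError(f"ACTIVE_LLM_TIER must be 'primary' or 'fallback', got {tier!r}")
    return Settings(
        active_llm_tier=tier,
        primary_model=env.get("PRIMARY_MODEL", defaults.primary_model),
        fallback_model=env.get("FALLBACK_MODEL", defaults.fallback_model),
        logs_dir=env.get("LOGS_DIR", defaults.logs_dir),
    )
```

Pydantic BaseSettings performs the same environment lookup and validation declaratively, which is presumably why the project uses it.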

Data Files

| Path | Purpose |
| --- | --- |
| data/users.json | Flat user index (id, name, condition, style) |
| data/memories/<uid>.json | Full persona JSON with bucketed memories |
| data/vector_store/<uid>/ | vectors.pt + meta.json; rebuild after any persona edit |
| data/generate_users.py | Regenerates memories + users.json |

Code Style

  • Keep comments to a minimum. Only comment what isn't obvious from the code. No file headers explaining what a module does (the name and code show that). No section divider banners (# ── Foo ──). No restating what the next line does. Prefer one-line comments when needed.
  • Skip from __future__ import annotations. The project is Python 3.10+ and uses native X | None / list[dict] syntax, so the import adds nothing.

Development Notes

  • NEVER use local Ollama models (e.g. qwen3:8b, gemma3:1b). This machine is not powerful enough to run them and will break. Always use cloud-backed models like gemma4:31b-cloud via Ollama Cloud.
  • Adding a persona: add a memory JSON under data/memories/<uid>.json and a matching entry in data/users.json (or regenerate both via data/generate_users.py if present), then python -m backend.retrieval.vector_store to rebuild indexes. If the persona's modelled access method includes live speech, also add their id to VOICE_CAPABLE_PERSONAS in frontend/src/lib/voiceEligibility.ts so the mic button surfaces.
  • Changing LLM: set ACTIVE_LLM_TIER in .env; no code changes needed
  • Extending sensing: sensing runs in the React frontend (frontend/src/hooks/useSensing.ts); to add a new signal, classify it there and add a label field to PipelineState in backend/pipeline/state.py. Keep purely-data label maps in backend/sensing/labels.py.
  • Guardrail tuning: edit signal lists in backend/guardrails/checks.py
  • Affect → generation mapping: _AFFECT_CONFIG in backend/pipeline/nodes/intent.py and _PERSONA_TONE_OVERRIDES in backend/pipeline/nodes/planner.py
  • Vector indexes in data/vector_store/ are gitignored and are rebuilt from source JSONs via python -m backend.retrieval.vector_store
  • Frontend uses pnpm, Node 22+
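
For the "Extending sensing" note above, adding a label field to PipelineState might look like the sketch below. It assumes PipelineState is a TypedDict, which this guide does not confirm. The field names and the head_pose signal are hypothetical, and describe_sensing is an illustrative helper, not a project function.

```python
from typing import TypedDict


class PipelineState(TypedDict, total=False):
    # Sensing labels named in this guide (the exact field names are assumed).
    query: str
    affect: str
    gesture: str
    gaze_bucket: str
    # New label for a hypothetical head-pose signal classified in useSensing.ts.
    head_pose: str


def describe_sensing(state: PipelineState) -> str:
    # Only labels that the frontend actually sent appear in the summary.
    keys = ("affect", "gesture", "gaze_bucket", "head_pose")
    return ", ".join(f"{k}={state[k]}" for k in keys if k in state)
```

Declaring the state with total=False keeps the new field optional, so requests from a frontend that does not yet send it still fit the type.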