aac-chatbot / CLAUDE.md

shwetangisingh

Add voice + air-writing conflict resolution

535a98d about 1 month ago

	# Multimodal AAC Chatbot — Project Guide

	## What This Project Does

	An AI chatbot that speaks as an AAC user, not to them. Given one of 14
	personas — nine anchored in real memoirs and five in canonical fiction —
	it fuses real-time multimodal non-verbal signals with personal memory
	retrieval to generate responses in that person's authentic voice. The pipeline
	is orchestrated as a plain Python function chain across five layers, with two
	conditional branches.

	---

	## Architecture

	```
	frontend/                         React + Vite + TypeScript
	  src/hooks/useSensing.ts         MediaPipe JS — affect, gesture, gaze, air-writing (browser-side)
	  src/components/ChatPanel.tsx    Chat UI → POST /chat with sensing labels

	backend/                          Python (conda env: aac-chatbot)
	  main.py                         CLI entry point
	  api/main.py                     FastAPI REST API
	  pipeline/graph.py               run_pipeline() — plain function chain with 2 conditional branches
	  pipeline/nodes/intent.py        L2 — LLM + Pydantic intent routing
	  pipeline/nodes/retrieval.py     L3 — BGE embeddings + torch tensor cosine search (fast / full)
	  pipeline/nodes/planner.py       L4 — expression-conditioned generation
	  pipeline/nodes/feedback.py      L5 — JSONL turn logging + Bayesian bucket priors
	  sensing/labels.py               GESTURE_TO_TAG label map (sensing itself runs in browser)
	  retrieval/                      BGE embeddings (torch), Bayesian bucket priors
	  generation/                     Two-tier LLM client (primary / fallback, both Ollama Cloud)
	  guardrails/                     Input + output safety checks
	  config/                         Pydantic BaseSettings — all config in one place

	data/                             Shared data (personas, vector indexes)
	logs/                             Per-turn JSONL logs (gitignored)
	```
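
The layout above can be sketched as a minimal function chain. This is an illustration only, not the contents of `backend/pipeline/graph.py`: the node bodies are stubs, and the `FAST_AFFECTS` set, the threshold value, and the `PipelineState` fields shown here are assumptions made for the example. Cumulative latency is passed in by the caller to keep the sketch deterministic; real code would presumably time each node.

```python
from dataclasses import dataclass, field

# Assumed values for illustration; real settings live in backend/config/settings.py.
FALLBACK_LATENCY_THRESHOLD = 2.0  # seconds (hypothetical)
FAST_AFFECTS = {"distressed", "urgent"}  # hypothetical affect labels


@dataclass
class PipelineState:
    # Hypothetical fields; the real PipelineState is in backend/pipeline/state.py.
    query: str
    affect: str = "neutral"
    latency: float = 0.0
    trace: list[str] = field(default_factory=list)


def intent(state: PipelineState) -> PipelineState:
    state.trace.append("intent")
    return state


def retrieval(state: PipelineState) -> PipelineState:
    # Branch 1: affect picks fast or full retrieval.
    mode = "fast" if state.affect in FAST_AFFECTS else "full"
    state.trace.append(f"retrieval:{mode}")
    return state


def planner(state: PipelineState) -> PipelineState:
    # Branch 2: cumulative latency picks the primary or fallback LLM tier.
    tier = "fallback" if state.latency > FALLBACK_LATENCY_THRESHOLD else "primary"
    state.trace.append(f"planner:{tier}")
    return state


def feedback(state: PipelineState) -> PipelineState:
    state.trace.append("feedback")
    return state


def run_pipeline(state: PipelineState) -> PipelineState:
    # Plain function chain: each node takes the state and returns it.
    for node in (intent, retrieval, planner, feedback):
        state = node(state)
    return state
```

Calling `run_pipeline(PipelineState(query="hi", affect="distressed", latency=3.0)).trace` shows which branch each node took, which is handy when debugging routing without a graph framework.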

	## Key Design Decisions

	- Plain function chain orchestrates the pipeline (`run_pipeline` in
	`backend/pipeline/graph.py`): intent → retrieval → planner → feedback,
	with two conditional branches (affect picks fast/full retrieval; cumulative
	latency picks primary/fallback LLM). No LangGraph / LangChain dependency.
	- BGE-small-en-v1.5 for embeddings (beats MiniLM on MTEB at same speed)
	- Torch tensor matmul for vector search on the embedder's device
	(mps → cuda → cpu). No FAISS, no separate index format. Stored as
	`vectors.pt` per user. Headroom is ~100k vectors before approximate
	search (`hnswlib`) becomes worthwhile.
	- No reranker — cosine score from BGE-small carries the ranking signal
	at current scales. Revisit when per-query `top_k` grows past ~30.
	- Two-tier Ollama Cloud LLM: `primary` → `fallback` (when cumulative
	latency exceeds `FALLBACK_LATENCY_THRESHOLD`). Both tiers hit Ollama
	Cloud over the OpenAI-compatible endpoint. Models default to
	`gemma4:31b-cloud`; swap one when a larger cloud model is provisioned.
	- Pydantic-validated LLM routing output — `intent.py` retries on schema
	failures (3 attempts) before falling back to a default route
	- Expression-conditioned response shaping — affect steers tone, retrieval depth,
	and candidate ranking (not just metadata annotation)
	- Bayesian bucket priors — session-level P(bucket) updated after each accepted turn
	- Per-turn JSONL logging — one line per turn appended to
	`logs/turns.jsonl` (no MLflow). Query ad-hoc with DuckDB if needed.
	- Browser-side sensing — MediaPipe JS runs in React frontend, only classified
	labels (affect, gesture, gaze bucket) are sent to the backend API
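
The guide does not spell out the update rule behind the Bayesian bucket priors. One common choice, shown here purely as an assumption, is a Dirichlet-multinomial model: start every bucket with a pseudo-count `alpha`, add one to a bucket each time a turn from it is accepted, and read P(bucket) as the normalized counts. The `BucketPrior` class and its method names are hypothetical.

```python
from collections import Counter

# Bucket names taken from the guide; real buckets are tuned per persona.
BUCKETS = ("family", "medical", "hobbies", "daily_routine", "social")


class BucketPrior:
    """Session-level P(bucket) with a symmetric Dirichlet prior (assumed model)."""

    def __init__(self, buckets: tuple[str, ...] = BUCKETS, alpha: float = 1.0):
        self.buckets = tuple(buckets)
        self.alpha = alpha
        self.counts: Counter[str] = Counter()

    def update(self, bucket: str) -> None:
        # Call once per accepted turn with the bucket that was used.
        if bucket not in self.buckets:
            raise ValueError(f"unknown bucket: {bucket}")
        self.counts[bucket] += 1

    def probabilities(self) -> dict[str, float]:
        # Posterior mean: (alpha + n_b) / (K * alpha + N).
        total = self.alpha * len(self.buckets) + sum(self.counts.values())
        return {b: (self.alpha + self.counts[b]) / total for b in self.buckets}
```

With `alpha = 1.0` and five buckets, every bucket starts at 0.2; after two accepted `family` turns, `family` rises to 3/7 and each other bucket falls to 1/7. A larger `alpha` makes the prior slower to move within a session.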

	---

	## Personas

	Fourteen personas shipped. Real-memoir-anchored:

	| ID | Name | Condition | Access |
	|----|------|-----------|--------|
	| `stephen_hawking` | Stephen Hawking | ALS (advanced) | Cheek-twitch + ACAT predictive speech |
	| `jean_dominique_bauby` | Jean-Dominique Bauby | Locked-in syndrome | Alphabet-blink with amanuensis |
	| `michael_j_fox` | Michael J. Fox | Parkinson's | Voice + adaptive keyboard + dictation |
	| `gabby_giffords` | Gabby Giffords | Aphasia + right hemiparesis (post-TBI) | Left-hand typing + speech-to-text |
	| `jason_becker` | Jason Becker | ALS (fully locked-in) | Eye-gaze + father's letter-code board |
	| `tito_mukhopadhyay` | Tito Mukhopadhyay | Non-verbal autism | Letterboard + pencil |
	| `christopher_reeve` | Christopher Reeve | C1–C2 spinal cord injury | Dictation to assistants; sip-and-puff |
	| `christy_brown` | Christy Brown | Cerebral palsy (spastic quadriplegia) | Left foot typing / writing |
	| `wendy_mitchell` | Wendy Mitchell | Early-onset dementia | Laptop/phone typing + "brain-book" |

	Canonical fiction:

	| ID | Name | Condition | Access |
	|----|------|-----------|--------|
	| `abed_nadir` | Abed Nadir (Community) | Autism (coded); occasional selective mutism | Mostly verbal; text when overloaded |
	| `allie_calhoun` | Allie Hamilton Calhoun (The Notebook) | Late-stage Alzheimer's | Verbal when lucid; yes/no otherwise |
	| `forrest_gump` | Forrest Gump | Intellectual disability (IQ ~75) | Verbal primarily |
	| `raymond_babbitt` | Raymond Babbitt (Rain Man) | Savant autism | Verbal when calm + visual schedules |
	| `walter_jr_white` | Walter "Flynn" White Jr. (Breaking Bad) | Cerebral palsy | Verbal + smartphone typing |

	~25 bucketed memory chunks per persona (`family` / `medical` / `hobbies` / `daily_routine` / `social`; buckets tuned per-persona). A short-form voice push-to-talk mic surfaces only for personas whose modelled access method is verbal — see `VOICE_CAPABLE_PERSONAS` in [frontend/src/lib/voiceEligibility.ts](frontend/src/lib/voiceEligibility.ts).

	---

	## How to Run

	```bash
	# One-time setup
	bash setup.sh

	# CLI
	python -m backend.main --debug

	# Full stack
	uvicorn backend.api.main:app --reload # FastAPI on :8000
	pnpm --dir frontend dev # React on :7550
	```

	---

	## Configuration

	All config lives in [backend/config/settings.py](backend/config/settings.py) as Pydantic `BaseSettings`.
	Copy `.env.example` → `.env` and set:

	- `ACTIVE_LLM_TIER` — `primary` | `fallback`
	- `PRIMARY_MODEL` / `FALLBACK_MODEL` — Ollama Cloud model identifiers
	(e.g. `gemma4:31b-cloud`)
	- `LOGS_DIR` — where per-turn JSONL logs are written (default: `logs/`)

	---

	## Data Files

	| Path | Purpose |
	|------|---------|
	| `data/users.json` | Flat user index (id, name, condition, style) |
	| `data/memories/<uid>.json` | Full persona JSON with bucketed memories |
	| `data/vector_store/<uid>/` | `vectors.pt` + `meta.json` — rebuild after any persona edit |
	| `data/generate_users.py` | Regenerates memories + users.json |
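
The search over a per-user vector store is exact cosine similarity. The project does this with a torch tensor matmul; the sketch below swaps in plain Python lists so it runs without torch, but the math is the same: normalize, take dot products, keep the top `k`. The function names are hypothetical, and the mapping from row index to memory chunk (presumably via `meta.json`) is left out.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def top_k(query: list[float], vectors: list[list[float]], k: int = 3) -> list[tuple[int, float]]:
    """Return (row_index, cosine_score) for the k most similar rows, best first."""
    q = normalize(query)
    scores = [sum(a * b for a, b in zip(q, normalize(v))) for v in vectors]
    ranked = sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in ranked[:k]]
```

In torch the same step would be a single matrix-vector product over pre-normalized rows followed by `torch.topk`, which is why no separate index format is needed at the scales described in this guide.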

	---

	## Code Style

	- Keep comments to a minimum. Only comment what isn't obvious from the
	code. No file headers explaining what a module does (the name and code
	show that). No section divider banners (`# ── Foo ──`). No restating
	what the next line does. Prefer one-line comments when needed.
	- Skip `from __future__ import annotations`. The project is Python 3.10+
	and uses native `X | None` / `list[dict]` syntax — the import adds nothing.

	## Development Notes

	- NEVER use local Ollama models (e.g. `qwen3:8b`, `gemma3:1b`): the development
	machine is not powerful enough to run them. Always use cloud-backed models like
	`gemma4:31b-cloud` via Ollama Cloud.
	- Adding a persona: add a memory JSON under `data/memories/<uid>.json` and
	a matching entry in `data/users.json` (or regenerate both via
	`data/generate_users.py` if present), then
	`python -m backend.retrieval.vector_store` to rebuild indexes. If the
	persona's modelled access method includes live speech, also add their `id`
	to `VOICE_CAPABLE_PERSONAS` in `frontend/src/lib/voiceEligibility.ts` so
	the mic button surfaces.
	- Changing LLM: set `ACTIVE_LLM_TIER` in `.env` — no code changes needed
	- Extending sensing: sensing runs in the React frontend
	(`frontend/src/hooks/useSensing.ts`); to add a new signal, classify it
	there and add a label field to `PipelineState` in
	`backend/pipeline/state.py`. Keep purely-data label maps in
	`backend/sensing/labels.py`.
	- Guardrail tuning: edit signal lists in `backend/guardrails/checks.py`
	- Affect → generation mapping: `_AFFECT_CONFIG` in `backend/pipeline/nodes/intent.py`
	and `_PERSONA_TONE_OVERRIDES` in `backend/pipeline/nodes/planner.py`
	- Vector indexes in `data/vector_store/` are gitignored — rebuilt from source JSONs
	via `python -m backend.retrieval.vector_store`
	- Frontend uses pnpm, Node 22+