Multimodal AAC Chatbot: Project Guide
What This Project Does
An AI chatbot that speaks as an AAC (augmentative and alternative communication) user, not to them. Given one of 14 personas (nine anchored in real memoirs, five in canonical fiction), it fuses real-time multimodal non-verbal signals with personal memory retrieval to generate responses in that person's authentic voice. The pipeline is orchestrated as a plain Python function chain across five layers, with two conditional branches.
Architecture
    frontend/                         React + Vite + TypeScript
      src/hooks/useSensing.ts         MediaPipe JS: affect, gesture, gaze, air-writing (browser-side)
      src/components/ChatPanel.tsx    Chat UI → POST /chat with sensing labels
    backend/                          Python (conda env: aac-chatbot)
      main.py                         CLI entry point
      api/main.py                     FastAPI REST API
      pipeline/graph.py               run_pipeline(): plain function chain with 2 conditional branches
      pipeline/nodes/intent.py        L2: LLM + Pydantic intent routing
      pipeline/nodes/retrieval.py     L3: BGE embeddings + torch tensor cosine search (fast / full)
      pipeline/nodes/planner.py       L4: expression-conditioned generation
      pipeline/nodes/feedback.py      L5: JSONL turn logging + Bayesian bucket priors
      sensing/labels.py               GESTURE_TO_TAG label map (sensing itself runs in browser)
      retrieval/                      BGE embeddings (torch), Bayesian bucket priors
      generation/                     Two-tier LLM client (primary / fallback, both Ollama Cloud)
      guardrails/                     Input + output safety checks
      config/                         Pydantic BaseSettings: all config in one place
    data/                             Shared data (personas, vector indexes)
    logs/                             Per-turn JSONL logs (gitignored)
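
The L3 retrieval entry above (embeddings plus a brute-force cosine search) can be illustrated with a minimal sketch. It uses plain Python lists so it runs without torch, whereas the project is described as doing the same computation as a torch matmul. The function names and the toy vectors are assumptions for illustration, not the project's code.

```python
import math


def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def cosine_top_k(
    query: list[float], vectors: list[list[float]], top_k: int = 3
) -> list[tuple[int, float]]:
    """Return (row_index, cosine_score) pairs for the top_k closest rows, best first."""
    q = normalize(query)
    scores = [sum(a * b for a, b in zip(q, normalize(row))) for row in vectors]
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


# Toy 3-dimensional "embeddings"; real BGE-small vectors have 384 dimensions.
memory_vectors = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
hits = cosine_top_k([1.0, 0.1, 0.0], memory_vectors, top_k=2)
```

With torch, the same ranking reduces to one matrix-vector product over row-normalized vectors followed by `torch.topk`, which is why no separate index format is needed at this scale.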
Key Design Decisions
- Plain function chain orchestrates the pipeline (`run_pipeline` in `backend/pipeline/graph.py`): intent → retrieval → planner → feedback, with two conditional branches (affect picks fast/full retrieval; cumulative latency picks primary/fallback LLM). No LangGraph / LangChain dependency.
- BGE-small-en-v1.5 for embeddings (beats MiniLM on MTEB at the same speed)
- Torch tensor matmul for vector search on the embedder's device (mps → cuda → cpu). No FAISS, no separate index format. Stored as `vectors.pt` per user. Headroom is ~100k vectors before approximate search (hnswlib) becomes worthwhile.
- No reranker: the cosine score from BGE-small carries the ranking signal at current scales. Revisit when per-query `top_k` grows past ~30.
- Two-tier Ollama Cloud LLM: `primary` → `fallback` (when cumulative latency exceeds `FALLBACK_LATENCY_THRESHOLD`). Both tiers hit Ollama Cloud over the OpenAI-compatible endpoint. Models default to `gemma4:31b-cloud`; swap one when a larger cloud model is provisioned.
- Pydantic-validated LLM routing output: `intent.py` retries on schema failures (3 attempts) before falling back to a default route
- Expression-conditioned response shaping: affect steers tone, retrieval depth, and candidate ranking (not just metadata annotation)
- Bayesian bucket priors: session-level P(bucket) updated after each accepted turn
- Per-turn JSONL logging: one line per turn appended to `logs/turns.jsonl` (no MLflow). Query ad-hoc with DuckDB if needed.
- Browser-side sensing: MediaPipe JS runs in the React frontend; only classified labels (affect, gesture, gaze bucket) are sent to the backend API
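
A minimal sketch of the function-chain orchestration from the first decision above, with both conditional branches. The node bodies are stubs, and the threshold value, the affect labels that trigger fast retrieval, and every name other than `run_pipeline` and `FALLBACK_LATENCY_THRESHOLD` are assumptions made for illustration.

```python
import time
from dataclasses import dataclass, field

FALLBACK_LATENCY_THRESHOLD = 2.0  # seconds; assumed value
FAST_AFFECTS = {"distressed", "urgent"}  # assumed labels that pick fast retrieval


@dataclass
class PipelineState:
    """Stand-in state object; the real fields are not shown in this guide."""
    text: str
    affect: str = "neutral"
    elapsed: float = 0.0
    trace: list[str] = field(default_factory=list)


def timed(node, state: PipelineState) -> None:
    """Run one node and add its wall-clock time to the cumulative latency."""
    start = time.perf_counter()
    node(state)
    state.elapsed += time.perf_counter() - start


# Stub nodes: each one only records that it ran.
def intent(state: PipelineState) -> None:
    state.trace.append("intent")


def retrieve_fast(state: PipelineState) -> None:
    state.trace.append("retrieval:fast")


def retrieve_full(state: PipelineState) -> None:
    state.trace.append("retrieval:full")


def plan_primary(state: PipelineState) -> None:
    state.trace.append("planner:primary")


def plan_fallback(state: PipelineState) -> None:
    state.trace.append("planner:fallback")


def feedback(state: PipelineState) -> None:
    state.trace.append("feedback")


def run_pipeline(state: PipelineState) -> PipelineState:
    timed(intent, state)
    # Branch 1: affect picks the retrieval depth.
    retrieval = retrieve_fast if state.affect in FAST_AFFECTS else retrieve_full
    timed(retrieval, state)
    # Branch 2: cumulative latency picks the LLM tier.
    planner = plan_fallback if state.elapsed > FALLBACK_LATENCY_THRESHOLD else plan_primary
    timed(planner, state)
    timed(feedback, state)
    return state
```

Because each branch is an ordinary `if` expression over the state, the control flow stays readable in one function, which is the stated reason for avoiding a graph framework.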
Personas
Fourteen personas shipped. Real-memoir-anchored:
| ID | Name | Condition | Access |
|---|---|---|---|
| `stephen_hawking` | Stephen Hawking | ALS (advanced) | Cheek-twitch + ACAT predictive speech |
| `jean_dominique_bauby` | Jean-Dominique Bauby | Locked-in syndrome | Alphabet-blink with amanuensis |
| `michael_j_fox` | Michael J. Fox | Parkinson's | Voice + adaptive keyboard + dictation |
| `gabby_giffords` | Gabby Giffords | Aphasia + right hemiparesis (post-TBI) | Left-hand typing + speech-to-text |
| `jason_becker` | Jason Becker | ALS (fully locked-in) | Eye-gaze + father's letter-code board |
| `tito_mukhopadhyay` | Tito Mukhopadhyay | Non-verbal autism | Letterboard + pencil |
| `christopher_reeve` | Christopher Reeve | C1-C2 spinal cord injury | Dictation to assistants; sip-and-puff |
| `christy_brown` | Christy Brown | Cerebral palsy (spastic quadriplegia) | Left foot typing / writing |
| `wendy_mitchell` | Wendy Mitchell | Early-onset dementia | Laptop/phone typing + "brain-book" |
Canonical fiction:
| ID | Name | Condition | Access |
|---|---|---|---|
| `abed_nadir` | Abed Nadir (Community) | Autism (coded); occasional selective mutism | Mostly verbal; text when overloaded |
| `allie_calhoun` | Allie Hamilton Calhoun (The Notebook) | Late-stage Alzheimer's | Verbal when lucid; yes/no otherwise |
| `forrest_gump` | Forrest Gump | Intellectual disability (IQ ~75) | Verbal primarily |
| `raymond_babbitt` | Raymond Babbitt (Rain Man) | Savant autism | Verbal when calm + visual schedules |
| `walter_jr_white` | Walter "Flynn" White Jr. (Breaking Bad) | Cerebral palsy | Verbal + smartphone typing |
Each persona has ~25 bucketed memory chunks (family / medical / hobbies / daily_routine / social; buckets tuned per-persona). A short-form push-to-talk voice mic surfaces only for personas whose modelled access method is verbal; see `VOICE_CAPABLE_PERSONAS` in `frontend/src/lib/voiceEligibility.ts`.
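
The memory buckets listed here are also what the session-level Bayesian priors range over. This guide does not specify the update rule, so the sketch below assumes a Dirichlet-multinomial model with a symmetric pseudo-count, where P(bucket) is the posterior mean after each accepted turn. The class name and the `alpha` value are hypothetical.

```python
BUCKETS = ["family", "medical", "hobbies", "daily_routine", "social"]


class BucketPriors:
    """Session-level P(bucket) as the posterior mean of a Dirichlet-multinomial model."""

    def __init__(self, buckets: list[str], alpha: float = 1.0) -> None:
        # alpha is a symmetric pseudo-count; its value here is an assumption.
        self.alpha = alpha
        self.counts = {bucket: 0 for bucket in buckets}

    def update(self, bucket: str) -> None:
        """Record one accepted turn whose response drew on `bucket`."""
        self.counts[bucket] += 1

    def probabilities(self) -> dict[str, float]:
        """Posterior mean: (count + alpha) / (total count + alpha * number of buckets)."""
        total = sum(self.counts.values()) + self.alpha * len(self.counts)
        return {bucket: (n + self.alpha) / total for bucket, n in self.counts.items()}


priors = BucketPriors(BUCKETS)
for accepted in ("family", "family", "medical"):
    priors.update(accepted)
probs = priors.probabilities()
```

Starting from a uniform prior means a new session treats every bucket equally, and a few accepted turns are enough to shift weight toward the topics the conversation is actually about.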
How to Run
    # One-time setup
    bash setup.sh

    # CLI
    python -m backend.main --debug

    # Full stack
    uvicorn backend.api.main:app --reload   # FastAPI on :8000
    pnpm --dir frontend dev                 # React on :7550
Configuration
All config lives in `backend/config/settings.py` as Pydantic BaseSettings.
Copy `.env.example` to `.env` and set:
- `ACTIVE_LLM_TIER`: `primary` | `fallback`
- `PRIMARY_MODEL` / `FALLBACK_MODEL`: Ollama Cloud model identifiers (e.g. `gemma4:31b-cloud`)
- `LOGS_DIR`: where per-turn JSONL logs are written (default: `logs/`)
Data Files
| Path | Purpose |
|---|---|
| `data/users.json` | Flat user index (id, name, condition, style) |
| `data/memories/<uid>.json` | Full persona JSON with bucketed memories |
| `data/vector_store/<uid>/` | `vectors.pt` + `meta.json`; rebuild after any persona edit |
| `data/generate_users.py` | Regenerates memories + `users.json` |
Code Style
- Keep comments to a minimum. Only comment what isn't obvious from the code. No file headers explaining what a module does (the name and code show that). No section divider banners (`# ── Foo ──`). No restating what the next line does. Prefer one-line comments when needed.
- Skip `from __future__ import annotations`. The project is Python 3.10+ and uses native `X | None` / `list[dict]` syntax, so the import adds nothing.
Development Notes
- NEVER use local Ollama models (e.g. `qwen3:8b`, `gemma3:1b`): this machine is not powerful enough and will break. Always use cloud-backed models like `gemma4:31b-cloud` via Ollama Cloud.
- Adding a persona: add a memory JSON at `data/memories/<uid>.json` and a matching entry in `data/users.json` (or regenerate both via `data/generate_users.py` if present), then run `python -m backend.retrieval.vector_store` to rebuild indexes. If the persona's modelled access method includes live speech, also add their `id` to `VOICE_CAPABLE_PERSONAS` in `frontend/src/lib/voiceEligibility.ts` so the mic button surfaces.
- Changing LLM: set `ACTIVE_LLM_TIER` in `.env`; no code changes needed
- Extending sensing: sensing runs in the React frontend (`frontend/src/hooks/useSensing.ts`); to add a new signal, classify it there and add a label field to `PipelineState` in `backend/pipeline/state.py`. Keep purely-data label maps in `backend/sensing/labels.py`.
- Guardrail tuning: edit signal lists in `backend/guardrails/checks.py`
- Affect → generation mapping: `_AFFECT_CONFIG` in `backend/pipeline/nodes/intent.py` and `_PERSONA_TONE_OVERRIDES` in `backend/pipeline/nodes/planner.py`
- Vector indexes in `data/vector_store/` are gitignored; they are rebuilt from source JSONs via `python -m backend.retrieval.vector_store`
- Frontend uses pnpm, Node 22+