# Multimodal AAC Chatbot: Project Guide

## What This Project Does

An AI chatbot that **speaks as an AAC user**, not to them. Given one of 14
personas (nine anchored in real memoirs and five in canonical fiction),
it fuses real-time multimodal non-verbal signals with personal memory
retrieval to generate responses in that person's authentic voice. Orchestrated
as a **plain Python function chain** across five layers, with two conditional
branches.

---
## Architecture

```
frontend/                          React + Vite + TypeScript
  src/hooks/useSensing.ts          MediaPipe JS: affect, gesture, gaze, air-writing (browser-side)
  src/components/ChatPanel.tsx     Chat UI → POST /chat with sensing labels
backend/                           Python (conda env: aac-chatbot)
  main.py                          CLI entry point
  api/main.py                      FastAPI REST API
  pipeline/graph.py                run_pipeline(): plain function chain with 2 conditional branches
  pipeline/nodes/intent.py         L2: LLM + Pydantic intent routing
  pipeline/nodes/retrieval.py      L3: BGE embeddings + torch tensor cosine search (fast / full)
  pipeline/nodes/planner.py        L4: expression-conditioned generation
  pipeline/nodes/feedback.py       L5: JSONL turn logging + Bayesian bucket priors
  sensing/labels.py                GESTURE_TO_TAG label map (sensing itself runs in browser)
  retrieval/                       BGE embeddings (torch), Bayesian bucket priors
  generation/                      Two-tier LLM client (primary / fallback, both Ollama Cloud)
  guardrails/                      Input + output safety checks
  config/                          Pydantic BaseSettings: all config in one place
data/                              Shared data (personas, vector indexes)
logs/                              Per-turn JSONL logs (gitignored)
```
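
The `run_pipeline()` entry in the tree above is described as a plain function chain with two conditional branches. The following is a minimal sketch of that shape, not the project's actual code. The node callables, the `FAST_AFFECTS` set, the state keys, and the 2-second threshold are assumptions made for illustration; only the node order and the two branch conditions come from this guide.

```python
import time
from typing import Callable

# Assumed values for illustration; the real ones live in the project's settings.
FALLBACK_LATENCY_THRESHOLD = 2.0  # seconds
FAST_AFFECTS = {"distressed", "urgent"}  # hypothetical affect labels

Node = Callable[[dict], dict]


def choose_retrieval_mode(affect: str) -> str:
    """Branch 1: the affect label picks fast or full retrieval."""
    return "fast" if affect in FAST_AFFECTS else "full"


def choose_llm_tier(elapsed_s: float,
                    threshold_s: float = FALLBACK_LATENCY_THRESHOLD) -> str:
    """Branch 2: cumulative latency picks the primary or fallback LLM."""
    return "fallback" if elapsed_s > threshold_s else "primary"


def run_pipeline_sketch(state: dict, nodes: dict[str, Node]) -> dict:
    """Call intent, retrieval, planner, feedback in order, as plain functions."""
    start = time.perf_counter()
    state = nodes["intent"](state)
    state["retrieval_mode"] = choose_retrieval_mode(state.get("affect", "neutral"))
    state = nodes["retrieval"](state)
    # Latency is measured across everything that ran before generation.
    state["llm_tier"] = choose_llm_tier(time.perf_counter() - start)
    state = nodes["planner"](state)
    state = nodes["feedback"](state)
    return state
```

Because each node is just a callable taking and returning the state dict, a node can be swapped for a stub in tests without any graph framework.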
## Key Design Decisions

- **Plain function chain** orchestrates the pipeline (`run_pipeline` in
  `backend/pipeline/graph.py`): intent → retrieval → planner → feedback,
  with two conditional branches (affect picks fast/full retrieval; cumulative
  latency picks primary/fallback LLM). No LangGraph / LangChain dependency.
- **BGE-small-en-v1.5** for embeddings (beats MiniLM on MTEB at the same speed).
- **Torch tensor matmul** for vector search on the embedder's device
  (mps → cuda → cpu). No FAISS, no separate index format. Stored as
  `vectors.pt` per user. Headroom is ~100k vectors before approximate
  search (`hnswlib`) becomes worthwhile.
- **No reranker**: the cosine score from BGE-small carries the ranking signal
  at current scales. Revisit when per-query `top_k` grows past ~30.
- **Two-tier Ollama Cloud LLM**: `primary` → `fallback` (when cumulative
  latency exceeds `FALLBACK_LATENCY_THRESHOLD`). Both tiers hit Ollama
  Cloud over the OpenAI-compatible endpoint. Both tiers default to
  `gemma4:31b-cloud`; swap one when a larger cloud model is provisioned.
- **Pydantic-validated** LLM routing output: `intent.py` retries on schema
  failures (3 attempts) before falling back to a default route.
- **Expression-conditioned response shaping**: affect steers tone, retrieval depth,
  and candidate ranking (not just metadata annotation).
- **Bayesian bucket priors**: session-level P(bucket), updated after each accepted turn.
- **Per-turn JSONL logging**: one line per turn appended to
  `logs/turns.jsonl` (no MLflow). Query ad hoc with DuckDB if needed.
- **Browser-side sensing**: MediaPipe JS runs in the React frontend; only classified
  labels (affect, gesture, gaze bucket) are sent to the backend API.
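
The Bayesian bucket priors bullet above says only that a session-level P(bucket) is updated after each accepted turn. One common way to do that is a Dirichlet-multinomial posterior mean over the memory buckets, sketched below. The class name, the symmetric pseudo-count `alpha`, and the update rule are assumptions for illustration, not the project's confirmed implementation.

```python
from collections import Counter


class BucketPriors:
    """Session-level P(bucket) as a Dirichlet-multinomial posterior mean (sketch)."""

    def __init__(self, buckets: list[str], alpha: float = 1.0) -> None:
        self.buckets = list(buckets)
        self.alpha = alpha  # assumed symmetric pseudo-count per bucket
        self.counts = Counter()

    def update(self, bucket: str) -> None:
        """Record one accepted turn whose response drew on `bucket`."""
        if bucket not in self.buckets:
            raise KeyError(bucket)
        self.counts[bucket] += 1

    def prob(self, bucket: str) -> float:
        """Posterior mean: (count + alpha) / (total count + alpha * n_buckets)."""
        total = sum(self.counts.values()) + self.alpha * len(self.buckets)
        return (self.counts[bucket] + self.alpha) / total

    def distribution(self) -> dict[str, float]:
        return {b: self.prob(b) for b in self.buckets}


priors = BucketPriors(["family", "medical", "hobbies", "daily_routine", "social"])
priors.update("family")
priors.update("family")
# "family" now has probability 3/7 and every other bucket 1/7.
```

With `alpha = 1.0` every bucket starts at a uniform prior, so a bucket that has not been used yet never drops to zero probability.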

---

## Personas

Fourteen personas shipped. Real-memoir-anchored:

| ID | Name | Condition | Access |
|----|------|-----------|--------|
| `stephen_hawking` | Stephen Hawking | ALS (advanced) | Cheek-twitch + ACAT predictive speech |
| `jean_dominique_bauby` | Jean-Dominique Bauby | Locked-in syndrome | Alphabet-blink with amanuensis |
| `michael_j_fox` | Michael J. Fox | Parkinson's | Voice + adaptive keyboard + dictation |
| `gabby_giffords` | Gabby Giffords | Aphasia + right hemiparesis (post-TBI) | Left-hand typing + speech-to-text |
| `jason_becker` | Jason Becker | ALS (fully locked-in) | Eye-gaze + father's letter-code board |
| `tito_mukhopadhyay` | Tito Mukhopadhyay | Non-verbal autism | Letterboard + pencil |
| `christopher_reeve` | Christopher Reeve | C1–C2 spinal cord injury | Dictation to assistants; sip-and-puff |
| `christy_brown` | Christy Brown | Cerebral palsy (spastic quadriplegia) | Left foot typing / writing |
| `wendy_mitchell` | Wendy Mitchell | Early-onset dementia | Laptop/phone typing + "brain-book" |

Canonical fiction:

| ID | Name | Condition | Access |
|----|------|-----------|--------|
| `abed_nadir` | Abed Nadir (*Community*) | Autism (coded); occasional selective mutism | Mostly verbal; text when overloaded |
| `allie_calhoun` | Allie Hamilton Calhoun (*The Notebook*) | Late-stage Alzheimer's | Verbal when lucid; yes/no otherwise |
| `forrest_gump` | Forrest Gump | Intellectual disability (IQ ~75) | Verbal primarily |
| `raymond_babbitt` | Raymond Babbitt (*Rain Man*) | Savant autism | Verbal when calm + visual schedules |
| `walter_jr_white` | Walter "Flynn" White Jr. (*Breaking Bad*) | Cerebral palsy | Verbal + smartphone typing |

~25 bucketed memory chunks per persona (`family` / `medical` / `hobbies` / `daily_routine` / `social`; buckets tuned per-persona). A short-form voice push-to-talk mic surfaces only for personas whose modelled access method is verbal; see `VOICE_CAPABLE_PERSONAS` in [frontend/src/lib/voiceEligibility.ts](frontend/src/lib/voiceEligibility.ts).

---
## How to Run

```bash
# One-time setup
bash setup.sh

# CLI
python -m backend.main --debug

# Full stack
uvicorn backend.api.main:app --reload   # FastAPI on :8000
pnpm --dir frontend dev                 # React on :7550
```

---
## Configuration

All config lives in [backend/config/settings.py](backend/config/settings.py) as Pydantic `BaseSettings`.
Copy `.env.example` → `.env` and set:

- `ACTIVE_LLM_TIER`: `primary` | `fallback`
- `PRIMARY_MODEL` / `FALLBACK_MODEL`: Ollama Cloud model identifiers
  (e.g. `gemma4:31b-cloud`)
- `LOGS_DIR`: where per-turn JSONL logs are written (default: `logs/`)

---

## Data Files

| Path | Purpose |
|------|---------|
| `data/users.json` | Flat user index (id, name, condition, style) |
| `data/memories/<uid>.json` | Full persona JSON with bucketed memories |
| `data/vector_store/<uid>/` | `vectors.pt` + `meta.json`; **rebuild after any persona edit** |
| `data/generate_users.py` | Regenerates memories + users.json |

---
## Code Style

- **Keep comments to a minimum.** Only comment what isn't obvious from the
  code. No file headers explaining what a module does (the name and code
  show that). No section divider banners (`# ── Foo ──`). No restating
  what the next line does. Prefer one-line comments when needed.
- **Skip `from __future__ import annotations`.** The project is Python 3.10+
  and uses native `X | None` / `list[dict]` syntax, so the import adds nothing.

## Development Notes

- **NEVER use local Ollama models** (e.g. `qwen3:8b`, `gemma3:1b`): this machine
  is not powerful enough to run them. Always use cloud-backed models like
  `gemma4:31b-cloud` via Ollama Cloud.
- **Adding a persona**: add a memory JSON at `data/memories/<uid>.json` and
  a matching entry in `data/users.json` (or regenerate both via
  `data/generate_users.py` if present), then run
  `python -m backend.retrieval.vector_store` to rebuild indexes. If the
  persona's modelled access method includes live speech, also add their `id`
  to `VOICE_CAPABLE_PERSONAS` in `frontend/src/lib/voiceEligibility.ts` so
  the mic button surfaces.
- **Changing LLM**: set `ACTIVE_LLM_TIER` in `.env`; no code changes needed.
- **Extending sensing**: sensing runs in the React frontend
  (`frontend/src/hooks/useSensing.ts`); to add a new signal, classify it
  there and add a label field to `PipelineState` in
  `backend/pipeline/state.py`. Keep purely-data label maps in
  `backend/sensing/labels.py`.
- **Guardrail tuning**: edit signal lists in `backend/guardrails/checks.py`.
- **Affect → generation mapping**: `_AFFECT_CONFIG` in `backend/pipeline/nodes/intent.py`
  and `_PERSONA_TONE_OVERRIDES` in `backend/pipeline/nodes/planner.py`.
- Vector indexes in `data/vector_store/` are gitignored and rebuilt from source JSONs
  via `python -m backend.retrieval.vector_store`.
- Frontend uses pnpm, Node 22+.