
Last commit 535a98d by shwetangisingh (about 1 month ago): "Add voice + air-writing conflict resolution"


Multimodal AAC Chatbot — Project Guide

What This Project Does

An AI chatbot that speaks as an AAC user, not to them. Given one of 14 personas (nine anchored in real memoirs, five in canonical fiction), it fuses real-time multimodal non-verbal signals with personal memory retrieval to generate responses in that person's authentic voice. The pipeline is orchestrated as a plain Python function chain across five layers, with two conditional branches.

Architecture

frontend/                         React + Vite + TypeScript
  src/hooks/useSensing.ts         MediaPipe JS — affect, gesture, gaze, air-writing (browser-side)
  src/components/ChatPanel.tsx    Chat UI → POST /chat with sensing labels

backend/                          Python (conda env: aac-chatbot)
  main.py                         CLI entry point
  api/main.py                     FastAPI REST API
  pipeline/graph.py               run_pipeline() — plain function chain with 2 conditional branches
    pipeline/nodes/intent.py        L2 — LLM + Pydantic intent routing
    pipeline/nodes/retrieval.py     L3 — BGE embeddings + torch tensor cosine search (fast / full)
    pipeline/nodes/planner.py       L4 — expression-conditioned generation
    pipeline/nodes/feedback.py      L5 — JSONL turn logging + Bayesian bucket priors
  sensing/labels.py               GESTURE_TO_TAG label map (sensing itself runs in browser)
  retrieval/                      BGE embeddings (torch), Bayesian bucket priors
  generation/                     Two-tier LLM client (primary / fallback, both Ollama Cloud)
  guardrails/                     Input + output safety checks
  config/                         Pydantic BaseSettings — all config in one place

data/                             Shared data (personas, vector indexes)
logs/                             Per-turn JSONL logs (gitignored)

Key Design Decisions

Plain function chain orchestrates the pipeline (run_pipeline in backend/pipeline/graph.py): intent → retrieval → planner → feedback, with two conditional branches (affect picks fast/full retrieval; cumulative latency picks primary/fallback LLM). No LangGraph / LangChain dependency.
BGE-small-en-v1.5 for embeddings (scores higher than MiniLM on MTEB at comparable speed)
Torch tensor matmul for vector search on the embedder's device (mps → cuda → cpu). No FAISS, no separate index format. Stored as vectors.pt per user. Headroom is ~100k vectors before approximate search (hnswlib) becomes worthwhile.
No reranker — cosine score from BGE-small carries the ranking signal at current scales. Revisit when per-query top_k grows past ~30.
Two-tier Ollama Cloud LLM: the pipeline switches from the primary to the fallback tier when cumulative latency exceeds FALLBACK_LATENCY_THRESHOLD. Both tiers hit Ollama Cloud over the OpenAI-compatible endpoint and default to gemma4:31b-cloud; point one tier at a larger cloud model once one is provisioned.
Pydantic-validated LLM routing output — intent.py retries on schema failures (3 attempts) before falling back to a default route
Expression-conditioned response shaping — affect steers tone, retrieval depth, and candidate ranking (not just metadata annotation)
Bayesian bucket priors — session-level P(bucket) updated after each accepted turn
Per-turn JSONL logging — one line per turn appended to logs/turns.jsonl (no MLflow). Query ad-hoc with DuckDB if needed.
Browser-side sensing: MediaPipe JS runs in the React frontend, and only classified labels (affect, gesture, gaze bucket) are sent to the backend API
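A sketch of the retry-then-default routing described for intent.py, assuming Pydantic v2. The `IntentRoute` schema, its intent labels, and `DEFAULT_ROUTE` are hypothetical; `call_llm` stands in for whatever client call returns the model's raw JSON text.

```python
from typing import Callable, Literal

from pydantic import BaseModel, Field, ValidationError


class IntentRoute(BaseModel):
    # Hypothetical schema; the real routing fields in intent.py are not documented in this guide.
    intent: Literal["chat", "question", "request"]
    confidence: float = Field(ge=0.0, le=1.0)


DEFAULT_ROUTE = IntentRoute(intent="chat", confidence=0.0)
MAX_ATTEMPTS = 3


def route_intent(call_llm: Callable[[str], str], message: str) -> IntentRoute:
    # call_llm is any function returning the model's raw text, expected to be JSON.
    for _ in range(MAX_ATTEMPTS):
        raw = call_llm(message)
        try:
            return IntentRoute.model_validate_json(raw)
        except ValidationError:
            # Malformed JSON and schema violations both surface as ValidationError.
            continue
    return DEFAULT_ROUTE
```

Validating straight from the JSON string means a reply that is not JSON at all and a reply with an out-of-range field are handled by the same retry path.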

Personas

Fourteen personas ship with the project. Nine are anchored in real memoirs:

ID	Name	Condition	Access
`stephen_hawking`	Stephen Hawking	ALS (advanced)	Cheek-twitch + ACAT predictive speech
`jean_dominique_bauby`	Jean-Dominique Bauby	Locked-in syndrome	Alphabet-blink with amanuensis
`michael_j_fox`	Michael J. Fox	Parkinson's	Voice + adaptive keyboard + dictation
`gabby_giffords`	Gabby Giffords	Aphasia + right hemiparesis (post-TBI)	Left-hand typing + speech-to-text
`jason_becker`	Jason Becker	ALS (fully locked-in)	Eye-gaze + father's letter-code board
`tito_mukhopadhyay`	Tito Mukhopadhyay	Non-verbal autism	Letterboard + pencil
`christopher_reeve`	Christopher Reeve	C1–C2 spinal cord injury	Dictation to assistants; sip-and-puff
`christy_brown`	Christy Brown	Cerebral palsy (spastic quadriplegia)	Left foot typing / writing
`wendy_mitchell`	Wendy Mitchell	Early-onset dementia	Laptop/phone typing + "brain-book"

The other five come from canonical fiction:

ID	Name	Condition	Access
`abed_nadir`	Abed Nadir (Community)	Autism (coded); occasional selective mutism	Mostly verbal; text when overloaded
`allie_calhoun`	Allie Hamilton Calhoun (The Notebook)	Late-stage Alzheimer's	Verbal when lucid; yes/no otherwise
`forrest_gump`	Forrest Gump	Intellectual disability (IQ ~75)	Verbal primarily
`raymond_babbitt`	Raymond Babbitt (Rain Man)	Savant autism	Verbal when calm + visual schedules
`walter_jr_white`	Walter "Flynn" White Jr. (Breaking Bad)	Cerebral palsy	Verbal + smartphone typing

Each persona has ~25 memory chunks grouped into buckets (family / medical / hobbies / daily_routine / social; the bucket set is tuned per persona). A short-form push-to-talk voice mic appears only for personas whose modelled access method is verbal; see VOICE_CAPABLE_PERSONAS in frontend/src/lib/voiceEligibility.ts.
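The guide states only that a session-level P(bucket) is updated after each accepted turn. One common way to do that, assumed here rather than taken from the project's retrieval/ code, is a symmetric Dirichlet prior over the persona's buckets, whose posterior mean is (n_b + α) / (N + Kα), where n_b counts accepted hits in bucket b, N is the total count, and K is the number of buckets. The class name `BucketPrior` and the α = 1 default are hypothetical; the default bucket names are the ones listed above.

```python
DEFAULT_BUCKETS = ("family", "medical", "hobbies", "daily_routine", "social")


class BucketPrior:
    """Session-level P(bucket) as the posterior mean of a symmetric Dirichlet prior."""

    def __init__(self, buckets: tuple[str, ...] = DEFAULT_BUCKETS, alpha: float = 1.0):
        self.alpha = alpha  # pseudo-count every bucket starts with
        self.counts = {b: 0 for b in buckets}

    def update(self, accepted_buckets: list[str]) -> None:
        # Call once per accepted turn with the buckets its retrieved memories came from.
        for b in accepted_buckets:
            self.counts[b] += 1  # KeyError on a bucket this persona does not have

    def probabilities(self) -> dict[str, float]:
        total = sum(self.counts.values()) + self.alpha * len(self.counts)
        return {b: (n + self.alpha) / total for b, n in self.counts.items()}
```

With α = 1 a fresh session starts uniform (0.2 per bucket for five buckets); after three accepted hits (two family, one medical) family rises to 3/8 and each untouched bucket falls to 1/8. A larger α makes the prior slower to move.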

How to Run

# One-time setup
bash setup.sh

# CLI
python -m backend.main --debug

# Full stack
uvicorn backend.api.main:app --reload    # FastAPI on :8000
pnpm --dir frontend dev                  # React on :7550

Configuration

All config lives in backend/config/settings.py as Pydantic BaseSettings. Copy .env.example → .env and set:

ACTIVE_LLM_TIER — primary | fallback
PRIMARY_MODEL / FALLBACK_MODEL — Ollama Cloud model identifiers (e.g. gemma4:31b-cloud)
LOGS_DIR — where per-turn JSONL logs are written (default: logs/)

Data Files

Path	Purpose
`data/users.json`	Flat user index (id, name, condition, style)
`data/memories/<uid>.json`	Full persona JSON with bucketed memories
`data/vector_store/<uid>/`	`vectors.pt` + `meta.json` — rebuild after any persona edit
`data/generate_users.py`	Regenerates memories + users.json

Code Style

Keep comments to a minimum. Only comment what isn't obvious from the code. No file headers explaining what a module does (the name and code show that). No section divider banners (# ── Foo ──). No restating what the next line does. Prefer one-line comments when needed.
Skip from __future__ import annotations. The project is Python 3.10+ and uses native X | None / list[dict] syntax — the import adds nothing.

Development Notes

NEVER use local Ollama models (e.g. qwen3:8b, gemma3:1b) — this machine is not powerful enough and will break. Always use cloud-backed models like gemma4:31b-cloud via Ollama Cloud.
Adding a persona: add a memory JSON under data/memories/<uid>.json and a matching entry in data/users.json (or regenerate both via data/generate_users.py if present), then python -m backend.retrieval.vector_store to rebuild indexes. If the persona's modelled access method includes live speech, also add their id to VOICE_CAPABLE_PERSONAS in frontend/src/lib/voiceEligibility.ts so the mic button surfaces.
Changing LLM: set ACTIVE_LLM_TIER in .env to switch tiers, or PRIMARY_MODEL / FALLBACK_MODEL to change the model behind a tier; no code changes needed
Extending sensing: sensing runs in the React frontend (frontend/src/hooks/useSensing.ts); to add a new signal, classify it there and add a label field to PipelineState in backend/pipeline/state.py. Keep purely-data label maps in backend/sensing/labels.py.
Guardrail tuning: edit signal lists in backend/guardrails/checks.py
Affect → generation mapping: _AFFECT_CONFIG in backend/pipeline/nodes/intent.py and _PERSONA_TONE_OVERRIDES in backend/pipeline/nodes/planner.py
Vector indexes in data/vector_store/ are gitignored — rebuilt from source JSONs via python -m backend.retrieval.vector_store
Frontend uses pnpm, Node 22+