Spaces: ub-aac-chatbot/aac-chatbot

shwetangisingh committed on Apr 20

Commit 535a98d (1 parent: c09a7e7)

Add voice + air-writing conflict resolution

Push-to-talk Web Speech mic (gated to verbal-access personas) plus
a Jaccard + AAC-priority-token resolver. The resolver's
{text, source, voice_text, air_text} payload drives a supplemental
sub-intent and source-aware planner prompt copy across five source
branches. Voice/air text is sanitised before LLM interpolation to close
a prompt-injection vector.

Also refreshes CLAUDE.md's persona table to match data/users.json.

Files changed (14)

CLAUDE.md +35 -11
README.md +1 -1
backend/api/main.py +13 -0
backend/main.py +2 -0
backend/pipeline/nodes/feedback.py +3 -0
backend/pipeline/nodes/intent.py +12 -9
backend/pipeline/nodes/planner.py +72 -8
backend/pipeline/state.py +4 -0
frontend/src/App.css +24 -0
frontend/src/components/ChatPanel.tsx +67 -0
frontend/src/hooks/useVoice.ts +170 -0
frontend/src/lib/resolveIntent.ts +109 -0
frontend/src/lib/voiceEligibility.ts +18 -0
frontend/src/types.ts +17 -0

CLAUDE.md CHANGED

@@ -2,11 +2,12 @@
 ## What This Project Does
-An AI chatbot that **speaks as an AAC user**, not to them. Given a user persona
-(Mia, Gerald, or Arjun), it fuses real-time multimodal non-verbal signals with
-personal memory retrieval to generate responses in that person's authentic voice.
-Orchestrated as a **plain Python function chain** across five layers, with two
-conditional branches.
 ---
@@ -66,13 +67,31 @@ logs/                             Per-turn JSONL logs (gitignored)
 ## Personas
 | ID | Name | Condition | Access |
 |----|------|-----------|--------|
-| `mia_chen` | Mia Chen, 28 | Cerebral palsy | Webcam head-tracking |
-| `gerald_okafor` | Gerald Okafor, 61 | ALS (early-mid) | Eye-gaze device |
-| `arjun_mehta` | Arjun Mehta, 17 | Autism (non-verbal) | Tablet touch grid |
-25 memory chunks each (5 buckets × 5 memories). Arjun code-switches Hindi/English.
 ---
@@ -129,8 +148,13 @@ Copy `.env.example` → `.env` and set:
 - **NEVER use local Ollama models** (e.g. `qwen3:8b`, `gemma3:1b`) — this machine
   is not powerful enough and will break. Always use cloud-backed models like
   `gemma4:31b-cloud` via Ollama Cloud.
-- **Adding a persona**: add to `PERSONAS` in `data/generate_users.py`, re-run it,
-  then `python -m backend.retrieval.vector_store` to rebuild indexes
 - **Changing LLM**: set `ACTIVE_LLM_TIER` in `.env` — no code changes needed
 - **Extending sensing**: sensing runs in the React frontend
   (`frontend/src/hooks/useSensing.ts`); to add a new signal, classify it

 ## What This Project Does
+An AI chatbot that **speaks as an AAC user**, not to them. Given one of 14
+personas — nine anchored in real memoirs and five in canonical fiction —
+it fuses real-time multimodal non-verbal signals with personal memory
+retrieval to generate responses in that person's authentic voice. Orchestrated
+as a **plain Python function chain** across five layers, with two conditional
+branches.
 ---
 ## Personas
+Fourteen personas shipped. Real-memoir-anchored:
+| ID | Name | Condition | Access |
+|----|------|-----------|--------|
+| `stephen_hawking` | Stephen Hawking | ALS (advanced) | Cheek-twitch + ACAT predictive speech |
+| `jean_dominique_bauby` | Jean-Dominique Bauby | Locked-in syndrome | Alphabet-blink with amanuensis |
+| `michael_j_fox` | Michael J. Fox | Parkinson's | Voice + adaptive keyboard + dictation |
+| `gabby_giffords` | Gabby Giffords | Aphasia + right hemiparesis (post-TBI) | Left-hand typing + speech-to-text |
+| `jason_becker` | Jason Becker | ALS (fully locked-in) | Eye-gaze + father's letter-code board |
+| `tito_mukhopadhyay` | Tito Mukhopadhyay | Non-verbal autism | Letterboard + pencil |
+| `christopher_reeve` | Christopher Reeve | C1–C2 spinal cord injury | Dictation to assistants; sip-and-puff |
+| `christy_brown` | Christy Brown | Cerebral palsy (spastic quadriplegia) | Left foot typing / writing |
+| `wendy_mitchell` | Wendy Mitchell | Early-onset dementia | Laptop/phone typing + "brain-book" |
+Canonical fiction:
 | ID | Name | Condition | Access |
 |----|------|-----------|--------|
+| `abed_nadir` | Abed Nadir (*Community*) | Autism (coded); occasional selective mutism | Mostly verbal; text when overloaded |
+| `allie_calhoun` | Allie Hamilton Calhoun (*The Notebook*) | Late-stage Alzheimer's | Verbal when lucid; yes/no otherwise |
+| `forrest_gump` | Forrest Gump | Intellectual disability (IQ ~75) | Verbal primarily |
+| `raymond_babbitt` | Raymond Babbitt (*Rain Man*) | Savant autism | Verbal when calm + visual schedules |
+| `walter_jr_white` | Walter "Flynn" White Jr. (*Breaking Bad*) | Cerebral palsy | Verbal + smartphone typing |
+~25 bucketed memory chunks per persona (`family` / `medical` / `hobbies` / `daily_routine` / `social`; buckets tuned per-persona). A short-form voice push-to-talk mic surfaces only for personas whose modelled access method is verbal — see `VOICE_CAPABLE_PERSONAS` in [frontend/src/lib/voiceEligibility.ts](frontend/src/lib/voiceEligibility.ts).
 ---
 - **NEVER use local Ollama models** (e.g. `qwen3:8b`, `gemma3:1b`) — this machine
   is not powerful enough and will break. Always use cloud-backed models like
   `gemma4:31b-cloud` via Ollama Cloud.
+- **Adding a persona**: add a memory JSON under `data/memories/<uid>.json` and
+  a matching entry in `data/users.json` (or regenerate both via
+  `data/generate_users.py` if present), then
+  `python -m backend.retrieval.vector_store` to rebuild indexes. If the
+  persona's modelled access method includes live speech, also add their `id`
+  to `VOICE_CAPABLE_PERSONAS` in `frontend/src/lib/voiceEligibility.ts` so
+  the mic button surfaces.
 - **Changing LLM**: set `ACTIVE_LLM_TIER` in `.env` — no code changes needed
 - **Extending sensing**: sensing runs in the React frontend
   (`frontend/src/hooks/useSensing.ts`); to add a new signal, classify it

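The "Adding a persona" note touches three places that can drift apart: the memory file under `data/memories/<uid>.json`, the entry in `data/users.json`, and `VOICE_CAPABLE_PERSONAS`. What follows is a minimal consistency-check sketch. It assumes `data/users.json` is a JSON list of objects with an `id` key (that file's layout is not shown in this commit). `check_personas` and the Python mirror of the TypeScript set are hypothetical; the seven IDs are taken from the persona tables and the README's list of verbal-access personas.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical Python mirror of VOICE_CAPABLE_PERSONAS from
# frontend/src/lib/voiceEligibility.ts; IDs come from the persona tables.
VOICE_CAPABLE_PERSONAS = frozenset({
    "abed_nadir",
    "allie_calhoun",
    "forrest_gump",
    "gabby_giffords",
    "michael_j_fox",
    "raymond_babbitt",
    "walter_jr_white",
})


def check_personas(users: list[dict], memory_dir: Path) -> list[str]:
    """Return a list of problems; an empty list means the sources agree.

    Assumes each users.json entry is an object with an "id" key.
    """
    ids = {u["id"] for u in users}
    problems: list[str] = []
    # Every persona needs data/memories/<uid>.json before indexes are rebuilt.
    for uid in sorted(ids):
        if not (memory_dir / f"{uid}.json").is_file():
            problems.append(f"missing memory file: {uid}")
    # A voice-capable ID with no matching persona is dead gating config.
    for uid in sorted(VOICE_CAPABLE_PERSONAS - ids):
        problems.append(f"voice-capable ID not in users.json: {uid}")
    return problems
```

Called as `check_personas(json.loads(Path("data/users.json").read_text()), Path("data/memories"))`, an empty result means the three sources line up before `python -m backend.retrieval.vector_store` is re-run.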
README.md CHANGED

@@ -400,7 +400,7 @@ Heads up: all camera/sensing stuff is in the frontend (MediaPipe JS). Backend ju
   - Calibration is now averaged over the first 30 frames (~1s of neutral face) instead of a single-frame snapshot — a brief smile at startup used to lock in a biased baseline. Affect stays null during calibration; gaze/head/gesture/air-writing still flow.
 - [x] **[Core]** Gestures (`THUMBS_UP` / `THUMBS_DOWN` / `POINTING` / `WAVING`) now carry an `opener_hint` via `GESTURE_DIRECTIVES` in [backend/sensing/labels.py](backend/sensing/labels.py). A detected thumbs-up overrides the affect opener and tells the LLM to lead with an affirmation.
 - [x] **[Core]** Air-writing carries a default template bank ([frontend/src/lib/airTemplates.ts](frontend/src/lib/airTemplates.ts): `yes` / `?` / `hi` / `help` / `done` / `more` / `water` / `stop`) — all single-stroke shapes so DTW can match reliably. On match, the word flows through the pipeline three ways: (1) retrieval picks up the word as an extra `PERSONAL` sub-intent with a bucket hint (see `infer_bucket` in [backend/sensing/bucket_keywords.py](backend/sensing/bucket_keywords.py) — e.g. `help` → medical, `water` → daily_routine), (2) the planner includes an explicit "the user air-wrote X — incorporate verbatim if appropriate" instruction in the user message, and (3) the word appears in `logs/turns.jsonl` for debugging. The recognizer has a `MATCH_THRESHOLD` reject gate and `console.debug`s on empty-bank / no-match so unrecognised strokes never reach the backend. To add more templates, append entries to `DEFAULT_AIR_TEMPLATES` as 32-point normalised single-stroke trajectories.
-- [ ] **[Bonus]** Voice + air-writing conflict resolution. Capture short voice (Web Speech API), compare to air-written intent, send a `resolved_intent`
 - [ ] Thumbs-up currently biases the opener via the prompt. Once generation emits N candidates, move this to candidate reranking for a stronger signal.
 ### Intent decomposition

   - Calibration is now averaged over the first 30 frames (~1s of neutral face) instead of a single-frame snapshot — a brief smile at startup used to lock in a biased baseline. Affect stays null during calibration; gaze/head/gesture/air-writing still flow.
 - [x] **[Core]** Gestures (`THUMBS_UP` / `THUMBS_DOWN` / `POINTING` / `WAVING`) now carry an `opener_hint` via `GESTURE_DIRECTIVES` in [backend/sensing/labels.py](backend/sensing/labels.py). A detected thumbs-up overrides the affect opener and tells the LLM to lead with an affirmation.
 - [x] **[Core]** Air-writing carries a default template bank ([frontend/src/lib/airTemplates.ts](frontend/src/lib/airTemplates.ts): `yes` / `?` / `hi` / `help` / `done` / `more` / `water` / `stop`) — all single-stroke shapes so DTW can match reliably. On match, the word flows through the pipeline three ways: (1) retrieval picks up the word as an extra `PERSONAL` sub-intent with a bucket hint (see `infer_bucket` in [backend/sensing/bucket_keywords.py](backend/sensing/bucket_keywords.py) — e.g. `help` → medical, `water` → daily_routine), (2) the planner includes an explicit "the user air-wrote X — incorporate verbatim if appropriate" instruction in the user message, and (3) the word appears in `logs/turns.jsonl` for debugging. The recognizer has a `MATCH_THRESHOLD` reject gate and `console.debug`s on empty-bank / no-match so unrecognised strokes never reach the backend. To add more templates, append entries to `DEFAULT_AIR_TEMPLATES` as 32-point normalised single-stroke trajectories.
+- [x] **[Bonus]** Voice + air-writing conflict resolution. A push-to-talk mic ([frontend/src/hooks/useVoice.ts](frontend/src/hooks/useVoice.ts)) captures a short Web Speech utterance; [frontend/src/lib/resolveIntent.ts](frontend/src/lib/resolveIntent.ts) merges it against the air-written text using Jaccard token overlap + AAC-priority tokens (`help/stop/water/done/more` win ties). The resolver emits a `{text, source, voice_text, air_text}` payload — `source ∈ voice_only | air_only | agree | conflict_air | conflict_voice` — which the backend uses in [backend/pipeline/nodes/intent.py](backend/pipeline/nodes/intent.py) to pick the supplemental sub-intent, and in [backend/pipeline/nodes/planner.py](backend/pipeline/nodes/planner.py) to render source-aware prompt copy (conflicts are acknowledged instead of silently overwritten). The mic is gated by persona via `VOICE_CAPABLE_PERSONAS` in [frontend/src/lib/voiceEligibility.ts](frontend/src/lib/voiceEligibility.ts) — only personas whose modelled access method is verbal (Abed, Allie, Forrest, Gabby, Michael J. Fox, Raymond, Walter Jr.) see the button; non-verbal / locked-in / letterboard personas don't.
 - [ ] Thumbs-up currently biases the opener via the prompt. Once generation emits N candidates, move this to candidate reranking for a stronger signal.
 ### Intent decomposition

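The resolver itself is described here only in prose, so the following is a Python sketch of the behaviour the README states: Jaccard token overlap between the spoken and air-written text, the AAC-priority tokens `help/stop/water/done/more` settling conflicts in favour of air-writing, and the five `source` labels (plus `none`). It is not a port of `resolveIntent.ts`. The tokenizer, the 0.5 agreement threshold, and keeping the spoken form on agreement are all assumptions.

```python
from __future__ import annotations

import re

# From the README: these AAC-priority tokens win voice/air conflicts.
AAC_PRIORITY = {"help", "stop", "water", "done", "more"}
# Assumption: the real threshold in resolveIntent.ts is not shown.
AGREE_THRESHOLD = 0.5


def tokens(s: str) -> set[str]:
    """Lower-cased word tokens; '?' is kept because it is an air template."""
    return set(re.findall(r"[a-z0-9']+|\?", s.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Intersection size over union size, defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def resolve_intent(voice: str | None, air: str | None) -> dict:
    v = (voice or "").strip()
    a = (air or "").strip()
    base = {"voice_text": v or None, "air_text": a or None}
    if not v and not a:
        return {**base, "text": "", "source": "none"}
    if not a:
        return {**base, "text": v, "source": "voice_only"}
    if not v:
        return {**base, "text": a, "source": "air_only"}
    air_tokens = tokens(a)
    if jaccard(tokens(v), air_tokens) >= AGREE_THRESHOLD:
        # Assumption: keep the spoken form on agreement, since it is richer.
        return {**base, "text": v, "source": "agree"}
    if air_tokens & AAC_PRIORITY:
        # A canonical AAC token outranks a possibly misheard transcript.
        return {**base, "text": a, "source": "conflict_air"}
    return {**base, "text": v, "source": "conflict_voice"}
```

Under these assumptions, "I need a drink" against an air-written "help" resolves to `conflict_air` (no shared tokens, and `help` is a priority token), while "water please" against "water" scores 1/2 and counts as `agree`.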
backend/api/main.py CHANGED

@@ -98,6 +98,13 @@ def _reserve_eval_slot(run_id: str) -> None:
 # ── Request / response schemas ─────────────────────────────────────────────────
 class ChatRequest(BaseModel):
     user_id: str
     query: str
@@ -106,6 +113,8 @@ class ChatRequest(BaseModel):
     gaze_bucket: str | None = None
     air_written_text: str | None = None
     head_signal: str | None = None  # "HEAD_SHAKE"|"HEAD_NOD_DISSATISFIED"
 class TurnaroundRequest(BaseModel):
@@ -210,6 +219,10 @@ def _build_initial_state(req: ChatRequest, session: dict) -> PipelineState:
         gaze_bucket=req.gaze_bucket,
         air_written_text=req.air_written_text,
         head_signal=req.head_signal,
         turnaround_triggered=False,
         raw_query=req.query,
         intent_route=None,

 # ── Request / response schemas ─────────────────────────────────────────────────
+class ResolvedIntent(BaseModel):
+    text: str
+    source: str  # voice_only | air_only | agree | conflict_air | conflict_voice | none
+    voice_text: str | None = None
+    air_text: str | None = None
 class ChatRequest(BaseModel):
     user_id: str
     query: str
     gaze_bucket: str | None = None
     air_written_text: str | None = None
     head_signal: str | None = None  # "HEAD_SHAKE"|"HEAD_NOD_DISSATISFIED"
+    voice_text: str | None = None
+    resolved_intent: ResolvedIntent | None = None
 class TurnaroundRequest(BaseModel):
         gaze_bucket=req.gaze_bucket,
         air_written_text=req.air_written_text,
         head_signal=req.head_signal,
+        voice_text=req.voice_text,
+        resolved_intent=(
+            req.resolved_intent.model_dump() if req.resolved_intent else None
+        ),
         turnaround_triggered=False,
         raw_query=req.query,
         intent_route=None,

backend/main.py CHANGED

@@ -171,6 +171,8 @@ def main() -> None:
             gesture_tag=None,
             gaze_bucket=None,
             air_written_text=None,
             raw_query=query,
             intent_route=pre_route,  # pre-filled → intent node sees it and skips LLM call
             generation_config=pre_gen_config,

             gesture_tag=None,
             gaze_bucket=None,
             air_written_text=None,
+            voice_text=None,
+            resolved_intent=None,
             raw_query=query,
             intent_route=pre_route,  # pre-filled → intent node sees it and skips LLM call
             generation_config=pre_gen_config,

backend/pipeline/nodes/feedback.py CHANGED

@@ -51,6 +51,9 @@ def _log_to_jsonl(
         "retrieval_mode": state.get("retrieval_mode_used", "unknown"),
         "affect": affect,
         "head_signal": state.get("head_signal"),
         "turnaround_triggered": state.get("turnaround_triggered", False),
         "guardrail_passed": state.get("guardrail_passed", True),
         "num_chunks": len(chunks),

         "retrieval_mode": state.get("retrieval_mode_used", "unknown"),
         "affect": affect,
         "head_signal": state.get("head_signal"),
+        "air_written_text": state.get("air_written_text"),
+        "voice_text": state.get("voice_text"),
+        "resolved_intent": state.get("resolved_intent"),
         "turnaround_triggered": state.get("turnaround_triggered", False),
         "guardrail_passed": state.get("guardrail_passed", True),
         "num_chunks": len(chunks),

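With `air_written_text`, `voice_text`, and `resolved_intent` now written to every turn record, conflict rates can be read straight from the log. A small sketch, assuming one JSON object per line in `logs/turns.jsonl` carrying the keys added in this hunk; `summarise_sources` is a hypothetical helper, not part of the repo.

```python
from __future__ import annotations

import json
from collections import Counter
from typing import Iterable


def summarise_sources(lines: Iterable[str]) -> Counter:
    """Count resolved_intent.source across JSONL turn records.

    Turns logged with resolved_intent = null are counted as "none".
    """
    counts: Counter = Counter()
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines, e.g. a trailing newline
        record = json.loads(raw)
        resolved = record.get("resolved_intent") or {}
        counts[resolved.get("source") or "none"] += 1
    return counts
```

A file handle works directly as the input, e.g. `with open("logs/turns.jsonl") as f: print(summarise_sources(f))`.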
backend/pipeline/nodes/intent.py CHANGED

@@ -256,18 +256,21 @@ def run(state: PipelineState) -> dict:
             }
         ]
-    air_written = state.get("air_written_text")
-    if air_written:
-        # Classify the air-written supplement the same way as a normal fragment
-        # so a present-tense supplement ("tired") on a present-state question
         # doesn't silently flip the route to PERSONAL and re-enable retrieval.
-        air_cls = _classify(air_written)
         sub_intents.append(
             {
-                "type": air_cls,
-                "query": air_written,
-                "bucket_hint": infer_bucket(air_written)
-                if air_cls == "PERSONAL"
                 else None,
                 "priority": priority,
             }

             }
         ]
+    # Prefer resolved_intent.text when the frontend did voice⇄air reconciliation;
+    # fall back to raw air_written_text when no voice was captured.
+    resolved = state.get("resolved_intent") or {}
+    supplement = (resolved.get("text") or "").strip() or state.get("air_written_text")
+    if supplement:
+        # Classify the supplement the same way as a normal fragment so a
+        # present-tense supplement ("tired") on a present-state question
         # doesn't silently flip the route to PERSONAL and re-enable retrieval.
+        sup_cls = _classify(supplement)
         sub_intents.append(
             {
+                "type": sup_cls,
+                "query": supplement,
+                "bucket_hint": infer_bucket(supplement)
+                if sup_cls == "PERSONAL"
                 else None,
                 "priority": priority,
             }

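The supplement selection in this hunk reduces to a small pure function. The sketch below restates it outside the node (the name `pick_supplement` is hypothetical), which makes the fallback easy to test: whitespace-only resolved text falls back to the raw air-written text.

```python
from __future__ import annotations

from typing import Any, Mapping


def pick_supplement(state: Mapping[str, Any]) -> str | None:
    """Prefer the reconciled voice/air text; fall back to raw air-writing."""
    resolved = state.get("resolved_intent") or {}
    text = (resolved.get("text") or "").strip()
    return text or state.get("air_written_text")
```

Because the frontend sends `resolved_intent` whenever its `source` is not `none`, a voice-only turn also yields a supplemental sub-intent even with no air-writing, assuming the resolver sets `text` to the transcript in that case.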
backend/pipeline/nodes/planner.py CHANGED

@@ -95,6 +95,7 @@ def _run_stream(state: PipelineState, tier: str) -> Iterator[dict]:
     style: StyleDirective = gen_cfg["style"]
     gesture_tag = state.get("gesture_tag")
     air_written_text = state.get("air_written_text")
     turnaround_triggered = state.get("turnaround_triggered", False)
     rejected_response: str | None = None
     if turnaround_triggered:
@@ -195,6 +196,7 @@ def _run_stream(state: PipelineState, tier: str) -> Iterator[dict]:
             gen_cfg,
             gesture_tag=gesture_tag,
             air_written_text=air_written_text,
             rejected_response=rejected_response,
             rejected_candidates=rejected_candidates,
             intent_kind=intent_kind,
@@ -367,6 +369,7 @@ def _run(state: PipelineState, tier: str) -> dict:
     style: StyleDirective = gen_cfg["style"]
     gesture_tag = state.get("gesture_tag")
     air_written_text = state.get("air_written_text")
     turnaround_triggered = state.get("turnaround_triggered", False)
     rejected_response: str | None = None
     if turnaround_triggered:
@@ -409,6 +412,7 @@ def _run(state: PipelineState, tier: str) -> dict:
             gen_cfg,
             gesture_tag=gesture_tag,
             air_written_text=air_written_text,
             rejected_response=rejected_response,
             rejected_candidates=rejected_candidates,
             intent_kind=intent_kind,
@@ -502,6 +506,7 @@ def _run(state: PipelineState, tier: str) -> dict:
         gen_cfg,
         gesture_tag=gesture_tag,
         air_written_text=air_written_text,
         rejected_response=rejected_response,
         intent_kind=intent_kind,
         affect=affect,
@@ -541,6 +546,7 @@ def _build_messages(
     gen_cfg: dict,
     gesture_tag: str | None = None,
     air_written_text: str | None = None,
     rejected_response: str | None = None,
     rejected_candidates: list[str] | None = None,
     intent_kind: str = "memory",
@@ -560,6 +566,7 @@ def _build_messages(
         gesture_tag,
         air_written_text,
         profile["name"],
         rejected_response=rejected_response,
         rejected_candidates=rejected_candidates,
         intent_kind=intent_kind,
@@ -611,6 +618,69 @@ Answering rules:
 --- end character sheet ---"""
 def _build_user(
     chunks: list[dict],
     history: list[dict],
@@ -621,6 +691,7 @@ def _build_user(
     air_written_text: str | None,
     persona_name: str,
     *,
     rejected_response: str | None = None,
     rejected_candidates: list[str] | None = None,
     intent_kind: str = "memory",
@@ -665,14 +736,7 @@ def _build_user(
             # Gesture opener wins over affect opener — a deliberate thumbs-up is a stronger signal than inferred affect.
             merged_opener = directive["opener_hint"]
-    air_writing_block = ""
-    if air_written_text:
-        air_writing_block = (
-            f'\nThe user air-wrote: "{air_written_text}". '
-            "If this looks like a name, noun, or short phrase, "
-            "incorporate it verbatim into your response; "
-            "otherwise use it as a hint about what they're trying to say."
-        )
     persona_mod = gen_cfg.get("persona_mod", "baseline")
     persona_instruction_line = (

     style: StyleDirective = gen_cfg["style"]
     gesture_tag = state.get("gesture_tag")
     air_written_text = state.get("air_written_text")
+    resolved_intent = state.get("resolved_intent")
     turnaround_triggered = state.get("turnaround_triggered", False)
     rejected_response: str | None = None
     if turnaround_triggered:
             gen_cfg,
             gesture_tag=gesture_tag,
             air_written_text=air_written_text,
+            resolved_intent=resolved_intent,
             rejected_response=rejected_response,
             rejected_candidates=rejected_candidates,
             intent_kind=intent_kind,
     style: StyleDirective = gen_cfg["style"]
     gesture_tag = state.get("gesture_tag")
     air_written_text = state.get("air_written_text")
+    resolved_intent = state.get("resolved_intent")
     turnaround_triggered = state.get("turnaround_triggered", False)
     rejected_response: str | None = None
     if turnaround_triggered:
             gen_cfg,
             gesture_tag=gesture_tag,
             air_written_text=air_written_text,
+            resolved_intent=resolved_intent,
             rejected_response=rejected_response,
             rejected_candidates=rejected_candidates,
             intent_kind=intent_kind,
         gen_cfg,
         gesture_tag=gesture_tag,
         air_written_text=air_written_text,
+        resolved_intent=resolved_intent,
         rejected_response=rejected_response,
         intent_kind=intent_kind,
         affect=affect,
     gen_cfg: dict,
     gesture_tag: str | None = None,
     air_written_text: str | None = None,
+    resolved_intent: dict | None = None,
     rejected_response: str | None = None,
     rejected_candidates: list[str] | None = None,
     intent_kind: str = "memory",
         gesture_tag,
         air_written_text,
         profile["name"],
+        resolved_intent=resolved_intent,
         rejected_response=rejected_response,
         rejected_candidates=rejected_candidates,
         intent_kind=intent_kind,
 --- end character sheet ---"""
+def _safe_user_text(s: str) -> str:
+    # voice_text / air_text arrive from untrusted channels (Web Speech,
+    # air-writing DTW). They're f-stringed into LLM messages wrapped in
+    # double-quotes — a transcript containing `"` or newlines would break out
+    # of the quoted region and could inject instructions. Strip those and cap
+    # length. Same pattern as `safe_rejected` for `rejected_response`.
+    return s.replace('"', "'").replace("\n", " ").replace("\r", " ")[:200]
+def _format_multimodal_intent(
+    resolved: dict | None, air_written_text: str | None
+) -> str:
+    # Branch on resolved_intent.source so the model sees voice⇄air-writing
+    # disagreements explicitly instead of getting a single text without context.
+    if resolved:
+        source = resolved.get("source") or "none"
+        voice_t = _safe_user_text((resolved.get("voice_text") or "").strip())
+        air_t = _safe_user_text((resolved.get("air_text") or "").strip())
+        text = _safe_user_text((resolved.get("text") or "").strip())
+        if source == "voice_only" and voice_t:
+            return (
+                f'\nThe user spoke aloud: "{voice_t}". '
+                "Treat this as a supplement to the partner's question — "
+                "a hint or clarification about what they want."
+            )
+        if source == "air_only" and air_t:
+            return (
+                f'\nThe user air-wrote: "{air_t}". '
+                "If this looks like a name, noun, or short phrase, "
+                "incorporate it verbatim into your response; "
+                "otherwise use it as a hint about what they're trying to say."
+            )
+        if source == "agree" and text:
+            return (
+                f'\nThe user spoke and air-wrote the same thing: "{text}". '
+                "This is a strong signal — lean into it when shaping your reply."
+            )
+        if source == "conflict_air" and air_t:
+            return (
+                f'\nThe user spoke "{voice_t}" but also air-wrote "{air_t}". '
+                "The air-written token is a canonical AAC signal "
+                "(help/stop/water/done/more) — prioritise it over the spoken "
+                "words, which may have been misheard."
+            )
+        if source == "conflict_voice" and voice_t:
+            return (
+                f'\nThe user spoke "{voice_t}" but air-wrote "{air_t}" — '
+                "these don't match. The spoken form is richer; treat it as "
+                "the real intent and gently acknowledge the air-writing "
+                "may have been a mis-stroke."
+            )
+    if air_written_text:
+        return (
+            f'\nThe user air-wrote: "{_safe_user_text(air_written_text)}". '
+            "If this looks like a name, noun, or short phrase, "
+            "incorporate it verbatim into your response; "
+            "otherwise use it as a hint about what they're trying to say."
+        )
+    return ""
 def _build_user(
     chunks: list[dict],
     history: list[dict],
     air_written_text: str | None,
     persona_name: str,
     *,
+    resolved_intent: dict | None = None,
     rejected_response: str | None = None,
     rejected_candidates: list[str] | None = None,
     intent_kind: str = "memory",
             # Gesture opener wins over affect opener — a deliberate thumbs-up is a stronger signal than inferred affect.
             merged_opener = directive["opener_hint"]
+    air_writing_block = _format_multimodal_intent(resolved_intent, air_written_text)
     persona_mod = gen_cfg.get("persona_mod", "baseline")
     persona_instruction_line = (

backend/pipeline/state.py CHANGED

@@ -95,6 +95,10 @@ class PipelineState(TypedDict):
     gaze_bucket: str | None  # bucket hinted by gaze fixation
     air_written_text: str | None  # concatenated air-written chars
     head_signal: str | None  # "HEAD_SHAKE" | "HEAD_NOD_DISSATISFIED"
     turnaround_triggered: bool  # true when re-planned from dissatisfaction signal
     # ── L2: Intent decomposition outputs ─────────────────────────────────────

     gaze_bucket: str | None  # bucket hinted by gaze fixation
     air_written_text: str | None  # concatenated air-written chars
     head_signal: str | None  # "HEAD_SHAKE" | "HEAD_NOD_DISSATISFIED"
+    voice_text: str | None  # raw Web Speech transcript, pre-resolution
+    # Resolved voice⇄air-writing intent. Keys: text, source, voice_text, air_text.
+    # source ∈ voice_only | air_only | agree | conflict_air | conflict_voice.
+    resolved_intent: dict[str, Any] | None
     turnaround_triggered: bool  # true when re-planned from dissatisfaction signal
     # ── L2: Intent decomposition outputs ─────────────────────────────────────

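`resolved_intent` is typed as `dict[str, Any]`, with the allowed `source` values living only in comments, and the two lists differ slightly: the API schema comment includes `none`, the state comment does not. Below is a hedged sketch of a stricter shape using `typing.Literal` and `TypedDict`; `ResolvedIntentDict` and `is_resolved_intent` are hypothetical names. Python enforces neither construct at runtime, so the check function does it explicitly. On the FastAPI side, annotating `ResolvedIntent.source` with the same `Literal` would make Pydantic reject unknown values at the request boundary.

```python
from __future__ import annotations

from typing import Literal, Optional, TypedDict, get_args

# Union of the values listed in the state.py and api/main.py comments.
Source = Literal[
    "voice_only", "air_only", "agree", "conflict_air", "conflict_voice", "none"
]


class ResolvedIntentDict(TypedDict):
    text: str
    source: Source
    voice_text: Optional[str]
    air_text: Optional[str]


def is_resolved_intent(obj: object) -> bool:
    """Runtime check, since TypedDict and Literal are not enforced by Python."""
    if not isinstance(obj, dict):
        return False
    if not isinstance(obj.get("text"), str):
        return False
    if obj.get("source") not in get_args(Source):
        return False
    # model_dump() always emits these keys, possibly with None values.
    return all(
        key in obj and (obj[key] is None or isinstance(obj[key], str))
        for key in ("voice_text", "air_text")
    )
```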
frontend/src/App.css CHANGED

@@ -525,6 +525,30 @@ input[type="text"]:hover {
   color: #ffffff !important;
 }
 .eval-panel {
   margin-top: 10px;
   border-top: 1px solid var(--border);

   color: #ffffff !important;
 }
+.mic-btn {
+  background: transparent !important;
+  color: var(--accent) !important;
+  border: 1px solid var(--accent) !important;
+}
+.mic-btn.listening {
+  background: var(--accent) !important;
+  color: #ffffff !important;
+  animation: mic-pulse 1.1s ease-in-out infinite;
+}
+@keyframes mic-pulse {
+  0%, 100% { opacity: 1; }
+  50% { opacity: 0.6; }
+}
+.voice-status {
+  padding: 4px 12px;
+  font-size: 12px;
+  color: var(--text-muted);
+  font-family: var(--sans);
+}
 .eval-panel {
   margin-top: 10px;
   border-top: 1px solid var(--border);

frontend/src/components/ChatPanel.tsx CHANGED

@@ -14,6 +14,9 @@ import {
   streamRegenerate,
 } from "../lib/api";
 import { EvalPanel } from "./EvalPanel";
 const STRATEGY_LABELS: Record<string, string> = {
   broad: "broad — all memories",
@@ -135,6 +138,10 @@ export function ChatPanel({
   const [turnaroundLoading, setTurnaroundLoading] = useState(false);
   const [regenerateLoading, setRegenerateLoading] = useState(false);
   const { queueToken, flushNow } = useTokenBatcher(setMessages);
   const bottomRef = useRef<HTMLDivElement>(null);
   const lastResponseTsRef = useRef<number>(0);
   const lastTurnIdRef = useRef<number | null>(null);
@@ -156,6 +163,8 @@ export function ChatPanel({
     lastResponseTsRef.current = 0;
     evalPollAbortsRef.current.forEach((ac) => ac.abort());
     evalPollAbortsRef.current.clear();
   }, [userId]);
   useEffect(() => {
@@ -432,6 +441,8 @@ export function ChatPanel({
     setLoading(true);
     const airText = sensing.airWrittenText || null;
     // Push the partner bubble, and a placeholder AAC message we'll fill in
     // progressively. We need the placeholder's index to target updates — use
@@ -470,6 +481,8 @@ export function ChatPanel({
           gaze_bucket: sensing.gazeBucket,
           air_written_text: airText,
           head_signal: sensing.headSignal,
         },
         (evt) => {
           if (evt.type === "token") {
@@ -531,6 +544,10 @@ export function ChatPanel({
       }));
     } finally {
       if (airText) onAirTextConsumed();
       setLoading(false);
     }
   }
@@ -568,6 +585,31 @@ export function ChatPanel({
     [messages, setMessages, userId]
   );
   return (
     <div className="chat-panel">
       <div className="chat-header">
@@ -683,6 +725,11 @@ export function ChatPanel({
         )}
         <div ref={bottomRef} />
       </div>
       <div className="chat-input-row">
         <input
           type="text"
@@ -695,6 +742,26 @@ export function ChatPanel({
         <button onClick={handleSend} disabled={!userId || loading || !backendReady || !input.trim()}>
           Send
         </button>
       </div>
     </div>
   );

   streamRegenerate,
 } from "../lib/api";
 import { EvalPanel } from "./EvalPanel";
+import { useVoice } from "../hooks/useVoice";
+import { isVoiceCapable } from "../lib/voiceEligibility";
+import { resolveIntent } from "../lib/resolveIntent";
 const STRATEGY_LABELS: Record<string, string> = {
   broad: "broad — all memories",
   const [turnaroundLoading, setTurnaroundLoading] = useState(false);
   const [regenerateLoading, setRegenerateLoading] = useState(false);
   const { queueToken, flushNow } = useTokenBatcher(setMessages);
+  const [voiceText, setVoiceText] = useState<string | null>(null);
+  const [voiceNote, setVoiceNote] = useState<string | null>(null);
+  const voice = useVoice();
+  const micAvailable = isVoiceCapable(userId) && voice.supported;
   const bottomRef = useRef<HTMLDivElement>(null);
   const lastResponseTsRef = useRef<number>(0);
   const lastTurnIdRef = useRef<number | null>(null);
     lastResponseTsRef.current = 0;
     evalPollAbortsRef.current.forEach((ac) => ac.abort());
     evalPollAbortsRef.current.clear();
+    setVoiceText(null);
+    setVoiceNote(null);
   }, [userId]);
   useEffect(() => {
     setLoading(true);
     const airText = sensing.airWrittenText || null;
+    const vText = voiceText;
+    const resolved = resolveIntent(vText, airText);
     // Push the partner bubble, and a placeholder AAC message we'll fill in
     // progressively. We need the placeholder's index to target updates — use
           gaze_bucket: sensing.gazeBucket,
           air_written_text: airText,
           head_signal: sensing.headSignal,
+          voice_text: vText,
+          resolved_intent: resolved.source === "none" ? null : resolved,
         },
         (evt) => {
           if (evt.type === "token") {
       }));
     } finally {
       if (airText) onAirTextConsumed();
+      // Clear voice state unconditionally — a failed send shouldn't silently
+      // re-attach a stale transcript to the next turn. User can re-tap mic.
+      setVoiceText(null);
+      setVoiceNote(null);
       setLoading(false);
     }
   }
     [messages, setMessages, userId]
   );
+  const handleMic = useCallback(async () => {
+    if (!micAvailable || voice.listening) return;
+    setVoiceNote("Listening...");
+    try {
+      const cap = await voice.capture();
+      if (cap.transcript) {
+        setVoiceText(cap.transcript);
+        setVoiceNote(`Heard: "${cap.transcript}"`);
+      } else {
+        setVoiceNote("No speech detected.");
+      }
+    } catch (e) {
+      setVoiceNote(
+        `Mic error: ${e instanceof Error ? e.message : "failed"}`
+      );
+    }
+  }, [micAvailable, voice]);
+  const canTurnaround =
+    !!userId &&
+    backendReady &&
+    !loading &&
+    !turnaroundLoading &&
+    lastTurnIdRef.current !== null;
   return (
     <div className="chat-panel">
       <div className="chat-header">
         )}
         <div ref={bottomRef} />
       </div>
+      {micAvailable && voiceNote && (
+        <div className="voice-status" aria-live="polite">
+          {voiceNote}
+        </div>
+      )}
       <div className="chat-input-row">
         <input
           type="text"
         <button onClick={handleSend} disabled={!userId || loading || !backendReady || !input.trim()}>
           Send
         </button>
+        {micAvailable && (
+          <button
+            type="button"
+            className={`mic-btn${voice.listening ? " listening" : ""}`}
+            onClick={handleMic}
+            disabled={!backendReady || loading || voice.listening}
+            title="Capture a short voice utterance — resolved against air-writing before sending"
+          >
+            {voice.listening ? "🎤 Listening…" : "🎤 Speak"}
+          </button>
+        )}
+        <button
+          type="button"
+          className="turnaround-btn"
+          onClick={() => handleTurnaround("manual")}
+          disabled={!canTurnaround}
+          title="Re-plan the last response (also triggered by a head shake / sharp nod)"
+        >
+          ↻ Not quite right
+        </button>
       </div>
     </div>
   );
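The send handler clears the pending transcript in its `finally` block, so it is cleared whether or not the request succeeded. That consume-once behaviour can be sketched outside React as follows. `PendingVoice` and `sendWithVoice` are hypothetical names that stand in for the `voiceText` state and `handleSend`; they are not part of the codebase.

```typescript
// Hypothetical stand-in for ChatPanel's voiceText state (not in the codebase).
class PendingVoice {
  private text: string | null = null;
  set(text: string): void {
    this.text = text;
  }
  peek(): string | null {
    return this.text;
  }
  clear(): void {
    this.text = null;
  }
}

// Mirrors the shape of handleSend: read the pending transcript, attempt the
// send, and clear the slot in `finally` so that a failed request cannot
// silently re-attach a stale transcript to the next turn.
async function sendWithVoice(
  slot: PendingVoice,
  send: (voiceText: string | null) => Promise<void>
): Promise<boolean> {
  const voiceText = slot.peek();
  try {
    await send(voiceText);
    return true;
  } catch {
    return false;
  } finally {
    slot.clear();
  }
}
```

A failed send still forwards the transcript once and then drops it. To retry, the user taps the mic again rather than reusing an old utterance.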

frontend/src/hooks/useVoice.ts ADDED

	@@ -0,0 +1,170 @@

+import { useCallback, useEffect, useRef, useState } from "react";
+// Thin wrapper around the Web Speech API. Chrome, Edge and Safari (14.1+)
+// expose the prefixed `webkitSpeechRecognition`; Firefox does not ship it by
+// default, so the hook no-ops gracefully there.
+type SRCtor = new () => SpeechRecognitionLike;
+interface SpeechRecognitionLike {
+  lang: string;
+  continuous: boolean;
+  interimResults: boolean;
+  maxAlternatives: number;
+  onresult: ((e: SpeechRecognitionEventLike) => void) | null;
+  onerror: ((e: { error: string }) => void) | null;
+  onend: (() => void) | null;
+  start: () => void;
+  stop: () => void;
+  abort: () => void;
+}
+interface SpeechRecognitionEventLike {
+  results: {
+    length: number;
+    [index: number]: {
+      isFinal: boolean;
+      length: number;
+      [index: number]: { transcript: string; confidence: number };
+    };
+  };
+}
+function getRecognitionCtor(): SRCtor | null {
+  if (typeof window === "undefined") return null;
+  const w = window as unknown as {
+    SpeechRecognition?: SRCtor;
+    webkitSpeechRecognition?: SRCtor;
+  };
+  return w.SpeechRecognition ?? w.webkitSpeechRecognition ?? null;
+}
+export interface VoiceCapture {
+  transcript: string;
+  confidence: number;
+}
+const WINDOW_MS = 3500;
+export function useVoice() {
+  const [supported] = useState(() => getRecognitionCtor() !== null);
+  const [listening, setListening] = useState(false);
+  const [error, setError] = useState<string | null>(null);
+  const recRef = useRef<SpeechRecognitionLike | null>(null);
+  const resolveRef = useRef<((v: VoiceCapture) => void) | null>(null);
+  const rejectRef = useRef<((err: Error) => void) | null>(null);
+  const bestRef = useRef<VoiceCapture>({ transcript: "", confidence: 0 });
+  const timerRef = useRef<number | null>(null);
+  const cleanup = useCallback(() => {
+    if (timerRef.current !== null) {
+      window.clearTimeout(timerRef.current);
+      timerRef.current = null;
+    }
+    const rec = recRef.current;
+    if (rec) {
+      try {
+        rec.abort();
+      } catch {
+        // ignore — some browsers throw if already stopped
+      }
+      rec.onresult = null;
+      rec.onerror = null;
+      rec.onend = null;
+    }
+    recRef.current = null;
+    resolveRef.current = null;
+    rejectRef.current = null;
+    setListening(false);
+  }, []);
+  // Unmount teardown: reject any in-flight promise so `await voice.capture()`
+  // in a parent component doesn't hang forever when the tree unmounts mid-listen.
+  useEffect(
+    () => () => {
+      const rj = rejectRef.current;
+      cleanup();
+      if (rj) rj(new Error("unmounted"));
+    },
+    [cleanup]
+  );
+  const capture = useCallback((): Promise<VoiceCapture> => {
+    const Ctor = getRecognitionCtor();
+    if (!Ctor) {
+      return Promise.reject(new Error("Speech recognition not supported"));
+    }
+    if (recRef.current) {
+      return Promise.reject(new Error("Already listening"));
+    }
+    return new Promise<VoiceCapture>((resolve, reject) => {
+      const rec = new Ctor();
+      rec.lang = navigator.language || "en-US";
+      rec.continuous = false;
+      rec.interimResults = false;
+      rec.maxAlternatives = 1;
+      bestRef.current = { transcript: "", confidence: 0 };
+      resolveRef.current = resolve;
+      rejectRef.current = reject;
+      recRef.current = rec;
+      setError(null);
+      rec.onresult = (e) => {
+        for (let i = 0; i < e.results.length; i++) {
+          const res = e.results[i];
+          if (!res.isFinal) continue;
+          const alt = res[0];
+          if (alt && alt.transcript.trim().length > 0) {
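+            // Some engines report confidence 0 for every result, so always
+            // accept the first non-empty final transcript.
+            if (
+              !bestRef.current.transcript ||
+              alt.confidence > bestRef.current.confidence
+            ) {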
+              bestRef.current = {
+                transcript: alt.transcript.trim(),
+                confidence: alt.confidence,
+              };
+            }
+          }
+        }
+      };
+      rec.onerror = (e) => {
+        const msg = e.error || "recognition error";
+        setError(msg);
+        const rj = rejectRef.current;
+        cleanup();
+        if (rj) rj(new Error(msg));
+      };
+      rec.onend = () => {
+        const rs = resolveRef.current;
+        const best = bestRef.current;
+        cleanup();
+        if (rs) rs(best);
+      };
+      try {
+        rec.start();
+        setListening(true);
+        timerRef.current = window.setTimeout(() => {
+          try {
+            rec.stop();
+          } catch {
+            // onend will still fire
+          }
+        }, WINDOW_MS);
+      } catch (err) {
+        const msg = err instanceof Error ? err.message : "failed to start";
+        setError(msg);
+        cleanup();
+        reject(new Error(msg));
+      }
+    });
+  }, [cleanup]);
+  const cancel = useCallback(() => {
+    const rj = rejectRef.current;
+    cleanup();
+    if (rj) rj(new Error("cancelled"));
+  }, [cleanup]);
+  return { supported, listening, error, capture, cancel };
+}
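The `onresult` handler keeps the best final transcript seen during the capture window. The selection rule can be isolated as a pure function, which makes it testable without a browser. The interfaces below are minimal stand-ins for the fields the hook reads, and a plain array replaces the DOM's array-like result list. `pickBestFinal` is a hypothetical name. The sketch also accepts the first non-empty final result even when its confidence is 0.

```typescript
// Minimal stand-ins for the fields useVoice reads from a Web Speech result
// list; a plain array replaces the DOM's array-like SpeechRecognitionResultList.
interface Alternative {
  transcript: string;
  confidence: number;
}
interface FinalOrInterim {
  isFinal: boolean;
  alternatives: Alternative[];
}
interface VoiceCapture {
  transcript: string;
  confidence: number;
}

// Keep the highest-confidence non-empty final transcript. The first non-empty
// final result is always accepted, so engines that report confidence 0 still
// yield a transcript.
function pickBestFinal(results: readonly FinalOrInterim[]): VoiceCapture {
  let best: VoiceCapture = { transcript: "", confidence: 0 };
  for (const res of results) {
    if (!res.isFinal) continue; // interim guesses are ignored
    const alt = res.alternatives[0]; // maxAlternatives = 1 in the hook
    if (!alt) continue;
    const text = alt.transcript.trim();
    if (!text) continue;
    if (!best.transcript || alt.confidence > best.confidence) {
      best = { transcript: text, confidence: alt.confidence };
    }
  }
  return best;
}

// Usage: the interim "wat" is skipped; the confidence-0 final is kept.
const heard = pickBestFinal([
  { isFinal: false, alternatives: [{ transcript: "wat", confidence: 0.9 }] },
  { isFinal: true, alternatives: [{ transcript: " water ", confidence: 0 }] },
]);
console.log(heard.transcript); // water
```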

frontend/src/lib/resolveIntent.ts ADDED

	@@ -0,0 +1,109 @@

+import { DEFAULT_AIR_TEMPLATES } from "./airTemplates";
+// Canonical AAC tokens that carry high signal when someone air-writes them —
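+// short, action-oriented, and hard to confuse for casual chat. When the
+// voice transcript and the air-written text disagree and the air-written
+// text is exactly one of these tokens (and has an air-writing template),
+// the air-written text wins.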
+const AAC_PRIORITY_TOKENS: ReadonlySet<string> = new Set(
+  ["help", "stop", "water", "done", "more"].filter((t) =>
+    DEFAULT_AIR_TEMPLATES.has(t)
+  )
+);
+export type ResolvedSource =
+  | "voice_only"
+  | "air_only"
+  | "agree"
+  | "conflict_air"
+  | "conflict_voice"
+  | "none";
+export interface ResolvedIntent {
+  text: string;
+  source: ResolvedSource;
+  voice_text: string | null;
+  air_text: string | null;
+}
+function normalise(s: string | null | undefined): string {
+  return (s ?? "").trim().toLowerCase();
+}
+function tokens(s: string): Set<string> {
+  return new Set(
+    s
+      .toLowerCase()
+      .replace(/[^a-z0-9\s]/g, " ")
+      .split(/\s+/)
+      .filter((w) => w.length > 1)
+  );
+}
+function jaccard(a: Set<string>, b: Set<string>): number {
+  if (a.size === 0 || b.size === 0) return 0;
+  let inter = 0;
+  for (const tok of a) if (b.has(tok)) inter++;
+  const union = a.size + b.size - inter;
+  return union === 0 ? 0 : inter / union;
+}
+export function resolveIntent(
+  voiceRaw: string | null,
+  airRaw: string | null
+): ResolvedIntent {
+  const voice = normalise(voiceRaw);
+  const air = normalise(airRaw);
+  if (!voice && !air) {
+    return { text: "", source: "none", voice_text: null, air_text: null };
+  }
+  if (voice && !air) {
+    return {
+      text: voice,
+      source: "voice_only",
+      voice_text: voice,
+      air_text: null,
+    };
+  }
+  if (!voice && air) {
+    return { text: air, source: "air_only", voice_text: null, air_text: air };
+  }
+  // Both present.
+  const voiceTokens = tokens(voice);
+  const airTokens = tokens(air);
+  const overlap = jaccard(voiceTokens, airTokens);
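+  // Treat it as agreement when every air-written token appears in the voice
+  // transcript (or vice versa), or when the token sets overlap heavily: the
+  // user probably said the word while also writing it. Whole tokens are
+  // compared rather than raw substrings, so air "no" does not match voice
+  // "i know".
+  const containsAll = (a: Set<string>, b: Set<string>): boolean =>
+    a.size > 0 && Array.from(a).every((tok) => b.has(tok));
+  const agrees =
+    containsAll(airTokens, voiceTokens) ||
+    containsAll(voiceTokens, airTokens) ||
+    overlap >= 0.5;
+  if (agrees) {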
+    // Prefer the longer / richer form (usually voice), but mark source as agree.
+    const winner = voice.length >= air.length ? voice : air;
+    return {
+      text: winner,
+      source: "agree",
+      voice_text: voice,
+      air_text: air,
+    };
+  }
+  // Genuine conflict. AAC priority tokens (help/stop/water/done/more) dominate.
+  if (AAC_PRIORITY_TOKENS.has(air)) {
+    return {
+      text: air,
+      source: "conflict_air",
+      voice_text: voice,
+      air_text: air,
+    };
+  }
+  // Otherwise voice wins — higher information density.
+  return {
+    text: voice,
+    source: "conflict_voice",
+    voice_text: voice,
+    air_text: air,
+  };
+}
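The following worked example shows what the Jaccard score looks like for a few voice/air pairs and how the conflict tie-break behaves. The tokeniser and `jaccard` are copied from the file above. `PRIORITY_TOKENS` is an illustrative copy of the priority list that skips the `DEFAULT_AIR_TEMPLATES` filter, since that module is not shown. `conflictSource` is a hypothetical helper.

```typescript
// Same tokenisation as resolveIntent.ts: lowercase, strip punctuation,
// split on whitespace, drop single-character tokens.
function tokens(s: string): Set<string> {
  return new Set(
    s
      .toLowerCase()
      .replace(/[^a-z0-9\s]/g, " ")
      .split(/\s+/)
      .filter((w) => w.length > 1)
  );
}

// |A ∩ B| / |A ∪ B|, defined as 0 when either side is empty.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 || b.size === 0) return 0;
  let inter = 0;
  for (const tok of a) if (b.has(tok)) inter++;
  return inter / (a.size + b.size - inter);
}

// Illustrative priority list (the real set is also filtered against
// DEFAULT_AIR_TEMPLATES, which this sketch does not model).
const PRIORITY_TOKENS: ReadonlySet<string> = new Set(["help", "stop", "water", "done", "more"]);

// Tie-break applied only after the two inputs have been judged to disagree.
function conflictSource(air: string): "conflict_air" | "conflict_voice" {
  return PRIORITY_TOKENS.has(air.trim().toLowerCase()) ? "conflict_air" : "conflict_voice";
}

const pairs: Array<[string, string]> = [
  ["I want water please", "water"],
  ["help me now", "help me"],
  ["let's go outside", "stop"],
];
for (const [voice, air] of pairs) {
  console.log(voice, "|", air, "->", jaccard(tokens(voice), tokens(air)).toFixed(2));
}
```

The three pairs score 0.33, 0.67 and 0.00. Only the second clears the 0.5 threshold on its own. The first still counts as agreement because the air-written word appears inside the transcript. The third is a genuine conflict, and because "stop" is a priority token, `conflictSource("stop")` yields `"conflict_air"`.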

frontend/src/lib/voiceEligibility.ts ADDED

	@@ -0,0 +1,18 @@

+// Personas for whom a live-mic button makes sense.
+// Gate reflects each persona's real-world speech access, not the in-universe
+// voice of the character: we hide the mic whenever the modelled access method
+// is non-verbal (locked-in, letterboard, dictation-to-assistant, etc.), even
+// if the character can "speak" in their canon.
+export const VOICE_CAPABLE_PERSONAS: ReadonlySet<string> = new Set([
+  "abed_nadir",
+  "allie_calhoun",
+  "forrest_gump",
+  "gabby_giffords",
+  "michael_j_fox",
+  "raymond_babbitt",
+  "walter_jr_white",
+]);
+export function isVoiceCapable(userId: string | null): boolean {
+  return !!userId && VOICE_CAPABLE_PERSONAS.has(userId);
+}
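ChatPanel reads a `micAvailable` flag, but its definition is outside this diff. The sketch below shows one plausible derivation. It assumes the flag combines the persona gate with the `supported` flag returned by `useVoice`, so the mic button appears only for verbal-access personas in browsers that expose speech recognition. `computeMicAvailable` is a hypothetical name.

```typescript
// Copied from voiceEligibility.ts.
const VOICE_CAPABLE_PERSONAS: ReadonlySet<string> = new Set([
  "abed_nadir",
  "allie_calhoun",
  "forrest_gump",
  "gabby_giffords",
  "michael_j_fox",
  "raymond_babbitt",
  "walter_jr_white",
]);

function isVoiceCapable(userId: string | null): boolean {
  return !!userId && VOICE_CAPABLE_PERSONAS.has(userId);
}

// Hypothetical: both the persona gate and browser support must pass.
function computeMicAvailable(userId: string | null, speechSupported: boolean): boolean {
  return speechSupported && isVoiceCapable(userId);
}
```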

frontend/src/types.ts CHANGED

@@ -28,6 +28,21 @@ export interface Persona {
   style: string;
 }
+export type ResolvedSource =
+  | "voice_only"
+  | "air_only"
+  | "agree"
+  | "conflict_air"
+  | "conflict_voice"
+  | "none";
+export interface ResolvedIntent {
+  text: string;
+  source: ResolvedSource;
+  voice_text: string | null;
+  air_text: string | null;
+}
 export interface ChatRequest {
   user_id: string;
   query: string;
@@ -36,6 +51,8 @@ export interface ChatRequest {
   gaze_bucket: MemoryBucket | null;
   air_written_text: string | null;
   head_signal?: HeadSignal | null;
+  voice_text?: string | null;
+  resolved_intent?: ResolvedIntent | null;
 }
 export interface TurnaroundRequest {
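`ResolvedIntent` travels as plain JSON, so anything that reads it back untyped (a test fixture or a replayed log line, for example) cannot rely on the compile-time types alone. The following sketch shows a runtime guard under that assumption. It checks the shape and also the invariants `resolveIntent()` maintains between `source` and which of `voice_text`/`air_text` are set. `isResolvedIntent` and its helpers are hypothetical, not part of the codebase.

```typescript
type ResolvedSource = "voice_only" | "air_only" | "agree" | "conflict_air" | "conflict_voice" | "none";

interface ResolvedIntent {
  text: string;
  source: ResolvedSource;
  voice_text: string | null;
  air_text: string | null;
}

const SOURCES: ReadonlySet<string> = new Set([
  "voice_only", "air_only", "agree", "conflict_air", "conflict_voice", "none",
]);

function isResolvedSource(v: unknown): v is ResolvedSource {
  return typeof v === "string" && SOURCES.has(v);
}

function isNullableString(v: unknown): v is string | null {
  return v === null || typeof v === "string";
}

// Shape check plus the source/channel invariants produced by resolveIntent().
function isResolvedIntent(x: unknown): x is ResolvedIntent {
  if (typeof x !== "object" || x === null) return false;
  const o = x as Record<string, unknown>;
  const text = o["text"];
  const source = o["source"];
  const voice = o["voice_text"];
  const air = o["air_text"];
  if (typeof text !== "string" || !isResolvedSource(source)) return false;
  if (!isNullableString(voice) || !isNullableString(air)) return false;
  switch (source) {
    case "none":
      return voice === null && air === null;
    case "voice_only":
      return voice !== null && air === null;
    case "air_only":
      return voice === null && air !== null;
    default:
      // agree / conflict_air / conflict_voice: both channels were present.
      return voice !== null && air !== null;
  }
}
```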