shwetangisingh committed 535a98d (parent c09a7e7)

Add voice + air-writing conflict resolution

Push-to-talk Web Speech mic (gated to verbal-access personas) plus
a Jaccard + AAC-priority-token resolver. The resulting
{text, source, voice_text, air_text} drives a supplemental sub-intent
and source-aware planner copy across five branches. Voice/air text is
sanitised before LLM interpolation to close a prompt-injection vector.

Also refreshes CLAUDE.md's persona table to match data/users.json.

CLAUDE.md CHANGED
@@ -2,11 +2,12 @@
 
  ## What This Project Does
 
- An AI chatbot that **speaks as an AAC user**, not to them. Given a user persona
- (Mia, Gerald, or Arjun), it fuses real-time multimodal non-verbal signals with
- personal memory retrieval to generate responses in that person's authentic voice.
- Orchestrated as a **plain Python function chain** across five layers, with two
- conditional branches.
+ An AI chatbot that **speaks as an AAC user**, not to them. Given one of 14
+ personas, nine anchored in real memoirs and five in canonical fiction,
+ it fuses real-time multimodal non-verbal signals with personal memory
+ retrieval to generate responses in that person's authentic voice. Orchestrated
+ as a **plain Python function chain** across five layers, with two conditional
+ branches.
 
  ---
 
@@ -66,13 +67,31 @@ logs/ Per-turn JSONL logs (gitignored)
 
  ## Personas
 
+ Fourteen personas shipped. Real-memoir-anchored:
+
+ | ID | Name | Condition | Access |
+ |----|------|-----------|--------|
+ | `stephen_hawking` | Stephen Hawking | ALS (advanced) | Cheek-twitch + ACAT predictive speech |
+ | `jean_dominique_bauby` | Jean-Dominique Bauby | Locked-in syndrome | Alphabet-blink with amanuensis |
+ | `michael_j_fox` | Michael J. Fox | Parkinson's | Voice + adaptive keyboard + dictation |
+ | `gabby_giffords` | Gabby Giffords | Aphasia + right hemiparesis (post-TBI) | Left-hand typing + speech-to-text |
+ | `jason_becker` | Jason Becker | ALS (fully locked-in) | Eye-gaze + father's letter-code board |
+ | `tito_mukhopadhyay` | Tito Mukhopadhyay | Non-verbal autism | Letterboard + pencil |
+ | `christopher_reeve` | Christopher Reeve | C1–C2 spinal cord injury | Dictation to assistants; sip-and-puff |
+ | `christy_brown` | Christy Brown | Cerebral palsy (spastic quadriplegia) | Left foot typing / writing |
+ | `wendy_mitchell` | Wendy Mitchell | Early-onset dementia | Laptop/phone typing + "brain-book" |
+
+ Canonical fiction:
+
  | ID | Name | Condition | Access |
  |----|------|-----------|--------|
- | `mia_chen` | Mia Chen, 28 | Cerebral palsy | Webcam head-tracking |
- | `gerald_okafor` | Gerald Okafor, 61 | ALS (early-mid) | Eye-gaze device |
- | `arjun_mehta` | Arjun Mehta, 17 | Autism (non-verbal) | Tablet touch grid |
+ | `abed_nadir` | Abed Nadir (*Community*) | Autism (coded); occasional selective mutism | Mostly verbal; text when overloaded |
+ | `allie_calhoun` | Allie Hamilton Calhoun (*The Notebook*) | Late-stage Alzheimer's | Verbal when lucid; yes/no otherwise |
+ | `forrest_gump` | Forrest Gump | Intellectual disability (IQ ~75) | Verbal primarily |
+ | `raymond_babbitt` | Raymond Babbitt (*Rain Man*) | Savant autism | Verbal when calm + visual schedules |
+ | `walter_jr_white` | Walter "Flynn" White Jr. (*Breaking Bad*) | Cerebral palsy | Verbal + smartphone typing |
 
- 25 memory chunks each (5 buckets × 5 memories). Arjun code-switches Hindi/English.
+ ~25 bucketed memory chunks per persona (`family` / `medical` / `hobbies` / `daily_routine` / `social`; buckets tuned per-persona). A short-form voice push-to-talk mic surfaces only for personas whose modelled access method is verbal — see `VOICE_CAPABLE_PERSONAS` in [frontend/src/lib/voiceEligibility.ts](frontend/src/lib/voiceEligibility.ts).
 
  ---
 
@@ -129,8 +148,13 @@ Copy `.env.example` → `.env` and set:
  - **NEVER use local Ollama models** (e.g. `qwen3:8b`, `gemma3:1b`) — this machine
  is not powerful enough and will break. Always use cloud-backed models like
  `gemma4:31b-cloud` via Ollama Cloud.
- - **Adding a persona**: add to `PERSONAS` in `data/generate_users.py`, re-run it,
- then `python -m backend.retrieval.vector_store` to rebuild indexes
+ - **Adding a persona**: add a memory JSON under `data/memories/<uid>.json` and
+ a matching entry in `data/users.json` (or regenerate both via
+ `data/generate_users.py` if present), then
+ `python -m backend.retrieval.vector_store` to rebuild indexes. If the
+ persona's modelled access method includes live speech, also add their `id`
+ to `VOICE_CAPABLE_PERSONAS` in `frontend/src/lib/voiceEligibility.ts` so
+ the mic button surfaces.
  - **Changing LLM**: set `ACTIVE_LLM_TIER` in `.env` — no code changes needed
  - **Extending sensing**: sensing runs in the React frontend
  (`frontend/src/hooks/useSensing.ts`); to add a new signal, classify it
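The Personas section gates the push-to-talk mic on each persona's modelled access method. The sketch below is a minimal Python mirror of that gate. It assumes the seven IDs are the Personas-table IDs of the characters the README names as mic-enabled (Abed, Allie, Forrest, Gabby, Michael J. Fox, Raymond, Walter Jr.). `is_voice_capable` and `mic_available` are hypothetical names. The source of truth remains `VOICE_CAPABLE_PERSONAS` in `frontend/src/lib/voiceEligibility.ts`.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical Python mirror of the frontend gate; the real list is
# VOICE_CAPABLE_PERSONAS in frontend/src/lib/voiceEligibility.ts.
VOICE_CAPABLE_PERSONAS = frozenset({
    "abed_nadir",
    "allie_calhoun",
    "forrest_gump",
    "gabby_giffords",
    "michael_j_fox",
    "raymond_babbitt",
    "walter_jr_white",
})


def is_voice_capable(user_id: Optional[str]) -> bool:
    """True when the persona's modelled access method includes live speech."""
    return user_id is not None and user_id in VOICE_CAPABLE_PERSONAS


def mic_available(user_id: Optional[str], speech_api_supported: bool) -> bool:
    # Same shape as `isVoiceCapable(userId) && voice.supported` in ChatPanel.tsx:
    # the button needs an eligible persona AND a browser exposing Web Speech.
    return is_voice_capable(user_id) and speech_api_supported
```

Because the gate is a plain set lookup, enabling the mic for a new verbal persona only requires appending its `id`.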
README.md CHANGED
@@ -400,7 +400,7 @@ Heads up: all camera/sensing stuff is in the frontend (MediaPipe JS). Backend ju
  - Calibration is now averaged over the first 30 frames (~1s of neutral face) instead of a single-frame snapshot — a brief smile at startup used to lock in a biased baseline. Affect stays null during calibration; gaze/head/gesture/air-writing still flow.
  - [x] **[Core]** Gestures (`THUMBS_UP` / `THUMBS_DOWN` / `POINTING` / `WAVING`) now carry an `opener_hint` via `GESTURE_DIRECTIVES` in [backend/sensing/labels.py](backend/sensing/labels.py). A detected thumbs-up overrides the affect opener and tells the LLM to lead with an affirmation.
  - [x] **[Core]** Air-writing carries a default template bank ([frontend/src/lib/airTemplates.ts](frontend/src/lib/airTemplates.ts): `yes` / `?` / `hi` / `help` / `done` / `more` / `water` / `stop`) — all single-stroke shapes so DTW can match reliably. On match, the word flows through the pipeline three ways: (1) retrieval picks up the word as an extra `PERSONAL` sub-intent with a bucket hint (see `infer_bucket` in [backend/sensing/bucket_keywords.py](backend/sensing/bucket_keywords.py) — e.g. `help` → medical, `water` → daily_routine), (2) the planner includes an explicit "the user air-wrote X — incorporate verbatim if appropriate" instruction in the user message, and (3) the word appears in `logs/turns.jsonl` for debugging. The recognizer has a `MATCH_THRESHOLD` reject gate and `console.debug`s on empty-bank / no-match so unrecognised strokes never reach the backend. To add more templates, append entries to `DEFAULT_AIR_TEMPLATES` as 32-point normalised single-stroke trajectories.
- - [ ] **[Bonus]** Voice + air-writing conflict resolution. Capture short voice (Web Speech API), compare to air-written intent, send a `resolved_intent`
+ - [x] **[Bonus]** Voice + air-writing conflict resolution. A push-to-talk mic ([frontend/src/hooks/useVoice.ts](frontend/src/hooks/useVoice.ts)) captures a short Web Speech utterance; [frontend/src/lib/resolveIntent.ts](frontend/src/lib/resolveIntent.ts) merges it against the air-written text using Jaccard token overlap + AAC-priority tokens (`help/stop/water/done/more` win ties). The resolver emits a `{text, source, voice_text, air_text}` payload — `source ∈ voice_only | air_only | agree | conflict_air | conflict_voice` — which the backend uses in [backend/pipeline/nodes/intent.py](backend/pipeline/nodes/intent.py) to pick the supplemental sub-intent, and in [backend/pipeline/nodes/planner.py](backend/pipeline/nodes/planner.py) to render source-aware prompt copy (conflicts are acknowledged instead of silently overwritten). The mic is gated by persona via `VOICE_CAPABLE_PERSONAS` in [frontend/src/lib/voiceEligibility.ts](frontend/src/lib/voiceEligibility.ts) — only personas whose modelled access method is verbal (Abed, Allie, Forrest, Gabby, Michael J. Fox, Raymond, Walter Jr.) see the button; non-verbal / locked-in / letterboard personas don't.
  - [ ] Thumbs-up currently biases the opener via the prompt. Once generation emits N candidates, move this to candidate reranking for a stronger signal.
 
  ### Intent decomposition
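The README describes the resolver only as "Jaccard token overlap + AAC-priority tokens". Below is an illustrative Python port of that idea under stated assumptions, not the shipped `resolveIntent.ts`. The tokenizer, the 0.5 agreement threshold, and returning the spoken text on `agree` are all guesses. The sketch does reproduce the documented sources (plus `none`) and the tie rule: a canonical AAC token in the air-writing beats the spoken words.

```python
from __future__ import annotations

import re
from typing import Optional, TypedDict

# Illustrative port only: the shipped resolver is frontend/src/lib/resolveIntent.ts.
# The tokenizer and AGREE_THRESHOLD below are assumptions, not taken from it.
AAC_PRIORITY_TOKENS = frozenset({"help", "stop", "water", "done", "more"})
AGREE_THRESHOLD = 0.5  # hypothetical Jaccard cut-off for "same intent"


class Resolved(TypedDict):
    text: str
    source: str  # voice_only | air_only | agree | conflict_air | conflict_voice | none
    voice_text: Optional[str]
    air_text: Optional[str]


def tokens(s: str) -> set[str]:
    """Lower-cased word tokens; punctuation such as '?' is dropped."""
    return set(re.findall(r"[\w']+", s.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """|A & B| / |A | B|, treating two empty sets as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _out(text: str, source: str, voice: Optional[str], air: Optional[str]) -> Resolved:
    return {"text": text, "source": source, "voice_text": voice, "air_text": air}


def resolve_intent(voice: Optional[str], air: Optional[str]) -> Resolved:
    v = (voice or "").strip()
    a = (air or "").strip()
    if not v and not a:
        return _out("", "none", None, None)
    if not a:
        return _out(v, "voice_only", v, None)
    if not v:
        return _out(a, "air_only", None, a)
    if jaccard(tokens(v), tokens(a)) >= AGREE_THRESHOLD:
        return _out(v, "agree", v, a)  # assumption: keep the richer spoken form
    # Disagreement: a canonical AAC token in the air-writing wins the tie;
    # otherwise the spoken form is treated as the real intent.
    if tokens(a) & AAC_PRIORITY_TOKENS:
        return _out(a, "conflict_air", v, a)
    return _out(v, "conflict_voice", v, a)
```

Jaccard over token sets ignores word order and repetition, which suits short utterances. The cost is that a one-word stroke inside a longer sentence ("water" vs "I want water") scores only 1/3 and is routed as a conflict. In that case the priority-token rule still keeps the air-written word.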
backend/api/main.py CHANGED
@@ -98,6 +98,13 @@ def _reserve_eval_slot(run_id: str) -> None:
  # ── Request / response schemas ─────────────────────────────────────────────────
 
 
+ class ResolvedIntent(BaseModel):
+ text: str
+ source: str # voice_only | air_only | agree | conflict_air | conflict_voice | none
+ voice_text: str | None = None
+ air_text: str | None = None
+
+
  class ChatRequest(BaseModel):
  user_id: str
  query: str
@@ -106,6 +113,8 @@ class ChatRequest(BaseModel):
  gaze_bucket: str | None = None
  air_written_text: str | None = None
  head_signal: str | None = None # "HEAD_SHAKE"|"HEAD_NOD_DISSATISFIED"
+ voice_text: str | None = None
+ resolved_intent: ResolvedIntent | None = None
 
 
  class TurnaroundRequest(BaseModel):
@@ -210,6 +219,10 @@ def _build_initial_state(req: ChatRequest, session: dict) -> PipelineState:
  gaze_bucket=req.gaze_bucket,
  air_written_text=req.air_written_text,
  head_signal=req.head_signal,
+ voice_text=req.voice_text,
+ resolved_intent=(
+ req.resolved_intent.model_dump() if req.resolved_intent else None
+ ),
  turnaround_triggered=False,
  raw_query=req.query,
  intent_route=None,
backend/main.py CHANGED
@@ -171,6 +171,8 @@ def main() -> None:
  gesture_tag=None,
  gaze_bucket=None,
  air_written_text=None,
+ voice_text=None,
+ resolved_intent=None,
  raw_query=query,
  intent_route=pre_route, # pre-filled → intent node sees it and skips LLM call
  generation_config=pre_gen_config,
backend/pipeline/nodes/feedback.py CHANGED
@@ -51,6 +51,9 @@ def _log_to_jsonl(
  "retrieval_mode": state.get("retrieval_mode_used", "unknown"),
  "affect": affect,
  "head_signal": state.get("head_signal"),
+ "air_written_text": state.get("air_written_text"),
+ "voice_text": state.get("voice_text"),
+ "resolved_intent": state.get("resolved_intent"),
  "turnaround_triggered": state.get("turnaround_triggered", False),
  "guardrail_passed": state.get("guardrail_passed", True),
  "num_chunks": len(chunks),
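Every JSONL turn now records `air_written_text`, `voice_text`, and `resolved_intent`, so tallying resolution outcomes is a natural debugging step. The helper below is hypothetical and assumes one JSON object per line in `logs/turns.jsonl`. Turns where `resolved_intent` is null are counted as `none`. That covers turns with no voice or air supplement, and the CLI path in `backend/main.py`.

```python
from __future__ import annotations

import json
from collections import Counter
from pathlib import Path
from typing import Union


def tally_resolution_sources(
    log_path: Union[str, Path] = "logs/turns.jsonl",
) -> Counter:
    """Count resolved_intent.source values across logged turns (hypothetical helper)."""
    counts: Counter = Counter()
    with open(log_path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            turn = json.loads(line)
            resolved = turn.get("resolved_intent") or {}
            # Null resolved_intent (no supplement, or the CLI path) counts as "none".
            counts[resolved.get("source") or "none"] += 1
    return counts
```

A high share of `conflict_voice` would suggest the air-writing templates are misfiring. A high share of `conflict_air` would point at speech recognition.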
backend/pipeline/nodes/intent.py CHANGED
@@ -256,18 +256,21 @@ def run(state: PipelineState) -> dict:
  }
  ]
 
- air_written = state.get("air_written_text")
- if air_written:
- # Classify the air-written supplement the same way as a normal fragment
- # so a present-tense supplement ("tired") on a present-state question
+ # Prefer resolved_intent.text when the frontend did voice⇄air reconciliation;
+ # fall back to raw air_written_text when no voice was captured.
+ resolved = state.get("resolved_intent") or {}
+ supplement = (resolved.get("text") or "").strip() or state.get("air_written_text")
+ if supplement:
+ # Classify the supplement the same way as a normal fragment so a
+ # present-tense supplement ("tired") on a present-state question
  # doesn't silently flip the route to PERSONAL and re-enable retrieval.
- air_cls = _classify(air_written)
+ sup_cls = _classify(supplement)
  sub_intents.append(
  {
- "type": air_cls,
- "query": air_written,
- "bucket_hint": infer_bucket(air_written)
- if air_cls == "PERSONAL"
+ "type": sup_cls,
+ "query": supplement,
+ "bucket_hint": infer_bucket(supplement)
+ if sup_cls == "PERSONAL"
  else None,
  "priority": priority,
  }
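The precedence in the new `run()` lines packs two fallbacks into one expression. Here it is pulled out into a hypothetical `pick_supplement` helper that uses the same expression. A whitespace-only `resolved_intent.text` falls through to the raw air-written text, and the result is `None` when neither is present, so no supplemental sub-intent is appended.

```python
from __future__ import annotations

from typing import Any, Mapping, Optional


def pick_supplement(state: Mapping[str, Any]) -> Optional[str]:
    """Hypothetical helper: same expression as the new intent.py lines."""
    resolved = state.get("resolved_intent") or {}
    # Prefer the reconciled text; an empty or whitespace-only value falls
    # through to the raw air-written text (which is passed on unstripped).
    return (resolved.get("text") or "").strip() or state.get("air_written_text")
```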
backend/pipeline/nodes/planner.py CHANGED
@@ -95,6 +95,7 @@ def _run_stream(state: PipelineState, tier: str) -> Iterator[dict]:
  style: StyleDirective = gen_cfg["style"]
  gesture_tag = state.get("gesture_tag")
  air_written_text = state.get("air_written_text")
+ resolved_intent = state.get("resolved_intent")
  turnaround_triggered = state.get("turnaround_triggered", False)
  rejected_response: str | None = None
  if turnaround_triggered:
@@ -195,6 +196,7 @@ def _run_stream(state: PipelineState, tier: str) -> Iterator[dict]:
  gen_cfg,
  gesture_tag=gesture_tag,
  air_written_text=air_written_text,
+ resolved_intent=resolved_intent,
  rejected_response=rejected_response,
  rejected_candidates=rejected_candidates,
  intent_kind=intent_kind,
@@ -367,6 +369,7 @@ def _run(state: PipelineState, tier: str) -> dict:
  style: StyleDirective = gen_cfg["style"]
  gesture_tag = state.get("gesture_tag")
  air_written_text = state.get("air_written_text")
+ resolved_intent = state.get("resolved_intent")
  turnaround_triggered = state.get("turnaround_triggered", False)
  rejected_response: str | None = None
  if turnaround_triggered:
@@ -409,6 +412,7 @@ def _run(state: PipelineState, tier: str) -> dict:
  gen_cfg,
  gesture_tag=gesture_tag,
  air_written_text=air_written_text,
+ resolved_intent=resolved_intent,
  rejected_response=rejected_response,
  rejected_candidates=rejected_candidates,
  intent_kind=intent_kind,
@@ -502,6 +506,7 @@ def _run(state: PipelineState, tier: str) -> dict:
  gen_cfg,
  gesture_tag=gesture_tag,
  air_written_text=air_written_text,
+ resolved_intent=resolved_intent,
  rejected_response=rejected_response,
  intent_kind=intent_kind,
  affect=affect,
@@ -541,6 +546,7 @@ def _build_messages(
  gen_cfg: dict,
  gesture_tag: str | None = None,
  air_written_text: str | None = None,
+ resolved_intent: dict | None = None,
  rejected_response: str | None = None,
  rejected_candidates: list[str] | None = None,
  intent_kind: str = "memory",
@@ -560,6 +566,7 @@ def _build_messages(
  gesture_tag,
  air_written_text,
  profile["name"],
+ resolved_intent=resolved_intent,
  rejected_response=rejected_response,
  rejected_candidates=rejected_candidates,
  intent_kind=intent_kind,
@@ -611,6 +618,69 @@ Answering rules:
  --- end character sheet ---"""
 
 
+ def _safe_user_text(s: str) -> str:
+ # voice_text / air_text arrive from untrusted channels (Web Speech,
+ # air-writing DTW). They're f-stringed into LLM messages wrapped in
+ # double-quotes — a transcript containing `"` or newlines would break out
+ # of the quoted region and could inject instructions. Strip those and cap
+ # length. Same pattern as `safe_rejected` for `rejected_response`.
+ return s.replace('"', "'").replace("\n", " ").replace("\r", " ")[:200]
+
+
+ def _format_multimodal_intent(
+ resolved: dict | None, air_written_text: str | None
+ ) -> str:
+ # Branch on resolved_intent.source so the model sees voice⇄air-writing
+ # disagreements explicitly instead of getting a single text without context.
+ if resolved:
+ source = resolved.get("source") or "none"
+ voice_t = _safe_user_text((resolved.get("voice_text") or "").strip())
+ air_t = _safe_user_text((resolved.get("air_text") or "").strip())
+ text = _safe_user_text((resolved.get("text") or "").strip())
+
+ if source == "voice_only" and voice_t:
+ return (
+ f'\nThe user spoke aloud: "{voice_t}". '
+ "Treat this as a supplement to the partner's question — "
+ "a hint or clarification about what they want."
+ )
+ if source == "air_only" and air_t:
+ return (
+ f'\nThe user air-wrote: "{air_t}". '
+ "If this looks like a name, noun, or short phrase, "
+ "incorporate it verbatim into your response; "
+ "otherwise use it as a hint about what they're trying to say."
+ )
+ if source == "agree" and text:
+ return (
+ f'\nThe user spoke and air-wrote the same thing: "{text}". '
+ "This is a strong signal — lean into it when shaping your reply."
+ )
+ if source == "conflict_air" and air_t:
+ return (
+ f'\nThe user spoke "{voice_t}" but also air-wrote "{air_t}". '
+ "The air-written token is a canonical AAC signal "
+ "(help/stop/water/done/more) — prioritise it over the spoken "
+ "words, which may have been misheard."
+ )
+ if source == "conflict_voice" and voice_t:
+ return (
+ f'\nThe user spoke "{voice_t}" but air-wrote "{air_t}" — '
+ "these don't match. The spoken form is richer; treat it as "
+ "the real intent and gently acknowledge the air-writing "
+ "may have been a mis-stroke."
+ )
+
+ if air_written_text:
+ return (
+ f'\nThe user air-wrote: "{_safe_user_text(air_written_text)}". '
+ "If this looks like a name, noun, or short phrase, "
+ "incorporate it verbatim into your response; "
+ "otherwise use it as a hint about what they're trying to say."
+ )
+ return ""
+
+
  def _build_user(
  chunks: list[dict],
  history: list[dict],
@@ -621,6 +691,7 @@ def _build_user(
  air_written_text: str | None,
  persona_name: str,
  *,
+ resolved_intent: dict | None = None,
  rejected_response: str | None = None,
  rejected_candidates: list[str] | None = None,
  intent_kind: str = "memory",
@@ -665,14 +736,7 @@ def _build_user(
  # Gesture opener wins over affect opener — a deliberate thumbs-up is a stronger signal than inferred affect.
  merged_opener = directive["opener_hint"]
 
- air_writing_block = ""
- if air_written_text:
- air_writing_block = (
- f'\nThe user air-wrote: "{air_written_text}". '
- "If this looks like a name, noun, or short phrase, "
- "incorporate it verbatim into your response; "
- "otherwise use it as a hint about what they're trying to say."
- )
+ air_writing_block = _format_multimodal_intent(resolved_intent, air_written_text)
 
  persona_mod = gen_cfg.get("persona_mod", "baseline")
  persona_instruction_line = (
backend/pipeline/state.py CHANGED
@@ -95,6 +95,10 @@ class PipelineState(TypedDict):
  gaze_bucket: str | None # bucket hinted by gaze fixation
  air_written_text: str | None # concatenated air-written chars
  head_signal: str | None # "HEAD_SHAKE" | "HEAD_NOD_DISSATISFIED"
+ voice_text: str | None # raw Web Speech transcript, pre-resolution
+ # Resolved voice⇄air-writing intent. Keys: text, source, voice_text, air_text.
+ # source ∈ voice_only | air_only | agree | conflict_air | conflict_voice.
+ resolved_intent: dict[str, Any] | None
  turnaround_triggered: bool # true when re-planned from dissatisfaction signal
 
  # ── L2: Intent decomposition outputs ─────────────────────────────────────
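`PipelineState` types `resolved_intent` as `dict[str, Any]`. The allowed keys and `source` values appear only in a comment. A stricter alternative, sketched here with hypothetical names, uses a `TypedDict` plus a `Literal` for `source`. The sketch also includes `none`, because the `ResolvedIntent` schema in `backend/api/main.py` allows it, even though `ChatPanel.tsx` sends `null` rather than a `none` payload.

```python
from __future__ import annotations

from typing import Literal, Optional, TypedDict, get_args

# Hypothetical stricter typing; the commit types the field as dict[str, Any].
ResolutionSource = Literal[
    "voice_only", "air_only", "agree", "conflict_air", "conflict_voice", "none"
]


class ResolvedIntentDict(TypedDict):
    text: str
    source: ResolutionSource
    voice_text: Optional[str]
    air_text: Optional[str]


def is_valid_source(value: object) -> bool:
    """True if value is one of the documented resolved_intent.source strings."""
    # get_args() unpacks the Literal, so the allowed values live in one place.
    return value in get_args(ResolutionSource)
```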
frontend/src/App.css CHANGED
@@ -525,6 +525,30 @@ input[type="text"]:hover {
  color: #ffffff !important;
  }
 
+ .mic-btn {
+ background: transparent !important;
+ color: var(--accent) !important;
+ border: 1px solid var(--accent) !important;
+ }
+
+ .mic-btn.listening {
+ background: var(--accent) !important;
+ color: #ffffff !important;
+ animation: mic-pulse 1.1s ease-in-out infinite;
+ }
+
+ @keyframes mic-pulse {
+ 0%, 100% { opacity: 1; }
+ 50% { opacity: 0.6; }
+ }
+
+ .voice-status {
+ padding: 4px 12px;
+ font-size: 12px;
+ color: var(--text-muted);
+ font-family: var(--sans);
+ }
+
  .eval-panel {
  margin-top: 10px;
  border-top: 1px solid var(--border);
frontend/src/components/ChatPanel.tsx CHANGED
@@ -14,6 +14,9 @@ import {
14
  streamRegenerate,
15
  } from "../lib/api";
16
  import { EvalPanel } from "./EvalPanel";
 
 
 
17
 
18
  const STRATEGY_LABELS: Record<string, string> = {
19
  broad: "broad — all memories",
@@ -135,6 +138,10 @@ export function ChatPanel({
135
  const [turnaroundLoading, setTurnaroundLoading] = useState(false);
136
  const [regenerateLoading, setRegenerateLoading] = useState(false);
137
  const { queueToken, flushNow } = useTokenBatcher(setMessages);
 
 
 
 
138
  const bottomRef = useRef<HTMLDivElement>(null);
139
  const lastResponseTsRef = useRef<number>(0);
140
  const lastTurnIdRef = useRef<number | null>(null);
@@ -156,6 +163,8 @@ export function ChatPanel({
156
  lastResponseTsRef.current = 0;
157
  evalPollAbortsRef.current.forEach((ac) => ac.abort());
158
  evalPollAbortsRef.current.clear();
 
 
159
  }, [userId]);
160
 
161
  useEffect(() => {
@@ -432,6 +441,8 @@ export function ChatPanel({
432
  setLoading(true);
433
 
434
  const airText = sensing.airWrittenText || null;
 
 
435
 
436
  // Push the partner bubble, and a placeholder AAC message we'll fill in
437
  // progressively. We need the placeholder's index to target updates — use
@@ -470,6 +481,8 @@ export function ChatPanel({
470
  gaze_bucket: sensing.gazeBucket,
471
  air_written_text: airText,
472
  head_signal: sensing.headSignal,
 
 
473
  },
474
  (evt) => {
475
  if (evt.type === "token") {
@@ -531,6 +544,10 @@ export function ChatPanel({
531
  }));
532
  } finally {
533
  if (airText) onAirTextConsumed();
 
 
 
 
534
  setLoading(false);
535
  }
536
  }
@@ -568,6 +585,31 @@ export function ChatPanel({
568
  [messages, setMessages, userId]
569
  );
570
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
571
  return (
572
  <div className="chat-panel">
573
  <div className="chat-header">
@@ -683,6 +725,11 @@ export function ChatPanel({
683
  )}
684
  <div ref={bottomRef} />
685
  </div>
 
 
 
 
 
686
  <div className="chat-input-row">
687
  <input
688
  type="text"
@@ -695,6 +742,26 @@ export function ChatPanel({
695
  <button onClick={handleSend} disabled={!userId || loading || !backendReady || !input.trim()}>
696
  Send
697
  </button>
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
698
  </div>
699
  </div>
700
  );
 
14
  streamRegenerate,
15
  } from "../lib/api";
16
  import { EvalPanel } from "./EvalPanel";
17
+ import { useVoice } from "../hooks/useVoice";
18
+ import { isVoiceCapable } from "../lib/voiceEligibility";
19
+ import { resolveIntent } from "../lib/resolveIntent";
20
 
21
  const STRATEGY_LABELS: Record<string, string> = {
22
  broad: "broad — all memories",
 
138
  const [turnaroundLoading, setTurnaroundLoading] = useState(false);
139
  const [regenerateLoading, setRegenerateLoading] = useState(false);
140
  const { queueToken, flushNow } = useTokenBatcher(setMessages);
141
+ const [voiceText, setVoiceText] = useState<string | null>(null);
142
+ const [voiceNote, setVoiceNote] = useState<string | null>(null);
143
+ const voice = useVoice();
144
+ const micAvailable = isVoiceCapable(userId) && voice.supported;
145
  const bottomRef = useRef<HTMLDivElement>(null);
146
  const lastResponseTsRef = useRef<number>(0);
147
  const lastTurnIdRef = useRef<number | null>(null);
 
163
  lastResponseTsRef.current = 0;
164
  evalPollAbortsRef.current.forEach((ac) => ac.abort());
165
  evalPollAbortsRef.current.clear();
166
+ setVoiceText(null);
167
+ setVoiceNote(null);
168
  }, [userId]);
169
 
170
  useEffect(() => {
 
441
  setLoading(true);
442
 
443
  const airText = sensing.airWrittenText || null;
444
+ const vText = voiceText;
445
+    const resolved = resolveIntent(vText, airText);
 
     // Push the partner bubble, and a placeholder AAC message we'll fill in
     // progressively. We need the placeholder's index to target updates — use

           gaze_bucket: sensing.gazeBucket,
           air_written_text: airText,
           head_signal: sensing.headSignal,
+          voice_text: vText,
+          resolved_intent: resolved.source === "none" ? null : resolved,
         },
         (evt) => {
           if (evt.type === "token") {

             }));
     } finally {
       if (airText) onAirTextConsumed();
+      // Clear voice state unconditionally — a failed send shouldn't silently
+      // re-attach a stale transcript to the next turn. User can re-tap mic.
+      setVoiceText(null);
+      setVoiceNote(null);
       setLoading(false);
     }
   }

     [messages, setMessages, userId]
   );
 
+  const handleMic = useCallback(async () => {
+    if (!micAvailable || voice.listening) return;
+    setVoiceNote("Listening...");
+    try {
+      const cap = await voice.capture();
+      if (cap.transcript) {
+        setVoiceText(cap.transcript);
+        setVoiceNote(`Heard: "${cap.transcript}"`);
+      } else {
+        setVoiceNote("No speech detected.");
+      }
+    } catch (e) {
+      setVoiceNote(
+        `Mic error: ${e instanceof Error ? e.message : "failed"}`
+      );
+    }
+  }, [micAvailable, voice]);
+
+  const canTurnaround =
+    !!userId &&
+    backendReady &&
+    !loading &&
+    !turnaroundLoading &&
+    lastTurnIdRef.current !== null;
+
   return (
     <div className="chat-panel">
       <div className="chat-header">

         )}
         <div ref={bottomRef} />
       </div>
+      {micAvailable && voiceNote && (
+        <div className="voice-status" aria-live="polite">
+          {voiceNote}
+        </div>
+      )}
       <div className="chat-input-row">
         <input
           type="text"

         <button onClick={handleSend} disabled={!userId || loading || !backendReady || !input.trim()}>
           Send
         </button>
+        {micAvailable && (
+          <button
+            type="button"
+            className={`mic-btn${voice.listening ? " listening" : ""}`}
+            onClick={handleMic}
+            disabled={!backendReady || loading || voice.listening}
+            title="Capture a short voice utterance — resolved against air-writing before sending"
+          >
+            {voice.listening ? "🎤 Listening…" : "🎤 Speak"}
+          </button>
+        )}
+        <button
+          type="button"
+          className="turnaround-btn"
+          onClick={() => handleTurnaround("manual")}
+          disabled={!canTurnaround}
+          title="Re-plan the last response (also triggered by a head shake / sharp nod)"
+        >
+          ↻ Not quite right
+        </button>
       </div>
     </div>
   );
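The `finally` block clears the captured transcript whether or not the request succeeds. A minimal synchronous sketch of that pattern, assuming a plain `VoiceState` object and a `send` callback as hypothetical stand-ins for the component's React state setters and its async streaming call (neither name is part of this commit):

```typescript
// Hypothetical stand-in for ChatPanel's voiceText / voiceNote React state.
interface VoiceState {
  voiceText: string | null;
  voiceNote: string | null;
}

// Hypothetical helper: sends one turn with the current transcript, then clears
// the voice state unconditionally so a failed send cannot carry the transcript
// into the next turn. Returns true if `send` completed, false if it threw.
function sendTurn(
  state: VoiceState,
  send: (voiceText: string | null) => void
): boolean {
  const vText = state.voiceText; // snapshot for this turn only
  try {
    send(vText);
    return true;
  } catch {
    return false;
  } finally {
    state.voiceText = null;
    state.voiceNote = null;
  }
}

// Usage: even when the send fails, the transcript does not survive.
const state: VoiceState = {
  voiceText: "more water",
  voiceNote: 'Heard: "more water"',
};
const ok = sendTurn(state, () => {
  throw new Error("network down");
});
console.log(ok, state.voiceText); // false null
```

The real handler is async and also resets loading state; the point of the sketch is only that clearing lives in `finally`, not in the success path.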
frontend/src/hooks/useVoice.ts ADDED
@@ -0,0 +1,173 @@
+import { useCallback, useEffect, useRef, useState } from "react";
+
+// Thin wrapper around the Web Speech API. Chromium browsers and Safari 14.1+
+// expose `webkitSpeechRecognition`; Firefox doesn't, so we no-op gracefully.
+
+type SRCtor = new () => SpeechRecognitionLike;
+
+interface SpeechRecognitionLike {
+  lang: string;
+  continuous: boolean;
+  interimResults: boolean;
+  maxAlternatives: number;
+  onresult: ((e: SpeechRecognitionEventLike) => void) | null;
+  onerror: ((e: { error: string }) => void) | null;
+  onend: (() => void) | null;
+  start: () => void;
+  stop: () => void;
+  abort: () => void;
+}
+
+interface SpeechRecognitionEventLike {
+  results: {
+    length: number;
+    [index: number]: {
+      isFinal: boolean;
+      length: number;
+      [index: number]: { transcript: string; confidence: number };
+    };
+  };
+}
+
+function getRecognitionCtor(): SRCtor | null {
+  if (typeof window === "undefined") return null;
+  const w = window as unknown as {
+    SpeechRecognition?: SRCtor;
+    webkitSpeechRecognition?: SRCtor;
+  };
+  return w.SpeechRecognition ?? w.webkitSpeechRecognition ?? null;
+}
+
+export interface VoiceCapture {
+  transcript: string;
+  confidence: number;
+}
+
+const WINDOW_MS = 3500;
+
+export function useVoice() {
+  const [supported] = useState(() => getRecognitionCtor() !== null);
+  const [listening, setListening] = useState(false);
+  const [error, setError] = useState<string | null>(null);
+  const recRef = useRef<SpeechRecognitionLike | null>(null);
+  const resolveRef = useRef<((v: VoiceCapture) => void) | null>(null);
+  const rejectRef = useRef<((err: Error) => void) | null>(null);
+  const bestRef = useRef<VoiceCapture>({ transcript: "", confidence: 0 });
+  const timerRef = useRef<number | null>(null);
+
+  const cleanup = useCallback(() => {
+    if (timerRef.current !== null) {
+      window.clearTimeout(timerRef.current);
+      timerRef.current = null;
+    }
+    const rec = recRef.current;
+    if (rec) {
+      try {
+        rec.abort();
+      } catch {
+        // ignore — some browsers throw if already stopped
+      }
+      rec.onresult = null;
+      rec.onerror = null;
+      rec.onend = null;
+    }
+    recRef.current = null;
+    resolveRef.current = null;
+    rejectRef.current = null;
+    setListening(false);
+  }, []);
+
+  // Unmount teardown: reject any in-flight promise so `await voice.capture()`
+  // in a parent component doesn't hang forever when the tree unmounts mid-listen.
+  useEffect(
+    () => () => {
+      const rj = rejectRef.current;
+      cleanup();
+      if (rj) rj(new Error("unmounted"));
+    },
+    [cleanup]
+  );
+
+  const capture = useCallback((): Promise<VoiceCapture> => {
+    const Ctor = getRecognitionCtor();
+    if (!Ctor) {
+      return Promise.reject(new Error("Speech recognition not supported"));
+    }
+    if (recRef.current) {
+      return Promise.reject(new Error("Already listening"));
+    }
+
+    return new Promise<VoiceCapture>((resolve, reject) => {
+      const rec = new Ctor();
+      rec.lang = navigator.language || "en-US";
+      rec.continuous = false;
+      rec.interimResults = false;
+      rec.maxAlternatives = 1;
+
+      bestRef.current = { transcript: "", confidence: 0 };
+      resolveRef.current = resolve;
+      rejectRef.current = reject;
+      recRef.current = rec;
+      setError(null);
+
+      rec.onresult = (e) => {
+        for (let i = 0; i < e.results.length; i++) {
+          const res = e.results[i];
+          if (!res.isFinal) continue;
+          const alt = res[0];
+          if (alt && alt.transcript.trim().length > 0) {
+            // Some engines may report confidence 0 for final results, so keep
+            // the first non-empty transcript, then prefer higher confidence.
+            const best = bestRef.current;
+            if (!best.transcript || alt.confidence > best.confidence) {
+              bestRef.current = {
+                transcript: alt.transcript.trim(),
+                confidence: alt.confidence,
+              };
+            }
+          }
+        }
+      };
+
+      rec.onerror = (e) => {
+        const msg = e.error || "recognition error";
+        setError(msg);
+        const rj = rejectRef.current;
+        cleanup();
+        if (rj) rj(new Error(msg));
+      };
+
+      rec.onend = () => {
+        const rs = resolveRef.current;
+        const best = bestRef.current;
+        cleanup();
+        if (rs) rs(best);
+      };
+
+      try {
+        rec.start();
+        setListening(true);
+        timerRef.current = window.setTimeout(() => {
+          try {
+            rec.stop();
+          } catch {
+            // onend will still fire
+          }
+        }, WINDOW_MS);
+      } catch (err) {
+        const msg = err instanceof Error ? err.message : "failed to start";
+        setError(msg);
+        cleanup();
+        reject(new Error(msg));
+      }
+    });
+  }, [cleanup]);
+
+  const cancel = useCallback(() => {
+    const rj = rejectRef.current;
+    cleanup();
+    if (rj) rj(new Error("cancelled"));
+  }, [cleanup]);
+
+  return { supported, listening, error, capture, cancel };
+}
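The `onresult` handler is the only part of `useVoice` that does not need a live browser engine, so its selection rule can be pulled out and exercised on plain objects. A sketch under that assumption: `pickBestFinal`, `ResultLike` and `AlternativeLike` are hypothetical names for this example, and the zero-confidence rule reflects the defensive reading that an engine may report `confidence` as 0 for a valid final result. With a strict `>` against an initial confidence of 0, such a transcript would otherwise be discarded.

```typescript
// Hypothetical structural stand-ins mirroring SpeechRecognitionEventLike.
interface AlternativeLike {
  transcript: string;
  confidence: number;
}

interface ResultLike {
  isFinal: boolean;
  length: number;
  [index: number]: AlternativeLike;
}

interface VoiceCapture {
  transcript: string;
  confidence: number;
}

// Hypothetical pure version of the onresult loop: skip interim results, ignore
// blank transcripts, keep the first non-empty final transcript even at
// confidence 0, and afterwards replace it only with a more confident one.
function pickBestFinal(
  results: ArrayLike<ResultLike>,
  start: VoiceCapture = { transcript: "", confidence: 0 }
): VoiceCapture {
  let best = start;
  for (let i = 0; i < results.length; i++) {
    const res = results[i];
    if (!res.isFinal || res.length === 0) continue;
    const alt = res[0];
    const text = alt.transcript.trim();
    if (text.length === 0) continue;
    if (!best.transcript || alt.confidence > best.confidence) {
      best = { transcript: text, confidence: alt.confidence };
    }
  }
  return best;
}

// Usage: a zero-confidence final result is still captured.
const captured = pickBestFinal([
  { isFinal: false, length: 1, 0: { transcript: "wat", confidence: 0.2 } },
  { isFinal: true, length: 1, 0: { transcript: " more water ", confidence: 0 } },
]);
console.log(captured); // { transcript: 'more water', confidence: 0 }
```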
frontend/src/lib/resolveIntent.ts ADDED
@@ -0,0 +1,120 @@
+import { DEFAULT_AIR_TEMPLATES } from "./airTemplates";
+
+// Canonical AAC tokens that carry high signal when someone air-writes them —
+// short, action-oriented, and hard to confuse for casual chat. When the
+// voice transcript and the air-written text disagree, these tokens win.
+const AAC_PRIORITY_TOKENS: ReadonlySet<string> = new Set(
+  ["help", "stop", "water", "done", "more"].filter((t) =>
+    DEFAULT_AIR_TEMPLATES.has(t)
+  )
+);
+
+export type ResolvedSource =
+  | "voice_only"
+  | "air_only"
+  | "agree"
+  | "conflict_air"
+  | "conflict_voice"
+  | "none";
+
+export interface ResolvedIntent {
+  text: string;
+  source: ResolvedSource;
+  voice_text: string | null;
+  air_text: string | null;
+}
+
+function normalise(s: string | null | undefined): string {
+  return (s ?? "").trim().toLowerCase();
+}
+
+function tokens(s: string): Set<string> {
+  return new Set(
+    s
+      .toLowerCase()
+      .replace(/[^a-z0-9\s]/g, " ")
+      .split(/\s+/)
+      .filter((w) => w.length > 1)
+  );
+}
+
+function jaccard(a: Set<string>, b: Set<string>): number {
+  if (a.size === 0 || b.size === 0) return 0;
+  let inter = 0;
+  for (const tok of a) if (b.has(tok)) inter++;
+  const union = a.size + b.size - inter;
+  return union === 0 ? 0 : inter / union;
+}
+
+// True when every token of `needle` also appears in `hay` (whole-word match).
+function containsAll(hay: Set<string>, needle: Set<string>): boolean {
+  if (needle.size === 0) return false;
+  for (const tok of needle) if (!hay.has(tok)) return false;
+  return true;
+}
+
+export function resolveIntent(
+  voiceRaw: string | null,
+  airRaw: string | null
+): ResolvedIntent {
+  const voice = normalise(voiceRaw);
+  const air = normalise(airRaw);
+
+  if (!voice && !air) {
+    return { text: "", source: "none", voice_text: null, air_text: null };
+  }
+  if (voice && !air) {
+    return {
+      text: voice,
+      source: "voice_only",
+      voice_text: voice,
+      air_text: null,
+    };
+  }
+  if (!voice && air) {
+    return { text: air, source: "air_only", voice_text: null, air_text: air };
+  }
+
+  // Both present.
+  const voiceTokens = tokens(voice);
+  const airTokens = tokens(air);
+  const overlap = jaccard(voiceTokens, airTokens);
+
+  // Every word of one input appears as a whole word in the other, or the two
+  // overlap heavily: the user probably said the word while also writing it.
+  // Whole-word matching stops e.g. air "more" from agreeing with "tomorrow".
+  const agreement =
+    voice === air ||
+    containsAll(voiceTokens, airTokens) ||
+    containsAll(airTokens, voiceTokens) ||
+    overlap >= 0.5;
+
+  if (agreement) {
+    // Prefer the longer / richer form (usually voice), but mark source as agree.
+    const winner = voice.length >= air.length ? voice : air;
+    return {
+      text: winner,
+      source: "agree",
+      voice_text: voice,
+      air_text: air,
+    };
+  }
+
+  // Genuine conflict. AAC priority tokens (help/stop/water/done/more) dominate.
+  if (AAC_PRIORITY_TOKENS.has(air)) {
+    return {
+      text: air,
+      source: "conflict_air",
+      voice_text: voice,
+      air_text: air,
+    };
+  }
+
+  // Otherwise voice wins — higher information density.
+  return {
+    text: voice,
+    source: "conflict_voice",
+    voice_text: voice,
+    air_text: air,
+  };
+}
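The agreement test in `resolveIntent` decides whether the two channels said the same thing before any priority rule runs. The sketch below reimplements the file's `tokens` and `jaccard` helpers (Jaccard index J(A, B) = |A ∩ B| / |A ∪ B| over word sets) together with a whole-word `containsAll` check, a name chosen for this example, to show why raw substring matching is fragile for the priority tokens: `"see you tomorrow".includes("more")` is true, so an air-written "more" would be folded into the voice transcript as agreement and never reach the `conflict_air` branch. The example strings are illustrative only.

```typescript
// Equivalent to the tokenizer in resolveIntent.ts: lowercase, strip non
// [a-z0-9] characters, split on whitespace, drop one-character words.
function tokens(s: string): Set<string> {
  return new Set(
    s
      .toLowerCase()
      .replace(/[^a-z0-9\s]/g, " ")
      .split(/\s+/)
      .filter((w) => w.length > 1)
  );
}

// J(A, B) = |A ∩ B| / |A ∪ B|, with |A ∪ B| = |A| + |B| - |A ∩ B|.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 || b.size === 0) return 0;
  let inter = 0;
  for (const tok of a) if (b.has(tok)) inter++;
  return inter / (a.size + b.size - inter);
}

// Illustrative whole-word containment: every token of `needle` is in `hay`.
function containsAll(hay: Set<string>, needle: Set<string>): boolean {
  if (needle.size === 0) return false;
  for (const tok of needle) if (!hay.has(tok)) return false;
  return true;
}

const voice = "see you tomorrow";
const air = "more";

// A raw substring test finds "more" inside "tomorrow" and reports agreement...
console.log(voice.includes(air)); // true
// ...while the token-level checks see no shared word.
console.log(containsAll(tokens(voice), tokens(air))); // false
console.log(jaccard(tokens(voice), tokens(air))); // 0

// Genuine overlap: {get, more, water} vs {more, water} -> 2 / 3
console.log(jaccard(tokens("Get more water!"), tokens("more water")));
```

Whole-word matching trades away some fuzziness (air "stop" no longer agrees with voice "stopped"), so which behaviour is preferable depends on how strongly the priority tokens are meant to win.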
frontend/src/lib/voiceEligibility.ts ADDED
@@ -0,0 +1,18 @@
+// Personas for whom a live-mic button makes sense.
+// Gate reflects each persona's real-world speech access, not the in-universe
+// voice of the character: we hide the mic whenever the modelled access method
+// is non-verbal (locked-in, letterboard, dictation-to-assistant, etc.), even
+// if the character can "speak" in their canon.
+export const VOICE_CAPABLE_PERSONAS: ReadonlySet<string> = new Set([
+  "abed_nadir",
+  "allie_calhoun",
+  "forrest_gump",
+  "gabby_giffords",
+  "michael_j_fox",
+  "raymond_babbitt",
+  "walter_jr_white",
+]);
+
+export function isVoiceCapable(userId: string | null): boolean {
+  return !!userId && VOICE_CAPABLE_PERSONAS.has(userId);
+}
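`isVoiceCapable` answers only the persona half of the question; the browser must also expose a recognition constructor (`useVoice().supported`). How ChatPanel derives `micAvailable` is not shown in this diff, so the following is a hypothetical sketch of combining the two gates, with `micAvailableFor` as an illustrative name and the persona set trimmed to two entries:

```typescript
// Trimmed copy of VOICE_CAPABLE_PERSONAS for the example.
const VOICE_CAPABLE_PERSONAS: ReadonlySet<string> = new Set([
  "forrest_gump",
  "michael_j_fox",
]);

function isVoiceCapable(userId: string | null): boolean {
  return !!userId && VOICE_CAPABLE_PERSONAS.has(userId);
}

// Hypothetical: offer the mic only when the persona has verbal access AND the
// browser exposes a speech recognition constructor.
function micAvailableFor(userId: string | null, speechSupported: boolean): boolean {
  return speechSupported && isVoiceCapable(userId);
}

console.log(micAvailableFor("forrest_gump", true)); // true
console.log(micAvailableFor("forrest_gump", false)); // false (no recognition constructor)
console.log(micAvailableFor(null, true)); // false (no persona selected)
```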
frontend/src/types.ts CHANGED
@@ -28,6 +28,21 @@ export interface Persona {
   style: string;
 }
 
+export type ResolvedSource =
+  | "voice_only"
+  | "air_only"
+  | "agree"
+  | "conflict_air"
+  | "conflict_voice"
+  | "none";
+
+export interface ResolvedIntent {
+  text: string;
+  source: ResolvedSource;
+  voice_text: string | null;
+  air_text: string | null;
+}
+
 export interface ChatRequest {
   user_id: string;
   query: string;
@@ -36,6 +51,8 @@ export interface ChatRequest {
   gaze_bucket: MemoryBucket | null;
   air_written_text: string | null;
   head_signal?: HeadSignal | null;
+  voice_text?: string | null;
+  resolved_intent?: ResolvedIntent | null;
 }
 
 export interface TurnaroundRequest {