shwetangisingh committed
Commit af222c8 · 1 Parent(s): ce51e88

multimodal sensing → real stylistic constraints


Affect, gestures, and air-writing now actually steer the LLM's
word choice instead of just being metadata. Each emotion maps to
a StyleDirective (register, prefer/avoid words, opener hint,
exemplar) that's rendered as explicit instructions in the
per-turn user message; gestures override the opener when present;
air-writing recognises 8 single-stroke shapes (yes, ?, hi, help,
done, more, water, stop) and both biases retrieval via bucket
keywords and gets incorporated verbatim by the planner.
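
To make the directive-to-instruction step concrete, here is a minimal sketch of how a style directive might be rendered into per-turn instruction lines, with a gesture opener taking priority over the affect opener. The `StyleDirective` shape below is a simplified stand-in for the one in this commit, and `render_style`, its parameter names, and the instruction wording are all hypothetical, not the planner's actual code.

```python
from typing import Optional, TypedDict


class StyleDirective(TypedDict):
    """Simplified stand-in for the directive described in the commit message."""
    register: str
    prefer_words: list[str]
    avoid_words: list[str]
    opener_hint: Optional[str]
    exemplar: str


def render_style(style: StyleDirective, gesture_opener: Optional[str] = None) -> str:
    """Render a directive as explicit instruction lines (hypothetical wording)."""
    # A gesture-supplied opener, when present, overrides the affect opener.
    opener = gesture_opener or style["opener_hint"]
    lines = [f"Register: {style['register']}"]
    if style["prefer_words"]:
        lines.append("Prefer words like: " + ", ".join(style["prefer_words"]))
    if style["avoid_words"]:
        lines.append("Avoid words like: " + ", ".join(style["avoid_words"]))
    if opener:
        lines.append(f"Opener: {opener}")
    if style["exemplar"]:
        lines.append(f'Example of the target voice: "{style["exemplar"]}"')
    return "\n".join(lines)


happy: StyleDirective = {
    "register": "warm, upbeat, affectionate",
    "prefer_words": ["glad", "love", "great"],
    "avoid_words": ["unfortunately", "sorry"],
    "opener_hint": None,
    "exemplar": "Yeah, honestly, that made my week.",
}

# Affect alone: no opener line, because HAPPY has no opener hint.
print(render_style(happy))
# Thumbs-up present: the gesture opener is added on top of the affect directive.
print(render_style(happy, gesture_opener="Lead with an affirmation."))
```

Keeping the rendering in one small function makes it easy to log exactly what the LLM was told on each turn.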

Fixes along the way:
- LCP was measuring mouth x-drift, not vertical lip-corner pull.
  Rewrote it as (mouth_centre.y - corner_avg.y) / inter_ocular
  and retuned thresholds; FRUSTRATED now has a second trigger
  path (brows lowered + squinting).
- Calibration now averages the first 30 frames instead of a
  single-frame snapshot. Affect stays null during calibration
  but gaze/gesture/air-writing still flow.
- Deepcopy the shared _AFFECT_CONFIG in both intent paths so
  downstream mutations can't corrupt the module constant.
- compute_multimodal_alignment now returns non-zero scores
  (affect via sentiment lexicon, gesture via opener regex,
  gaze via retrieved-chunk bucket match).
- LLM temperature 0.4 → 0.8 so the sensing→output link is
  actually visible in the response.
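
The LCP and calibration fixes can be sketched as follows. This is an illustrative Python rendering only: the real code lives in the TypeScript frontend, and `Point`, `lip_corner_pull`, and `BaselineCalibrator` are hypothetical names. It assumes image coordinates where y grows downward, so corners pulled above the mouth centre give a positive value.

```python
import math
from dataclasses import dataclass
from statistics import fmean
from typing import Optional


@dataclass
class Point:
    x: float
    y: float


def lip_corner_pull(mouth_centre: Point, left_corner: Point, right_corner: Point,
                    left_eye: Point, right_eye: Point) -> float:
    """(mouth_centre.y - corner_avg.y) / inter_ocular, as in the commit message."""
    corner_avg_y = (left_corner.y + right_corner.y) / 2.0
    inter_ocular = math.hypot(right_eye.x - left_eye.x, right_eye.y - left_eye.y)
    if inter_ocular == 0.0:
        return 0.0  # degenerate landmarks; avoid dividing by zero
    return (mouth_centre.y - corner_avg_y) / inter_ocular


class BaselineCalibrator:
    """Average the first N frames into a neutral baseline (N = 30 in the commit)."""

    def __init__(self, frames_needed: int = 30) -> None:
        self.frames_needed = frames_needed
        self._samples: list[float] = []
        self.baseline: Optional[float] = None

    def update(self, value: float) -> Optional[float]:
        """Return the baseline-relative value, or None while still calibrating."""
        if self.baseline is None:
            self._samples.append(value)
            if len(self._samples) >= self.frames_needed:
                self.baseline = fmean(self._samples)
            return None
        return value - self.baseline
```

Because only y-coordinates enter the numerator, sliding the whole face sideways in the frame leaves the value unchanged, which is the property the old x-based measure lacked. Returning `None` during calibration mirrors "affect stays null" while other signals keep flowing.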

README.md CHANGED
@@ -288,7 +288,7 @@ multimodal_aac_chatbot/
288
  │ │ ├── graph.py run_pipeline() — plain function chain
289
  │ │ ├── state.py PipelineState TypedDict
290
  │ │ └── nodes/ intent, retrieval, planner, feedback
291
- │ ├── sensing/labels.py GESTURE_TO_TAG (sensing runs in browser)
292
  │ ├── retrieval/ BGE embeddings (torch tensor) + bucket priors
293
  │ ├── generation/llm_client.py 2-tier Ollama Cloud LLM client (primary/fallback)
294
  │ └── guardrails/checks.py Input + output safety checks
@@ -340,7 +340,7 @@ Adding a new persona: drop a JSON file into `data/memories/` following the schem
340
 
341
  From the spec (pages 10–11). Tags: **[Core]** = must do, **[Bonus]** = nice to have, **[Eval]** = for the grade.
342
 
343
- Heads up: all camera/sensing stuff is in the frontend (MediaPipe JS). Backend just gets the labels (`affect`, `gesture_tag`, `gaze_bucket`). The `backend/sensing/` python modules are dead code.
344
 
345
  ### Dataset
346
 
@@ -359,10 +359,13 @@ Heads up: all camera/sensing stuff is in the frontend (MediaPipe JS). Backend ju
359
  - [x] intent-aware turnaround: PERSONAL re-retrieves excluding the rejected bucket *and* exact rejected chunk texts (with `turnaround_min_score` floor — falls back to original chunks rather than degrading); PRESENT_STATE flips emotional read or admits uncertainty
360
  - [x] UI: rejected bubble gets strikethrough + "rephrased" badge, new bubble appended with "↻ turnaround" badge — both visible (you can't unsay something to a partner). Manual "↻ Not quite right" button as fallback
361
  - [x] guards: `turnaroundConsumedTurnRef` prevents self-retrigger loops; backend `turn_id` returned in `ChatResponse` so frontend doesn't desync on persona switch; stale-turn 409
362
- - [ ] **[Core]** Smile / positive affect should actually change the wording (more positive lexicon), not just be metadata. Right now it's annotated in the prompt but we never checked if the LLM is doing anything with it probably need a stronger constraint or example in the prompt
363
- - [ ] **[Core]** Air-writing is treated as raw text appended to the query. Spec wants it as a stylistic constraint too should it bias tone, or stay query-only? Decide and document
 
 
 
364
  - [ ] **[Bonus]** Voice + air-writing conflict resolution. Capture short voice (Web Speech API), compare to air-written intent, send a `resolved_intent`
365
- - [ ] thumbs-up only changes the prompt today should also boost affirmative candidates in the reranker
366
 
367
  ### Intent decomposition
368
 
@@ -386,6 +389,7 @@ Heads up: all camera/sensing stuff is in the frontend (MediaPipe JS). Backend ju
386
  - [ ] **[Core]** API returns one response. Should return multiple candidates so the user can pick (and so the next item works)
387
  - [ ] **[Core]** Frontend needs a candidate picker — show all the options, let the user click one, send the selection back
388
  - [ ] **[Bonus]** When user picks a candidate, save the `(query, picked)` pair to a side vector index and check it first next turn
 
389
 
390
  ### Evals
391
 
@@ -395,20 +399,21 @@ Live per-turn scores show up in the `EvalPanel`. State:
395
  |--------|--------|
396
  | Efficiency | works (SLO check on `t_total`) |
397
  | Faithfulness | stub, returns 0 |
398
- | Multimodal alignment | stub, returns 0 |
399
  | Authenticity | star rating in UI but not saved |
400
 
401
  - [ ] **[Eval]** Faithfulness — actually check if the response is grounded in what we retrieved. NLI model, sentence-level. If we didn't retrieve anything, flag `no_evidence` instead of pretending we scored it
402
  - [ ] **[Eval]** Efficiency — per-turn SLO check is done, but for the writeup we need aggregate latency: p50/p95 across a fixed query set, broken out by LLM tier. Spec target is < 6s
403
- - [ ] **[Eval]** Multimodal alignment — does the response actually reflect the gesture/affect/gaze? Don't need a model for this, just reuse the word maps the planner already has. Gaze one is trickier — check whether the chunks we ended up using came from the bucket the user was looking at
404
  - [ ] **[Eval]** Authenticity — the Likert stars are wired up in the UI but go nowhere. Save them, log them with the turn so we can actually look at them later
405
  - [ ] **[Eval]** For the live in-class eval: figure out the actual session — who rates (partners + experts per spec), how many turns each, what gets shown to them. The Likert form is the easy part; the protocol isn't written down anywhere
406
  - [ ] **[Eval]** Need an offline version of all three model-driven evals (faithfulness / alignment / efficiency). Aggregate numbers across a fixed query set per persona for the writeup
407
 
408
  ### Cleanup
409
 
410
- - [ ] move the affect→tone / persona override dicts out of code into a yaml
411
  - [x] delete `backend/sensing/` (dead code, sensing is in frontend) — done, only `labels.py` remains
 
412
 
413
  ---
414
 
 
288
  │ │ ├── graph.py run_pipeline() — plain function chain
289
  │ │ ├── state.py PipelineState TypedDict
290
  │ │ └── nodes/ intent, retrieval, planner, feedback
291
+ │ ├── sensing/labels.py GESTURE_DIRECTIVES (sensing runs in browser)
292
  │ ├── retrieval/ BGE embeddings (torch tensor) + bucket priors
293
  │ ├── generation/llm_client.py 2-tier Ollama Cloud LLM client (primary/fallback)
294
  │ └── guardrails/checks.py Input + output safety checks
 
340
 
341
  From the spec (pages 10–11). Tags: **[Core]** = must do, **[Bonus]** = nice to have, **[Eval]** = for the grade.
342
 
343
+ Heads up: all camera/sensing stuff is in the frontend (MediaPipe JS). Backend just gets the labels (`affect`, `gesture_tag`, `gaze_bucket`). Only `backend/sensing/labels.py` (`GESTURE_DIRECTIVES`) lives on the backend.
344
 
345
  ### Dataset
346
 
 
359
  - [x] intent-aware turnaround: PERSONAL re-retrieves excluding the rejected bucket *and* exact rejected chunk texts (with `turnaround_min_score` floor — falls back to original chunks rather than degrading); PRESENT_STATE flips emotional read or admits uncertainty
360
  - [x] UI: rejected bubble gets strikethrough + "rephrased" badge, new bubble appended with "↻ turnaround" badge — both visible (you can't unsay something to a partner). Manual "↻ Not quite right" button as fallback
361
  - [x] guards: `turnaroundConsumedTurnRef` prevents self-retrigger loops; backend `turn_id` returned in `ChatResponse` so frontend doesn't desync on persona switch; stale-turn 409
362
+ - [x] **[Core]** Smile / positive affect actually changes wording now. Affect compiles into a `StyleDirective` (register + prefer/avoid words + exemplar + opener hint) rendered as explicit instructions in the turn-specific user message — see `_AFFECT_CONFIG` in [backend/pipeline/nodes/intent.py](backend/pipeline/nodes/intent.py) and `_build_user` in [backend/pipeline/nodes/planner.py](backend/pipeline/nodes/planner.py). The persona's own `stylistic_preferences` (from the memory JSONs) carry the stable baseline in the cached system message; the affect directive is how that baseline shifts per turn. Measured by `compute_multimodal_alignment` (positive/negative lexicon).
363
+ - Fixed a long-standing bug where LCP (lip-corner pull) was accidentally the *x-coordinate* of the mouth centre, so it drifted on head turns and almost never fired FRUSTRATED. Now measured as vertical pull of the corners relative to mouth centre, normalised by inter-ocular distance. HAPPY/FRUSTRATED thresholds retuned to the new scale; FRUSTRATED also triggers on brows-lowered + squinting as a second path. See `computeAffectVector` and `classifyAffect` in [frontend/src/lib/sensing.ts](frontend/src/lib/sensing.ts).
364
+ - Calibration is now averaged over the first 30 frames (~1s of neutral face) instead of a single-frame snapshot — a brief smile at startup used to lock in a biased baseline. Affect stays null during calibration; gaze/head/gesture/air-writing still flow.
365
+ - [x] **[Core]** Gestures (`THUMBS_UP` / `THUMBS_DOWN` / `POINTING` / `WAVING`) now carry an `opener_hint` via `GESTURE_DIRECTIVES` in [backend/sensing/labels.py](backend/sensing/labels.py). A detected thumbs-up overrides the affect opener and tells the LLM to lead with an affirmation.
366
+ - [x] **[Core]** Air-writing carries a default template bank ([frontend/src/lib/airTemplates.ts](frontend/src/lib/airTemplates.ts): `yes` / `?` / `hi` / `help` / `done` / `more` / `water` / `stop`) — all single-stroke shapes so DTW can match reliably. On match, the word flows through the pipeline three ways: (1) retrieval picks up the word as an extra `PERSONAL` sub-intent with a bucket hint (see `infer_bucket` in [backend/sensing/bucket_keywords.py](backend/sensing/bucket_keywords.py) — e.g. `help` → medical, `water` → daily_routine), (2) the planner includes an explicit "the user air-wrote X — incorporate verbatim if appropriate" instruction in the user message, and (3) the word appears in `logs/turns.jsonl` for debugging. The recognizer has a `MATCH_THRESHOLD` reject gate and `console.debug`s on empty-bank / no-match so unrecognised strokes never reach the backend. To add more templates, append entries to `DEFAULT_AIR_TEMPLATES` as 32-point normalised single-stroke trajectories.
367
  - [ ] **[Bonus]** Voice + air-writing conflict resolution. Capture short voice (Web Speech API), compare to air-written intent, send a `resolved_intent`
368
+ - [ ] Thumbs-up currently biases the opener via the prompt. Once generation emits N candidates, move this to candidate reranking for a stronger signal.
369
 
370
  ### Intent decomposition
371
 
 
389
  - [ ] **[Core]** API returns one response. Should return multiple candidates so the user can pick (and so the next item works)
390
  - [ ] **[Core]** Frontend needs a candidate picker — show all the options, let the user click one, send the selection back
391
  - [ ] **[Bonus]** When user picks a candidate, save the `(query, picked)` pair to a side vector index and check it first next turn
392
+ - [x] LLM temperature bumped from 0.4 → 0.8 in [backend/pipeline/nodes/planner.py](backend/pipeline/nodes/planner.py). The old setting produced near-identical responses across turns even when affect/gesture changed, which made the sensing→output link hard to see. 0.8 gives meaningful lexical variation while staying in the persona's voice.
393
 
394
  ### Evals
395
 
 
399
  |--------|--------|
400
  | Efficiency | works (SLO check on `t_total`) |
401
  | Faithfulness | stub, returns 0 |
402
+ | Multimodal alignment | works — affect (sentiment lexicon), gesture (opener regex), gaze (bucket match) |
403
  | Authenticity | star rating in UI but not saved |
404
 
405
  - [ ] **[Eval]** Faithfulness — actually check if the response is grounded in what we retrieved. NLI model, sentence-level. If we didn't retrieve anything, flag `no_evidence` instead of pretending we scored it
406
  - [ ] **[Eval]** Efficiency — per-turn SLO check is done, but for the writeup we need aggregate latency: p50/p95 across a fixed query set, broken out by LLM tier. Spec target is < 6s
407
+ - [x] **[Eval]** Multimodal alignment — implemented in `backend/evals/multimodal_alignment.py`. Affect scored by positive/negative lexicon overlap vs. target sentiment, gesture by opener-phrase regex (THUMBS_UP/THUMBS_DOWN/WAVING), gaze by fraction of retrieved chunks matching the looked-at bucket. Returned on every turn as `multimodal_alignment` / `affect_alignment` / `gesture_alignment` / `gaze_alignment`
408
  - [ ] **[Eval]** Authenticity — the Likert stars are wired up in the UI but go nowhere. Save them, log them with the turn so we can actually look at them later
409
  - [ ] **[Eval]** For the live in-class eval: figure out the actual session — who rates (partners + experts per spec), how many turns each, what gets shown to them. The Likert form is the easy part; the protocol isn't written down anywhere
410
  - [ ] **[Eval]** Need an offline version of all three model-driven evals (faithfulness / alignment / efficiency). Aggregate numbers across a fixed query set per persona for the writeup
411
 
412
  ### Cleanup
413
 
414
+ - [ ] move the affect `StyleDirective` config (`_AFFECT_CONFIG` in [intent.py](backend/pipeline/nodes/intent.py)) and the gesture directives ([labels.py](backend/sensing/labels.py)) out of code into a yaml
415
  - [x] delete `backend/sensing/` (dead code, sensing is in frontend) — done, only `labels.py` remains
416
+ - [x] per-persona affect overrides (`_PERSONA_TONE_OVERRIDES`) deleted — redundant with `stylistic_preferences` in the new persona JSONs
417
 
418
  ---
419
 
backend/evals/multimodal_alignment.py CHANGED
@@ -1,5 +1,85 @@
1
- # Multimodal alignment scoring.
2
- from __future__ import annotations
3
 
4
 
5
  def compute_multimodal_alignment(
@@ -9,10 +89,17 @@ def compute_multimodal_alignment(
9
  gaze_bucket: str | None,
10
  chunks: list[dict],
11
  ) -> dict:
12
- """Score alignment between non-verbal inputs and generated text."""
 
 
 
 
 
 
 
13
  return {
14
- "overall_score": 0.0,
15
- "affect_alignment": 0.0,
16
- "gesture_alignment": 0.0,
17
- "gaze_alignment": 0.0,
18
  }
 
1
+ import re
2
+
3
+ _POSITIVE = {
4
+ "glad",
5
+ "love",
6
+ "lucky",
7
+ "happy",
8
+ "great",
9
+ "grateful",
10
+ "fun",
11
+ "wonderful",
12
+ "nice",
13
+ "amazing",
14
+ "delighted",
15
+ "pleased",
16
+ "yes",
17
+ "solid",
18
+ }
19
+ _NEGATIVE = {
20
+ "tired",
21
+ "hard",
22
+ "sorry",
23
+ "unfortunately",
24
+ "bad",
25
+ "awful",
26
+ "regrettably",
27
+ "difficult",
28
+ "frustrating",
29
+ "no",
30
+ "stop",
31
+ }
32
+
33
+ _AFFECT_TARGET = {
34
+ "HAPPY": 1.0,
35
+ "FRUSTRATED": -0.5,
36
+ "NEUTRAL": 0.0,
37
+ "SURPRISED": 0.0,
38
+ }
39
+
40
+ _GESTURE_OPENER_PATTERNS = {
41
+ "THUMBS_UP": re.compile(r"^\s*(yes|yeah|totally|for sure|absolutely|sure)\b", re.I),
42
+ "THUMBS_DOWN": re.compile(r"^\s*(no|nah|not really|i'd rather not)\b", re.I),
43
+ "WAVING": re.compile(r"^\s*(hi|hey|hello)\b", re.I),
44
+ }
45
+
46
+
47
+ def _tokens(text: str) -> set[str]:
48
+ return set(re.findall(r"\b[a-z]+\b", text.lower()))
49
+
50
+
51
+ def _sentiment_score(text: str) -> float:
52
+ toks = _tokens(text)
53
+ pos = len(toks & _POSITIVE)
54
+ neg = len(toks & _NEGATIVE)
55
+ if pos == 0 and neg == 0:
56
+ return 0.0
57
+ return (pos - neg) / (pos + neg)
58
+
59
+
60
+ def _affect_alignment(response: str, affect: str | None) -> float:
61
+ if not affect:
62
+ return 0.0
63
+ target = _AFFECT_TARGET.get(affect, 0.0)
64
+ score = _sentiment_score(response)
65
+ # distance in [0, 2] → similarity in [0, 1]
66
+ return max(0.0, 1.0 - abs(score - target) / 2.0)
67
+
68
+
69
+ def _gesture_alignment(response: str, gesture_tag: str | None) -> float:
70
+ if not gesture_tag:
71
+ return 0.0
72
+ pattern = _GESTURE_OPENER_PATTERNS.get(gesture_tag)
73
+ if pattern is None:
74
+ return 0.5 # gesture has no testable opener; give partial credit
75
+ return 1.0 if pattern.search(response) else 0.0
76
+
77
+
78
+ def _gaze_alignment(chunks: list[dict], gaze_bucket: str | None) -> float:
79
+ if not gaze_bucket or not chunks:
80
+ return 0.0
81
+ matches = sum(1 for c in chunks if c.get("bucket") == gaze_bucket)
82
+ return matches / len(chunks)
83
 
84
 
85
  def compute_multimodal_alignment(
 
89
  gaze_bucket: str | None,
90
  chunks: list[dict],
91
  ) -> dict:
92
+ scores: dict[str, float] = {}
93
+ if affect:
94
+ scores["affect_alignment"] = _affect_alignment(response, affect)
95
+ if gesture_tag:
96
+ scores["gesture_alignment"] = _gesture_alignment(response, gesture_tag)
97
+ if gaze_bucket:
98
+ scores["gaze_alignment"] = _gaze_alignment(chunks, gaze_bucket)
99
+ overall = sum(scores.values()) / len(scores) if scores else 0.0
100
  return {
101
+ "overall_score": round(overall, 4),
102
+ "affect_alignment": round(scores.get("affect_alignment", 0.0), 4),
103
+ "gesture_alignment": round(scores.get("gesture_alignment", 0.0), 4),
104
+ "gaze_alignment": round(scores.get("gaze_alignment", 0.0), 4),
105
  }
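
As a worked example of the scoring arithmetic in this file, the sketch below re-implements the affect and gaze scores with a deliberately tiny lexicon so it runs on its own. The names and the three-word lexicons are illustrative assumptions; the formulas follow `_sentiment_score`, `_affect_alignment`, and `_gaze_alignment` above.

```python
import re

# Tiny stand-in lexicons (assumption: the real sets are larger).
POSITIVE = {"glad", "great", "love"}
NEGATIVE = {"tired", "sorry", "hard"}
AFFECT_TARGET = {"HAPPY": 1.0, "FRUSTRATED": -0.5, "NEUTRAL": 0.0}


def sentiment(text: str) -> float:
    """(pos - neg) / (pos + neg) over unique lowercase tokens; 0.0 if no hits."""
    toks = set(re.findall(r"\b[a-z]+\b", text.lower()))
    pos, neg = len(toks & POSITIVE), len(toks & NEGATIVE)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)


def affect_alignment(text: str, affect: str) -> float:
    """Map the distance between sentiment and target (in [0, 2]) onto [0, 1]."""
    target = AFFECT_TARGET.get(affect, 0.0)
    return max(0.0, 1.0 - abs(sentiment(text) - target) / 2.0)


def gaze_alignment(chunks: list[dict], bucket: str) -> float:
    """Fraction of retrieved chunks that came from the looked-at bucket."""
    if not chunks:
        return 0.0
    return sum(1 for c in chunks if c.get("bucket") == bucket) / len(chunks)


reply = "Yes, I'm so glad, that was great."
chunks = [{"bucket": "family"}, {"bucket": "family"}, {"bucket": "work"}]

print(sentiment(reply))                       # 1.0  (two positive hits, none negative)
print(affect_alignment(reply, "HAPPY"))       # 1.0  (distance 0 from target 1.0)
print(affect_alignment(reply, "FRUSTRATED"))  # 0.25 (distance 1.5 from target -0.5)
print(gaze_alignment(chunks, "family"))       # 2 of 3 chunks match
```

One consequence worth keeping in mind when reading the scores: a reply with no lexicon hits has sentiment 0.0, so it scores 0.5 against HAPPY and 1.0 against NEUTRAL.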
backend/main.py CHANGED
@@ -2,6 +2,7 @@
2
  from __future__ import annotations
3
 
4
  import argparse
 
5
  import json
6
  import os
7
  import sys
@@ -10,6 +11,7 @@ import time
10
  from backend.config.settings import settings
11
  from backend.guardrails.checks import check_input
12
  from backend.pipeline.graph import run_pipeline
 
13
  from backend.pipeline.state import GenerationConfig, PipelineState
14
  from backend.retrieval.bucket_priors import uniform_priors
15
  from backend.retrieval.vector_store import _get_embedder
@@ -49,6 +51,7 @@ def _keyword_intent(query: str) -> tuple[dict, GenerationConfig]:
49
  else "PERSONAL"
50
  )
51
 
 
52
  route = {
53
  "sub_intents": [
54
  {
@@ -66,12 +69,8 @@ def _keyword_intent(query: str) -> tuple[dict, GenerationConfig]:
66
  },
67
  "affect": "NEUTRAL",
68
  }
69
- gen_config: GenerationConfig = {
70
- "max_tokens": settings.max_tokens_neutral,
71
- "tone_tag": "[TONE:DEFAULT]",
72
- "retrieval_mode": "full",
73
- "persona_mod": "baseline",
74
- }
75
  return route, gen_config
76
 
77
 
 
2
  from __future__ import annotations
3
 
4
  import argparse
5
+ import copy
6
  import json
7
  import os
8
  import sys
 
11
  from backend.config.settings import settings
12
  from backend.guardrails.checks import check_input
13
  from backend.pipeline.graph import run_pipeline
14
+ from backend.pipeline.nodes.intent import _AFFECT_CONFIG
15
  from backend.pipeline.state import GenerationConfig, PipelineState
16
  from backend.retrieval.bucket_priors import uniform_priors
17
  from backend.retrieval.vector_store import _get_embedder
 
51
  else "PERSONAL"
52
  )
53
 
54
+ # `style_constraints` is vestigial — planner reads `generation_config` (below) as the source of truth.
55
  route = {
56
  "sub_intents": [
57
  {
 
69
  },
70
  "affect": "NEUTRAL",
71
  }
72
+ # Deep-copy: callers may mutate gen_config downstream; never hand them the shared constant.
73
+ gen_config: GenerationConfig = copy.deepcopy(_AFFECT_CONFIG["NEUTRAL"])
 
 
 
 
74
  return route, gen_config
75
 
76
 
backend/pipeline/nodes/intent.py CHANGED
@@ -1,6 +1,7 @@
1
  # Intent decomposition node — regex-split fragments + BGE zero-shot classifier.
2
  from __future__ import annotations
3
 
 
4
  import re
5
  import time
6
  from functools import lru_cache
@@ -88,24 +89,65 @@ _AFFECT_CONFIG: dict[str, GenerationConfig] = {
88
  "tone_tag": "[TONE:WARM]",
89
  "retrieval_mode": "full",
90
  "persona_mod": "amplify_quirks",
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
91
  },
92
  "FRUSTRATED": {
93
  "max_tokens": settings.max_tokens_frustrated,
94
  "tone_tag": "[TONE:DIRECT_EMPATHETIC]",
95
  "retrieval_mode": "fast",
96
  "persona_mod": "suppress_humor",
 
 
 
 
 
 
 
 
97
  },
98
  "NEUTRAL": {
99
  "max_tokens": settings.max_tokens_neutral,
100
  "tone_tag": "[TONE:DEFAULT]",
101
  "retrieval_mode": "full",
102
  "persona_mod": "baseline",
 
 
 
 
 
 
 
 
 
103
  },
104
  "SURPRISED": {
105
  "max_tokens": settings.max_tokens_surprised,
106
  "tone_tag": "[TONE:CLARIFYING]",
107
  "retrieval_mode": "full",
108
  "persona_mod": "add_confirmation",
 
 
 
 
 
 
 
 
109
  },
110
  }
111
 
@@ -185,7 +227,8 @@ def run(state: PipelineState) -> dict:
185
  affect_state = state.get("affect") or {}
186
  emotion: str = affect_state.get("emotion", "NEUTRAL")
187
  query: str = state["raw_query"]
188
- gen_config = _AFFECT_CONFIG.get(emotion, _AFFECT_CONFIG["NEUTRAL"])
 
189
 
190
  fragments = _split_query(query)
191
  priority = "fast" if emotion == "FRUSTRATED" else "normal"
 
1
  # Intent decomposition node — regex-split fragments + BGE zero-shot classifier.
2
  from __future__ import annotations
3
 
4
+ import copy
5
  import re
6
  import time
7
  from functools import lru_cache
 
89
  "tone_tag": "[TONE:WARM]",
90
  "retrieval_mode": "full",
91
  "persona_mod": "amplify_quirks",
92
+ "style": {
93
+ "tone_tag": "[TONE:WARM]",
94
+ "register": "warm, upbeat, affectionate",
95
+ "prefer_words": [
96
+ "glad",
97
+ "love",
98
+ "lucky",
99
+ "happy",
100
+ "great",
101
+ "grateful",
102
+ "fun",
103
+ ],
104
+ "avoid_words": ["unfortunately", "frankly", "tired", "hard", "sorry"],
105
+ "opener_hint": None,
106
+ "exemplar": "Yeah — honestly, that made my week.",
107
+ },
108
  },
109
  "FRUSTRATED": {
110
  "max_tokens": settings.max_tokens_frustrated,
111
  "tone_tag": "[TONE:DIRECT_EMPATHETIC]",
112
  "retrieval_mode": "fast",
113
  "persona_mod": "suppress_humor",
114
+ "style": {
115
+ "tone_tag": "[TONE:DIRECT_EMPATHETIC]",
116
+ "register": "direct, short, validating — no jokes",
117
+ "prefer_words": ["okay", "yes", "right", "i hear you", "fair"],
118
+ "avoid_words": ["hilarious", "ha", "lol", "cheerful", "delightful"],
119
+ "opener_hint": "Acknowledge the feeling in 3-5 words before the answer.",
120
+ "exemplar": "Yeah. That's a lot. Short answer: yes.",
121
+ },
122
  },
123
  "NEUTRAL": {
124
  "max_tokens": settings.max_tokens_neutral,
125
  "tone_tag": "[TONE:DEFAULT]",
126
  "retrieval_mode": "full",
127
  "persona_mod": "baseline",
128
+ "style": {
129
+ "tone_tag": "[TONE:DEFAULT]",
130
+ "register": "natural, conversational",
131
+ "prefer_words": [],
132
+ "avoid_words": [],
133
+ "opener_hint": None,
134
+ # Empty on purpose — let the persona's own example_phrases carry the register.
135
+ "exemplar": "",
136
+ },
137
  },
138
  "SURPRISED": {
139
  "max_tokens": settings.max_tokens_surprised,
140
  "tone_tag": "[TONE:CLARIFYING]",
141
  "retrieval_mode": "full",
142
  "persona_mod": "add_confirmation",
143
+ "style": {
144
+ "tone_tag": "[TONE:CLARIFYING]",
145
+ "register": "curious, clarifying",
146
+ "prefer_words": ["really", "wait", "huh", "oh"],
147
+ "avoid_words": [],
148
+ "opener_hint": "Mirror surprise briefly, then ask a clarifying question.",
149
+ "exemplar": "Oh — wait, really? Did you mean the Friday one?",
150
+ },
151
  },
152
  }
153
 
 
227
  affect_state = state.get("affect") or {}
228
  emotion: str = affect_state.get("emotion", "NEUTRAL")
229
  query: str = state["raw_query"]
230
+ # Deep-copy: callers may mutate gen_config downstream; never hand them the shared constant.
231
+ gen_config = copy.deepcopy(_AFFECT_CONFIG.get(emotion, _AFFECT_CONFIG["NEUTRAL"]))
232
 
233
  fragments = _split_query(query)
234
  priority = "fast" if emotion == "FRUSTRATED" else "normal"
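
The reason for `copy.deepcopy` here (rather than returning the entry directly or using a shallow copy) is that each config now nests a mutable `style` dict. A minimal sketch of the failure mode, using a hypothetical miniature constant rather than the real `_AFFECT_CONFIG`:

```python
import copy


def make_constant() -> dict:
    """Build a fresh miniature config so each demo starts from a clean state."""
    return {"NEUTRAL": {"max_tokens": 80, "style": {"prefer_words": []}}}


def leaks_with_shallow_copy() -> bool:
    constant = make_constant()
    cfg = dict(constant["NEUTRAL"])  # shallow: the nested "style" dict is still shared
    cfg["style"]["prefer_words"].append("yes")
    # True means the downstream mutation reached the shared constant.
    return constant["NEUTRAL"]["style"]["prefer_words"] == ["yes"]


def leaks_with_deep_copy() -> bool:
    constant = make_constant()
    cfg = copy.deepcopy(constant["NEUTRAL"])  # nested dicts and lists are duplicated
    cfg["style"]["prefer_words"].append("yes")
    return constant["NEUTRAL"]["style"]["prefer_words"] == ["yes"]


print(leaks_with_shallow_copy())  # True
print(leaks_with_deep_copy())     # False
```

A shallow copy would have been enough for the old flat config; the nested `style` entry is what makes the deep copy necessary.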
backend/pipeline/nodes/planner.py CHANGED
@@ -1,30 +1,35 @@
1
- # Planner node — prompt building, candidate generation, composite ranking.
2
- from __future__ import annotations
3
-
4
  import time
5
 
6
  from backend.config.settings import settings
7
  from backend.generation.llm_client import active_model, chat_complete
8
  from backend.guardrails.checks import check_output
9
  from backend.pipeline.intent_kind import classify_intent_kind
10
- from backend.pipeline.state import PipelineState
11
- from backend.sensing.labels import GESTURE_TO_TAG
12
-
13
- # ── Persona-specific tone tags (applied on top of affect base tag) ─────────────
14
-
15
- _PERSONA_TONE_OVERRIDES: dict[str, dict[str, str]] = {
16
- "mia_chen": {
17
- "HAPPY": "[TONE:WITTY_SARCASTIC]",
18
- "FRUSTRATED": "[TONE:DIRECT_EMPATHETIC]",
19
- },
20
- "gerald_okafor": {
21
- "HAPPY": "[TONE:WARM_FORMAL]",
22
- "FRUSTRATED": "[TONE:MEASURED_EMPATHETIC]",
23
- },
24
- "arjun_mehta": {
25
- "HAPPY": "[TONE:DIRECT_WARM]",
26
- "FRUSTRATED": "[TONE:MINIMAL_DIRECT]",
27
- },
 
 
 
 
 
 
 
 
28
  }
29
 
30
 
@@ -36,22 +41,16 @@ def run_fallback(state: PipelineState) -> dict:
36
  return _run(state, tier="fallback")
37
 
38
 
39
- # ── Core implementation ────────────────────────────────────────────────────────
40
-
41
-
42
  def _run(state: PipelineState, tier: str) -> dict:
43
  t0 = time.perf_counter()
44
 
45
  profile = state["persona_profile"]
46
- user_id = state["user_id"]
47
  affect = (state.get("affect") or {}).get("emotion", "NEUTRAL")
48
  gen_cfg = state.get("generation_config") or {}
49
  chunks = state.get("retrieved_chunks") or []
50
  history = (state.get("session_history") or [])[-20:]
51
 
52
- tone_tag = _resolve_tone_tag(
53
- user_id, affect, gen_cfg.get("tone_tag", "[TONE:DEFAULT]")
54
- )
55
  gesture_tag = state.get("gesture_tag")
56
  air_written_text = state.get("air_written_text")
57
  turnaround_triggered = state.get("turnaround_triggered", False)
@@ -64,7 +63,7 @@ def _run(state: PipelineState, tier: str) -> dict:
64
  chunks,
65
  history,
66
  state["raw_query"],
67
- tone_tag,
68
  gen_cfg,
69
  gesture_tag=gesture_tag,
70
  air_written_text=air_written_text,
@@ -76,7 +75,7 @@ def _run(state: PipelineState, tier: str) -> dict:
76
  selected = chat_complete(
77
  messages=messages,
78
  max_tokens=gen_cfg.get("max_tokens", settings.max_tokens_neutral),
79
- temperature=0.4,
80
  tier=tier,
81
  )
82
 
@@ -95,7 +94,7 @@ def _run(state: PipelineState, tier: str) -> dict:
95
  4,
96
  )
97
 
98
- augmented_prompt = "\n\n".join(m["content"] for m in messages)
99
  return {
100
  "augmented_prompt": augmented_prompt,
101
  "candidates": [selected],
@@ -107,8 +106,8 @@ def _run(state: PipelineState, tier: str) -> dict:
107
  }
108
 
109
 
110
- def _resolve_tone_tag(user_id: str, affect: str, default_tag: str) -> str:
111
- return _PERSONA_TONE_OVERRIDES.get(user_id, {}).get(affect, default_tag)
112
 
113
 
114
  _AFFECT_HINTS = {
@@ -124,7 +123,7 @@ def _build_messages(
124
  chunks: list[dict],
125
  history: list[dict],
126
  query: str,
127
- tone_tag: str,
128
  gen_cfg: dict,
129
  gesture_tag: str | None = None,
130
  air_written_text: str | None = None,
@@ -141,7 +140,7 @@ def _build_messages(
141
  chunks,
142
  history,
143
  query,
144
- tone_tag,
145
  gen_cfg,
146
  gesture_tag,
147
  air_written_text,
@@ -196,37 +195,11 @@ Answering rules:
196
  --- end character sheet ---"""
197
 
198
 
199
- _PERSONA_MOD_INSTRUCTIONS = {
200
- "amplify_quirks": "Amplify your characteristic style and personality.",
201
- "suppress_humor": "Be direct and supportive. Suppress humor.",
202
- "baseline": "Use your natural communication style.",
203
- "add_confirmation": "Add a clarifying question or confirmation at the end.",
204
- "turnaround": (
205
- "Your previous reply missed what you actually meant. Rephrase "
206
- "more directly — change the wording meaningfully, not just "
207
- "surface tweaks — and end with a one-sentence clarifying "
208
- "question to confirm you're on the right track."
209
- ),
210
- "reverse_stance": (
211
- "Your previous reply was substantively wrong — not poorly worded, "
212
- "but the wrong content. Take a meaningfully different stance using "
213
- "the available memories or, if none fit, honestly say you don't "
214
- "know. Do NOT just reword the previous reply."
215
- ),
216
- "present_state_retry": (
217
- "Your previous reply was wrong about your current state. The "
218
- "affect signal probably misled you. Either flip the emotional "
219
- "read (if you said 'good', try 'not great') or honestly admit "
220
- "you're not sure how you feel right now. Do NOT invent details."
221
- ),
222
- }
223
-
224
-
225
  def _build_user(
226
  chunks: list[dict],
227
  history: list[dict],
228
  query: str,
229
- tone_tag: str,
230
  gen_cfg: dict,
231
  gesture_tag: str | None,
232
  air_written_text: str | None,
@@ -262,20 +235,41 @@ def _build_user(
262
  or " (start of session)"
263
  )
264
 
265
- gesture_line = ""
266
  if gesture_tag:
267
- g_tag = GESTURE_TO_TAG.get(gesture_tag, f"[GESTURE:{gesture_tag}]")
268
- gesture_line = f"\nActive gesture signal: {g_tag}"
 
 
269
 
270
- air_writing_line = ""
271
  if air_written_text:
272
- air_writing_line = f'\nThe user air-wrote: "{air_written_text}" — treat as supplementary intent.'
 
 
 
 
 
273
 
274
- persona_instruction = _PERSONA_MOD_INSTRUCTIONS.get(
275
- gen_cfg.get("persona_mod", "baseline"),
276
- _PERSONA_MOD_INSTRUCTIONS["baseline"],
 
 
277
  )
278
 
 
 
 
 
 
 
 
 
 
 
 
 
279
  turnaround_line = ""
280
  if rejected_response:
281
  safe_rejected = rejected_response.replace('"', "'").replace("\n", " ")[:300]
@@ -287,8 +281,7 @@ def _build_user(
287
  if intent_kind == "present_state":
288
  affect_hint = _AFFECT_HINTS.get(affect, _AFFECT_HINTS["NEUTRAL"])
289
  return f"""\
290
- {tone_tag}{gesture_line}{air_writing_line}{turnaround_line}
291
- {persona_instruction}
292
 
293
  The partner is asking about your present state (right now, today).
294
  Your autobiographical memories do NOT contain this — do not fabricate details from them.
@@ -307,8 +300,7 @@ Reply as {persona_name} in 1–2 sentences, first person.
307
  - Do NOT use autobiographical facts (job, family, hobbies) unless the partner asked."""
308
 
309
  return f"""\
310
- {tone_tag}{gesture_line}{air_writing_line}{turnaround_line}
311
- {persona_instruction}
312
 
313
  Personal memories:
314
  {memory_block}
 
 
 
 
1
  import time
2
 
3
  from backend.config.settings import settings
4
  from backend.generation.llm_client import active_model, chat_complete
5
  from backend.guardrails.checks import check_output
6
  from backend.pipeline.intent_kind import classify_intent_kind
7
+ from backend.pipeline.state import PipelineState, StyleDirective
8
+ from backend.sensing.labels import GESTURE_DIRECTIVES
9
+
10
+ _PERSONA_MOD_INSTRUCTIONS = {
11
+ "amplify_quirks": "Amplify your characteristic style and personality.",
12
+ "suppress_humor": "Be direct and supportive. Suppress humor.",
13
+ "baseline": "Use your natural communication style.",
14
+ "add_confirmation": "Add a clarifying question or confirmation at the end.",
15
+ "turnaround": (
16
+ "Your previous reply missed what you actually meant. Rephrase "
17
+ "more directly — change the wording meaningfully, not just "
18
+ "surface tweaks — and end with a one-sentence clarifying "
19
+ "question to confirm you're on the right track."
20
+ ),
21
+ "reverse_stance": (
22
+ "Your previous reply was substantively wrong — not poorly worded, "
23
+ "but the wrong content. Take a meaningfully different stance using "
24
+ "the available memories or, if none fit, honestly say you don't "
25
+ "know. Do NOT just reword the previous reply."
26
+ ),
27
+ "present_state_retry": (
28
+ "Your previous reply was wrong about your current state. The "
29
+ "affect signal probably misled you. Either flip the emotional "
30
+ "read (if you said 'good', try 'not great') or honestly admit "
31
+ "you're not sure how you feel right now. Do NOT invent details."
32
+ ),
33
  }
34
 
35
 
 
41
  return _run(state, tier="fallback")
42
 
43
 
 
 
 
44
  def _run(state: PipelineState, tier: str) -> dict:
45
  t0 = time.perf_counter()
46
 
47
  profile = state["persona_profile"]
 
48
  affect = (state.get("affect") or {}).get("emotion", "NEUTRAL")
49
  gen_cfg = state.get("generation_config") or {}
50
  chunks = state.get("retrieved_chunks") or []
51
  history = (state.get("session_history") or [])[-20:]
52
 
53
+ style: StyleDirective = gen_cfg["style"]
 
 
54
  gesture_tag = state.get("gesture_tag")
55
  air_written_text = state.get("air_written_text")
56
  turnaround_triggered = state.get("turnaround_triggered", False)
 
63
  chunks,
64
  history,
65
  state["raw_query"],
66
+ style,
67
  gen_cfg,
68
  gesture_tag=gesture_tag,
69
  air_written_text=air_written_text,
 
75
  selected = chat_complete(
76
  messages=messages,
77
  max_tokens=gen_cfg.get("max_tokens", settings.max_tokens_neutral),
78
+ temperature=0.8,
79
  tier=tier,
80
  )
81
 
 
94
  4,
95
  )
96
 
97
+ augmented_prompt = "\n\n".join(f"[{m['role']}] {m['content']}" for m in messages)
98
  return {
99
  "augmented_prompt": augmented_prompt,
100
  "candidates": [selected],
 
106
  }
107
 
108
 
109
+ def _format_word_list(words: list[str]) -> str:
110
+ return ", ".join(words) if words else "(no constraint)"
111
 
112
 
113
  _AFFECT_HINTS = {
 
123
  chunks: list[dict],
124
  history: list[dict],
125
  query: str,
126
+ style: StyleDirective,
127
  gen_cfg: dict,
128
  gesture_tag: str | None = None,
129
  air_written_text: str | None = None,
 
140
  chunks,
141
  history,
142
  query,
143
+ style,
144
  gen_cfg,
145
  gesture_tag,
146
  air_written_text,
 
195
  --- end character sheet ---"""
196
 
197

198
  def _build_user(
199
  chunks: list[dict],
200
  history: list[dict],
201
  query: str,
202
+ style: StyleDirective,
203
  gen_cfg: dict,
204
  gesture_tag: str | None,
205
  air_written_text: str | None,
 
235
  or " (start of session)"
236
  )
237
 
238
+ merged_opener = style.get("opener_hint")
239
  if gesture_tag:
240
+ directive = GESTURE_DIRECTIVES.get(gesture_tag)
241
+ if directive:
242
+ # Gesture opener wins over affect opener — a deliberate thumbs-up is a stronger signal than inferred affect.
243
+ merged_opener = directive["opener_hint"]
244
 
245
+ air_writing_block = ""
246
  if air_written_text:
247
+ air_writing_block = (
248
+ f'\nThe user air-wrote: "{air_written_text}". '
249
+ "If this looks like a name, noun, or short phrase, "
250
+ "incorporate it verbatim into your response; "
251
+ "otherwise use it as a hint about what they're trying to say."
252
+ )
253
 
254
+ persona_mod = gen_cfg.get("persona_mod", "baseline")
255
+ persona_instruction_line = (
256
+ f"\n{_PERSONA_MOD_INSTRUCTIONS[persona_mod]}"
257
+ if persona_mod in _PERSONA_MOD_INSTRUCTIONS and persona_mod != "baseline"
258
+ else ""
259
  )
260
 
261
+ directive_lines = [
262
+ f"- Register: {style['register']}",
263
+ f"- Prefer words like: {_format_word_list(style['prefer_words'])}",
264
+ f"- Avoid words like: {_format_word_list(style['avoid_words'])}",
265
+ f"- Opener: {merged_opener or 'no constraint'}",
266
+ ]
267
+ if style.get("exemplar"):
268
+ directive_lines.append(
269
+ f'- In this register, a sentence sounds like: "{style["exemplar"]}"'
270
+ )
271
+ directive_block = "Style directive:\n" + "\n".join(directive_lines)
272
+
273
  turnaround_line = ""
274
  if rejected_response:
275
  safe_rejected = rejected_response.replace('"', "'").replace("\n", " ")[:300]
 
281
  if intent_kind == "present_state":
282
  affect_hint = _AFFECT_HINTS.get(affect, _AFFECT_HINTS["NEUTRAL"])
283
  return f"""\
284
+ {directive_block}{air_writing_block}{turnaround_line}{persona_instruction_line}
 
285
 
286
  The partner is asking about your present state (right now, today).
287
  Your autobiographical memories do NOT contain this — do not fabricate details from them.
 
300
  - Do NOT use autobiographical facts (job, family, hobbies) unless the partner asked."""
301
 
302
  return f"""\
303
+ {directive_block}{air_writing_block}{turnaround_line}{persona_instruction_line}
 
304
 
305
  Personal memories:
306
  {memory_block}
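A minimal Python sketch of how the directive block above is assembled, including the rule that a gesture opener overrides the affect opener. The `HAPPY_STYLE` values and the helper name `render_directive_block` are illustrative assumptions, not taken from the commit; only the line formats and the `THUMBS_UP` entry mirror the diff.

```python
from typing import Optional, TypedDict


class StyleDirective(TypedDict):
    tone_tag: str
    register: str
    prefer_words: list[str]
    avoid_words: list[str]
    opener_hint: Optional[str]
    exemplar: str


# Subset of GESTURE_DIRECTIVES as added in backend/sensing/labels.py.
GESTURE_DIRECTIVES = {
    "THUMBS_UP": {
        "tone": "[GESTURE:THUMBS_UP][TONE:AFFIRMATIVE]",
        "opener_hint": "Open with an affirmation (Yes / Totally / For sure).",
    },
}


def format_word_list(words: list[str]) -> str:
    return ", ".join(words) if words else "(no constraint)"


def render_directive_block(style: StyleDirective, gesture_tag: Optional[str] = None) -> str:
    """Hypothetical helper: renders the same lines that _build_user emits."""
    opener = style.get("opener_hint")
    directive = GESTURE_DIRECTIVES.get(gesture_tag) if gesture_tag else None
    if directive:
        # A deliberate gesture is treated as a stronger signal than inferred affect.
        opener = directive["opener_hint"]
    lines = [
        f"- Register: {style['register']}",
        f"- Prefer words like: {format_word_list(style['prefer_words'])}",
        f"- Avoid words like: {format_word_list(style['avoid_words'])}",
        f"- Opener: {opener or 'no constraint'}",
    ]
    exemplar = style.get("exemplar")
    if exemplar:
        lines.append(f'- In this register, a sentence sounds like: "{exemplar}"')
    return "Style directive:\n" + "\n".join(lines)


# Illustrative values only; the real emotion-to-directive table is not shown in this diff.
HAPPY_STYLE: StyleDirective = {
    "tone_tag": "[TONE:WARM]",
    "register": "warm, upbeat, affectionate",
    "prefer_words": ["love", "great"],
    "avoid_words": [],
    "opener_hint": None,
    "exemplar": "I love that idea.",
}

print(render_directive_block(HAPPY_STYLE, gesture_tag="THUMBS_UP"))
```

One thing the diff suggests checking: `_run` reads `gen_cfg["style"]` by subscript while `gen_cfg` itself defaults to `{}`, so a state without `generation_config` would raise `KeyError` at that line.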
backend/pipeline/state.py CHANGED
@@ -43,14 +43,25 @@ class IntentRoute(TypedDict):
43
  affect: str
44
 
45

46
  class GenerationConfig(TypedDict):
47
  max_tokens: int
48
- tone_tag: str # e.g. "[TONE:WITTY_SARCASTIC]"
49
  retrieval_mode: str # "fast" | "full"
50
  persona_mod: str
51
  # persona_mod values:
52
  # "amplify_quirks" | "suppress_humor" | "baseline"
53
  # | "add_confirmation" | "turnaround"
 
 
54
 
55
 
56
  class LatencyLog(TypedDict):
 
43
  affect: str
44
 
45
 
46
+ class StyleDirective(TypedDict):
47
+ tone_tag: str # e.g. "[TONE:WARM]" — kept for logging + eval
48
+ register: str # short register phrase, e.g. "warm, upbeat, affectionate"
49
+ prefer_words: list[str] # lexical bias — words to steer toward
50
+ avoid_words: list[str] # anti-patterns — words to steer away from
51
+ opener_hint: str | None # structural hint for the opening clause
52
+ exemplar: str # one short sentence in the target register
53
+
54
+
55
  class GenerationConfig(TypedDict):
56
  max_tokens: int
57
+ tone_tag: str # legacy tag (kept in sync with style["tone_tag"] for existing log consumers)
58
  retrieval_mode: str # "fast" | "full"
59
  persona_mod: str
60
  # persona_mod values:
61
  # "amplify_quirks" | "suppress_humor" | "baseline"
62
  # | "add_confirmation" | "turnaround"
63
+ # | "reverse_stance" | "present_state_retry"
64
+ style: StyleDirective
65
 
66
 
67
  class LatencyLog(TypedDict):
backend/sensing/bucket_keywords.py CHANGED
@@ -1,9 +1,26 @@
1
  _BUCKET_KEYWORDS: list[tuple[str, tuple[str, ...]]] = [
2
- ("medical", ("medication", "medicine", "doctor", "health", "allergic", "therapy")),
3
  ("family", ("family", "mom", "dad", "brother", "sister", "parents")),
4
  ("hobbies", ("hobby", "like to do", "enjoy", "weekend", "fun")),
5
- ("daily_routine", ("routine", "morning", "wake", "sleep", "daily")),
6
- ("social", ("friend", "social", "people", "party", "community")),
 
 
 
7
  ]
8
 
9
 
 
1
  _BUCKET_KEYWORDS: list[tuple[str, tuple[str, ...]]] = [
2
+ # AAC air-writing templates (help/stop/water/done/more/hi) are mapped here too, so
3
+ # when a partner/user signals one of these, retrieval pulls from the matching bucket.
4
+ (
5
+ "medical",
6
+ (
7
+ "medication",
8
+ "medicine",
9
+ "doctor",
10
+ "health",
11
+ "allergic",
12
+ "therapy",
13
+ "help",
14
+ "stop",
15
+ ),
16
+ ),
17
  ("family", ("family", "mom", "dad", "brother", "sister", "parents")),
18
  ("hobbies", ("hobby", "like to do", "enjoy", "weekend", "fun")),
19
+ (
20
+ "daily_routine",
21
+ ("routine", "morning", "wake", "sleep", "daily", "water", "done", "more"),
22
+ ),
23
+ ("social", ("friend", "social", "people", "party", "community", "hi")),
24
  ]
25
 
26
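A sketch of how the updated keyword table could be consulted. The lookup function itself is not in this diff, so `match_bucket` and its whole-word matching rule are assumptions; the table is copied from the new side of the hunk.

```python
import re
from typing import Optional

_BUCKET_KEYWORDS: list[tuple[str, tuple[str, ...]]] = [
    (
        "medical",
        ("medication", "medicine", "doctor", "health", "allergic", "therapy", "help", "stop"),
    ),
    ("family", ("family", "mom", "dad", "brother", "sister", "parents")),
    ("hobbies", ("hobby", "like to do", "enjoy", "weekend", "fun")),
    (
        "daily_routine",
        ("routine", "morning", "wake", "sleep", "daily", "water", "done", "more"),
    ),
    ("social", ("friend", "social", "people", "party", "community", "hi")),
]


def match_bucket(text: str) -> Optional[str]:
    """Hypothetical lookup: first bucket whose keyword appears as a whole word or phrase."""
    lowered = text.lower()
    for bucket, keywords in _BUCKET_KEYWORDS:
        for keyword in keywords:
            # Word boundaries keep short keywords such as "hi" from matching
            # inside longer words such as "think".
            if re.search(rf"\b{re.escape(keyword)}\b", lowered):
                return bucket
    return None


print(match_bucket("water"))       # air-written template label
print(match_bucket("I think so"))  # no keyword present as a whole word
```

If the real lookup uses plain substring matching instead, short additions like `"hi"` and `"more"` would fire on many unrelated queries, which is worth checking against the actual implementation.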
 
backend/sensing/labels.py CHANGED
@@ -1,6 +1,18 @@
1
- GESTURE_TO_TAG: dict[str, str] = {
2
- "THUMBS_UP": "[GESTURE:THUMBS_UP][TONE:AFFIRMATIVE]",
3
- "THUMBS_DOWN": "[GESTURE:THUMBS_DOWN][TONE:NEGATIVE]",
4
- "POINTING": "[GESTURE:POINTING][INTENT:REFERENTIAL]",
5
- "WAVING": "[GESTURE:WAVING][INTENT:GREETING]",
6
  }
 
1
+ GESTURE_DIRECTIVES: dict[str, dict[str, str]] = {
2
+ "THUMBS_UP": {
3
+ "tone": "[GESTURE:THUMBS_UP][TONE:AFFIRMATIVE]",
4
+ "opener_hint": "Open with an affirmation (Yes / Totally / For sure).",
5
+ },
6
+ "THUMBS_DOWN": {
7
+ "tone": "[GESTURE:THUMBS_DOWN][TONE:NEGATIVE]",
8
+ "opener_hint": "Open by declining or disagreeing briefly.",
9
+ },
10
+ "POINTING": {
11
+ "tone": "[GESTURE:POINTING][INTENT:REFERENTIAL]",
12
+ "opener_hint": "Treat the query as referring to a specific named thing.",
13
+ },
14
+ "WAVING": {
15
+ "tone": "[GESTURE:WAVING][INTENT:GREETING]",
16
+ "opener_hint": "Open with a greeting.",
17
+ },
18
  }
frontend/src/hooks/useSensing.ts CHANGED
@@ -13,6 +13,7 @@ import {
13
  AirWriter,
14
  HeadPoseTracker,
15
  } from "../lib/sensing";
 
16
 
17
  const EMA_ALPHA = 0.3;
18
 
@@ -20,11 +21,12 @@ export function useSensing() {
20
  const faceLandmarkerRef = useRef<FaceLandmarker | null>(null);
21
  const handLandmarkerRef = useRef<HandLandmarker | null>(null);
22
  const gazeTrackerRef = useRef(new GazeTracker());
23
- const airWriterRef = useRef(new AirWriter());
24
  const headTrackerRef = useRef(new HeadPoseTracker());
25
  const calibratePendingRef = useRef(false);
26
  const headDebugRef = useRef({ dx: 0, dy: 0, maxAbsDx: 0, maxAbsDy: 0, crossings: 0 });
27
  const neutralLCPRef = useRef<number | null>(null);
 
28
  const smoothedRef = useRef({ MAR: 0, EAR: 0.3, BRI: -0.3, LCP: 0 });
29
  const initingRef = useRef(false);
30
  const [ready, setReady] = useState(false);
@@ -108,9 +110,18 @@ export function useSensing() {
108
  if (faceResult.faceLandmarks && faceResult.faceLandmarks.length > 0) {
109
  const landmarks = faceResult.faceLandmarks[0];
110
 
 
 
 
 
111
  if (neutralLCPRef.current === null) {
112
- neutralLCPRef.current =
113
- (landmarks[61].x + landmarks[291].x) / 2;
 
 
 
 
 
114
  }
115
 
116
  if (calibratePendingRef.current) {
@@ -118,18 +129,21 @@ export function useSensing() {
118
  calibratePendingRef.current = false;
119
  }
120
 
121
- const raw = computeAffectVector(landmarks, neutralLCPRef.current);
 
122
 
123
- const prev = smoothedRef.current;
124
- const smoothed = {
125
- MAR: EMA_ALPHA * raw.MAR + (1 - EMA_ALPHA) * prev.MAR,
126
- EAR: EMA_ALPHA * raw.EAR + (1 - EMA_ALPHA) * prev.EAR,
127
- BRI: EMA_ALPHA * raw.BRI + (1 - EMA_ALPHA) * prev.BRI,
128
- LCP: EMA_ALPHA * raw.LCP + (1 - EMA_ALPHA) * prev.LCP,
129
- };
130
- smoothedRef.current = smoothed;
 
 
 
131
 
132
- affect = classifyAffect(smoothed);
133
  gazeBucket = gazeTrackerRef.current.process(landmarks);
134
  headSignal = headTrackerRef.current.process(landmarks);
135
  headDebugRef.current = headTrackerRef.current.debug;
@@ -182,6 +196,7 @@ export function useSensing() {
182
 
183
  const resetCalibration = useCallback(() => {
184
  neutralLCPRef.current = null;
 
185
  smoothedRef.current = { MAR: 0, EAR: 0.3, BRI: -0.3, LCP: 0 };
186
  gazeTrackerRef.current.reset();
187
  headTrackerRef.current.reset();
 
13
  AirWriter,
14
  HeadPoseTracker,
15
  } from "../lib/sensing";
16
+ import { DEFAULT_AIR_TEMPLATES } from "../lib/airTemplates";
17
 
18
  const EMA_ALPHA = 0.3;
19
 
 
21
  const faceLandmarkerRef = useRef<FaceLandmarker | null>(null);
22
  const handLandmarkerRef = useRef<HandLandmarker | null>(null);
23
  const gazeTrackerRef = useRef(new GazeTracker());
24
+ const airWriterRef = useRef(new AirWriter(DEFAULT_AIR_TEMPLATES));
25
  const headTrackerRef = useRef(new HeadPoseTracker());
26
  const calibratePendingRef = useRef(false);
27
  const headDebugRef = useRef({ dx: 0, dy: 0, maxAbsDx: 0, maxAbsDy: 0, crossings: 0 });
28
  const neutralLCPRef = useRef<number | null>(null);
29
+ const calibBufferRef = useRef<number[]>([]);
30
  const smoothedRef = useRef({ MAR: 0, EAR: 0.3, BRI: -0.3, LCP: 0 });
31
  const initingRef = useRef(false);
32
  const [ready, setReady] = useState(false);
 
110
  if (faceResult.faceLandmarks && faceResult.faceLandmarks.length > 0) {
111
  const landmarks = faceResult.faceLandmarks[0];
112
 
113
+ // Average the raw LCP (vertical corner pull, pre-offset) over ~30 frames
114
+ // of the user's face before locking neutral. Single-frame calibration is
115
+ // too noisy and tended to bake in a momentary smile as "neutral".
116
+ // During calibration, affect stays null but gaze/head/gesture still flow.
117
  if (neutralLCPRef.current === null) {
118
+ const raw0 = computeAffectVector(landmarks, 0);
119
+ calibBufferRef.current.push(raw0.LCP);
120
+ if (calibBufferRef.current.length >= 30) {
121
+ const sum = calibBufferRef.current.reduce((a, b) => a + b, 0);
122
+ neutralLCPRef.current = sum / calibBufferRef.current.length;
123
+ calibBufferRef.current = [];
124
+ }
125
  }
126
 
127
  if (calibratePendingRef.current) {
 
129
  calibratePendingRef.current = false;
130
  }
131
 
132
+ if (neutralLCPRef.current !== null) {
133
+ const raw = computeAffectVector(landmarks, neutralLCPRef.current);
134
 
135
+ const prev = smoothedRef.current;
136
+ const smoothed = {
137
+ MAR: EMA_ALPHA * raw.MAR + (1 - EMA_ALPHA) * prev.MAR,
138
+ EAR: EMA_ALPHA * raw.EAR + (1 - EMA_ALPHA) * prev.EAR,
139
+ BRI: EMA_ALPHA * raw.BRI + (1 - EMA_ALPHA) * prev.BRI,
140
+ LCP: EMA_ALPHA * raw.LCP + (1 - EMA_ALPHA) * prev.LCP,
141
+ };
142
+ smoothedRef.current = smoothed;
143
+
144
+ affect = classifyAffect(smoothed);
145
+ }
146
 
 
147
  gazeBucket = gazeTrackerRef.current.process(landmarks);
148
  headSignal = headTrackerRef.current.process(landmarks);
149
  headDebugRef.current = headTrackerRef.current.debug;
 
196
 
197
  const resetCalibration = useCallback(() => {
198
  neutralLCPRef.current = null;
199
+ calibBufferRef.current = [];
200
  smoothedRef.current = { MAR: 0, EAR: 0.3, BRI: -0.3, LCP: 0 };
201
  gazeTrackerRef.current.reset();
202
  headTrackerRef.current.reset();
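The calibration and smoothing logic in this hook can be sketched outside React as follows. This is a Python restatement under stated assumptions: the class name `NeutralCalibrator` and the function `ema` are illustrative, while the 30-frame window and `EMA_ALPHA = 0.3` come from the diff.

```python
from typing import Optional

EMA_ALPHA = 0.3


class NeutralCalibrator:
    """Averages the first N raw LCP samples before locking a neutral baseline."""

    def __init__(self, frames_needed: int = 30) -> None:
        self.frames_needed = frames_needed
        self._buffer: list[float] = []
        self.neutral: Optional[float] = None

    def push(self, raw_lcp: float) -> bool:
        """Feed one frame's raw LCP; returns True once the baseline is locked."""
        if self.neutral is not None:
            return True
        self._buffer.append(raw_lcp)
        if len(self._buffer) >= self.frames_needed:
            self.neutral = sum(self._buffer) / len(self._buffer)
            self._buffer = []
        return self.neutral is not None

    def reset(self) -> None:
        self.neutral = None
        self._buffer = []


def ema(prev: float, raw: float, alpha: float = EMA_ALPHA) -> float:
    """Exponential moving average step, as applied to MAR, EAR, BRI and LCP."""
    return alpha * raw + (1 - alpha) * prev
```

While `push` returns False the caller would skip affect classification but keep processing gaze, head pose and gestures, matching the comment in the hunk.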
frontend/src/lib/airTemplates.ts ADDED
@@ -0,0 +1,108 @@
1
+ // Default air-writing template bank.
2
+ // Each template is a normalised 32-point [x, y] trajectory (coords in [0, 1]).
3
+ // Matched against live trajectories via DTW in AirWriter.recognise.
4
+ // To add a new template: pick a distinctive *single-stroke* shape,
5
+ // sample ~32 evenly-spaced points from stroke start → end, normalise
6
+ // x/y into [0, 1], and add an entry to DEFAULT_AIR_TEMPLATES.
7
+ //
8
+ // DTW quality tips:
9
+ // - Stick to single-stroke shapes. Multi-stroke shapes (like an X) look
10
+ // like a teleport to DTW and will mis-match.
11
+ // - Shapes should be distinctive in direction and extent — a small
12
+ // check-mark and a big slash look similar after normalisation.
13
+
14
+ function linear(from: [number, number], to: [number, number], n: number): [number, number][] {
15
+ const out: [number, number][] = [];
16
+ for (let i = 0; i < n; i++) {
17
+ const t = i / (n - 1);
18
+ out.push([from[0] + t * (to[0] - from[0]), from[1] + t * (to[1] - from[1])]);
19
+ }
20
+ return out;
21
+ }
22
+
23
+ function concat(...segs: [number, number][][]): [number, number][] {
24
+ const out: [number, number][] = [];
25
+ for (const s of segs) out.push(...s);
26
+ return resample(out, 32);
27
+ }
28
+
29
+ function resample(pts: [number, number][], n: number): [number, number][] {
30
+ if (pts.length < 2) return pts;
31
+ const out: [number, number][] = [];
32
+ for (let i = 0; i < n; i++) {
33
+ const t = (i / (n - 1)) * (pts.length - 1);
34
+ const lo = Math.floor(t);
35
+ const hi = Math.min(lo + 1, pts.length - 1);
36
+ const frac = t - lo;
37
+ out.push([
38
+ pts[lo][0] + frac * (pts[hi][0] - pts[lo][0]),
39
+ pts[lo][1] + frac * (pts[hi][1] - pts[lo][1]),
40
+ ]);
41
+ }
42
+ return out;
43
+ }
44
+
45
+ // check-mark: short down-right, then long up-right → affirmation
46
+ const YES: [number, number][] = concat(
47
+ linear([0.0, 0.5], [0.35, 1.0], 12),
48
+ linear([0.35, 1.0], [1.0, 0.0], 20)
49
+ );
50
+
51
+ // question-mark: curve over the top, then down to the dot → clarifying
52
+ const QUESTION: [number, number][] = concat(
53
+ linear([0.1, 0.25], [0.5, 0.0], 8),
54
+ linear([0.5, 0.0], [0.9, 0.25], 8),
55
+ linear([0.9, 0.25], [0.5, 0.55], 8),
56
+ linear([0.5, 0.55], [0.5, 1.0], 8)
57
+ );
58
+
59
+ // zig-zag wave across the top → greeting
60
+ const HI: [number, number][] = concat(
61
+ linear([0.0, 0.0], [0.25, 1.0], 8),
62
+ linear([0.25, 1.0], [0.5, 0.0], 8),
63
+ linear([0.5, 0.0], [0.75, 1.0], 8),
64
+ linear([0.75, 1.0], [1.0, 0.0], 8)
65
+ );
66
+
67
+ // straight vertical line bottom→top → "help" (raise hand / SOS mental model)
68
+ const HELP: [number, number][] = linear([0.5, 1.0], [0.5, 0.0], 32);
69
+
70
+ // horizontal line left→right → "done" (close / finish)
71
+ const DONE: [number, number][] = linear([0.0, 0.5], [1.0, 0.5], 32);
72
+
73
+ // plus-sign-ish as a single stroke: long down, backtrack up, then across → "more"
74
+ // mimics drawing "+" as one continuous stroke (down, back, right)
75
+ const MORE: [number, number][] = concat(
76
+ linear([0.5, 0.0], [0.5, 1.0], 12),
77
+ linear([0.5, 1.0], [0.5, 0.5], 6),
78
+ linear([0.5, 0.5], [1.0, 0.5], 14)
79
+ );
80
+
81
+ // two-cycle wave (down-up-down-up, then back to mid-height; piecewise-linear) → "water" (fluid/ocean mental model)
82
+ const WATER: [number, number][] = concat(
83
+ linear([0.0, 0.5], [0.2, 0.9], 6),
84
+ linear([0.2, 0.9], [0.4, 0.1], 8),
85
+ linear([0.4, 0.1], [0.6, 0.9], 8),
86
+ linear([0.6, 0.9], [0.8, 0.1], 8),
87
+ linear([0.8, 0.1], [1.0, 0.5], 2)
88
+ );
89
+
90
+ // square/box (traced as one stroke) → "stop"
91
+ // start top-left, go right, down, left, up — closing the box
92
+ const STOP: [number, number][] = concat(
93
+ linear([0.0, 0.0], [1.0, 0.0], 8),
94
+ linear([1.0, 0.0], [1.0, 1.0], 8),
95
+ linear([1.0, 1.0], [0.0, 1.0], 8),
96
+ linear([0.0, 1.0], [0.0, 0.0], 8)
97
+ );
98
+
99
+ export const DEFAULT_AIR_TEMPLATES: Map<string, [number, number][]> = new Map([
100
+ ["yes", YES],
101
+ ["?", QUESTION],
102
+ ["hi", HI],
103
+ ["help", HELP],
104
+ ["done", DONE],
105
+ ["more", MORE],
106
+ ["water", WATER],
107
+ ["stop", STOP],
108
+ ]);
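For experimenting with new templates offline, the three helpers in this file can be mirrored in Python. This is a direct port under the assumption that behaviour should match the TypeScript above; only the `YES` template is reproduced.

```python
Point = tuple[float, float]


def linear(start: Point, end: Point, n: int) -> list[Point]:
    """n points from start to end inclusive, evenly spaced in the parameter t."""
    return [
        (
            start[0] + (i / (n - 1)) * (end[0] - start[0]),
            start[1] + (i / (n - 1)) * (end[1] - start[1]),
        )
        for i in range(n)
    ]


def resample(pts: list[Point], n: int) -> list[Point]:
    """Index-based linear interpolation to n points (not arc-length based)."""
    if len(pts) < 2:
        return list(pts)
    out: list[Point] = []
    for i in range(n):
        t = (i / (n - 1)) * (len(pts) - 1)
        lo = int(t)  # floor, since t is never negative
        hi = min(lo + 1, len(pts) - 1)
        frac = t - lo
        out.append(
            (
                pts[lo][0] + frac * (pts[hi][0] - pts[lo][0]),
                pts[lo][1] + frac * (pts[hi][1] - pts[lo][1]),
            )
        )
    return out


def concat(*segments: list[Point], n: int = 32) -> list[Point]:
    joined = [p for seg in segments for p in seg]
    return resample(joined, n)


# Check-mark template, same control points as YES in airTemplates.ts.
YES = concat(
    linear((0.0, 0.5), (0.35, 1.0), 12),
    linear((0.35, 1.0), (1.0, 0.0), 20),
)
```

Two properties follow from the code as written. The segment counts of every default template already sum to 32, so `resample` leaves them essentially unchanged. And because each segment includes both endpoints, the junction point between consecutive segments appears twice in the joined list.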
frontend/src/lib/sensing.ts CHANGED
@@ -11,12 +11,16 @@ interface AffectVector {
11
 
12
  export function classifyAffect(v: AffectVector): Affect {
13
  // BRI is relative (browMid.y - eyeCenter.y) / interOcular — more negative = brows raised higher
14
- // LCP is relative to calibrated neutral positive = corners pulled up (smile)
 
15
  // MAR is absolute ratio — higher = mouth more open
16
- // EAR is absolute ratio — lower = eyes more closed
17
  if (v.BRI < -0.35 && v.MAR > 0.4) return "SURPRISED";
18
- if (v.EAR < 0.12 && v.LCP < -0.005) return "FRUSTRATED";
19
- if (v.LCP > 0.005) return "HAPPY";
 
 
 
20
  return "NEUTRAL";
21
  }
22
 
@@ -55,8 +59,14 @@ export function computeAffectVector(
55
  // Raising brows moves them toward y=0, making this value more negative.
56
  const BRI = (browMid.y - eyeCenter.y) / (interOcular + 1e-6);
57
 
58
- const LCP =
59
- (landmarks[CORNER_LEFT].x + landmarks[CORNER_RIGHT].x) / 2 - neutralLCP;
 
 
 
 
 
 
60
 
61
  return { MAR, EAR, BRI, LCP };
62
  }
@@ -524,7 +534,13 @@ export class AirWriter {
524
  }
525
 
526
  private recognise(trajectory: [number, number][]): string | null {
527
- if (trajectory.length < 5 || this.templates.size === 0) return null;
 
 
 
 
 
 
528
  const query = normaliseTrajectory(trajectory);
529
  let bestChar: string | null = null;
530
  let bestDist = Infinity;
@@ -535,6 +551,15 @@ export class AirWriter {
535
  bestChar = char;
536
  }
537
  }
 
 
 
 
 
 
 
 
 
538
  return bestChar;
539
  }
540
 
 
11
 
12
  export function classifyAffect(v: AffectVector): Affect {
13
  // BRI is relative (browMid.y - eyeCenter.y) / interOcular — more negative = brows raised higher
14
+ // LCP is vertical offset of lip corners from mouth center, normalised by inter-ocular,
15
+ // relative to calibrated neutral — positive = corners pulled UP (smile), negative = DOWN (frown)
16
  // MAR is absolute ratio — higher = mouth more open
17
+ // EAR is absolute ratio — lower = eyes more closed / squinting
18
  if (v.BRI < -0.35 && v.MAR > 0.4) return "SURPRISED";
19
+ // FRUSTRATED: a clear frown, OR brows lowered + squinting — either signals displeasure
20
+ if (v.LCP < -0.015) return "FRUSTRATED";
21
+ if (v.BRI > -0.2 && v.EAR < 0.18) return "FRUSTRATED";
22
+ // HAPPY: meaningful upward pull of lip corners (tighter than the old 0.005)
23
+ if (v.LCP > 0.015) return "HAPPY";
24
  return "NEUTRAL";
25
  }
26
 
 
59
  // Raising brows moves them toward y=0, making this value more negative.
60
  const BRI = (browMid.y - eyeCenter.y) / (interOcular + 1e-6);
61
 
62
+ // Lip-corner pull: average y of the two corners vs. mouth vertical centre,
63
+ // normalised by inter-ocular distance, relative to calibrated neutral.
64
+ // MediaPipe y increases downward, so when the corners rise above the mouth centre,
65
+ // (corner - centre) is negative; rawLCP uses (centre - corner) so smile = positive. Subtracting the calibrated neutral removes per-face bias.
66
+ const mouthCentreY = (landmarks[MOUTH_TOP].y + landmarks[MOUTH_BOTTOM].y) / 2;
67
+ const cornerAvgY = (landmarks[CORNER_LEFT].y + landmarks[CORNER_RIGHT].y) / 2;
68
+ const rawLCP = (mouthCentreY - cornerAvgY) / (interOcular + 1e-6);
69
+ const LCP = rawLCP - neutralLCP;
70
 
71
  return { MAR, EAR, BRI, LCP };
72
  }
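The rewritten LCP formula and the retuned classifier can be restated in Python to try thresholds on recorded values. The function names and the flat argument lists are illustrative assumptions; the formula, thresholds and rule order are taken from the diff.

```python
def lip_corner_pull(
    mouth_top_y: float,
    mouth_bottom_y: float,
    corner_left_y: float,
    corner_right_y: float,
    inter_ocular: float,
    neutral_lcp: float = 0.0,
) -> float:
    """(mouth_centre.y - corner_avg.y) / inter_ocular, minus the calibrated neutral."""
    mouth_centre_y = (mouth_top_y + mouth_bottom_y) / 2
    corner_avg_y = (corner_left_y + corner_right_y) / 2
    # Image y grows downward, so centre minus corner is positive when corners rise.
    raw = (mouth_centre_y - corner_avg_y) / (inter_ocular + 1e-6)
    return raw - neutral_lcp


def classify_affect(mar: float, ear: float, bri: float, lcp: float) -> str:
    """Same rule order as classifyAffect in sensing.ts."""
    if bri < -0.35 and mar > 0.4:
        return "SURPRISED"
    if lcp < -0.015:
        return "FRUSTRATED"
    if bri > -0.2 and ear < 0.18:
        return "FRUSTRATED"
    if lcp > 0.015:
        return "HAPPY"
    return "NEUTRAL"


# Corners 0.01 above the mouth centre, inter-ocular distance 0.10.
lcp = lip_corner_pull(0.60, 0.64, 0.61, 0.61, 0.10)
print(classify_affect(mar=0.1, ear=0.3, bri=-0.3, lcp=lcp))
```

Because the brows-lowered-plus-squint rule is evaluated before the HAPPY rule, a frame with `bri > -0.2`, `ear < 0.18` and a clearly positive LCP is still labelled FRUSTRATED. Whether that ordering is intended for squinting smiles is not stated in the commit.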
 
534
  }
535
 
536
  private recognise(trajectory: [number, number][]): string | null {
537
+ if (trajectory.length < 5) {
538
+ return null;
539
+ }
540
+ if (this.templates.size === 0) {
541
+ console.debug("[AirWriter] stroke completed but template bank is empty");
542
+ return null;
543
+ }
544
  const query = normaliseTrajectory(trajectory);
545
  let bestChar: string | null = null;
546
  let bestDist = Infinity;
 
551
  bestChar = char;
552
  }
553
  }
554
+ // Reject poor matches so we don't pass garbage to the LLM.
555
+ // Threshold is empirical — tune once real users test this.
556
+ const MATCH_THRESHOLD = 8.0;
557
+ if (bestDist > MATCH_THRESHOLD) {
558
+ console.debug(
559
+ `[AirWriter] no template matched (best='${bestChar}', dist=${bestDist.toFixed(2)})`
560
+ );
561
+ return null;
562
+ }
563
  return bestChar;
564
  }
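The threshold rejection added to `recognise` can be sketched with a textbook DTW. The distance function and `normaliseTrajectory` are not shown in this diff, so the DTW below (summed Euclidean cost, no normalisation step) is an assumption; only the minimum length of 5, the empty-bank check and `MATCH_THRESHOLD = 8.0` come from the source.

```python
import math
from typing import Optional

Point = tuple[float, float]
MATCH_THRESHOLD = 8.0


def dtw_distance(a: list[Point], b: list[Point]) -> float:
    """Classic O(n*m) dynamic time warping with Euclidean local cost."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognise(
    query: list[Point],
    templates: dict[str, list[Point]],
    threshold: float = MATCH_THRESHOLD,
) -> Optional[str]:
    if len(query) < 5 or not templates:
        return None
    best_label: Optional[str] = None
    best_dist = math.inf
    for label, template in templates.items():
        dist = dtw_distance(query, template)
        if dist < best_dist:
            best_label, best_dist = label, dist
    # Reject weak matches instead of forwarding a guess to the planner.
    return best_label if best_dist <= threshold else None


# Two of the default shapes: vertical line bottom-to-top and horizontal left-to-right.
TEMPLATES = {
    "help": [(0.5, 1.0 - i / 31) for i in range(32)],
    "done": [(i / 31, 0.5) for i in range(32)],
}
print(recognise(TEMPLATES["done"], TEMPLATES))
```

How meaningful a fixed threshold of 8.0 is depends on whether the real distance is a raw sum, as here, or is normalised by path length, so the value may need retuning if the distance function changes.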
565