shwetangisingh committed on
Commit
df78c68
·
1 Parent(s): 690c106

Streaming candidate picker + side-index feedback loops


- Planner fans out 3 candidates with distinct grounding strategies
(broad/focused/serendipitous for memory; good/fine/rough for
present-state). Streams tokens via SSE through /chat/stream and
/chat/regenerate/stream.
- Picker UI: stackable cards per candidate, "try again" regenerates
with prior options marked rejected. Head-shake during open picker
triggers regenerate; post-pick still triggers turnaround.
- Side-index at data/pick_index/<uid>/ stores (query → picked text +
buckets). Feeds back into generation as a prior_pick retrieved
chunk and blends into bucket_priors at weight 0.3 (transient).
- Concurrency: RLock on pick_index cache, Lock on planner completed[].
rAF-batched token rendering. SSE reader drains trailing buffer.
Empty-candidate fallback surfaces actionable text + logs.
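
The transient blend described in the side-index bullet can be sketched roughly as follows. This is an illustration only, not the code in `backend/pipeline/nodes/retrieval.py`: the function name `blend_pick_history`, the bucket names, the normalization of pick counts into a distribution, and the linear-interpolation form are all assumptions. Only the 0.3 weight and the fact that the session prior itself is left untouched come from the commit message.

```python
def blend_pick_history(
    session_priors: dict[str, float],
    pick_counts: dict[str, int],
    weight: float = 0.3,
) -> dict[str, float]:
    """Return a per-turn copy of the priors nudged toward historically picked buckets.

    Hypothetical helper: the input dicts are never mutated, so the blend is
    transient and the session prior survives unchanged for the next turn.
    """
    total = sum(pick_counts.values())
    if total == 0:
        # No pick history yet: fall back to the session prior as-is.
        return dict(session_priors)
    blended = {
        bucket: (1.0 - weight) * prior + weight * (pick_counts.get(bucket, 0) / total)
        for bucket, prior in session_priors.items()
    }
    # Renormalize in case pick_counts mentions buckets the session does not track.
    norm = sum(blended.values())
    return {b: v / norm for b, v in blended.items()} if norm > 0 else dict(session_priors)


# Illustrative bucket names; the real BUCKETS list lives in backend/retrieval/priors.py.
session = {"family": 0.25, "work": 0.25, "hobbies": 0.25, "travel": 0.25}
history = {"family": 6, "work": 2}
turn_priors = blend_pick_history(session, history)
```

With a uniform session prior and a history that favours `family`, the per-turn prior for `family` rises to 0.4 while `session` keeps its original values, which is the "bias without overriding" behaviour the bullet describes.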

.gitignore CHANGED
@@ -18,6 +18,7 @@ env/
18
 
19
  # Data — indexes are rebuilt from source; do NOT commit binaries
20
  data/vector_store/
 
21
 
22
  # Per-turn JSONL logs (contain user conversation content)
23
  logs/
 
18
 
19
  # Data — indexes are rebuilt from source; do NOT commit binaries
20
  data/vector_store/
21
+ data/pick_index/
22
 
23
  # Per-turn JSONL logs (contain user conversation content)
24
  logs/
README.md CHANGED
@@ -386,7 +386,6 @@ Heads up: all camera/sensing stuff is in the frontend (MediaPipe JS). Backend ju
386
  ### Dataset
387
 
388
  - [x] **[Core]** Memories carry three chunk types per persona — `narrative`, `social_post`, `chat_log` — each with a `bucket` label. Type is preserved through the vector-store metadata and feeds the P(type) session prior.
389
- - [ ] **[Core]** Write down the data schema somewhere so evals can reuse it
390
 
391
  ### Sensing (frontend)
392
 
@@ -427,10 +426,10 @@ Heads up: all camera/sensing stuff is in the frontend (MediaPipe JS). Backend ju
427
 
428
  ### Generation
429
 
430
- - [ ] **[Core]** API returns one response. Should return multiple candidates so the user can pick (and so the next item works)
431
- - [ ] **[Core]** Frontend needs a candidate picker: show all the options, let the user click one, send the selection back
432
- - [ ] **[Bonus]** When user picks a candidate, save the `(query, picked)` pair to a side vector index and check it first next turn
433
- - [x] LLM temperature bumped from 0.4 → 0.8 in [backend/pipeline/nodes/planner.py](backend/pipeline/nodes/planner.py). The old setting produced near-identical responses across turns even when affect/gesture changed, which made the sensing→output link hard to see. 0.8 gives meaningful lexical variation while staying in the persona's voice.
434
 
435
  ### Evals
436
 
@@ -450,10 +449,15 @@ Scoring runs synchronously on the `/chat` response path and the `eval_scores` di
450
  - [x] **[Eval]** Multimodal alignment — affect scored by positive/negative lexicon overlap vs. target sentiment, gesture by opener-phrase regex (THUMBS_UP/THUMBS_DOWN/WAVING), gaze by fraction of retrieved chunks matching the looked-at bucket
451
  - [x] **[Eval]** Authenticity — per-turn stars under each assistant bubble, POST to `/feedback/rating`, logged with `run_id + rater_id`
452
  - [ ] **[Eval]** For the live in-class eval: figure out the actual session — who rates (partners + experts per spec), how many turns each, what gets shown to them. The Likert form is the easy part; the protocol isn't written down anywhere
453
 
454
  ### Cleanup
455
 
456
- - [ ] move the affect → `StyleDirective` config (`_AFFECT_CONFIG` in [intent.py](backend/pipeline/nodes/intent.py)) and the gesture directives ([labels.py](backend/sensing/labels.py)) out of code into a yaml
457
  - [x] delete `backend/sensing/` (dead code, sensing is in frontend) — done, only `labels.py` remains
458
  - [x] per-persona affect overrides (`_PERSONA_TONE_OVERRIDES`) deleted — redundant with `stylistic_preferences` in the new persona JSONs
459
 
 
386
  ### Dataset
387
 
388
  - [x] **[Core]** Memories carry three chunk types per persona — `narrative`, `social_post`, `chat_log` — each with a `bucket` label. Type is preserved through the vector-store metadata and feeds the P(type) session prior.
 
389
 
390
  ### Sensing (frontend)
391
 
 
426
 
427
  ### Generation
428
 
429
+ - [x] **[Core]** API returns 3 candidates (plus an optional side-index hit) on `/chat` — see `candidates` in [backend/api/main.py](backend/api/main.py) `ChatResponse`. Planner fans out three grounding strategies in parallel threads and dedupes identical outputs: **broad** (all retrieved personal chunks), **focused** (top chunk only), and **serendipitous** (random non-top chunks) — see `_pick_strategy_chunks` in [backend/pipeline/nodes/planner.py](backend/pipeline/nodes/planner.py). Turnaround/present-state retries skip the fan-out and regenerate a single response.
430
+ - [x] **[Core]** Frontend picker shows stacked candidate cards with a strategy label under each; click to commit, which strikes the rest, locks the AAC bubble to the chosen text, and fires `POST /chat/pick`. One-candidate responses render as a normal bubble. See `handlePick` + `.candidate-list` in [frontend/src/components/ChatPanel.tsx](frontend/src/components/ChatPanel.tsx).
431
+ - [x] **[Bonus]** Side-index at `data/pick_index/<uid>/` stores `(query embedding, picked text, strategy, picked_buckets)` after every pick. Two feedback loops into generation: (1) the retrieval node injects the previously picked text as a `source: "prior_pick"` chunk rendered in a "you answered like this before" block, so the three LLM candidates all see it and riff on it; (2) retrieval blends cumulative `bucket_pick_counts` into this turn's `bucket_priors` at weight 0.3 (transient: it doesn't persist across turns), so users who historically pick family memories bias retrieval toward family without overriding the session prior. The raw picked text is also still surfaced as a standalone `side_index` candidate. See [backend/retrieval/pick_index.py](backend/retrieval/pick_index.py), `_blend_pick_history_into_priors` + `_prepend_prior_pick` in [backend/pipeline/nodes/retrieval.py](backend/pipeline/nodes/retrieval.py), and the prior-pick block in `_build_user` in [backend/pipeline/nodes/planner.py](backend/pipeline/nodes/planner.py).
432
+ - [x] LLM temperature bumped from 0.4 → 0.8, then pulled back to 0.7 once chunk-variation became the primary diversity axis. With three different grounding strategies feeding three parallel calls, sampling noise matters less than which memories are in the context window.
433
 
434
  ### Evals
435
 
 
449
  - [x] **[Eval]** Multimodal alignment — affect scored by positive/negative lexicon overlap vs. target sentiment, gesture by opener-phrase regex (THUMBS_UP/THUMBS_DOWN/WAVING), gaze by fraction of retrieved chunks matching the looked-at bucket
450
  - [x] **[Eval]** Authenticity — per-turn stars under each assistant bubble, POST to `/feedback/rating`, logged with `run_id + rater_id`
451
  - [ ] **[Eval]** For the live in-class eval: figure out the actual session — who rates (partners + experts per spec), how many turns each, what gets shown to them. The Likert form is the easy part; the protocol isn't written down anywhere
452
+ - [ ] **[Eval]** Relevance score — one NLI call per turn asking "does the response address the partner's query?" Fills the biggest current gap: a perfectly grounded but off-topic reply scores 100% grounded today and we'd never catch it
453
+ - [ ] **[Eval]** Candidate diversity — mean pairwise cosine distance among the 3 candidates in a picker round. Low diversity = picker showing three paraphrases of the same answer (the "aloha" problem), which is a signal that retrieval or temperature needs tuning for that query
454
+ - [ ] **[Eval]** Picker-aware metrics from `turns.jsonl` + `picks.jsonl`: which strategy wins most (`broad` vs `focused` vs `serendipitous` vs `side_index`), pick rate (% of turns where user clicked a card), regenerate rate (% of turns where user clicked "try again"). All computable offline, no runtime cost
455
+ - [ ] **[Eval]** Score alternate candidates too, not just the selected one. Right now `compute_evals` only scores `selected_response`; scoring all 3 would let us measure whether the picker actually improves quality over taking candidate 0 blindly
456
+ - [ ] **[Eval]** UI coverage gap: `compute_evals` returns 12 fields but `EvalPanel` renders only 5 pills (latency, grounded, affect, gesture, gaze). Hallucination rate, overall multimodal_alignment, SLO target/margin are computed and logged but never surfaced in the bubble. Decide what belongs as a pill vs a tooltip-only number vs an offline-only log field
457
+ - [ ] **[Eval]** On pill hover, tooltip should explain *how* the number was computed, not just what it means. Today the `title` attributes say "Groundedness: fraction of response sentences supported by retrieved memories" — which is the definition but not the math. Want: "5/8 sentences had NLI entailment prob ≥ 0.5 against the retrieved chunks" for groundedness; "3/4 positive-lexicon words matched HAPPY target" for affect; raw scorer inputs + thresholds exposed inline so the number isn't a black box
458
 
459
  ### Cleanup
460
 
 
461
  - [x] delete `backend/sensing/` (dead code, sensing is in frontend) — done, only `labels.py` remains
462
  - [x] per-persona affect overrides (`_PERSONA_TONE_OVERRIDES`) deleted — redundant with `stylistic_preferences` in the new persona JSONs
463
 
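
The "Candidate diversity" item in the Evals list above (mean pairwise cosine distance among the candidates in a picker round) could be computed along these lines. This is a minimal sketch using only the standard library. The function names are hypothetical, and the embeddings are passed in as plain lists of floats; in the real pipeline they would presumably come from the project's embedder, which is not shown here.

```python
import math
from itertools import combinations


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; treated as 1.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def mean_pairwise_distance(embeddings: list[list[float]]) -> float:
    """Average cosine distance over all unordered pairs; 0.0 for fewer than two vectors."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        return 0.0
    return sum(cosine_distance(a, b) for a, b in pairs) / len(pairs)


# Toy 2-D vectors standing in for sentence embeddings.
paraphrases = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
distinct = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
low = mean_pairwise_distance(paraphrases)
high = mean_pairwise_distance(distinct)
```

Three identical vectors score 0, which is the "three paraphrases of the same answer" case the item wants to flag. The threshold below which a round counts as low-diversity is left open here, as it is in the README.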
backend/api/main.py CHANGED
@@ -9,6 +9,7 @@ from pathlib import Path
9
 
10
  from fastapi import FastAPI, HTTPException
11
  from fastapi.middleware.cors import CORSMiddleware
 
12
  from pydantic import BaseModel, Field
13
 
14
  from backend.config.settings import settings
@@ -18,11 +19,12 @@ from backend.generation.llm_client import ( # active_model used by /debug/confi
18
  get_client,
19
  )
20
  from backend.guardrails.checks import check_input
21
- from backend.pipeline.graph import run_pipeline
22
  from backend.pipeline.intent_kind import classify_intent_kind
23
  from backend.pipeline.nodes import feedback as feedback_node
24
  from backend.pipeline.nodes import planner as planner_node
25
  from backend.pipeline.state import PipelineState
 
26
  from backend.retrieval.priors import BUCKETS, CHUNK_TYPES, uniform
27
  from backend.retrieval.vector_store import _get_embedder, retrieve
28
 
@@ -83,10 +85,17 @@ class TurnaroundRequest(BaseModel):
83
  head_signal: str | None = None
84
 
85
 
86
  class ChatResponse(BaseModel):
87
  user_id: str
88
  query: str
89
  response: str
 
90
  affect: str
91
  llm_tier: str
92
  llm_model: str
@@ -98,6 +107,18 @@ class ChatResponse(BaseModel):
98
  eval_scores: dict | None = None
99
 
100
 
101
  class RatingRequest(BaseModel):
102
  run_id: str = Field(min_length=1, max_length=64, pattern=_ID_PATTERN)
103
  user_id: str = Field(min_length=1, max_length=64, pattern=_ID_PATTERN)
@@ -170,6 +191,7 @@ def _build_initial_state(req: ChatRequest, session: dict) -> PipelineState:
170
  retrieval_mode_used="",
171
  augmented_prompt=None,
172
  candidates=[],
 
173
  selected_response=None,
174
  llm_tier_used="",
175
  llm_model_used="",
@@ -330,6 +352,7 @@ def chat(req: ChatRequest):
330
  user_id=req.user_id,
331
  query=req.query,
332
  response=guard["fallback"],
 
333
  affect="NEUTRAL",
334
  llm_tier="none",
335
  llm_model="none",
@@ -368,6 +391,7 @@ def chat(req: ChatRequest):
368
  user_id=req.user_id,
369
  query=req.query,
370
  response=result["selected_response"] or "",
 
371
  affect=affect_emotion,
372
  llm_tier=result.get("llm_tier_used", "unknown"),
373
  llm_model=result.get("llm_model_used", "unknown"),
@@ -380,6 +404,105 @@ def chat(req: ChatRequest):
380
  )
381
 
382
 
383
  @app.post("/chat/turnaround", response_model=ChatResponse)
384
  def chat_turnaround(req: TurnaroundRequest):
385
  if req.user_id not in _sessions:
@@ -470,6 +593,289 @@ def chat_turnaround(req: TurnaroundRequest):
470
  user_id=req.user_id,
471
  query=replan_state["raw_query"],
472
  response=replan_state["selected_response"] or "",
473
  affect=affect_emotion,
474
  llm_tier=replan_state.get("llm_tier_used", "unknown"),
475
  llm_model=replan_state.get("llm_model_used", "unknown"),
 
9
 
10
  from fastapi import FastAPI, HTTPException
11
  from fastapi.middleware.cors import CORSMiddleware
12
+ from fastapi.responses import StreamingResponse
13
  from pydantic import BaseModel, Field
14
 
15
  from backend.config.settings import settings
 
19
  get_client,
20
  )
21
  from backend.guardrails.checks import check_input
22
+ from backend.pipeline.graph import choose_planner_tier, run_pipeline, run_until_planner
23
  from backend.pipeline.intent_kind import classify_intent_kind
24
  from backend.pipeline.nodes import feedback as feedback_node
25
  from backend.pipeline.nodes import planner as planner_node
26
  from backend.pipeline.state import PipelineState
27
+ from backend.retrieval import pick_index
28
  from backend.retrieval.priors import BUCKETS, CHUNK_TYPES, uniform
29
  from backend.retrieval.vector_store import _get_embedder, retrieve
30
 
 
85
  head_signal: str | None = None
86
 
87
 
88
+ class CandidateOut(BaseModel):
89
+ text: str
90
+ strategy: str
91
+ grounded_buckets: list[str] = []
92
+
93
+
94
  class ChatResponse(BaseModel):
95
  user_id: str
96
  query: str
97
  response: str
98
+ candidates: list[CandidateOut] = []
99
  affect: str
100
  llm_tier: str
101
  llm_model: str
 
107
  eval_scores: dict | None = None
108
 
109
 
110
+ class PickRequest(BaseModel):
111
+ run_id: str = Field(min_length=1, max_length=64, pattern=_ID_PATTERN)
112
+ user_id: str = Field(min_length=1, max_length=64, pattern=_ID_PATTERN)
113
+ picked_idx: int = Field(ge=0, le=10)
114
+
115
+
116
+ class RegenerateRequest(BaseModel):
117
+ user_id: str
118
+ turn_id: int | None = None
119
+ rejected_texts: list[str] = Field(default_factory=list, max_length=20)
120
+
121
+
122
  class RatingRequest(BaseModel):
123
  run_id: str = Field(min_length=1, max_length=64, pattern=_ID_PATTERN)
124
  user_id: str = Field(min_length=1, max_length=64, pattern=_ID_PATTERN)
 
191
  retrieval_mode_used="",
192
  augmented_prompt=None,
193
  candidates=[],
194
+ rejected_candidates=[],
195
  selected_response=None,
196
  llm_tier_used="",
197
  llm_model_used="",
 
352
  user_id=req.user_id,
353
  query=req.query,
354
  response=guard["fallback"],
355
+ candidates=[],
356
  affect="NEUTRAL",
357
  llm_tier="none",
358
  llm_model="none",
 
391
  user_id=req.user_id,
392
  query=req.query,
393
  response=result["selected_response"] or "",
394
+ candidates=[CandidateOut(**c) for c in result.get("candidates") or []],
395
  affect=affect_emotion,
396
  llm_tier=result.get("llm_tier_used", "unknown"),
397
  llm_model=result.get("llm_model_used", "unknown"),
 
404
  )
405
 
406
 
407
+ @app.post("/chat/stream")
408
+ def chat_stream(req: ChatRequest):
409
+ """Server-Sent Events version of /chat. Runs intent + retrieval synchronously,
410
+ then streams planner candidate tokens as they arrive. Final event carries the
411
+ full ChatResponse-shaped payload.
412
+ """
413
+ guard = check_input(req.query)
414
+ if not guard["allowed"]:
415
+ # Mirror the non-stream /chat early-exit.
416
+ payload = {
417
+ "user_id": req.user_id,
418
+ "query": req.query,
419
+ "response": guard["fallback"],
420
+ "candidates": [],
421
+ "affect": "NEUTRAL",
422
+ "llm_tier": "none",
423
+ "llm_model": "none",
424
+ "retrieval_mode": "none",
425
+ "latency": {},
426
+ "guardrail_passed": False,
427
+ "turn_id": 0,
428
+ "run_id": None,
429
+ "eval_scores": None,
430
+ }
431
+
432
+ def _one_event():
433
+ yield _sse({"type": "complete", "response": payload})
434
+
435
+ return StreamingResponse(_one_event(), media_type="text/event-stream")
436
+
437
+ session = _get_or_init_session(req.user_id)
438
+ initial_state = _build_initial_state(req, session)
439
+
440
+ def _gen():
441
+ state = run_until_planner(initial_state)
442
+ tier = choose_planner_tier(state)
443
+
444
+ completion: dict | None = None
445
+ for evt in planner_node._run_stream(state, tier=tier):
446
+ if evt["type"] == "complete":
447
+ completion = evt["planner_update"]
448
+ break
449
+ yield _sse(evt)
450
+
451
+ if completion is None:
452
+ yield _sse({"type": "error", "message": "planner produced no completion"})
453
+ return
454
+
455
+ state.update(completion) # type: ignore[typeddict-item]
456
+ state.update(feedback_node.run(state)) # type: ignore[typeddict-item]
457
+
458
+ session["session_history"] = state["session_history"]
459
+ session["bucket_priors"] = state["bucket_priors"]
460
+ session["type_priors"] = state["type_priors"]
461
+ session["last_state"] = state
462
+
463
+ affect_emotion = (state.get("affect") or {}).get("emotion", "NEUTRAL")
464
+ run_id = state.get("run_id")
465
+
466
+ eval_scores = _compute_and_persist_evals(
467
+ run_id=run_id,
468
+ user_id=req.user_id,
469
+ turn_id=state["turn_id"],
470
+ response=state["selected_response"] or "",
471
+ chunks=list(state.get("retrieved_chunks") or []),
472
+ latency_log=dict(state.get("latency_log") or {}),
473
+ affect=affect_emotion,
474
+ gesture_tag=req.gesture_tag,
475
+ gaze_bucket=req.gaze_bucket,
476
+ )
477
+
478
+ final = {
479
+ "user_id": req.user_id,
480
+ "query": req.query,
481
+ "response": state["selected_response"] or "",
482
+ "candidates": [dict(c) for c in state.get("candidates") or []],
483
+ "affect": affect_emotion,
484
+ "llm_tier": state.get("llm_tier_used", "unknown"),
485
+ "llm_model": state.get("llm_model_used", "unknown"),
486
+ "retrieval_mode": state.get("retrieval_mode_used", "unknown"),
487
+ "latency": state.get("latency_log") or {},
488
+ "guardrail_passed": state.get("guardrail_passed", True),
489
+ "run_id": run_id,
490
+ "turn_id": state["turn_id"],
491
+ "eval_scores": eval_scores,
492
+ }
493
+ yield _sse({"type": "complete", "response": final})
494
+
495
+ return StreamingResponse(
496
+ _gen(),
497
+ media_type="text/event-stream",
498
+ headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
499
+ )
500
+
501
+
502
+ def _sse(data: dict) -> str:
503
+ return f"data: {json.dumps(data, ensure_ascii=False)}\n\n"
504
+
505
+
506
  @app.post("/chat/turnaround", response_model=ChatResponse)
507
  def chat_turnaround(req: TurnaroundRequest):
508
  if req.user_id not in _sessions:
 
593
  user_id=req.user_id,
594
  query=replan_state["raw_query"],
595
  response=replan_state["selected_response"] or "",
596
+ candidates=[CandidateOut(**c) for c in replan_state.get("candidates") or []],
597
+ affect=affect_emotion,
598
+ llm_tier=replan_state.get("llm_tier_used", "unknown"),
599
+ llm_model=replan_state.get("llm_model_used", "unknown"),
600
+ retrieval_mode=replan_state.get("retrieval_mode_used", "unknown"),
601
+ latency=replan_state.get("latency_log") or {},
602
+ guardrail_passed=replan_state.get("guardrail_passed", True),
603
+ run_id=run_id,
604
+ turn_id=replan_state["turn_id"],
605
+ eval_scores=eval_scores,
606
+ )
607
+
608
+
609
+ def _find_turn_from_jsonl(run_id: str) -> dict | None:
610
+ """Scan turns.jsonl from the end for a matching run_id. Used as fallback
611
+ when the session's last_state has already moved on."""
612
+ path = Path(settings.logs_dir) / "turns.jsonl"
613
+ if not path.exists():
614
+ return None
615
+ try:
616
+ with open(path, encoding="utf-8") as f:
617
+ lines = f.readlines()
618
+ except OSError:
619
+ return None
620
+ for line in reversed(lines[-500:]): # bounded tail scan
621
+ try:
622
+ row = json.loads(line)
623
+ except json.JSONDecodeError:
624
+ continue
625
+ if row.get("run_id") == run_id:
626
+ return row
627
+ return None
628
+
629
+
630
+ @app.post("/chat/pick")
631
+ def pick_candidate(req: PickRequest):
632
+ if not _RUN_ID_RE.match(req.run_id):
633
+ raise HTTPException(status_code=400, detail="invalid run_id")
634
+
635
+ session = _sessions.get(req.user_id) or {}
636
+ last = session.get("last_state") or {}
637
+ candidates = last.get("candidates") or []
638
+ query_text = last.get("raw_query") or ""
639
+
640
+ # Fallback: last_state already advanced past this run_id — read from JSONL
641
+ if last.get("run_id") != req.run_id or not candidates:
642
+ row = _find_turn_from_jsonl(req.run_id)
643
+ if not row:
644
+ raise HTTPException(status_code=404, detail="turn not found")
645
+ candidates = row.get("candidates") or []
646
+ query_text = row.get("query") or query_text
647
+
648
+ if req.picked_idx >= len(candidates):
649
+ raise HTTPException(status_code=400, detail="picked_idx out of range")
650
+
651
+ picked = candidates[req.picked_idx]
652
+ picked_text = picked.get("text", "")
653
+ strategy = picked.get("strategy", "unknown")
654
+ picked_buckets = [
655
+ b for b in (picked.get("grounded_buckets") or []) if b and b != "open_domain"
656
+ ]
657
+
658
+ if query_text and picked_text:
659
+ try:
660
+ pick_index.add(
661
+ query=query_text,
662
+ user_id=req.user_id,
663
+ strategy=strategy,
664
+ picked_text=picked_text,
665
+ picked_buckets=picked_buckets,
666
+ )
667
+ except Exception as exc:
668
+ _log.warning("pick_index add failed: %r", exc)
669
+
670
+ logs_dir = Path(settings.logs_dir)
671
+ logs_dir.mkdir(parents=True, exist_ok=True)
672
+ entry = {
673
+ "ts": time.time(),
674
+ "run_id": req.run_id,
675
+ "user_id": req.user_id,
676
+ "picked_idx": req.picked_idx,
677
+ "strategy": strategy,
678
+ "picked_text": picked_text,
679
+ "query": query_text,
680
+ }
681
+ with open(logs_dir / "picks.jsonl", "a", encoding="utf-8") as f:
682
+ f.write(json.dumps(entry, ensure_ascii=False) + "\n")
683
+
684
+ return {"status": "ok", "strategy": strategy}
685
+
686
+
687
+ @app.post("/chat/regenerate/stream")
688
+ def chat_regenerate_stream(req: RegenerateRequest):
689
+ """Streaming regenerate — same as /chat/stream but reuses last_state and
690
+ marks all prior candidates as rejected."""
691
+ if req.user_id not in _sessions:
692
+ raise HTTPException(status_code=404, detail="no active session")
693
+ session = _sessions[req.user_id]
694
+ last: PipelineState | None = session.get("last_state")
695
+ if last is None:
696
+ raise HTTPException(status_code=409, detail="no prior turn to regenerate")
697
+ if req.turn_id is not None and req.turn_id != last["turn_id"]:
698
+ raise HTTPException(status_code=409, detail="stale turn_id")
699
+
700
+ gen_cfg = dict(last.get("generation_config") or {})
701
+ gen_cfg["persona_mod"] = "all_rejected"
702
+ gen_cfg.setdefault("tone_tag", "[TONE:TRY_DIFFERENT_ANGLE]")
703
+
704
+ prior_rejected = [c.get("text", "") for c in (last.get("candidates") or [])]
705
+ merged = (
706
+ list(last.get("rejected_candidates") or [])
707
+ + [t for t in prior_rejected if t]
708
+ + [t for t in req.rejected_texts if t]
709
+ )
710
+ seen: set[str] = set()
711
+ rejected: list[str] = []
712
+ for t in merged:
713
+ key = t.strip().lower()
714
+ if key and key not in seen:
715
+ seen.add(key)
716
+ rejected.append(t)
717
+
718
+ trimmed_history = list(last.get("session_history") or [])
719
+ if trimmed_history and trimmed_history[-1].get("role") == "aac_user":
720
+ trimmed_history.pop()
721
+ if trimmed_history and trimmed_history[-1].get("role") == "partner":
722
+ trimmed_history.pop()
723
+
724
+ replan_state: PipelineState = dict(last) # type: ignore[assignment]
725
+ replan_state["session_history"] = trimmed_history
726
+ replan_state["generation_config"] = gen_cfg
727
+ replan_state["rejected_candidates"] = rejected
728
+ replan_state["turnaround_triggered"] = False
729
+ replan_state["latency_log"] = {
730
+ "t_sensing": 0.0,
731
+ "t_intent": 0.0,
732
+ "t_retrieval": 0.0,
733
+ "t_generation": 0.0,
734
+ "t_total": 0.0,
735
+ }
736
+
737
+ def _gen():
738
+ completion: dict | None = None
739
+ for evt in planner_node._run_stream(replan_state, tier="primary"):
740
+ if evt["type"] == "complete":
741
+ completion = evt["planner_update"]
742
+ break
743
+ yield _sse(evt)
744
+ if completion is None:
745
+ yield _sse({"type": "error", "message": "planner produced no completion"})
746
+ return
747
+ replan_state.update(completion) # type: ignore[typeddict-item]
748
+ replan_state.update(feedback_node.run(replan_state)) # type: ignore[typeddict-item]
749
+
750
+ session["session_history"] = replan_state["session_history"]
751
+ session["bucket_priors"] = replan_state["bucket_priors"]
752
+ session["type_priors"] = replan_state["type_priors"]
753
+ session["last_state"] = replan_state
754
+
755
+ affect_emotion = (replan_state.get("affect") or {}).get("emotion", "NEUTRAL")
756
+ run_id = replan_state.get("run_id")
757
+ eval_scores = _compute_and_persist_evals(
758
+ run_id=run_id,
759
+ user_id=req.user_id,
760
+ turn_id=replan_state["turn_id"],
761
+ response=replan_state["selected_response"] or "",
762
+ chunks=list(replan_state.get("retrieved_chunks") or []),
763
+ latency_log=dict(replan_state.get("latency_log") or {}),
764
+ affect=affect_emotion,
765
+ gesture_tag=replan_state.get("gesture_tag"),
766
+ gaze_bucket=replan_state.get("gaze_bucket"),
767
+ )
768
+ final = {
769
+ "user_id": req.user_id,
770
+ "query": replan_state["raw_query"],
771
+ "response": replan_state["selected_response"] or "",
772
+ "candidates": [dict(c) for c in replan_state.get("candidates") or []],
773
+ "affect": affect_emotion,
774
+ "llm_tier": replan_state.get("llm_tier_used", "unknown"),
775
+ "llm_model": replan_state.get("llm_model_used", "unknown"),
776
+ "retrieval_mode": replan_state.get("retrieval_mode_used", "unknown"),
777
+ "latency": replan_state.get("latency_log") or {},
778
+ "guardrail_passed": replan_state.get("guardrail_passed", True),
779
+ "run_id": run_id,
780
+ "turn_id": replan_state["turn_id"],
781
+ "eval_scores": eval_scores,
782
+ }
783
+ yield _sse({"type": "complete", "response": final})
784
+
785
+ return StreamingResponse(
786
+ _gen(),
787
+ media_type="text/event-stream",
788
+ headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
789
+ )
790
+
791
+
792
+ @app.post("/chat/regenerate", response_model=ChatResponse)
793
+ def chat_regenerate(req: RegenerateRequest):
794
+ """Re-run the planner for the same turn with all prior candidates marked rejected.
795
+ Does NOT advance turn_id — same partner query, fresh fan-out of candidates.
796
+ """
797
+ if req.user_id not in _sessions:
798
+ raise HTTPException(status_code=404, detail="no active session")
799
+ session = _sessions[req.user_id]
800
+ last: PipelineState | None = session.get("last_state")
801
+ if last is None:
802
+ raise HTTPException(status_code=409, detail="no prior turn to regenerate")
803
+ if req.turn_id is not None and req.turn_id != last["turn_id"]:
804
+ raise HTTPException(status_code=409, detail="stale turn_id")
805
+
806
+ gen_cfg = dict(last.get("generation_config") or {})
807
+ gen_cfg["persona_mod"] = "all_rejected"
808
+ gen_cfg.setdefault("tone_tag", "[TONE:TRY_DIFFERENT_ANGLE]")
809
+
810
+ prior_rejected = [c.get("text", "") for c in (last.get("candidates") or [])]
811
+ merged_rejected = (
812
+ list(last.get("rejected_candidates") or [])
813
+ + [t for t in prior_rejected if t]
814
+ + [t for t in req.rejected_texts if t]
815
+ )
816
+ # Dedupe while preserving order.
817
+ seen: set[str] = set()
818
+ rejected: list[str] = []
819
+ for t in merged_rejected:
820
+ key = t.strip().lower()
821
+ if key and key not in seen:
822
+ seen.add(key)
823
+ rejected.append(t)
824
+
825
+ # Strip the tail (partner, aac_user) so feedback doesn't stack duplicate
826
+ # history entries on every regenerate — the user hasn't committed yet.
827
+ trimmed_history = list(last.get("session_history") or [])
828
+ if trimmed_history and trimmed_history[-1].get("role") == "aac_user":
829
+ trimmed_history.pop()
830
+ if trimmed_history and trimmed_history[-1].get("role") == "partner":
831
+ trimmed_history.pop()
832
+
833
+ replan_state: PipelineState = dict(last) # type: ignore[assignment]
834
+ replan_state["session_history"] = trimmed_history
835
+ replan_state["generation_config"] = gen_cfg
836
+ replan_state["rejected_candidates"] = rejected
837
+ replan_state["turnaround_triggered"] = False # keep multi-shot
838
+ replan_state["latency_log"] = {
839
+ "t_sensing": 0.0,
840
+ "t_intent": 0.0,
841
+ "t_retrieval": 0.0,
842
+ "t_generation": 0.0,
843
+ "t_total": 0.0,
844
+ }
845
+
846
+ planner_update = planner_node.run_primary(replan_state)
847
+ replan_state.update(planner_update) # type: ignore[typeddict-item]
848
+
849
+ # Feedback node rewrites history + assigns a new run_id. Each regenerate
850
+ # is its own row in turns.jsonl for the eval record.
851
+ feedback_update = feedback_node.run(replan_state)
852
+ replan_state.update(feedback_update) # type: ignore[typeddict-item]
853
+
854
+ session["session_history"] = replan_state["session_history"]
855
+ session["bucket_priors"] = replan_state["bucket_priors"]
856
+ session["type_priors"] = replan_state["type_priors"]
857
+ session["last_state"] = replan_state
858
+
859
+ affect_emotion = (replan_state.get("affect") or {}).get("emotion", "NEUTRAL")
860
+ run_id = replan_state.get("run_id")
861
+
862
+ eval_scores = _compute_and_persist_evals(
863
+ run_id=run_id,
864
+ user_id=req.user_id,
865
+ turn_id=replan_state["turn_id"],
866
+ response=replan_state["selected_response"] or "",
867
+ chunks=list(replan_state.get("retrieved_chunks") or []),
868
+ latency_log=dict(replan_state.get("latency_log") or {}),
869
+ affect=affect_emotion,
870
+ gesture_tag=replan_state.get("gesture_tag"),
871
+ gaze_bucket=replan_state.get("gaze_bucket"),
872
+ )
873
+
874
+ return ChatResponse(
875
+ user_id=req.user_id,
876
+ query=replan_state["raw_query"],
877
+ response=replan_state["selected_response"] or "",
878
+ candidates=[CandidateOut(**c) for c in replan_state.get("candidates") or []],
879
  affect=affect_emotion,
880
  llm_tier=replan_state.get("llm_tier_used", "unknown"),
881
  llm_model=replan_state.get("llm_model_used", "unknown"),
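
For readers wiring up a client against `/chat/stream`, the framing used by `_sse` (one `data: <json>` line followed by a blank line) can be consumed with a small buffering parser. The sketch below is an assumption-laden illustration, not the project's frontend reader: `iter_sse_events` and `_decode` are hypothetical names, and the `token` event fields are invented for the example, since only the `complete` and `error` event types appear in this diff. It also shows the "drain trailing buffer" step from the commit message: the last event is decoded even if the stream ends without its terminating blank line.

```python
import json
from collections.abc import Iterable, Iterator


def sse_frame(data: dict) -> str:
    """Frame one event the same way the diff's `_sse` helper does."""
    return f"data: {json.dumps(data, ensure_ascii=False)}\n\n"


def _decode(frame: str) -> Iterator[dict]:
    """Yield the JSON payload of each `data: ` line in one frame."""
    for line in frame.splitlines():
        if not line.startswith("data: "):
            continue
        try:
            yield json.loads(line[len("data: "):])
        except json.JSONDecodeError:
            # A truncated final frame is skipped rather than crashing the reader.
            continue


def iter_sse_events(chunks: Iterable[str]) -> Iterator[dict]:
    """Reassemble events from text chunks that may split frames anywhere."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while "\n\n" in buffer:
            frame, buffer = buffer.split("\n\n", 1)
            yield from _decode(frame)
    # Drain whatever is left: the last event may arrive without its blank line.
    yield from _decode(buffer)


# Hypothetical token event shape; only "complete" and "error" are shown in the diff.
wire = sse_frame({"type": "token", "candidate": 0, "text": "Hel"}) + sse_frame(
    {"type": "complete", "response": {"response": "Hello"}}
)
chunks = [wire[:10], wire[10:37], wire[37:].rstrip("\n")]
events = list(iter_sse_events(chunks))
```

Because `json.dumps` escapes newlines inside strings, a payload can never contain a raw blank line, so splitting on `"\n\n"` is safe for frames produced by `_sse`.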
backend/generation/llm_client.py CHANGED
@@ -1,5 +1,6 @@
1
  # Two-tier LLM client — primary / fallback, both Ollama Cloud over OpenAI-compatible HTTP.
2
  import re
 
3
  from functools import lru_cache
4
  from typing import Any
5
 
@@ -87,6 +88,56 @@ def chat_complete(
87
  return stripped
88
 
89
 
90
  def warmup(tier: str | None = None) -> None:
91
  chat_complete(
92
  messages=[{"role": "user", "content": "hi"}],
 
1
  # Two-tier LLM client — primary / fallback, both Ollama Cloud over OpenAI-compatible HTTP.
2
  import re
3
+ from collections.abc import Iterator
4
  from functools import lru_cache
5
  from typing import Any
6
 
 
88
  return stripped
89
 
90
 
91
+ def chat_complete_stream(
92
+ messages: list[dict],
93
+ max_tokens: int,
94
+ tier: str | None = None,
95
+ temperature: float = 0.7,
96
+ **kwargs: Any,
97
+ ) -> Iterator[str]:
98
+ """Yield token deltas as they arrive. Thinking-mode stripping is applied
99
+ post-hoc on the buffered text by the caller — streaming <think>…</think>
100
+ into the UI would confuse the picker anyway.
101
+ """
102
+ resolved_tier = tier or settings.active_llm_tier
103
+ model = active_model(resolved_tier)
104
+ client = get_client(resolved_tier)
105
+
106
+ patched_messages = messages
107
+ extra_body: dict[str, Any] = kwargs.pop("extra_body", {})
108
+
109
+ if settings.thinking_mode == "suppress":
110
+ patched_messages = _apply_no_think(messages)
111
+
112
+ effective_max_tokens = max_tokens
113
+ if settings.thinking_mode in ("strip", "full"):
114
+ effective_max_tokens = max_tokens + settings.thinking_token_budget
115
+
116
+ stream = client.chat.completions.create(
117
+ model=model,
118
+ messages=patched_messages,
119
+ max_tokens=effective_max_tokens,
120
+ temperature=temperature,
121
+ stream=True,
122
+ extra_body=extra_body or None,
123
+ **kwargs,
124
+ )
125
+ for chunk in stream:
126
+ if not chunk.choices:
127
+ continue
128
+ delta = chunk.choices[0].delta
129
+ piece = getattr(delta, "content", None) or ""
130
+ if piece:
131
+ yield piece
132
+
133
+
134
+ def finalize_streamed(text: str) -> str:
135
+ """Apply the same post-processing chat_complete does once a stream is done."""
136
+ if settings.thinking_mode in ("off", "strip"):
137
+ text = _strip_think_tags(text)
138
+ return text.strip()
139
+
140
+
141
  def warmup(tier: str | None = None) -> None:
142
  chat_complete(
143
  messages=[{"role": "user", "content": "hi"}],
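The streaming path above yields raw deltas and leaves clean-up to `finalize_streamed` once the stream ends. A minimal sketch of that consume-then-finalize pattern, assuming a stand-in generator `fake_stream` and a simplified `_THINK_RE` regex (both hypothetical; the real `_strip_think_tags` helper is not shown in this diff):

```python
import re
from collections.abc import Iterator

# Hypothetical stand-in for _strip_think_tags; the real helper is not shown here.
_THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)


def fake_stream() -> Iterator[str]:
    """Stand-in for chat_complete_stream: yields text deltas one at a time."""
    yield from ["<think>plan", " the reply</think>", "Hello", " world.", " "]


def consume(stream: Iterator[str]) -> tuple[list[str], str]:
    """Collect every delta as it arrives, then clean the buffered text once."""
    deltas: list[str] = []
    for piece in stream:
        deltas.append(piece)  # in the planner, a token event would be queued here
    final = _THINK_RE.sub("", "".join(deltas)).strip()
    return deltas, final


deltas, final = consume(fake_stream())
print(len(deltas), repr(final))
```

One consequence worth checking against the real UI: because the tags are only removed from the buffered text, the individual deltas forwarded while streaming may still contain thinking text when the model emits it.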
backend/main.py CHANGED
@@ -180,6 +180,7 @@ def main() -> None:
180
  retrieval_mode_used="",
181
  augmented_prompt=None,
182
  candidates=[],
 
183
  selected_response=None,
184
  llm_tier_used="",
185
  latency_log={
 
180
  retrieval_mode_used="",
181
  augmented_prompt=None,
182
  candidates=[],
183
+ rejected_candidates=[],
184
  selected_response=None,
185
  llm_tier_used="",
186
  latency_log={
backend/pipeline/graph.py CHANGED
@@ -34,3 +34,19 @@ def run_pipeline(state: PipelineState) -> PipelineState:
34
 
35
  _merge(state, feedback.run(state))
36
  return state
34
 
35
  _merge(state, feedback.run(state))
36
  return state
37
+
38
+
39
+ def run_until_planner(state: PipelineState) -> PipelineState:
40
+ """Run intent + retrieval only. Used by the streaming endpoint so it can
41
+ then drive the planner's token stream itself and call feedback at the end.
42
+ """
43
+ _merge(state, intent.run(state))
44
+ if _route_by_affect(state) == "fast":
45
+ _merge(state, retrieval.run_fast(state))
46
+ else:
47
+ _merge(state, retrieval.run_full(state))
48
+ return state
49
+
50
+
51
+ def choose_planner_tier(state: PipelineState) -> str:
52
+ return _route_by_latency(state)
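`run_until_planner` exists so a streaming endpoint can forward planner events itself. A sketch of how such events could be framed as Server-Sent Events, assuming a hypothetical `to_sse` helper and plain JSON payloads (the actual `/chat/stream` handler is not part of this section):

```python
import json
from collections.abc import Iterable, Iterator


def to_sse(events: Iterable[dict]) -> Iterator[str]:
    """Frame each event dict as one SSE message: a `data:` line plus a blank line."""
    for evt in events:
        # json.dumps escapes newlines, so each payload stays on a single data line.
        yield f"data: {json.dumps(evt, ensure_ascii=False)}\n\n"


events = [
    {"type": "candidate_start", "idx": 0, "strategy": "broad", "grounded_buckets": []},
    {"type": "token", "idx": 0, "delta": "Hello"},
    {"type": "candidate_done", "idx": 0, "text": "Hello"},
]
frames = list(to_sse(events))
print(frames[1], end="")
```

Each frame ends with a blank line, which is what lets a client split the byte stream back into events.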
backend/pipeline/nodes/feedback.py CHANGED
@@ -39,12 +39,14 @@ def _log_to_jsonl(
39
  latency = state.get("latency_log") or {}
40
  affect = (state.get("affect") or {}).get("emotion", "UNKNOWN")
41
  chunks = state.get("retrieved_chunks") or []
 
42
 
43
  entry = {
44
  "run_id": run_id,
45
  "ts": time.time(),
46
  "user_id": state["user_id"],
47
  "turn_id": state["turn_id"],
 
48
  "llm_tier": state.get("llm_tier_used", "unknown"),
49
  "retrieval_mode": state.get("retrieval_mode_used", "unknown"),
50
  "affect": affect,
@@ -57,6 +59,7 @@ def _log_to_jsonl(
57
  ),
58
  "num_contextual": sum(1 for c in chunks if c.get("source") == "contextual"),
59
  "num_open_domain": sum(1 for c in chunks if c.get("source") == "open_domain"),
 
60
  "latency": {
61
  "t_sensing": latency.get("t_sensing", 0.0),
62
  "t_intent": latency.get("t_intent", 0.0),
@@ -65,6 +68,8 @@ def _log_to_jsonl(
65
  "t_total": latency.get("t_total", 0.0),
66
  },
67
  "response": state.get("selected_response") or "",
 
 
68
  "bucket_priors_after": bucket_priors_after,
69
  "type_priors_after": type_priors_after,
70
  }
 
39
  latency = state.get("latency_log") or {}
40
  affect = (state.get("affect") or {}).get("emotion", "UNKNOWN")
41
  chunks = state.get("retrieved_chunks") or []
42
+ candidates = state.get("candidates") or []
43
 
44
  entry = {
45
  "run_id": run_id,
46
  "ts": time.time(),
47
  "user_id": state["user_id"],
48
  "turn_id": state["turn_id"],
49
+ "query": state["raw_query"],
50
  "llm_tier": state.get("llm_tier_used", "unknown"),
51
  "retrieval_mode": state.get("retrieval_mode_used", "unknown"),
52
  "affect": affect,
 
59
  ),
60
  "num_contextual": sum(1 for c in chunks if c.get("source") == "contextual"),
61
  "num_open_domain": sum(1 for c in chunks if c.get("source") == "open_domain"),
62
+ "num_prior_pick": sum(1 for c in chunks if c.get("source") == "prior_pick"),
63
  "latency": {
64
  "t_sensing": latency.get("t_sensing", 0.0),
65
  "t_intent": latency.get("t_intent", 0.0),
 
68
  "t_total": latency.get("t_total", 0.0),
69
  },
70
  "response": state.get("selected_response") or "",
71
+ "candidates": [dict(c) for c in candidates],
72
+ "n_candidates": len(candidates),
73
  "bucket_priors_after": bucket_priors_after,
74
  "type_priors_after": type_priors_after,
75
  }
backend/pipeline/nodes/planner.py CHANGED
@@ -1,12 +1,32 @@
 
 
 
 
1
  import time
 
2
 
3
  from backend.config.settings import settings
4
- from backend.generation.llm_client import active_model, chat_complete
 
 
 
 
 
5
  from backend.guardrails.checks import check_output
6
  from backend.pipeline.intent_kind import classify_intent_kind
7
- from backend.pipeline.state import PipelineState, StyleDirective
 
8
  from backend.sensing.labels import GESTURE_DIRECTIVES
9
 
 
 
 
 
 
 
 
 
 
10
  _PERSONA_MOD_INSTRUCTIONS = {
11
  "amplify_quirks": "Amplify your characteristic style and personality.",
12
  "suppress_humor": "Be direct and supportive. Suppress humor.",
@@ -30,6 +50,12 @@ _PERSONA_MOD_INSTRUCTIONS = {
30
  "read (if you said 'good', try 'not great') or honestly admit "
31
  "you're not sure how you feel right now. Do NOT invent details."
32
  ),
 
 
 
 
 
 
33
  }
34
 
35
 
@@ -41,7 +67,23 @@ def run_fallback(state: PipelineState) -> dict:
41
  return _run(state, tier="fallback")
42
 
43
 
44
- def _run(state: PipelineState, tier: str) -> dict:
45
  t0 = time.perf_counter()
46
 
47
  profile = state["persona_profile"]
@@ -57,31 +99,386 @@ def _run(state: PipelineState, tier: str) -> dict:
57
  rejected_response: str | None = None
58
  if turnaround_triggered:
59
  rejected_response = state.get("selected_response")
 
60
  intent_kind = classify_intent_kind(state.get("intent_route"))
61
- messages = _build_messages(
62
- profile,
63
- chunks,
64
- history,
65
- state["raw_query"],
66
- style,
67
- gen_cfg,
68
- gesture_tag=gesture_tag,
69
- air_written_text=air_written_text,
70
- rejected_response=rejected_response,
71
- intent_kind=intent_kind,
72
- affect=affect,
73
- )
 
74
 
75
- selected = chat_complete(
76
- messages=messages,
77
- max_tokens=gen_cfg.get("max_tokens", settings.max_tokens_neutral),
78
- temperature=0.8,
79
- tier=tier,
 
 
 
 
80
  )
81
 
82
- guard = check_output(selected, chunks)
83
- if not guard["passed"]:
84
- selected = guard["fallback"]
85
 
86
  t_gen = time.perf_counter() - t0
87
  latency_log = dict(state.get("latency_log") or {})
@@ -94,15 +491,32 @@ def _run(state: PipelineState, tier: str) -> dict:
94
  4,
95
  )
96
 
97
- augmented_prompt = "\n\n".join(f"[{m['role']}] {m['content']}" for m in messages)
98
  return {
99
  "augmented_prompt": augmented_prompt,
100
- "candidates": [selected],
101
  "selected_response": selected,
102
  "llm_tier_used": tier,
103
  "llm_model_used": active_model(tier),
104
  "latency_log": latency_log,
105
- "guardrail_passed": guard["passed"],
106
  }
107
 
108
 
@@ -128,6 +542,7 @@ def _build_messages(
128
  gesture_tag: str | None = None,
129
  air_written_text: str | None = None,
130
  rejected_response: str | None = None,
 
131
  intent_kind: str = "memory",
132
  affect: str = "NEUTRAL",
133
  ) -> list[dict]:
@@ -146,6 +561,7 @@ def _build_messages(
146
  air_written_text,
147
  profile["name"],
148
  rejected_response=rejected_response,
 
149
  intent_kind=intent_kind,
150
  affect=affect,
151
  )
@@ -206,12 +622,14 @@ def _build_user(
206
  persona_name: str,
207
  *,
208
  rejected_response: str | None = None,
 
209
  intent_kind: str = "memory",
210
  affect: str = "NEUTRAL",
211
  ) -> str:
212
  personal_chunks = [c for c in chunks if c.get("source", "personal") == "personal"]
213
  contextual_chunks = [c for c in chunks if c.get("source") == "contextual"]
214
  open_domain_chunks = [c for c in chunks if c.get("source") == "open_domain"]
 
215
 
216
  memory_block = (
217
  "\n".join(
@@ -220,6 +638,11 @@ def _build_user(
220
  )
221
  or " (no memories retrieved)"
222
  )
 
 
 
 
 
223
  contextual_block = (
224
  "\n".join(f" {c['text']}" for c in contextual_chunks)
225
  or " (nothing relevant from this session)"
@@ -277,6 +700,15 @@ def _build_user(
277
  f"\nYour previous reply (which you need to replace, not repeat): "
278
  f'"{safe_rejected}"'
279
  )
 
 
 
 
 
 
 
 
 
280
 
281
  if intent_kind == "present_state":
282
  affect_hint = _AFFECT_HINTS.get(affect, _AFFECT_HINTS["NEUTRAL"])
@@ -299,8 +731,16 @@ Reply as {persona_name} in 1–2 sentences, first person.
299
  - If the affect read is NEUTRAL or doesn't match what you'd say, it's better to say "I'm not sure" or "honestly, I don't really know right now" than to invent.
300
  - Do NOT use autobiographical facts (job, family, hobbies) unless the partner asked."""
301
 
 
 
 
 
 
 
 
 
302
  return f"""\
303
- {directive_block}{air_writing_block}{turnaround_line}{persona_instruction_line}
304
 
305
  Personal memories:
306
  {memory_block}
 
1
+ import concurrent.futures
2
+ import queue
3
+ import random
4
+ import threading
5
  import time
6
+ from collections.abc import Iterator
7
 
8
  from backend.config.settings import settings
9
+ from backend.generation.llm_client import (
10
+ active_model,
11
+ chat_complete,
12
+ chat_complete_stream,
13
+ finalize_streamed,
14
+ )
15
  from backend.guardrails.checks import check_output
16
  from backend.pipeline.intent_kind import classify_intent_kind
17
+ from backend.pipeline.state import Candidate, PipelineState, StyleDirective
18
+ from backend.retrieval import pick_index
19
  from backend.sensing.labels import GESTURE_DIRECTIVES
20
 
21
+ # For present-state fan-out: three fixed emotional reads the persona can
22
+ # project, so the user can pick among "good / fine / not great" rather than
23
+ # three paraphrases of one mood.
24
+ _PRESENT_STATE_STRATEGIES = [
25
+ ("present_good", "HAPPY"),
26
+ ("present_fine", "NEUTRAL"),
27
+ ("present_rough", "FRUSTRATED"),
28
+ ]
29
+
30
  _PERSONA_MOD_INSTRUCTIONS = {
31
  "amplify_quirks": "Amplify your characteristic style and personality.",
32
  "suppress_humor": "Be direct and supportive. Suppress humor.",
 
50
  "read (if you said 'good', try 'not great') or honestly admit "
51
  "you're not sure how you feel right now. Do NOT invent details."
52
  ),
53
+ "all_rejected": (
54
+ "The user rejected every option you gave last time. Try a "
55
+ "meaningfully different angle — different memory focus, different "
56
+ "emotional register, or admit you don't have a clean answer. Do "
57
+ "NOT re-use wording from the rejected options."
58
+ ),
59
  }
60
 
61
 
 
67
  return _run(state, tier="fallback")
68
 
69
 
70
+ def run_primary_stream(state: PipelineState) -> Iterator[dict]:
71
+ """Token-level streaming variant of the planner.
72
+
73
+ Yields events as tokens arrive across all concurrent candidate streams:
74
+ {"type": "candidate_start", "idx": 0, "strategy": "broad", "grounded_buckets": [...]}
75
+ {"type": "token", "idx": 0, "delta": "Hello"}
76
+ {"type": "candidate_done", "idx": 0, "text": "Hello world."}
77
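+ {"type": "candidate_error", "idx": 0, "error": "..."} (if a candidate's stream raises; a side-index hit has no event type of its own and arrives as candidate_start + candidate_done with strategy "side_index")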
78
+ {"type": "complete", "planner_update": {"candidates": [...], "selected_response": "...", ...}} (final planner state update)
79
+ """
80
+ yield from _run_stream(state, tier="primary")
81
+
82
+
83
+ _STREAM_SENTINEL = object()
84
+
85
+
86
+ def _run_stream(state: PipelineState, tier: str) -> Iterator[dict]:
87
  t0 = time.perf_counter()
88
 
89
  profile = state["persona_profile"]
 
99
  rejected_response: str | None = None
100
  if turnaround_triggered:
101
  rejected_response = state.get("selected_response")
102
+ rejected_candidates: list[str] = list(state.get("rejected_candidates") or [])
103
  intent_kind = classify_intent_kind(state.get("intent_route"))
104
+ max_tokens = gen_cfg.get("max_tokens", settings.max_tokens_neutral)
105
+
106
+ # Turnaround rephrases are single-shot; everything else fans out.
107
+ # Present-state varies affect (good/fine/rough), memory questions vary
108
+ # which chunks are primary (broad/focused/serendipitous).
109
+ single_shot = turnaround_triggered
110
+ is_present_state = intent_kind == "present_state"
111
+ if single_shot:
112
+ strategies: list[tuple[str, str | None]] = [("focused", None)]
113
+ elif is_present_state:
114
+ strategies = list(_PRESENT_STATE_STRATEGIES)
115
+ else:
116
+ strategies = [
117
+ ("broad", None),
118
+ ("focused", None),
119
+ ("serendipitous", None),
120
+ ]
121
+ # Higher temp on regenerate; also bump for present-state since three
122
+ # strategies share the same (empty) grounding and need sampling noise.
123
+ base_temp = 1.0 if (rejected_candidates or is_present_state) else 0.7
124
+
125
+ # Optional side-index hit — surface as an extra card right away, not generated.
126
+ side_index_candidate: Candidate | None = None
127
+ if not single_shot and not is_present_state:
128
+ try:
129
+ hit = pick_index.lookup(
130
+ query=state["raw_query"],
131
+ user_id=state["user_id"],
132
+ threshold=0.85,
133
+ )
134
+ except Exception as exc:
135
+ print(f"[planner] pick_index lookup failed: {exc!r}")
136
+ hit = None
137
+ if hit:
138
+ text = (hit.get("picked_text") or "").strip()
139
+ if text:
140
+ side_index_candidate = Candidate(
141
+ text=text,
142
+ strategy="side_index",
143
+ grounded_buckets=[],
144
+ )
145
+
146
+ # Pre-announce each candidate slot so the UI can draw empty cards immediately.
147
+ cards: list[dict] = []
148
+ if side_index_candidate:
149
+ cards.append({"strategy": "side_index", "grounded_buckets": []})
150
+ for strategy_name, _affect_override in strategies:
151
+ if is_present_state:
152
+ card_buckets: list[str] = []
153
+ else:
154
+ strategy_chunks = _pick_strategy_chunks(list(chunks), strategy_name)
155
+ card_buckets = [c.get("bucket", "") for c in strategy_chunks]
156
+ cards.append(
157
+ {
158
+ "strategy": strategy_name,
159
+ "grounded_buckets": card_buckets,
160
+ }
161
+ )
162
+ for idx, card in enumerate(cards):
163
+ yield {
164
+ "type": "candidate_start",
165
+ "idx": idx,
166
+ "strategy": card["strategy"],
167
+ "grounded_buckets": card["grounded_buckets"],
168
+ }
169
+ if side_index_candidate is not None:
170
+ yield {
171
+ "type": "candidate_done",
172
+ "idx": 0,
173
+ "text": side_index_candidate["text"],
174
+ }
175
+
176
+ # Spawn a worker thread per strategy. Each one streams tokens into a shared
177
+ # queue; the generator forwards them as SSE events.
178
+ llm_cards_offset = 1 if side_index_candidate else 0
179
+ evt_queue: queue.Queue[dict | object] = queue.Queue()
180
+ completed: list[Candidate | None] = [None] * len(strategies)
181
+ completed_lock = threading.Lock()
182
+
183
+ def _worker(slot: int, strategy: str, affect_override: str | None) -> None:
184
+ if is_present_state:
185
+ strategy_chunks = [] # present-state has no memory grounding
186
+ else:
187
+ strategy_chunks = _pick_strategy_chunks(list(chunks), strategy)
188
+ effective_affect = affect_override if affect_override is not None else affect
189
+ messages = _build_messages(
190
+ profile,
191
+ strategy_chunks,
192
+ history,
193
+ state["raw_query"],
194
+ style,
195
+ gen_cfg,
196
+ gesture_tag=gesture_tag,
197
+ air_written_text=air_written_text,
198
+ rejected_response=rejected_response,
199
+ rejected_candidates=rejected_candidates,
200
+ intent_kind=intent_kind,
201
+ affect=effective_affect,
202
+ )
203
+ buf: list[str] = []
204
+ try:
205
+ for piece in chat_complete_stream(
206
+ messages=messages,
207
+ max_tokens=max_tokens,
208
+ temperature=base_temp,
209
+ tier=tier,
210
+ ):
211
+ buf.append(piece)
212
+ evt_queue.put(
213
+ {
214
+ "type": "token",
215
+ "idx": llm_cards_offset + slot,
216
+ "delta": piece,
217
+ }
218
+ )
219
+ except Exception as exc:
220
+ evt_queue.put(
221
+ {
222
+ "type": "candidate_error",
223
+ "idx": llm_cards_offset + slot,
224
+ "error": repr(exc),
225
+ }
226
+ )
227
+ with completed_lock:
228
+ completed[slot] = None
229
+ evt_queue.put(_STREAM_SENTINEL)
230
+ return
231
+
232
+ final = finalize_streamed("".join(buf))
233
+ guard = check_output(final, strategy_chunks)
234
+ if not guard["passed"]:
235
+ final = guard["fallback"]
236
+ cand = Candidate(
237
+ text=final,
238
+ strategy=strategy,
239
+ grounded_buckets=[c.get("bucket", "") for c in strategy_chunks],
240
+ )
241
+ with completed_lock:
242
+ completed[slot] = cand
243
+ evt_queue.put(
244
+ {
245
+ "type": "candidate_done",
246
+ "idx": llm_cards_offset + slot,
247
+ "text": final,
248
+ }
249
+ )
250
+ evt_queue.put(_STREAM_SENTINEL)
251
+
252
+ threads = [
253
+ threading.Thread(target=_worker, args=(i, s, a), daemon=True)
254
+ for i, (s, a) in enumerate(strategies)
255
+ ]
256
+ for t in threads:
257
+ t.start()
258
+
259
+ remaining = len(threads)
260
+ while remaining > 0:
261
+ evt = evt_queue.get()
262
+ if evt is _STREAM_SENTINEL:
263
+ remaining -= 1
264
+ continue
265
+ yield evt # type: ignore[misc]
266
+
267
+ for t in threads:
268
+ t.join()
269
+
270
+ with completed_lock:
271
+ llm_cands = [c for c in completed if c is not None]
272
+ all_cands: list[Candidate] = []
273
+ if side_index_candidate is not None:
274
+ all_cands.append(side_index_candidate)
275
+ all_cands.extend(llm_cands)
276
+
277
+ # De-dupe against rejected + each other.
278
+ seen: set[str] = {r.strip().lower() for r in rejected_candidates if r}
279
+ uniq: list[Candidate] = []
280
+ for c in all_cands:
281
+ key = c["text"].strip().lower()
282
+ if key and key not in seen:
283
+ seen.add(key)
284
+ uniq.append(c)
285
+ if not uniq:
286
+ # Every candidate was a dup-of-rejected or guardrail-rejected. Surface
287
+ # a non-empty placeholder so the UI isn't showing a blank bubble and
288
+ # the user knows they can regenerate. Logged so we notice if this ever
289
+ # fires in practice — it means something upstream collapsed.
290
+ print(
291
+ f"[planner] empty-candidate fallback fired "
292
+ f"user={state.get('user_id')!r} turn_id={state.get('turn_id')} "
293
+ f"raw_query={state.get('raw_query', '')[:80]!r}"
294
+ )
295
+ uniq = all_cands[:1] or [
296
+ Candidate(
297
+ text="I'm not sure how to answer that — try asking in a different way.",
298
+ strategy="empty",
299
+ grounded_buckets=[],
300
+ )
301
+ ]
302
+
303
+ selected = uniq[0]["text"]
304
 
305
+ t_gen = time.perf_counter() - t0
306
+ latency_log = dict(state.get("latency_log") or {})
307
+ latency_log["t_generation"] = round(t_gen, 4)
308
+ latency_log["t_total"] = round(
309
+ latency_log.get("t_sensing", 0)
310
+ + latency_log.get("t_intent", 0)
311
+ + latency_log.get("t_retrieval", 0)
312
+ + t_gen,
313
+ 4,
314
  )
315
 
316
+ yield {
317
+ "type": "complete",
318
+ "planner_update": {
319
+ "augmented_prompt": None, # skipping for streaming — not worth rebuilding
320
+ "candidates": uniq,
321
+ "selected_response": selected,
322
+ "llm_tier_used": tier,
323
+ "llm_model_used": active_model(tier),
324
+ "latency_log": latency_log,
325
+ "guardrail_passed": True,
326
+ },
327
+ }
328
+
329
+
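The fan-in loop above relies on one sentinel per worker to know when every stream has finished. A self-contained sketch of that pattern, using hypothetical names (`fan_in`, `_SENTINEL`) and plain string lists in place of LLM streams:

```python
import queue
import threading

_SENTINEL = object()  # marks "this worker is finished"


def fan_in(sources: list[list[str]]) -> list[dict]:
    """Run one thread per source and merge their items through a single queue."""
    q: queue.Queue = queue.Queue()

    def worker(slot: int, items: list[str]) -> None:
        try:
            for item in items:
                q.put({"type": "token", "idx": slot, "delta": item})
        finally:
            q.put(_SENTINEL)  # always signal completion, even on error

    threads = [
        threading.Thread(target=worker, args=(i, items), daemon=True)
        for i, items in enumerate(sources)
    ]
    for t in threads:
        t.start()

    merged: list[dict] = []
    remaining = len(threads)
    while remaining > 0:
        evt = q.get()  # blocks until some worker produces an event
        if evt is _SENTINEL:
            remaining -= 1
            continue
        merged.append(evt)
    for t in threads:
        t.join()
    return merged


merged = fan_in([["a", "b"], ["c"], []])
print(len(merged))
```

Events from different workers can interleave in any order, but each worker's own events keep their order because a single thread enqueues them. Putting the sentinel in a `finally` block is one way to guarantee the consumer loop cannot hang if a worker fails.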
330
+ def _pick_strategy_chunks(all_chunks: list[dict], strategy: str) -> list[dict]:
331
+ """Select which chunks become the *primary* grounding for a candidate.
332
+ Non-personal chunks (contextual, open_domain) always pass through —
333
+ they're small and query-grounded, not memory variation.
334
+ """
335
+ personal = [c for c in all_chunks if c.get("source", "personal") == "personal"]
336
+ others = [c for c in all_chunks if c.get("source", "personal") != "personal"]
337
+
338
+ if not personal:
339
+ return all_chunks
340
+
341
+ if strategy == "broad":
342
+ chosen = personal
343
+ elif strategy == "focused":
344
+ chosen = personal[:1]
345
+ elif strategy == "serendipitous":
346
+ if len(personal) >= 2:
347
+ pool = personal[1:]
348
+ k = min(len(pool), max(1, len(personal) - 1))
349
+ chosen = random.sample(pool, k)
350
+ else:
351
+ chosen = personal
352
+ else:
353
+ chosen = personal
354
+
355
+ return chosen + others
356
+
357
+
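A small check of the sample size used by the "serendipitous" branch above. The helper name `serendipitous_k` is hypothetical; it only reproduces the arithmetic from `_pick_strategy_chunks` so the behaviour can be inspected in isolation:

```python
import random


def serendipitous_k(n_personal: int) -> int:
    """Reproduce the sample size used by the 'serendipitous' branch."""
    pool_size = n_personal - 1  # pool = personal[1:]
    return min(pool_size, max(1, n_personal - 1))


# For every list with at least two personal chunks, k equals the pool size.
sizes = {n: serendipitous_k(n) for n in range(2, 8)}
print(sizes)

personal = ["top", "second", "third", "fourth"]
pool = personal[1:]
chosen = random.sample(pool, serendipitous_k(len(personal)))
print(sorted(chosen) == sorted(pool))
```

With that formula the branch always takes the whole pool, so in practice "serendipitous" means every personal chunk except the top-ranked one, in shuffled order. If a genuinely smaller random subset is intended, `k` would need a different bound.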
358
+ def _run(state: PipelineState, tier: str) -> dict:
359
+ t0 = time.perf_counter()
360
+
361
+ profile = state["persona_profile"]
362
+ affect = (state.get("affect") or {}).get("emotion", "NEUTRAL")
363
+ gen_cfg = state.get("generation_config") or {}
364
+ chunks = state.get("retrieved_chunks") or []
365
+ history = (state.get("session_history") or [])[-20:]
366
+
367
+ style: StyleDirective = gen_cfg["style"]
368
+ gesture_tag = state.get("gesture_tag")
369
+ air_written_text = state.get("air_written_text")
370
+ turnaround_triggered = state.get("turnaround_triggered", False)
371
+ rejected_response: str | None = None
372
+ if turnaround_triggered:
373
+ rejected_response = state.get("selected_response")
374
+ rejected_candidates: list[str] = list(state.get("rejected_candidates") or [])
375
+ intent_kind = classify_intent_kind(state.get("intent_route"))
376
+ max_tokens = gen_cfg.get("max_tokens", settings.max_tokens_neutral)
377
+
378
+ # Turnaround rephrases are single-shot; everything else fans out. Present-
379
+ # state varies affect (good/fine/rough), memory questions vary chunks
380
+ # (broad/focused/serendipitous).
381
+ single_shot = turnaround_triggered
382
+ is_present_state = intent_kind == "present_state"
383
+ if single_shot:
384
+ strategies_cfg: list[tuple[str, str | None]] = [("focused", None)]
385
+ elif is_present_state:
386
+ strategies_cfg = list(_PRESENT_STATE_STRATEGIES)
387
+ else:
388
+ strategies_cfg = [
389
+ ("broad", None),
390
+ ("focused", None),
391
+ ("serendipitous", None),
392
+ ]
393
+
394
+ base_temp = 1.0 if (rejected_candidates or is_present_state) else 0.7
395
+
396
+ def _gen_one(cfg: tuple[str, str | None]) -> Candidate:
397
+ strategy, affect_override = cfg
398
+ if is_present_state:
399
+ strategy_chunks: list[dict] = []
400
+ else:
401
+ strategy_chunks = _pick_strategy_chunks(list(chunks), strategy)
402
+ effective_affect = affect_override if affect_override is not None else affect
403
+ messages = _build_messages(
404
+ profile,
405
+ strategy_chunks,
406
+ history,
407
+ state["raw_query"],
408
+ style,
409
+ gen_cfg,
410
+ gesture_tag=gesture_tag,
411
+ air_written_text=air_written_text,
412
+ rejected_response=rejected_response,
413
+ rejected_candidates=rejected_candidates,
414
+ intent_kind=intent_kind,
415
+ affect=effective_affect,
416
+ )
417
+ text = chat_complete(
418
+ messages=messages,
419
+ max_tokens=max_tokens,
420
+ temperature=base_temp,
421
+ tier=tier,
422
+ )
423
+ guard = check_output(text, strategy_chunks)
424
+ if not guard["passed"]:
425
+ text = guard["fallback"]
426
+ return Candidate(
427
+ text=text,
428
+ strategy=strategy,
429
+ grounded_buckets=[c.get("bucket", "") for c in strategy_chunks],
430
+ )
431
+
432
+ if len(strategies_cfg) == 1:
433
+ candidates = [_gen_one(strategies_cfg[0])]
434
+ else:
435
+ with concurrent.futures.ThreadPoolExecutor(
436
+ max_workers=len(strategies_cfg)
437
+ ) as pool:
438
+ candidates = list(pool.map(_gen_one, strategies_cfg))
439
+
440
+ # Side-index hit: if the user has picked a similar query before, surface the
441
+ # previously-picked text as an extra candidate. Not generated by the LLM;
442
+ # skipped on turnaround rephrases and on present-state turns so those always
443
+ # produce fresh text.
444
+ if not single_shot and not is_present_state:
445
+ try:
446
+ hit = pick_index.lookup(
447
+ query=state["raw_query"],
448
+ user_id=state["user_id"],
449
+ threshold=0.85,
450
+ )
451
+ except Exception as exc:
452
+ print(f"[planner] pick_index lookup failed: {exc!r}")
453
+ hit = None
454
+ if hit:
455
+ text = hit.get("picked_text", "").strip()
456
+ if text and text.lower() not in {
457
+ c["text"].strip().lower() for c in candidates
458
+ }:
459
+ candidates.insert(
460
+ 0,
461
+ Candidate(
462
+ text=text,
463
+ strategy="side_index",
464
+ grounded_buckets=[],
465
+ ),
466
+ )
467
+
468
+ # De-dupe by normalised text — if two strategies produced the same response,
469
+ # keep the first. Also exclude anything the user already rejected this turn.
470
+ # Don't retry; latency budget matters more than N=3 on the dot.
471
+ seen: set[str] = {r.strip().lower() for r in rejected_candidates if r}
472
+ uniq: list[Candidate] = []
473
+ for c in candidates:
474
+ key = c["text"].strip().lower()
475
+ if key and key not in seen:
476
+ seen.add(key)
477
+ uniq.append(c)
478
+ if not uniq:
479
+ uniq = candidates[:1] # everything was empty or repeated a rejected option; fall back to the first
480
+
481
+ selected = uniq[0]["text"]
482
 
483
  t_gen = time.perf_counter() - t0
484
  latency_log = dict(state.get("latency_log") or {})
 
491
  4,
492
  )
493
 
494
+ # Represent the default-candidate prompt in augmented_prompt for logging.
495
+ default_strategy_chunks = _pick_strategy_chunks(list(chunks), uniq[0]["strategy"])
496
+ default_messages = _build_messages(
497
+ profile,
498
+ default_strategy_chunks,
499
+ history,
500
+ state["raw_query"],
501
+ style,
502
+ gen_cfg,
503
+ gesture_tag=gesture_tag,
504
+ air_written_text=air_written_text,
505
+ rejected_response=rejected_response,
+ rejected_candidates=rejected_candidates,
506
+ intent_kind=intent_kind,
507
+ affect=affect,
508
+ )
509
+ augmented_prompt = "\n\n".join(
510
+ f"[{m['role']}] {m['content']}" for m in default_messages
511
+ )
512
  return {
513
  "augmented_prompt": augmented_prompt,
514
+ "candidates": uniq,
515
  "selected_response": selected,
516
  "llm_tier_used": tier,
517
  "llm_model_used": active_model(tier),
518
  "latency_log": latency_log,
519
+ "guardrail_passed": True,
520
  }
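The de-duplication step used by both planner paths can be exercised on its own. A sketch with a hypothetical `dedupe` helper and plain dicts standing in for `Candidate`:

```python
def dedupe(candidates: list[dict], rejected: list[str]) -> list[dict]:
    """Keep the first candidate per normalised text, skipping rejected and empty ones."""
    seen = {r.strip().lower() for r in rejected if r}
    unique: list[dict] = []
    for cand in candidates:
        key = cand["text"].strip().lower()
        if key and key not in seen:
            seen.add(key)
            unique.append(cand)
    return unique


candidates = [
    {"text": "I love hiking.", "strategy": "broad"},
    {"text": "  i love hiking. ", "strategy": "focused"},  # same text after normalising
    {"text": "Not today.", "strategy": "serendipitous"},   # already rejected
    {"text": "", "strategy": "empty"},                     # blank output
]
unique = dedupe(candidates, rejected=["not today."])
print([c["strategy"] for c in unique])
```

Normalisation here is only strip plus lowercase, so two candidates that differ by punctuation or a single word still count as distinct.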
521
 
522
 
 
542
  gesture_tag: str | None = None,
543
  air_written_text: str | None = None,
544
  rejected_response: str | None = None,
545
+ rejected_candidates: list[str] | None = None,
546
  intent_kind: str = "memory",
547
  affect: str = "NEUTRAL",
548
  ) -> list[dict]:
 
561
  air_written_text,
562
  profile["name"],
563
  rejected_response=rejected_response,
564
+ rejected_candidates=rejected_candidates,
565
  intent_kind=intent_kind,
566
  affect=affect,
567
  )
 
622
  persona_name: str,
623
  *,
624
  rejected_response: str | None = None,
625
+ rejected_candidates: list[str] | None = None,
626
  intent_kind: str = "memory",
627
  affect: str = "NEUTRAL",
628
  ) -> str:
629
  personal_chunks = [c for c in chunks if c.get("source", "personal") == "personal"]
630
  contextual_chunks = [c for c in chunks if c.get("source") == "contextual"]
631
  open_domain_chunks = [c for c in chunks if c.get("source") == "open_domain"]
632
+ prior_pick_chunks = [c for c in chunks if c.get("source") == "prior_pick"]
633
 
634
  memory_block = (
635
  "\n".join(
 
638
  )
639
  or " (no memories retrieved)"
640
  )
641
+ prior_pick_block = (
642
+ "\n".join(f" {c['text']}" for c in prior_pick_chunks)
643
+ if prior_pick_chunks
644
+ else ""
645
+ )
646
  contextual_block = (
647
  "\n".join(f" {c['text']}" for c in contextual_chunks)
648
  or " (nothing relevant from this session)"
 
700
  f"\nYour previous reply (which you need to replace, not repeat): "
701
  f'"{safe_rejected}"'
702
  )
703
+ if rejected_candidates:
704
+ safe_list = [
705
+ r.replace('"', "'").replace("\n", " ")[:300] for r in rejected_candidates
706
+ ][:10]
707
+ rejected_block = "\n".join(f' - "{r}"' for r in safe_list)
708
+ turnaround_line += (
709
+ f"\nThe user rejected these options you gave last time "
710
+ f"(do NOT re-use their wording or angle):\n{rejected_block}"
711
+ )
712
 
713
  if intent_kind == "present_state":
714
  affect_hint = _AFFECT_HINTS.get(affect, _AFFECT_HINTS["NEUTRAL"])
 
731
  - If the affect read is NEUTRAL or doesn't match what you'd say, it's better to say "I'm not sure" or "honestly, I don't really know right now" than to invent.
732
  - Do NOT use autobiographical facts (job, family, hobbies) unless the partner asked."""
733
 
734
+ prior_pick_section = (
735
+ f"\n\nWhen asked this kind of thing before, you answered like:\n{prior_pick_block}\n"
736
+ "Treat this as your own prior voice — re-use the phrasing if it still fits, "
737
+ "or stay in the same register if you'd answer slightly differently now."
738
+ if prior_pick_block
739
+ else ""
740
+ )
741
+
742
  return f"""\
743
+ {directive_block}{air_writing_block}{turnaround_line}{persona_instruction_line}{prior_pick_section}
744
 
745
  Personal memories:
746
  {memory_block}
backend/pipeline/nodes/retrieval.py CHANGED
@@ -8,10 +8,17 @@ import torch
8
  from backend.config.settings import settings
9
  from backend.pipeline.intent_kind import is_present_state_only
10
  from backend.pipeline.state import PipelineState, RetrievedChunk, SubIntent
 
11
  from backend.retrieval.contextual import retrieve_from_history
 
12
  from backend.retrieval.reranker import build_context_vector, mmr_rerank
13
  from backend.retrieval.vector_store import get_device, get_embedder, retrieve
14
 
 
 
 
 
 
15
  _OPEN_DOMAIN_STUB_TEXT = (
16
  "(no external knowledge source wired — answer from general knowledge)"
17
  )
@@ -22,9 +29,14 @@ def run_fast(state: PipelineState) -> dict:
22
  t0 = time.perf_counter()
23
  if is_present_state_only(state.get("intent_route")):
24
  return _build_return(state, [], "skipped_present_state", t0, 0.0)
 
 
25
  final_k = settings.retrieval_fast_k
26
  pool_k = settings.rerank_fast_pool_k
27
  chunks, t_rerank = _dispatch_all(state, pool_k=pool_k, final_k=final_k)
 
 
 
28
  return _build_return(state, chunks, "fast", t0, t_rerank)
29
 
30
 
@@ -33,12 +45,83 @@ def run_full(state: PipelineState) -> dict:
33
  t0 = time.perf_counter()
34
  if is_present_state_only(state.get("intent_route")):
35
  return _build_return(state, [], "skipped_present_state", t0, 0.0)
 
 
36
  final_k = settings.retrieval_rerank_k
37
  pool_k = settings.rerank_pool_k
38
  chunks, t_rerank = _dispatch_all(state, pool_k=pool_k, final_k=final_k)
 
 
 
39
  return _build_return(state, chunks, "full", t0, t_rerank)
40
 
41
 
42
  def _dispatch_all(
43
  state: PipelineState, pool_k: int, final_k: int
44
  ) -> tuple[list[RetrievedChunk], float]:
 
8
  from backend.config.settings import settings
9
  from backend.pipeline.intent_kind import is_present_state_only
10
  from backend.pipeline.state import PipelineState, RetrievedChunk, SubIntent
11
+ from backend.retrieval import pick_index
12
  from backend.retrieval.contextual import retrieve_from_history
13
+ from backend.retrieval.priors import BUCKETS
14
  from backend.retrieval.reranker import build_context_vector, mmr_rerank
15
  from backend.retrieval.vector_store import get_device, get_embedder, retrieve
16
 
17
+ # Weight of the pick-history bucket prior, relative to the session bucket prior.
18
+ # 0.3 means: a user who always picks "family" over "medical" gets a noticeable
19
+ # but not overwhelming nudge — session-in-progress signals still dominate.
20
+ _PICK_PRIOR_WEIGHT = 0.3
21
+
22
  _OPEN_DOMAIN_STUB_TEXT = (
23
  "(no external knowledge source wired — answer from general knowledge)"
24
  )
 
29
  t0 = time.perf_counter()
30
  if is_present_state_only(state.get("intent_route")):
31
  return _build_return(state, [], "skipped_present_state", t0, 0.0)
32
+ session_priors = state.get("bucket_priors")
33
+ _blend_pick_history_into_priors(state)
34
  final_k = settings.retrieval_fast_k
35
  pool_k = settings.rerank_fast_pool_k
36
  chunks, t_rerank = _dispatch_all(state, pool_k=pool_k, final_k=final_k)
37
+ if session_priors is not None:
38
+ state["bucket_priors"] = session_priors # blend was transient
39
+ chunks = _prepend_prior_pick(state, chunks)
40
  return _build_return(state, chunks, "fast", t0, t_rerank)
41
 
42
 
 
45
  t0 = time.perf_counter()
46
  if is_present_state_only(state.get("intent_route")):
47
  return _build_return(state, [], "skipped_present_state", t0, 0.0)
48
+ session_priors = state.get("bucket_priors")
49
+ _blend_pick_history_into_priors(state)
50
  final_k = settings.retrieval_rerank_k
51
  pool_k = settings.rerank_pool_k
52
  chunks, t_rerank = _dispatch_all(state, pool_k=pool_k, final_k=final_k)
53
+ if session_priors is not None:
54
+ state["bucket_priors"] = session_priors # blend was transient
55
+ chunks = _prepend_prior_pick(state, chunks)
56
  return _build_return(state, chunks, "full", t0, t_rerank)
57
 
58
 
59
+ def _blend_pick_history_into_priors(state: PipelineState) -> None:
60
+ """Mix cumulative bucket-pick counts into this turn's bucket_priors.
61
+
62
+ Mutates state in-place. Session priors still dominate; pick history adds
63
+ a small, steady bias toward buckets the user has historically picked.
64
+ """
65
+ try:
66
+ counts = pick_index.bucket_pick_counts(state["user_id"])
67
+ except Exception as exc:
68
+ print(f"[retrieval] pick_index.bucket_pick_counts failed: {exc!r}")
69
+ return
70
+ if not counts:
71
+ return
72
+ total = sum(counts.values())
73
+ if total <= 0:
74
+ return
75
+ pick_dist = {b: counts.get(b, 0.0) / total for b in BUCKETS}
76
+ session_priors = state.get("bucket_priors") or {}
77
+ if not session_priors:
78
+ session_priors = {b: 1.0 / len(BUCKETS) for b in BUCKETS}
79
+ blended = {
80
+ b: (1 - _PICK_PRIOR_WEIGHT) * session_priors.get(b, 0.0)
81
+ + _PICK_PRIOR_WEIGHT * pick_dist[b]
82
+ for b in BUCKETS
83
+ }
84
+ s = sum(blended.values())
85
+ if s > 0:
86
+ blended = {b: v / s for b, v in blended.items()}
87
+ state["bucket_priors"] = blended
88
+
89
+
90
+ def _prepend_prior_pick(
91
+ state: PipelineState, chunks: list[RetrievedChunk]
92
+ ) -> list[RetrievedChunk]:
93
+ """On a side-index hit, surface the previously-picked text as a special
94
+ chunk the LLM sees in its grounding block. Not deduped against personal
95
+ chunks — the prior pick is phrased in the persona's voice and is useful
96
+ even when similar memories are present.
97
+ """
98
+ try:
99
+ hit = pick_index.lookup(
100
+ query=state["raw_query"], user_id=state["user_id"], threshold=0.85
101
+ )
102
+ except Exception as exc:
103
+ print(f"[retrieval] pick_index.lookup failed: {exc!r}")
104
+ return chunks
105
+ if not hit:
106
+ return chunks
107
+ text = (hit.get("picked_text") or "").strip()
108
+ if not text:
109
+ return chunks
110
+ # Avoid injecting an identical chunk twice (the side_index-strategy
111
+ # candidate in the planner handles that path separately).
112
+ if any(c.get("text") == text for c in chunks):
113
+ return chunks
114
+ prior = RetrievedChunk(
115
+ text=text,
116
+ bucket="prior_pick",
117
+ type="narrative",
118
+ user="",
119
+ score=float(hit.get("match_score", 0.0)),
120
+ source="prior_pick",
121
+ )
122
+ return [prior] + list(chunks)
123
+
124
+
125
  def _dispatch_all(
126
  state: PipelineState, pool_k: int, final_k: int
127
  ) -> tuple[list[RetrievedChunk], float]:
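The blend performed by `_blend_pick_history_into_priors` can be checked by hand with a small standalone sketch. This is not the module's code: the bucket names below are hypothetical (the real `BUCKETS` list lives in `backend.retrieval.priors` and is not shown in this diff), and `blend_priors` is an illustrative helper that returns a new dict instead of mutating pipeline state.

```python
from typing import Optional

# Hypothetical bucket names, for illustration only.
BUCKETS = ["family", "medical", "hobbies"]
PICK_PRIOR_WEIGHT = 0.3  # same weight as _PICK_PRIOR_WEIGHT in the diff


def blend_priors(
    session_priors: Optional[dict[str, float]],
    pick_counts: dict[str, float],
    weight: float = PICK_PRIOR_WEIGHT,
) -> dict[str, float]:
    """Mix session priors with pick-history frequencies and renormalize."""
    # Fall back to a uniform prior when the session has none yet.
    base = session_priors or {b: 1.0 / len(BUCKETS) for b in BUCKETS}
    total = sum(pick_counts.values())
    if total <= 0:
        return dict(base)  # no pick history: leave the prior as it was
    pick_dist = {b: pick_counts.get(b, 0.0) / total for b in BUCKETS}
    blended = {
        b: (1 - weight) * base.get(b, 0.0) + weight * pick_dist[b]
        for b in BUCKETS
    }
    norm = sum(blended.values())
    return {b: v / norm for b, v in blended.items()} if norm > 0 else blended


session = {"family": 0.5, "medical": 0.4, "hobbies": 0.1}
counts = {"family": 3.0, "hobbies": 1.0}  # cumulative pick mass per bucket
blended = blend_priors(session, counts)
```

With these toy numbers the pick distribution is 0.75 / 0 / 0.25, so `family` moves from 0.5 to 0.7 × 0.5 + 0.3 × 0.75 = 0.575 while `medical` drops to 0.28: a nudge, with the session prior still dominating. Returning a fresh dict also sidesteps the save-and-restore step that `run_fast` and `run_full` need after mutating `state["bucket_priors"]`.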
backend/pipeline/state.py CHANGED
@@ -75,6 +75,13 @@ class LatencyLog(TypedDict):
75
  # ── Main pipeline state ────────────────────────────────────────────────────────
76
 
77
 
78
  class PipelineState(TypedDict):
79
  # ── Session context (set at turn start, stable across nodes) ──────────────
80
  user_id: str
@@ -103,7 +110,8 @@ class PipelineState(TypedDict):
103
 
104
  # ── L4: Generation outputs ────────────────────────────────────────────────
105
  augmented_prompt: str | None
106
- candidates: list[str] # 2-3 candidate responses
 
107
  selected_response: str | None
108
  llm_tier_used: str # "primary" | "fallback"
109
  llm_model_used: str # actual model name (e.g. "gemma4:31b-cloud")
 
75
  # ── Main pipeline state ────────────────────────────────────────────────────────
76
 
77
 
78
+ class Candidate(TypedDict):
79
+ text: str
80
+ strategy: str # "broad" | "focused" | "serendipitous" | "side_index"
81
+ # chunks fed as the primary grounding for this candidate (bucket/type only)
82
+ grounded_buckets: list[str]
83
+
84
+
85
  class PipelineState(TypedDict):
86
  # ── Session context (set at turn start, stable across nodes) ──────────────
87
  user_id: str
 
110
 
111
  # ── L4: Generation outputs ────────────────────────────────────────────────
112
  augmented_prompt: str | None
113
+ candidates: list[Candidate] # 2-3 candidate responses w/ strategy metadata
114
+ rejected_candidates: list[str] # texts the user already dismissed this turn
115
  selected_response: str | None
116
  llm_tier_used: str # "primary" | "fallback"
117
  llm_model_used: str # actual model name (e.g. "gemma4:31b-cloud")
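One way the new `rejected_candidates` field could be consumed is sketched below. The diff does not show how the planner uses it, so `drop_rejected` is a hypothetical helper and the whitespace-insensitive comparison is an assumption, not the project's behaviour.

```python
from typing import TypedDict


class Candidate(TypedDict):
    text: str
    strategy: str
    grounded_buckets: list[str]


def drop_rejected(
    candidates: list[Candidate], rejected_texts: list[str]
) -> list[Candidate]:
    """Hypothetical filter: skip candidates whose text was already dismissed."""
    seen = {t.strip() for t in rejected_texts}
    return [c for c in candidates if c["text"].strip() not in seen]


fresh = drop_rejected(
    [
        Candidate(text="I saw my sister.", strategy="broad", grounded_buckets=["family"]),
        Candidate(text="I went for a walk.", strategy="focused", grounded_buckets=["hobbies"]),
    ],
    rejected_texts=["I saw my sister. "],
)
```

Here only the "focused" candidate survives, because the first text matches a rejected one after trimming whitespace.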
backend/retrieval/pick_index.py ADDED
@@ -0,0 +1,124 @@
 
1
+ import json
2
+ import threading
3
+ import time
4
+ from pathlib import Path
5
+
6
+ import torch
7
+
8
+ from backend.config.settings import settings
9
+ from backend.retrieval.vector_store import get_device, get_embedder
10
+
11
+
12
+ def _store_path(user_id: str) -> Path:
13
+ return settings.data_dir / "pick_index" / user_id
14
+
15
+
16
+ # Guards the module-level cache. `add()` and `lookup()` can be called
17
+ # concurrently from an SSE handler and the /chat/pick POST — without this,
18
+ # a concurrent add mid-lookup could see a partially-built tensor.
19
+ _cache_lock = threading.RLock()
20
+ _cache: dict[str, tuple[torch.Tensor, list[dict]]] = {}
21
+
22
+
23
+ def _load(user_id: str) -> tuple[torch.Tensor, list[dict]]:
24
+ with _cache_lock:
25
+ if user_id in _cache:
26
+ return _cache[user_id]
27
+ p = _store_path(user_id)
28
+ if not (p / "vectors.pt").exists():
29
+ empty = torch.empty((0, 0), device=get_device())
30
+ _cache[user_id] = (empty, [])
31
+ return _cache[user_id]
32
+ vecs = torch.load(
33
+ p / "vectors.pt", map_location=get_device(), weights_only=True
34
+ )
35
+ with open(p / "entries.json") as f:
36
+ entries = json.load(f)
37
+ _cache[user_id] = (vecs, entries)
38
+ return _cache[user_id]
39
+
40
+
41
+ def _persist(user_id: str, vecs: torch.Tensor, entries: list[dict]) -> None:
42
+ p = _store_path(user_id)
43
+ p.mkdir(parents=True, exist_ok=True)
44
+ torch.save(vecs.detach().cpu(), p / "vectors.pt")
45
+ with open(p / "entries.json", "w") as f:
46
+ json.dump(entries, f, indent=2)
47
+
48
+
49
+ def lookup(query: str, user_id: str, threshold: float = 0.85) -> dict | None:
50
+ # Snapshot the (vecs, entries) tuple under the lock so a concurrent add()
51
+ # can't swap it out mid-search. Read-only work on the snapshot is safe.
52
+ with _cache_lock:
53
+ vecs, entries = _load(user_id)
54
+ if vecs.numel() == 0 or not entries:
55
+ return None
56
+ embedder = get_embedder()
57
+ q = embedder.encode(
58
+ [query],
59
+ convert_to_tensor=True,
60
+ normalize_embeddings=True,
61
+ device=get_device(),
62
+ )[0]
63
+ scores = vecs @ q
64
+ top_score, top_idx = torch.max(scores, dim=0)
65
+ score = float(top_score)
66
+ if score < threshold:
67
+ return None
68
+ hit = dict(entries[int(top_idx)])
69
+ hit["match_score"] = score
70
+ return hit
71
+
72
+
73
+ def add(
74
+ query: str,
75
+ user_id: str,
76
+ strategy: str,
77
+ picked_text: str,
78
+ picked_buckets: list[str] | None = None,
79
+ ) -> None:
80
+ embedder = get_embedder()
81
+ q = embedder.encode(
82
+ [query],
83
+ convert_to_tensor=True,
84
+ normalize_embeddings=True,
85
+ device=get_device(),
86
+ ) # (1, D)
87
+ # The whole read-modify-write is locked so two concurrent adds can't
88
+ # both read the same `vecs`, each concat their own vector, and clobber
89
+ # each other on writeback.
90
+ with _cache_lock:
91
+ vecs, entries = _load(user_id)
92
+ new_vecs = q if vecs.numel() == 0 else torch.cat([vecs, q], dim=0)
93
+ new_entries = list(entries) + [
94
+ {
95
+ "query": query,
96
+ "strategy": strategy,
97
+ "picked_text": picked_text,
98
+ "picked_buckets": picked_buckets or [],
99
+ "ts": time.time(),
100
+ }
101
+ ]
102
+ _cache[user_id] = (new_vecs, new_entries)
103
+ _persist(user_id, new_vecs, new_entries)
104
+
105
+
106
+ def bucket_pick_counts(user_id: str) -> dict[str, float]:
107
+ """Cumulative pick counts per bucket for this user.
108
+
109
+ Each pick contributes 1.0 mass split evenly across the buckets grounding
110
+ the picked candidate. Used by retrieval to bias bucket priors toward
111
+ memories the user has historically preferred.
112
+ """
113
+ with _cache_lock:
114
+ _, entries = _load(user_id)
115
+ entries_snapshot = list(entries)
116
+ counts: dict[str, float] = {}
117
+ for e in entries_snapshot:
118
+ buckets = [b for b in (e.get("picked_buckets") or []) if b]
119
+ if not buckets:
120
+ continue
121
+ share = 1.0 / len(buckets)
122
+ for b in buckets:
123
+ counts[b] = counts.get(b, 0.0) + share
124
+ return counts
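The thresholded nearest-neighbour search in `lookup()` can be illustrated without torch or the embedder. The sketch below uses plain Python lists and toy 2-D vectors; `normalize` and the sample entries are illustrative assumptions, and only the 0.85 threshold and the `match_score` key come from the diff.

```python
import math
from typing import Optional


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def lookup(
    query_vec: list[float],
    stored_vecs: list[list[float]],
    entries: list[dict],
    threshold: float = 0.85,
) -> Optional[dict]:
    """Return the best-matching entry if its cosine score clears the threshold."""
    if not stored_vecs or not entries:
        return None
    q = normalize(query_vec)
    # Dot product of unit vectors equals cosine similarity.
    scores = [sum(a * b for a, b in zip(normalize(v), q)) for v in stored_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    if scores[best] < threshold:
        return None
    hit = dict(entries[best])  # copy so the stored entry is not mutated
    hit["match_score"] = scores[best]
    return hit


stored = [[1.0, 0.0], [0.0, 1.0]]
entries = [{"picked_text": "A"}, {"picked_text": "B"}]
near = lookup([0.9, 0.1], stored, entries)  # close to the first stored query
far = lookup([1.0, 1.0], stored, entries)   # equidistant, below threshold
```

The first query scores about 0.99 against the first stored vector and returns entry "A" with its score attached; the second scores about 0.71 against both and returns `None`, which is the case where `_prepend_prior_pick` leaves the retrieved chunks untouched.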
frontend/src/App.css CHANGED
@@ -410,6 +410,110 @@ input[type="text"]:hover {
410
  color: #ffffff;
411
  }
412
 
413
  .turnaround-btn {
414
  background: transparent !important;
415
  color: var(--accent) !important;
 
410
  color: #ffffff;
411
  }
412
 
413
+ .badge-picker {
414
+ background: rgba(0, 0, 0, 0.08);
415
+ color: var(--text);
416
+ }
417
+
418
+ .badge-picked {
419
+ background: rgba(46, 160, 67, 0.18);
420
+ color: #2ea043;
421
+ }
422
+
423
+ .chat-bubble.picker {
424
+ background: transparent;
425
+ padding: 4px 0 0 0;
426
+ max-width: 85%;
427
+ }
428
+
429
+ .candidate-list {
430
+ display: flex;
431
+ flex-direction: column;
432
+ gap: 6px;
433
+ margin-top: 6px;
434
+ }
435
+
436
+ .candidate-card {
437
+ text-align: left;
438
+ background: var(--surface);
439
+ color: var(--text);
440
+ border: 1px solid rgba(0, 0, 0, 0.12);
441
+ border-radius: 10px;
442
+ padding: 10px 12px;
443
+ cursor: pointer;
444
+ transition: border-color 120ms ease, background 120ms ease, transform 80ms ease;
445
+ }
446
+
447
+ .candidate-card:hover {
448
+ border-color: var(--accent);
449
+ background: rgba(59, 130, 246, 0.05);
450
+ }
451
+
452
+ .candidate-card:active {
453
+ transform: scale(0.995);
454
+ }
455
+
456
+ .candidate-strategy {
457
+ font-size: 11px;
458
+ color: rgba(0, 0, 0, 0.55);
459
+ margin-bottom: 4px;
460
+ text-transform: lowercase;
461
+ letter-spacing: 0.02em;
462
+ }
463
+
464
+ .candidate-text {
465
+ font-size: 14px;
466
+ line-height: 1.4;
467
+ }
468
+
469
+ .candidate-list.rejected-round {
470
+ opacity: 0.55;
471
+ margin-bottom: 4px;
472
+ }
473
+
474
+ .rejected-round-label {
475
+ font-size: 10px;
476
+ color: rgba(0, 0, 0, 0.45);
477
+ text-transform: uppercase;
478
+ letter-spacing: 0.05em;
479
+ margin-bottom: 2px;
480
+ }
481
+
482
+ .candidate-card.rejected {
483
+ cursor: default;
484
+ background: rgba(0, 0, 0, 0.03);
485
+ border-color: rgba(0, 0, 0, 0.08);
486
+ }
487
+
488
+ .candidate-card.rejected .candidate-text {
489
+ text-decoration: line-through;
490
+ color: rgba(0, 0, 0, 0.55);
491
+ }
492
+
493
+ .candidate-card.rejected:hover {
494
+ border-color: rgba(0, 0, 0, 0.08);
495
+ background: rgba(0, 0, 0, 0.03);
496
+ }
497
+
498
+ .candidate-card.try-again {
499
+ border-style: dashed;
500
+ border-color: var(--accent);
501
+ background: rgba(59, 130, 246, 0.03);
502
+ }
503
+
504
+ .candidate-card.try-again .candidate-strategy {
505
+ color: var(--accent);
506
+ }
507
+
508
+ .candidate-card.try-again:hover:not(:disabled) {
509
+ background: rgba(59, 130, 246, 0.09);
510
+ }
511
+
512
+ .candidate-card:disabled {
513
+ opacity: 0.5;
514
+ cursor: wait;
515
+ }
516
+
517
  .turnaround-btn {
518
  background: transparent !important;
519
  color: var(--accent) !important;
frontend/src/components/ChatPanel.tsx CHANGED
@@ -1,8 +1,30 @@
1
  import { useState, useRef, useEffect, useCallback } from "react";
2
- import type { ChatMessage, SensingState, Affect, LatencyLog } from "../types";
3
- import { sendChat, sendTurnaround } from "../lib/api";
4
  import { EvalPanel } from "./EvalPanel";
5
 
6
  interface Props {
7
  userId: string | null;
8
  personaName: string;
@@ -18,6 +40,83 @@ interface Props {
18
 
19
  const TURNAROUND_WINDOW_MS = 5000;
20
 
21
  export function ChatPanel({
22
  userId,
23
  personaName,
@@ -33,6 +132,8 @@ export function ChatPanel({
33
  const [input, setInput] = useState("");
34
  const [loading, setLoading] = useState(false);
35
  const [turnaroundLoading, setTurnaroundLoading] = useState(false);
 
 
36
  const bottomRef = useRef<HTMLDivElement>(null);
37
  const lastResponseTsRef = useRef<number>(0);
38
  const lastTurnIdRef = useRef<number | null>(null);
@@ -76,7 +177,7 @@ export function ChatPanel({
76
  const next = [...prev];
77
  for (let i = next.length - 1; i >= 0; i--) {
78
  if (next[i].role === "aac_user" && !next[i].isTurnaround) {
79
- next[i] = { ...next[i], rephrased: true };
80
  break;
81
  }
82
  }
@@ -89,6 +190,8 @@ export function ChatPanel({
89
  turnId: res.turn_id,
90
  evalScores: res.eval_scores ?? null,
91
  isTurnaround: true,
 
 
92
  });
93
  return next;
94
  });
@@ -123,6 +226,120 @@ export function ChatPanel({
123
  ]
124
  );
125
 
126
  useEffect(() => {
127
  if (
128
  sensing.headSignal !== "HEAD_NOD_DISSATISFIED" &&
@@ -130,6 +347,23 @@ export function ChatPanel({
130
  ) {
131
  return;
132
  }
133
  const targetTurnId = lastTurnIdRef.current;
134
  const eligible =
135
  targetTurnId !== null &&
@@ -145,63 +379,156 @@ export function ChatPanel({
145
  // detection fired, then clear it. (Instant clear made detection invisible.)
146
  const id = window.setTimeout(() => onHeadSignalConsumed(), 1500);
147
  return () => window.clearTimeout(id);
148
- }, [sensing.headSignal, handleTurnaround, onHeadSignalConsumed]);
149
 
150
  async function handleSend() {
151
  if (!input.trim() || !userId || !backendReady || loading) return;
152
 
153
  const query = input.trim();
154
  setInput("");
155
- setMessages((prev) => [...prev, { role: "partner", content: query }]);
156
  setLoading(true);
157
 
158
  const airText = sensing.airWrittenText || null;
159
- try {
160
- const res = await sendChat({
161
- user_id: userId,
162
- query,
163
- affect_override: affectOverride ?? sensing.affect,
164
- gesture_tag: sensing.gestureTag,
165
- gaze_bucket: sensing.gazeBucket,
166
- air_written_text: airText,
167
- head_signal: sensing.headSignal,
168
- });
169
-
170
- lastTurnIdRef.current = res.turn_id;
171
- setMessages((prev) => [
172
  ...prev,
 
173
  {
174
- role: "aac_user",
175
- content: res.response,
176
- latency: res.latency,
177
- affect: res.affect,
178
- runId: res.run_id,
179
- turnId: res.turn_id,
180
- evalScores: res.eval_scores ?? null,
181
  },
182
- ]);
183
- onLatency(res.latency);
184
- lastResponseTsRef.current = performance.now();
185
- } catch (e) {
186
- setMessages((prev) => [
187
- ...prev,
188
  {
189
- role: "aac_user",
190
- content: `Error: ${e instanceof Error ? e.message : "request failed"}`,
191
  },
192
- ]);
193
  } finally {
194
  if (airText) onAirTextConsumed();
195
  setLoading(false);
196
  }
197
  }
198
 
199
- const canTurnaround =
200
- !!userId &&
201
- backendReady &&
202
- !loading &&
203
- !turnaroundLoading &&
204
- lastTurnIdRef.current !== null;
205
 
206
  return (
207
  <div className="chat-panel">
@@ -209,39 +536,107 @@ export function ChatPanel({
209
  Talking as: {personaName || "select a persona"}
210
  </div>
211
  <div className="chat-messages">
212
- {messages.map((msg, i) => (
213
- <div
214
- key={i}
215
- className={`chat-bubble ${msg.role}${
216
- msg.rephrased ? " rephrased" : ""
217
- }${msg.isTurnaround ? " turnaround" : ""}`}
218
- >
219
- <span className="chat-role">
220
- {msg.role === "partner" ? "Partner" : "AAC User"}
221
- {msg.rephrased && (
222
- <span className="badge badge-rephrased"> rephrased</span>
223
- )}
224
- {msg.isTurnaround && (
225
- <span className="badge badge-turnaround"> ↻ turnaround</span>
226
  )}
227
- </span>
228
- <p>{msg.content}</p>
229
- {msg.role === "aac_user" && msg.runId && userId && (
230
- <EvalPanel
231
- runId={msg.runId}
232
- userId={userId}
233
- latencyTotal={msg.latency?.t_total ?? 0}
234
- evalScores={msg.evalScores ?? null}
235
- />
236
- )}
237
- </div>
238
- ))}
239
- {loading && (
240
- <div className="chat-bubble aac_user loading">
241
- <span className="chat-role">AAC User</span>
242
- <p>Generating...</p>
243
- </div>
244
- )}
245
  {turnaroundLoading && (
246
  <div className="chat-bubble aac_user loading">
247
  <span className="chat-role">AAC User</span>
@@ -262,15 +657,6 @@ export function ChatPanel({
262
  <button onClick={handleSend} disabled={!userId || loading || !backendReady || !input.trim()}>
263
  Send
264
  </button>
265
- <button
266
- type="button"
267
- className="turnaround-btn"
268
- onClick={() => handleTurnaround("manual")}
269
- disabled={!canTurnaround}
270
- title="Re-plan the last response (also triggered by a head shake / sharp nod)"
271
- >
272
- ↻ Not quite right
273
- </button>
274
  </div>
275
  </div>
276
  );
 
1
  import { useState, useRef, useEffect, useCallback } from "react";
2
+ import type {
3
+ Affect,
4
+ Candidate,
5
+ ChatMessage,
6
+ LatencyLog,
7
+ SensingState,
8
+ } from "../types";
9
+ import {
10
+ sendPick,
11
+ sendTurnaround,
12
+ streamChat,
13
+ streamRegenerate,
14
+ } from "../lib/api";
15
  import { EvalPanel } from "./EvalPanel";
16
 
17
+ const STRATEGY_LABELS: Record<string, string> = {
18
+ broad: "broad — all memories",
19
+ focused: "focused — top memory",
20
+ serendipitous: "serendipitous — other memory",
21
+ side_index: "like last time",
22
+ present_good: "feeling good",
23
+ present_fine: "doing okay",
24
+ present_rough: "not great",
25
+ pending: "",
26
+ };
27
+
28
  interface Props {
29
  userId: string | null;
30
  personaName: string;
 
40
 
41
  const TURNAROUND_WINDOW_MS = 5000;
42
 
43
+ // Batches token deltas per (msgIdx, candIdx) and flushes them in a single
44
+ // setState call per animation frame. Streaming tokens at 30-60/s × 3 candidates
45
+ // otherwise causes a rerender per token. Non-token events (start/done/complete)
46
+ // flush the pending deltas first to preserve ordering.
47
+ //
48
+ // INVARIANT: keys are message indices into the messages[] array. Callers must
49
+ // ensure no message is inserted *before* a streaming message for the duration
50
+ // of its stream — appending to the end is fine, mid-list insert is not. Today
51
+ // every path appends to the end; if that changes, switch to a stable message
52
+ // id (e.g. the placeholder's runId or a freshly-minted uuid).
53
+ function useTokenBatcher(
54
+ setMessages: React.Dispatch<React.SetStateAction<ChatMessage[]>>,
55
+ ) {
56
+ // Lazy-init refs to avoid allocating a fresh Map on every render.
57
+ const pending = useRef<Map<number, Map<number, string>> | null>(null);
58
+ if (pending.current === null) pending.current = new Map();
59
+ const rafId = useRef<number | null>(null);
60
+
61
+ const flush = useCallback(() => {
62
+ rafId.current = null;
63
+ const batch = pending.current;
64
+ if (!batch || batch.size === 0) return;
65
+ pending.current = new Map();
66
+ setMessages((prev) =>
67
+ prev.map((m, i) => {
68
+ const perCand = batch.get(i);
69
+ if (!perCand) return m;
70
+ const cands = [...(m.candidates ?? [])];
71
+ for (const [ci, delta] of perCand) {
72
+ if (cands[ci]) {
73
+ cands[ci] = { ...cands[ci], text: cands[ci].text + delta };
74
+ }
75
+ }
76
+ return { ...m, candidates: cands };
77
+ }),
78
+ );
79
+ }, [setMessages]);
80
+
81
+ const queueToken = useCallback(
82
+ (msgIdx: number, candIdx: number, delta: string) => {
83
+ const batch = pending.current!;
84
+ let perMsg = batch.get(msgIdx);
85
+ if (!perMsg) {
86
+ perMsg = new Map();
87
+ batch.set(msgIdx, perMsg);
88
+ }
89
+ perMsg.set(candIdx, (perMsg.get(candIdx) ?? "") + delta);
90
+ if (rafId.current === null) {
91
+ rafId.current = window.requestAnimationFrame(flush);
92
+ }
93
+ },
94
+ [flush],
95
+ );
96
+
97
+ const flushNow = useCallback(() => {
98
+ if (rafId.current !== null) {
99
+ window.cancelAnimationFrame(rafId.current);
100
+ rafId.current = null;
101
+ }
102
+ flush();
103
+ }, [flush]);
104
+
105
+ // Cancel any pending rAF on unmount — otherwise a persona switch mid-stream
106
+ // leaves a scheduled flush that calls setMessages against the new state.
107
+ useEffect(() => {
108
+ return () => {
109
+ if (rafId.current !== null) {
110
+ window.cancelAnimationFrame(rafId.current);
111
+ rafId.current = null;
112
+ }
113
+ pending.current = null;
114
+ };
115
+ }, []);
116
+
117
+ return { queueToken, flushNow };
118
+ }
119
+
120
  export function ChatPanel({
121
  userId,
122
  personaName,
 
132
  const [input, setInput] = useState("");
133
  const [loading, setLoading] = useState(false);
134
  const [turnaroundLoading, setTurnaroundLoading] = useState(false);
135
+ const [regenerateLoading, setRegenerateLoading] = useState(false);
136
+ const { queueToken, flushNow } = useTokenBatcher(setMessages);
137
  const bottomRef = useRef<HTMLDivElement>(null);
138
  const lastResponseTsRef = useRef<number>(0);
139
  const lastTurnIdRef = useRef<number | null>(null);
 
177
  const next = [...prev];
178
  for (let i = next.length - 1; i >= 0; i--) {
179
  if (next[i].role === "aac_user" && !next[i].isTurnaround) {
180
+ next[i] = { ...next[i], rephrased: true, picked: true };
181
  break;
182
  }
183
  }
 
190
  turnId: res.turn_id,
191
  evalScores: res.eval_scores ?? null,
192
  isTurnaround: true,
193
+ candidates: res.candidates ?? [],
194
+ picked: true,
195
  });
196
  return next;
197
  });
 
226
  ]
227
  );
228
 
229
+ const handleRegenerate = useCallback(
230
+ async (msgIdx: number) => {
231
+ if (!userId || !backendReady || regenerateLoading || loading) return;
232
+ const msg = messages[msgIdx];
233
+ if (!msg || !msg.candidates || msg.picked || msg.turnId === undefined) return;
234
+
235
+ const currentRound = msg.candidates;
236
+ const priorRounds = msg.rejectedRounds ?? [];
237
+ const rejected_texts = [
238
+ ...priorRounds.flat().map((c) => c.text),
239
+ ...currentRound.map((c) => c.text),
240
+ ];
241
+
242
+ setRegenerateLoading(true);
243
+
244
+ // Move the current round into rejectedRounds + clear candidates so the
245
+ // UI shows empty-card placeholders while streams fill in.
246
+ setMessages((prev) =>
247
+ prev.map((m, i) =>
248
+ i === msgIdx
249
+ ? {
250
+ ...m,
251
+ candidates: [],
252
+ rejectedRounds: [...priorRounds, currentRound],
253
+ picked: false,
254
+ }
255
+ : m,
256
+ ),
257
+ );
258
+
259
+ const updateMsg = (
260
+ updater: (m: ChatMessage) => ChatMessage,
261
+ ) => {
262
+ setMessages((prev) =>
263
+ prev.map((m, i) => (i === msgIdx ? updater(m) : m)),
264
+ );
265
+ };
266
+
267
+ try {
268
+ await streamRegenerate(
269
+ {
270
+ user_id: userId,
271
+ turn_id: msg.turnId,
272
+ rejected_texts,
273
+ },
274
+ (evt) => {
275
+ if (evt.type === "token") {
276
+ queueToken(msgIdx, evt.idx, evt.delta);
277
+ return;
278
+ }
279
+ flushNow();
280
+ if (evt.type === "candidate_start") {
281
+ updateMsg((m) => {
282
+ const cands = [...(m.candidates ?? [])];
283
+ while (cands.length <= evt.idx) {
284
+ cands.push({
285
+ text: "",
286
+ strategy: "pending",
287
+ grounded_buckets: [],
288
+ });
289
+ }
290
+ cands[evt.idx] = {
291
+ text: "",
292
+ strategy: evt.strategy,
293
+ grounded_buckets: evt.grounded_buckets,
294
+ };
295
+ return { ...m, candidates: cands };
296
+ });
297
+ } else if (evt.type === "candidate_done") {
298
+ updateMsg((m) => {
299
+ const cands = [...(m.candidates ?? [])];
300
+ if (cands[evt.idx]) {
301
+ cands[evt.idx] = { ...cands[evt.idx], text: evt.text };
302
+ }
303
+ return { ...m, candidates: cands };
304
+ });
305
+ } else if (evt.type === "complete") {
306
+ const res = evt.response;
307
+ lastTurnIdRef.current = res.turn_id;
308
+ updateMsg((m) => ({
309
+ ...m,
310
+ content: res.response,
311
+ latency: res.latency,
312
+ affect: res.affect,
313
+ runId: res.run_id,
314
+ turnId: res.turn_id,
315
+ evalScores: res.eval_scores ?? null,
316
+ candidates: res.candidates ?? m.candidates ?? [],
317
+ picked: false,
318
+ }));
319
+ onLatency(res.latency);
320
+ }
321
+ },
322
+ );
323
+ } catch (e) {
324
+ flushNow();
325
+ console.warn("streamRegenerate failed", e);
326
+ } finally {
327
+ setRegenerateLoading(false);
328
+ }
329
+ },
330
+ [
331
+ userId,
332
+ backendReady,
333
+ regenerateLoading,
334
+ loading,
335
+ messages,
336
+ setMessages,
337
+ queueToken,
338
+ flushNow,
339
+ onLatency,
340
+ ]
341
+ );
342
+
343
  useEffect(() => {
344
  if (
345
  sensing.headSignal !== "HEAD_NOD_DISSATISFIED" &&
 
347
  ) {
348
  return;
349
  }
350
+
351
+ // If the most recent AAC message has an open picker, head-signal means
352
+ // "regenerate" — the user hasn't committed, so there's nothing to
353
+ // "rephrase" yet.
354
+ let openPickerIdx = -1;
355
+ for (let i = messages.length - 1; i >= 0; i--) {
356
+ const m = messages[i];
357
+ if (m.role !== "aac_user") continue;
358
+ if (!m.picked && (m.candidates?.length ?? 0) > 1) openPickerIdx = i;
359
+ break;
360
+ }
361
+ if (openPickerIdx !== -1) {
362
+ handleRegenerate(openPickerIdx);
363
+ onHeadSignalConsumed();
364
+ return;
365
+ }
366
+
367
  const targetTurnId = lastTurnIdRef.current;
368
  const eligible =
369
  targetTurnId !== null &&
 
379
  // detection fired, then clear it. (Instant clear made detection invisible.)
380
  const id = window.setTimeout(() => onHeadSignalConsumed(), 1500);
381
  return () => window.clearTimeout(id);
382
+ }, [
383
+ sensing.headSignal,
384
+ handleTurnaround,
385
+ handleRegenerate,
386
+ onHeadSignalConsumed,
387
+ messages,
388
+ ]);
389
 
390
  async function handleSend() {
391
  if (!input.trim() || !userId || !backendReady || loading) return;
392
 
393
  const query = input.trim();
394
  setInput("");
 
395
  setLoading(true);
396
 
397
  const airText = sensing.airWrittenText || null;
398
+
399
+ // Push the partner bubble, and a placeholder AAC message we'll fill in
400
+ // progressively. We need the placeholder's index to target updates — use
401
+ // a ref captured from the setter so we don't rely on stale state.
402
+ let placeholderIdx = -1;
403
+ setMessages((prev) => {
404
+ const next = [
 
 
 
 
 
 
405
  ...prev,
406
+ { role: "partner" as const, content: query },
407
  {
408
+ role: "aac_user" as const,
409
+ content: "",
410
+ candidates: [] as Candidate[],
411
+ picked: false,
 
 
 
412
  },
413
+ ];
414
+ placeholderIdx = next.length - 1;
415
+ return next;
416
+ });
417
+
418
+ const updatePlaceholder = (
419
+ updater: (m: ChatMessage) => ChatMessage,
420
+ ) => {
421
+ setMessages((prev) =>
422
+ prev.map((m, i) => (i === placeholderIdx ? updater(m) : m)),
423
+ );
424
+ };
425
+
426
+ try {
427
+ await streamChat(
428
  {
429
+ user_id: userId,
430
+ query,
431
+ affect_override: affectOverride ?? sensing.affect,
432
+ gesture_tag: sensing.gestureTag,
433
+ gaze_bucket: sensing.gazeBucket,
434
+ air_written_text: airText,
435
+ head_signal: sensing.headSignal,
436
  },
437
+ (evt) => {
438
+ if (evt.type === "token") {
439
+ queueToken(placeholderIdx, evt.idx, evt.delta);
440
+ return;
441
+ }
442
+ // Any non-token event must see the latest text — flush the queue first.
443
+ flushNow();
444
+ if (evt.type === "candidate_start") {
445
+ updatePlaceholder((m) => {
446
+ const cands = [...(m.candidates ?? [])];
447
+ while (cands.length <= evt.idx) {
448
+ cands.push({
449
+ text: "",
450
+ strategy: "pending",
451
+ grounded_buckets: [],
452
+ });
453
+ }
454
+ cands[evt.idx] = {
455
+ text: "",
456
+ strategy: evt.strategy,
457
+ grounded_buckets: evt.grounded_buckets,
458
+ };
459
+ return { ...m, candidates: cands };
460
+ });
461
+ } else if (evt.type === "candidate_done") {
462
+ updatePlaceholder((m) => {
463
+ const cands = [...(m.candidates ?? [])];
464
+ if (cands[evt.idx]) {
465
+ cands[evt.idx] = { ...cands[evt.idx], text: evt.text };
466
+ }
467
+ return { ...m, candidates: cands };
468
+ });
469
+ } else if (evt.type === "complete") {
470
+ const res = evt.response;
471
+ lastTurnIdRef.current = res.turn_id;
472
+ updatePlaceholder((m) => ({
473
+ ...m,
474
+ content: res.response,
475
+ latency: res.latency,
476
+ affect: res.affect,
477
+ runId: res.run_id,
478
+ turnId: res.turn_id,
479
+ evalScores: res.eval_scores ?? null,
480
+ candidates: res.candidates ?? m.candidates ?? [],
481
+ picked: (res.candidates ?? []).length <= 1,
482
+ }));
483
+ onLatency(res.latency);
484
+ lastResponseTsRef.current = performance.now();
485
+ }
486
+ },
487
+ );
488
+ } catch (e) {
489
+ flushNow();
490
+ updatePlaceholder((m) => ({
491
+ ...m,
492
+ content: `Error: ${e instanceof Error ? e.message : "request failed"}`,
493
+ }));
494
  } finally {
495
  if (airText) onAirTextConsumed();
496
  setLoading(false);
497
  }
498
  }
499
 
500
+ const handlePick = useCallback(
501
+ async (msgIdx: number, candIdx: number) => {
502
+ const msg = messages[msgIdx];
503
+ if (!msg || !msg.candidates || !msg.runId || !userId) return;
504
+ if (msg.picked) return;
505
+ const picked = msg.candidates[candIdx];
506
+ if (!picked) return;
507
+
508
+ setMessages((prev) =>
509
+ prev.map((m, i) =>
510
+ i === msgIdx
511
+ ? {
512
+ ...m,
513
+ content: picked.text,
514
+ picked: true,
515
+ pickedIdx: candIdx,
516
+ }
517
+ : m
518
+ )
519
+ );
520
+ try {
521
+ await sendPick({
522
+ run_id: msg.runId,
523
+ user_id: userId,
524
+ picked_idx: candIdx,
525
+ });
526
+ } catch (e) {
527
+ console.warn("sendPick failed", e);
528
+ }
529
+ },
530
+ [messages, setMessages, userId]
531
+ );
532
 
   return (
     <div className="chat-panel">

         Talking as: {personaName || "select a persona"}
       </div>
       <div className="chat-messages">
+        {messages.map((msg, i) => {
+          const hasRegenerated = (msg.rejectedRounds?.length ?? 0) > 0;
+          const showPicker =
+            msg.role === "aac_user" &&
+            !msg.picked &&
+            !!msg.candidates &&
+            (msg.candidates.length > 1 || hasRegenerated);
+
+          if (showPicker) {
+            const priorRounds = msg.rejectedRounds ?? [];
+            return (
+              <div key={i} className="chat-bubble aac_user picker">
+                <span className="chat-role">
+                  AAC User
+                  <span className="badge badge-picker">
+                    pick one ({msg.candidates!.length} options)
+                  </span>
+                </span>
+                {priorRounds.map((round, ri) => (
+                  <div key={`r${ri}`} className="candidate-list rejected-round">
+                    <div className="rejected-round-label">
+                      rejected round {ri + 1}
+                    </div>
+                    {round.map((cand, ci) => (
+                      <div key={ci} className="candidate-card rejected">
+                        <div className="candidate-strategy">
+                          {STRATEGY_LABELS[cand.strategy] ?? cand.strategy}
+                        </div>
+                        <div className="candidate-text">{cand.text}</div>
+                      </div>
+                    ))}
+                  </div>
+                ))}
+                <div className="candidate-list">
+                  {msg.candidates!.map((cand, ci) => (
+                    <button
+                      key={ci}
+                      type="button"
+                      className="candidate-card"
+                      onClick={() => handlePick(i, ci)}
+                      disabled={regenerateLoading}
+                      title="Click to send this one"
+                    >
+                      <div className="candidate-strategy">
+                        {STRATEGY_LABELS[cand.strategy] ?? cand.strategy}
+                      </div>
+                      <div className="candidate-text">{cand.text}</div>
+                    </button>
+                  ))}
+                  <button
+                    type="button"
+                    className="candidate-card try-again"
+                    onClick={() => handleRegenerate(i)}
+                    disabled={regenerateLoading}
+                    title="None of these fit — generate fresh options"
+                  >
+                    <div className="candidate-strategy">try again</div>
+                    <div className="candidate-text">
+                      {regenerateLoading
+                        ? "Regenerating…"
+                        : "↻ None of these fit — try different angles"}
+                    </div>
+                  </button>
+                </div>
+              </div>
+            );
+          }
+
+          return (
+            <div
+              key={i}
+              className={`chat-bubble ${msg.role}${
+                msg.rephrased ? " rephrased" : ""
+              }${msg.isTurnaround ? " turnaround" : ""}`}
+            >
+              <span className="chat-role">
+                {msg.role === "partner" ? "Partner" : "AAC User"}
+                {msg.rephrased && (
+                  <span className="badge badge-rephrased"> rephrased</span>
+                )}
+                {msg.isTurnaround && (
+                  <span className="badge badge-turnaround"> ↻ turnaround</span>
+                )}
+                {msg.picked && msg.pickedIdx !== undefined && msg.candidates && msg.candidates[msg.pickedIdx] && (
+                  <span className="badge badge-picked">
+                    ✓ {STRATEGY_LABELS[msg.candidates[msg.pickedIdx].strategy] ?? msg.candidates[msg.pickedIdx].strategy}
+                  </span>
+                )}
+              </span>
+              <p>{msg.content}</p>
+              {msg.role === "aac_user" && msg.runId && userId && (
+                <EvalPanel
+                  runId={msg.runId}
+                  userId={userId}
+                  latencyTotal={msg.latency?.t_total ?? 0}
+                  evalScores={msg.evalScores ?? null}
+                />
               )}
+            </div>
+          );
+        })}
         {turnaroundLoading && (
           <div className="chat-bubble aac_user loading">
             <span className="chat-role">AAC User</span>

         <button onClick={handleSend} disabled={!userId || loading || !backendReady || !input.trim()}>
           Send
         </button>
       </div>
     </div>
   );
frontend/src/lib/api.ts CHANGED
@@ -44,6 +44,102 @@ export async function resetSession(userId: string): Promise<void> {
   if (!res.ok) throw new Error(`API error: ${res.status}`);
 }
 
+export type StreamEvent =
+  | { type: "candidate_start"; idx: number; strategy: string; grounded_buckets: string[] }
+  | { type: "token"; idx: number; delta: string }
+  | { type: "candidate_done"; idx: number; text: string }
+  | { type: "candidate_error"; idx: number; error: string }
+  | { type: "complete"; response: ChatResponse }
+  | { type: "error"; message: string };
+
+async function readSSE(
+  res: Response,
+  onEvent: (evt: StreamEvent) => void,
+): Promise<void> {
+  if (!res.body) throw new Error("no response body");
+  const reader = res.body.getReader();
+  const decoder = new TextDecoder();
+  let buffer = "";
+
+  const emitFrame = (frame: string) => {
+    const line = frame.split("\n").find((l) => l.startsWith("data:"));
+    if (!line) return;
+    const json = line.slice(5).trim();
+    if (!json) return;
+    try {
+      onEvent(JSON.parse(json) as StreamEvent);
+    } catch (e) {
+      console.warn("SSE parse failed", e, json.slice(0, 200));
+    }
+  };
+
+  while (true) {
+    const { done, value } = await reader.read();
+    if (done) break;
+    buffer += decoder.decode(value, { stream: true });
+    // SSE frames are separated by blank lines.
+    const parts = buffer.split("\n\n");
+    buffer = parts.pop() ?? "";
+    for (const part of parts) emitFrame(part);
+  }
+  // Server closed cleanly but the final frame didn't end with \n\n —
+  // emit whatever remains so the terminal event isn't dropped.
+  if (buffer.trim()) emitFrame(buffer);
+}
+
+export async function streamChat(
+  req: ChatRequest,
+  onEvent: (evt: StreamEvent) => void,
+): Promise<void> {
+  const res = await fetch(`${API_BASE}/chat/stream`, {
+    method: "POST",
+    headers: { "Content-Type": "application/json" },
+    body: JSON.stringify(req),
+  });
+  if (!res.ok) throw new Error(`API error: ${res.status}`);
+  await readSSE(res, onEvent);
+}
+
+export async function streamRegenerate(
+  args: { user_id: string; turn_id: number; rejected_texts: string[] },
+  onEvent: (evt: StreamEvent) => void,
+): Promise<void> {
+  const res = await fetch(`${API_BASE}/chat/regenerate/stream`, {
+    method: "POST",
+    headers: { "Content-Type": "application/json" },
+    body: JSON.stringify(args),
+  });
+  if (!res.ok) throw new Error(`API error: ${res.status}`);
+  await readSSE(res, onEvent);
+}
+
+export async function sendRegenerate(args: {
+  user_id: string;
+  turn_id: number;
+  rejected_texts: string[];
+}): Promise<ChatResponse> {
+  const res = await fetch(`${API_BASE}/chat/regenerate`, {
+    method: "POST",
+    headers: { "Content-Type": "application/json" },
+    body: JSON.stringify(args),
+  });
+  if (!res.ok) throw new Error(`API error: ${res.status}`);
+  return res.json();
+}
+
+export async function sendPick(args: {
+  run_id: string;
+  user_id: string;
+  picked_idx: number;
+}): Promise<void> {
+  const res = await fetch(`${API_BASE}/chat/pick`, {
+    method: "POST",
+    headers: { "Content-Type": "application/json" },
+    body: JSON.stringify(args),
+  });
+  if (!res.ok) throw new Error(`API error: ${res.status}`);
+}
+
 export async function submitRating(args: {
   run_id: string;
   user_id: string;
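
Review note on `readSSE`: the buffering step can be checked without a network by pulling it into a pure function. The sketch below is not part of the commit; `drainFrames` is a hypothetical name, and it assumes, as `readSSE` does, that each frame carries one JSON payload on its first `data:` line. It shows that a frame split across two chunks is held back in `rest` until the closing blank line arrives.

```typescript
// Hypothetical helper (not in the commit): the frame-splitting step of
// readSSE as a pure function over an accumulated text buffer.
type Drained = { events: unknown[]; rest: string };

function drainFrames(buffer: string): Drained {
  // SSE frames are separated by a blank line; the last piece may be partial.
  const parts = buffer.split("\n\n");
  const rest = parts.pop() ?? "";
  const events: unknown[] = [];
  for (const frame of parts) {
    // Same simplification as readSSE: only the first data: line is read.
    const line = frame.split("\n").find((l) => l.startsWith("data:"));
    if (!line) continue;
    const json = line.slice(5).trim();
    if (!json) continue;
    try {
      events.push(JSON.parse(json));
    } catch {
      // Malformed frame: skip it (readSSE logs a warning at this point).
    }
  }
  return { events, rest };
}

// Two network chunks, with the second frame cut in the middle.
const chunk1 = 'data: {"type":"token","idx":0,"delta":"Hi"}\n\ndata: {"type":"tok';
const chunk2 = 'en","idx":0,"delta":"!"}\n\n';

const first = drainFrames(chunk1);
const second = drainFrames(first.rest + chunk2);
console.log(first.events.length, second.events.length, second.rest.length);
```

Carrying `rest` forward between calls is what lets the caller append the next decoded chunk and retry, which mirrors the `buffer = parts.pop() ?? ""` line in the diff.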
frontend/src/types.ts CHANGED
@@ -66,10 +66,23 @@ export interface EvalScores {
   gaze_alignment: number;
 }
 
+export type CandidateStrategy =
+  | "broad"
+  | "focused"
+  | "serendipitous"
+  | "side_index";
+
+export interface Candidate {
+  text: string;
+  strategy: CandidateStrategy | string;
+  grounded_buckets: string[];
+}
+
 export interface ChatResponse {
   user_id: string;
   query: string;
   response: string;
+  candidates: Candidate[];
   affect: string;
   llm_tier: string;
   retrieval_mode: string;
@@ -90,4 +103,10 @@ export interface ChatMessage {
   rephrased?: boolean;
   isTurnaround?: boolean;
   evalScores?: EvalScores | null;
+  candidates?: Candidate[];
+  // picked becomes true after the user clicks one — also locks in `content` to the picked text
+  picked?: boolean;
+  pickedIdx?: number;
+  // Candidates from prior regeneration rounds — rendered struck-through above the active picker
+  rejectedRounds?: Candidate[][];
 }
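
Review note on the `Candidate` and `StreamEvent` shapes: one way to see how they fit together is a small reducer that folds stream events into a candidate list. This is a sketch, not the commit's code. `applyEvent` is a hypothetical name, the types are re-declared locally (trimmed to three event kinds) so the block stands alone, and the `token` branch assumes deltas are appended to the candidate's text, which the visible part of the diff does not confirm. The `candidate_done` branch follows the handler shown in the ChatPanel diff, replacing the text only if the slot exists.

```typescript
// Local copies of the shapes from types.ts / api.ts, trimmed for the sketch.
interface Candidate {
  text: string;
  strategy: string;
  grounded_buckets: string[];
}

type StreamEvent =
  | { type: "candidate_start"; idx: number; strategy: string; grounded_buckets: string[] }
  | { type: "token"; idx: number; delta: string }
  | { type: "candidate_done"; idx: number; text: string };

// Hypothetical reducer: returns a new array, never mutates the input.
function applyEvent(cands: Candidate[], evt: StreamEvent): Candidate[] {
  const next = [...cands];
  const cur = next[evt.idx];
  switch (evt.type) {
    case "candidate_start":
      next[evt.idx] = { text: "", strategy: evt.strategy, grounded_buckets: evt.grounded_buckets };
      break;
    case "token":
      // Assumption: token deltas are appended in arrival order.
      if (cur) next[evt.idx] = { ...cur, text: cur.text + evt.delta };
      break;
    case "candidate_done":
      // The final text is authoritative and overwrites the streamed text.
      if (cur) next[evt.idx] = { ...cur, text: evt.text };
      break;
  }
  return next;
}

const events: StreamEvent[] = [
  { type: "candidate_start", idx: 0, strategy: "broad", grounded_buckets: ["family"] },
  { type: "token", idx: 0, delta: "Hel" },
  { type: "token", idx: 0, delta: "lo" },
  { type: "candidate_done", idx: 0, text: "Hello." },
];

const streamed = events.slice(0, 3).reduce<Candidate[]>(applyEvent, []);
const finished = events.reduce<Candidate[]>(applyEvent, []);
console.log(streamed[0].text, "->", finished[0].text);
```

Keeping the reducer pure makes it easy to call from a state updater such as `updatePlaceholder`, since each call returns a fresh array rather than mutating the previous one.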