Space: ub-aac-chatbot/aac-chatbot

shwetangisingh committed on Apr 20

Commit df78c68 (1 parent: 690c106)

Streaming candidate picker + side-index feedback loops

- Planner fans out 3 candidates with distinct grounding strategies
(broad/focused/serendipitous for memory; good/fine/rough for
present-state). Streams tokens via SSE through /chat/stream and
/chat/regenerate/stream.
- Picker UI: stackable cards per candidate, "try again" regenerates
with prior options marked rejected. Head-shake during open picker
triggers regenerate; post-pick still triggers turnaround.
- Side-index at data/pick_index/<uid>/ stores (query → picked text +
buckets). Feeds back into generation as a prior_pick retrieved
chunk and blends into bucket_priors at weight 0.3 (transient).
- Concurrency: RLock on pick_index cache, Lock on planner completed[].
rAF-batched token rendering. SSE reader drains trailing buffer.
Empty-candidate fallback surfaces actionable text + logs.
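
The side-index blend described above can be sketched as a convex combination of the session prior and the user's normalized pick shares. This is only an illustration: the commit's real logic lives in `_blend_pick_history_into_priors`, which is not shown in this excerpt, so the function name `blend_pick_history`, the exact formula, and the renormalization step are all assumptions. Only the weight of 0.3 and the fact that the blend is transient come from the commit message.

```python
def blend_pick_history(
    bucket_priors: dict[str, float],
    pick_counts: dict[str, int],
    weight: float = 0.3,
) -> dict[str, float]:
    """Return a NEW prior dict; the caller's session prior is left untouched.

    Hypothetical helper: assumes blended = (1 - w) * prior + w * pick_share,
    followed by renormalization so the result sums to 1.
    """
    total_picks = sum(pick_counts.values())
    if total_picks <= 0:
        # No pick history yet: fall back to the session prior unchanged.
        return dict(bucket_priors)

    blended: dict[str, float] = {}
    for bucket, prior in bucket_priors.items():
        pick_share = pick_counts.get(bucket, 0) / total_picks
        blended[bucket] = (1.0 - weight) * prior + weight * pick_share

    norm = sum(blended.values())
    if norm <= 0:
        return dict(bucket_priors)
    return {bucket: value / norm for bucket, value in blended.items()}
```

Returning a copy rather than mutating the input is one way to keep the blend transient: the session's stored `bucket_priors` never absorbs the pick history, so the 0.3 weight does not compound from turn to turn.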

Files changed (15)

.gitignore +1 -0
README.md +10 -6
backend/api/main.py +407 -1
backend/generation/llm_client.py +51 -0
backend/main.py +1 -0
backend/pipeline/graph.py +16 -0
backend/pipeline/nodes/feedback.py +5 -0
backend/pipeline/nodes/planner.py +468 -28
backend/pipeline/nodes/retrieval.py +83 -0
backend/pipeline/state.py +9 -1
backend/retrieval/pick_index.py +124 -0
frontend/src/App.css +104 -0
frontend/src/components/ChatPanel.tsx +467 -81
frontend/src/lib/api.ts +96 -0
frontend/src/types.ts +19 -0

.gitignore CHANGED

@@ -18,6 +18,7 @@ env/
 # Data — indexes are rebuilt from source; do NOT commit binaries
 data/vector_store/
+data/pick_index/
 # Per-turn JSONL logs (contain user conversation content)
 logs/

README.md CHANGED

@@ -386,7 +386,6 @@ Heads up: all camera/sensing stuff is in the frontend (MediaPipe JS). Backend ju
 ### Dataset
 - [x] **[Core]** Memories carry three chunk types per persona — `narrative`, `social_post`, `chat_log` — each with a `bucket` label. Type is preserved through the vector-store metadata and feeds the P(type) session prior.
-- [ ] **[Core]** Write down the data schema somewhere so evals can reuse it
 ### Sensing (frontend)
@@ -427,10 +426,10 @@ Heads up: all camera/sensing stuff is in the frontend (MediaPipe JS). Backend ju
 ### Generation
-- [ ] **[Core]** API returns one response. Should return multiple candidates so the user can pick (and so the next item works)
-- [ ] **[Core]** Frontend needs a candidate picker — show all the options, let the user click one, send the selection back
-- [ ] **[Bonus]** When user picks a candidate, save the `(query, picked)` pair to a side vector index and check it first next turn
-- [x] LLM temperature bumped from 0.4 → 0.8 in [backend/pipeline/nodes/planner.py](backend/pipeline/nodes/planner.py). The old setting produced near-identical responses across turns even when affect/gesture changed, which made the sensing→output link hard to see. 0.8 gives meaningful lexical variation while staying in the persona's voice.
 ### Evals
@@ -450,10 +449,15 @@ Scoring runs synchronously on the `/chat` response path and the `eval_scores` di
 - [x] **[Eval]** Multimodal alignment — affect scored by positive/negative lexicon overlap vs. target sentiment, gesture by opener-phrase regex (THUMBS_UP/THUMBS_DOWN/WAVING), gaze by fraction of retrieved chunks matching the looked-at bucket
 - [x] **[Eval]** Authenticity — per-turn stars under each assistant bubble, POST to `/feedback/rating`, logged with `run_id + rater_id`
 - [ ] **[Eval]** For the live in-class eval: figure out the actual session — who rates (partners + experts per spec), how many turns each, what gets shown to them. The Likert form is the easy part; the protocol isn't written down anywhere
 ### Cleanup
-- [ ] move the affect → `StyleDirective` config (`_AFFECT_CONFIG` in [intent.py](backend/pipeline/nodes/intent.py)) and the gesture directives ([labels.py](backend/sensing/labels.py)) out of code into a yaml
 - [x] delete `backend/sensing/` (dead code, sensing is in frontend) — done, only `labels.py` remains
 - [x] per-persona affect overrides (`_PERSONA_TONE_OVERRIDES`) deleted — redundant with `stylistic_preferences` in the new persona JSONs

 ### Dataset
 - [x] **[Core]** Memories carry three chunk types per persona — `narrative`, `social_post`, `chat_log` — each with a `bucket` label. Type is preserved through the vector-store metadata and feeds the P(type) session prior.
 ### Sensing (frontend)
 ### Generation
+- [x] **[Core]** API returns 3 candidates (plus an optional side-index hit) on `/chat` — see `candidates` in [backend/api/main.py](backend/api/main.py) `ChatResponse`. Planner fans out three grounding strategies in parallel threads and dedupes identical outputs: **broad** (all retrieved personal chunks), **focused** (top chunk only), and **serendipitous** (random non-top chunks) — see `_pick_strategy_chunks` in [backend/pipeline/nodes/planner.py](backend/pipeline/nodes/planner.py). Turnaround/present-state retries skip the fan-out and regenerate a single response.
+- [x] **[Core]** Frontend picker shows stacked candidate cards with a strategy label under each; click to commit, which strikes the rest, locks the AAC bubble to the chosen text, and fires `POST /chat/pick`. One-candidate responses render as a normal bubble. See `handlePick` + `.candidate-list` in [frontend/src/components/ChatPanel.tsx](frontend/src/components/ChatPanel.tsx).
+- [x] **[Bonus]** Side-index at `data/pick_index/<uid>/` stores `(query embedding → picked text, strategy, picked_buckets)` after every pick. Two feedback loops into generation: (1) the retrieval node injects the previously-picked text as a `source: "prior_pick"` chunk rendered in a "you answered like this before" block — the three LLM candidates all see it and riff on it; (2) retrieval blends cumulative `bucket_pick_counts` into this turn's `bucket_priors` at weight 0.3 (transient — doesn't persist across turns), so users who historically pick family memories bias retrieval toward family without overriding the session prior. The raw picked text is also still surfaced as a standalone `side_index` candidate. See [backend/retrieval/pick_index.py](backend/retrieval/pick_index.py), `_blend_pick_history_into_priors` + `_prepend_prior_pick` in [backend/pipeline/nodes/retrieval.py](backend/pipeline/nodes/retrieval.py), and the prior-pick block in `_build_user` in [backend/pipeline/nodes/planner.py](backend/pipeline/nodes/planner.py).
+- [x] LLM temperature bumped from 0.4 → 0.8, then pulled back to 0.7 once chunk-variation became the primary diversity axis. With three different grounding strategies feeding three parallel calls, sampling noise matters less than which memories are in the context window.
 ### Evals
 - [x] **[Eval]** Multimodal alignment — affect scored by positive/negative lexicon overlap vs. target sentiment, gesture by opener-phrase regex (THUMBS_UP/THUMBS_DOWN/WAVING), gaze by fraction of retrieved chunks matching the looked-at bucket
 - [x] **[Eval]** Authenticity — per-turn stars under each assistant bubble, POST to `/feedback/rating`, logged with `run_id + rater_id`
 - [ ] **[Eval]** For the live in-class eval: figure out the actual session — who rates (partners + experts per spec), how many turns each, what gets shown to them. The Likert form is the easy part; the protocol isn't written down anywhere
+- [ ] **[Eval]** Relevance score — one NLI call per turn asking "does the response address the partner's query?" Fills the biggest current gap: a perfectly grounded but off-topic reply scores 100% grounded today and we'd never catch it
+- [ ] **[Eval]** Candidate diversity — mean pairwise cosine distance among the 3 candidates in a picker round. Low diversity = picker showing three paraphrases of the same answer (the "aloha" problem), which is a signal that retrieval or temperature needs tuning for that query
+- [ ] **[Eval]** Picker-aware metrics from `turns.jsonl` + `picks.jsonl`: which strategy wins most (`broad` vs `focused` vs `serendipitous` vs `side_index`), pick rate (% of turns where user clicked a card), regenerate rate (% of turns where user clicked "try again"). All computable offline, no runtime cost
+- [ ] **[Eval]** Score alternate candidates too, not just the selected one. Right now `compute_evals` only scores `selected_response`; scoring all 3 would let us measure whether the picker actually improves quality over taking candidate 0 blindly
+- [ ] **[Eval]** UI coverage gap: `compute_evals` returns 12 fields but `EvalPanel` renders only 5 pills (latency, grounded, affect, gesture, gaze). Hallucination rate, overall multimodal_alignment, SLO target/margin are computed and logged but never surfaced in the bubble. Decide what belongs as a pill vs a tooltip-only number vs an offline-only log field
+- [ ] **[Eval]** On pill hover, tooltip should explain *how* the number was computed, not just what it means. Today the `title` attributes say "Groundedness: fraction of response sentences supported by retrieved memories" — which is the definition but not the math. Want: "5/8 sentences had NLI entailment prob ≥ 0.5 against the retrieved chunks" for groundedness; "3/4 positive-lexicon words matched HAPPY target" for affect; raw scorer inputs + thresholds exposed inline so the number isn't a black box
 ### Cleanup
 - [x] delete `backend/sensing/` (dead code, sensing is in frontend) — done, only `labels.py` remains
 - [x] per-persona affect overrides (`_PERSONA_TONE_OVERRIDES`) deleted — redundant with `stylistic_preferences` in the new persona JSONs
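
The candidate-diversity eval proposed in the README checklist (mean pairwise cosine distance among the candidates in one picker round) could look roughly like the sketch below. The function names are hypothetical and the embeddings are assumed to be plain lists of floats; the README item is still unchecked, so nothing here reflects code that exists in the repository.

```python
import math
from itertools import combinations


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity. Zero-length vectors are treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def mean_pairwise_cosine_distance(embeddings: list[list[float]]) -> float:
    """Average distance over all unordered candidate pairs.

    Returns 0.0 when there are fewer than two candidates, since a single
    candidate has no diversity to measure.
    """
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        return 0.0
    return sum(cosine_distance(a, b) for a, b in pairs) / len(pairs)
```

With three candidates this averages three pairwise distances. A value near 0 would indicate the "three paraphrases of the same answer" case the README calls out.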

backend/api/main.py CHANGED

@@ -9,6 +9,7 @@ from pathlib import Path
 from fastapi import FastAPI, HTTPException
 from fastapi.middleware.cors import CORSMiddleware
 from pydantic import BaseModel, Field
 from backend.config.settings import settings
@@ -18,11 +19,12 @@ from backend.generation.llm_client import (  # active_model used by /debug/confi
     get_client,
 )
 from backend.guardrails.checks import check_input
-from backend.pipeline.graph import run_pipeline
 from backend.pipeline.intent_kind import classify_intent_kind
 from backend.pipeline.nodes import feedback as feedback_node
 from backend.pipeline.nodes import planner as planner_node
 from backend.pipeline.state import PipelineState
 from backend.retrieval.priors import BUCKETS, CHUNK_TYPES, uniform
 from backend.retrieval.vector_store import _get_embedder, retrieve
@@ -83,10 +85,17 @@ class TurnaroundRequest(BaseModel):
     head_signal: str | None = None
 class ChatResponse(BaseModel):
     user_id: str
     query: str
     response: str
     affect: str
     llm_tier: str
     llm_model: str
@@ -98,6 +107,18 @@ class ChatResponse(BaseModel):
     eval_scores: dict | None = None
 class RatingRequest(BaseModel):
     run_id: str = Field(min_length=1, max_length=64, pattern=_ID_PATTERN)
     user_id: str = Field(min_length=1, max_length=64, pattern=_ID_PATTERN)
@@ -170,6 +191,7 @@ def _build_initial_state(req: ChatRequest, session: dict) -> PipelineState:
         retrieval_mode_used="",
         augmented_prompt=None,
         candidates=[],
         selected_response=None,
         llm_tier_used="",
         llm_model_used="",
@@ -330,6 +352,7 @@ def chat(req: ChatRequest):
             user_id=req.user_id,
             query=req.query,
             response=guard["fallback"],
             affect="NEUTRAL",
             llm_tier="none",
             llm_model="none",
@@ -368,6 +391,7 @@ def chat(req: ChatRequest):
         user_id=req.user_id,
         query=req.query,
         response=result["selected_response"] or "",
         affect=affect_emotion,
         llm_tier=result.get("llm_tier_used", "unknown"),
         llm_model=result.get("llm_model_used", "unknown"),
@@ -380,6 +404,105 @@ def chat(req: ChatRequest):
     )
 @app.post("/chat/turnaround", response_model=ChatResponse)
 def chat_turnaround(req: TurnaroundRequest):
     if req.user_id not in _sessions:
@@ -470,6 +593,289 @@ def chat_turnaround(req: TurnaroundRequest):
         user_id=req.user_id,
         query=replan_state["raw_query"],
         response=replan_state["selected_response"] or "",
         affect=affect_emotion,
         llm_tier=replan_state.get("llm_tier_used", "unknown"),
         llm_model=replan_state.get("llm_model_used", "unknown"),

 from fastapi import FastAPI, HTTPException
 from fastapi.middleware.cors import CORSMiddleware
+from fastapi.responses import StreamingResponse
 from pydantic import BaseModel, Field
 from backend.config.settings import settings
     get_client,
 )
 from backend.guardrails.checks import check_input
+from backend.pipeline.graph import choose_planner_tier, run_pipeline, run_until_planner
 from backend.pipeline.intent_kind import classify_intent_kind
 from backend.pipeline.nodes import feedback as feedback_node
 from backend.pipeline.nodes import planner as planner_node
 from backend.pipeline.state import PipelineState
+from backend.retrieval import pick_index
 from backend.retrieval.priors import BUCKETS, CHUNK_TYPES, uniform
 from backend.retrieval.vector_store import _get_embedder, retrieve
     head_signal: str | None = None
+class CandidateOut(BaseModel):
+    text: str
+    strategy: str
+    grounded_buckets: list[str] = []
 class ChatResponse(BaseModel):
     user_id: str
     query: str
     response: str
+    candidates: list[CandidateOut] = []
     affect: str
     llm_tier: str
     llm_model: str
     eval_scores: dict | None = None
+class PickRequest(BaseModel):
+    run_id: str = Field(min_length=1, max_length=64, pattern=_ID_PATTERN)
+    user_id: str = Field(min_length=1, max_length=64, pattern=_ID_PATTERN)
+    picked_idx: int = Field(ge=0, le=10)
+class RegenerateRequest(BaseModel):
+    user_id: str
+    turn_id: int | None = None
+    rejected_texts: list[str] = Field(default_factory=list, max_length=20)
 class RatingRequest(BaseModel):
     run_id: str = Field(min_length=1, max_length=64, pattern=_ID_PATTERN)
     user_id: str = Field(min_length=1, max_length=64, pattern=_ID_PATTERN)
         retrieval_mode_used="",
         augmented_prompt=None,
         candidates=[],
+        rejected_candidates=[],
         selected_response=None,
         llm_tier_used="",
         llm_model_used="",
             user_id=req.user_id,
             query=req.query,
             response=guard["fallback"],
+            candidates=[],
             affect="NEUTRAL",
             llm_tier="none",
             llm_model="none",
         user_id=req.user_id,
         query=req.query,
         response=result["selected_response"] or "",
+        candidates=[CandidateOut(**c) for c in result.get("candidates") or []],
         affect=affect_emotion,
         llm_tier=result.get("llm_tier_used", "unknown"),
         llm_model=result.get("llm_model_used", "unknown"),
     )
+@app.post("/chat/stream")
+def chat_stream(req: ChatRequest):
+    """Server-Sent Events version of /chat. Runs intent + retrieval synchronously,
+    then streams planner candidate tokens as they arrive. Final event carries the
+    full ChatResponse-shaped payload.
+    """
+    guard = check_input(req.query)
+    if not guard["allowed"]:
+        # Mirror the non-stream /chat early-exit.
+        payload = {
+            "user_id": req.user_id,
+            "query": req.query,
+            "response": guard["fallback"],
+            "candidates": [],
+            "affect": "NEUTRAL",
+            "llm_tier": "none",
+            "llm_model": "none",
+            "retrieval_mode": "none",
+            "latency": {},
+            "guardrail_passed": False,
+            "turn_id": 0,
+            "run_id": None,
+            "eval_scores": None,
+        }
+        def _one_event():
+            yield _sse({"type": "complete", "response": payload})
+        return StreamingResponse(_one_event(), media_type="text/event-stream")
+    session = _get_or_init_session(req.user_id)
+    initial_state = _build_initial_state(req, session)
+    def _gen():
+        state = run_until_planner(initial_state)
+        tier = choose_planner_tier(state)
+        completion: dict | None = None
+        for evt in planner_node._run_stream(state, tier=tier):
+            if evt["type"] == "complete":
+                completion = evt["planner_update"]
+                break
+            yield _sse(evt)
+        if completion is None:
+            yield _sse({"type": "error", "message": "planner produced no completion"})
+            return
+        state.update(completion)  # type: ignore[typeddict-item]
+        state.update(feedback_node.run(state))  # type: ignore[typeddict-item]
+        session["session_history"] = state["session_history"]
+        session["bucket_priors"] = state["bucket_priors"]
+        session["type_priors"] = state["type_priors"]
+        session["last_state"] = state
+        affect_emotion = (state.get("affect") or {}).get("emotion", "NEUTRAL")
+        run_id = state.get("run_id")
+        eval_scores = _compute_and_persist_evals(
+            run_id=run_id,
+            user_id=req.user_id,
+            turn_id=state["turn_id"],
+            response=state["selected_response"] or "",
+            chunks=list(state.get("retrieved_chunks") or []),
+            latency_log=dict(state.get("latency_log") or {}),
+            affect=affect_emotion,
+            gesture_tag=req.gesture_tag,
+            gaze_bucket=req.gaze_bucket,
+        )
+        final = {
+            "user_id": req.user_id,
+            "query": req.query,
+            "response": state["selected_response"] or "",
+            "candidates": [dict(c) for c in state.get("candidates") or []],
+            "affect": affect_emotion,
+            "llm_tier": state.get("llm_tier_used", "unknown"),
+            "llm_model": state.get("llm_model_used", "unknown"),
+            "retrieval_mode": state.get("retrieval_mode_used", "unknown"),
+            "latency": state.get("latency_log") or {},
+            "guardrail_passed": state.get("guardrail_passed", True),
+            "run_id": run_id,
+            "turn_id": state["turn_id"],
+            "eval_scores": eval_scores,
+        }
+        yield _sse({"type": "complete", "response": final})
+    return StreamingResponse(
+        _gen(),
+        media_type="text/event-stream",
+        headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
+    )
+def _sse(data: dict) -> str:
+    return f"data: {json.dumps(data, ensure_ascii=False)}\n\n"
 @app.post("/chat/turnaround", response_model=ChatResponse)
 def chat_turnaround(req: TurnaroundRequest):
     if req.user_id not in _sessions:
         user_id=req.user_id,
         query=replan_state["raw_query"],
         response=replan_state["selected_response"] or "",
+        candidates=[CandidateOut(**c) for c in replan_state.get("candidates") or []],
+        affect=affect_emotion,
+        llm_tier=replan_state.get("llm_tier_used", "unknown"),
+        llm_model=replan_state.get("llm_model_used", "unknown"),
+        retrieval_mode=replan_state.get("retrieval_mode_used", "unknown"),
+        latency=replan_state.get("latency_log") or {},
+        guardrail_passed=replan_state.get("guardrail_passed", True),
+        run_id=run_id,
+        turn_id=replan_state["turn_id"],
+        eval_scores=eval_scores,
+    )
+def _find_turn_from_jsonl(run_id: str) -> dict | None:
+    """Scan turns.jsonl from the end for a matching run_id. Used as fallback
+    when the session's last_state has already moved on."""
+    path = Path(settings.logs_dir) / "turns.jsonl"
+    if not path.exists():
+        return None
+    try:
+        with open(path, encoding="utf-8") as f:
+            lines = f.readlines()
+    except OSError:
+        return None
+    for line in reversed(lines[-500:]):  # bounded tail scan
+        try:
+            row = json.loads(line)
+        except json.JSONDecodeError:
+            continue
+        if row.get("run_id") == run_id:
+            return row
+    return None
+@app.post("/chat/pick")
+def pick_candidate(req: PickRequest):
+    if not _RUN_ID_RE.match(req.run_id):
+        raise HTTPException(status_code=400, detail="invalid run_id")
+    session = _sessions.get(req.user_id) or {}
+    last = session.get("last_state") or {}
+    candidates = last.get("candidates") or []
+    query_text = last.get("raw_query") or ""
+    # Fallback: last_state already advanced past this run_id — read from JSONL
+    if last.get("run_id") != req.run_id or not candidates:
+        row = _find_turn_from_jsonl(req.run_id)
+        if not row:
+            raise HTTPException(status_code=404, detail="turn not found")
+        candidates = row.get("candidates") or []
+        query_text = row.get("query") or query_text
+    if req.picked_idx >= len(candidates):
+        raise HTTPException(status_code=400, detail="picked_idx out of range")
+    picked = candidates[req.picked_idx]
+    picked_text = picked.get("text", "")
+    strategy = picked.get("strategy", "unknown")
+    picked_buckets = [
+        b for b in (picked.get("grounded_buckets") or []) if b and b != "open_domain"
+    ]
+    if query_text and picked_text:
+        try:
+            pick_index.add(
+                query=query_text,
+                user_id=req.user_id,
+                strategy=strategy,
+                picked_text=picked_text,
+                picked_buckets=picked_buckets,
+            )
+        except Exception as exc:
+            _log.warning("pick_index add failed: %r", exc)
+    logs_dir = Path(settings.logs_dir)
+    logs_dir.mkdir(parents=True, exist_ok=True)
+    entry = {
+        "ts": time.time(),
+        "run_id": req.run_id,
+        "user_id": req.user_id,
+        "picked_idx": req.picked_idx,
+        "strategy": strategy,
+        "picked_text": picked_text,
+        "query": query_text,
+    }
+    with open(logs_dir / "picks.jsonl", "a", encoding="utf-8") as f:
+        f.write(json.dumps(entry, ensure_ascii=False) + "\n")
+    return {"status": "ok", "strategy": strategy}
+@app.post("/chat/regenerate/stream")
+def chat_regenerate_stream(req: RegenerateRequest):
+    """Streaming regenerate — same as /chat/stream but reuses last_state and
+    marks all prior candidates as rejected."""
+    if req.user_id not in _sessions:
+        raise HTTPException(status_code=404, detail="no active session")
+    session = _sessions[req.user_id]
+    last: PipelineState | None = session.get("last_state")
+    if last is None:
+        raise HTTPException(status_code=409, detail="no prior turn to regenerate")
+    if req.turn_id is not None and req.turn_id != last["turn_id"]:
+        raise HTTPException(status_code=409, detail="stale turn_id")
+    gen_cfg = dict(last.get("generation_config") or {})
+    gen_cfg["persona_mod"] = "all_rejected"
+    gen_cfg.setdefault("tone_tag", "[TONE:TRY_DIFFERENT_ANGLE]")
+    prior_rejected = [c.get("text", "") for c in (last.get("candidates") or [])]
+    merged = (
+        list(last.get("rejected_candidates") or [])
+        + [t for t in prior_rejected if t]
+        + [t for t in req.rejected_texts if t]
+    )
+    seen: set[str] = set()
+    rejected: list[str] = []
+    for t in merged:
+        key = t.strip().lower()
+        if key and key not in seen:
+            seen.add(key)
+            rejected.append(t)
+    trimmed_history = list(last.get("session_history") or [])
+    if trimmed_history and trimmed_history[-1].get("role") == "aac_user":
+        trimmed_history.pop()
+    if trimmed_history and trimmed_history[-1].get("role") == "partner":
+        trimmed_history.pop()
+    replan_state: PipelineState = dict(last)  # type: ignore[assignment]
+    replan_state["session_history"] = trimmed_history
+    replan_state["generation_config"] = gen_cfg
+    replan_state["rejected_candidates"] = rejected
+    replan_state["turnaround_triggered"] = False
+    replan_state["latency_log"] = {
+        "t_sensing": 0.0,
+        "t_intent": 0.0,
+        "t_retrieval": 0.0,
+        "t_generation": 0.0,
+        "t_total": 0.0,
+    }
+    def _gen():
+        completion: dict | None = None
+        for evt in planner_node._run_stream(replan_state, tier="primary"):
+            if evt["type"] == "complete":
+                completion = evt["planner_update"]
+                break
+            yield _sse(evt)
+        if completion is None:
+            yield _sse({"type": "error", "message": "planner produced no completion"})
+            return
+        replan_state.update(completion)  # type: ignore[typeddict-item]
+        replan_state.update(feedback_node.run(replan_state))  # type: ignore[typeddict-item]
+        session["session_history"] = replan_state["session_history"]
+        session["bucket_priors"] = replan_state["bucket_priors"]
+        session["type_priors"] = replan_state["type_priors"]
+        session["last_state"] = replan_state
+        affect_emotion = (replan_state.get("affect") or {}).get("emotion", "NEUTRAL")
+        run_id = replan_state.get("run_id")
+        eval_scores = _compute_and_persist_evals(
+            run_id=run_id,
+            user_id=req.user_id,
+            turn_id=replan_state["turn_id"],
+            response=replan_state["selected_response"] or "",
+            chunks=list(replan_state.get("retrieved_chunks") or []),
+            latency_log=dict(replan_state.get("latency_log") or {}),
+            affect=affect_emotion,
+            gesture_tag=replan_state.get("gesture_tag"),
+            gaze_bucket=replan_state.get("gaze_bucket"),
+        )
+        final = {
+            "user_id": req.user_id,
+            "query": replan_state["raw_query"],
+            "response": replan_state["selected_response"] or "",
+            "candidates": [dict(c) for c in replan_state.get("candidates") or []],
+            "affect": affect_emotion,
+            "llm_tier": replan_state.get("llm_tier_used", "unknown"),
+            "llm_model": replan_state.get("llm_model_used", "unknown"),
+            "retrieval_mode": replan_state.get("retrieval_mode_used", "unknown"),
+            "latency": replan_state.get("latency_log") or {},
+            "guardrail_passed": replan_state.get("guardrail_passed", True),
+            "run_id": run_id,
+            "turn_id": replan_state["turn_id"],
+            "eval_scores": eval_scores,
+        }
+        yield _sse({"type": "complete", "response": final})
+    return StreamingResponse(
+        _gen(),
+        media_type="text/event-stream",
+        headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
+    )
+@app.post("/chat/regenerate", response_model=ChatResponse)
+def chat_regenerate(req: RegenerateRequest):
+    """Re-run the planner for the same turn with all prior candidates marked rejected.
+    Does NOT advance turn_id — same partner query, fresh fan-out of candidates.
+    """
+    if req.user_id not in _sessions:
+        raise HTTPException(status_code=404, detail="no active session")
+    session = _sessions[req.user_id]
+    last: PipelineState | None = session.get("last_state")
+    if last is None:
+        raise HTTPException(status_code=409, detail="no prior turn to regenerate")
+    if req.turn_id is not None and req.turn_id != last["turn_id"]:
+        raise HTTPException(status_code=409, detail="stale turn_id")
+    gen_cfg = dict(last.get("generation_config") or {})
+    gen_cfg["persona_mod"] = "all_rejected"
+    gen_cfg.setdefault("tone_tag", "[TONE:TRY_DIFFERENT_ANGLE]")
+    prior_rejected = [c.get("text", "") for c in (last.get("candidates") or [])]
+    merged_rejected = (
+        list(last.get("rejected_candidates") or [])
+        + [t for t in prior_rejected if t]
+        + [t for t in req.rejected_texts if t]
+    )
+    # Dedupe while preserving order.
+    seen: set[str] = set()
+    rejected: list[str] = []
+    for t in merged_rejected:
+        key = t.strip().lower()
+        if key and key not in seen:
+            seen.add(key)
+            rejected.append(t)
+    # Strip the tail (partner, aac_user) so feedback doesn't stack duplicate
+    # history entries on every regenerate — the user hasn't committed yet.
+    trimmed_history = list(last.get("session_history") or [])
+    if trimmed_history and trimmed_history[-1].get("role") == "aac_user":
+        trimmed_history.pop()
+    if trimmed_history and trimmed_history[-1].get("role") == "partner":
+        trimmed_history.pop()
+    replan_state: PipelineState = dict(last)  # type: ignore[assignment]
+    replan_state["session_history"] = trimmed_history
+    replan_state["generation_config"] = gen_cfg
+    replan_state["rejected_candidates"] = rejected
+    replan_state["turnaround_triggered"] = False  # keep multi-shot
+    replan_state["latency_log"] = {
+        "t_sensing": 0.0,
+        "t_intent": 0.0,
+        "t_retrieval": 0.0,
+        "t_generation": 0.0,
+        "t_total": 0.0,
+    }
+    planner_update = planner_node.run_primary(replan_state)
+    replan_state.update(planner_update)  # type: ignore[typeddict-item]
+    # Feedback node rewrites history + assigns a new run_id. Each regenerate
+    # is its own row in turns.jsonl for the eval record.
+    feedback_update = feedback_node.run(replan_state)
+    replan_state.update(feedback_update)  # type: ignore[typeddict-item]
+    session["session_history"] = replan_state["session_history"]
+    session["bucket_priors"] = replan_state["bucket_priors"]
+    session["type_priors"] = replan_state["type_priors"]
+    session["last_state"] = replan_state
+    affect_emotion = (replan_state.get("affect") or {}).get("emotion", "NEUTRAL")
+    run_id = replan_state.get("run_id")
+    eval_scores = _compute_and_persist_evals(
+        run_id=run_id,
+        user_id=req.user_id,
+        turn_id=replan_state["turn_id"],
+        response=replan_state["selected_response"] or "",
+        chunks=list(replan_state.get("retrieved_chunks") or []),
+        latency_log=dict(replan_state.get("latency_log") or {}),
+        affect=affect_emotion,
+        gesture_tag=replan_state.get("gesture_tag"),
+        gaze_bucket=replan_state.get("gaze_bucket"),
+    )
+    return ChatResponse(
+        user_id=req.user_id,
+        query=replan_state["raw_query"],
+        response=replan_state["selected_response"] or "",
+        candidates=[CandidateOut(**c) for c in replan_state.get("candidates") or []],
         affect=affect_emotion,
         llm_tier=replan_state.get("llm_tier_used", "unknown"),
         llm_model=replan_state.get("llm_model_used", "unknown"),
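
For readers who want to consume `/chat/stream` or `/chat/regenerate/stream`, here is a minimal sketch of a reader for the `data: {...}\n\n` frames that `_sse` emits, including the trailing-buffer drain the commit message mentions. The commit's actual reader is in the frontend and is not shown in this excerpt, so `iter_sse_events` and `_parse_frame` are hypothetical names, and the transport (how text chunks arrive) is left out on purpose.

```python
import json
from collections.abc import Iterable, Iterator
from typing import Optional


def sse(data: dict) -> str:
    """Same framing as the `_sse` helper in the diff: one JSON payload per frame."""
    return f"data: {json.dumps(data, ensure_ascii=False)}\n\n"


def _parse_frame(frame: str) -> Optional[dict]:
    """Decode one frame; returns None for empty or malformed frames."""
    payload_lines = [
        line[len("data:"):].strip()
        for line in frame.splitlines()
        if line.startswith("data:")
    ]
    if not payload_lines:
        return None
    try:
        return json.loads("\n".join(payload_lines))
    except json.JSONDecodeError:
        return None


def iter_sse_events(chunks: Iterable[str]) -> Iterator[dict]:
    """Reassemble frames from arbitrarily split text chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # A frame is complete once its blank-line terminator has arrived.
        while "\n\n" in buffer:
            frame, buffer = buffer.split("\n\n", 1)
            event = _parse_frame(frame)
            if event is not None:
                yield event
    # Drain whatever is left: the last frame may arrive without its terminator.
    event = _parse_frame(buffer)
    if event is not None:
        yield event
```

Without the final drain, a `complete` event whose terminating blank line was cut off by the connection closing would be silently dropped, and the picker would never receive the final payload.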

backend/generation/llm_client.py CHANGED

@@ -1,5 +1,6 @@
 # Two-tier LLM client — primary / fallback, both Ollama Cloud over OpenAI-compatible HTTP.
 import re
 from functools import lru_cache
 from typing import Any
@@ -87,6 +88,56 @@ def chat_complete(
     return stripped
 def warmup(tier: str | None = None) -> None:
     chat_complete(
         messages=[{"role": "user", "content": "hi"}],

 # Two-tier LLM client — primary / fallback, both Ollama Cloud over OpenAI-compatible HTTP.
 import re
+from collections.abc import Iterator
 from functools import lru_cache
 from typing import Any
     return stripped
+def chat_complete_stream(
+    messages: list[dict],
+    max_tokens: int,
+    tier: str | None = None,
+    temperature: float = 0.7,
+    **kwargs: Any,
+) -> Iterator[str]:
+    """Yield token deltas as they arrive. Thinking-mode stripping is applied
+    post-hoc on the buffered text by the caller — streaming <think>…</think>
+    into the UI would confuse the picker anyway.
+    """
+    resolved_tier = tier or settings.active_llm_tier
+    model = active_model(resolved_tier)
+    client = get_client(resolved_tier)
+    patched_messages = messages
+    extra_body: dict[str, Any] = kwargs.pop("extra_body", {})
+    if settings.thinking_mode == "suppress":
+        patched_messages = _apply_no_think(messages)
+    effective_max_tokens = max_tokens
+    if settings.thinking_mode in ("strip", "full"):
+        effective_max_tokens = max_tokens + settings.thinking_token_budget
+    stream = client.chat.completions.create(
+        model=model,
+        messages=patched_messages,
+        max_tokens=effective_max_tokens,
+        temperature=temperature,
+        stream=True,
+        extra_body=extra_body or None,
+        **kwargs,
+    )
+    for chunk in stream:
+        if not chunk.choices:
+            continue
+        delta = chunk.choices[0].delta
+        piece = getattr(delta, "content", None) or ""
+        if piece:
+            yield piece
+def finalize_streamed(text: str) -> str:
+    """Apply the same post-processing chat_complete does once a stream is done."""
+    if settings.thinking_mode in ("off", "strip"):
+        text = _strip_think_tags(text)
+    return text.strip()
 def warmup(tier: str | None = None) -> None:
     chat_complete(
         messages=[{"role": "user", "content": "hi"}],
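
A minimal sketch of how a caller might pair `chat_complete_stream` with `finalize_streamed`: buffer the raw deltas, then strip thinking tags once the stream ends. `fake_stream` and `strip_think_tags` are hypothetical stand-ins (the real `_strip_think_tags` is not shown in this diff), so treat the regex as an assumption.

```python
import re
from collections.abc import Iterator

# Hypothetical stand-in for the module's _strip_think_tags helper.
_THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_think_tags(text: str) -> str:
    """Remove <think>...</think> spans, including multi-line ones."""
    return _THINK_RE.sub("", text)


def fake_stream() -> Iterator[str]:
    """Stand-in for chat_complete_stream: tags may be split across deltas."""
    yield from ["<thi", "nk>plan the", " reply</think>", "Hello", " world. "]


def collect(stream: Iterator[str]) -> str:
    """Buffer every delta, then post-process the joined text once."""
    buf: list[str] = []
    for piece in stream:
        buf.append(piece)  # a real caller would also forward `piece` to the UI
    return strip_think_tags("".join(buf)).strip()


result = collect(fake_stream())
```

Stripping on the joined text rather than per delta matters because a tag can straddle two deltas, as in the first two pieces of `fake_stream`.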

backend/main.py CHANGED Viewed

@@ -180,6 +180,7 @@ def main() -> None:
             retrieval_mode_used="",
             augmented_prompt=None,
             candidates=[],
             selected_response=None,
             llm_tier_used="",
             latency_log={

             retrieval_mode_used="",
             augmented_prompt=None,
             candidates=[],
+            rejected_candidates=[],
             selected_response=None,
             llm_tier_used="",
             latency_log={

backend/pipeline/graph.py CHANGED Viewed

@@ -34,3 +34,19 @@ def run_pipeline(state: PipelineState) -> PipelineState:
     _merge(state, feedback.run(state))
     return state

     _merge(state, feedback.run(state))
     return state
+def run_until_planner(state: PipelineState) -> PipelineState:
+    """Run intent + retrieval only. Used by the streaming endpoint so it can
+    then drive the planner's token stream itself and call feedback at the end.
+    """
+    _merge(state, intent.run(state))
+    if _route_by_affect(state) == "fast":
+        _merge(state, retrieval.run_fast(state))
+    else:
+        _merge(state, retrieval.run_full(state))
+    return state
+def choose_planner_tier(state: PipelineState) -> str:
+    return _route_by_latency(state)
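
A rough sketch of how a streaming endpoint could frame planner events as Server-Sent Events after calling `run_until_planner`. The actual handler lives in backend/api/main.py and is not shown here; `sse_frames` and `fake_planner_events` are hypothetical names used only for illustration.

```python
import json
from collections.abc import Iterable, Iterator


def sse_frames(events: Iterable[dict]) -> Iterator[str]:
    """Encode each event dict as one SSE frame: a data line plus a blank line."""
    for evt in events:
        yield f"data: {json.dumps(evt)}\n\n"


def fake_planner_events() -> Iterator[dict]:
    """Stand-in for the planner's event generator (shapes follow its docstring)."""
    yield {"type": "candidate_start", "idx": 0, "strategy": "broad", "grounded_buckets": []}
    yield {"type": "token", "idx": 0, "delta": "Hi"}
    yield {"type": "candidate_done", "idx": 0, "text": "Hi"}


frames = list(sse_frames(fake_planner_events()))
```

Because `json.dumps` escapes newlines inside strings, each event stays on a single `data:` line, so the blank line is the only frame delimiter a client has to look for.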

backend/pipeline/nodes/feedback.py CHANGED Viewed

@@ -39,12 +39,14 @@ def _log_to_jsonl(
     latency = state.get("latency_log") or {}
     affect = (state.get("affect") or {}).get("emotion", "UNKNOWN")
     chunks = state.get("retrieved_chunks") or []
     entry = {
         "run_id": run_id,
         "ts": time.time(),
         "user_id": state["user_id"],
         "turn_id": state["turn_id"],
         "llm_tier": state.get("llm_tier_used", "unknown"),
         "retrieval_mode": state.get("retrieval_mode_used", "unknown"),
         "affect": affect,
@@ -57,6 +59,7 @@ def _log_to_jsonl(
         ),
         "num_contextual": sum(1 for c in chunks if c.get("source") == "contextual"),
         "num_open_domain": sum(1 for c in chunks if c.get("source") == "open_domain"),
         "latency": {
             "t_sensing": latency.get("t_sensing", 0.0),
             "t_intent": latency.get("t_intent", 0.0),
@@ -65,6 +68,8 @@ def _log_to_jsonl(
             "t_total": latency.get("t_total", 0.0),
         },
         "response": state.get("selected_response") or "",
         "bucket_priors_after": bucket_priors_after,
         "type_priors_after": type_priors_after,
     }

     latency = state.get("latency_log") or {}
     affect = (state.get("affect") or {}).get("emotion", "UNKNOWN")
     chunks = state.get("retrieved_chunks") or []
+    candidates = state.get("candidates") or []
     entry = {
         "run_id": run_id,
         "ts": time.time(),
         "user_id": state["user_id"],
         "turn_id": state["turn_id"],
+        "query": state["raw_query"],
         "llm_tier": state.get("llm_tier_used", "unknown"),
         "retrieval_mode": state.get("retrieval_mode_used", "unknown"),
         "affect": affect,
         ),
         "num_contextual": sum(1 for c in chunks if c.get("source") == "contextual"),
         "num_open_domain": sum(1 for c in chunks if c.get("source") == "open_domain"),
+        "num_prior_pick": sum(1 for c in chunks if c.get("source") == "prior_pick"),
         "latency": {
             "t_sensing": latency.get("t_sensing", 0.0),
             "t_intent": latency.get("t_intent", 0.0),
             "t_total": latency.get("t_total", 0.0),
         },
         "response": state.get("selected_response") or "",
+        "candidates": [dict(c) for c in candidates],
+        "n_candidates": len(candidates),
         "bucket_priors_after": bucket_priors_after,
         "type_priors_after": type_priors_after,
     }

backend/pipeline/nodes/planner.py CHANGED Viewed

@@ -1,12 +1,32 @@
 import time
 from backend.config.settings import settings
-from backend.generation.llm_client import active_model, chat_complete
 from backend.guardrails.checks import check_output
 from backend.pipeline.intent_kind import classify_intent_kind
-from backend.pipeline.state import PipelineState, StyleDirective
 from backend.sensing.labels import GESTURE_DIRECTIVES
 _PERSONA_MOD_INSTRUCTIONS = {
     "amplify_quirks": "Amplify your characteristic style and personality.",
     "suppress_humor": "Be direct and supportive. Suppress humor.",
@@ -30,6 +50,12 @@ _PERSONA_MOD_INSTRUCTIONS = {
         "read (if you said 'good', try 'not great') or honestly admit "
         "you're not sure how you feel right now. Do NOT invent details."
     ),
 }
@@ -41,7 +67,23 @@ def run_fallback(state: PipelineState) -> dict:
     return _run(state, tier="fallback")
-def _run(state: PipelineState, tier: str) -> dict:
     t0 = time.perf_counter()
     profile = state["persona_profile"]
@@ -57,31 +99,386 @@ def _run(state: PipelineState, tier: str) -> dict:
     rejected_response: str | None = None
     if turnaround_triggered:
         rejected_response = state.get("selected_response")
     intent_kind = classify_intent_kind(state.get("intent_route"))
-    messages = _build_messages(
-        profile,
-        chunks,
-        history,
-        state["raw_query"],
-        style,
-        gen_cfg,
-        gesture_tag=gesture_tag,
-        air_written_text=air_written_text,
-        rejected_response=rejected_response,
-        intent_kind=intent_kind,
-        affect=affect,
-    )
-    selected = chat_complete(
-        messages=messages,
-        max_tokens=gen_cfg.get("max_tokens", settings.max_tokens_neutral),
-        temperature=0.8,
-        tier=tier,
     )
-    guard = check_output(selected, chunks)
-    if not guard["passed"]:
-        selected = guard["fallback"]
     t_gen = time.perf_counter() - t0
     latency_log = dict(state.get("latency_log") or {})
@@ -94,15 +491,32 @@ def _run(state: PipelineState, tier: str) -> dict:
         4,
     )
-    augmented_prompt = "\n\n".join(f"[{m['role']}] {m['content']}" for m in messages)
     return {
         "augmented_prompt": augmented_prompt,
-        "candidates": [selected],
         "selected_response": selected,
         "llm_tier_used": tier,
         "llm_model_used": active_model(tier),
         "latency_log": latency_log,
-        "guardrail_passed": guard["passed"],
     }
@@ -128,6 +542,7 @@ def _build_messages(
     gesture_tag: str | None = None,
     air_written_text: str | None = None,
     rejected_response: str | None = None,
     intent_kind: str = "memory",
     affect: str = "NEUTRAL",
 ) -> list[dict]:
@@ -146,6 +561,7 @@ def _build_messages(
         air_written_text,
         profile["name"],
         rejected_response=rejected_response,
         intent_kind=intent_kind,
         affect=affect,
     )
@@ -206,12 +622,14 @@ def _build_user(
     persona_name: str,
     *,
     rejected_response: str | None = None,
     intent_kind: str = "memory",
     affect: str = "NEUTRAL",
 ) -> str:
     personal_chunks = [c for c in chunks if c.get("source", "personal") == "personal"]
     contextual_chunks = [c for c in chunks if c.get("source") == "contextual"]
     open_domain_chunks = [c for c in chunks if c.get("source") == "open_domain"]
     memory_block = (
         "\n".join(
@@ -220,6 +638,11 @@ def _build_user(
         )
         or "  (no memories retrieved)"
     )
     contextual_block = (
         "\n".join(f"  {c['text']}" for c in contextual_chunks)
         or "  (nothing relevant from this session)"
@@ -277,6 +700,15 @@ def _build_user(
             f"\nYour previous reply (which you need to replace, not repeat): "
             f'"{safe_rejected}"'
         )
     if intent_kind == "present_state":
         affect_hint = _AFFECT_HINTS.get(affect, _AFFECT_HINTS["NEUTRAL"])
@@ -299,8 +731,16 @@ Reply as {persona_name} in 1–2 sentences, first person.
 - If the affect read is NEUTRAL or doesn't match what you'd say, it's better to say "I'm not sure" or "honestly, I don't really know right now" than to invent.
 - Do NOT use autobiographical facts (job, family, hobbies) unless the partner asked."""
     return f"""\
-{directive_block}{air_writing_block}{turnaround_line}{persona_instruction_line}
 Personal memories:
 {memory_block}

+import concurrent.futures
+import queue
+import random
+import threading
 import time
+from collections.abc import Iterator
 from backend.config.settings import settings
+from backend.generation.llm_client import (
+    active_model,
+    chat_complete,
+    chat_complete_stream,
+    finalize_streamed,
+)
 from backend.guardrails.checks import check_output
 from backend.pipeline.intent_kind import classify_intent_kind
+from backend.pipeline.state import Candidate, PipelineState, StyleDirective
+from backend.retrieval import pick_index
 from backend.sensing.labels import GESTURE_DIRECTIVES
+# For present-state fan-out: three fixed emotional reads the persona can
+# project, so the user can pick among "good / fine / not great" rather than
+# three paraphrases of one mood.
+_PRESENT_STATE_STRATEGIES = [
+    ("present_good", "HAPPY"),
+    ("present_fine", "NEUTRAL"),
+    ("present_rough", "FRUSTRATED"),
+]
 _PERSONA_MOD_INSTRUCTIONS = {
     "amplify_quirks": "Amplify your characteristic style and personality.",
     "suppress_humor": "Be direct and supportive. Suppress humor.",
         "read (if you said 'good', try 'not great') or honestly admit "
         "you're not sure how you feel right now. Do NOT invent details."
     ),
+    "all_rejected": (
+        "The user rejected every option you gave last time. Try a "
+        "meaningfully different angle — different memory focus, different "
+        "emotional register, or admit you don't have a clean answer. Do "
+        "NOT re-use wording from the rejected options."
+    ),
 }
     return _run(state, tier="fallback")
+def run_primary_stream(state: PipelineState) -> Iterator[dict]:
+    """Token-level streaming variant of the planner.
+    Yields events as tokens arrive across all concurrent candidate streams:
+      {"type": "candidate_start", "idx": 0, "strategy": "broad", "grounded_buckets": [...]}
+      {"type": "token", "idx": 0, "delta": "Hello"}
+      {"type": "candidate_done", "idx": 0, "text": "Hello world."}
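+      {"type": "candidate_error", "idx": 0, "error": "..."}   (if a candidate's stream raises)
+      {"type": "complete", "planner_update": {"candidates": [...], "selected_response": "...", ...}}
+    A side-index hit, if any, is announced as the idx-0 candidate with strategy
+    "side_index"; its candidate_done is emitted before any tokens stream.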
+    """
+    yield from _run_stream(state, tier="primary")
+_STREAM_SENTINEL = object()
+def _run_stream(state: PipelineState, tier: str) -> Iterator[dict]:
     t0 = time.perf_counter()
     profile = state["persona_profile"]
     rejected_response: str | None = None
     if turnaround_triggered:
         rejected_response = state.get("selected_response")
+    rejected_candidates: list[str] = list(state.get("rejected_candidates") or [])
     intent_kind = classify_intent_kind(state.get("intent_route"))
+    max_tokens = gen_cfg.get("max_tokens", settings.max_tokens_neutral)
+    # Turnaround rephrases are single-shot; everything else fans out.
+    # Present-state varies affect (good/fine/rough), memory questions vary
+    # which chunks are primary (broad/focused/serendipitous).
+    single_shot = turnaround_triggered
+    is_present_state = intent_kind == "present_state"
+    if single_shot:
+        strategies: list[tuple[str, str | None]] = [("focused", None)]
+    elif is_present_state:
+        strategies = list(_PRESENT_STATE_STRATEGIES)
+    else:
+        strategies = [
+            ("broad", None),
+            ("focused", None),
+            ("serendipitous", None),
+        ]
+    # Higher temp on regenerate; also bump for present-state since three
+    # strategies share the same (empty) grounding and need sampling noise.
+    base_temp = 1.0 if (rejected_candidates or is_present_state) else 0.7
+    # Optional side-index hit — surface as an extra card right away, not generated.
+    side_index_candidate: Candidate | None = None
+    if not single_shot and not is_present_state:
+        try:
+            hit = pick_index.lookup(
+                query=state["raw_query"],
+                user_id=state["user_id"],
+                threshold=0.85,
+            )
+        except Exception as exc:
+            print(f"[planner] pick_index lookup failed: {exc!r}")
+            hit = None
+        if hit:
+            text = (hit.get("picked_text") or "").strip()
+            if text:
+                side_index_candidate = Candidate(
+                    text=text,
+                    strategy="side_index",
+                    grounded_buckets=[],
+                )
+    # Pre-announce each candidate slot so the UI can draw empty cards immediately.
+    cards: list[dict] = []
+    if side_index_candidate:
+        cards.append({"strategy": "side_index", "grounded_buckets": []})
+    for strategy_name, _affect_override in strategies:
+        if is_present_state:
+            card_buckets: list[str] = []
+        else:
+            strategy_chunks = _pick_strategy_chunks(list(chunks), strategy_name)
+            card_buckets = [c.get("bucket", "") for c in strategy_chunks]
+        cards.append(
+            {
+                "strategy": strategy_name,
+                "grounded_buckets": card_buckets,
+            }
+        )
+    for idx, card in enumerate(cards):
+        yield {
+            "type": "candidate_start",
+            "idx": idx,
+            "strategy": card["strategy"],
+            "grounded_buckets": card["grounded_buckets"],
+        }
+    if side_index_candidate is not None:
+        yield {
+            "type": "candidate_done",
+            "idx": 0,
+            "text": side_index_candidate["text"],
+        }
+    # Spawn a worker thread per strategy. Each one streams tokens into a shared
+    # queue; the generator forwards them as SSE events.
+    llm_cards_offset = 1 if side_index_candidate else 0
+    evt_queue: queue.Queue[dict | object] = queue.Queue()
+    completed: list[Candidate | None] = [None] * len(strategies)
+    completed_lock = threading.Lock()
+    def _worker(slot: int, strategy: str, affect_override: str | None) -> None:
+        if is_present_state:
+            strategy_chunks = []  # present-state has no memory grounding
+        else:
+            strategy_chunks = _pick_strategy_chunks(list(chunks), strategy)
+        effective_affect = affect_override if affect_override is not None else affect
+        messages = _build_messages(
+            profile,
+            strategy_chunks,
+            history,
+            state["raw_query"],
+            style,
+            gen_cfg,
+            gesture_tag=gesture_tag,
+            air_written_text=air_written_text,
+            rejected_response=rejected_response,
+            rejected_candidates=rejected_candidates,
+            intent_kind=intent_kind,
+            affect=effective_affect,
+        )
+        buf: list[str] = []
+        try:
+            for piece in chat_complete_stream(
+                messages=messages,
+                max_tokens=max_tokens,
+                temperature=base_temp,
+                tier=tier,
+            ):
+                buf.append(piece)
+                evt_queue.put(
+                    {
+                        "type": "token",
+                        "idx": llm_cards_offset + slot,
+                        "delta": piece,
+                    }
+                )
+        except Exception as exc:
+            evt_queue.put(
+                {
+                    "type": "candidate_error",
+                    "idx": llm_cards_offset + slot,
+                    "error": repr(exc),
+                }
+            )
+            with completed_lock:
+                completed[slot] = None
+            evt_queue.put(_STREAM_SENTINEL)
+            return
+        final = finalize_streamed("".join(buf))
+        guard = check_output(final, strategy_chunks)
+        if not guard["passed"]:
+            final = guard["fallback"]
+        cand = Candidate(
+            text=final,
+            strategy=strategy,
+            grounded_buckets=[c.get("bucket", "") for c in strategy_chunks],
+        )
+        with completed_lock:
+            completed[slot] = cand
+        evt_queue.put(
+            {
+                "type": "candidate_done",
+                "idx": llm_cards_offset + slot,
+                "text": final,
+            }
+        )
+        evt_queue.put(_STREAM_SENTINEL)
+    threads = [
+        threading.Thread(target=_worker, args=(i, s, a), daemon=True)
+        for i, (s, a) in enumerate(strategies)
+    ]
+    for t in threads:
+        t.start()
+    remaining = len(threads)
+    while remaining > 0:
+        evt = evt_queue.get()
+        if evt is _STREAM_SENTINEL:
+            remaining -= 1
+            continue
+        yield evt  # type: ignore[misc]
+    for t in threads:
+        t.join()
+    with completed_lock:
+        llm_cands = [c for c in completed if c is not None]
+    all_cands: list[Candidate] = []
+    if side_index_candidate is not None:
+        all_cands.append(side_index_candidate)
+    all_cands.extend(llm_cands)
+    # De-dupe against rejected + each other.
+    seen: set[str] = {r.strip().lower() for r in rejected_candidates if r}
+    uniq: list[Candidate] = []
+    for c in all_cands:
+        key = c["text"].strip().lower()
+        if key and key not in seen:
+            seen.add(key)
+            uniq.append(c)
+    if not uniq:
+        # Every candidate was a dup-of-rejected or guardrail-rejected. Surface
+        # a non-empty placeholder so the UI isn't showing a blank bubble and
+        # the user knows they can regenerate. Logged so we notice if this ever
+        # fires in practice — it means something upstream collapsed.
+        print(
+            f"[planner] empty-candidate fallback fired "
+            f"user={state.get('user_id')!r} turn_id={state.get('turn_id')} "
+            f"raw_query={state.get('raw_query', '')[:80]!r}"
+        )
+        uniq = all_cands[:1] or [
+            Candidate(
+                text="I'm not sure how to answer that — try asking in a different way.",
+                strategy="empty",
+                grounded_buckets=[],
+            )
+        ]
+    selected = uniq[0]["text"]
+    t_gen = time.perf_counter() - t0
+    latency_log = dict(state.get("latency_log") or {})
+    latency_log["t_generation"] = round(t_gen, 4)
+    latency_log["t_total"] = round(
+        latency_log.get("t_sensing", 0)
+        + latency_log.get("t_intent", 0)
+        + latency_log.get("t_retrieval", 0)
+        + t_gen,
+        4,
     )
+    yield {
+        "type": "complete",
+        "planner_update": {
+            "augmented_prompt": None,  # skipping for streaming — not worth rebuilding
+            "candidates": uniq,
+            "selected_response": selected,
+            "llm_tier_used": tier,
+            "llm_model_used": active_model(tier),
+            "latency_log": latency_log,
+            "guardrail_passed": True,
+        },
+    }
+def _pick_strategy_chunks(all_chunks: list[dict], strategy: str) -> list[dict]:
+    """Select which chunks become the *primary* grounding for a candidate.
+    Non-personal chunks (contextual, open_domain) always pass through —
+    they're small and query-grounded, not memory variation.
+    """
+    personal = [c for c in all_chunks if c.get("source", "personal") == "personal"]
+    others = [c for c in all_chunks if c.get("source", "personal") != "personal"]
+    if not personal:
+        return all_chunks
+    if strategy == "broad":
+        chosen = personal
+    elif strategy == "focused":
+        chosen = personal[:1]
+    elif strategy == "serendipitous":
+        if len(personal) >= 2:
+            pool = personal[1:]
+            k = min(len(pool), max(1, len(personal) - 1))
+            chosen = random.sample(pool, k)
+        else:
+            chosen = personal
+    else:
+        chosen = personal
+    return chosen + others
+def _run(state: PipelineState, tier: str) -> dict:
+    t0 = time.perf_counter()
+    profile = state["persona_profile"]
+    affect = (state.get("affect") or {}).get("emotion", "NEUTRAL")
+    gen_cfg = state.get("generation_config") or {}
+    chunks = state.get("retrieved_chunks") or []
+    history = (state.get("session_history") or [])[-20:]
+    style: StyleDirective = gen_cfg["style"]
+    gesture_tag = state.get("gesture_tag")
+    air_written_text = state.get("air_written_text")
+    turnaround_triggered = state.get("turnaround_triggered", False)
+    rejected_response: str | None = None
+    if turnaround_triggered:
+        rejected_response = state.get("selected_response")
+    rejected_candidates: list[str] = list(state.get("rejected_candidates") or [])
+    intent_kind = classify_intent_kind(state.get("intent_route"))
+    max_tokens = gen_cfg.get("max_tokens", settings.max_tokens_neutral)
+    # Turnaround rephrases are single-shot; everything else fans out. Present-
+    # state varies affect (good/fine/rough), memory questions vary chunks
+    # (broad/focused/serendipitous).
+    single_shot = turnaround_triggered
+    is_present_state = intent_kind == "present_state"
+    if single_shot:
+        strategies_cfg: list[tuple[str, str | None]] = [("focused", None)]
+    elif is_present_state:
+        strategies_cfg = list(_PRESENT_STATE_STRATEGIES)
+    else:
+        strategies_cfg = [
+            ("broad", None),
+            ("focused", None),
+            ("serendipitous", None),
+        ]
+    base_temp = 1.0 if (rejected_candidates or is_present_state) else 0.7
+    def _gen_one(cfg: tuple[str, str | None]) -> Candidate:
+        strategy, affect_override = cfg
+        if is_present_state:
+            strategy_chunks: list[dict] = []
+        else:
+            strategy_chunks = _pick_strategy_chunks(list(chunks), strategy)
+        effective_affect = affect_override if affect_override is not None else affect
+        messages = _build_messages(
+            profile,
+            strategy_chunks,
+            history,
+            state["raw_query"],
+            style,
+            gen_cfg,
+            gesture_tag=gesture_tag,
+            air_written_text=air_written_text,
+            rejected_response=rejected_response,
+            rejected_candidates=rejected_candidates,
+            intent_kind=intent_kind,
+            affect=effective_affect,
+        )
+        text = chat_complete(
+            messages=messages,
+            max_tokens=max_tokens,
+            temperature=base_temp,
+            tier=tier,
+        )
+        guard = check_output(text, strategy_chunks)
+        if not guard["passed"]:
+            text = guard["fallback"]
+        return Candidate(
+            text=text,
+            strategy=strategy,
+            grounded_buckets=[c.get("bucket", "") for c in strategy_chunks],
+        )
+    if len(strategies_cfg) == 1:
+        candidates = [_gen_one(strategies_cfg[0])]
+    else:
+        with concurrent.futures.ThreadPoolExecutor(
+            max_workers=len(strategies_cfg)
+        ) as pool:
+            candidates = list(pool.map(_gen_one, strategies_cfg))
+    # Side-index hit: if the user has picked a similar query before, surface the
+    # previously-picked text as an extra candidate. Not generated by the LLM;
+    # skipped on turnaround (single-shot) and present-state turns so those
+    # always produce fresh text.
+    if not single_shot and not is_present_state:
+        try:
+            hit = pick_index.lookup(
+                query=state["raw_query"],
+                user_id=state["user_id"],
+                threshold=0.85,
+            )
+        except Exception as exc:
+            print(f"[planner] pick_index lookup failed: {exc!r}")
+            hit = None
+        if hit:
+            text = hit.get("picked_text", "").strip()
+            if text and text.lower() not in {
+                c["text"].strip().lower() for c in candidates
+            }:
+                candidates.insert(
+                    0,
+                    Candidate(
+                        text=text,
+                        strategy="side_index",
+                        grounded_buckets=[],
+                    ),
+                )
+    # De-dupe by normalised text — if two strategies produced the same response,
+    # keep the first. Also exclude anything the user already rejected this turn.
+    # Don't retry; latency budget matters more than N=3 on the dot.
+    seen: set[str] = {r.strip().lower() for r in rejected_candidates if r}
+    uniq: list[Candidate] = []
+    for c in candidates:
+        key = c["text"].strip().lower()
+        if key and key not in seen:
+            seen.add(key)
+            uniq.append(c)
+    if not uniq:
+        uniq = candidates[:1]  # nothing survived de-dupe (all empty or already rejected): fall back to the first
+    selected = uniq[0]["text"]
     t_gen = time.perf_counter() - t0
     latency_log = dict(state.get("latency_log") or {})
         4,
     )
+    # Represent the default-candidate prompt in augmented_prompt for logging.
+    default_strategy_chunks = _pick_strategy_chunks(list(chunks), uniq[0]["strategy"])
+    default_messages = _build_messages(
+        profile,
+        default_strategy_chunks,
+        history,
+        state["raw_query"],
+        style,
+        gen_cfg,
+        gesture_tag=gesture_tag,
+        air_written_text=air_written_text,
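+        rejected_response=rejected_response,
+        rejected_candidates=rejected_candidates,  # keep the logged prompt in line with what _gen_one sent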
+        intent_kind=intent_kind,
+        affect=affect,
+    )
+    augmented_prompt = "\n\n".join(
+        f"[{m['role']}] {m['content']}" for m in default_messages
+    )
     return {
         "augmented_prompt": augmented_prompt,
+        "candidates": uniq,
         "selected_response": selected,
         "llm_tier_used": tier,
         "llm_model_used": active_model(tier),
         "latency_log": latency_log,
+        "guardrail_passed": True,
     }
     gesture_tag: str | None = None,
     air_written_text: str | None = None,
     rejected_response: str | None = None,
+    rejected_candidates: list[str] | None = None,
     intent_kind: str = "memory",
     affect: str = "NEUTRAL",
 ) -> list[dict]:
         air_written_text,
         profile["name"],
         rejected_response=rejected_response,
+        rejected_candidates=rejected_candidates,
         intent_kind=intent_kind,
         affect=affect,
     )
     persona_name: str,
     *,
     rejected_response: str | None = None,
+    rejected_candidates: list[str] | None = None,
     intent_kind: str = "memory",
     affect: str = "NEUTRAL",
 ) -> str:
     personal_chunks = [c for c in chunks if c.get("source", "personal") == "personal"]
     contextual_chunks = [c for c in chunks if c.get("source") == "contextual"]
     open_domain_chunks = [c for c in chunks if c.get("source") == "open_domain"]
+    prior_pick_chunks = [c for c in chunks if c.get("source") == "prior_pick"]
     memory_block = (
         "\n".join(
         )
         or "  (no memories retrieved)"
     )
+    prior_pick_block = (
+        "\n".join(f"  {c['text']}" for c in prior_pick_chunks)
+        if prior_pick_chunks
+        else ""
+    )
     contextual_block = (
         "\n".join(f"  {c['text']}" for c in contextual_chunks)
         or "  (nothing relevant from this session)"
             f"\nYour previous reply (which you need to replace, not repeat): "
             f'"{safe_rejected}"'
         )
+    if rejected_candidates:
+        safe_list = [
+            r.replace('"', "'").replace("\n", " ")[:300] for r in rejected_candidates
+        ][:10]
+        rejected_block = "\n".join(f'  - "{r}"' for r in safe_list)
+        turnaround_line += (
+            f"\nThe user rejected these options you gave last time "
+            f"(do NOT re-use their wording or angle):\n{rejected_block}"
+        )
     if intent_kind == "present_state":
         affect_hint = _AFFECT_HINTS.get(affect, _AFFECT_HINTS["NEUTRAL"])
 - If the affect read is NEUTRAL or doesn't match what you'd say, it's better to say "I'm not sure" or "honestly, I don't really know right now" than to invent.
 - Do NOT use autobiographical facts (job, family, hobbies) unless the partner asked."""
+    prior_pick_section = (
+        f"\n\nWhen asked this kind of thing before, you answered like:\n{prior_pick_block}\n"
+        "Treat this as your own prior voice — re-use the phrasing if it still fits, "
+        "or stay in the same register if you'd answer slightly differently now."
+        if prior_pick_block
+        else ""
+    )
     return f"""\
+{directive_block}{air_writing_block}{turnaround_line}{persona_instruction_line}{prior_pick_section}
 Personal memories:
 {memory_block}
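
The thread fan-in used by `_run_stream` can be sketched in isolation: each worker pushes token events into a shared queue and posts a sentinel when it finishes, and the generator drains the queue until it has seen one sentinel per worker. This is a simplified illustration under stated assumptions, not the planner itself; `fan_in` and its list-of-strings inputs are hypothetical stand-ins for the per-strategy LLM streams.

```python
import queue
import threading
from collections.abc import Iterator

_SENTINEL = object()  # marks "this worker is finished"


def fan_in(sources: list[list[str]]) -> Iterator[dict]:
    """Merge several token sources into one event stream, tagged by index."""
    q: queue.Queue = queue.Queue()

    def worker(idx: int, pieces: list[str]) -> None:
        try:
            for piece in pieces:
                q.put({"type": "token", "idx": idx, "delta": piece})
        finally:
            # Always post the sentinel, even if the loop above raises,
            # so the consumer below can never block forever.
            q.put(_SENTINEL)

    threads = [
        threading.Thread(target=worker, args=(i, p), daemon=True)
        for i, p in enumerate(sources)
    ]
    for t in threads:
        t.start()

    remaining = len(threads)
    while remaining > 0:
        evt = q.get()
        if evt is _SENTINEL:
            remaining -= 1
            continue
        yield evt

    for t in threads:
        t.join()


events = list(fan_in([["a", "b"], ["c"], ["d", "e", "f"]]))
texts: dict[int, str] = {}
for e in events:
    texts[e["idx"]] = texts.get(e["idx"], "") + e["delta"]
```

Events from different workers interleave in arbitrary order, but each worker's own tokens stay in order because a single thread enqueues them into a FIFO queue. The sketch posts the sentinel from a `finally` block; the worker in the diff instead posts it explicitly on its error and success paths, so an exception raised outside its `try` (for example during post-processing) would presumably leave the consumer waiting.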

backend/pipeline/nodes/retrieval.py CHANGED Viewed

@@ -8,10 +8,17 @@ import torch
 from backend.config.settings import settings
 from backend.pipeline.intent_kind import is_present_state_only
 from backend.pipeline.state import PipelineState, RetrievedChunk, SubIntent
 from backend.retrieval.contextual import retrieve_from_history
 from backend.retrieval.reranker import build_context_vector, mmr_rerank
 from backend.retrieval.vector_store import get_device, get_embedder, retrieve
 _OPEN_DOMAIN_STUB_TEXT = (
     "(no external knowledge source wired — answer from general knowledge)"
 )
@@ -22,9 +29,14 @@ def run_fast(state: PipelineState) -> dict:
     t0 = time.perf_counter()
     if is_present_state_only(state.get("intent_route")):
         return _build_return(state, [], "skipped_present_state", t0, 0.0)
     final_k = settings.retrieval_fast_k
     pool_k = settings.rerank_fast_pool_k
     chunks, t_rerank = _dispatch_all(state, pool_k=pool_k, final_k=final_k)
     return _build_return(state, chunks, "fast", t0, t_rerank)
@@ -33,12 +45,83 @@ def run_full(state: PipelineState) -> dict:
     t0 = time.perf_counter()
     if is_present_state_only(state.get("intent_route")):
         return _build_return(state, [], "skipped_present_state", t0, 0.0)
     final_k = settings.retrieval_rerank_k
     pool_k = settings.rerank_pool_k
     chunks, t_rerank = _dispatch_all(state, pool_k=pool_k, final_k=final_k)
     return _build_return(state, chunks, "full", t0, t_rerank)
 def _dispatch_all(
     state: PipelineState, pool_k: int, final_k: int
 ) -> tuple[list[RetrievedChunk], float]:

 from backend.config.settings import settings
 from backend.pipeline.intent_kind import is_present_state_only
 from backend.pipeline.state import PipelineState, RetrievedChunk, SubIntent
+from backend.retrieval import pick_index
 from backend.retrieval.contextual import retrieve_from_history
+from backend.retrieval.priors import BUCKETS
 from backend.retrieval.reranker import build_context_vector, mmr_rerank
 from backend.retrieval.vector_store import get_device, get_embedder, retrieve
+# Weight of the pick-history bucket prior, relative to the session bucket prior.
+# 0.3 means: a user who always picks "family" over "medical" gets a noticeable
+# but not overwhelming nudge — session-in-progress signals still dominate.
+_PICK_PRIOR_WEIGHT = 0.3
 _OPEN_DOMAIN_STUB_TEXT = (
     "(no external knowledge source wired — answer from general knowledge)"
 )
     t0 = time.perf_counter()
     if is_present_state_only(state.get("intent_route")):
         return _build_return(state, [], "skipped_present_state", t0, 0.0)
+    session_priors = state.get("bucket_priors")
+    _blend_pick_history_into_priors(state)
     final_k = settings.retrieval_fast_k
     pool_k = settings.rerank_fast_pool_k
     chunks, t_rerank = _dispatch_all(state, pool_k=pool_k, final_k=final_k)
+    if session_priors is not None:
+        state["bucket_priors"] = session_priors  # blend was transient
+    chunks = _prepend_prior_pick(state, chunks)
     return _build_return(state, chunks, "fast", t0, t_rerank)
     t0 = time.perf_counter()
     if is_present_state_only(state.get("intent_route")):
         return _build_return(state, [], "skipped_present_state", t0, 0.0)
+    session_priors = state.get("bucket_priors")
+    _blend_pick_history_into_priors(state)
     final_k = settings.retrieval_rerank_k
     pool_k = settings.rerank_pool_k
     chunks, t_rerank = _dispatch_all(state, pool_k=pool_k, final_k=final_k)
+    if session_priors is not None:
+        state["bucket_priors"] = session_priors  # blend was transient
+    chunks = _prepend_prior_pick(state, chunks)
     return _build_return(state, chunks, "full", t0, t_rerank)
+def _blend_pick_history_into_priors(state: PipelineState) -> None:
+    """Mix cumulative bucket-pick counts into this turn's bucket_priors.
+    Mutates state in-place. Session priors still dominate; pick history adds
+    a small, steady bias toward buckets the user has historically picked.
+    """
+    try:
+        counts = pick_index.bucket_pick_counts(state["user_id"])
+    except Exception as exc:
+        print(f"[retrieval] pick_index.bucket_pick_counts failed: {exc!r}")
+        return
+    if not counts:
+        return
+    total = sum(counts.values())
+    if total <= 0:
+        return
+    pick_dist = {b: counts.get(b, 0.0) / total for b in BUCKETS}
+    session_priors = state.get("bucket_priors") or {}
+    if not session_priors:
+        session_priors = {b: 1.0 / len(BUCKETS) for b in BUCKETS}
+    blended = {
+        b: (1 - _PICK_PRIOR_WEIGHT) * session_priors.get(b, 0.0)
+        + _PICK_PRIOR_WEIGHT * pick_dist[b]
+        for b in BUCKETS
+    }
+    s = sum(blended.values())
+    if s > 0:
+        blended = {b: v / s for b, v in blended.items()}
+    state["bucket_priors"] = blended
+def _prepend_prior_pick(
+    state: PipelineState, chunks: list[RetrievedChunk]
+) -> list[RetrievedChunk]:
+    """On a side-index hit, surface the previously-picked text as a special
+    chunk the LLM sees in its grounding block. Only an exact-text duplicate is
+    skipped; the prior pick is otherwise not deduped against personal chunks,
+    because it is phrased in the persona's voice and is useful even when
+    similar memories are present.
+    """
+    try:
+        hit = pick_index.lookup(
+            query=state["raw_query"], user_id=state["user_id"], threshold=0.85
+        )
+    except Exception as exc:
+        print(f"[retrieval] pick_index.lookup failed: {exc!r}")
+        return chunks
+    if not hit:
+        return chunks
+    text = (hit.get("picked_text") or "").strip()
+    if not text:
+        return chunks
+    # Avoid injecting an identical chunk twice (the side_index-strategy
+    # candidate in the planner handles that path separately).
+    if any(c.get("text") == text for c in chunks):
+        return chunks
+    prior = RetrievedChunk(
+        text=text,
+        bucket="prior_pick",
+        type="narrative",
+        user="",
+        score=float(hit.get("match_score", 0.0)),
+        source="prior_pick",
+    )
+    return [prior] + list(chunks)
 def _dispatch_all(
     state: PipelineState, pool_k: int, final_k: int
 ) -> tuple[list[RetrievedChunk], float]:
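
Note on the blend in `_blend_pick_history_into_priors`: the arithmetic is easier to check with concrete numbers. The sketch below mirrors the same steps (normalize pick counts, mix at weight 0.3, renormalize) as a pure function. The bucket names and the `blend` helper are hypothetical, chosen only for illustration; the real `BUCKETS` list lives in `backend.retrieval.priors` and is not shown in this diff.

```python
# Hypothetical bucket names; the real list comes from backend.retrieval.priors.
BUCKETS = ["family", "medical", "hobbies"]
PICK_PRIOR_WEIGHT = 0.3  # same value as _PICK_PRIOR_WEIGHT in the diff


def blend(session_priors, pick_counts, buckets=BUCKETS, weight=PICK_PRIOR_WEIGHT):
    """Return session priors mixed with the pick-history distribution."""
    session_priors = dict(session_priors or {})
    total = sum(pick_counts.values())
    if total <= 0:
        # No pick history: leave the session priors untouched.
        return session_priors
    pick_dist = {b: pick_counts.get(b, 0.0) / total for b in buckets}
    if not session_priors:
        # No session signal yet: start from a uniform prior.
        session_priors = {b: 1.0 / len(buckets) for b in buckets}
    blended = {
        b: (1 - weight) * session_priors.get(b, 0.0) + weight * pick_dist[b]
        for b in buckets
    }
    s = sum(blended.values())
    return {b: v / s for b, v in blended.items()} if s > 0 else blended


session = {"family": 0.5, "medical": 0.3, "hobbies": 0.2}
counts = {"family": 8.0, "medical": 2.0}
result = blend(session, counts)
# family:  0.7 * 0.5 + 0.3 * 0.8 = 0.59
# medical: 0.7 * 0.3 + 0.3 * 0.2 = 0.27
# hobbies: 0.7 * 0.2 + 0.3 * 0.0 = 0.14
```

With these inputs the session ordering is preserved and "family" gains about nine points, which matches the "noticeable but not overwhelming" intent stated next to `_PICK_PRIOR_WEIGHT`.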

backend/pipeline/state.py CHANGED

@@ -75,6 +75,13 @@ class LatencyLog(TypedDict):
 # ── Main pipeline state ────────────────────────────────────────────────────────
 class PipelineState(TypedDict):
     # ── Session context (set at turn start, stable across nodes) ──────────────
     user_id: str
@@ -103,7 +110,8 @@ class PipelineState(TypedDict):
     # ── L4: Generation outputs ────────────────────────────────────────────────
     augmented_prompt: str | None
-    candidates: list[str]  # 2-3 candidate responses
     selected_response: str | None
     llm_tier_used: str  # "primary" | "fallback"
     llm_model_used: str  # actual model name (e.g. "gemma4:31b-cloud")

 # ── Main pipeline state ────────────────────────────────────────────────────────
+class Candidate(TypedDict):
+    text: str
+    strategy: str  # "broad" | "focused" | "serendipitous" | "side_index"
+    # chunks fed as the primary grounding for this candidate (bucket/type only)
+    grounded_buckets: list[str]
 class PipelineState(TypedDict):
     # ── Session context (set at turn start, stable across nodes) ──────────────
     user_id: str
     # ── L4: Generation outputs ────────────────────────────────────────────────
     augmented_prompt: str | None
+    candidates: list[Candidate]  # 2-3 candidate responses w/ strategy metadata
+    rejected_candidates: list[str]  # texts the user already dismissed this turn
     selected_response: str | None
     llm_tier_used: str  # "primary" | "fallback"
     llm_model_used: str  # actual model name (e.g. "gemma4:31b-cloud")
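
Note on the new `Candidate` shape: the commit message says regenerated rounds mark prior options as rejected, but this diff does not show how `rejected_candidates` is consumed. The sketch below is one plausible use, assuming rejected texts are matched exactly after trimming whitespace. `drop_rejected` is a hypothetical helper, not a function from this repository.

```python
from typing import TypedDict


class Candidate(TypedDict):
    text: str
    strategy: str  # "broad" | "focused" | "serendipitous" | "side_index"
    grounded_buckets: list[str]


def drop_rejected(candidates: list[Candidate], rejected: list[str]) -> list[Candidate]:
    """Hypothetical helper: filter out candidates the user already dismissed."""
    seen = {t.strip() for t in rejected}
    return [c for c in candidates if c["text"].strip() not in seen]


candidates: list[Candidate] = [
    {"text": "I went to the lake.", "strategy": "broad", "grounded_buckets": ["family"]},
    {"text": "Feeling tired today.", "strategy": "focused", "grounded_buckets": ["medical"]},
]
kept = drop_rejected(candidates, ["I went to the lake. "])
```

Exact matching is the simplest policy; a near-duplicate check would need an embedding comparison, which is out of scope for this sketch.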

backend/retrieval/pick_index.py ADDED

@@ -0,0 +1,124 @@

+import json
+import threading
+import time
+from pathlib import Path
+import torch
+from backend.config.settings import settings
+from backend.retrieval.vector_store import get_device, get_embedder
+def _store_path(user_id: str) -> Path:
+    return settings.data_dir / "pick_index" / user_id
+# Guards the module-level cache. `add()` and `lookup()` can be called
+# concurrently from an SSE handler and the /chat/pick POST; without this,
+# a lookup that overlaps an add could see a partially-built tensor.
+_cache_lock = threading.RLock()
+_cache: dict[str, tuple[torch.Tensor, list[dict]]] = {}
+def _load(user_id: str) -> tuple[torch.Tensor, list[dict]]:
+    with _cache_lock:
+        if user_id in _cache:
+            return _cache[user_id]
+        p = _store_path(user_id)
+        if not (p / "vectors.pt").exists():
+            empty = torch.empty((0, 0), device=get_device())
+            _cache[user_id] = (empty, [])
+            return _cache[user_id]
+        vecs = torch.load(
+            p / "vectors.pt", map_location=get_device(), weights_only=True
+        )
+        with open(p / "entries.json") as f:
+            entries = json.load(f)
+        _cache[user_id] = (vecs, entries)
+        return _cache[user_id]
+def _persist(user_id: str, vecs: torch.Tensor, entries: list[dict]) -> None:
+    p = _store_path(user_id)
+    p.mkdir(parents=True, exist_ok=True)
+    torch.save(vecs.detach().cpu(), p / "vectors.pt")
+    with open(p / "entries.json", "w") as f:
+        json.dump(entries, f, indent=2)
+def lookup(query: str, user_id: str, threshold: float = 0.85) -> dict | None:
+    # Snapshot the (vecs, entries) tuple under the lock so a concurrent add()
+    # can't swap it out mid-search. Read-only work on the snapshot is safe.
+    with _cache_lock:
+        vecs, entries = _load(user_id)
+    if vecs.numel() == 0 or not entries:
+        return None
+    embedder = get_embedder()
+    q = embedder.encode(
+        [query],
+        convert_to_tensor=True,
+        normalize_embeddings=True,
+        device=get_device(),
+    )[0]
+    scores = vecs @ q
+    top_score, top_idx = torch.max(scores, dim=0)
+    score = float(top_score)
+    if score < threshold:
+        return None
+    hit = dict(entries[int(top_idx)])
+    hit["match_score"] = score
+    return hit
+def add(
+    query: str,
+    user_id: str,
+    strategy: str,
+    picked_text: str,
+    picked_buckets: list[str] | None = None,
+) -> None:
+    embedder = get_embedder()
+    q = embedder.encode(
+        [query],
+        convert_to_tensor=True,
+        normalize_embeddings=True,
+        device=get_device(),
+    )  # (1, D)
+    # The whole read-modify-write is locked so two concurrent adds can't
+    # both read the same `vecs`, each concat their own vector, and clobber
+    # each other on writeback.
+    with _cache_lock:
+        vecs, entries = _load(user_id)
+        new_vecs = q if vecs.numel() == 0 else torch.cat([vecs, q], dim=0)
+        new_entries = list(entries) + [
+            {
+                "query": query,
+                "strategy": strategy,
+                "picked_text": picked_text,
+                "picked_buckets": picked_buckets or [],
+                "ts": time.time(),
+            }
+        ]
+        _cache[user_id] = (new_vecs, new_entries)
+        _persist(user_id, new_vecs, new_entries)
+def bucket_pick_counts(user_id: str) -> dict[str, float]:
+    """Cumulative pick counts per bucket for this user.
+    Each pick contributes 1.0 mass split evenly across the buckets grounding
+    the picked candidate. Used by retrieval to bias bucket priors toward
+    memories the user has historically preferred.
+    """
+    with _cache_lock:
+        _, entries = _load(user_id)
+        entries_snapshot = list(entries)
+    counts: dict[str, float] = {}
+    for e in entries_snapshot:
+        buckets = [b for b in (e.get("picked_buckets") or []) if b]
+        if not buckets:
+            continue
+        share = 1.0 / len(buckets)
+        for b in buckets:
+            counts[b] = counts.get(b, 0.0) + share
+    return counts
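
Note on `lookup()`: because both the stored vectors and the query are L2-normalized, the `vecs @ q` product is cosine similarity, and the 0.85 threshold is a cosine cutoff. The sketch below reproduces that decision in plain Python so it can be checked without torch or an embedder. `normalize` and `lookup_sketch` are hypothetical names, and the two-dimensional vectors are toy values rather than real embeddings.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def lookup_sketch(query_vec, stored_vecs, entries, threshold=0.85):
    """Return the best-matching entry if its cosine score clears the threshold."""
    if not stored_vecs or not entries:
        return None
    q = normalize(query_vec)
    scores = [sum(a * b for a, b in zip(normalize(v), q)) for v in stored_vecs]
    top_idx = max(range(len(scores)), key=scores.__getitem__)
    if scores[top_idx] < threshold:
        return None
    hit = dict(entries[top_idx])  # copy so the stored entry is not mutated
    hit["match_score"] = scores[top_idx]
    return hit


entries = [{"picked_text": "a"}, {"picked_text": "b"}]
vecs = [[1.0, 0.0], [0.0, 1.0]]
near = lookup_sketch([0.9, 0.1], vecs, entries)  # cosine with vecs[0] is about 0.994
far = lookup_sketch([1.0, 1.0], vecs, entries)   # cosine is about 0.707 for both
```

A query halfway between two stored picks scores about 0.707 against each and falls under the cutoff, so no prior pick is surfaced in that case.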

frontend/src/App.css CHANGED

@@ -410,6 +410,110 @@ input[type="text"]:hover {
   color: #ffffff;
 }
 .turnaround-btn {
   background: transparent !important;
   color: var(--accent) !important;

   color: #ffffff;
 }
+.badge-picker {
+  background: rgba(0, 0, 0, 0.08);
+  color: var(--text);
+}
+.badge-picked {
+  background: rgba(46, 160, 67, 0.18);
+  color: #2ea043;
+}
+.chat-bubble.picker {
+  background: transparent;
+  padding: 4px 0 0 0;
+  max-width: 85%;
+}
+.candidate-list {
+  display: flex;
+  flex-direction: column;
+  gap: 6px;
+  margin-top: 6px;
+}
+.candidate-card {
+  text-align: left;
+  background: var(--surface);
+  color: var(--text);
+  border: 1px solid rgba(0, 0, 0, 0.12);
+  border-radius: 10px;
+  padding: 10px 12px;
+  cursor: pointer;
+  transition: border-color 120ms ease, background 120ms ease, transform 80ms ease;
+}
+.candidate-card:hover {
+  border-color: var(--accent);
+  background: rgba(59, 130, 246, 0.05);
+}
+.candidate-card:active {
+  transform: scale(0.995);
+}
+.candidate-strategy {
+  font-size: 11px;
+  color: rgba(0, 0, 0, 0.55);
+  margin-bottom: 4px;
+  text-transform: lowercase;
+  letter-spacing: 0.02em;
+}
+.candidate-text {
+  font-size: 14px;
+  line-height: 1.4;
+}
+.candidate-list.rejected-round {
+  opacity: 0.55;
+  margin-bottom: 4px;
+}
+.rejected-round-label {
+  font-size: 10px;
+  color: rgba(0, 0, 0, 0.45);
+  text-transform: uppercase;
+  letter-spacing: 0.05em;
+  margin-bottom: 2px;
+}
+.candidate-card.rejected {
+  cursor: default;
+  background: rgba(0, 0, 0, 0.03);
+  border-color: rgba(0, 0, 0, 0.08);
+}
+.candidate-card.rejected .candidate-text {
+  text-decoration: line-through;
+  color: rgba(0, 0, 0, 0.55);
+}
+.candidate-card.rejected:hover {
+  border-color: rgba(0, 0, 0, 0.08);
+  background: rgba(0, 0, 0, 0.03);
+}
+.candidate-card.try-again {
+  border-style: dashed;
+  border-color: var(--accent);
+  background: rgba(59, 130, 246, 0.03);
+}
+.candidate-card.try-again .candidate-strategy {
+  color: var(--accent);
+}
+.candidate-card.try-again:hover:not(:disabled) {
+  background: rgba(59, 130, 246, 0.09);
+}
+.candidate-card:disabled {
+  opacity: 0.5;
+  cursor: wait;
+}
 .turnaround-btn {
   background: transparent !important;
   color: var(--accent) !important;

frontend/src/components/ChatPanel.tsx CHANGED

@@ -1,8 +1,30 @@
 import { useState, useRef, useEffect, useCallback } from "react";
-import type { ChatMessage, SensingState, Affect, LatencyLog } from "../types";
-import { sendChat, sendTurnaround } from "../lib/api";
 import { EvalPanel } from "./EvalPanel";
 interface Props {
   userId: string | null;
   personaName: string;
@@ -18,6 +40,83 @@ interface Props {
 const TURNAROUND_WINDOW_MS = 5000;
 export function ChatPanel({
   userId,
   personaName,
@@ -33,6 +132,8 @@ export function ChatPanel({
   const [input, setInput] = useState("");
   const [loading, setLoading] = useState(false);
   const [turnaroundLoading, setTurnaroundLoading] = useState(false);
   const bottomRef = useRef<HTMLDivElement>(null);
   const lastResponseTsRef = useRef<number>(0);
   const lastTurnIdRef = useRef<number | null>(null);
@@ -76,7 +177,7 @@ export function ChatPanel({
           const next = [...prev];
           for (let i = next.length - 1; i >= 0; i--) {
             if (next[i].role === "aac_user" && !next[i].isTurnaround) {
-              next[i] = { ...next[i], rephrased: true };
               break;
             }
           }
@@ -89,6 +190,8 @@ export function ChatPanel({
             turnId: res.turn_id,
             evalScores: res.eval_scores ?? null,
             isTurnaround: true,
           });
           return next;
         });
@@ -123,6 +226,120 @@ export function ChatPanel({
     ]
   );
   useEffect(() => {
     if (
       sensing.headSignal !== "HEAD_NOD_DISSATISFIED" &&
@@ -130,6 +347,23 @@ export function ChatPanel({
     ) {
       return;
     }
     const targetTurnId = lastTurnIdRef.current;
     const eligible =
       targetTurnId !== null &&
@@ -145,63 +379,156 @@ export function ChatPanel({
     // detection fired, then clear it. (Instant clear made detection invisible.)
     const id = window.setTimeout(() => onHeadSignalConsumed(), 1500);
     return () => window.clearTimeout(id);
-  }, [sensing.headSignal, handleTurnaround, onHeadSignalConsumed]);
   async function handleSend() {
     if (!input.trim() || !userId || !backendReady || loading) return;
     const query = input.trim();
     setInput("");
-    setMessages((prev) => [...prev, { role: "partner", content: query }]);
     setLoading(true);
     const airText = sensing.airWrittenText || null;
-    try {
-      const res = await sendChat({
-        user_id: userId,
-        query,
-        affect_override: affectOverride ?? sensing.affect,
-        gesture_tag: sensing.gestureTag,
-        gaze_bucket: sensing.gazeBucket,
-        air_written_text: airText,
-        head_signal: sensing.headSignal,
-      });
-      lastTurnIdRef.current = res.turn_id;
-      setMessages((prev) => [
         ...prev,
         {
-          role: "aac_user",
-          content: res.response,
-          latency: res.latency,
-          affect: res.affect,
-          runId: res.run_id,
-          turnId: res.turn_id,
-          evalScores: res.eval_scores ?? null,
         },
-      ]);
-      onLatency(res.latency);
-      lastResponseTsRef.current = performance.now();
-    } catch (e) {
-      setMessages((prev) => [
-        ...prev,
         {
-          role: "aac_user",
-          content: `Error: ${e instanceof Error ? e.message : "request failed"}`,
         },
-      ]);
     } finally {
       if (airText) onAirTextConsumed();
       setLoading(false);
     }
   }
-  const canTurnaround =
-    !!userId &&
-    backendReady &&
-    !loading &&
-    !turnaroundLoading &&
-    lastTurnIdRef.current !== null;
   return (
     <div className="chat-panel">
@@ -209,39 +536,107 @@ export function ChatPanel({
         Talking as: {personaName || "select a persona"}
       </div>
       <div className="chat-messages">
-        {messages.map((msg, i) => (
-          <div
-            key={i}
-            className={`chat-bubble ${msg.role}${
-              msg.rephrased ? " rephrased" : ""
-            }${msg.isTurnaround ? " turnaround" : ""}`}
-          >
-            <span className="chat-role">
-              {msg.role === "partner" ? "Partner" : "AAC User"}
-              {msg.rephrased && (
-                <span className="badge badge-rephrased"> rephrased</span>
-              )}
-              {msg.isTurnaround && (
-                <span className="badge badge-turnaround"> ↻ turnaround</span>
               )}
-            </span>
-            <p>{msg.content}</p>
-            {msg.role === "aac_user" && msg.runId && userId && (
-              <EvalPanel
-                runId={msg.runId}
-                userId={userId}
-                latencyTotal={msg.latency?.t_total ?? 0}
-                evalScores={msg.evalScores ?? null}
-              />
-            )}
-          </div>
-        ))}
-        {loading && (
-          <div className="chat-bubble aac_user loading">
-            <span className="chat-role">AAC User</span>
-            <p>Generating...</p>
-          </div>
-        )}
         {turnaroundLoading && (
           <div className="chat-bubble aac_user loading">
             <span className="chat-role">AAC User</span>
@@ -262,15 +657,6 @@ export function ChatPanel({
         <button onClick={handleSend} disabled={!userId || loading || !backendReady || !input.trim()}>
           Send
         </button>
-        <button
-          type="button"
-          className="turnaround-btn"
-          onClick={() => handleTurnaround("manual")}
-          disabled={!canTurnaround}
-          title="Re-plan the last response (also triggered by a head shake / sharp nod)"
-        >
-          ↻ Not quite right
-        </button>
       </div>
     </div>
   );

 import { useState, useRef, useEffect, useCallback } from "react";
+import type {
+  Affect,
+  Candidate,
+  ChatMessage,
+  LatencyLog,
+  SensingState,
+} from "../types";
+import {
+  sendPick,
+  sendTurnaround,
+  streamChat,
+  streamRegenerate,
+} from "../lib/api";
 import { EvalPanel } from "./EvalPanel";
+const STRATEGY_LABELS: Record<string, string> = {
+  broad: "broad — all memories",
+  focused: "focused — top memory",
+  serendipitous: "serendipitous — other memory",
+  side_index: "like last time",
+  present_good: "feeling good",
+  present_fine: "doing okay",
+  present_rough: "not great",
+  pending: "",
+};
 interface Props {
   userId: string | null;
   personaName: string;
 const TURNAROUND_WINDOW_MS = 5000;
+// Batches token deltas per (msgIdx, candIdx) and flushes them in a single
+// setState call per animation frame. Streaming tokens at 30-60/s × 3 candidates
+// otherwise causes a rerender per token. Non-token events (start/done/complete)
+// flush the pending deltas first to preserve ordering.
+//
+// INVARIANT: keys are message indices into the messages[] array. Callers must
+// ensure no message is inserted *before* a streaming message for the duration
+// of its stream — appending to the end is fine, mid-list insert is not. Today
+// every path appends to the end; if that changes, switch to a stable message
+// id (e.g. the placeholder's runId or a freshly-minted uuid).
+function useTokenBatcher(
+  setMessages: React.Dispatch<React.SetStateAction<ChatMessage[]>>,
+) {
+  // Lazy-init refs to avoid allocating a fresh Map on every render.
+  const pending = useRef<Map<number, Map<number, string>> | null>(null);
+  if (pending.current === null) pending.current = new Map();
+  const rafId = useRef<number | null>(null);
+  const flush = useCallback(() => {
+    rafId.current = null;
+    const batch = pending.current;
+    if (!batch || batch.size === 0) return;
+    pending.current = new Map();
+    setMessages((prev) =>
+      prev.map((m, i) => {
+        const perCand = batch.get(i);
+        if (!perCand) return m;
+        const cands = [...(m.candidates ?? [])];
+        for (const [ci, delta] of perCand) {
+          if (cands[ci]) {
+            cands[ci] = { ...cands[ci], text: cands[ci].text + delta };
+          }
+        }
+        return { ...m, candidates: cands };
+      }),
+    );
+  }, [setMessages]);
+  const queueToken = useCallback(
+    (msgIdx: number, candIdx: number, delta: string) => {
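+      // After unmount the cleanup effect nulls `pending`; drop late tokens
+      // instead of dereferencing null from a still-running stream callback.
+      const batch = pending.current;
+      if (!batch) return;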
+      let perMsg = batch.get(msgIdx);
+      if (!perMsg) {
+        perMsg = new Map();
+        batch.set(msgIdx, perMsg);
+      }
+      perMsg.set(candIdx, (perMsg.get(candIdx) ?? "") + delta);
+      if (rafId.current === null) {
+        rafId.current = window.requestAnimationFrame(flush);
+      }
+    },
+    [flush],
+  );
+  const flushNow = useCallback(() => {
+    if (rafId.current !== null) {
+      window.cancelAnimationFrame(rafId.current);
+      rafId.current = null;
+    }
+    flush();
+  }, [flush]);
+  // Cancel any pending rAF on unmount — otherwise a persona switch mid-stream
+  // leaves a scheduled flush that calls setMessages against the new state.
+  useEffect(() => {
+    return () => {
+      if (rafId.current !== null) {
+        window.cancelAnimationFrame(rafId.current);
+        rafId.current = null;
+      }
+      pending.current = null;
+    };
+  }, []);
+  return { queueToken, flushNow };
+}
 export function ChatPanel({
   userId,
   personaName,
   const [input, setInput] = useState("");
   const [loading, setLoading] = useState(false);
   const [turnaroundLoading, setTurnaroundLoading] = useState(false);
+  const [regenerateLoading, setRegenerateLoading] = useState(false);
+  const { queueToken, flushNow } = useTokenBatcher(setMessages);
   const bottomRef = useRef<HTMLDivElement>(null);
   const lastResponseTsRef = useRef<number>(0);
   const lastTurnIdRef = useRef<number | null>(null);
           const next = [...prev];
           for (let i = next.length - 1; i >= 0; i--) {
             if (next[i].role === "aac_user" && !next[i].isTurnaround) {
+              next[i] = { ...next[i], rephrased: true, picked: true };
               break;
             }
           }
             turnId: res.turn_id,
             evalScores: res.eval_scores ?? null,
             isTurnaround: true,
+            candidates: res.candidates ?? [],
+            picked: true,
           });
           return next;
         });
     ]
   );
+  const handleRegenerate = useCallback(
+    async (msgIdx: number) => {
+      if (!userId || !backendReady || regenerateLoading || loading) return;
+      const msg = messages[msgIdx];
+      if (!msg || !msg.candidates || msg.picked || msg.turnId === undefined) return;
+      const currentRound = msg.candidates;
+      const priorRounds = msg.rejectedRounds ?? [];
+      const rejected_texts = [
+        ...priorRounds.flat().map((c) => c.text),
+        ...currentRound.map((c) => c.text),
+      ];
+      setRegenerateLoading(true);
+      // Move the current round into rejectedRounds + clear candidates so the
+      // UI shows empty-card placeholders while streams fill in.
+      setMessages((prev) =>
+        prev.map((m, i) =>
+          i === msgIdx
+            ? {
+                ...m,
+                candidates: [],
+                rejectedRounds: [...priorRounds, currentRound],
+                picked: false,
+              }
+            : m,
+        ),
+      );
+      const updateMsg = (
+        updater: (m: ChatMessage) => ChatMessage,
+      ) => {
+        setMessages((prev) =>
+          prev.map((m, i) => (i === msgIdx ? updater(m) : m)),
+        );
+      };
+      try {
+        await streamRegenerate(
+          {
+            user_id: userId,
+            turn_id: msg.turnId,
+            rejected_texts,
+          },
+          (evt) => {
+            if (evt.type === "token") {
+              queueToken(msgIdx, evt.idx, evt.delta);
+              return;
+            }
+            flushNow();
+            if (evt.type === "candidate_start") {
+              updateMsg((m) => {
+                const cands = [...(m.candidates ?? [])];
+                while (cands.length <= evt.idx) {
+                  cands.push({
+                    text: "",
+                    strategy: "pending",
+                    grounded_buckets: [],
+                  });
+                }
+                cands[evt.idx] = {
+                  text: "",
+                  strategy: evt.strategy,
+                  grounded_buckets: evt.grounded_buckets,
+                };
+                return { ...m, candidates: cands };
+              });
+            } else if (evt.type === "candidate_done") {
+              updateMsg((m) => {
+                const cands = [...(m.candidates ?? [])];
+                if (cands[evt.idx]) {
+                  cands[evt.idx] = { ...cands[evt.idx], text: evt.text };
+                }
+                return { ...m, candidates: cands };
+              });
+            } else if (evt.type === "complete") {
+              const res = evt.response;
+              lastTurnIdRef.current = res.turn_id;
+              updateMsg((m) => ({
+                ...m,
+                content: res.response,
+                latency: res.latency,
+                affect: res.affect,
+                runId: res.run_id,
+                turnId: res.turn_id,
+                evalScores: res.eval_scores ?? null,
+                candidates: res.candidates ?? m.candidates ?? [],
+                picked: false,
+              }));
+              onLatency(res.latency);
+            }
+          },
+        );
+      } catch (e) {
+        flushNow();
+        console.warn("streamRegenerate failed", e);
+      } finally {
+        setRegenerateLoading(false);
+      }
+    },
+    [
+      userId,
+      backendReady,
+      regenerateLoading,
+      loading,
+      messages,
+      setMessages,
+      queueToken,
+      flushNow,
+      onLatency,
+    ]
+  );
   useEffect(() => {
     if (
       sensing.headSignal !== "HEAD_NOD_DISSATISFIED" &&
     ) {
       return;
     }
+    // If the most recent AAC message has an open picker, head-signal means
+    // "regenerate" — the user hasn't committed, so there's nothing to
+    // "rephrase" yet.
+    let openPickerIdx = -1;
+    for (let i = messages.length - 1; i >= 0; i--) {
+      const m = messages[i];
+      if (m.role !== "aac_user") continue;
+      if (!m.picked && (m.candidates?.length ?? 0) > 1) openPickerIdx = i;
+      break;
+    }
+    if (openPickerIdx !== -1) {
+      handleRegenerate(openPickerIdx);
+      onHeadSignalConsumed();
+      return;
+    }
     const targetTurnId = lastTurnIdRef.current;
     const eligible =
       targetTurnId !== null &&
     // detection fired, then clear it. (Instant clear made detection invisible.)
     const id = window.setTimeout(() => onHeadSignalConsumed(), 1500);
     return () => window.clearTimeout(id);
+  }, [
+    sensing.headSignal,
+    handleTurnaround,
+    handleRegenerate,
+    onHeadSignalConsumed,
+    messages,
+  ]);
   async function handleSend() {
     if (!input.trim() || !userId || !backendReady || loading) return;
     const query = input.trim();
     setInput("");
     setLoading(true);
     const airText = sensing.airWrittenText || null;
+    // Push the partner bubble, and a placeholder AAC message we'll fill in
+    // progressively. We need the placeholder's index to target updates, so
+    // capture it in a local variable assigned inside the setter's updater
+    // rather than reading it from possibly stale state.
+    let placeholderIdx = -1;
+    setMessages((prev) => {
+      const next = [
         ...prev,
+        { role: "partner" as const, content: query },
         {
+          role: "aac_user" as const,
+          content: "",
+          candidates: [] as Candidate[],
+          picked: false,
         },
+      ];
+      placeholderIdx = next.length - 1;
+      return next;
+    });
+    const updatePlaceholder = (
+      updater: (m: ChatMessage) => ChatMessage,
+    ) => {
+      setMessages((prev) =>
+        prev.map((m, i) => (i === placeholderIdx ? updater(m) : m)),
+      );
+    };
+    try {
+      await streamChat(
         {
+          user_id: userId,
+          query,
+          affect_override: affectOverride ?? sensing.affect,
+          gesture_tag: sensing.gestureTag,
+          gaze_bucket: sensing.gazeBucket,
+          air_written_text: airText,
+          head_signal: sensing.headSignal,
         },
+        (evt) => {
+          if (evt.type === "token") {
+            queueToken(placeholderIdx, evt.idx, evt.delta);
+            return;
+          }
+          // Any non-token event must see the latest text — flush the queue first.
+          flushNow();
+          if (evt.type === "candidate_start") {
+            updatePlaceholder((m) => {
+              const cands = [...(m.candidates ?? [])];
+              while (cands.length <= evt.idx) {
+                cands.push({
+                  text: "",
+                  strategy: "pending",
+                  grounded_buckets: [],
+                });
+              }
+              cands[evt.idx] = {
+                text: "",
+                strategy: evt.strategy,
+                grounded_buckets: evt.grounded_buckets,
+              };
+              return { ...m, candidates: cands };
+            });
+          } else if (evt.type === "candidate_done") {
+            updatePlaceholder((m) => {
+              const cands = [...(m.candidates ?? [])];
+              if (cands[evt.idx]) {
+                cands[evt.idx] = { ...cands[evt.idx], text: evt.text };
+              }
+              return { ...m, candidates: cands };
+            });
+          } else if (evt.type === "complete") {
+            const res = evt.response;
+            lastTurnIdRef.current = res.turn_id;
+            updatePlaceholder((m) => ({
+              ...m,
+              content: res.response,
+              latency: res.latency,
+              affect: res.affect,
+              runId: res.run_id,
+              turnId: res.turn_id,
+              evalScores: res.eval_scores ?? null,
+              candidates: res.candidates ?? m.candidates ?? [],
+              picked: (res.candidates ?? []).length <= 1,
+            }));
+            onLatency(res.latency);
+            lastResponseTsRef.current = performance.now();
+          }
+        },
+      );
+    } catch (e) {
+      flushNow();
+      updatePlaceholder((m) => ({
+        ...m,
+        content: `Error: ${e instanceof Error ? e.message : "request failed"}`,
+      }));
     } finally {
       if (airText) onAirTextConsumed();
       setLoading(false);
     }
   }
+  const handlePick = useCallback(
+    async (msgIdx: number, candIdx: number) => {
+      const msg = messages[msgIdx];
+      if (!msg || !msg.candidates || !msg.runId || !userId) return;
+      if (msg.picked) return;
+      const picked = msg.candidates[candIdx];
+      if (!picked) return;
+      setMessages((prev) =>
+        prev.map((m, i) =>
+          i === msgIdx
+            ? {
+                ...m,
+                content: picked.text,
+                picked: true,
+                pickedIdx: candIdx,
+              }
+            : m
+        )
+      );
+      try {
+        await sendPick({
+          run_id: msg.runId,
+          user_id: userId,
+          picked_idx: candIdx,
+        });
+      } catch (e) {
+        console.warn("sendPick failed", e);
+      }
+    },
+    [messages, setMessages, userId]
+  );
   return (
     <div className="chat-panel">
         Talking as: {personaName || "select a persona"}
       </div>
       <div className="chat-messages">
+        {messages.map((msg, i) => {
+          const hasRegenerated = (msg.rejectedRounds?.length ?? 0) > 0;
+          const showPicker =
+            msg.role === "aac_user" &&
+            !msg.picked &&
+            !!msg.candidates &&
+            (msg.candidates.length > 1 || hasRegenerated);
+          if (showPicker) {
+            const priorRounds = msg.rejectedRounds ?? [];
+            return (
+              <div key={i} className="chat-bubble aac_user picker">
+                <span className="chat-role">
+                  AAC User
+                  <span className="badge badge-picker">
+                    pick one ({msg.candidates!.length} options)
+                  </span>
+                </span>
+                {priorRounds.map((round, ri) => (
+                  <div key={`r${ri}`} className="candidate-list rejected-round">
+                    <div className="rejected-round-label">
+                      rejected round {ri + 1}
+                    </div>
+                    {round.map((cand, ci) => (
+                      <div key={ci} className="candidate-card rejected">
+                        <div className="candidate-strategy">
+                          {STRATEGY_LABELS[cand.strategy] ?? cand.strategy}
+                        </div>
+                        <div className="candidate-text">{cand.text}</div>
+                      </div>
+                    ))}
+                  </div>
+                ))}
+                <div className="candidate-list">
+                  {msg.candidates!.map((cand, ci) => (
+                    <button
+                      key={ci}
+                      type="button"
+                      className="candidate-card"
+                      onClick={() => handlePick(i, ci)}
+                      disabled={regenerateLoading}
+                      title="Click to send this one"
+                    >
+                      <div className="candidate-strategy">
+                        {STRATEGY_LABELS[cand.strategy] ?? cand.strategy}
+                      </div>
+                      <div className="candidate-text">{cand.text}</div>
+                    </button>
+                  ))}
+                  <button
+                    type="button"
+                    className="candidate-card try-again"
+                    onClick={() => handleRegenerate(i)}
+                    disabled={regenerateLoading}
+                    title="None of these fit — generate fresh options"
+                  >
+                    <div className="candidate-strategy">try again</div>
+                    <div className="candidate-text">
+                      {regenerateLoading
+                        ? "Regenerating…"
+                        : "↻ None of these fit — try different angles"}
+                    </div>
+                  </button>
+                </div>
+              </div>
+            );
+          }
+          return (
+            <div
+              key={i}
+              className={`chat-bubble ${msg.role}${
+                msg.rephrased ? " rephrased" : ""
+              }${msg.isTurnaround ? " turnaround" : ""}`}
+            >
+              <span className="chat-role">
+                {msg.role === "partner" ? "Partner" : "AAC User"}
+                {msg.rephrased && (
+                  <span className="badge badge-rephrased"> rephrased</span>
+                )}
+                {msg.isTurnaround && (
+                  <span className="badge badge-turnaround"> ↻ turnaround</span>
+                )}
+                {msg.picked && msg.pickedIdx !== undefined && msg.candidates && msg.candidates[msg.pickedIdx] && (
+                  <span className="badge badge-picked">
+                    ✓ {STRATEGY_LABELS[msg.candidates[msg.pickedIdx].strategy] ?? msg.candidates[msg.pickedIdx].strategy}
+                  </span>
+                )}
+              </span>
+              <p>{msg.content}</p>
+              {msg.role === "aac_user" && msg.runId && userId && (
+                <EvalPanel
+                  runId={msg.runId}
+                  userId={userId}
+                  latencyTotal={msg.latency?.t_total ?? 0}
+                  evalScores={msg.evalScores ?? null}
+                />
               )}
+            </div>
+          );
+        })}
         {turnaroundLoading && (
           <div className="chat-bubble aac_user loading">
             <span className="chat-role">AAC User</span>
         <button onClick={handleSend} disabled={!userId || loading || !backendReady || !input.trim()}>
           Send
         </button>
       </div>
     </div>
   );
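
Review note: the picker gating (`showPicker`) and the optimistic update inside `handlePick` can be read as two pure functions. The sketch below is not code from this commit. `PickerMessage`, `shouldShowPicker`, and `applyPick` are hypothetical names, and the message shape is a trimmed-down assumption based on the fields the diff touches.

```typescript
// Hypothetical, trimmed-down shapes; the real ChatMessage has more fields.
interface Candidate {
  text: string;
  strategy: string;
  grounded_buckets: string[];
}

interface PickerMessage {
  role: "partner" | "aac_user";
  content: string;
  candidates?: Candidate[];
  picked?: boolean;
  pickedIdx?: number;
  rejectedRounds?: Candidate[][];
}

// Mirrors the showPicker condition in the diff: an unpicked AAC-user message
// shows the picker when it has several candidates, or when it has already
// been regenerated at least once (even if only one candidate came back).
function shouldShowPicker(msg: PickerMessage): boolean {
  const hasRegenerated = (msg.rejectedRounds?.length ?? 0) > 0;
  return (
    msg.role === "aac_user" &&
    !msg.picked &&
    !!msg.candidates &&
    (msg.candidates.length > 1 || hasRegenerated)
  );
}

// Pure version of the optimistic update in handlePick: returns a new array
// and leaves the input untouched. Out-of-range or repeat picks are no-ops
// that return the same array reference.
function applyPick(
  messages: PickerMessage[],
  msgIdx: number,
  candIdx: number,
): PickerMessage[] {
  const msg = messages[msgIdx];
  const cand = msg?.candidates?.[candIdx];
  if (!msg || !cand || msg.picked) return messages;
  return messages.map((m, i) =>
    i === msgIdx
      ? { ...m, content: cand.text, picked: true, pickedIdx: candIdx }
      : m,
  );
}
```

Keeping the update pure would make the "pick is locked after the first click" rule testable without rendering the component; whether that split is worth it for this panel is a judgment call.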

frontend/src/lib/api.ts CHANGED

@@ -44,6 +44,102 @@ export async function resetSession(userId: string): Promise<void> {
   if (!res.ok) throw new Error(`API error: ${res.status}`);
 }
 export async function submitRating(args: {
   run_id: string;
   user_id: string;

   if (!res.ok) throw new Error(`API error: ${res.status}`);
 }
+export type StreamEvent =
+  | { type: "candidate_start"; idx: number; strategy: string; grounded_buckets: string[] }
+  | { type: "token"; idx: number; delta: string }
+  | { type: "candidate_done"; idx: number; text: string }
+  | { type: "candidate_error"; idx: number; error: string }
+  | { type: "complete"; response: ChatResponse }
+  | { type: "error"; message: string };
+async function readSSE(
+  res: Response,
+  onEvent: (evt: StreamEvent) => void,
+): Promise<void> {
+  if (!res.body) throw new Error("no response body");
+  const reader = res.body.getReader();
+  const decoder = new TextDecoder();
+  let buffer = "";
+  const emitFrame = (frame: string) => {
+    const line = frame.split("\n").find((l) => l.startsWith("data:"));
+    if (!line) return;
+    const json = line.slice(5).trim();
+    if (!json) return;
+    try {
+      onEvent(JSON.parse(json) as StreamEvent);
+    } catch (e) {
+      console.warn("SSE parse failed", e, json.slice(0, 200));
+    }
+  };
+  while (true) {
+    const { done, value } = await reader.read();
+    if (done) break;
+    buffer += decoder.decode(value, { stream: true });
+    // SSE frames are separated by blank lines.
+    const parts = buffer.split("\n\n");
+    buffer = parts.pop() ?? "";
+    for (const part of parts) emitFrame(part);
+  }
+  // Server closed cleanly but the final frame didn't end with \n\n —
+  // emit whatever remains so the terminal event isn't dropped.
+  if (buffer.trim()) emitFrame(buffer);
+}
+export async function streamChat(
+  req: ChatRequest,
+  onEvent: (evt: StreamEvent) => void,
+): Promise<void> {
+  const res = await fetch(`${API_BASE}/chat/stream`, {
+    method: "POST",
+    headers: { "Content-Type": "application/json" },
+    body: JSON.stringify(req),
+  });
+  if (!res.ok) throw new Error(`API error: ${res.status}`);
+  await readSSE(res, onEvent);
+}
+export async function streamRegenerate(
+  args: { user_id: string; turn_id: number; rejected_texts: string[] },
+  onEvent: (evt: StreamEvent) => void,
+): Promise<void> {
+  const res = await fetch(`${API_BASE}/chat/regenerate/stream`, {
+    method: "POST",
+    headers: { "Content-Type": "application/json" },
+    body: JSON.stringify(args),
+  });
+  if (!res.ok) throw new Error(`API error: ${res.status}`);
+  await readSSE(res, onEvent);
+}
+export async function sendRegenerate(args: {
+  user_id: string;
+  turn_id: number;
+  rejected_texts: string[];
+}): Promise<ChatResponse> {
+  const res = await fetch(`${API_BASE}/chat/regenerate`, {
+    method: "POST",
+    headers: { "Content-Type": "application/json" },
+    body: JSON.stringify(args),
+  });
+  if (!res.ok) throw new Error(`API error: ${res.status}`);
+  return res.json();
+}
+export async function sendPick(args: {
+  run_id: string;
+  user_id: string;
+  picked_idx: number;
+}): Promise<void> {
+  const res = await fetch(`${API_BASE}/chat/pick`, {
+    method: "POST",
+    headers: { "Content-Type": "application/json" },
+    body: JSON.stringify(args),
+  });
+  if (!res.ok) throw new Error(`API error: ${res.status}`);
+}
 export async function submitRating(args: {
   run_id: string;
   user_id: string;
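
Review note: `emitFrame` in `readSSE` reads only the first `data:` line of each frame and splits frames on `"\n\n"`. That is sufficient if the backend always sends one single-line JSON payload per frame with LF line endings, which is an assumption about the server and not something this diff shows. The SSE format also allows several `data:` lines per event (joined with newlines) and CRLF line endings. A minimal sketch of a parser that tolerates both follows; `splitFrames` and `frameData` are hypothetical helper names, not part of this commit.

```typescript
// Split buffered SSE text into complete frames plus the unfinished remainder.
// CRLF is normalised to LF first so CRLF-delimited streams also split on the
// blank line. A lone trailing "\r" stays in `rest` and is normalised once the
// matching "\n" arrives in the next chunk.
function splitFrames(buffer: string): { frames: string[]; rest: string } {
  const parts = buffer.replace(/\r\n/g, "\n").split("\n\n");
  const rest = parts.pop() ?? "";
  return { frames: parts, rest };
}

// Collect every `data:` line in a frame and join them with "\n".
// One optional space after the colon is stripped. Returns null when the
// frame carries no data (for example a ":" comment used as a heartbeat).
function frameData(frame: string): string | null {
  const lines = frame
    .split("\n")
    .filter((l) => l.startsWith("data:"))
    .map((l) => {
      const v = l.slice(5);
      return v.startsWith(" ") ? v.slice(1) : v;
    });
  return lines.length > 0 ? lines.join("\n") : null;
}
```

In a read loop, the caller would append each decoded chunk to `rest`, call `splitFrames`, and pass each non-null `frameData` result to `JSON.parse`.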

frontend/src/types.ts CHANGED

@@ -66,10 +66,23 @@ export interface EvalScores {
   gaze_alignment: number;
 }
 export interface ChatResponse {
   user_id: string;
   query: string;
   response: string;
   affect: string;
   llm_tier: string;
   retrieval_mode: string;
@@ -90,4 +103,10 @@ export interface ChatMessage {
   rephrased?: boolean;
   isTurnaround?: boolean;
   evalScores?: EvalScores | null;
 }

   gaze_alignment: number;
 }
+export type CandidateStrategy =
+  | "broad"
+  | "focused"
+  | "serendipitous"
+  | "side_index";
+export interface Candidate {
+  text: string;
+  strategy: CandidateStrategy | string;
+  grounded_buckets: string[];
+}
 export interface ChatResponse {
   user_id: string;
   query: string;
   response: string;
+  candidates: Candidate[];
   affect: string;
   llm_tier: string;
   retrieval_mode: string;
@@ -90,4 +103,10 @@ export interface ChatMessage {
   rephrased?: boolean;
   isTurnaround?: boolean;
   evalScores?: EvalScores | null;
+  candidates?: Candidate[];
+  // picked becomes true after the user clicks one — also locks in `content` to the picked text
+  picked?: boolean;
+  pickedIdx?: number;
+  // Candidates from prior regeneration rounds — rendered struck-through above the active picker
+  rejectedRounds?: Candidate[][];
 }
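
Review note: the `Candidate` shape and the `StreamEvent` union in `api.ts` suggest a simple fold from stream events into per-candidate drafts. The sketch below is one way to write that fold and is not taken from `ChatPanel.tsx`. `CandidateEvent` and `applyEvent` are hypothetical names. It assumes that the `text` on `candidate_done` is authoritative and replaces the accumulated tokens, and that tokens for a candidate that has not started are ignored.

```typescript
interface Candidate {
  text: string;
  strategy: string;
  grounded_buckets: string[];
}

// Subset of the StreamEvent union from api.ts; the error and complete
// events are left out to keep the sketch short.
type CandidateEvent =
  | { type: "candidate_start"; idx: number; strategy: string; grounded_buckets: string[] }
  | { type: "token"; idx: number; delta: string }
  | { type: "candidate_done"; idx: number; text: string };

type Drafts = (Candidate | undefined)[];

// Fold one event into a list of in-progress candidates, indexed by idx.
// Always returns a fresh array so it can be used directly as React state.
function applyEvent(drafts: readonly (Candidate | undefined)[], evt: CandidateEvent): Drafts {
  const next: Drafts = [...drafts];
  switch (evt.type) {
    case "candidate_start":
      next[evt.idx] = {
        text: "",
        strategy: evt.strategy,
        grounded_buckets: evt.grounded_buckets,
      };
      break;
    case "token": {
      const cur = next[evt.idx];
      // Assumption: a token without a preceding candidate_start is dropped.
      if (cur) next[evt.idx] = { ...cur, text: cur.text + evt.delta };
      break;
    }
    case "candidate_done": {
      const cur = next[evt.idx];
      // Assumption: the final text overrides whatever tokens accumulated.
      if (cur) next[evt.idx] = { ...cur, text: evt.text };
      break;
    }
  }
  return next;
}
```

Because candidates stream concurrently, events for different `idx` values can interleave; indexing by `idx` keeps each draft independent of arrival order.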