Spaces:
Sleeping
Streaming candidate picker + side-index feedback loops
Browse files- Planner fans out 3 candidates with distinct grounding strategies
(broad/focused/serendipitous for memory; good/fine/rough for
present-state). Streams tokens via SSE through /chat/stream and
/chat/regenerate/stream.
- Picker UI: stackable cards per candidate, "try again" regenerates
with prior options marked rejected. Head-shake during open picker
triggers regenerate; post-pick still triggers turnaround.
- Side-index at data/pick_index/<uid>/ stores (query → picked text +
buckets). Feeds back into generation as a prior_pick retrieved
chunk and blends into bucket_priors at weight 0.3 (transient).
- Concurrency: RLock on pick_index cache, Lock on planner completed[].
rAF-batched token rendering. SSE reader drains trailing buffer.
Empty-candidate fallback surfaces actionable text + logs.
- .gitignore +1 -0
- README.md +10 -6
- backend/api/main.py +407 -1
- backend/generation/llm_client.py +51 -0
- backend/main.py +1 -0
- backend/pipeline/graph.py +16 -0
- backend/pipeline/nodes/feedback.py +5 -0
- backend/pipeline/nodes/planner.py +468 -28
- backend/pipeline/nodes/retrieval.py +83 -0
- backend/pipeline/state.py +9 -1
- backend/retrieval/pick_index.py +124 -0
- frontend/src/App.css +104 -0
- frontend/src/components/ChatPanel.tsx +467 -81
- frontend/src/lib/api.ts +96 -0
- frontend/src/types.ts +19 -0
|
@@ -18,6 +18,7 @@ env/
|
|
| 18 |
|
| 19 |
# Data — indexes are rebuilt from source; do NOT commit binaries
|
| 20 |
data/vector_store/
|
|
|
|
| 21 |
|
| 22 |
# Per-turn JSONL logs (contain user conversation content)
|
| 23 |
logs/
|
|
|
|
| 18 |
|
| 19 |
# Data — indexes are rebuilt from source; do NOT commit binaries
|
| 20 |
data/vector_store/
|
| 21 |
+
data/pick_index/
|
| 22 |
|
| 23 |
# Per-turn JSONL logs (contain user conversation content)
|
| 24 |
logs/
|
|
@@ -386,7 +386,6 @@ Heads up: all camera/sensing stuff is in the frontend (MediaPipe JS). Backend ju
|
|
| 386 |
### Dataset
|
| 387 |
|
| 388 |
- [x] **[Core]** Memories carry three chunk types per persona — `narrative`, `social_post`, `chat_log` — each with a `bucket` label. Type is preserved through the vector-store metadata and feeds the P(type) session prior.
|
| 389 |
-
- [ ] **[Core]** Write down the data schema somewhere so evals can reuse it
|
| 390 |
|
| 391 |
### Sensing (frontend)
|
| 392 |
|
|
@@ -427,10 +426,10 @@ Heads up: all camera/sensing stuff is in the frontend (MediaPipe JS). Backend ju
|
|
| 427 |
|
| 428 |
### Generation
|
| 429 |
|
| 430 |
-
- [
|
| 431 |
-
- [
|
| 432 |
-
- [
|
| 433 |
-
- [x] LLM temperature bumped from 0.4 → 0.8
|
| 434 |
|
| 435 |
### Evals
|
| 436 |
|
|
@@ -450,10 +449,15 @@ Scoring runs synchronously on the `/chat` response path and the `eval_scores` di
|
|
| 450 |
- [x] **[Eval]** Multimodal alignment — affect scored by positive/negative lexicon overlap vs. target sentiment, gesture by opener-phrase regex (THUMBS_UP/THUMBS_DOWN/WAVING), gaze by fraction of retrieved chunks matching the looked-at bucket
|
| 451 |
- [x] **[Eval]** Authenticity — per-turn stars under each assistant bubble, POST to `/feedback/rating`, logged with `run_id + rater_id`
|
| 452 |
- [ ] **[Eval]** For the live in-class eval: figure out the actual session — who rates (partners + experts per spec), how many turns each, what gets shown to them. The Likert form is the easy part; the protocol isn't written down anywhere
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 453 |
|
| 454 |
### Cleanup
|
| 455 |
|
| 456 |
-
- [ ] move the affect → `StyleDirective` config (`_AFFECT_CONFIG` in [intent.py](backend/pipeline/nodes/intent.py)) and the gesture directives ([labels.py](backend/sensing/labels.py)) out of code into a yaml
|
| 457 |
- [x] delete `backend/sensing/` (dead code, sensing is in frontend) — done, only `labels.py` remains
|
| 458 |
- [x] per-persona affect overrides (`_PERSONA_TONE_OVERRIDES`) deleted — redundant with `stylistic_preferences` in the new persona JSONs
|
| 459 |
|
|
|
|
| 386 |
### Dataset
|
| 387 |
|
| 388 |
- [x] **[Core]** Memories carry three chunk types per persona — `narrative`, `social_post`, `chat_log` — each with a `bucket` label. Type is preserved through the vector-store metadata and feeds the P(type) session prior.
|
|
|
|
| 389 |
|
| 390 |
### Sensing (frontend)
|
| 391 |
|
|
|
|
| 426 |
|
| 427 |
### Generation
|
| 428 |
|
| 429 |
+
- [x] **[Core]** API returns 3 candidates (plus an optional side-index hit) on `/chat` — see `candidates` in [backend/api/main.py](backend/api/main.py) `ChatResponse`. Planner fans out three grounding strategies in parallel threads and dedupes identical outputs: **broad** (all retrieved personal chunks), **focused** (top chunk only), and **serendipitous** (random non-top chunks) — see `_pick_strategy_chunks` in [backend/pipeline/nodes/planner.py](backend/pipeline/nodes/planner.py). Turnaround/present-state retries skip the fan-out and regenerate a single response.
|
| 430 |
+
- [x] **[Core]** Frontend picker shows stacked candidate cards with a strategy label under each; click to commit, which strikes the rest, locks the AAC bubble to the chosen text, and fires `POST /chat/pick`. One-candidate responses render as a normal bubble. See `handlePick` + `.candidate-list` in [frontend/src/components/ChatPanel.tsx](frontend/src/components/ChatPanel.tsx).
|
| 431 |
+
- [x] **[Bonus]** Side-index at `data/pick_index/<uid>/` stores `(query embedding → picked text, strategy, picked_buckets)` after every pick. Two feedback loops into generation: (1) the retrieval node injects the previously-picked text as a `source: "prior_pick"` chunk rendered in a "you answered like this before" block — the three LLM candidates all see it and riff on it; (2) retrieval blends cumulative `bucket_pick_counts` into this turn's `bucket_priors` at weight 0.3 (transient — doesn't persist across turns), so users who historically pick family memories bias retrieval toward family without overriding the session prior. The raw picked text is also still surfaced as a standalone `side_index` candidate. See [backend/retrieval/pick_index.py](backend/retrieval/pick_index.py), `_blend_pick_history_into_priors` + `_prepend_prior_pick` in [backend/pipeline/nodes/retrieval.py](backend/pipeline/nodes/retrieval.py), and the prior-pick block in `_build_user` in [backend/pipeline/nodes/planner.py](backend/pipeline/nodes/planner.py).
|
| 432 |
+
- [x] LLM temperature bumped from 0.4 → 0.8, then pulled back to 0.7 once chunk-variation became the primary diversity axis. With three different grounding strategies feeding three parallel calls, sampling noise matters less than which memories are in the context window.
|
| 433 |
|
| 434 |
### Evals
|
| 435 |
|
|
|
|
| 449 |
- [x] **[Eval]** Multimodal alignment — affect scored by positive/negative lexicon overlap vs. target sentiment, gesture by opener-phrase regex (THUMBS_UP/THUMBS_DOWN/WAVING), gaze by fraction of retrieved chunks matching the looked-at bucket
|
| 450 |
- [x] **[Eval]** Authenticity — per-turn stars under each assistant bubble, POST to `/feedback/rating`, logged with `run_id + rater_id`
|
| 451 |
- [ ] **[Eval]** For the live in-class eval: figure out the actual session — who rates (partners + experts per spec), how many turns each, what gets shown to them. The Likert form is the easy part; the protocol isn't written down anywhere
|
| 452 |
+
- [ ] **[Eval]** Relevance score — one NLI call per turn asking "does the response address the partner's query?" Fills the biggest current gap: a perfectly grounded but off-topic reply scores 100% grounded today and we'd never catch it
|
| 453 |
+
- [ ] **[Eval]** Candidate diversity — mean pairwise cosine distance among the 3 candidates in a picker round. Low diversity = picker showing three paraphrases of the same answer (the "aloha" problem), which is a signal that retrieval or temperature needs tuning for that query
|
| 454 |
+
- [ ] **[Eval]** Picker-aware metrics from `turns.jsonl` + `picks.jsonl`: which strategy wins most (`broad` vs `focused` vs `serendipitous` vs `side_index`), pick rate (% of turns where user clicked a card), regenerate rate (% of turns where user clicked "try again"). All computable offline, no runtime cost
|
| 455 |
+
- [ ] **[Eval]** Score alternate candidates too, not just the selected one. Right now `compute_evals` only scores `selected_response`; scoring all 3 would let us measure whether the picker actually improves quality over taking candidate 0 blindly
|
| 456 |
+
- [ ] **[Eval]** UI coverage gap: `compute_evals` returns 12 fields but `EvalPanel` renders only 5 pills (latency, grounded, affect, gesture, gaze). Hallucination rate, overall multimodal_alignment, SLO target/margin are computed and logged but never surfaced in the bubble. Decide what belongs as a pill vs a tooltip-only number vs an offline-only log field
|
| 457 |
+
- [ ] **[Eval]** On pill hover, tooltip should explain *how* the number was computed, not just what it means. Today the `title` attributes say "Groundedness: fraction of response sentences supported by retrieved memories" — which is the definition but not the math. Want: "5/8 sentences had NLI entailment prob ≥ 0.5 against the retrieved chunks" for groundedness; "3/4 positive-lexicon words matched HAPPY target" for affect; raw scorer inputs + thresholds exposed inline so the number isn't a black box
|
| 458 |
|
| 459 |
### Cleanup
|
| 460 |
|
|
|
|
| 461 |
- [x] delete `backend/sensing/` (dead code, sensing is in frontend) — done, only `labels.py` remains
|
| 462 |
- [x] per-persona affect overrides (`_PERSONA_TONE_OVERRIDES`) deleted — redundant with `stylistic_preferences` in the new persona JSONs
|
| 463 |
|
|
@@ -9,6 +9,7 @@ from pathlib import Path
|
|
| 9 |
|
| 10 |
from fastapi import FastAPI, HTTPException
|
| 11 |
from fastapi.middleware.cors import CORSMiddleware
|
|
|
|
| 12 |
from pydantic import BaseModel, Field
|
| 13 |
|
| 14 |
from backend.config.settings import settings
|
|
@@ -18,11 +19,12 @@ from backend.generation.llm_client import ( # active_model used by /debug/confi
|
|
| 18 |
get_client,
|
| 19 |
)
|
| 20 |
from backend.guardrails.checks import check_input
|
| 21 |
-
from backend.pipeline.graph import run_pipeline
|
| 22 |
from backend.pipeline.intent_kind import classify_intent_kind
|
| 23 |
from backend.pipeline.nodes import feedback as feedback_node
|
| 24 |
from backend.pipeline.nodes import planner as planner_node
|
| 25 |
from backend.pipeline.state import PipelineState
|
|
|
|
| 26 |
from backend.retrieval.priors import BUCKETS, CHUNK_TYPES, uniform
|
| 27 |
from backend.retrieval.vector_store import _get_embedder, retrieve
|
| 28 |
|
|
@@ -83,10 +85,17 @@ class TurnaroundRequest(BaseModel):
|
|
| 83 |
head_signal: str | None = None
|
| 84 |
|
| 85 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 86 |
class ChatResponse(BaseModel):
|
| 87 |
user_id: str
|
| 88 |
query: str
|
| 89 |
response: str
|
|
|
|
| 90 |
affect: str
|
| 91 |
llm_tier: str
|
| 92 |
llm_model: str
|
|
@@ -98,6 +107,18 @@ class ChatResponse(BaseModel):
|
|
| 98 |
eval_scores: dict | None = None
|
| 99 |
|
| 100 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 101 |
class RatingRequest(BaseModel):
|
| 102 |
run_id: str = Field(min_length=1, max_length=64, pattern=_ID_PATTERN)
|
| 103 |
user_id: str = Field(min_length=1, max_length=64, pattern=_ID_PATTERN)
|
|
@@ -170,6 +191,7 @@ def _build_initial_state(req: ChatRequest, session: dict) -> PipelineState:
|
|
| 170 |
retrieval_mode_used="",
|
| 171 |
augmented_prompt=None,
|
| 172 |
candidates=[],
|
|
|
|
| 173 |
selected_response=None,
|
| 174 |
llm_tier_used="",
|
| 175 |
llm_model_used="",
|
|
@@ -330,6 +352,7 @@ def chat(req: ChatRequest):
|
|
| 330 |
user_id=req.user_id,
|
| 331 |
query=req.query,
|
| 332 |
response=guard["fallback"],
|
|
|
|
| 333 |
affect="NEUTRAL",
|
| 334 |
llm_tier="none",
|
| 335 |
llm_model="none",
|
|
@@ -368,6 +391,7 @@ def chat(req: ChatRequest):
|
|
| 368 |
user_id=req.user_id,
|
| 369 |
query=req.query,
|
| 370 |
response=result["selected_response"] or "",
|
|
|
|
| 371 |
affect=affect_emotion,
|
| 372 |
llm_tier=result.get("llm_tier_used", "unknown"),
|
| 373 |
llm_model=result.get("llm_model_used", "unknown"),
|
|
@@ -380,6 +404,105 @@ def chat(req: ChatRequest):
|
|
| 380 |
)
|
| 381 |
|
| 382 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 383 |
@app.post("/chat/turnaround", response_model=ChatResponse)
|
| 384 |
def chat_turnaround(req: TurnaroundRequest):
|
| 385 |
if req.user_id not in _sessions:
|
|
@@ -470,6 +593,289 @@ def chat_turnaround(req: TurnaroundRequest):
|
|
| 470 |
user_id=req.user_id,
|
| 471 |
query=replan_state["raw_query"],
|
| 472 |
response=replan_state["selected_response"] or "",
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 473 |
affect=affect_emotion,
|
| 474 |
llm_tier=replan_state.get("llm_tier_used", "unknown"),
|
| 475 |
llm_model=replan_state.get("llm_model_used", "unknown"),
|
|
|
|
| 9 |
|
| 10 |
from fastapi import FastAPI, HTTPException
|
| 11 |
from fastapi.middleware.cors import CORSMiddleware
|
| 12 |
+
from fastapi.responses import StreamingResponse
|
| 13 |
from pydantic import BaseModel, Field
|
| 14 |
|
| 15 |
from backend.config.settings import settings
|
|
|
|
| 19 |
get_client,
|
| 20 |
)
|
| 21 |
from backend.guardrails.checks import check_input
|
| 22 |
+
from backend.pipeline.graph import choose_planner_tier, run_pipeline, run_until_planner
|
| 23 |
from backend.pipeline.intent_kind import classify_intent_kind
|
| 24 |
from backend.pipeline.nodes import feedback as feedback_node
|
| 25 |
from backend.pipeline.nodes import planner as planner_node
|
| 26 |
from backend.pipeline.state import PipelineState
|
| 27 |
+
from backend.retrieval import pick_index
|
| 28 |
from backend.retrieval.priors import BUCKETS, CHUNK_TYPES, uniform
|
| 29 |
from backend.retrieval.vector_store import _get_embedder, retrieve
|
| 30 |
|
|
|
|
| 85 |
head_signal: str | None = None
|
| 86 |
|
| 87 |
|
| 88 |
+
class CandidateOut(BaseModel):
|
| 89 |
+
text: str
|
| 90 |
+
strategy: str
|
| 91 |
+
grounded_buckets: list[str] = []
|
| 92 |
+
|
| 93 |
+
|
| 94 |
class ChatResponse(BaseModel):
|
| 95 |
user_id: str
|
| 96 |
query: str
|
| 97 |
response: str
|
| 98 |
+
candidates: list[CandidateOut] = []
|
| 99 |
affect: str
|
| 100 |
llm_tier: str
|
| 101 |
llm_model: str
|
|
|
|
| 107 |
eval_scores: dict | None = None
|
| 108 |
|
| 109 |
|
| 110 |
+
class PickRequest(BaseModel):
|
| 111 |
+
run_id: str = Field(min_length=1, max_length=64, pattern=_ID_PATTERN)
|
| 112 |
+
user_id: str = Field(min_length=1, max_length=64, pattern=_ID_PATTERN)
|
| 113 |
+
picked_idx: int = Field(ge=0, le=10)
|
| 114 |
+
|
| 115 |
+
|
| 116 |
+
class RegenerateRequest(BaseModel):
|
| 117 |
+
user_id: str
|
| 118 |
+
turn_id: int | None = None
|
| 119 |
+
rejected_texts: list[str] = Field(default_factory=list, max_length=20)
|
| 120 |
+
|
| 121 |
+
|
| 122 |
class RatingRequest(BaseModel):
|
| 123 |
run_id: str = Field(min_length=1, max_length=64, pattern=_ID_PATTERN)
|
| 124 |
user_id: str = Field(min_length=1, max_length=64, pattern=_ID_PATTERN)
|
|
|
|
| 191 |
retrieval_mode_used="",
|
| 192 |
augmented_prompt=None,
|
| 193 |
candidates=[],
|
| 194 |
+
rejected_candidates=[],
|
| 195 |
selected_response=None,
|
| 196 |
llm_tier_used="",
|
| 197 |
llm_model_used="",
|
|
|
|
| 352 |
user_id=req.user_id,
|
| 353 |
query=req.query,
|
| 354 |
response=guard["fallback"],
|
| 355 |
+
candidates=[],
|
| 356 |
affect="NEUTRAL",
|
| 357 |
llm_tier="none",
|
| 358 |
llm_model="none",
|
|
|
|
| 391 |
user_id=req.user_id,
|
| 392 |
query=req.query,
|
| 393 |
response=result["selected_response"] or "",
|
| 394 |
+
candidates=[CandidateOut(**c) for c in result.get("candidates") or []],
|
| 395 |
affect=affect_emotion,
|
| 396 |
llm_tier=result.get("llm_tier_used", "unknown"),
|
| 397 |
llm_model=result.get("llm_model_used", "unknown"),
|
|
|
|
| 404 |
)
|
| 405 |
|
| 406 |
|
| 407 |
+
@app.post("/chat/stream")
|
| 408 |
+
def chat_stream(req: ChatRequest):
|
| 409 |
+
"""Server-Sent Events version of /chat. Runs intent + retrieval synchronously,
|
| 410 |
+
then streams planner candidate tokens as they arrive. Final event carries the
|
| 411 |
+
full ChatResponse-shaped payload.
|
| 412 |
+
"""
|
| 413 |
+
guard = check_input(req.query)
|
| 414 |
+
if not guard["allowed"]:
|
| 415 |
+
# Mirror the non-stream /chat early-exit.
|
| 416 |
+
payload = {
|
| 417 |
+
"user_id": req.user_id,
|
| 418 |
+
"query": req.query,
|
| 419 |
+
"response": guard["fallback"],
|
| 420 |
+
"candidates": [],
|
| 421 |
+
"affect": "NEUTRAL",
|
| 422 |
+
"llm_tier": "none",
|
| 423 |
+
"llm_model": "none",
|
| 424 |
+
"retrieval_mode": "none",
|
| 425 |
+
"latency": {},
|
| 426 |
+
"guardrail_passed": False,
|
| 427 |
+
"turn_id": 0,
|
| 428 |
+
"run_id": None,
|
| 429 |
+
"eval_scores": None,
|
| 430 |
+
}
|
| 431 |
+
|
| 432 |
+
def _one_event():
|
| 433 |
+
yield _sse({"type": "complete", "response": payload})
|
| 434 |
+
|
| 435 |
+
return StreamingResponse(_one_event(), media_type="text/event-stream")
|
| 436 |
+
|
| 437 |
+
session = _get_or_init_session(req.user_id)
|
| 438 |
+
initial_state = _build_initial_state(req, session)
|
| 439 |
+
|
| 440 |
+
def _gen():
|
| 441 |
+
state = run_until_planner(initial_state)
|
| 442 |
+
tier = choose_planner_tier(state)
|
| 443 |
+
|
| 444 |
+
completion: dict | None = None
|
| 445 |
+
for evt in planner_node._run_stream(state, tier=tier):
|
| 446 |
+
if evt["type"] == "complete":
|
| 447 |
+
completion = evt["planner_update"]
|
| 448 |
+
break
|
| 449 |
+
yield _sse(evt)
|
| 450 |
+
|
| 451 |
+
if completion is None:
|
| 452 |
+
yield _sse({"type": "error", "message": "planner produced no completion"})
|
| 453 |
+
return
|
| 454 |
+
|
| 455 |
+
state.update(completion) # type: ignore[typeddict-item]
|
| 456 |
+
state.update(feedback_node.run(state)) # type: ignore[typeddict-item]
|
| 457 |
+
|
| 458 |
+
session["session_history"] = state["session_history"]
|
| 459 |
+
session["bucket_priors"] = state["bucket_priors"]
|
| 460 |
+
session["type_priors"] = state["type_priors"]
|
| 461 |
+
session["last_state"] = state
|
| 462 |
+
|
| 463 |
+
affect_emotion = (state.get("affect") or {}).get("emotion", "NEUTRAL")
|
| 464 |
+
run_id = state.get("run_id")
|
| 465 |
+
|
| 466 |
+
eval_scores = _compute_and_persist_evals(
|
| 467 |
+
run_id=run_id,
|
| 468 |
+
user_id=req.user_id,
|
| 469 |
+
turn_id=state["turn_id"],
|
| 470 |
+
response=state["selected_response"] or "",
|
| 471 |
+
chunks=list(state.get("retrieved_chunks") or []),
|
| 472 |
+
latency_log=dict(state.get("latency_log") or {}),
|
| 473 |
+
affect=affect_emotion,
|
| 474 |
+
gesture_tag=req.gesture_tag,
|
| 475 |
+
gaze_bucket=req.gaze_bucket,
|
| 476 |
+
)
|
| 477 |
+
|
| 478 |
+
final = {
|
| 479 |
+
"user_id": req.user_id,
|
| 480 |
+
"query": req.query,
|
| 481 |
+
"response": state["selected_response"] or "",
|
| 482 |
+
"candidates": [dict(c) for c in state.get("candidates") or []],
|
| 483 |
+
"affect": affect_emotion,
|
| 484 |
+
"llm_tier": state.get("llm_tier_used", "unknown"),
|
| 485 |
+
"llm_model": state.get("llm_model_used", "unknown"),
|
| 486 |
+
"retrieval_mode": state.get("retrieval_mode_used", "unknown"),
|
| 487 |
+
"latency": state.get("latency_log") or {},
|
| 488 |
+
"guardrail_passed": state.get("guardrail_passed", True),
|
| 489 |
+
"run_id": run_id,
|
| 490 |
+
"turn_id": state["turn_id"],
|
| 491 |
+
"eval_scores": eval_scores,
|
| 492 |
+
}
|
| 493 |
+
yield _sse({"type": "complete", "response": final})
|
| 494 |
+
|
| 495 |
+
return StreamingResponse(
|
| 496 |
+
_gen(),
|
| 497 |
+
media_type="text/event-stream",
|
| 498 |
+
headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
|
| 499 |
+
)
|
| 500 |
+
|
| 501 |
+
|
| 502 |
+
def _sse(data: dict) -> str:
|
| 503 |
+
return f"data: {json.dumps(data, ensure_ascii=False)}\n\n"
|
| 504 |
+
|
| 505 |
+
|
| 506 |
@app.post("/chat/turnaround", response_model=ChatResponse)
|
| 507 |
def chat_turnaround(req: TurnaroundRequest):
|
| 508 |
if req.user_id not in _sessions:
|
|
|
|
| 593 |
user_id=req.user_id,
|
| 594 |
query=replan_state["raw_query"],
|
| 595 |
response=replan_state["selected_response"] or "",
|
| 596 |
+
candidates=[CandidateOut(**c) for c in replan_state.get("candidates") or []],
|
| 597 |
+
affect=affect_emotion,
|
| 598 |
+
llm_tier=replan_state.get("llm_tier_used", "unknown"),
|
| 599 |
+
llm_model=replan_state.get("llm_model_used", "unknown"),
|
| 600 |
+
retrieval_mode=replan_state.get("retrieval_mode_used", "unknown"),
|
| 601 |
+
latency=replan_state.get("latency_log") or {},
|
| 602 |
+
guardrail_passed=replan_state.get("guardrail_passed", True),
|
| 603 |
+
run_id=run_id,
|
| 604 |
+
turn_id=replan_state["turn_id"],
|
| 605 |
+
eval_scores=eval_scores,
|
| 606 |
+
)
|
| 607 |
+
|
| 608 |
+
|
| 609 |
+
def _find_turn_from_jsonl(run_id: str) -> dict | None:
|
| 610 |
+
"""Scan turns.jsonl from the end for a matching run_id. Used as fallback
|
| 611 |
+
when the session's last_state has already moved on."""
|
| 612 |
+
path = Path(settings.logs_dir) / "turns.jsonl"
|
| 613 |
+
if not path.exists():
|
| 614 |
+
return None
|
| 615 |
+
try:
|
| 616 |
+
with open(path, encoding="utf-8") as f:
|
| 617 |
+
lines = f.readlines()
|
| 618 |
+
except OSError:
|
| 619 |
+
return None
|
| 620 |
+
for line in reversed(lines[-500:]): # bounded tail scan
|
| 621 |
+
try:
|
| 622 |
+
row = json.loads(line)
|
| 623 |
+
except json.JSONDecodeError:
|
| 624 |
+
continue
|
| 625 |
+
if row.get("run_id") == run_id:
|
| 626 |
+
return row
|
| 627 |
+
return None
|
| 628 |
+
|
| 629 |
+
|
| 630 |
+
@app.post("/chat/pick")
|
| 631 |
+
def pick_candidate(req: PickRequest):
|
| 632 |
+
if not _RUN_ID_RE.match(req.run_id):
|
| 633 |
+
raise HTTPException(status_code=400, detail="invalid run_id")
|
| 634 |
+
|
| 635 |
+
session = _sessions.get(req.user_id) or {}
|
| 636 |
+
last = session.get("last_state") or {}
|
| 637 |
+
candidates = last.get("candidates") or []
|
| 638 |
+
query_text = last.get("raw_query") or ""
|
| 639 |
+
|
| 640 |
+
# Fallback: last_state already advanced past this run_id — read from JSONL
|
| 641 |
+
if last.get("run_id") != req.run_id or not candidates:
|
| 642 |
+
row = _find_turn_from_jsonl(req.run_id)
|
| 643 |
+
if not row:
|
| 644 |
+
raise HTTPException(status_code=404, detail="turn not found")
|
| 645 |
+
candidates = row.get("candidates") or []
|
| 646 |
+
query_text = row.get("query") or query_text
|
| 647 |
+
|
| 648 |
+
if req.picked_idx >= len(candidates):
|
| 649 |
+
raise HTTPException(status_code=400, detail="picked_idx out of range")
|
| 650 |
+
|
| 651 |
+
picked = candidates[req.picked_idx]
|
| 652 |
+
picked_text = picked.get("text", "")
|
| 653 |
+
strategy = picked.get("strategy", "unknown")
|
| 654 |
+
picked_buckets = [
|
| 655 |
+
b for b in (picked.get("grounded_buckets") or []) if b and b != "open_domain"
|
| 656 |
+
]
|
| 657 |
+
|
| 658 |
+
if query_text and picked_text:
|
| 659 |
+
try:
|
| 660 |
+
pick_index.add(
|
| 661 |
+
query=query_text,
|
| 662 |
+
user_id=req.user_id,
|
| 663 |
+
strategy=strategy,
|
| 664 |
+
picked_text=picked_text,
|
| 665 |
+
picked_buckets=picked_buckets,
|
| 666 |
+
)
|
| 667 |
+
except Exception as exc:
|
| 668 |
+
_log.warning("pick_index add failed: %r", exc)
|
| 669 |
+
|
| 670 |
+
logs_dir = Path(settings.logs_dir)
|
| 671 |
+
logs_dir.mkdir(parents=True, exist_ok=True)
|
| 672 |
+
entry = {
|
| 673 |
+
"ts": time.time(),
|
| 674 |
+
"run_id": req.run_id,
|
| 675 |
+
"user_id": req.user_id,
|
| 676 |
+
"picked_idx": req.picked_idx,
|
| 677 |
+
"strategy": strategy,
|
| 678 |
+
"picked_text": picked_text,
|
| 679 |
+
"query": query_text,
|
| 680 |
+
}
|
| 681 |
+
with open(logs_dir / "picks.jsonl", "a", encoding="utf-8") as f:
|
| 682 |
+
f.write(json.dumps(entry, ensure_ascii=False) + "\n")
|
| 683 |
+
|
| 684 |
+
return {"status": "ok", "strategy": strategy}
|
| 685 |
+
|
| 686 |
+
|
| 687 |
+
@app.post("/chat/regenerate/stream")
|
| 688 |
+
def chat_regenerate_stream(req: RegenerateRequest):
|
| 689 |
+
"""Streaming regenerate — same as /chat/stream but reuses last_state and
|
| 690 |
+
marks all prior candidates as rejected."""
|
| 691 |
+
if req.user_id not in _sessions:
|
| 692 |
+
raise HTTPException(status_code=404, detail="no active session")
|
| 693 |
+
session = _sessions[req.user_id]
|
| 694 |
+
last: PipelineState | None = session.get("last_state")
|
| 695 |
+
if last is None:
|
| 696 |
+
raise HTTPException(status_code=409, detail="no prior turn to regenerate")
|
| 697 |
+
if req.turn_id is not None and req.turn_id != last["turn_id"]:
|
| 698 |
+
raise HTTPException(status_code=409, detail="stale turn_id")
|
| 699 |
+
|
| 700 |
+
gen_cfg = dict(last.get("generation_config") or {})
|
| 701 |
+
gen_cfg["persona_mod"] = "all_rejected"
|
| 702 |
+
gen_cfg.setdefault("tone_tag", "[TONE:TRY_DIFFERENT_ANGLE]")
|
| 703 |
+
|
| 704 |
+
prior_rejected = [c.get("text", "") for c in (last.get("candidates") or [])]
|
| 705 |
+
merged = (
|
| 706 |
+
list(last.get("rejected_candidates") or [])
|
| 707 |
+
+ [t for t in prior_rejected if t]
|
| 708 |
+
+ [t for t in req.rejected_texts if t]
|
| 709 |
+
)
|
| 710 |
+
seen: set[str] = set()
|
| 711 |
+
rejected: list[str] = []
|
| 712 |
+
for t in merged:
|
| 713 |
+
key = t.strip().lower()
|
| 714 |
+
if key and key not in seen:
|
| 715 |
+
seen.add(key)
|
| 716 |
+
rejected.append(t)
|
| 717 |
+
|
| 718 |
+
trimmed_history = list(last.get("session_history") or [])
|
| 719 |
+
if trimmed_history and trimmed_history[-1].get("role") == "aac_user":
|
| 720 |
+
trimmed_history.pop()
|
| 721 |
+
if trimmed_history and trimmed_history[-1].get("role") == "partner":
|
| 722 |
+
trimmed_history.pop()
|
| 723 |
+
|
| 724 |
+
replan_state: PipelineState = dict(last) # type: ignore[assignment]
|
| 725 |
+
replan_state["session_history"] = trimmed_history
|
| 726 |
+
replan_state["generation_config"] = gen_cfg
|
| 727 |
+
replan_state["rejected_candidates"] = rejected
|
| 728 |
+
replan_state["turnaround_triggered"] = False
|
| 729 |
+
replan_state["latency_log"] = {
|
| 730 |
+
"t_sensing": 0.0,
|
| 731 |
+
"t_intent": 0.0,
|
| 732 |
+
"t_retrieval": 0.0,
|
| 733 |
+
"t_generation": 0.0,
|
| 734 |
+
"t_total": 0.0,
|
| 735 |
+
}
|
| 736 |
+
|
| 737 |
+
def _gen():
|
| 738 |
+
completion: dict | None = None
|
| 739 |
+
for evt in planner_node._run_stream(replan_state, tier="primary"):
|
| 740 |
+
if evt["type"] == "complete":
|
| 741 |
+
completion = evt["planner_update"]
|
| 742 |
+
break
|
| 743 |
+
yield _sse(evt)
|
| 744 |
+
if completion is None:
|
| 745 |
+
yield _sse({"type": "error", "message": "planner produced no completion"})
|
| 746 |
+
return
|
| 747 |
+
replan_state.update(completion) # type: ignore[typeddict-item]
|
| 748 |
+
replan_state.update(feedback_node.run(replan_state)) # type: ignore[typeddict-item]
|
| 749 |
+
|
| 750 |
+
session["session_history"] = replan_state["session_history"]
|
| 751 |
+
session["bucket_priors"] = replan_state["bucket_priors"]
|
| 752 |
+
session["type_priors"] = replan_state["type_priors"]
|
| 753 |
+
session["last_state"] = replan_state
|
| 754 |
+
|
| 755 |
+
affect_emotion = (replan_state.get("affect") or {}).get("emotion", "NEUTRAL")
|
| 756 |
+
run_id = replan_state.get("run_id")
|
| 757 |
+
eval_scores = _compute_and_persist_evals(
|
| 758 |
+
run_id=run_id,
|
| 759 |
+
user_id=req.user_id,
|
| 760 |
+
turn_id=replan_state["turn_id"],
|
| 761 |
+
response=replan_state["selected_response"] or "",
|
| 762 |
+
chunks=list(replan_state.get("retrieved_chunks") or []),
|
| 763 |
+
latency_log=dict(replan_state.get("latency_log") or {}),
|
| 764 |
+
affect=affect_emotion,
|
| 765 |
+
gesture_tag=replan_state.get("gesture_tag"),
|
| 766 |
+
gaze_bucket=replan_state.get("gaze_bucket"),
|
| 767 |
+
)
|
| 768 |
+
final = {
|
| 769 |
+
"user_id": req.user_id,
|
| 770 |
+
"query": replan_state["raw_query"],
|
| 771 |
+
"response": replan_state["selected_response"] or "",
|
| 772 |
+
"candidates": [dict(c) for c in replan_state.get("candidates") or []],
|
| 773 |
+
"affect": affect_emotion,
|
| 774 |
+
"llm_tier": replan_state.get("llm_tier_used", "unknown"),
|
| 775 |
+
"llm_model": replan_state.get("llm_model_used", "unknown"),
|
| 776 |
+
"retrieval_mode": replan_state.get("retrieval_mode_used", "unknown"),
|
| 777 |
+
"latency": replan_state.get("latency_log") or {},
|
| 778 |
+
"guardrail_passed": replan_state.get("guardrail_passed", True),
|
| 779 |
+
"run_id": run_id,
|
| 780 |
+
"turn_id": replan_state["turn_id"],
|
| 781 |
+
"eval_scores": eval_scores,
|
| 782 |
+
}
|
| 783 |
+
yield _sse({"type": "complete", "response": final})
|
| 784 |
+
|
| 785 |
+
return StreamingResponse(
|
| 786 |
+
_gen(),
|
| 787 |
+
media_type="text/event-stream",
|
| 788 |
+
headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
|
| 789 |
+
)
|
| 790 |
+
|
| 791 |
+
|
| 792 |
+
@app.post("/chat/regenerate", response_model=ChatResponse)
|
| 793 |
+
def chat_regenerate(req: RegenerateRequest):
|
| 794 |
+
"""Re-run the planner for the same turn with all prior candidates marked rejected.
|
| 795 |
+
Does NOT advance turn_id — same partner query, fresh fan-out of candidates.
|
| 796 |
+
"""
|
| 797 |
+
if req.user_id not in _sessions:
|
| 798 |
+
raise HTTPException(status_code=404, detail="no active session")
|
| 799 |
+
session = _sessions[req.user_id]
|
| 800 |
+
last: PipelineState | None = session.get("last_state")
|
| 801 |
+
if last is None:
|
| 802 |
+
raise HTTPException(status_code=409, detail="no prior turn to regenerate")
|
| 803 |
+
if req.turn_id is not None and req.turn_id != last["turn_id"]:
|
| 804 |
+
raise HTTPException(status_code=409, detail="stale turn_id")
|
| 805 |
+
|
| 806 |
+
gen_cfg = dict(last.get("generation_config") or {})
|
| 807 |
+
gen_cfg["persona_mod"] = "all_rejected"
|
| 808 |
+
gen_cfg.setdefault("tone_tag", "[TONE:TRY_DIFFERENT_ANGLE]")
|
| 809 |
+
|
| 810 |
+
prior_rejected = [c.get("text", "") for c in (last.get("candidates") or [])]
|
| 811 |
+
merged_rejected = (
|
| 812 |
+
list(last.get("rejected_candidates") or [])
|
| 813 |
+
+ [t for t in prior_rejected if t]
|
| 814 |
+
+ [t for t in req.rejected_texts if t]
|
| 815 |
+
)
|
| 816 |
+
# Dedupe while preserving order.
|
| 817 |
+
seen: set[str] = set()
|
| 818 |
+
rejected: list[str] = []
|
| 819 |
+
for t in merged_rejected:
|
| 820 |
+
key = t.strip().lower()
|
| 821 |
+
if key and key not in seen:
|
| 822 |
+
seen.add(key)
|
| 823 |
+
rejected.append(t)
|
| 824 |
+
|
| 825 |
+
# Strip the tail (partner, aac_user) so feedback doesn't stack duplicate
|
| 826 |
+
# history entries on every regenerate — the user hasn't committed yet.
|
| 827 |
+
trimmed_history = list(last.get("session_history") or [])
|
| 828 |
+
if trimmed_history and trimmed_history[-1].get("role") == "aac_user":
|
| 829 |
+
trimmed_history.pop()
|
| 830 |
+
if trimmed_history and trimmed_history[-1].get("role") == "partner":
|
| 831 |
+
trimmed_history.pop()
|
| 832 |
+
|
| 833 |
+
replan_state: PipelineState = dict(last) # type: ignore[assignment]
|
| 834 |
+
replan_state["session_history"] = trimmed_history
|
| 835 |
+
replan_state["generation_config"] = gen_cfg
|
| 836 |
+
replan_state["rejected_candidates"] = rejected
|
| 837 |
+
replan_state["turnaround_triggered"] = False # keep multi-shot
|
| 838 |
+
replan_state["latency_log"] = {
|
| 839 |
+
"t_sensing": 0.0,
|
| 840 |
+
"t_intent": 0.0,
|
| 841 |
+
"t_retrieval": 0.0,
|
| 842 |
+
"t_generation": 0.0,
|
| 843 |
+
"t_total": 0.0,
|
| 844 |
+
}
|
| 845 |
+
|
| 846 |
+
planner_update = planner_node.run_primary(replan_state)
|
| 847 |
+
replan_state.update(planner_update) # type: ignore[typeddict-item]
|
| 848 |
+
|
| 849 |
+
# Feedback node rewrites history + assigns a new run_id. Each regenerate
|
| 850 |
+
# is its own row in turns.jsonl for the eval record.
|
| 851 |
+
feedback_update = feedback_node.run(replan_state)
|
| 852 |
+
replan_state.update(feedback_update) # type: ignore[typeddict-item]
|
| 853 |
+
|
| 854 |
+
session["session_history"] = replan_state["session_history"]
|
| 855 |
+
session["bucket_priors"] = replan_state["bucket_priors"]
|
| 856 |
+
session["type_priors"] = replan_state["type_priors"]
|
| 857 |
+
session["last_state"] = replan_state
|
| 858 |
+
|
| 859 |
+
affect_emotion = (replan_state.get("affect") or {}).get("emotion", "NEUTRAL")
|
| 860 |
+
run_id = replan_state.get("run_id")
|
| 861 |
+
|
| 862 |
+
eval_scores = _compute_and_persist_evals(
|
| 863 |
+
run_id=run_id,
|
| 864 |
+
user_id=req.user_id,
|
| 865 |
+
turn_id=replan_state["turn_id"],
|
| 866 |
+
response=replan_state["selected_response"] or "",
|
| 867 |
+
chunks=list(replan_state.get("retrieved_chunks") or []),
|
| 868 |
+
latency_log=dict(replan_state.get("latency_log") or {}),
|
| 869 |
+
affect=affect_emotion,
|
| 870 |
+
gesture_tag=replan_state.get("gesture_tag"),
|
| 871 |
+
gaze_bucket=replan_state.get("gaze_bucket"),
|
| 872 |
+
)
|
| 873 |
+
|
| 874 |
+
return ChatResponse(
|
| 875 |
+
user_id=req.user_id,
|
| 876 |
+
query=replan_state["raw_query"],
|
| 877 |
+
response=replan_state["selected_response"] or "",
|
| 878 |
+
candidates=[CandidateOut(**c) for c in replan_state.get("candidates") or []],
|
| 879 |
affect=affect_emotion,
|
| 880 |
llm_tier=replan_state.get("llm_tier_used", "unknown"),
|
| 881 |
llm_model=replan_state.get("llm_model_used", "unknown"),
|
|
@@ -1,5 +1,6 @@
|
|
| 1 |
# Two-tier LLM client — primary / fallback, both Ollama Cloud over OpenAI-compatible HTTP.
|
| 2 |
import re
|
|
|
|
| 3 |
from functools import lru_cache
|
| 4 |
from typing import Any
|
| 5 |
|
|
@@ -87,6 +88,56 @@ def chat_complete(
|
|
| 87 |
return stripped
|
| 88 |
|
| 89 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 90 |
def warmup(tier: str | None = None) -> None:
|
| 91 |
chat_complete(
|
| 92 |
messages=[{"role": "user", "content": "hi"}],
|
|
|
|
| 1 |
# Two-tier LLM client — primary / fallback, both Ollama Cloud over OpenAI-compatible HTTP.
|
| 2 |
import re
|
| 3 |
+
from collections.abc import Iterator
|
| 4 |
from functools import lru_cache
|
| 5 |
from typing import Any
|
| 6 |
|
|
|
|
| 88 |
return stripped
|
| 89 |
|
| 90 |
|
| 91 |
+
def chat_complete_stream(
|
| 92 |
+
messages: list[dict],
|
| 93 |
+
max_tokens: int,
|
| 94 |
+
tier: str | None = None,
|
| 95 |
+
temperature: float = 0.7,
|
| 96 |
+
**kwargs: Any,
|
| 97 |
+
) -> Iterator[str]:
|
| 98 |
+
"""Yield token deltas as they arrive. Thinking-mode stripping is applied
|
| 99 |
+
post-hoc on the buffered text by the caller — streaming <think>…</think>
|
| 100 |
+
into the UI would confuse the picker anyway.
|
| 101 |
+
"""
|
| 102 |
+
resolved_tier = tier or settings.active_llm_tier
|
| 103 |
+
model = active_model(resolved_tier)
|
| 104 |
+
client = get_client(resolved_tier)
|
| 105 |
+
|
| 106 |
+
patched_messages = messages
|
| 107 |
+
extra_body: dict[str, Any] = kwargs.pop("extra_body", {})
|
| 108 |
+
|
| 109 |
+
if settings.thinking_mode == "suppress":
|
| 110 |
+
patched_messages = _apply_no_think(messages)
|
| 111 |
+
|
| 112 |
+
effective_max_tokens = max_tokens
|
| 113 |
+
if settings.thinking_mode in ("strip", "full"):
|
| 114 |
+
effective_max_tokens = max_tokens + settings.thinking_token_budget
|
| 115 |
+
|
| 116 |
+
stream = client.chat.completions.create(
|
| 117 |
+
model=model,
|
| 118 |
+
messages=patched_messages,
|
| 119 |
+
max_tokens=effective_max_tokens,
|
| 120 |
+
temperature=temperature,
|
| 121 |
+
stream=True,
|
| 122 |
+
extra_body=extra_body or None,
|
| 123 |
+
**kwargs,
|
| 124 |
+
)
|
| 125 |
+
for chunk in stream:
|
| 126 |
+
if not chunk.choices:
|
| 127 |
+
continue
|
| 128 |
+
delta = chunk.choices[0].delta
|
| 129 |
+
piece = getattr(delta, "content", None) or ""
|
| 130 |
+
if piece:
|
| 131 |
+
yield piece
|
| 132 |
+
|
| 133 |
+
|
| 134 |
+
def finalize_streamed(text: str) -> str:
|
| 135 |
+
"""Apply the same post-processing chat_complete does once a stream is done."""
|
| 136 |
+
if settings.thinking_mode in ("off", "strip"):
|
| 137 |
+
text = _strip_think_tags(text)
|
| 138 |
+
return text.strip()
|
| 139 |
+
|
| 140 |
+
|
| 141 |
def warmup(tier: str | None = None) -> None:
|
| 142 |
chat_complete(
|
| 143 |
messages=[{"role": "user", "content": "hi"}],
|
|
@@ -180,6 +180,7 @@ def main() -> None:
|
|
| 180 |
retrieval_mode_used="",
|
| 181 |
augmented_prompt=None,
|
| 182 |
candidates=[],
|
|
|
|
| 183 |
selected_response=None,
|
| 184 |
llm_tier_used="",
|
| 185 |
latency_log={
|
|
|
|
| 180 |
retrieval_mode_used="",
|
| 181 |
augmented_prompt=None,
|
| 182 |
candidates=[],
|
| 183 |
+
rejected_candidates=[],
|
| 184 |
selected_response=None,
|
| 185 |
llm_tier_used="",
|
| 186 |
latency_log={
|
|
@@ -34,3 +34,19 @@ def run_pipeline(state: PipelineState) -> PipelineState:
|
|
| 34 |
|
| 35 |
_merge(state, feedback.run(state))
|
| 36 |
return state
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 34 |
|
| 35 |
_merge(state, feedback.run(state))
|
| 36 |
return state
|
| 37 |
+
|
| 38 |
+
|
| 39 |
+
def run_until_planner(state: PipelineState) -> PipelineState:
|
| 40 |
+
"""Run intent + retrieval only. Used by the streaming endpoint so it can
|
| 41 |
+
then drive the planner's token stream itself and call feedback at the end.
|
| 42 |
+
"""
|
| 43 |
+
_merge(state, intent.run(state))
|
| 44 |
+
if _route_by_affect(state) == "fast":
|
| 45 |
+
_merge(state, retrieval.run_fast(state))
|
| 46 |
+
else:
|
| 47 |
+
_merge(state, retrieval.run_full(state))
|
| 48 |
+
return state
|
| 49 |
+
|
| 50 |
+
|
| 51 |
+
def choose_planner_tier(state: PipelineState) -> str:
|
| 52 |
+
return _route_by_latency(state)
|
|
@@ -39,12 +39,14 @@ def _log_to_jsonl(
|
|
| 39 |
latency = state.get("latency_log") or {}
|
| 40 |
affect = (state.get("affect") or {}).get("emotion", "UNKNOWN")
|
| 41 |
chunks = state.get("retrieved_chunks") or []
|
|
|
|
| 42 |
|
| 43 |
entry = {
|
| 44 |
"run_id": run_id,
|
| 45 |
"ts": time.time(),
|
| 46 |
"user_id": state["user_id"],
|
| 47 |
"turn_id": state["turn_id"],
|
|
|
|
| 48 |
"llm_tier": state.get("llm_tier_used", "unknown"),
|
| 49 |
"retrieval_mode": state.get("retrieval_mode_used", "unknown"),
|
| 50 |
"affect": affect,
|
|
@@ -57,6 +59,7 @@ def _log_to_jsonl(
|
|
| 57 |
),
|
| 58 |
"num_contextual": sum(1 for c in chunks if c.get("source") == "contextual"),
|
| 59 |
"num_open_domain": sum(1 for c in chunks if c.get("source") == "open_domain"),
|
|
|
|
| 60 |
"latency": {
|
| 61 |
"t_sensing": latency.get("t_sensing", 0.0),
|
| 62 |
"t_intent": latency.get("t_intent", 0.0),
|
|
@@ -65,6 +68,8 @@ def _log_to_jsonl(
|
|
| 65 |
"t_total": latency.get("t_total", 0.0),
|
| 66 |
},
|
| 67 |
"response": state.get("selected_response") or "",
|
|
|
|
|
|
|
| 68 |
"bucket_priors_after": bucket_priors_after,
|
| 69 |
"type_priors_after": type_priors_after,
|
| 70 |
}
|
|
|
|
| 39 |
latency = state.get("latency_log") or {}
|
| 40 |
affect = (state.get("affect") or {}).get("emotion", "UNKNOWN")
|
| 41 |
chunks = state.get("retrieved_chunks") or []
|
| 42 |
+
candidates = state.get("candidates") or []
|
| 43 |
|
| 44 |
entry = {
|
| 45 |
"run_id": run_id,
|
| 46 |
"ts": time.time(),
|
| 47 |
"user_id": state["user_id"],
|
| 48 |
"turn_id": state["turn_id"],
|
| 49 |
+
"query": state["raw_query"],
|
| 50 |
"llm_tier": state.get("llm_tier_used", "unknown"),
|
| 51 |
"retrieval_mode": state.get("retrieval_mode_used", "unknown"),
|
| 52 |
"affect": affect,
|
|
|
|
| 59 |
),
|
| 60 |
"num_contextual": sum(1 for c in chunks if c.get("source") == "contextual"),
|
| 61 |
"num_open_domain": sum(1 for c in chunks if c.get("source") == "open_domain"),
|
| 62 |
+
"num_prior_pick": sum(1 for c in chunks if c.get("source") == "prior_pick"),
|
| 63 |
"latency": {
|
| 64 |
"t_sensing": latency.get("t_sensing", 0.0),
|
| 65 |
"t_intent": latency.get("t_intent", 0.0),
|
|
|
|
| 68 |
"t_total": latency.get("t_total", 0.0),
|
| 69 |
},
|
| 70 |
"response": state.get("selected_response") or "",
|
| 71 |
+
"candidates": [dict(c) for c in candidates],
|
| 72 |
+
"n_candidates": len(candidates),
|
| 73 |
"bucket_priors_after": bucket_priors_after,
|
| 74 |
"type_priors_after": type_priors_after,
|
| 75 |
}
|
|
@@ -1,12 +1,32 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
import time
|
|
|
|
| 2 |
|
| 3 |
from backend.config.settings import settings
|
| 4 |
-
from backend.generation.llm_client import
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 5 |
from backend.guardrails.checks import check_output
|
| 6 |
from backend.pipeline.intent_kind import classify_intent_kind
|
| 7 |
-
from backend.pipeline.state import PipelineState, StyleDirective
|
|
|
|
| 8 |
from backend.sensing.labels import GESTURE_DIRECTIVES
|
| 9 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 10 |
_PERSONA_MOD_INSTRUCTIONS = {
|
| 11 |
"amplify_quirks": "Amplify your characteristic style and personality.",
|
| 12 |
"suppress_humor": "Be direct and supportive. Suppress humor.",
|
|
@@ -30,6 +50,12 @@ _PERSONA_MOD_INSTRUCTIONS = {
|
|
| 30 |
"read (if you said 'good', try 'not great') or honestly admit "
|
| 31 |
"you're not sure how you feel right now. Do NOT invent details."
|
| 32 |
),
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 33 |
}
|
| 34 |
|
| 35 |
|
|
@@ -41,7 +67,23 @@ def run_fallback(state: PipelineState) -> dict:
|
|
| 41 |
return _run(state, tier="fallback")
|
| 42 |
|
| 43 |
|
| 44 |
-
def
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 45 |
t0 = time.perf_counter()
|
| 46 |
|
| 47 |
profile = state["persona_profile"]
|
|
@@ -57,31 +99,386 @@ def _run(state: PipelineState, tier: str) -> dict:
|
|
| 57 |
rejected_response: str | None = None
|
| 58 |
if turnaround_triggered:
|
| 59 |
rejected_response = state.get("selected_response")
|
|
|
|
| 60 |
intent_kind = classify_intent_kind(state.get("intent_route"))
|
| 61 |
-
|
| 62 |
-
|
| 63 |
-
|
| 64 |
-
|
| 65 |
-
|
| 66 |
-
|
| 67 |
-
|
| 68 |
-
|
| 69 |
-
|
| 70 |
-
|
| 71 |
-
|
| 72 |
-
|
| 73 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 74 |
|
| 75 |
-
|
| 76 |
-
|
| 77 |
-
|
| 78 |
-
|
| 79 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 80 |
)
|
| 81 |
|
| 82 |
-
|
| 83 |
-
|
| 84 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 85 |
|
| 86 |
t_gen = time.perf_counter() - t0
|
| 87 |
latency_log = dict(state.get("latency_log") or {})
|
|
@@ -94,15 +491,32 @@ def _run(state: PipelineState, tier: str) -> dict:
|
|
| 94 |
4,
|
| 95 |
)
|
| 96 |
|
| 97 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 98 |
return {
|
| 99 |
"augmented_prompt": augmented_prompt,
|
| 100 |
-
"candidates":
|
| 101 |
"selected_response": selected,
|
| 102 |
"llm_tier_used": tier,
|
| 103 |
"llm_model_used": active_model(tier),
|
| 104 |
"latency_log": latency_log,
|
| 105 |
-
"guardrail_passed":
|
| 106 |
}
|
| 107 |
|
| 108 |
|
|
@@ -128,6 +542,7 @@ def _build_messages(
|
|
| 128 |
gesture_tag: str | None = None,
|
| 129 |
air_written_text: str | None = None,
|
| 130 |
rejected_response: str | None = None,
|
|
|
|
| 131 |
intent_kind: str = "memory",
|
| 132 |
affect: str = "NEUTRAL",
|
| 133 |
) -> list[dict]:
|
|
@@ -146,6 +561,7 @@ def _build_messages(
|
|
| 146 |
air_written_text,
|
| 147 |
profile["name"],
|
| 148 |
rejected_response=rejected_response,
|
|
|
|
| 149 |
intent_kind=intent_kind,
|
| 150 |
affect=affect,
|
| 151 |
)
|
|
@@ -206,12 +622,14 @@ def _build_user(
|
|
| 206 |
persona_name: str,
|
| 207 |
*,
|
| 208 |
rejected_response: str | None = None,
|
|
|
|
| 209 |
intent_kind: str = "memory",
|
| 210 |
affect: str = "NEUTRAL",
|
| 211 |
) -> str:
|
| 212 |
personal_chunks = [c for c in chunks if c.get("source", "personal") == "personal"]
|
| 213 |
contextual_chunks = [c for c in chunks if c.get("source") == "contextual"]
|
| 214 |
open_domain_chunks = [c for c in chunks if c.get("source") == "open_domain"]
|
|
|
|
| 215 |
|
| 216 |
memory_block = (
|
| 217 |
"\n".join(
|
|
@@ -220,6 +638,11 @@ def _build_user(
|
|
| 220 |
)
|
| 221 |
or " (no memories retrieved)"
|
| 222 |
)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 223 |
contextual_block = (
|
| 224 |
"\n".join(f" {c['text']}" for c in contextual_chunks)
|
| 225 |
or " (nothing relevant from this session)"
|
|
@@ -277,6 +700,15 @@ def _build_user(
|
|
| 277 |
f"\nYour previous reply (which you need to replace, not repeat): "
|
| 278 |
f'"{safe_rejected}"'
|
| 279 |
)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 280 |
|
| 281 |
if intent_kind == "present_state":
|
| 282 |
affect_hint = _AFFECT_HINTS.get(affect, _AFFECT_HINTS["NEUTRAL"])
|
|
@@ -299,8 +731,16 @@ Reply as {persona_name} in 1–2 sentences, first person.
|
|
| 299 |
- If the affect read is NEUTRAL or doesn't match what you'd say, it's better to say "I'm not sure" or "honestly, I don't really know right now" than to invent.
|
| 300 |
- Do NOT use autobiographical facts (job, family, hobbies) unless the partner asked."""
|
| 301 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 302 |
return f"""\
|
| 303 |
-
{directive_block}{air_writing_block}{turnaround_line}{persona_instruction_line}
|
| 304 |
|
| 305 |
Personal memories:
|
| 306 |
{memory_block}
|
|
|
|
| 1 |
+
import concurrent.futures
|
| 2 |
+
import queue
|
| 3 |
+
import random
|
| 4 |
+
import threading
|
| 5 |
import time
|
| 6 |
+
from collections.abc import Iterator
|
| 7 |
|
| 8 |
from backend.config.settings import settings
|
| 9 |
+
from backend.generation.llm_client import (
|
| 10 |
+
active_model,
|
| 11 |
+
chat_complete,
|
| 12 |
+
chat_complete_stream,
|
| 13 |
+
finalize_streamed,
|
| 14 |
+
)
|
| 15 |
from backend.guardrails.checks import check_output
|
| 16 |
from backend.pipeline.intent_kind import classify_intent_kind
|
| 17 |
+
from backend.pipeline.state import Candidate, PipelineState, StyleDirective
|
| 18 |
+
from backend.retrieval import pick_index
|
| 19 |
from backend.sensing.labels import GESTURE_DIRECTIVES
|
| 20 |
|
| 21 |
+
# For present-state fan-out: three fixed emotional reads the persona can
|
| 22 |
+
# project, so the user can pick among "good / fine / not great" rather than
|
| 23 |
+
# three paraphrases of one mood.
|
| 24 |
+
_PRESENT_STATE_STRATEGIES = [
|
| 25 |
+
("present_good", "HAPPY"),
|
| 26 |
+
("present_fine", "NEUTRAL"),
|
| 27 |
+
("present_rough", "FRUSTRATED"),
|
| 28 |
+
]
|
| 29 |
+
|
| 30 |
_PERSONA_MOD_INSTRUCTIONS = {
|
| 31 |
"amplify_quirks": "Amplify your characteristic style and personality.",
|
| 32 |
"suppress_humor": "Be direct and supportive. Suppress humor.",
|
|
|
|
| 50 |
"read (if you said 'good', try 'not great') or honestly admit "
|
| 51 |
"you're not sure how you feel right now. Do NOT invent details."
|
| 52 |
),
|
| 53 |
+
"all_rejected": (
|
| 54 |
+
"The user rejected every option you gave last time. Try a "
|
| 55 |
+
"meaningfully different angle — different memory focus, different "
|
| 56 |
+
"emotional register, or admit you don't have a clean answer. Do "
|
| 57 |
+
"NOT re-use wording from the rejected options."
|
| 58 |
+
),
|
| 59 |
}
|
| 60 |
|
| 61 |
|
|
|
|
| 67 |
return _run(state, tier="fallback")
|
| 68 |
|
| 69 |
|
| 70 |
+
def run_primary_stream(state: PipelineState) -> Iterator[dict]:
|
| 71 |
+
"""Token-level streaming variant of the planner.
|
| 72 |
+
|
| 73 |
+
Yields events as tokens arrive across all concurrent candidate streams:
|
| 74 |
+
{"type": "candidate_start", "idx": 0, "strategy": "broad", "grounded_buckets": [...]}
|
| 75 |
+
{"type": "token", "idx": 0, "delta": "Hello"}
|
| 76 |
+
{"type": "candidate_done", "idx": 0, "text": "Hello world."}
|
| 77 |
+
{"type": "side_index", "text": "..."} (optional, at start if there's a hit)
|
| 78 |
+
{"type": "complete", "candidates": [...], "selected_response": "...", ... final state dict}
|
| 79 |
+
"""
|
| 80 |
+
yield from _run_stream(state, tier="primary")
|
| 81 |
+
|
| 82 |
+
|
| 83 |
+
_STREAM_SENTINEL = object()
|
| 84 |
+
|
| 85 |
+
|
| 86 |
+
def _run_stream(state: PipelineState, tier: str) -> Iterator[dict]:
|
| 87 |
t0 = time.perf_counter()
|
| 88 |
|
| 89 |
profile = state["persona_profile"]
|
|
|
|
| 99 |
rejected_response: str | None = None
|
| 100 |
if turnaround_triggered:
|
| 101 |
rejected_response = state.get("selected_response")
|
| 102 |
+
rejected_candidates: list[str] = list(state.get("rejected_candidates") or [])
|
| 103 |
intent_kind = classify_intent_kind(state.get("intent_route"))
|
| 104 |
+
max_tokens = gen_cfg.get("max_tokens", settings.max_tokens_neutral)
|
| 105 |
+
|
| 106 |
+
# Turnaround rephrases are single-shot; everything else fans out.
|
| 107 |
+
# Present-state varies affect (good/fine/rough), memory questions vary
|
| 108 |
+
# which chunks are primary (broad/focused/serendipitous).
|
| 109 |
+
single_shot = turnaround_triggered
|
| 110 |
+
is_present_state = intent_kind == "present_state"
|
| 111 |
+
if single_shot:
|
| 112 |
+
strategies: list[tuple[str, str | None]] = [("focused", None)]
|
| 113 |
+
elif is_present_state:
|
| 114 |
+
strategies = list(_PRESENT_STATE_STRATEGIES)
|
| 115 |
+
else:
|
| 116 |
+
strategies = [
|
| 117 |
+
("broad", None),
|
| 118 |
+
("focused", None),
|
| 119 |
+
("serendipitous", None),
|
| 120 |
+
]
|
| 121 |
+
# Higher temp on regenerate; also bump for present-state since three
|
| 122 |
+
# strategies share the same (empty) grounding and need sampling noise.
|
| 123 |
+
base_temp = 1.0 if (rejected_candidates or is_present_state) else 0.7
|
| 124 |
+
|
| 125 |
+
# Optional side-index hit — surface as an extra card right away, not generated.
|
| 126 |
+
side_index_candidate: Candidate | None = None
|
| 127 |
+
if not single_shot and not is_present_state:
|
| 128 |
+
try:
|
| 129 |
+
hit = pick_index.lookup(
|
| 130 |
+
query=state["raw_query"],
|
| 131 |
+
user_id=state["user_id"],
|
| 132 |
+
threshold=0.85,
|
| 133 |
+
)
|
| 134 |
+
except Exception as exc:
|
| 135 |
+
print(f"[planner] pick_index lookup failed: {exc!r}")
|
| 136 |
+
hit = None
|
| 137 |
+
if hit:
|
| 138 |
+
text = (hit.get("picked_text") or "").strip()
|
| 139 |
+
if text:
|
| 140 |
+
side_index_candidate = Candidate(
|
| 141 |
+
text=text,
|
| 142 |
+
strategy="side_index",
|
| 143 |
+
grounded_buckets=[],
|
| 144 |
+
)
|
| 145 |
+
|
| 146 |
+
# Pre-announce each candidate slot so the UI can draw empty cards immediately.
|
| 147 |
+
cards: list[dict] = []
|
| 148 |
+
if side_index_candidate:
|
| 149 |
+
cards.append({"strategy": "side_index", "grounded_buckets": []})
|
| 150 |
+
for strategy_name, _affect_override in strategies:
|
| 151 |
+
if is_present_state:
|
| 152 |
+
card_buckets: list[str] = []
|
| 153 |
+
else:
|
| 154 |
+
strategy_chunks = _pick_strategy_chunks(list(chunks), strategy_name)
|
| 155 |
+
card_buckets = [c.get("bucket", "") for c in strategy_chunks]
|
| 156 |
+
cards.append(
|
| 157 |
+
{
|
| 158 |
+
"strategy": strategy_name,
|
| 159 |
+
"grounded_buckets": card_buckets,
|
| 160 |
+
}
|
| 161 |
+
)
|
| 162 |
+
for idx, card in enumerate(cards):
|
| 163 |
+
yield {
|
| 164 |
+
"type": "candidate_start",
|
| 165 |
+
"idx": idx,
|
| 166 |
+
"strategy": card["strategy"],
|
| 167 |
+
"grounded_buckets": card["grounded_buckets"],
|
| 168 |
+
}
|
| 169 |
+
if side_index_candidate is not None:
|
| 170 |
+
yield {
|
| 171 |
+
"type": "candidate_done",
|
| 172 |
+
"idx": 0,
|
| 173 |
+
"text": side_index_candidate["text"],
|
| 174 |
+
}
|
| 175 |
+
|
| 176 |
+
# Spawn a worker thread per strategy. Each one streams tokens into a shared
|
| 177 |
+
# queue; the generator forwards them as SSE events.
|
| 178 |
+
llm_cards_offset = 1 if side_index_candidate else 0
|
| 179 |
+
evt_queue: queue.Queue[dict | object] = queue.Queue()
|
| 180 |
+
completed: list[Candidate | None] = [None] * len(strategies)
|
| 181 |
+
completed_lock = threading.Lock()
|
| 182 |
+
|
| 183 |
+
def _worker(slot: int, strategy: str, affect_override: str | None) -> None:
|
| 184 |
+
if is_present_state:
|
| 185 |
+
strategy_chunks = [] # present-state has no memory grounding
|
| 186 |
+
else:
|
| 187 |
+
strategy_chunks = _pick_strategy_chunks(list(chunks), strategy)
|
| 188 |
+
effective_affect = affect_override if affect_override is not None else affect
|
| 189 |
+
messages = _build_messages(
|
| 190 |
+
profile,
|
| 191 |
+
strategy_chunks,
|
| 192 |
+
history,
|
| 193 |
+
state["raw_query"],
|
| 194 |
+
style,
|
| 195 |
+
gen_cfg,
|
| 196 |
+
gesture_tag=gesture_tag,
|
| 197 |
+
air_written_text=air_written_text,
|
| 198 |
+
rejected_response=rejected_response,
|
| 199 |
+
rejected_candidates=rejected_candidates,
|
| 200 |
+
intent_kind=intent_kind,
|
| 201 |
+
affect=effective_affect,
|
| 202 |
+
)
|
| 203 |
+
buf: list[str] = []
|
| 204 |
+
try:
|
| 205 |
+
for piece in chat_complete_stream(
|
| 206 |
+
messages=messages,
|
| 207 |
+
max_tokens=max_tokens,
|
| 208 |
+
temperature=base_temp,
|
| 209 |
+
tier=tier,
|
| 210 |
+
):
|
| 211 |
+
buf.append(piece)
|
| 212 |
+
evt_queue.put(
|
| 213 |
+
{
|
| 214 |
+
"type": "token",
|
| 215 |
+
"idx": llm_cards_offset + slot,
|
| 216 |
+
"delta": piece,
|
| 217 |
+
}
|
| 218 |
+
)
|
| 219 |
+
except Exception as exc:
|
| 220 |
+
evt_queue.put(
|
| 221 |
+
{
|
| 222 |
+
"type": "candidate_error",
|
| 223 |
+
"idx": llm_cards_offset + slot,
|
| 224 |
+
"error": repr(exc),
|
| 225 |
+
}
|
| 226 |
+
)
|
| 227 |
+
with completed_lock:
|
| 228 |
+
completed[slot] = None
|
| 229 |
+
evt_queue.put(_STREAM_SENTINEL)
|
| 230 |
+
return
|
| 231 |
+
|
| 232 |
+
final = finalize_streamed("".join(buf))
|
| 233 |
+
guard = check_output(final, strategy_chunks)
|
| 234 |
+
if not guard["passed"]:
|
| 235 |
+
final = guard["fallback"]
|
| 236 |
+
cand = Candidate(
|
| 237 |
+
text=final,
|
| 238 |
+
strategy=strategy,
|
| 239 |
+
grounded_buckets=[c.get("bucket", "") for c in strategy_chunks],
|
| 240 |
+
)
|
| 241 |
+
with completed_lock:
|
| 242 |
+
completed[slot] = cand
|
| 243 |
+
evt_queue.put(
|
| 244 |
+
{
|
| 245 |
+
"type": "candidate_done",
|
| 246 |
+
"idx": llm_cards_offset + slot,
|
| 247 |
+
"text": final,
|
| 248 |
+
}
|
| 249 |
+
)
|
| 250 |
+
evt_queue.put(_STREAM_SENTINEL)
|
| 251 |
+
|
| 252 |
+
threads = [
|
| 253 |
+
threading.Thread(target=_worker, args=(i, s, a), daemon=True)
|
| 254 |
+
for i, (s, a) in enumerate(strategies)
|
| 255 |
+
]
|
| 256 |
+
for t in threads:
|
| 257 |
+
t.start()
|
| 258 |
+
|
| 259 |
+
remaining = len(threads)
|
| 260 |
+
while remaining > 0:
|
| 261 |
+
evt = evt_queue.get()
|
| 262 |
+
if evt is _STREAM_SENTINEL:
|
| 263 |
+
remaining -= 1
|
| 264 |
+
continue
|
| 265 |
+
yield evt # type: ignore[misc]
|
| 266 |
+
|
| 267 |
+
for t in threads:
|
| 268 |
+
t.join()
|
| 269 |
+
|
| 270 |
+
with completed_lock:
|
| 271 |
+
llm_cands = [c for c in completed if c is not None]
|
| 272 |
+
all_cands: list[Candidate] = []
|
| 273 |
+
if side_index_candidate is not None:
|
| 274 |
+
all_cands.append(side_index_candidate)
|
| 275 |
+
all_cands.extend(llm_cands)
|
| 276 |
+
|
| 277 |
+
# De-dupe against rejected + each other.
|
| 278 |
+
seen: set[str] = {r.strip().lower() for r in rejected_candidates if r}
|
| 279 |
+
uniq: list[Candidate] = []
|
| 280 |
+
for c in all_cands:
|
| 281 |
+
key = c["text"].strip().lower()
|
| 282 |
+
if key and key not in seen:
|
| 283 |
+
seen.add(key)
|
| 284 |
+
uniq.append(c)
|
| 285 |
+
if not uniq:
|
| 286 |
+
# Every candidate was a dup-of-rejected or guardrail-rejected. Surface
|
| 287 |
+
# a non-empty placeholder so the UI isn't showing a blank bubble and
|
| 288 |
+
# the user knows they can regenerate. Logged so we notice if this ever
|
| 289 |
+
# fires in practice — it means something upstream collapsed.
|
| 290 |
+
print(
|
| 291 |
+
f"[planner] empty-candidate fallback fired "
|
| 292 |
+
f"user={state.get('user_id')!r} turn_id={state.get('turn_id')} "
|
| 293 |
+
f"raw_query={state.get('raw_query', '')[:80]!r}"
|
| 294 |
+
)
|
| 295 |
+
uniq = all_cands[:1] or [
|
| 296 |
+
Candidate(
|
| 297 |
+
text="I'm not sure how to answer that — try asking in a different way.",
|
| 298 |
+
strategy="empty",
|
| 299 |
+
grounded_buckets=[],
|
| 300 |
+
)
|
| 301 |
+
]
|
| 302 |
+
|
| 303 |
+
selected = uniq[0]["text"]
|
| 304 |
|
| 305 |
+
t_gen = time.perf_counter() - t0
|
| 306 |
+
latency_log = dict(state.get("latency_log") or {})
|
| 307 |
+
latency_log["t_generation"] = round(t_gen, 4)
|
| 308 |
+
latency_log["t_total"] = round(
|
| 309 |
+
latency_log.get("t_sensing", 0)
|
| 310 |
+
+ latency_log.get("t_intent", 0)
|
| 311 |
+
+ latency_log.get("t_retrieval", 0)
|
| 312 |
+
+ t_gen,
|
| 313 |
+
4,
|
| 314 |
)
|
| 315 |
|
| 316 |
+
yield {
|
| 317 |
+
"type": "complete",
|
| 318 |
+
"planner_update": {
|
| 319 |
+
"augmented_prompt": None, # skipping for streaming — not worth rebuilding
|
| 320 |
+
"candidates": uniq,
|
| 321 |
+
"selected_response": selected,
|
| 322 |
+
"llm_tier_used": tier,
|
| 323 |
+
"llm_model_used": active_model(tier),
|
| 324 |
+
"latency_log": latency_log,
|
| 325 |
+
"guardrail_passed": True,
|
| 326 |
+
},
|
| 327 |
+
}
|
| 328 |
+
|
| 329 |
+
|
| 330 |
+
def _pick_strategy_chunks(all_chunks: list[dict], strategy: str) -> list[dict]:
|
| 331 |
+
"""Select which chunks become the *primary* grounding for a candidate.
|
| 332 |
+
Non-personal chunks (contextual, open_domain) always pass through —
|
| 333 |
+
they're small and query-grounded, not memory variation.
|
| 334 |
+
"""
|
| 335 |
+
personal = [c for c in all_chunks if c.get("source", "personal") == "personal"]
|
| 336 |
+
others = [c for c in all_chunks if c.get("source", "personal") != "personal"]
|
| 337 |
+
|
| 338 |
+
if not personal:
|
| 339 |
+
return all_chunks
|
| 340 |
+
|
| 341 |
+
if strategy == "broad":
|
| 342 |
+
chosen = personal
|
| 343 |
+
elif strategy == "focused":
|
| 344 |
+
chosen = personal[:1]
|
| 345 |
+
elif strategy == "serendipitous":
|
| 346 |
+
if len(personal) >= 2:
|
| 347 |
+
pool = personal[1:]
|
| 348 |
+
k = min(len(pool), max(1, len(personal) - 1))
|
| 349 |
+
chosen = random.sample(pool, k)
|
| 350 |
+
else:
|
| 351 |
+
chosen = personal
|
| 352 |
+
else:
|
| 353 |
+
chosen = personal
|
| 354 |
+
|
| 355 |
+
return chosen + others
|
| 356 |
+
|
| 357 |
+
|
| 358 |
+
def _run(state: PipelineState, tier: str) -> dict:
|
| 359 |
+
t0 = time.perf_counter()
|
| 360 |
+
|
| 361 |
+
profile = state["persona_profile"]
|
| 362 |
+
affect = (state.get("affect") or {}).get("emotion", "NEUTRAL")
|
| 363 |
+
gen_cfg = state.get("generation_config") or {}
|
| 364 |
+
chunks = state.get("retrieved_chunks") or []
|
| 365 |
+
history = (state.get("session_history") or [])[-20:]
|
| 366 |
+
|
| 367 |
+
style: StyleDirective = gen_cfg["style"]
|
| 368 |
+
gesture_tag = state.get("gesture_tag")
|
| 369 |
+
air_written_text = state.get("air_written_text")
|
| 370 |
+
turnaround_triggered = state.get("turnaround_triggered", False)
|
| 371 |
+
rejected_response: str | None = None
|
| 372 |
+
if turnaround_triggered:
|
| 373 |
+
rejected_response = state.get("selected_response")
|
| 374 |
+
rejected_candidates: list[str] = list(state.get("rejected_candidates") or [])
|
| 375 |
+
intent_kind = classify_intent_kind(state.get("intent_route"))
|
| 376 |
+
max_tokens = gen_cfg.get("max_tokens", settings.max_tokens_neutral)
|
| 377 |
+
|
| 378 |
+
# Turnaround rephrases are single-shot; everything else fans out. Present-
|
| 379 |
+
# state varies affect (good/fine/rough), memory questions vary chunks
|
| 380 |
+
# (broad/focused/serendipitous).
|
| 381 |
+
single_shot = turnaround_triggered
|
| 382 |
+
is_present_state = intent_kind == "present_state"
|
| 383 |
+
if single_shot:
|
| 384 |
+
strategies_cfg: list[tuple[str, str | None]] = [("focused", None)]
|
| 385 |
+
elif is_present_state:
|
| 386 |
+
strategies_cfg = list(_PRESENT_STATE_STRATEGIES)
|
| 387 |
+
else:
|
| 388 |
+
strategies_cfg = [
|
| 389 |
+
("broad", None),
|
| 390 |
+
("focused", None),
|
| 391 |
+
("serendipitous", None),
|
| 392 |
+
]
|
| 393 |
+
|
| 394 |
+
base_temp = 1.0 if (rejected_candidates or is_present_state) else 0.7
|
| 395 |
+
|
| 396 |
+
def _gen_one(cfg: tuple[str, str | None]) -> Candidate:
|
| 397 |
+
strategy, affect_override = cfg
|
| 398 |
+
if is_present_state:
|
| 399 |
+
strategy_chunks: list[dict] = []
|
| 400 |
+
else:
|
| 401 |
+
strategy_chunks = _pick_strategy_chunks(list(chunks), strategy)
|
| 402 |
+
effective_affect = affect_override if affect_override is not None else affect
|
| 403 |
+
messages = _build_messages(
|
| 404 |
+
profile,
|
| 405 |
+
strategy_chunks,
|
| 406 |
+
history,
|
| 407 |
+
state["raw_query"],
|
| 408 |
+
style,
|
| 409 |
+
gen_cfg,
|
| 410 |
+
gesture_tag=gesture_tag,
|
| 411 |
+
air_written_text=air_written_text,
|
| 412 |
+
rejected_response=rejected_response,
|
| 413 |
+
rejected_candidates=rejected_candidates,
|
| 414 |
+
intent_kind=intent_kind,
|
| 415 |
+
affect=effective_affect,
|
| 416 |
+
)
|
| 417 |
+
text = chat_complete(
|
| 418 |
+
messages=messages,
|
| 419 |
+
max_tokens=max_tokens,
|
| 420 |
+
temperature=base_temp,
|
| 421 |
+
tier=tier,
|
| 422 |
+
)
|
| 423 |
+
guard = check_output(text, strategy_chunks)
|
| 424 |
+
if not guard["passed"]:
|
| 425 |
+
text = guard["fallback"]
|
| 426 |
+
return Candidate(
|
| 427 |
+
text=text,
|
| 428 |
+
strategy=strategy,
|
| 429 |
+
grounded_buckets=[c.get("bucket", "") for c in strategy_chunks],
|
| 430 |
+
)
|
| 431 |
+
|
| 432 |
+
if len(strategies_cfg) == 1:
|
| 433 |
+
candidates = [_gen_one(strategies_cfg[0])]
|
| 434 |
+
else:
|
| 435 |
+
with concurrent.futures.ThreadPoolExecutor(
|
| 436 |
+
max_workers=len(strategies_cfg)
|
| 437 |
+
) as pool:
|
| 438 |
+
candidates = list(pool.map(_gen_one, strategies_cfg))
|
| 439 |
+
|
| 440 |
+
# Side-index hit: if the user has picked a similar query before, surface the
|
| 441 |
+
# previously-picked text as an extra candidate. Not generated by the LLM;
|
| 442 |
+
# skipped on single-shot (turnaround/present-state) so rephrases always
|
| 443 |
+
# produce fresh text.
|
| 444 |
+
if not single_shot and not is_present_state:
|
| 445 |
+
try:
|
| 446 |
+
hit = pick_index.lookup(
|
| 447 |
+
query=state["raw_query"],
|
| 448 |
+
user_id=state["user_id"],
|
| 449 |
+
threshold=0.85,
|
| 450 |
+
)
|
| 451 |
+
except Exception as exc:
|
| 452 |
+
print(f"[planner] pick_index lookup failed: {exc!r}")
|
| 453 |
+
hit = None
|
| 454 |
+
if hit:
|
| 455 |
+
text = hit.get("picked_text", "").strip()
|
| 456 |
+
if text and text.lower() not in {
|
| 457 |
+
c["text"].strip().lower() for c in candidates
|
| 458 |
+
}:
|
| 459 |
+
candidates.insert(
|
| 460 |
+
0,
|
| 461 |
+
Candidate(
|
| 462 |
+
text=text,
|
| 463 |
+
strategy="side_index",
|
| 464 |
+
grounded_buckets=[],
|
| 465 |
+
),
|
| 466 |
+
)
|
| 467 |
+
|
| 468 |
+
# De-dupe by normalised text — if two strategies produced the same response,
|
| 469 |
+
# keep the first. Also exclude anything the user already rejected this turn.
|
| 470 |
+
# Don't retry; latency budget matters more than N=3 on the dot.
|
| 471 |
+
seen: set[str] = {r.strip().lower() for r in rejected_candidates if r}
|
| 472 |
+
uniq: list[Candidate] = []
|
| 473 |
+
for c in candidates:
|
| 474 |
+
key = c["text"].strip().lower()
|
| 475 |
+
if key and key not in seen:
|
| 476 |
+
seen.add(key)
|
| 477 |
+
uniq.append(c)
|
| 478 |
+
if not uniq:
|
| 479 |
+
uniq = candidates[:1] # every guardrail rejected — fall back to the first
|
| 480 |
+
|
| 481 |
+
selected = uniq[0]["text"]
|
| 482 |
|
| 483 |
t_gen = time.perf_counter() - t0
|
| 484 |
latency_log = dict(state.get("latency_log") or {})
|
|
|
|
| 491 |
4,
|
| 492 |
)
|
| 493 |
|
| 494 |
+
# Represent the default-candidate prompt in augmented_prompt for logging.
|
| 495 |
+
default_strategy_chunks = _pick_strategy_chunks(list(chunks), uniq[0]["strategy"])
|
| 496 |
+
default_messages = _build_messages(
|
| 497 |
+
profile,
|
| 498 |
+
default_strategy_chunks,
|
| 499 |
+
history,
|
| 500 |
+
state["raw_query"],
|
| 501 |
+
style,
|
| 502 |
+
gen_cfg,
|
| 503 |
+
gesture_tag=gesture_tag,
|
| 504 |
+
air_written_text=air_written_text,
|
| 505 |
+
rejected_response=rejected_response,
|
| 506 |
+
intent_kind=intent_kind,
|
| 507 |
+
affect=affect,
|
| 508 |
+
)
|
| 509 |
+
augmented_prompt = "\n\n".join(
|
| 510 |
+
f"[{m['role']}] {m['content']}" for m in default_messages
|
| 511 |
+
)
|
| 512 |
return {
|
| 513 |
"augmented_prompt": augmented_prompt,
|
| 514 |
+
"candidates": uniq,
|
| 515 |
"selected_response": selected,
|
| 516 |
"llm_tier_used": tier,
|
| 517 |
"llm_model_used": active_model(tier),
|
| 518 |
"latency_log": latency_log,
|
| 519 |
+
"guardrail_passed": True,
|
| 520 |
}
|
| 521 |
|
| 522 |
|
|
|
|
| 542 |
gesture_tag: str | None = None,
|
| 543 |
air_written_text: str | None = None,
|
| 544 |
rejected_response: str | None = None,
|
| 545 |
+
rejected_candidates: list[str] | None = None,
|
| 546 |
intent_kind: str = "memory",
|
| 547 |
affect: str = "NEUTRAL",
|
| 548 |
) -> list[dict]:
|
|
|
|
| 561 |
air_written_text,
|
| 562 |
profile["name"],
|
| 563 |
rejected_response=rejected_response,
|
| 564 |
+
rejected_candidates=rejected_candidates,
|
| 565 |
intent_kind=intent_kind,
|
| 566 |
affect=affect,
|
| 567 |
)
|
|
|
|
| 622 |
persona_name: str,
|
| 623 |
*,
|
| 624 |
rejected_response: str | None = None,
|
| 625 |
+
rejected_candidates: list[str] | None = None,
|
| 626 |
intent_kind: str = "memory",
|
| 627 |
affect: str = "NEUTRAL",
|
| 628 |
) -> str:
|
| 629 |
personal_chunks = [c for c in chunks if c.get("source", "personal") == "personal"]
|
| 630 |
contextual_chunks = [c for c in chunks if c.get("source") == "contextual"]
|
| 631 |
open_domain_chunks = [c for c in chunks if c.get("source") == "open_domain"]
|
| 632 |
+
prior_pick_chunks = [c for c in chunks if c.get("source") == "prior_pick"]
|
| 633 |
|
| 634 |
memory_block = (
|
| 635 |
"\n".join(
|
|
|
|
| 638 |
)
|
| 639 |
or " (no memories retrieved)"
|
| 640 |
)
|
| 641 |
+
prior_pick_block = (
|
| 642 |
+
"\n".join(f" {c['text']}" for c in prior_pick_chunks)
|
| 643 |
+
if prior_pick_chunks
|
| 644 |
+
else ""
|
| 645 |
+
)
|
| 646 |
contextual_block = (
|
| 647 |
"\n".join(f" {c['text']}" for c in contextual_chunks)
|
| 648 |
or " (nothing relevant from this session)"
|
|
|
|
| 700 |
f"\nYour previous reply (which you need to replace, not repeat): "
|
| 701 |
f'"{safe_rejected}"'
|
| 702 |
)
|
| 703 |
+
if rejected_candidates:
|
| 704 |
+
safe_list = [
|
| 705 |
+
r.replace('"', "'").replace("\n", " ")[:300] for r in rejected_candidates
|
| 706 |
+
][:10]
|
| 707 |
+
rejected_block = "\n".join(f' - "{r}"' for r in safe_list)
|
| 708 |
+
turnaround_line += (
|
| 709 |
+
f"\nThe user rejected these options you gave last time "
|
| 710 |
+
f"(do NOT re-use their wording or angle):\n{rejected_block}"
|
| 711 |
+
)
|
| 712 |
|
| 713 |
if intent_kind == "present_state":
|
| 714 |
affect_hint = _AFFECT_HINTS.get(affect, _AFFECT_HINTS["NEUTRAL"])
|
|
|
|
| 731 |
- If the affect read is NEUTRAL or doesn't match what you'd say, it's better to say "I'm not sure" or "honestly, I don't really know right now" than to invent.
|
| 732 |
- Do NOT use autobiographical facts (job, family, hobbies) unless the partner asked."""
|
| 733 |
|
| 734 |
+
prior_pick_section = (
|
| 735 |
+
f"\n\nWhen asked this kind of thing before, you answered like:\n{prior_pick_block}\n"
|
| 736 |
+
"Treat this as your own prior voice — re-use the phrasing if it still fits, "
|
| 737 |
+
"or stay in the same register if you'd answer slightly differently now."
|
| 738 |
+
if prior_pick_block
|
| 739 |
+
else ""
|
| 740 |
+
)
|
| 741 |
+
|
| 742 |
return f"""\
|
| 743 |
+
{directive_block}{air_writing_block}{turnaround_line}{persona_instruction_line}{prior_pick_section}
|
| 744 |
|
| 745 |
Personal memories:
|
| 746 |
{memory_block}
|
|
@@ -8,10 +8,17 @@ import torch
|
|
| 8 |
from backend.config.settings import settings
|
| 9 |
from backend.pipeline.intent_kind import is_present_state_only
|
| 10 |
from backend.pipeline.state import PipelineState, RetrievedChunk, SubIntent
|
|
|
|
| 11 |
from backend.retrieval.contextual import retrieve_from_history
|
|
|
|
| 12 |
from backend.retrieval.reranker import build_context_vector, mmr_rerank
|
| 13 |
from backend.retrieval.vector_store import get_device, get_embedder, retrieve
|
| 14 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 15 |
_OPEN_DOMAIN_STUB_TEXT = (
|
| 16 |
"(no external knowledge source wired — answer from general knowledge)"
|
| 17 |
)
|
|
@@ -22,9 +29,14 @@ def run_fast(state: PipelineState) -> dict:
|
|
| 22 |
t0 = time.perf_counter()
|
| 23 |
if is_present_state_only(state.get("intent_route")):
|
| 24 |
return _build_return(state, [], "skipped_present_state", t0, 0.0)
|
|
|
|
|
|
|
| 25 |
final_k = settings.retrieval_fast_k
|
| 26 |
pool_k = settings.rerank_fast_pool_k
|
| 27 |
chunks, t_rerank = _dispatch_all(state, pool_k=pool_k, final_k=final_k)
|
|
|
|
|
|
|
|
|
|
| 28 |
return _build_return(state, chunks, "fast", t0, t_rerank)
|
| 29 |
|
| 30 |
|
|
@@ -33,12 +45,83 @@ def run_full(state: PipelineState) -> dict:
|
|
| 33 |
t0 = time.perf_counter()
|
| 34 |
if is_present_state_only(state.get("intent_route")):
|
| 35 |
return _build_return(state, [], "skipped_present_state", t0, 0.0)
|
|
|
|
|
|
|
| 36 |
final_k = settings.retrieval_rerank_k
|
| 37 |
pool_k = settings.rerank_pool_k
|
| 38 |
chunks, t_rerank = _dispatch_all(state, pool_k=pool_k, final_k=final_k)
|
|
|
|
|
|
|
|
|
|
| 39 |
return _build_return(state, chunks, "full", t0, t_rerank)
|
| 40 |
|
| 41 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 42 |
def _dispatch_all(
|
| 43 |
state: PipelineState, pool_k: int, final_k: int
|
| 44 |
) -> tuple[list[RetrievedChunk], float]:
|
|
|
|
| 8 |
from backend.config.settings import settings
|
| 9 |
from backend.pipeline.intent_kind import is_present_state_only
|
| 10 |
from backend.pipeline.state import PipelineState, RetrievedChunk, SubIntent
|
| 11 |
+
from backend.retrieval import pick_index
|
| 12 |
from backend.retrieval.contextual import retrieve_from_history
|
| 13 |
+
from backend.retrieval.priors import BUCKETS
|
| 14 |
from backend.retrieval.reranker import build_context_vector, mmr_rerank
|
| 15 |
from backend.retrieval.vector_store import get_device, get_embedder, retrieve
|
| 16 |
|
| 17 |
+
# Weight of the pick-history bucket prior, relative to the session bucket prior.
|
| 18 |
+
# 0.3 means: a user who always picks "family" over "medical" gets a noticeable
|
| 19 |
+
# but not overwhelming nudge — session-in-progress signals still dominate.
|
| 20 |
+
_PICK_PRIOR_WEIGHT = 0.3
|
| 21 |
+
|
| 22 |
_OPEN_DOMAIN_STUB_TEXT = (
|
| 23 |
"(no external knowledge source wired — answer from general knowledge)"
|
| 24 |
)
|
|
|
|
| 29 |
t0 = time.perf_counter()
|
| 30 |
if is_present_state_only(state.get("intent_route")):
|
| 31 |
return _build_return(state, [], "skipped_present_state", t0, 0.0)
|
| 32 |
+
session_priors = state.get("bucket_priors")
|
| 33 |
+
_blend_pick_history_into_priors(state)
|
| 34 |
final_k = settings.retrieval_fast_k
|
| 35 |
pool_k = settings.rerank_fast_pool_k
|
| 36 |
chunks, t_rerank = _dispatch_all(state, pool_k=pool_k, final_k=final_k)
|
| 37 |
+
if session_priors is not None:
|
| 38 |
+
state["bucket_priors"] = session_priors # blend was transient
|
| 39 |
+
chunks = _prepend_prior_pick(state, chunks)
|
| 40 |
return _build_return(state, chunks, "fast", t0, t_rerank)
|
| 41 |
|
| 42 |
|
|
|
|
| 45 |
t0 = time.perf_counter()
|
| 46 |
if is_present_state_only(state.get("intent_route")):
|
| 47 |
return _build_return(state, [], "skipped_present_state", t0, 0.0)
|
| 48 |
+
session_priors = state.get("bucket_priors")
|
| 49 |
+
_blend_pick_history_into_priors(state)
|
| 50 |
final_k = settings.retrieval_rerank_k
|
| 51 |
pool_k = settings.rerank_pool_k
|
| 52 |
chunks, t_rerank = _dispatch_all(state, pool_k=pool_k, final_k=final_k)
|
| 53 |
+
if session_priors is not None:
|
| 54 |
+
state["bucket_priors"] = session_priors # blend was transient
|
| 55 |
+
chunks = _prepend_prior_pick(state, chunks)
|
| 56 |
return _build_return(state, chunks, "full", t0, t_rerank)
|
| 57 |
|
| 58 |
|
| 59 |
+
def _blend_pick_history_into_priors(state: PipelineState) -> None:
|
| 60 |
+
"""Mix cumulative bucket-pick counts into this turn's bucket_priors.
|
| 61 |
+
|
| 62 |
+
Mutates state in-place. Session priors still dominate; pick history adds
|
| 63 |
+
a small, steady bias toward buckets the user has historically picked.
|
| 64 |
+
"""
|
| 65 |
+
try:
|
| 66 |
+
counts = pick_index.bucket_pick_counts(state["user_id"])
|
| 67 |
+
except Exception as exc:
|
| 68 |
+
print(f"[retrieval] pick_index.bucket_pick_counts failed: {exc!r}")
|
| 69 |
+
return
|
| 70 |
+
if not counts:
|
| 71 |
+
return
|
| 72 |
+
total = sum(counts.values())
|
| 73 |
+
if total <= 0:
|
| 74 |
+
return
|
| 75 |
+
pick_dist = {b: counts.get(b, 0.0) / total for b in BUCKETS}
|
| 76 |
+
session_priors = state.get("bucket_priors") or {}
|
| 77 |
+
if not session_priors:
|
| 78 |
+
session_priors = {b: 1.0 / len(BUCKETS) for b in BUCKETS}
|
| 79 |
+
blended = {
|
| 80 |
+
b: (1 - _PICK_PRIOR_WEIGHT) * session_priors.get(b, 0.0)
|
| 81 |
+
+ _PICK_PRIOR_WEIGHT * pick_dist[b]
|
| 82 |
+
for b in BUCKETS
|
| 83 |
+
}
|
| 84 |
+
s = sum(blended.values())
|
| 85 |
+
if s > 0:
|
| 86 |
+
blended = {b: v / s for b, v in blended.items()}
|
| 87 |
+
state["bucket_priors"] = blended
|
| 88 |
+
|
| 89 |
+
|
| 90 |
+
def _prepend_prior_pick(
|
| 91 |
+
state: PipelineState, chunks: list[RetrievedChunk]
|
| 92 |
+
) -> list[RetrievedChunk]:
|
| 93 |
+
"""On a side-index hit, surface the previously-picked text as a special
|
| 94 |
+
chunk the LLM sees in its grounding block. Not deduped against personal
|
| 95 |
+
chunks — the prior pick is phrased in the persona's voice and is useful
|
| 96 |
+
even when similar memories are present.
|
| 97 |
+
"""
|
| 98 |
+
try:
|
| 99 |
+
hit = pick_index.lookup(
|
| 100 |
+
query=state["raw_query"], user_id=state["user_id"], threshold=0.85
|
| 101 |
+
)
|
| 102 |
+
except Exception as exc:
|
| 103 |
+
print(f"[retrieval] pick_index.lookup failed: {exc!r}")
|
| 104 |
+
return chunks
|
| 105 |
+
if not hit:
|
| 106 |
+
return chunks
|
| 107 |
+
text = (hit.get("picked_text") or "").strip()
|
| 108 |
+
if not text:
|
| 109 |
+
return chunks
|
| 110 |
+
# Avoid injecting an identical chunk twice (the side_index-strategy
|
| 111 |
+
# candidate in the planner handles that path separately).
|
| 112 |
+
if any(c.get("text") == text for c in chunks):
|
| 113 |
+
return chunks
|
| 114 |
+
prior = RetrievedChunk(
|
| 115 |
+
text=text,
|
| 116 |
+
bucket="prior_pick",
|
| 117 |
+
type="narrative",
|
| 118 |
+
user="",
|
| 119 |
+
score=float(hit.get("match_score", 0.0)),
|
| 120 |
+
source="prior_pick",
|
| 121 |
+
)
|
| 122 |
+
return [prior] + list(chunks)
|
| 123 |
+
|
| 124 |
+
|
| 125 |
def _dispatch_all(
|
| 126 |
state: PipelineState, pool_k: int, final_k: int
|
| 127 |
) -> tuple[list[RetrievedChunk], float]:
|
|
@@ -75,6 +75,13 @@ class LatencyLog(TypedDict):
|
|
| 75 |
# ── Main pipeline state ────────────────────────────────────────────────────────
|
| 76 |
|
| 77 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 78 |
class PipelineState(TypedDict):
|
| 79 |
# ── Session context (set at turn start, stable across nodes) ──────────────
|
| 80 |
user_id: str
|
|
@@ -103,7 +110,8 @@ class PipelineState(TypedDict):
|
|
| 103 |
|
| 104 |
# ── L4: Generation outputs ────────────────────────────────────────────────
|
| 105 |
augmented_prompt: str | None
|
| 106 |
-
candidates: list[
|
|
|
|
| 107 |
selected_response: str | None
|
| 108 |
llm_tier_used: str # "primary" | "fallback"
|
| 109 |
llm_model_used: str # actual model name (e.g. "gemma4:31b-cloud")
|
|
|
|
| 75 |
# ── Main pipeline state ────────────────────────────────────────────────────────
|
| 76 |
|
| 77 |
|
| 78 |
+
class Candidate(TypedDict):
|
| 79 |
+
text: str
|
| 80 |
+
strategy: str # "broad" | "focused" | "serendipitous" | "side_index"
|
| 81 |
+
# chunks fed as the primary grounding for this candidate (bucket/type only)
|
| 82 |
+
grounded_buckets: list[str]
|
| 83 |
+
|
| 84 |
+
|
| 85 |
class PipelineState(TypedDict):
|
| 86 |
# ── Session context (set at turn start, stable across nodes) ──────────────
|
| 87 |
user_id: str
|
|
|
|
| 110 |
|
| 111 |
# ── L4: Generation outputs ────────────────────────────────────────────────
|
| 112 |
augmented_prompt: str | None
|
| 113 |
+
candidates: list[Candidate] # 2-3 candidate responses w/ strategy metadata
|
| 114 |
+
rejected_candidates: list[str] # texts the user already dismissed this turn
|
| 115 |
selected_response: str | None
|
| 116 |
llm_tier_used: str # "primary" | "fallback"
|
| 117 |
llm_model_used: str # actual model name (e.g. "gemma4:31b-cloud")
|
|
@@ -0,0 +1,124 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import json
|
| 2 |
+
import threading
|
| 3 |
+
import time
|
| 4 |
+
from pathlib import Path
|
| 5 |
+
|
| 6 |
+
import torch
|
| 7 |
+
|
| 8 |
+
from backend.config.settings import settings
|
| 9 |
+
from backend.retrieval.vector_store import get_device, get_embedder
|
| 10 |
+
|
| 11 |
+
|
| 12 |
+
def _store_path(user_id: str) -> Path:
|
| 13 |
+
return settings.data_dir / "pick_index" / user_id
|
| 14 |
+
|
| 15 |
+
|
| 16 |
+
# Guards the module-level cache. `add()` and `lookup()` can be called
|
| 17 |
+
# concurrently from an SSE handler and the /chat/pick POST — without this,
|
| 18 |
+
# a concurrent add mid-lookup could see a partially-built tensor.
|
| 19 |
+
_cache_lock = threading.RLock()
|
| 20 |
+
_cache: dict[str, tuple[torch.Tensor, list[dict]]] = {}
|
| 21 |
+
|
| 22 |
+
|
| 23 |
+
def _load(user_id: str) -> tuple[torch.Tensor, list[dict]]:
|
| 24 |
+
with _cache_lock:
|
| 25 |
+
if user_id in _cache:
|
| 26 |
+
return _cache[user_id]
|
| 27 |
+
p = _store_path(user_id)
|
| 28 |
+
if not (p / "vectors.pt").exists():
|
| 29 |
+
empty = torch.empty((0, 0), device=get_device())
|
| 30 |
+
_cache[user_id] = (empty, [])
|
| 31 |
+
return _cache[user_id]
|
| 32 |
+
vecs = torch.load(
|
| 33 |
+
p / "vectors.pt", map_location=get_device(), weights_only=True
|
| 34 |
+
)
|
| 35 |
+
with open(p / "entries.json") as f:
|
| 36 |
+
entries = json.load(f)
|
| 37 |
+
_cache[user_id] = (vecs, entries)
|
| 38 |
+
return _cache[user_id]
|
| 39 |
+
|
| 40 |
+
|
| 41 |
+
def _persist(user_id: str, vecs: torch.Tensor, entries: list[dict]) -> None:
|
| 42 |
+
p = _store_path(user_id)
|
| 43 |
+
p.mkdir(parents=True, exist_ok=True)
|
| 44 |
+
torch.save(vecs.detach().cpu(), p / "vectors.pt")
|
| 45 |
+
with open(p / "entries.json", "w") as f:
|
| 46 |
+
json.dump(entries, f, indent=2)
|
| 47 |
+
|
| 48 |
+
|
| 49 |
+
def lookup(query: str, user_id: str, threshold: float = 0.85) -> dict | None:
|
| 50 |
+
# Snapshot the (vecs, entries) tuple under the lock so a concurrent add()
|
| 51 |
+
# can't swap it out mid-search. Read-only work on the snapshot is safe.
|
| 52 |
+
with _cache_lock:
|
| 53 |
+
vecs, entries = _load(user_id)
|
| 54 |
+
if vecs.numel() == 0 or not entries:
|
| 55 |
+
return None
|
| 56 |
+
embedder = get_embedder()
|
| 57 |
+
q = embedder.encode(
|
| 58 |
+
[query],
|
| 59 |
+
convert_to_tensor=True,
|
| 60 |
+
normalize_embeddings=True,
|
| 61 |
+
device=get_device(),
|
| 62 |
+
)[0]
|
| 63 |
+
scores = vecs @ q
|
| 64 |
+
top_score, top_idx = torch.max(scores, dim=0)
|
| 65 |
+
score = float(top_score)
|
| 66 |
+
if score < threshold:
|
| 67 |
+
return None
|
| 68 |
+
hit = dict(entries[int(top_idx)])
|
| 69 |
+
hit["match_score"] = score
|
| 70 |
+
return hit
|
| 71 |
+
|
| 72 |
+
|
| 73 |
+
def add(
|
| 74 |
+
query: str,
|
| 75 |
+
user_id: str,
|
| 76 |
+
strategy: str,
|
| 77 |
+
picked_text: str,
|
| 78 |
+
picked_buckets: list[str] | None = None,
|
| 79 |
+
) -> None:
|
| 80 |
+
embedder = get_embedder()
|
| 81 |
+
q = embedder.encode(
|
| 82 |
+
[query],
|
| 83 |
+
convert_to_tensor=True,
|
| 84 |
+
normalize_embeddings=True,
|
| 85 |
+
device=get_device(),
|
| 86 |
+
) # (1, D)
|
| 87 |
+
# The whole read-modify-write is locked so two concurrent adds can't
|
| 88 |
+
# both read the same `vecs`, each concat their own vector, and clobber
|
| 89 |
+
# each other on writeback.
|
| 90 |
+
with _cache_lock:
|
| 91 |
+
vecs, entries = _load(user_id)
|
| 92 |
+
new_vecs = q if vecs.numel() == 0 else torch.cat([vecs, q], dim=0)
|
| 93 |
+
new_entries = list(entries) + [
|
| 94 |
+
{
|
| 95 |
+
"query": query,
|
| 96 |
+
"strategy": strategy,
|
| 97 |
+
"picked_text": picked_text,
|
| 98 |
+
"picked_buckets": picked_buckets or [],
|
| 99 |
+
"ts": time.time(),
|
| 100 |
+
}
|
| 101 |
+
]
|
| 102 |
+
_cache[user_id] = (new_vecs, new_entries)
|
| 103 |
+
_persist(user_id, new_vecs, new_entries)
|
| 104 |
+
|
| 105 |
+
|
| 106 |
+
def bucket_pick_counts(user_id: str) -> dict[str, float]:
|
| 107 |
+
"""Cumulative pick counts per bucket for this user.
|
| 108 |
+
|
| 109 |
+
Each pick contributes 1.0 mass split evenly across the buckets grounding
|
| 110 |
+
the picked candidate. Used by retrieval to bias bucket priors toward
|
| 111 |
+
memories the user has historically preferred.
|
| 112 |
+
"""
|
| 113 |
+
with _cache_lock:
|
| 114 |
+
_, entries = _load(user_id)
|
| 115 |
+
entries_snapshot = list(entries)
|
| 116 |
+
counts: dict[str, float] = {}
|
| 117 |
+
for e in entries_snapshot:
|
| 118 |
+
buckets = [b for b in (e.get("picked_buckets") or []) if b]
|
| 119 |
+
if not buckets:
|
| 120 |
+
continue
|
| 121 |
+
share = 1.0 / len(buckets)
|
| 122 |
+
for b in buckets:
|
| 123 |
+
counts[b] = counts.get(b, 0.0) + share
|
| 124 |
+
return counts
|
|
@@ -410,6 +410,110 @@ input[type="text"]:hover {
|
|
| 410 |
color: #ffffff;
|
| 411 |
}
|
| 412 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 413 |
.turnaround-btn {
|
| 414 |
background: transparent !important;
|
| 415 |
color: var(--accent) !important;
|
|
|
|
| 410 |
color: #ffffff;
|
| 411 |
}
|
| 412 |
|
| 413 |
+
.badge-picker {
|
| 414 |
+
background: rgba(0, 0, 0, 0.08);
|
| 415 |
+
color: var(--text);
|
| 416 |
+
}
|
| 417 |
+
|
| 418 |
+
.badge-picked {
|
| 419 |
+
background: rgba(46, 160, 67, 0.18);
|
| 420 |
+
color: #2ea043;
|
| 421 |
+
}
|
| 422 |
+
|
| 423 |
+
.chat-bubble.picker {
|
| 424 |
+
background: transparent;
|
| 425 |
+
padding: 4px 0 0 0;
|
| 426 |
+
max-width: 85%;
|
| 427 |
+
}
|
| 428 |
+
|
| 429 |
+
.candidate-list {
|
| 430 |
+
display: flex;
|
| 431 |
+
flex-direction: column;
|
| 432 |
+
gap: 6px;
|
| 433 |
+
margin-top: 6px;
|
| 434 |
+
}
|
| 435 |
+
|
| 436 |
+
.candidate-card {
|
| 437 |
+
text-align: left;
|
| 438 |
+
background: var(--surface);
|
| 439 |
+
color: var(--text);
|
| 440 |
+
border: 1px solid rgba(0, 0, 0, 0.12);
|
| 441 |
+
border-radius: 10px;
|
| 442 |
+
padding: 10px 12px;
|
| 443 |
+
cursor: pointer;
|
| 444 |
+
transition: border-color 120ms ease, background 120ms ease, transform 80ms ease;
|
| 445 |
+
}
|
| 446 |
+
|
| 447 |
+
.candidate-card:hover {
|
| 448 |
+
border-color: var(--accent);
|
| 449 |
+
background: rgba(59, 130, 246, 0.05);
|
| 450 |
+
}
|
| 451 |
+
|
| 452 |
+
.candidate-card:active {
|
| 453 |
+
transform: scale(0.995);
|
| 454 |
+
}
|
| 455 |
+
|
| 456 |
+
.candidate-strategy {
|
| 457 |
+
font-size: 11px;
|
| 458 |
+
color: rgba(0, 0, 0, 0.55);
|
| 459 |
+
margin-bottom: 4px;
|
| 460 |
+
text-transform: lowercase;
|
| 461 |
+
letter-spacing: 0.02em;
|
| 462 |
+
}
|
| 463 |
+
|
| 464 |
+
.candidate-text {
|
| 465 |
+
font-size: 14px;
|
| 466 |
+
line-height: 1.4;
|
| 467 |
+
}
|
| 468 |
+
|
| 469 |
+
.candidate-list.rejected-round {
|
| 470 |
+
opacity: 0.55;
|
| 471 |
+
margin-bottom: 4px;
|
| 472 |
+
}
|
| 473 |
+
|
| 474 |
+
.rejected-round-label {
|
| 475 |
+
font-size: 10px;
|
| 476 |
+
color: rgba(0, 0, 0, 0.45);
|
| 477 |
+
text-transform: uppercase;
|
| 478 |
+
letter-spacing: 0.05em;
|
| 479 |
+
margin-bottom: 2px;
|
| 480 |
+
}
|
| 481 |
+
|
| 482 |
+
.candidate-card.rejected {
|
| 483 |
+
cursor: default;
|
| 484 |
+
background: rgba(0, 0, 0, 0.03);
|
| 485 |
+
border-color: rgba(0, 0, 0, 0.08);
|
| 486 |
+
}
|
| 487 |
+
|
| 488 |
+
.candidate-card.rejected .candidate-text {
|
| 489 |
+
text-decoration: line-through;
|
| 490 |
+
color: rgba(0, 0, 0, 0.55);
|
| 491 |
+
}
|
| 492 |
+
|
| 493 |
+
.candidate-card.rejected:hover {
|
| 494 |
+
border-color: rgba(0, 0, 0, 0.08);
|
| 495 |
+
background: rgba(0, 0, 0, 0.03);
|
| 496 |
+
}
|
| 497 |
+
|
| 498 |
+
.candidate-card.try-again {
|
| 499 |
+
border-style: dashed;
|
| 500 |
+
border-color: var(--accent);
|
| 501 |
+
background: rgba(59, 130, 246, 0.03);
|
| 502 |
+
}
|
| 503 |
+
|
| 504 |
+
.candidate-card.try-again .candidate-strategy {
|
| 505 |
+
color: var(--accent);
|
| 506 |
+
}
|
| 507 |
+
|
| 508 |
+
.candidate-card.try-again:hover:not(:disabled) {
|
| 509 |
+
background: rgba(59, 130, 246, 0.09);
|
| 510 |
+
}
|
| 511 |
+
|
| 512 |
+
.candidate-card:disabled {
|
| 513 |
+
opacity: 0.5;
|
| 514 |
+
cursor: wait;
|
| 515 |
+
}
|
| 516 |
+
|
| 517 |
.turnaround-btn {
|
| 518 |
background: transparent !important;
|
| 519 |
color: var(--accent) !important;
|
|
@@ -1,8 +1,30 @@
|
|
| 1 |
import { useState, useRef, useEffect, useCallback } from "react";
|
| 2 |
-
import type {
|
| 3 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 4 |
import { EvalPanel } from "./EvalPanel";
|
| 5 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 6 |
interface Props {
|
| 7 |
userId: string | null;
|
| 8 |
personaName: string;
|
|
@@ -18,6 +40,83 @@ interface Props {
|
|
| 18 |
|
| 19 |
const TURNAROUND_WINDOW_MS = 5000;
|
| 20 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 21 |
export function ChatPanel({
|
| 22 |
userId,
|
| 23 |
personaName,
|
|
@@ -33,6 +132,8 @@ export function ChatPanel({
|
|
| 33 |
const [input, setInput] = useState("");
|
| 34 |
const [loading, setLoading] = useState(false);
|
| 35 |
const [turnaroundLoading, setTurnaroundLoading] = useState(false);
|
|
|
|
|
|
|
| 36 |
const bottomRef = useRef<HTMLDivElement>(null);
|
| 37 |
const lastResponseTsRef = useRef<number>(0);
|
| 38 |
const lastTurnIdRef = useRef<number | null>(null);
|
|
@@ -76,7 +177,7 @@ export function ChatPanel({
|
|
| 76 |
const next = [...prev];
|
| 77 |
for (let i = next.length - 1; i >= 0; i--) {
|
| 78 |
if (next[i].role === "aac_user" && !next[i].isTurnaround) {
|
| 79 |
-
next[i] = { ...next[i], rephrased: true };
|
| 80 |
break;
|
| 81 |
}
|
| 82 |
}
|
|
@@ -89,6 +190,8 @@ export function ChatPanel({
|
|
| 89 |
turnId: res.turn_id,
|
| 90 |
evalScores: res.eval_scores ?? null,
|
| 91 |
isTurnaround: true,
|
|
|
|
|
|
|
| 92 |
});
|
| 93 |
return next;
|
| 94 |
});
|
|
@@ -123,6 +226,120 @@ export function ChatPanel({
|
|
| 123 |
]
|
| 124 |
);
|
| 125 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 126 |
useEffect(() => {
|
| 127 |
if (
|
| 128 |
sensing.headSignal !== "HEAD_NOD_DISSATISFIED" &&
|
|
@@ -130,6 +347,23 @@ export function ChatPanel({
|
|
| 130 |
) {
|
| 131 |
return;
|
| 132 |
}
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 133 |
const targetTurnId = lastTurnIdRef.current;
|
| 134 |
const eligible =
|
| 135 |
targetTurnId !== null &&
|
|
@@ -145,63 +379,156 @@ export function ChatPanel({
|
|
| 145 |
// detection fired, then clear it. (Instant clear made detection invisible.)
|
| 146 |
const id = window.setTimeout(() => onHeadSignalConsumed(), 1500);
|
| 147 |
return () => window.clearTimeout(id);
|
| 148 |
-
}, [
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 149 |
|
| 150 |
async function handleSend() {
|
| 151 |
if (!input.trim() || !userId || !backendReady || loading) return;
|
| 152 |
|
| 153 |
const query = input.trim();
|
| 154 |
setInput("");
|
| 155 |
-
setMessages((prev) => [...prev, { role: "partner", content: query }]);
|
| 156 |
setLoading(true);
|
| 157 |
|
| 158 |
const airText = sensing.airWrittenText || null;
|
| 159 |
-
|
| 160 |
-
|
| 161 |
-
|
| 162 |
-
|
| 163 |
-
|
| 164 |
-
|
| 165 |
-
|
| 166 |
-
air_written_text: airText,
|
| 167 |
-
head_signal: sensing.headSignal,
|
| 168 |
-
});
|
| 169 |
-
|
| 170 |
-
lastTurnIdRef.current = res.turn_id;
|
| 171 |
-
setMessages((prev) => [
|
| 172 |
...prev,
|
|
|
|
| 173 |
{
|
| 174 |
-
role: "aac_user",
|
| 175 |
-
content:
|
| 176 |
-
|
| 177 |
-
|
| 178 |
-
runId: res.run_id,
|
| 179 |
-
turnId: res.turn_id,
|
| 180 |
-
evalScores: res.eval_scores ?? null,
|
| 181 |
},
|
| 182 |
-
]
|
| 183 |
-
|
| 184 |
-
|
| 185 |
-
}
|
| 186 |
-
|
| 187 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 188 |
{
|
| 189 |
-
|
| 190 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 191 |
},
|
| 192 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 193 |
} finally {
|
| 194 |
if (airText) onAirTextConsumed();
|
| 195 |
setLoading(false);
|
| 196 |
}
|
| 197 |
}
|
| 198 |
|
| 199 |
-
const
|
| 200 |
-
|
| 201 |
-
|
| 202 |
-
|
| 203 |
-
|
| 204 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 205 |
|
| 206 |
return (
|
| 207 |
<div className="chat-panel">
|
|
@@ -209,39 +536,107 @@ export function ChatPanel({
|
|
| 209 |
Talking as: {personaName || "select a persona"}
|
| 210 |
</div>
|
| 211 |
<div className="chat-messages">
|
| 212 |
-
{messages.map((msg, i) =>
|
| 213 |
-
|
| 214 |
-
|
| 215 |
-
|
| 216 |
-
|
| 217 |
-
|
| 218 |
-
|
| 219 |
-
|
| 220 |
-
|
| 221 |
-
|
| 222 |
-
|
| 223 |
-
|
| 224 |
-
|
| 225 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 226 |
)}
|
| 227 |
-
</
|
| 228 |
-
|
| 229 |
-
|
| 230 |
-
<EvalPanel
|
| 231 |
-
runId={msg.runId}
|
| 232 |
-
userId={userId}
|
| 233 |
-
latencyTotal={msg.latency?.t_total ?? 0}
|
| 234 |
-
evalScores={msg.evalScores ?? null}
|
| 235 |
-
/>
|
| 236 |
-
)}
|
| 237 |
-
</div>
|
| 238 |
-
))}
|
| 239 |
-
{loading && (
|
| 240 |
-
<div className="chat-bubble aac_user loading">
|
| 241 |
-
<span className="chat-role">AAC User</span>
|
| 242 |
-
<p>Generating...</p>
|
| 243 |
-
</div>
|
| 244 |
-
)}
|
| 245 |
{turnaroundLoading && (
|
| 246 |
<div className="chat-bubble aac_user loading">
|
| 247 |
<span className="chat-role">AAC User</span>
|
|
@@ -262,15 +657,6 @@ export function ChatPanel({
|
|
| 262 |
<button onClick={handleSend} disabled={!userId || loading || !backendReady || !input.trim()}>
|
| 263 |
Send
|
| 264 |
</button>
|
| 265 |
-
<button
|
| 266 |
-
type="button"
|
| 267 |
-
className="turnaround-btn"
|
| 268 |
-
onClick={() => handleTurnaround("manual")}
|
| 269 |
-
disabled={!canTurnaround}
|
| 270 |
-
title="Re-plan the last response (also triggered by a head shake / sharp nod)"
|
| 271 |
-
>
|
| 272 |
-
↻ Not quite right
|
| 273 |
-
</button>
|
| 274 |
</div>
|
| 275 |
</div>
|
| 276 |
);
|
|
|
|
| 1 |
import { useState, useRef, useEffect, useCallback } from "react";
|
| 2 |
+
import type {
|
| 3 |
+
Affect,
|
| 4 |
+
Candidate,
|
| 5 |
+
ChatMessage,
|
| 6 |
+
LatencyLog,
|
| 7 |
+
SensingState,
|
| 8 |
+
} from "../types";
|
| 9 |
+
import {
|
| 10 |
+
sendPick,
|
| 11 |
+
sendTurnaround,
|
| 12 |
+
streamChat,
|
| 13 |
+
streamRegenerate,
|
| 14 |
+
} from "../lib/api";
|
| 15 |
import { EvalPanel } from "./EvalPanel";
|
| 16 |
|
| 17 |
+
const STRATEGY_LABELS: Record<string, string> = {
|
| 18 |
+
broad: "broad — all memories",
|
| 19 |
+
focused: "focused — top memory",
|
| 20 |
+
serendipitous: "serendipitous — other memory",
|
| 21 |
+
side_index: "like last time",
|
| 22 |
+
present_good: "feeling good",
|
| 23 |
+
present_fine: "doing okay",
|
| 24 |
+
present_rough: "not great",
|
| 25 |
+
pending: "",
|
| 26 |
+
};
|
| 27 |
+
|
| 28 |
interface Props {
|
| 29 |
userId: string | null;
|
| 30 |
personaName: string;
|
|
|
|
| 40 |
|
| 41 |
const TURNAROUND_WINDOW_MS = 5000;
|
| 42 |
|
| 43 |
+
// Batches token deltas per (msgIdx, candIdx) and flushes them in a single
|
| 44 |
+
// setState call per animation frame. Streaming tokens at 30-60/s × 3 candidates
|
| 45 |
+
// otherwise causes a rerender per token. Non-token events (start/done/complete)
|
| 46 |
+
// flush the pending deltas first to preserve ordering.
|
| 47 |
+
//
|
| 48 |
+
// INVARIANT: keys are message indices into the messages[] array. Callers must
|
| 49 |
+
// ensure no message is inserted *before* a streaming message for the duration
|
| 50 |
+
// of its stream — appending to the end is fine, mid-list insert is not. Today
|
| 51 |
+
// every path appends to the end; if that changes, switch to a stable message
|
| 52 |
+
// id (e.g. the placeholder's runId or a freshly-minted uuid).
|
| 53 |
+
function useTokenBatcher(
|
| 54 |
+
setMessages: React.Dispatch<React.SetStateAction<ChatMessage[]>>,
|
| 55 |
+
) {
|
| 56 |
+
// Lazy-init refs to avoid allocating a fresh Map on every render.
|
| 57 |
+
const pending = useRef<Map<number, Map<number, string>> | null>(null);
|
| 58 |
+
if (pending.current === null) pending.current = new Map();
|
| 59 |
+
const rafId = useRef<number | null>(null);
|
| 60 |
+
|
| 61 |
+
const flush = useCallback(() => {
|
| 62 |
+
rafId.current = null;
|
| 63 |
+
const batch = pending.current;
|
| 64 |
+
if (!batch || batch.size === 0) return;
|
| 65 |
+
pending.current = new Map();
|
| 66 |
+
setMessages((prev) =>
|
| 67 |
+
prev.map((m, i) => {
|
| 68 |
+
const perCand = batch.get(i);
|
| 69 |
+
if (!perCand) return m;
|
| 70 |
+
const cands = [...(m.candidates ?? [])];
|
| 71 |
+
for (const [ci, delta] of perCand) {
|
| 72 |
+
if (cands[ci]) {
|
| 73 |
+
cands[ci] = { ...cands[ci], text: cands[ci].text + delta };
|
| 74 |
+
}
|
| 75 |
+
}
|
| 76 |
+
return { ...m, candidates: cands };
|
| 77 |
+
}),
|
| 78 |
+
);
|
| 79 |
+
}, [setMessages]);
|
| 80 |
+
|
| 81 |
+
const queueToken = useCallback(
|
| 82 |
+
(msgIdx: number, candIdx: number, delta: string) => {
|
| 83 |
+
const batch = pending.current!;
|
| 84 |
+
let perMsg = batch.get(msgIdx);
|
| 85 |
+
if (!perMsg) {
|
| 86 |
+
perMsg = new Map();
|
| 87 |
+
batch.set(msgIdx, perMsg);
|
| 88 |
+
}
|
| 89 |
+
perMsg.set(candIdx, (perMsg.get(candIdx) ?? "") + delta);
|
| 90 |
+
if (rafId.current === null) {
|
| 91 |
+
rafId.current = window.requestAnimationFrame(flush);
|
| 92 |
+
}
|
| 93 |
+
},
|
| 94 |
+
[flush],
|
| 95 |
+
);
|
| 96 |
+
|
| 97 |
+
const flushNow = useCallback(() => {
|
| 98 |
+
if (rafId.current !== null) {
|
| 99 |
+
window.cancelAnimationFrame(rafId.current);
|
| 100 |
+
rafId.current = null;
|
| 101 |
+
}
|
| 102 |
+
flush();
|
| 103 |
+
}, [flush]);
|
| 104 |
+
|
| 105 |
+
// Cancel any pending rAF on unmount — otherwise a persona switch mid-stream
|
| 106 |
+
// leaves a scheduled flush that calls setMessages against the new state.
|
| 107 |
+
useEffect(() => {
|
| 108 |
+
return () => {
|
| 109 |
+
if (rafId.current !== null) {
|
| 110 |
+
window.cancelAnimationFrame(rafId.current);
|
| 111 |
+
rafId.current = null;
|
| 112 |
+
}
|
| 113 |
+
pending.current = null;
|
| 114 |
+
};
|
| 115 |
+
}, []);
|
| 116 |
+
|
| 117 |
+
return { queueToken, flushNow };
|
| 118 |
+
}
|
| 119 |
+
|
| 120 |
export function ChatPanel({
|
| 121 |
userId,
|
| 122 |
personaName,
|
|
|
|
| 132 |
const [input, setInput] = useState("");
|
| 133 |
const [loading, setLoading] = useState(false);
|
| 134 |
const [turnaroundLoading, setTurnaroundLoading] = useState(false);
|
| 135 |
+
const [regenerateLoading, setRegenerateLoading] = useState(false);
|
| 136 |
+
const { queueToken, flushNow } = useTokenBatcher(setMessages);
|
| 137 |
const bottomRef = useRef<HTMLDivElement>(null);
|
| 138 |
const lastResponseTsRef = useRef<number>(0);
|
| 139 |
const lastTurnIdRef = useRef<number | null>(null);
|
|
|
|
| 177 |
const next = [...prev];
|
| 178 |
for (let i = next.length - 1; i >= 0; i--) {
|
| 179 |
if (next[i].role === "aac_user" && !next[i].isTurnaround) {
|
| 180 |
+
next[i] = { ...next[i], rephrased: true, picked: true };
|
| 181 |
break;
|
| 182 |
}
|
| 183 |
}
|
|
|
|
| 190 |
turnId: res.turn_id,
|
| 191 |
evalScores: res.eval_scores ?? null,
|
| 192 |
isTurnaround: true,
|
| 193 |
+
candidates: res.candidates ?? [],
|
| 194 |
+
picked: true,
|
| 195 |
});
|
| 196 |
return next;
|
| 197 |
});
|
|
|
|
| 226 |
]
|
| 227 |
);
|
| 228 |
|
| 229 |
+
const handleRegenerate = useCallback(
|
| 230 |
+
async (msgIdx: number) => {
|
| 231 |
+
if (!userId || !backendReady || regenerateLoading || loading) return;
|
| 232 |
+
const msg = messages[msgIdx];
|
| 233 |
+
if (!msg || !msg.candidates || msg.picked || msg.turnId === undefined) return;
|
| 234 |
+
|
| 235 |
+
const currentRound = msg.candidates;
|
| 236 |
+
const priorRounds = msg.rejectedRounds ?? [];
|
| 237 |
+
const rejected_texts = [
|
| 238 |
+
...priorRounds.flat().map((c) => c.text),
|
| 239 |
+
...currentRound.map((c) => c.text),
|
| 240 |
+
];
|
| 241 |
+
|
| 242 |
+
setRegenerateLoading(true);
|
| 243 |
+
|
| 244 |
+
// Move the current round into rejectedRounds + clear candidates so the
|
| 245 |
+
// UI shows empty-card placeholders while streams fill in.
|
| 246 |
+
setMessages((prev) =>
|
| 247 |
+
prev.map((m, i) =>
|
| 248 |
+
i === msgIdx
|
| 249 |
+
? {
|
| 250 |
+
...m,
|
| 251 |
+
candidates: [],
|
| 252 |
+
rejectedRounds: [...priorRounds, currentRound],
|
| 253 |
+
picked: false,
|
| 254 |
+
}
|
| 255 |
+
: m,
|
| 256 |
+
),
|
| 257 |
+
);
|
| 258 |
+
|
| 259 |
+
const updateMsg = (
|
| 260 |
+
updater: (m: ChatMessage) => ChatMessage,
|
| 261 |
+
) => {
|
| 262 |
+
setMessages((prev) =>
|
| 263 |
+
prev.map((m, i) => (i === msgIdx ? updater(m) : m)),
|
| 264 |
+
);
|
| 265 |
+
};
|
| 266 |
+
|
| 267 |
+
try {
|
| 268 |
+
await streamRegenerate(
|
| 269 |
+
{
|
| 270 |
+
user_id: userId,
|
| 271 |
+
turn_id: msg.turnId,
|
| 272 |
+
rejected_texts,
|
| 273 |
+
},
|
| 274 |
+
(evt) => {
|
| 275 |
+
if (evt.type === "token") {
|
| 276 |
+
queueToken(msgIdx, evt.idx, evt.delta);
|
| 277 |
+
return;
|
| 278 |
+
}
|
| 279 |
+
flushNow();
|
| 280 |
+
if (evt.type === "candidate_start") {
|
| 281 |
+
updateMsg((m) => {
|
| 282 |
+
const cands = [...(m.candidates ?? [])];
|
| 283 |
+
while (cands.length <= evt.idx) {
|
| 284 |
+
cands.push({
|
| 285 |
+
text: "",
|
| 286 |
+
strategy: "pending",
|
| 287 |
+
grounded_buckets: [],
|
| 288 |
+
});
|
| 289 |
+
}
|
| 290 |
+
cands[evt.idx] = {
|
| 291 |
+
text: "",
|
| 292 |
+
strategy: evt.strategy,
|
| 293 |
+
grounded_buckets: evt.grounded_buckets,
|
| 294 |
+
};
|
| 295 |
+
return { ...m, candidates: cands };
|
| 296 |
+
});
|
| 297 |
+
} else if (evt.type === "candidate_done") {
|
| 298 |
+
updateMsg((m) => {
|
| 299 |
+
const cands = [...(m.candidates ?? [])];
|
| 300 |
+
if (cands[evt.idx]) {
|
| 301 |
+
cands[evt.idx] = { ...cands[evt.idx], text: evt.text };
|
| 302 |
+
}
|
| 303 |
+
return { ...m, candidates: cands };
|
| 304 |
+
});
|
| 305 |
+
} else if (evt.type === "complete") {
|
| 306 |
+
const res = evt.response;
|
| 307 |
+
lastTurnIdRef.current = res.turn_id;
|
| 308 |
+
updateMsg((m) => ({
|
| 309 |
+
...m,
|
| 310 |
+
content: res.response,
|
| 311 |
+
latency: res.latency,
|
| 312 |
+
affect: res.affect,
|
| 313 |
+
runId: res.run_id,
|
| 314 |
+
turnId: res.turn_id,
|
| 315 |
+
evalScores: res.eval_scores ?? null,
|
| 316 |
+
candidates: res.candidates ?? m.candidates ?? [],
|
| 317 |
+
picked: false,
|
| 318 |
+
}));
|
| 319 |
+
onLatency(res.latency);
|
| 320 |
+
}
|
| 321 |
+
},
|
| 322 |
+
);
|
| 323 |
+
} catch (e) {
|
| 324 |
+
flushNow();
|
| 325 |
+
console.warn("streamRegenerate failed", e);
|
| 326 |
+
} finally {
|
| 327 |
+
setRegenerateLoading(false);
|
| 328 |
+
}
|
| 329 |
+
},
|
| 330 |
+
[
|
| 331 |
+
userId,
|
| 332 |
+
backendReady,
|
| 333 |
+
regenerateLoading,
|
| 334 |
+
loading,
|
| 335 |
+
messages,
|
| 336 |
+
setMessages,
|
| 337 |
+
queueToken,
|
| 338 |
+
flushNow,
|
| 339 |
+
onLatency,
|
| 340 |
+
]
|
| 341 |
+
);
|
| 342 |
+
|
| 343 |
useEffect(() => {
|
| 344 |
if (
|
| 345 |
sensing.headSignal !== "HEAD_NOD_DISSATISFIED" &&
|
|
|
|
| 347 |
) {
|
| 348 |
return;
|
| 349 |
}
|
| 350 |
+
|
| 351 |
+
// If the most recent AAC message has an open picker, head-signal means
|
| 352 |
+
// "regenerate" — the user hasn't committed, so there's nothing to
|
| 353 |
+
// "rephrase" yet.
|
| 354 |
+
let openPickerIdx = -1;
|
| 355 |
+
for (let i = messages.length - 1; i >= 0; i--) {
|
| 356 |
+
const m = messages[i];
|
| 357 |
+
if (m.role !== "aac_user") continue;
|
| 358 |
+
if (!m.picked && (m.candidates?.length ?? 0) > 1) openPickerIdx = i;
|
| 359 |
+
break;
|
| 360 |
+
}
|
| 361 |
+
if (openPickerIdx !== -1) {
|
| 362 |
+
handleRegenerate(openPickerIdx);
|
| 363 |
+
onHeadSignalConsumed();
|
| 364 |
+
return;
|
| 365 |
+
}
|
| 366 |
+
|
| 367 |
const targetTurnId = lastTurnIdRef.current;
|
| 368 |
const eligible =
|
| 369 |
targetTurnId !== null &&
|
|
|
|
| 379 |
// detection fired, then clear it. (Instant clear made detection invisible.)
|
| 380 |
const id = window.setTimeout(() => onHeadSignalConsumed(), 1500);
|
| 381 |
return () => window.clearTimeout(id);
|
| 382 |
+
}, [
|
| 383 |
+
sensing.headSignal,
|
| 384 |
+
handleTurnaround,
|
| 385 |
+
handleRegenerate,
|
| 386 |
+
onHeadSignalConsumed,
|
| 387 |
+
messages,
|
| 388 |
+
]);
|
| 389 |
|
| 390 |
async function handleSend() {
|
| 391 |
if (!input.trim() || !userId || !backendReady || loading) return;
|
| 392 |
|
| 393 |
const query = input.trim();
|
| 394 |
setInput("");
|
|
|
|
| 395 |
setLoading(true);
|
| 396 |
|
| 397 |
const airText = sensing.airWrittenText || null;
|
| 398 |
+
|
| 399 |
+
// Push the partner bubble, and a placeholder AAC message we'll fill in
|
| 400 |
+
// progressively. We need the placeholder's index to target updates — use
|
| 401 |
+
// a ref captured from the setter so we don't rely on stale state.
|
| 402 |
+
let placeholderIdx = -1;
|
| 403 |
+
setMessages((prev) => {
|
| 404 |
+
const next = [
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 405 |
...prev,
|
| 406 |
+
{ role: "partner" as const, content: query },
|
| 407 |
{
|
| 408 |
+
role: "aac_user" as const,
|
| 409 |
+
content: "",
|
| 410 |
+
candidates: [] as Candidate[],
|
| 411 |
+
picked: false,
|
|
|
|
|
|
|
|
|
|
| 412 |
},
|
| 413 |
+
];
|
| 414 |
+
placeholderIdx = next.length - 1;
|
| 415 |
+
return next;
|
| 416 |
+
});
|
| 417 |
+
|
| 418 |
+
const updatePlaceholder = (
|
| 419 |
+
updater: (m: ChatMessage) => ChatMessage,
|
| 420 |
+
) => {
|
| 421 |
+
setMessages((prev) =>
|
| 422 |
+
prev.map((m, i) => (i === placeholderIdx ? updater(m) : m)),
|
| 423 |
+
);
|
| 424 |
+
};
|
| 425 |
+
|
| 426 |
+
try {
|
| 427 |
+
await streamChat(
|
| 428 |
{
|
| 429 |
+
user_id: userId,
|
| 430 |
+
query,
|
| 431 |
+
affect_override: affectOverride ?? sensing.affect,
|
| 432 |
+
gesture_tag: sensing.gestureTag,
|
| 433 |
+
gaze_bucket: sensing.gazeBucket,
|
| 434 |
+
air_written_text: airText,
|
| 435 |
+
head_signal: sensing.headSignal,
|
| 436 |
},
|
| 437 |
+
(evt) => {
|
| 438 |
+
if (evt.type === "token") {
|
| 439 |
+
queueToken(placeholderIdx, evt.idx, evt.delta);
|
| 440 |
+
return;
|
| 441 |
+
}
|
| 442 |
+
// Any non-token event must see the latest text — flush the queue first.
|
| 443 |
+
flushNow();
|
| 444 |
+
if (evt.type === "candidate_start") {
|
| 445 |
+
updatePlaceholder((m) => {
|
| 446 |
+
const cands = [...(m.candidates ?? [])];
|
| 447 |
+
while (cands.length <= evt.idx) {
|
| 448 |
+
cands.push({
|
| 449 |
+
text: "",
|
| 450 |
+
strategy: "pending",
|
| 451 |
+
grounded_buckets: [],
|
| 452 |
+
});
|
| 453 |
+
}
|
| 454 |
+
cands[evt.idx] = {
|
| 455 |
+
text: "",
|
| 456 |
+
strategy: evt.strategy,
|
| 457 |
+
grounded_buckets: evt.grounded_buckets,
|
| 458 |
+
};
|
| 459 |
+
return { ...m, candidates: cands };
|
| 460 |
+
});
|
| 461 |
+
} else if (evt.type === "candidate_done") {
|
| 462 |
+
updatePlaceholder((m) => {
|
| 463 |
+
const cands = [...(m.candidates ?? [])];
|
| 464 |
+
if (cands[evt.idx]) {
|
| 465 |
+
cands[evt.idx] = { ...cands[evt.idx], text: evt.text };
|
| 466 |
+
}
|
| 467 |
+
return { ...m, candidates: cands };
|
| 468 |
+
});
|
| 469 |
+
} else if (evt.type === "complete") {
|
| 470 |
+
const res = evt.response;
|
| 471 |
+
lastTurnIdRef.current = res.turn_id;
|
| 472 |
+
updatePlaceholder((m) => ({
|
| 473 |
+
...m,
|
| 474 |
+
content: res.response,
|
| 475 |
+
latency: res.latency,
|
| 476 |
+
affect: res.affect,
|
| 477 |
+
runId: res.run_id,
|
| 478 |
+
turnId: res.turn_id,
|
| 479 |
+
evalScores: res.eval_scores ?? null,
|
| 480 |
+
candidates: res.candidates ?? m.candidates ?? [],
|
| 481 |
+
picked: (res.candidates ?? []).length <= 1,
|
| 482 |
+
}));
|
| 483 |
+
onLatency(res.latency);
|
| 484 |
+
lastResponseTsRef.current = performance.now();
|
| 485 |
+
}
|
| 486 |
+
},
|
| 487 |
+
);
|
| 488 |
+
} catch (e) {
|
| 489 |
+
flushNow();
|
| 490 |
+
updatePlaceholder((m) => ({
|
| 491 |
+
...m,
|
| 492 |
+
content: `Error: ${e instanceof Error ? e.message : "request failed"}`,
|
| 493 |
+
}));
|
| 494 |
} finally {
|
| 495 |
if (airText) onAirTextConsumed();
|
| 496 |
setLoading(false);
|
| 497 |
}
|
| 498 |
}
|
| 499 |
|
| 500 |
+
const handlePick = useCallback(
|
| 501 |
+
async (msgIdx: number, candIdx: number) => {
|
| 502 |
+
const msg = messages[msgIdx];
|
| 503 |
+
if (!msg || !msg.candidates || !msg.runId || !userId) return;
|
| 504 |
+
if (msg.picked) return;
|
| 505 |
+
const picked = msg.candidates[candIdx];
|
| 506 |
+
if (!picked) return;
|
| 507 |
+
|
| 508 |
+
setMessages((prev) =>
|
| 509 |
+
prev.map((m, i) =>
|
| 510 |
+
i === msgIdx
|
| 511 |
+
? {
|
| 512 |
+
...m,
|
| 513 |
+
content: picked.text,
|
| 514 |
+
picked: true,
|
| 515 |
+
pickedIdx: candIdx,
|
| 516 |
+
}
|
| 517 |
+
: m
|
| 518 |
+
)
|
| 519 |
+
);
|
| 520 |
+
try {
|
| 521 |
+
await sendPick({
|
| 522 |
+
run_id: msg.runId,
|
| 523 |
+
user_id: userId,
|
| 524 |
+
picked_idx: candIdx,
|
| 525 |
+
});
|
| 526 |
+
} catch (e) {
|
| 527 |
+
console.warn("sendPick failed", e);
|
| 528 |
+
}
|
| 529 |
+
},
|
| 530 |
+
[messages, setMessages, userId]
|
| 531 |
+
);
|
| 532 |
|
| 533 |
return (
|
| 534 |
<div className="chat-panel">
|
|
|
|
| 536 |
Talking as: {personaName || "select a persona"}
|
| 537 |
</div>
|
| 538 |
<div className="chat-messages">
|
| 539 |
+
{messages.map((msg, i) => {
|
| 540 |
+
const hasRegenerated = (msg.rejectedRounds?.length ?? 0) > 0;
|
| 541 |
+
const showPicker =
|
| 542 |
+
msg.role === "aac_user" &&
|
| 543 |
+
!msg.picked &&
|
| 544 |
+
!!msg.candidates &&
|
| 545 |
+
(msg.candidates.length > 1 || hasRegenerated);
|
| 546 |
+
|
| 547 |
+
if (showPicker) {
|
| 548 |
+
const priorRounds = msg.rejectedRounds ?? [];
|
| 549 |
+
return (
|
| 550 |
+
<div key={i} className="chat-bubble aac_user picker">
|
| 551 |
+
<span className="chat-role">
|
| 552 |
+
AAC User
|
| 553 |
+
<span className="badge badge-picker">
|
| 554 |
+
pick one ({msg.candidates!.length} options)
|
| 555 |
+
</span>
|
| 556 |
+
</span>
|
| 557 |
+
{priorRounds.map((round, ri) => (
|
| 558 |
+
<div key={`r${ri}`} className="candidate-list rejected-round">
|
| 559 |
+
<div className="rejected-round-label">
|
| 560 |
+
rejected round {ri + 1}
|
| 561 |
+
</div>
|
| 562 |
+
{round.map((cand, ci) => (
|
| 563 |
+
<div key={ci} className="candidate-card rejected">
|
| 564 |
+
<div className="candidate-strategy">
|
| 565 |
+
{STRATEGY_LABELS[cand.strategy] ?? cand.strategy}
|
| 566 |
+
</div>
|
| 567 |
+
<div className="candidate-text">{cand.text}</div>
|
| 568 |
+
</div>
|
| 569 |
+
))}
|
| 570 |
+
</div>
|
| 571 |
+
))}
|
| 572 |
+
<div className="candidate-list">
|
| 573 |
+
{msg.candidates!.map((cand, ci) => (
|
| 574 |
+
<button
|
| 575 |
+
key={ci}
|
| 576 |
+
type="button"
|
| 577 |
+
className="candidate-card"
|
| 578 |
+
onClick={() => handlePick(i, ci)}
|
| 579 |
+
disabled={regenerateLoading}
|
| 580 |
+
title="Click to send this one"
|
| 581 |
+
>
|
| 582 |
+
<div className="candidate-strategy">
|
| 583 |
+
{STRATEGY_LABELS[cand.strategy] ?? cand.strategy}
|
| 584 |
+
</div>
|
| 585 |
+
<div className="candidate-text">{cand.text}</div>
|
| 586 |
+
</button>
|
| 587 |
+
))}
|
| 588 |
+
<button
|
| 589 |
+
type="button"
|
| 590 |
+
className="candidate-card try-again"
|
| 591 |
+
onClick={() => handleRegenerate(i)}
|
| 592 |
+
disabled={regenerateLoading}
|
| 593 |
+
title="None of these fit — generate fresh options"
|
| 594 |
+
>
|
| 595 |
+
<div className="candidate-strategy">try again</div>
|
| 596 |
+
<div className="candidate-text">
|
| 597 |
+
{regenerateLoading
|
| 598 |
+
? "Regenerating…"
|
| 599 |
+
: "↻ None of these fit — try different angles"}
|
| 600 |
+
</div>
|
| 601 |
+
</button>
|
| 602 |
+
</div>
|
| 603 |
+
</div>
|
| 604 |
+
);
|
| 605 |
+
}
|
| 606 |
+
|
| 607 |
+
return (
|
| 608 |
+
<div
|
| 609 |
+
key={i}
|
| 610 |
+
className={`chat-bubble ${msg.role}${
|
| 611 |
+
msg.rephrased ? " rephrased" : ""
|
| 612 |
+
}${msg.isTurnaround ? " turnaround" : ""}`}
|
| 613 |
+
>
|
| 614 |
+
<span className="chat-role">
|
| 615 |
+
{msg.role === "partner" ? "Partner" : "AAC User"}
|
| 616 |
+
{msg.rephrased && (
|
| 617 |
+
<span className="badge badge-rephrased"> rephrased</span>
|
| 618 |
+
)}
|
| 619 |
+
{msg.isTurnaround && (
|
| 620 |
+
<span className="badge badge-turnaround"> ↻ turnaround</span>
|
| 621 |
+
)}
|
| 622 |
+
{msg.picked && msg.pickedIdx !== undefined && msg.candidates && msg.candidates[msg.pickedIdx] && (
|
| 623 |
+
<span className="badge badge-picked">
|
| 624 |
+
✓ {STRATEGY_LABELS[msg.candidates[msg.pickedIdx].strategy] ?? msg.candidates[msg.pickedIdx].strategy}
|
| 625 |
+
</span>
|
| 626 |
+
)}
|
| 627 |
+
</span>
|
| 628 |
+
<p>{msg.content}</p>
|
| 629 |
+
{msg.role === "aac_user" && msg.runId && userId && (
|
| 630 |
+
<EvalPanel
|
| 631 |
+
runId={msg.runId}
|
| 632 |
+
userId={userId}
|
| 633 |
+
latencyTotal={msg.latency?.t_total ?? 0}
|
| 634 |
+
evalScores={msg.evalScores ?? null}
|
| 635 |
+
/>
|
| 636 |
)}
|
| 637 |
+
</div>
|
| 638 |
+
);
|
| 639 |
+
})}
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 640 |
{turnaroundLoading && (
|
| 641 |
<div className="chat-bubble aac_user loading">
|
| 642 |
<span className="chat-role">AAC User</span>
|
|
|
|
| 657 |
<button onClick={handleSend} disabled={!userId || loading || !backendReady || !input.trim()}>
|
| 658 |
Send
|
| 659 |
</button>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 660 |
</div>
|
| 661 |
</div>
|
| 662 |
);
|
|
@@ -44,6 +44,102 @@ export async function resetSession(userId: string): Promise<void> {
|
|
| 44 |
if (!res.ok) throw new Error(`API error: ${res.status}`);
|
| 45 |
}
|
| 46 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 47 |
export async function submitRating(args: {
|
| 48 |
run_id: string;
|
| 49 |
user_id: string;
|
|
|
|
| 44 |
if (!res.ok) throw new Error(`API error: ${res.status}`);
|
| 45 |
}
|
| 46 |
|
| 47 |
+
export type StreamEvent =
|
| 48 |
+
| { type: "candidate_start"; idx: number; strategy: string; grounded_buckets: string[] }
|
| 49 |
+
| { type: "token"; idx: number; delta: string }
|
| 50 |
+
| { type: "candidate_done"; idx: number; text: string }
|
| 51 |
+
| { type: "candidate_error"; idx: number; error: string }
|
| 52 |
+
| { type: "complete"; response: ChatResponse }
|
| 53 |
+
| { type: "error"; message: string };
|
| 54 |
+
|
| 55 |
+
async function readSSE(
|
| 56 |
+
res: Response,
|
| 57 |
+
onEvent: (evt: StreamEvent) => void,
|
| 58 |
+
): Promise<void> {
|
| 59 |
+
if (!res.body) throw new Error("no response body");
|
| 60 |
+
const reader = res.body.getReader();
|
| 61 |
+
const decoder = new TextDecoder();
|
| 62 |
+
let buffer = "";
|
| 63 |
+
|
| 64 |
+
const emitFrame = (frame: string) => {
|
| 65 |
+
const line = frame.split("\n").find((l) => l.startsWith("data:"));
|
| 66 |
+
if (!line) return;
|
| 67 |
+
const json = line.slice(5).trim();
|
| 68 |
+
if (!json) return;
|
| 69 |
+
try {
|
| 70 |
+
onEvent(JSON.parse(json) as StreamEvent);
|
| 71 |
+
} catch (e) {
|
| 72 |
+
console.warn("SSE parse failed", e, json.slice(0, 200));
|
| 73 |
+
}
|
| 74 |
+
};
|
| 75 |
+
|
| 76 |
+
while (true) {
|
| 77 |
+
const { done, value } = await reader.read();
|
| 78 |
+
if (done) break;
|
| 79 |
+
buffer += decoder.decode(value, { stream: true });
|
| 80 |
+
// SSE frames are separated by blank lines.
|
| 81 |
+
const parts = buffer.split("\n\n");
|
| 82 |
+
buffer = parts.pop() ?? "";
|
| 83 |
+
for (const part of parts) emitFrame(part);
|
| 84 |
+
}
|
| 85 |
+
// Server closed cleanly but the final frame didn't end with \n\n —
|
| 86 |
+
// emit whatever remains so the terminal event isn't dropped.
|
| 87 |
+
if (buffer.trim()) emitFrame(buffer);
|
| 88 |
+
}
|
| 89 |
+
|
| 90 |
+
export async function streamChat(
|
| 91 |
+
req: ChatRequest,
|
| 92 |
+
onEvent: (evt: StreamEvent) => void,
|
| 93 |
+
): Promise<void> {
|
| 94 |
+
const res = await fetch(`${API_BASE}/chat/stream`, {
|
| 95 |
+
method: "POST",
|
| 96 |
+
headers: { "Content-Type": "application/json" },
|
| 97 |
+
body: JSON.stringify(req),
|
| 98 |
+
});
|
| 99 |
+
if (!res.ok) throw new Error(`API error: ${res.status}`);
|
| 100 |
+
await readSSE(res, onEvent);
|
| 101 |
+
}
|
| 102 |
+
|
| 103 |
+
export async function streamRegenerate(
|
| 104 |
+
args: { user_id: string; turn_id: number; rejected_texts: string[] },
|
| 105 |
+
onEvent: (evt: StreamEvent) => void,
|
| 106 |
+
): Promise<void> {
|
| 107 |
+
const res = await fetch(`${API_BASE}/chat/regenerate/stream`, {
|
| 108 |
+
method: "POST",
|
| 109 |
+
headers: { "Content-Type": "application/json" },
|
| 110 |
+
body: JSON.stringify(args),
|
| 111 |
+
});
|
| 112 |
+
if (!res.ok) throw new Error(`API error: ${res.status}`);
|
| 113 |
+
await readSSE(res, onEvent);
|
| 114 |
+
}
|
| 115 |
+
|
| 116 |
+
export async function sendRegenerate(args: {
|
| 117 |
+
user_id: string;
|
| 118 |
+
turn_id: number;
|
| 119 |
+
rejected_texts: string[];
|
| 120 |
+
}): Promise<ChatResponse> {
|
| 121 |
+
const res = await fetch(`${API_BASE}/chat/regenerate`, {
|
| 122 |
+
method: "POST",
|
| 123 |
+
headers: { "Content-Type": "application/json" },
|
| 124 |
+
body: JSON.stringify(args),
|
| 125 |
+
});
|
| 126 |
+
if (!res.ok) throw new Error(`API error: ${res.status}`);
|
| 127 |
+
return res.json();
|
| 128 |
+
}
|
| 129 |
+
|
| 130 |
+
export async function sendPick(args: {
|
| 131 |
+
run_id: string;
|
| 132 |
+
user_id: string;
|
| 133 |
+
picked_idx: number;
|
| 134 |
+
}): Promise<void> {
|
| 135 |
+
const res = await fetch(`${API_BASE}/chat/pick`, {
|
| 136 |
+
method: "POST",
|
| 137 |
+
headers: { "Content-Type": "application/json" },
|
| 138 |
+
body: JSON.stringify(args),
|
| 139 |
+
});
|
| 140 |
+
if (!res.ok) throw new Error(`API error: ${res.status}`);
|
| 141 |
+
}
|
| 142 |
+
|
| 143 |
export async function submitRating(args: {
|
| 144 |
run_id: string;
|
| 145 |
user_id: string;
|
|
@@ -66,10 +66,23 @@ export interface EvalScores {
|
|
| 66 |
gaze_alignment: number;
|
| 67 |
}
|
| 68 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 69 |
export interface ChatResponse {
|
| 70 |
user_id: string;
|
| 71 |
query: string;
|
| 72 |
response: string;
|
|
|
|
| 73 |
affect: string;
|
| 74 |
llm_tier: string;
|
| 75 |
retrieval_mode: string;
|
|
@@ -90,4 +103,10 @@ export interface ChatMessage {
|
|
| 90 |
rephrased?: boolean;
|
| 91 |
isTurnaround?: boolean;
|
| 92 |
evalScores?: EvalScores | null;
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 93 |
}
|
|
|
|
| 66 |
gaze_alignment: number;
|
| 67 |
}
|
| 68 |
|
| 69 |
+
export type CandidateStrategy =
|
| 70 |
+
| "broad"
|
| 71 |
+
| "focused"
|
| 72 |
+
| "serendipitous"
|
| 73 |
+
| "side_index";
|
| 74 |
+
|
| 75 |
+
export interface Candidate {
|
| 76 |
+
text: string;
|
| 77 |
+
strategy: CandidateStrategy | string;
|
| 78 |
+
grounded_buckets: string[];
|
| 79 |
+
}
|
| 80 |
+
|
| 81 |
export interface ChatResponse {
|
| 82 |
user_id: string;
|
| 83 |
query: string;
|
| 84 |
response: string;
|
| 85 |
+
candidates: Candidate[];
|
| 86 |
affect: string;
|
| 87 |
llm_tier: string;
|
| 88 |
retrieval_mode: string;
|
|
|
|
| 103 |
rephrased?: boolean;
|
| 104 |
isTurnaround?: boolean;
|
| 105 |
evalScores?: EvalScores | null;
|
| 106 |
+
candidates?: Candidate[];
|
| 107 |
+
// picked becomes true after the user clicks one — also locks in `content` to the picked text
|
| 108 |
+
picked?: boolean;
|
| 109 |
+
pickedIdx?: number;
|
| 110 |
+
// Candidates from prior regeneration rounds — rendered struck-through above the active picker
|
| 111 |
+
rejectedRounds?: Candidate[][];
|
| 112 |
}
|