Commit 084a2f9 · shwetangisingh committed · 1 Parent(s): 978ca55

dropped more bloat

.gitignore CHANGED
@@ -17,7 +17,7 @@ env/
 .env
 
 # Data — indexes are rebuilt from source; do NOT commit binaries
-data/faiss_store/
+data/vector_store/
 
 # Per-turn JSONL logs (contain user conversation content)
 logs/
CLAUDE.md CHANGED
@@ -110,7 +110,7 @@ Copy `.env.example` → `.env` and set:
 |------|---------|
 | `data/users.json` | Flat user index (id, name, condition, style) |
 | `data/memories/<uid>.json` | Full persona JSON with bucketed memories |
-| `data/faiss_store/<uid>/` | `vectors.pt` + `meta.json` — **rebuild after any persona edit** |
+| `data/vector_store/<uid>/` | `vectors.pt` + `meta.json` — **rebuild after any persona edit** |
 | `data/generate_users.py` | Regenerates memories + users.json |
 
 ---
@@ -140,6 +140,6 @@ Copy `.env.example` → `.env` and set:
 - **Guardrail tuning**: edit signal lists in `backend/guardrails/checks.py`
 - **Affect → generation mapping**: `_AFFECT_CONFIG` in `backend/pipeline/nodes/intent.py`
   and `_PERSONA_TONE_OVERRIDES` in `backend/pipeline/nodes/planner.py`
-- Vector indexes in `data/faiss_store/` are gitignored — rebuilt from source JSONs
+- Vector indexes in `data/vector_store/` are gitignored — rebuilt from source JSONs
   via `python -m backend.retrieval.vector_store`
 - Frontend uses pnpm, Node 22+
README.md CHANGED
@@ -80,7 +80,7 @@ The setup script handles:
 - Python dependency installation
 - `.env` file creation from template
 - Vector index building (downloads BGE-small embedder on first run, saves
-  per-user `vectors.pt` under `data/faiss_store/`)
+  per-user `vectors.pt` under `data/vector_store/`)
 - Frontend dependency installation (pnpm)
 
 ---
@@ -160,7 +160,7 @@ multimodal_aac_chatbot/
 ├── data/
 │   ├── users.json        Persona index
 │   ├── memories/         Per-persona memory JSONs
-│   └── faiss_store/      vectors.pt + meta.json (gitignored, rebuilt)
+│   └── vector_store/     vectors.pt + meta.json (gitignored, rebuilt)
 ├── logs/                 Per-turn JSONL logs (gitignored)
 
 ├── setup.sh              One-time setup script
@@ -196,7 +196,7 @@ Heads up: all camera/sensing stuff is in the frontend (MediaPipe JS). Backend ju
 - [ ] **[Core]** Memories are only autobiographical narratives right now. Need more variety:
   - [ ] social media posts (voice-matched, synth with LLM)
   - [ ] past chat logs (synth with LLM)
-  - [ ] update the generator script + rebuild faiss
+  - [ ] update the generator script + rebuild vector store
   - [ ] tag chunks by type so retriever knows what it pulled
 - [ ] **[Core]** Write down the data schema somewhere so evals can reuse it
 
@@ -214,7 +214,7 @@ Heads up: all camera/sensing stuff is in the frontend (MediaPipe JS). Backend ju
 
 > Current state: routing is keyword-based, not LLM-based. The original LLM router (Pydantic-validated JSON) kept emitting the wrong shape with `gemma4:31b-cloud` and hitting the `max_tokens` truncation — 3 retries + hard fallback on every turn, ~30s of dead latency before generation. The keyword router (5 buckets matched against word lists in `intent.py`) handles the demo personas and adds ~0ms. Trade-off: stuck with the 5 hardcoded buckets (`family`, `medical`, `hobbies`, `daily_routine`, `social`) and can't tell `OPEN_DOMAIN` from `PERSONAL`. Fine for now since all personas only have personal memories. Revisit when Ollama Cloud ships `response_format=json_schema` or we add a tiny local classifier.
 
-- [ ] **[Core]** Personal / Contextual / Open-domain all hit the same FAISS index right now. Make them actually go different places — open-domain → web search (or stub), contextual → session memory
+- [ ] **[Core]** Personal / Contextual / Open-domain all hit the same vector index right now. Make them actually go different places — open-domain → web search (or stub), contextual → session memory
 - [ ] intent node is slow. Cache the prompt, use a tiny model for routing, parallelise the sub-queries
 
 ### Retrieval
@@ -230,7 +230,7 @@ Heads up: all camera/sensing stuff is in the frontend (MediaPipe JS). Backend ju
 
 - [ ] **[Core]** API returns one response. Should return multiple candidates so the user can pick (and so the next item works)
 - [ ] **[Core]** Frontend needs a candidate picker — show all the options, let the user click one, send the selection back
-- [ ] **[Bonus]** When user picks a candidate, save the `(query, picked)` pair to a side faiss index and check it first next turn
+- [ ] **[Bonus]** When user picks a candidate, save the `(query, picked)` pair to a side vector index and check it first next turn
 
 ### Evals
 
backend/api/main.py CHANGED
@@ -175,7 +175,6 @@ def debug_config():
         "retrieval_rerank_k": settings.retrieval_rerank_k,
         "fallback_latency_threshold": settings.fallback_latency_threshold,
         "slo_target_s": settings.slo_target_s,
-        "num_candidates": settings.num_candidates,
     }
 
 
backend/config/settings.py CHANGED
@@ -10,7 +10,7 @@ class Settings(BaseSettings):
 
     # ── Paths ──────────────────────────────────────────────────────────────────
     data_dir: Path = Path("data")
-    faiss_store_dir: Path = Path("data/faiss_store") # name kept for back-compat
+    vector_store_dir: Path = Path("data/vector_store")
     memories_dir: Path = Path("data/memories")
     users_json: Path = Path("data/users.json")
     logs_dir: Path = Path("logs")
@@ -45,7 +45,6 @@ class Settings(BaseSettings):
     max_tokens_neutral: int = 100
     max_tokens_frustrated: int = 60
     max_tokens_surprised: int = 80
-    num_candidates: int = 2 # responses generated per turn for ranking
 
     # ── Sensing ───────────────────────────────────────────────────────────────
     affect_ema_alpha: float = 0.3 # exponential moving average smoothing
@@ -55,11 +54,6 @@ class Settings(BaseSettings):
     air_write_end_gap_ms: int = 200 # ms of stillness to end a stroke
     conflict_overlap_ms: int = 500 # audio + gesture co-occurrence window
 
-    # ── Candidate ranking weights ───────────────────────────────────────────────
-    rank_alpha: float = 0.4 # faithfulness weight
-    rank_beta: float = 0.3 # style similarity weight
-    rank_gamma: float = 0.3 # affect-match weight
-
     # ── Evaluation ────────────────────────────────────────────────────────────
     slo_target_s: float = 6.0 # max acceptable response latency (seconds)
 
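Note on the rename: the path setting no longer carries its back-compat name and now defaults to `data/vector_store`, so a checkout that already has indexes under `data/faiss_store/` will not find them until they are rebuilt (`python -m backend.retrieval.vector_store`) or moved. A minimal sketch of a one-off move, assuming the on-disk layout is otherwise unchanged; `migrate_store` is a hypothetical helper and is not part of this commit:

```python
import shutil
from pathlib import Path


def migrate_store(old_dir: Path, new_dir: Path) -> bool:
    """Move an existing index directory to the new location.

    Returns True if a move happened, False if there was nothing to move
    or the destination already exists (never overwrites).
    """
    if not old_dir.is_dir() or new_dir.exists():
        return False
    # Make sure the parent (e.g. data/) exists before moving.
    new_dir.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(old_dir), str(new_dir))
    return True
```

Calling `migrate_store(Path("data/faiss_store"), Path("data/vector_store"))` once from the repository root would be enough; rebuilding from the source JSONs remains the safer option after any persona edit.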
 
backend/pipeline/nodes/planner.py CHANGED
@@ -46,7 +46,7 @@ def _run(state: PipelineState, tier: str) -> dict:
     affect = (state.get("affect") or {}).get("emotion", "NEUTRAL")
     gen_cfg = state.get("generation_config") or {}
     chunks = state.get("retrieved_chunks") or []
-    history = (state.get("session_history") or [])[-3:] # last 3 turns only
+    history = (state.get("session_history") or [])[-20:]
 
     tone_tag = _resolve_tone_tag(
         user_id, affect, gen_cfg.get("tone_tag", "[TONE:DEFAULT]")
@@ -64,21 +64,13 @@ def _run(state: PipelineState, tier: str) -> dict:
         air_written_text=air_written_text,
     )
 
-    candidates: list[str] = []
-    for _ in range(settings.num_candidates):
-        text = chat_complete(
-            messages=[{"role": "user", "content": prompt}],
-            max_tokens=gen_cfg.get("max_tokens", settings.max_tokens_neutral),
-            temperature=0.7,
-            tier=tier,
-        )
-        candidates.append(text)
-
-    selected = _rank_candidates(
-        candidates, chunks, affect, profile, gesture_tag=gesture_tag
+    selected = chat_complete(
+        messages=[{"role": "user", "content": prompt}],
+        max_tokens=gen_cfg.get("max_tokens", settings.max_tokens_neutral),
+        temperature=0.4,
+        tier=tier,
     )
 
-    # Guardrail — replace with safe fallback if output breaks persona
     guard = check_output(selected, chunks)
     if not guard["passed"]:
         selected = guard["fallback"]
@@ -96,7 +88,7 @@ def _run(state: PipelineState, tier: str) -> dict:
 
     return {
         "augmented_prompt": prompt,
-        "candidates": candidates,
+        "candidates": [selected],
         "selected_response": selected,
         "llm_tier_used": tier,
         "llm_model_used": active_model(tier),
@@ -147,7 +139,7 @@ def _build_prompt(
     }.get(persona_mod, "Use your natural communication style.")
 
     return f"""\
-You are {profile["name"]}, an AAC device user with {profile["condition"]}.
+You are {profile["name"]}. You have {profile["condition"]} and communicate through an AAC device, but your voice and thoughts are fully your own.
 Communication style: {profile["style"]}
 {tone_tag}{gesture_line}{air_writing_line}
 
@@ -170,48 +162,3 @@ Instructions:
 - Do NOT say "As an AI" or break persona.
 
 Response:"""
-
-
-def _rank_candidates(
-    candidates: list[str],
-    chunks: list[dict],
-    affect: str,
-    profile: dict,
-    gesture_tag: str | None = None,
-) -> str:
-    if not candidates:
-        return "I don't know."
-    if len(candidates) == 1:
-        return candidates[0]
-
-    evidence_words = set(" ".join(c["text"] for c in chunks).lower().split())
-    style_words = set(profile.get("style", "").lower().split())
-
-    affect_positive_map = {
-        "HAPPY": ["great", "love", "enjoy", "happy", "fun"],
-        "FRUSTRATED": ["okay", "fine", "sure", "yes", "no"],
-        "NEUTRAL": [],
-        "SURPRISED": ["really", "oh", "interesting", "wow"],
-    }
-    gesture_word_map = {
-        "THUMBS_UP": ["yes", "good", "agree", "great", "sure"],
-        "THUMBS_DOWN": ["no", "disagree", "stop", "don't"],
-        "POINTING": ["that", "this", "there", "see"],
-        "WAVING": ["hello", "hi", "bye", "goodbye"],
-    }
-    affect_words = set(affect_positive_map.get(affect, [])) | set(
-        gesture_word_map.get(gesture_tag or "", [])
-    )
-
-    def score(c: str) -> float:
-        words = set(c.lower().split())
-        faithful = len(words & evidence_words) / max(len(words), 1)
-        style_sim = len(words & style_words) / max(len(words), 1)
-        affect_m = len(words & affect_words) / max(len(words), 1)
-        return (
-            settings.rank_alpha * faithful
-            + settings.rank_beta * style_sim
-            + settings.rank_gamma * affect_m
-        )
-
-    return max(candidates, key=score)
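For reference while reviewing the removal: `_rank_candidates` scored each candidate as a weighted sum of three word-overlap ratios, using the `rank_alpha` / `rank_beta` / `rank_gamma` defaults (0.4, 0.3, 0.3) that this commit deletes from `settings.py`. The sketch below reproduces that arithmetic on made-up inputs; `overlap`, the standalone `score`, and the sample strings are illustrative assumptions, not code from the repository:

```python
def overlap(words: set[str], vocab: set[str]) -> float:
    """Fraction of the candidate's distinct words that also appear in vocab."""
    return len(words & vocab) / max(len(words), 1)


def score(candidate, evidence, style, affect_words,
          alpha=0.4, beta=0.3, gamma=0.3) -> float:
    # Same shape as the removed scorer: lowercase, split on whitespace,
    # then weight faithfulness, style similarity, and affect match.
    words = set(candidate.lower().split())
    return (
        alpha * overlap(words, evidence)
        + beta * overlap(words, style)
        + gamma * overlap(words, affect_words)
    )


# Made-up inputs for illustration only.
evidence = set("i love gardening with my sister".split())
style = {"warm", "brief"}
happy = {"great", "love", "enjoy", "happy", "fun"}
candidates = ["I love gardening", "Fine"]

# "I love gardening": faithful 3/3, style 0/3, affect 1/3 -> about 0.5
# "Fine":             no overlap with any set             -> 0.0
best = max(candidates, key=lambda c: score(c, evidence, style, happy))
```

With a single `chat_complete` call at `temperature=0.4` there is nothing left to rank, which is why `candidates` now carries only `[selected]`.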
backend/retrieval/vector_store.py CHANGED
@@ -37,7 +37,7 @@ _index_cache: dict[str, tuple[torch.Tensor, list[dict]]] = {}
 
 def load_index(user_id: str) -> tuple[torch.Tensor, list[dict]]:
     if user_id not in _index_cache:
-        store_path = settings.faiss_store_dir / user_id
+        store_path = settings.vector_store_dir / user_id
         vecs = torch.load(
             store_path / "vectors.pt", map_location=_DEVICE, weights_only=True
         )
@@ -126,7 +126,7 @@ def build_all(
     store_dir: str | Path | None = None,
 ) -> None:
     memories_dir = Path(memories_dir or settings.memories_dir)
-    store_dir = Path(store_dir or settings.faiss_store_dir)
+    store_dir = Path(store_dir or settings.vector_store_dir)
 
     print(f"Embedder device: {_DEVICE}")
     for persona_file in sorted(memories_dir.glob("*.json")):
setup.sh CHANGED
@@ -39,7 +39,7 @@ fi
 
 info "Building vector indexes (downloads BGE-small embedder on first run)..."
 python -m backend.retrieval.vector_store
-ok "Vector indexes built in data/faiss_store/"
+ok "Vector indexes built in data/vector_store/"
 
 # Ollama: tiers point at Ollama Cloud — no local pull needed. Just check the
 # daemon is reachable so the OpenAI-compatible proxy works.