Commit 084a2f9
Parent(s): 978ca55
dropped more bloat
- .gitignore +1 -1
- CLAUDE.md +2 -2
- README.md +5 -5
- backend/api/main.py +0 -1
- backend/config/settings.py +1 -7
- backend/pipeline/nodes/planner.py +8 -61
- backend/retrieval/vector_store.py +2 -2
- setup.sh +1 -1
.gitignore
CHANGED
@@ -17,7 +17,7 @@ env/
 .env
 
 # Data — indexes are rebuilt from source; do NOT commit binaries
-data/
+data/vector_store/
 
 # Per-turn JSONL logs (contain user conversation content)
 logs/
CLAUDE.md
CHANGED
@@ -110,7 +110,7 @@ Copy `.env.example` → `.env` and set:
 |------|---------|
 | `data/users.json` | Flat user index (id, name, condition, style) |
 | `data/memories/<uid>.json` | Full persona JSON with bucketed memories |
-| `data/
+| `data/vector_store/<uid>/` | `vectors.pt` + `meta.json` — **rebuild after any persona edit** |
 | `data/generate_users.py` | Regenerates memories + users.json |
 
 ---
@@ -140,6 +140,6 @@ Copy `.env.example` → `.env` and set:
 - **Guardrail tuning**: edit signal lists in `backend/guardrails/checks.py`
 - **Affect → generation mapping**: `_AFFECT_CONFIG` in `backend/pipeline/nodes/intent.py`
   and `_PERSONA_TONE_OVERRIDES` in `backend/pipeline/nodes/planner.py`
-- Vector indexes in `data/
+- Vector indexes in `data/vector_store/` are gitignored — rebuilt from source JSONs
   via `python -m backend.retrieval.vector_store`
 - Frontend uses pnpm, Node 22+
README.md
CHANGED
@@ -80,7 +80,7 @@ The setup script handles:
 - Python dependency installation
 - `.env` file creation from template
 - Vector index building (downloads BGE-small embedder on first run, saves
-  per-user `vectors.pt` under `data/
+  per-user `vectors.pt` under `data/vector_store/`)
 - Frontend dependency installation (pnpm)
 
 ---
@@ -160,7 +160,7 @@ multimodal_aac_chatbot/
 ├── data/
 │   ├── users.json       Persona index
 │   ├── memories/        Per-persona memory JSONs
-│   └──
+│   └── vector_store/    vectors.pt + meta.json (gitignored, rebuilt)
 ├── logs/                Per-turn JSONL logs (gitignored)
 │
 ├── setup.sh             One-time setup script
@@ -196,7 +196,7 @@ Heads up: all camera/sensing stuff is in the frontend (MediaPipe JS). Backend ju
 - [ ] **[Core]** Memories are only autobiographical narratives right now. Need more variety:
   - [ ] social media posts (voice-matched, synth with LLM)
   - [ ] past chat logs (synth with LLM)
-  - [ ] update the generator script + rebuild
+  - [ ] update the generator script + rebuild vector store
   - [ ] tag chunks by type so retriever knows what it pulled
 - [ ] **[Core]** Write down the data schema somewhere so evals can reuse it
 
@@ -214,7 +214,7 @@ Heads up: all camera/sensing stuff is in the frontend (MediaPipe JS). Backend ju
 
 > Current state: routing is keyword-based, not LLM-based. The original LLM router (Pydantic-validated JSON) kept emitting the wrong shape with `gemma4:31b-cloud` and hitting the `max_tokens` truncation — 3 retries + hard fallback on every turn, ~30s of dead latency before generation. The keyword router (5 buckets matched against word lists in `intent.py`) handles the demo personas and adds ~0ms. Trade-off: stuck with the 5 hardcoded buckets (`family`, `medical`, `hobbies`, `daily_routine`, `social`) and can't tell `OPEN_DOMAIN` from `PERSONAL`. Fine for now since all personas only have personal memories. Revisit when Ollama Cloud ships `response_format=json_schema` or we add a tiny local classifier.
 
-- [ ] **[Core]** Personal / Contextual / Open-domain all hit the same
+- [ ] **[Core]** Personal / Contextual / Open-domain all hit the same vector index right now. Make them actually go different places — open-domain → web search (or stub), contextual → session memory
 - [ ] intent node is slow. Cache the prompt, use a tiny model for routing, parallelise the sub-queries
 
 ### Retrieval
@@ -230,7 +230,7 @@ Heads up: all camera/sensing stuff is in the frontend (MediaPipe JS). Backend ju
 
 - [ ] **[Core]** API returns one response. Should return multiple candidates so the user can pick (and so the next item works)
 - [ ] **[Core]** Frontend needs a candidate picker — show all the options, let the user click one, send the selection back
-- [ ] **[Bonus]** When user picks a candidate, save the `(query, picked)` pair to a side
+- [ ] **[Bonus]** When user picks a candidate, save the `(query, picked)` pair to a side vector index and check it first next turn
 
 ### Evals
 
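The README note in this diff describes the keyword router as five buckets matched against word lists in `intent.py`. A minimal sketch of that idea follows. The bucket names come from the README; the word lists, the `BUCKET_KEYWORDS` name, and the `route_query` function are hypothetical, since `intent.py` is not part of this diff.

```python
# Hypothetical word lists: the real ones live in intent.py and are not shown in this diff.
BUCKET_KEYWORDS: dict[str, set[str]] = {
    "family": {"mom", "dad", "sister", "brother", "family"},
    "medical": {"doctor", "therapy", "medication", "hospital"},
    "hobbies": {"painting", "music", "garden", "game"},
    "daily_routine": {"morning", "breakfast", "bedtime", "routine"},
    "social": {"friend", "party", "neighbor", "visit"},
}


def route_query(query: str) -> list[str]:
    """Return every bucket whose word list shares at least one word with the query."""
    # Lowercase and strip simple trailing/leading punctuation before matching.
    words = {w.strip(".,!?").lower() for w in query.split()}
    # Buckets come back in dictionary order; an empty list means no bucket matched.
    return [bucket for bucket, keywords in BUCKET_KEYWORDS.items() if words & keywords]


if __name__ == "__main__":
    print(route_query("Did the doctor change your morning routine?"))
    print(route_query("What is the capital of France?"))
```

The second call returns an empty list, which illustrates the trade-off the README mentions: a pure keyword match has no way to label a query as open-domain, it can only fail to match a personal bucket.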
backend/api/main.py
CHANGED
@@ -175,7 +175,6 @@ def debug_config():
         "retrieval_rerank_k": settings.retrieval_rerank_k,
         "fallback_latency_threshold": settings.fallback_latency_threshold,
         "slo_target_s": settings.slo_target_s,
-        "num_candidates": settings.num_candidates,
     }
 
 
backend/config/settings.py
CHANGED
@@ -10,7 +10,7 @@ class Settings(BaseSettings):
 
     # ── Paths ──────────────────────────────────────────────────────────────────
     data_dir: Path = Path("data")
-
+    vector_store_dir: Path = Path("data/vector_store")
     memories_dir: Path = Path("data/memories")
     users_json: Path = Path("data/users.json")
     logs_dir: Path = Path("logs")
@@ -45,7 +45,6 @@ class Settings(BaseSettings):
     max_tokens_neutral: int = 100
     max_tokens_frustrated: int = 60
     max_tokens_surprised: int = 80
-    num_candidates: int = 2  # responses generated per turn for ranking
 
     # ── Sensing ───────────────────────────────────────────────────────────────
     affect_ema_alpha: float = 0.3  # exponential moving average smoothing
@@ -55,11 +54,6 @@ class Settings(BaseSettings):
     air_write_end_gap_ms: int = 200  # ms of stillness to end a stroke
     conflict_overlap_ms: int = 500  # audio + gesture co-occurrence window
 
-    # ── Candidate ranking weights ───────────────────────────────────────────────
-    rank_alpha: float = 0.4  # faithfulness weight
-    rank_beta: float = 0.3  # style similarity weight
-    rank_gamma: float = 0.3  # affect-match weight
-
     # ── Evaluation ────────────────────────────────────────────────────────────
     slo_target_s: float = 6.0  # max acceptable response latency (seconds)
 
backend/pipeline/nodes/planner.py
CHANGED
@@ -46,7 +46,7 @@ def _run(state: PipelineState, tier: str) -> dict:
     affect = (state.get("affect") or {}).get("emotion", "NEUTRAL")
     gen_cfg = state.get("generation_config") or {}
     chunks = state.get("retrieved_chunks") or []
-    history = (state.get("session_history") or [])[-
+    history = (state.get("session_history") or [])[-20:]
 
     tone_tag = _resolve_tone_tag(
         user_id, affect, gen_cfg.get("tone_tag", "[TONE:DEFAULT]")
@@ -64,21 +64,13 @@ def _run(state: PipelineState, tier: str) -> dict:
         air_written_text=air_written_text,
     )
 
-
-
-
-
-
-            temperature=0.7,
-            tier=tier,
-        )
-        candidates.append(text)
-
-    selected = _rank_candidates(
-        candidates, chunks, affect, profile, gesture_tag=gesture_tag
+    selected = chat_complete(
+        messages=[{"role": "user", "content": prompt}],
+        max_tokens=gen_cfg.get("max_tokens", settings.max_tokens_neutral),
+        temperature=0.4,
+        tier=tier,
     )
 
-    # Guardrail — replace with safe fallback if output breaks persona
     guard = check_output(selected, chunks)
     if not guard["passed"]:
         selected = guard["fallback"]
@@ -96,7 +88,7 @@ def _run(state: PipelineState, tier: str) -> dict:
 
     return {
         "augmented_prompt": prompt,
-        "candidates":
+        "candidates": [selected],
         "selected_response": selected,
         "llm_tier_used": tier,
         "llm_model_used": active_model(tier),
@@ -147,7 +139,7 @@ def _build_prompt(
     }.get(persona_mod, "Use your natural communication style.")
 
     return f"""\
-You are {profile["name"]}
+You are {profile["name"]}. You have {profile["condition"]} and communicate through an AAC device, but your voice and thoughts are fully your own.
 Communication style: {profile["style"]}
 {tone_tag}{gesture_line}{air_writing_line}
 
@@ -170,48 +162,3 @@ Instructions:
 - Do NOT say "As an AI" or break persona.
 
 Response:"""
-
-
-def _rank_candidates(
-    candidates: list[str],
-    chunks: list[dict],
-    affect: str,
-    profile: dict,
-    gesture_tag: str | None = None,
-) -> str:
-    if not candidates:
-        return "I don't know."
-    if len(candidates) == 1:
-        return candidates[0]
-
-    evidence_words = set(" ".join(c["text"] for c in chunks).lower().split())
-    style_words = set(profile.get("style", "").lower().split())
-
-    affect_positive_map = {
-        "HAPPY": ["great", "love", "enjoy", "happy", "fun"],
-        "FRUSTRATED": ["okay", "fine", "sure", "yes", "no"],
-        "NEUTRAL": [],
-        "SURPRISED": ["really", "oh", "interesting", "wow"],
-    }
-    gesture_word_map = {
-        "THUMBS_UP": ["yes", "good", "agree", "great", "sure"],
-        "THUMBS_DOWN": ["no", "disagree", "stop", "don't"],
-        "POINTING": ["that", "this", "there", "see"],
-        "WAVING": ["hello", "hi", "bye", "goodbye"],
-    }
-    affect_words = set(affect_positive_map.get(affect, [])) | set(
-        gesture_word_map.get(gesture_tag or "", [])
-    )
-
-    def score(c: str) -> float:
-        words = set(c.lower().split())
-        faithful = len(words & evidence_words) / max(len(words), 1)
-        style_sim = len(words & style_words) / max(len(words), 1)
-        affect_m = len(words & affect_words) / max(len(words), 1)
-        return (
-            settings.rank_alpha * faithful
-            + settings.rank_beta * style_sim
-            + settings.rank_gamma * affect_m
-        )
-
-    return max(candidates, key=score)
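For readers who want to see what the removed ranking step computed, here is a standalone sketch of the same weighted word-overlap score. The default weights (0.4, 0.3, 0.3) are the `rank_alpha`, `rank_beta`, and `rank_gamma` values this commit deletes from `settings.py`. The `overlap` and `rank_candidates` names and the flattened argument list are illustrative assumptions, not the repository's API.

```python
def overlap(words: set[str], reference: set[str]) -> float:
    """Fraction of the candidate's distinct words that also appear in the reference set."""
    return len(words & reference) / max(len(words), 1)


def rank_candidates(
    candidates: list[str],
    evidence: str,
    style: str,
    affect_words: set[str],
    alpha: float = 0.4,  # faithfulness weight (removed rank_alpha default)
    beta: float = 0.3,   # style similarity weight (removed rank_beta default)
    gamma: float = 0.3,  # affect-match weight (removed rank_gamma default)
) -> str:
    """Pick the candidate with the highest weighted bag-of-words overlap score."""
    if not candidates:
        return "I don't know."  # same empty-input fallback as the removed function

    evidence_words = set(evidence.lower().split())
    style_words = set(style.lower().split())

    def score(candidate: str) -> float:
        words = set(candidate.lower().split())
        return (
            alpha * overlap(words, evidence_words)
            + beta * overlap(words, style_words)
            + gamma * overlap(words, affect_words)
        )

    return max(candidates, key=score)


if __name__ == "__main__":
    picked = rank_candidates(
        ["i love my garden", "the weather is fine"],
        evidence="my garden has roses",
        style="warm and chatty",
        affect_words={"love", "great"},
    )
    print(picked)
```

In this example the first candidate scores 0.4 * 0.5 + 0.3 * 0.25 = 0.275 and the second scores 0, so the first is picked. The score only counts shared surface words, so it does not measure whether a response is actually faithful to the retrieved evidence.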
backend/retrieval/vector_store.py
CHANGED
@@ -37,7 +37,7 @@ _index_cache: dict[str, tuple[torch.Tensor, list[dict]]] = {}
 
 def load_index(user_id: str) -> tuple[torch.Tensor, list[dict]]:
     if user_id not in _index_cache:
-        store_path = settings.
+        store_path = settings.vector_store_dir / user_id
         vecs = torch.load(
             store_path / "vectors.pt", map_location=_DEVICE, weights_only=True
         )
@@ -126,7 +126,7 @@ def build_all(
     store_dir: str | Path | None = None,
 ) -> None:
     memories_dir = Path(memories_dir or settings.memories_dir)
-    store_dir = Path(store_dir or settings.
+    store_dir = Path(store_dir or settings.vector_store_dir)
 
     print(f"Embedder device: {_DEVICE}")
     for persona_file in sorted(memories_dir.glob("*.json")):
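CLAUDE.md in this commit says the per-user index under `data/vector_store/<uid>/` must be rebuilt after any persona edit. One way to check which users need a rebuild is to compare file modification times, sketched below. The `users_needing_rebuild` helper is hypothetical and not part of the repository. It assumes the layout shown in the diff: one `<uid>.json` per persona in the memories directory and a `vectors.pt` in each user's store directory.

```python
from pathlib import Path


def users_needing_rebuild(memories_dir: Path, store_dir: Path) -> list[str]:
    """List user ids whose vectors.pt is missing or older than their persona JSON."""
    stale: list[str] = []
    for persona_file in sorted(memories_dir.glob("*.json")):
        uid = persona_file.stem  # data/memories/<uid>.json -> <uid>
        vectors = store_dir / uid / "vectors.pt"
        # A missing index, or one written before the last persona edit, is stale.
        if not vectors.exists() or vectors.stat().st_mtime < persona_file.stat().st_mtime:
            stale.append(uid)
    return stale


if __name__ == "__main__":
    print(users_needing_rebuild(Path("data/memories"), Path("data/vector_store")))
```

A non-empty result would be the cue to rerun `python -m backend.retrieval.vector_store`. Modification times are only a heuristic; a content hash stored next to the index would be more robust.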
setup.sh
CHANGED
@@ -39,7 +39,7 @@ fi
 
 info "Building vector indexes (downloads BGE-small embedder on first run)..."
 python -m backend.retrieval.vector_store
-ok "Vector indexes built in data/
+ok "Vector indexes built in data/vector_store/"
 
 # Ollama: tiers point at Ollama Cloud — no local pull needed. Just check the
 # daemon is reachable so the OpenAI-compatible proxy works.