Spaces: ub-aac-chatbot/aac-chatbot

shwetangisingh committed on Apr 17

Commit 084a2f9 (1 parent: 978ca55)

dropped more bloat

Files changed (8)

.gitignore +1 -1
CLAUDE.md +2 -2
README.md +5 -5
backend/api/main.py +0 -1
backend/config/settings.py +1 -7
backend/pipeline/nodes/planner.py +8 -61
backend/retrieval/vector_store.py +2 -2
setup.sh +1 -1

.gitignore CHANGED

@@ -17,7 +17,7 @@ env/
 .env
 # Data — indexes are rebuilt from source; do NOT commit binaries
-data/faiss_store/
 # Per-turn JSONL logs (contain user conversation content)
 logs/

 .env
 # Data — indexes are rebuilt from source; do NOT commit binaries
+data/vector_store/
 # Per-turn JSONL logs (contain user conversation content)
 logs/

CLAUDE.md CHANGED

@@ -110,7 +110,7 @@ Copy `.env.example` → `.env` and set:
 |------|---------|
 | `data/users.json` | Flat user index (id, name, condition, style) |
 | `data/memories/<uid>.json` | Full persona JSON with bucketed memories |
-| `data/faiss_store/<uid>/` | `vectors.pt` + `meta.json` — **rebuild after any persona edit** |
 | `data/generate_users.py` | Regenerates memories + users.json |
 ---
@@ -140,6 +140,6 @@ Copy `.env.example` → `.env` and set:
 - **Guardrail tuning**: edit signal lists in `backend/guardrails/checks.py`
 - **Affect → generation mapping**: `_AFFECT_CONFIG` in `backend/pipeline/nodes/intent.py`
   and `_PERSONA_TONE_OVERRIDES` in `backend/pipeline/nodes/planner.py`
-- Vector indexes in `data/faiss_store/` are gitignored — rebuilt from source JSONs
   via `python -m backend.retrieval.vector_store`
 - Frontend uses pnpm, Node 22+

 |------|---------|
 | `data/users.json` | Flat user index (id, name, condition, style) |
 | `data/memories/<uid>.json` | Full persona JSON with bucketed memories |
+| `data/vector_store/<uid>/` | `vectors.pt` + `meta.json` — **rebuild after any persona edit** |
 | `data/generate_users.py` | Regenerates memories + users.json |
 ---
 - **Guardrail tuning**: edit signal lists in `backend/guardrails/checks.py`
 - **Affect → generation mapping**: `_AFFECT_CONFIG` in `backend/pipeline/nodes/intent.py`
   and `_PERSONA_TONE_OVERRIDES` in `backend/pipeline/nodes/planner.py`
+- Vector indexes in `data/vector_store/` are gitignored — rebuilt from source JSONs
   via `python -m backend.retrieval.vector_store`
 - Frontend uses pnpm, Node 22+

README.md CHANGED

@@ -80,7 +80,7 @@ The setup script handles:
 - Python dependency installation
 - `.env` file creation from template
 - Vector index building (downloads BGE-small embedder on first run, saves
-  per-user `vectors.pt` under `data/faiss_store/`)
 - Frontend dependency installation (pnpm)
 ---
@@ -160,7 +160,7 @@ multimodal_aac_chatbot/
 ├── data/
 │   ├── users.json                 Persona index
 │   ├── memories/                  Per-persona memory JSONs
-│   └── faiss_store/               vectors.pt + meta.json (gitignored, rebuilt)
 ├── logs/                          Per-turn JSONL logs (gitignored)
 │
 ├── setup.sh                       One-time setup script
@@ -196,7 +196,7 @@ Heads up: all camera/sensing stuff is in the frontend (MediaPipe JS). Backend ju
 - [ ] **[Core]** Memories are only autobiographical narratives right now. Need more variety:
   - [ ] social media posts (voice-matched, synth with LLM)
   - [ ] past chat logs (synth with LLM)
-  - [ ] update the generator script + rebuild faiss
   - [ ] tag chunks by type so retriever knows what it pulled
 - [ ] **[Core]** Write down the data schema somewhere so evals can reuse it
@@ -214,7 +214,7 @@ Heads up: all camera/sensing stuff is in the frontend (MediaPipe JS). Backend ju
 > Current state: routing is keyword-based, not LLM-based. The original LLM router (Pydantic-validated JSON) kept emitting the wrong shape with `gemma4:31b-cloud` and hitting the `max_tokens` truncation — 3 retries + hard fallback on every turn, ~30s of dead latency before generation. The keyword router (5 buckets matched against word lists in `intent.py`) handles the demo personas and adds ~0ms. Trade-off: stuck with the 5 hardcoded buckets (`family`, `medical`, `hobbies`, `daily_routine`, `social`) and can't tell `OPEN_DOMAIN` from `PERSONAL`. Fine for now since all personas only have personal memories. Revisit when Ollama Cloud ships `response_format=json_schema` or we add a tiny local classifier.
-- [ ] **[Core]** Personal / Contextual / Open-domain all hit the same FAISS index right now. Make them actually go different places — open-domain → web search (or stub), contextual → session memory
 - [ ] intent node is slow. Cache the prompt, use a tiny model for routing, parallelise the sub-queries
 ### Retrieval
@@ -230,7 +230,7 @@ Heads up: all camera/sensing stuff is in the frontend (MediaPipe JS). Backend ju
 - [ ] **[Core]** API returns one response. Should return multiple candidates so the user can pick (and so the next item works)
 - [ ] **[Core]** Frontend needs a candidate picker — show all the options, let the user click one, send the selection back
-- [ ] **[Bonus]** When user picks a candidate, save the `(query, picked)` pair to a side faiss index and check it first next turn
 ### Evals

 - Python dependency installation
 - `.env` file creation from template
 - Vector index building (downloads BGE-small embedder on first run, saves
+  per-user `vectors.pt` under `data/vector_store/`)
 - Frontend dependency installation (pnpm)
 ---
 ├── data/
 │   ├── users.json                 Persona index
 │   ├── memories/                  Per-persona memory JSONs
+│   └── vector_store/              vectors.pt + meta.json (gitignored, rebuilt)
 ├── logs/                          Per-turn JSONL logs (gitignored)
 │
 ├── setup.sh                       One-time setup script
 - [ ] **[Core]** Memories are only autobiographical narratives right now. Need more variety:
   - [ ] social media posts (voice-matched, synth with LLM)
   - [ ] past chat logs (synth with LLM)
+  - [ ] update the generator script + rebuild vector store
   - [ ] tag chunks by type so retriever knows what it pulled
 - [ ] **[Core]** Write down the data schema somewhere so evals can reuse it
 > Current state: routing is keyword-based, not LLM-based. The original LLM router (Pydantic-validated JSON) kept emitting the wrong shape with `gemma4:31b-cloud` and hitting the `max_tokens` truncation — 3 retries + hard fallback on every turn, ~30s of dead latency before generation. The keyword router (5 buckets matched against word lists in `intent.py`) handles the demo personas and adds ~0ms. Trade-off: stuck with the 5 hardcoded buckets (`family`, `medical`, `hobbies`, `daily_routine`, `social`) and can't tell `OPEN_DOMAIN` from `PERSONAL`. Fine for now since all personas only have personal memories. Revisit when Ollama Cloud ships `response_format=json_schema` or we add a tiny local classifier.
+- [ ] **[Core]** Personal / Contextual / Open-domain all hit the same vector index right now. Make them actually go different places — open-domain → web search (or stub), contextual → session memory
 - [ ] intent node is slow. Cache the prompt, use a tiny model for routing, parallelise the sub-queries
 ### Retrieval
 - [ ] **[Core]** API returns one response. Should return multiple candidates so the user can pick (and so the next item works)
 - [ ] **[Core]** Frontend needs a candidate picker — show all the options, let the user click one, send the selection back
+- [ ] **[Bonus]** When user picks a candidate, save the `(query, picked)` pair to a side vector index and check it first next turn
 ### Evals
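
Reviewer note on the README "Current state" paragraph above: the keyword router it describes (five fixed buckets matched against word lists) can be sketched roughly as follows. The bucket names come from the README; the word lists and the `route` helper are invented for illustration and are not the contents of `intent.py`.

```python
import re

# Bucket names are taken from the README note; the word lists below are
# hypothetical placeholders, not the real lists in intent.py.
BUCKET_KEYWORDS: dict[str, set[str]] = {
    "family": {"mom", "dad", "sister", "brother", "family"},
    "medical": {"doctor", "medication", "therapy", "hospital"},
    "hobbies": {"music", "painting", "chess", "garden"},
    "daily_routine": {"breakfast", "morning", "bedtime", "schedule"},
    "social": {"friend", "party", "visit", "neighbor"},
}


def route(query: str) -> list[str]:
    """Return every bucket whose word list overlaps the query tokens."""
    tokens = set(re.findall(r"[a-z']+", query.lower()))
    return [name for name, words in BUCKET_KEYWORDS.items() if tokens & words]


matched = route("Did my sister call the doctor this morning?")
unmatched = route("What is the capital of France?")
```

A query with no keyword hit returns an empty list, which illustrates the trade-off the README mentions: a pure word-list match has no way to label a query as open-domain rather than personal.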

backend/api/main.py CHANGED

@@ -175,7 +175,6 @@ def debug_config():
         "retrieval_rerank_k": settings.retrieval_rerank_k,
         "fallback_latency_threshold": settings.fallback_latency_threshold,
         "slo_target_s": settings.slo_target_s,
-        "num_candidates": settings.num_candidates,
     }

         "retrieval_rerank_k": settings.retrieval_rerank_k,
         "fallback_latency_threshold": settings.fallback_latency_threshold,
         "slo_target_s": settings.slo_target_s,
     }

backend/config/settings.py CHANGED

@@ -10,7 +10,7 @@ class Settings(BaseSettings):
     # ── Paths ──────────────────────────────────────────────────────────────────
     data_dir: Path = Path("data")
-    faiss_store_dir: Path = Path("data/faiss_store")  # name kept for back-compat
     memories_dir: Path = Path("data/memories")
     users_json: Path = Path("data/users.json")
     logs_dir: Path = Path("logs")
@@ -45,7 +45,6 @@ class Settings(BaseSettings):
     max_tokens_neutral: int = 100
     max_tokens_frustrated: int = 60
     max_tokens_surprised: int = 80
-    num_candidates: int = 2  # responses generated per turn for ranking
     # ── Sensing ───────────────────────────────────────────────────────────────
     affect_ema_alpha: float = 0.3  # exponential moving average smoothing
@@ -55,11 +54,6 @@ class Settings(BaseSettings):
     air_write_end_gap_ms: int = 200  # ms of stillness to end a stroke
     conflict_overlap_ms: int = 500  # audio + gesture co-occurrence window
-    # ── Candidate ranking weights ───────────────────────────────────────────────
-    rank_alpha: float = 0.4  # faithfulness weight
-    rank_beta: float = 0.3  # style similarity weight
-    rank_gamma: float = 0.3  # affect-match weight
     # ── Evaluation ────────────────────────────────────────────────────────────
     slo_target_s: float = 6.0  # max acceptable response latency (seconds)

     # ── Paths ──────────────────────────────────────────────────────────────────
     data_dir: Path = Path("data")
+    vector_store_dir: Path = Path("data/vector_store")
     memories_dir: Path = Path("data/memories")
     users_json: Path = Path("data/users.json")
     logs_dir: Path = Path("logs")
     max_tokens_neutral: int = 100
     max_tokens_frustrated: int = 60
     max_tokens_surprised: int = 80
     # ── Sensing ───────────────────────────────────────────────────────────────
     affect_ema_alpha: float = 0.3  # exponential moving average smoothing
     air_write_end_gap_ms: int = 200  # ms of stillness to end a stroke
     conflict_overlap_ms: int = 500  # audio + gesture co-occurrence window
     # ── Evaluation ────────────────────────────────────────────────────────────
     slo_target_s: float = 6.0  # max acceptable response latency (seconds)

backend/pipeline/nodes/planner.py CHANGED

@@ -46,7 +46,7 @@ def _run(state: PipelineState, tier: str) -> dict:
     affect = (state.get("affect") or {}).get("emotion", "NEUTRAL")
     gen_cfg = state.get("generation_config") or {}
     chunks = state.get("retrieved_chunks") or []
-    history = (state.get("session_history") or [])[-3:]  # last 3 turns only
     tone_tag = _resolve_tone_tag(
         user_id, affect, gen_cfg.get("tone_tag", "[TONE:DEFAULT]")
@@ -64,21 +64,13 @@ def _run(state: PipelineState, tier: str) -> dict:
         air_written_text=air_written_text,
     )
-    candidates: list[str] = []
-    for _ in range(settings.num_candidates):
-        text = chat_complete(
-            messages=[{"role": "user", "content": prompt}],
-            max_tokens=gen_cfg.get("max_tokens", settings.max_tokens_neutral),
-            temperature=0.7,
-            tier=tier,
-        )
-        candidates.append(text)
-    selected = _rank_candidates(
-        candidates, chunks, affect, profile, gesture_tag=gesture_tag
     )
-    # Guardrail — replace with safe fallback if output breaks persona
     guard = check_output(selected, chunks)
     if not guard["passed"]:
         selected = guard["fallback"]
@@ -96,7 +88,7 @@ def _run(state: PipelineState, tier: str) -> dict:
     return {
         "augmented_prompt": prompt,
-        "candidates": candidates,
         "selected_response": selected,
         "llm_tier_used": tier,
         "llm_model_used": active_model(tier),
@@ -147,7 +139,7 @@ def _build_prompt(
     }.get(persona_mod, "Use your natural communication style.")
     return f"""\
-You are {profile["name"]}, an AAC device user with {profile["condition"]}.
 Communication style: {profile["style"]}
 {tone_tag}{gesture_line}{air_writing_line}
@@ -170,48 +162,3 @@ Instructions:
 - Do NOT say "As an AI" or break persona.
 Response:"""
-def _rank_candidates(
-    candidates: list[str],
-    chunks: list[dict],
-    affect: str,
-    profile: dict,
-    gesture_tag: str | None = None,
-) -> str:
-    if not candidates:
-        return "I don't know."
-    if len(candidates) == 1:
-        return candidates[0]
-    evidence_words = set(" ".join(c["text"] for c in chunks).lower().split())
-    style_words = set(profile.get("style", "").lower().split())
-    affect_positive_map = {
-        "HAPPY": ["great", "love", "enjoy", "happy", "fun"],
-        "FRUSTRATED": ["okay", "fine", "sure", "yes", "no"],
-        "NEUTRAL": [],
-        "SURPRISED": ["really", "oh", "interesting", "wow"],
-    }
-    gesture_word_map = {
-        "THUMBS_UP": ["yes", "good", "agree", "great", "sure"],
-        "THUMBS_DOWN": ["no", "disagree", "stop", "don't"],
-        "POINTING": ["that", "this", "there", "see"],
-        "WAVING": ["hello", "hi", "bye", "goodbye"],
-    }
-    affect_words = set(affect_positive_map.get(affect, [])) | set(
-        gesture_word_map.get(gesture_tag or "", [])
-    )
-    def score(c: str) -> float:
-        words = set(c.lower().split())
-        faithful = len(words & evidence_words) / max(len(words), 1)
-        style_sim = len(words & style_words) / max(len(words), 1)
-        affect_m = len(words & affect_words) / max(len(words), 1)
-        return (
-            settings.rank_alpha * faithful
-            + settings.rank_beta * style_sim
-            + settings.rank_gamma * affect_m
-        )
-    return max(candidates, key=score)

     affect = (state.get("affect") or {}).get("emotion", "NEUTRAL")
     gen_cfg = state.get("generation_config") or {}
     chunks = state.get("retrieved_chunks") or []
+    history = (state.get("session_history") or [])[-20:]
     tone_tag = _resolve_tone_tag(
         user_id, affect, gen_cfg.get("tone_tag", "[TONE:DEFAULT]")
         air_written_text=air_written_text,
     )
+    selected = chat_complete(
+        messages=[{"role": "user", "content": prompt}],
+        max_tokens=gen_cfg.get("max_tokens", settings.max_tokens_neutral),
+        temperature=0.4,
+        tier=tier,
     )
     guard = check_output(selected, chunks)
     if not guard["passed"]:
         selected = guard["fallback"]
     return {
         "augmented_prompt": prompt,
+        "candidates": [selected],
         "selected_response": selected,
         "llm_tier_used": tier,
         "llm_model_used": active_model(tier),
     }.get(persona_mod, "Use your natural communication style.")
     return f"""\
+You are {profile["name"]}. You have {profile["condition"]} and communicate through an AAC device, but your voice and thoughts are fully your own.
 Communication style: {profile["style"]}
 {tone_tag}{gesture_line}{air_writing_line}
 - Do NOT say "As an AI" or break persona.
 Response:"""
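
Reviewer note: the planner change above replaces the per-turn candidate loop and `_rank_candidates` with a single `chat_complete` call followed by the guardrail check. A minimal sketch of that control flow is below. `generate_once`, `fake_complete`, and `fake_guard` are hypothetical stand-ins; the real `chat_complete` and `check_output` are not shown in this diff, and only the `passed` / `fallback` keys are assumed from the lines above.

```python
from typing import Callable


def generate_once(
    prompt: str,
    chunks: list[dict],
    complete: Callable[[str], str],
    guard: Callable[[str, list[dict]], dict],
) -> dict:
    """One completion, one guardrail pass, no ranking step."""
    selected = complete(prompt)
    result = guard(selected, chunks)
    if not result["passed"]:
        # Swap in the guardrail's safe fallback when the output fails the check.
        selected = result["fallback"]
    # "candidates" stays a one-element list so the returned shape is unchanged.
    return {"candidates": [selected], "selected_response": selected}


# Hypothetical stand-ins, for illustration only.
def fake_complete(prompt: str) -> str:
    return "As an AI, I cannot say."


def fake_guard(text: str, chunks: list[dict]) -> dict:
    passed = "as an ai" not in text.lower()
    return {"passed": passed, "fallback": "I'm not sure about that."}


out = generate_once("hello", [], fake_complete, fake_guard)
```

As in the diff, `candidates` is built after the guardrail runs, so it holds the fallback text rather than the raw model output when the check fails.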

backend/retrieval/vector_store.py CHANGED

@@ -37,7 +37,7 @@ _index_cache: dict[str, tuple[torch.Tensor, list[dict]]] = {}
 def load_index(user_id: str) -> tuple[torch.Tensor, list[dict]]:
     if user_id not in _index_cache:
-        store_path = settings.faiss_store_dir / user_id
         vecs = torch.load(
             store_path / "vectors.pt", map_location=_DEVICE, weights_only=True
         )
@@ -126,7 +126,7 @@ def build_all(
     store_dir: str | Path | None = None,
 ) -> None:
     memories_dir = Path(memories_dir or settings.memories_dir)
-    store_dir = Path(store_dir or settings.faiss_store_dir)
     print(f"Embedder device: {_DEVICE}")
     for persona_file in sorted(memories_dir.glob("*.json")):

 def load_index(user_id: str) -> tuple[torch.Tensor, list[dict]]:
     if user_id not in _index_cache:
+        store_path = settings.vector_store_dir / user_id
         vecs = torch.load(
             store_path / "vectors.pt", map_location=_DEVICE, weights_only=True
         )
     store_dir: str | Path | None = None,
 ) -> None:
     memories_dir = Path(memories_dir or settings.memories_dir)
+    store_dir = Path(store_dir or settings.vector_store_dir)
     print(f"Embedder device: {_DEVICE}")
     for persona_file in sorted(memories_dir.glob("*.json")):
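
Reviewer note: because the default store path moves from `data/faiss_store` to `data/vector_store`, an existing checkout with indexes under the old directory will presumably need either a rebuild (`python -m backend.retrieval.vector_store`) or a one-off move, unless the path is overridden in the environment. A minimal sketch of the move, assuming nothing beyond the two directory names in this diff; `migrate_store` is a hypothetical helper, not part of the repo.

```python
import shutil
from pathlib import Path


def migrate_store(old: Path, new: Path) -> bool:
    """Move the old index directory to the new path.

    Returns True if a move happened, False if there was nothing to move
    or the destination already exists (never overwrites).
    """
    if not old.is_dir() or new.exists():
        return False
    new.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(old), str(new))
    return True
```

From the repo root this would be called as `migrate_store(Path("data/faiss_store"), Path("data/vector_store"))`. Rebuilding is the safer option if any persona JSON changed since the indexes were last built.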

setup.sh CHANGED

@@ -39,7 +39,7 @@ fi
 info "Building vector indexes (downloads BGE-small embedder on first run)..."
 python -m backend.retrieval.vector_store
-ok "Vector indexes built in data/faiss_store/"
 # Ollama: tiers point at Ollama Cloud — no local pull needed. Just check the
 # daemon is reachable so the OpenAI-compatible proxy works.

 info "Building vector indexes (downloads BGE-small embedder on first run)..."
 python -m backend.retrieval.vector_store
+ok "Vector indexes built in data/vector_store/"
 # Ollama: tiers point at Ollama Cloud — no local pull needed. Just check the
 # daemon is reachable so the OpenAI-compatible proxy works.