BabaK07 committed
Commit d197c9d · 1 Parent(s): aefb7b1

Polish retrieval workflow and UI
.env.example CHANGED
@@ -14,5 +14,9 @@ EMBEDDING_DIMENSIONS=1024
 JINA_API_KEY=
 JINA_API_BASE=https://api.jina.ai/v1/embeddings
 JINA_EMBEDDING_MODEL=jina-embeddings-v3
+JINA_RERANKER_API_BASE=https://api.jina.ai/v1/rerank
+JINA_RERANKER_MODEL=jina-reranker-v3
+RETRIEVAL_K=4
+RERANK_CANDIDATE_K=12
 WEB_SEARCH_PROVIDER=duckduckgo
 TAVILY_API_KEY=
README.md CHANGED
@@ -18,6 +18,7 @@ If the uploaded documents are not enough, the agent falls back to web search and
 - FastAPI + SQLAlchemy
 - LangGraph agent
 - Groq chat model
+- Jina embeddings + Jina reranker
 - Supabase Postgres + `pgvector`
 - Railway deployment

@@ -29,23 +30,38 @@ Each chunk is stored with metadata (document, page number, chunk index) and embe
 At question time:
 1. LLM-based document filtering selects relevant documents from user's library
 2. Vector search retrieves relevant chunks from selected documents
-3. The agent answers from those chunks when possible
-4. If evidence is weak, the agent uses web search and cites external URLs
+3. Jina reranking reorders the retrieved chunks for better final relevance
+4. The agent answers from those chunks when possible
+5. If evidence is weak, the agent uses web search and cites external URLs

 ## Chunking Strategy

-- Chunk size: `1200`
-- Overlap: `200`
+- Splitter: LangChain `RecursiveCharacterTextSplitter`
+- Chunk size: `1000`
+- Overlap: `150`

 Why this setup:
-- Long, structured documents need enough contiguous context.
-- Overlap helps avoid missing content around chunk boundaries.
-- It gives a practical quality/cost balance for retrieval.
+- It prefers breaking on paragraphs and sentence boundaries before falling back to smaller separators.
+- It preserves more coherent chunks for contracts, specs, and structured PDFs.
+- A smaller overlap keeps recall while reducing duplicated context in retrieval.

 ## Retrieval Approach

-I use cosine similarity search in `pgvector` (no reranker yet).
-The top matches are turned into readable citations (document name + page + snippet), and those are shown per answer in the UI.
+I use cosine similarity search in `pgvector`, then apply Jina reranking for better final ordering.
+The system uses an LLM-based retrieval planner to choose:
+
+- the final number of chunks to keep
+- the candidate pool to rerank
+
+Those values are clamped to safe bounds before retrieval runs.
+
+The UI shows:
+
+- document name
+- page number
+- chunk excerpt
+
+for retrieved document sources.

 ## Agent Routing Logic

@@ -71,6 +87,17 @@ Each turn stores/returns source metadata separately from the answer body.
 ## Conversation Memory

 Conversation history is maintained within session scope, so follow-ups like “tell me more about that” work as expected.
+The frontend also preserves the visible chat thread per session, so upload-triggered page refreshes do not wipe the current conversation view.
+
+## Streaming UX
+
+Answers are streamed into the chat UI progressively.
+
+- the visible response is rendered chunk by chunk
+- source cards are attached after the answer completes
+- a slight pacing delay is added so the stream feels live to the user
+
+The streaming route is separate from the standard JSON `/ask` response path.

 ## Bonus Feature

@@ -88,22 +115,24 @@ I also implemented LLM-based document filtering:

 - The system sends all user documents (filename, summary, preview) to the LLM
 - LLM semantically analyzes and selects only truly relevant documents for the query
-- Returns 0 to N documents based on actual relevance (not forced to always return the max limit)
-- Fallback returns first N documents if LLM call fails
+- Returns a JSON array of relevant file hashes
+- It is not forced to return a capped number of documents
+- Fallback returns all candidate document hashes if the LLM call fails

 ## Challenges I Ran Into

 1. Heavy embedding dependencies made deployment images too large.
-   - I switched to lightweight embeddings for deployment and added Jina API embedding support.
+   - I standardized on Jina API embeddings/reranking to keep the runtime lighter while preserving retrieval quality.
 2. Source rendering got messy across multiple chat turns.
    - I separated answer text from source payloads and extracted sources per turn.
 3. Intermittent DB DNS/pooler issues during deployment.
    - I improved connection handling and standardized Supabase transaction-pooler config.
+4. UI state was getting lost after document uploads.
+   - I persisted the active chat thread in session storage so the current conversation remains visible after refresh.

 ## If I Had More Time

 - Add conversation history UI to display past chat sessions
-- Add reranking (cross-encoder) for better precision on long multi-doc queries
 - Add automated citation-faithfulness checks
 - Add Alembic migrations for cleaner schema evolution
 - Add stronger eval/observability for routing and retrieval quality

@@ -126,12 +155,16 @@ Required:
 - `GROQ_API_KEY`
 - `SECRET_KEY`
 - `DATABASE_URL`
-
-Embeddings (recommended):
 - `JINA_API_KEY`
+
+Embeddings:
 - `JINA_API_BASE` (default: `https://api.jina.ai/v1/embeddings`)
 - `JINA_EMBEDDING_MODEL` (default: `jina-embeddings-v3`)
+- `JINA_RERANKER_API_BASE` (default: `https://api.jina.ai/v1/rerank`)
+- `JINA_RERANKER_MODEL` (default: `jina-reranker-v3`)
 - `EMBEDDING_DIMENSIONS` (default: `1024`)
+- `RETRIEVAL_K` (default minimum final context size: `4`)
+- `RERANK_CANDIDATE_K` (default minimum rerank candidate pool: `12`)

 Storage:
 - `STORAGE_BACKEND=local|supabase`

@@ -144,6 +177,10 @@ Web search:
 - `WEB_SEARCH_PROVIDER=duckduckgo|tavily`
 - `TAVILY_API_KEY` (if using Tavily)

+Auth:
+- `ACCESS_TOKEN_EXPIRE_MINUTES` (default: `720`)
+- For local development, lowering this can make login/logout testing easier
+
 ## API Endpoints

 - `POST /register`

@@ -154,6 +191,7 @@ Web search:
 - `DELETE /documents/{document_id}`
 - `GET /documents/{document_id}/pdf`
 - `POST /ask`
+- `POST /ask/stream`

 ## Sample Documents
app/config.py CHANGED
@@ -24,6 +24,10 @@ class Settings(BaseSettings):
     jina_api_key: str | None = None
     jina_api_base: str = "https://api.jina.ai/v1/embeddings"
     jina_embedding_model: str = "jina-embeddings-v3"
+    jina_reranker_api_base: str = "https://api.jina.ai/v1/rerank"
+    jina_reranker_model: str = "jina-reranker-v3"
+    retrieval_k: int = 4
+    rerank_candidate_k: int = 12
     groq_api_key: str | None = None
     web_search_provider: str = "tavily"
     tavily_api_key: str | None = None
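The new fields follow the usual pydantic-settings pattern: each class attribute doubles as a default that the matching environment variable can override. A minimal stand-in using a plain dict (the `load_retrieval_settings` helper is hypothetical, not part of the app; variable names and defaults come from the diff):

```python
def load_retrieval_settings(env: dict[str, str]) -> dict[str, object]:
    """Resolve the new retrieval settings from an env mapping, with the diff's defaults."""
    return {
        "jina_reranker_api_base": env.get("JINA_RERANKER_API_BASE", "https://api.jina.ai/v1/rerank"),
        "jina_reranker_model": env.get("JINA_RERANKER_MODEL", "jina-reranker-v3"),
        "retrieval_k": int(env.get("RETRIEVAL_K", "4")),
        "rerank_candidate_k": int(env.get("RERANK_CANDIDATE_K", "12")),
    }

# An env var overrides the default, as BaseSettings would do from os.environ.
overridden = load_retrieval_settings({"RETRIEVAL_K": "6"})
```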
app/main.py CHANGED
@@ -1,4 +1,6 @@
+import json
 import re
+import time
 from typing import Any

 from fastapi import Cookie, Depends, FastAPI, File, Form, Header, HTTPException, Request, UploadFile, status

@@ -56,7 +58,10 @@ def _parse_vector_sources(tool_output: str) -> list[dict[str, str]]:
     current_page = ""

     for line in lines:
-        match = re.match(r"^\s*\d+\.\s+document_id=(.*?)\s+\|\s+document=(.*?)\s+\|\s+page=(.*?)\s+\|\s+distance=", line)
+        match = re.match(
+            r"^\s*\d+\.\s+document_id=(.*?)\s+\|\s+document=(.*?)\s+\|\s+page=(.*?)\s+\|\s+distance=.*?(?:\s+\|\s+rerank_score=(.*?))?\s*$",
+            line,
+        )
         if match:
             current_document_id = match.group(1).strip()
             current_doc = match.group(2).strip()

@@ -159,6 +164,24 @@ def _strip_sources_from_answer(answer: str) -> str:
     return "\n".join(filtered).strip()


+def _get_current_turn_messages(*, previous_messages: list[Any], all_messages: list[Any]) -> list[Any]:
+    if len(all_messages) >= len(previous_messages):
+        return all_messages[len(previous_messages):]
+    return all_messages
+
+
+def _build_agent_config(*, user: User, access_token: str | None, x_session_id: str | None) -> dict[str, Any]:
+    if x_session_id:
+        session_key = f"user:{user.id}:session:{x_session_id}"
+    else:
+        session_key = access_token or f"user:{user.id}"
+    return {"configurable": {"thread_id": session_key}}
+
+
+def _sse_event(event: str, data: dict[str, Any]) -> str:
+    return f"event: {event}\ndata: {json.dumps(data)}\n\n"
+
+
 def get_current_user(
     access_token: str | None = Cookie(default=None),
     db: Session = Depends(get_db),

@@ -349,15 +372,8 @@ def ask_question(
 ):
     document_service.ensure_page_metadata_for_user(db=db, user=user)
     agent = build_agent(db=db, user=user)
-
-    # Use session ID from header if provided, otherwise fall back to access token or user ID
-    if x_session_id:
-        session_key = f"user:{user.id}:session:{x_session_id}"
-    else:
-        session_key = access_token or f"user:{user.id}"
-
-    config = {"configurable": {"thread_id": session_key}}
-    print(f"[Agent] thread_id: {session_key}")
+    config = _build_agent_config(user=user, access_token=access_token, x_session_id=x_session_id)
+    print(f"[Agent] thread_id: {config['configurable']['thread_id']}")
     previous_messages: list[Any] = []
     try:
         state = agent.get_state(config)

@@ -374,9 +390,66 @@ def ask_question(
     answer = final_message if isinstance(final_message, str) else str(final_message)
     answer = _strip_sources_from_answer(answer)
     all_messages = result.get("messages", [])
-    if isinstance(all_messages, list) and len(all_messages) >= len(previous_messages):
-        current_turn_messages = all_messages[len(previous_messages):]
-    else:
-        current_turn_messages = all_messages if isinstance(all_messages, list) else []
+    current_turn_messages = _get_current_turn_messages(
+        previous_messages=previous_messages,
+        all_messages=all_messages if isinstance(all_messages, list) else [],
+    )
     sources = _extract_sources_from_messages(current_turn_messages)
     return AskResponse(answer=answer, sources=sources)
+
+
+@app.post("/ask/stream")
+def ask_question_stream(
+    payload: AskRequest,
+    db: Session = Depends(get_db),
+    user: User = Depends(get_current_user),
+    access_token: str | None = Cookie(default=None),
+    x_session_id: str | None = Header(default=None, alias="X-Session-Id"),
+):
+    document_service.ensure_page_metadata_for_user(db=db, user=user)
+    agent = build_agent(db=db, user=user)
+    config = _build_agent_config(user=user, access_token=access_token, x_session_id=x_session_id)
+
+    previous_messages: list[Any] = []
+    try:
+        state = agent.get_state(config)
+        values = getattr(state, "values", {}) or {}
+        maybe_messages = values.get("messages", [])
+        if isinstance(maybe_messages, list):
+            previous_messages = maybe_messages
+    except Exception:
+        previous_messages = []
+
+    def event_stream():
+        try:
+            result = agent.invoke({"messages": [("user", payload.query)]}, config=config)
+            all_messages = result.get("messages", [])
+            all_messages = all_messages if isinstance(all_messages, list) else []
+            current_turn_messages = _get_current_turn_messages(previous_messages=previous_messages, all_messages=all_messages)
+            if current_turn_messages:
+                final_message = current_turn_messages[-1].content
+                final_answer = final_message if isinstance(final_message, str) else str(final_message)
+                final_answer = _strip_sources_from_answer(final_answer)
+            else:
+                final_answer = ""
+
+            chunk_size = 24
+            for index in range(0, len(final_answer), chunk_size):
+                yield _sse_event("token", {"content": final_answer[index : index + chunk_size]})
+                time.sleep(0.03)
+
+            sources = _extract_sources_from_messages(current_turn_messages)
+            yield _sse_event("sources", {"sources": sources})
+            yield _sse_event("done", {"answer": final_answer})
+        except Exception as exc:
+            yield _sse_event("error", {"detail": str(exc)})
+
+    return StreamingResponse(
+        event_stream(),
+        media_type="text/event-stream",
+        headers={
+            "Cache-Control": "no-cache",
+            "Connection": "keep-alive",
+            "X-Accel-Buffering": "no",
+        },
+    )
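The wire format produced by the `_sse_event` helper in this diff is standard Server-Sent Events framing: `event: <name>\ndata: <json>\n\n` per event. A toy producer and parser pair (the parser stands in for what a browser `EventSource` does; it is illustrative, not part of the app):

```python
import json

def sse_event(event: str, data: dict) -> str:
    """Frame one SSE event the way _sse_event in the diff does."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

def parse_sse(stream: str) -> list[tuple[str, dict]]:
    """Toy parser: split on blank lines, read the event name and JSON payload."""
    events = []
    for block in stream.strip().split("\n\n"):
        fields = dict(line.split(": ", 1) for line in block.split("\n"))
        events.append((fields["event"], json.loads(fields["data"])))
    return events

# The /ask/stream route emits "token" events followed by "sources" and "done".
stream = (
    sse_event("token", {"content": "Hel"})
    + sse_event("token", {"content": "lo"})
    + sse_event("done", {"answer": "Hello"})
)
events = parse_sse(stream)
```

Concatenating the `token` payloads reproduces the final answer, which is why the frontend can render the response chunk by chunk and still trust the `done` event.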
app/services/agent.py CHANGED
@@ -18,10 +18,6 @@ from app.services.web_search import build_web_search_tool

 class VectorSearchInput(BaseModel):
     query: str = Field(..., description="The user question to answer from uploaded documents.")
-    file_hashes: list[str] | None = Field(
-        default=None,
-        description="Optional document hashes to filter search. Leave empty to auto-resolve relevant documents for the current user.",
-    )


 LANGGRAPH_CHECKPOINTER = MemorySaver()

@@ -43,11 +39,11 @@ def build_agent(*, db: Session, user: User):
     vector_store = VectorStoreService()
     web_search_tool = build_web_search_tool()

-    def vector_search(query: str, file_hashes: list[str] | None = None) -> str:
-        resolved_hashes = file_hashes or document_service.resolve_relevant_document_hashes(db, user=user, query=query)
+    def vector_search(query: str) -> str:
+        resolved_hashes = document_service.resolve_relevant_document_hashes(db, user=user, query=query)
         if not resolved_hashes:
             return "No uploaded documents are available for this user."
-        matches = vector_store.similarity_search(db=db, query=query, file_hashes=resolved_hashes, k=4)
+        matches = vector_store.similarity_search(db=db, query=query, file_hashes=resolved_hashes, k=settings.retrieval_k)
         if not matches:
             return f"No vector matches found for hashes: {resolved_hashes}"
         lines = ["Vector evidence (cite document + page + excerpt in final answer):"]

@@ -55,9 +51,10 @@
             page_number = match["metadata"].get("page_number")
             page_label = str(page_number) if page_number is not None else "unknown"
             document_id = match["metadata"].get("document_id")
-            lines.append(
-                f"{index}. document_id={document_id} | document={match['metadata']['filename']} | page={page_label} | distance={match['distance']:.4f}"
-            )
+            score_parts = [f"distance={match['distance']:.4f}"]
+            if "rerank_score" in match:
+                score_parts.append(f"rerank_score={match['rerank_score']:.4f}")
+            lines.append(f"{index}. document_id={document_id} | document={match['metadata']['filename']} | page={page_label} | {' | '.join(score_parts)}")
             lines.append(f"   excerpt: {match['content'][:900].replace(chr(10), ' ')}")
         return "\n\n".join(lines)

@@ -66,8 +63,7 @@
         name="vector_search",
         description=(
             "Searches the current user's uploaded documents. "
-            "If file hashes are omitted, the tool first finds the most relevant document hashes from stored metadata and summary, "
-            "then applies those hashes as a vector-search filter."
+            "The tool automatically resolves the most relevant documents for the current user before chunk retrieval."
         ),
         args_schema=VectorSearchInput,
     )
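The evidence-line format that `vector_search` emits must stay parseable by the regex in `app/main.py`, including the now-optional `rerank_score` field. A standalone sketch of that formatting (`format_evidence_line` is a hypothetical extraction of the diff's inline logic):

```python
def format_evidence_line(index: int, match: dict) -> str:
    """Build one evidence line; rerank_score is appended only when present."""
    meta = match["metadata"]
    page_number = meta.get("page_number")
    page_label = str(page_number) if page_number is not None else "unknown"
    score_parts = [f"distance={match['distance']:.4f}"]
    if "rerank_score" in match:
        score_parts.append(f"rerank_score={match['rerank_score']:.4f}")
    return (
        f"{index}. document_id={meta['document_id']} | document={meta['filename']} "
        f"| page={page_label} | {' | '.join(score_parts)}"
    )
```

Keeping `rerank_score` as a trailing optional segment means the old `distance=`-anchored parser still matches lines from either code path.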
app/services/document_service.py CHANGED
@@ -131,17 +131,17 @@ class DocumentService:
             "deleted_shared_document": deleted_shared_document,
         }

-    def resolve_relevant_document_hashes(self, db: Session, *, user: User, query: str, limit: int = 5) -> list[str]:
+    def resolve_relevant_document_hashes(self, db: Session, *, user: User, query: str) -> list[str]:
         docs = self.list_user_documents(db, user)
         if not docs:
             return []

         # Send all documents to LLM for semantic matching
-        matched_hashes = self._llm_filter_documents(query=query, candidates=docs, limit=limit)
+        matched_hashes = self._llm_filter_documents(query=query, candidates=docs)
         print("Documents Matched ----->", matched_hashes)
         return matched_hashes

-    def _llm_filter_documents(self, *, query: str, candidates: list[Document], limit: int) -> list[str]:
+    def _llm_filter_documents(self, *, query: str, candidates: list[Document]) -> list[str]:
         if not self.settings.groq_api_key or not candidates:
             return []
         if self.matcher_llm is None:

@@ -163,9 +163,8 @@
             "Consider semantic similarity, topic alignment, and document purpose.\n\n"
             "IMPORTANT: Only include documents that are actually relevant to answering the query.\n"
             "It's better to return fewer relevant documents than to include irrelevant ones.\n"
-            f"You may return anywhere from 0 to {limit} documents.\n\n"
-            "Return ONLY valid JSON with this exact schema:\n"
-            '{"file_hashes": ["<hash1>", "<hash2>", ...]}\n\n'
+            "Return ONLY a valid JSON array of relevant file hashes, for example:\n"
+            '["<hash1>", "<hash2>"]\n\n'
             f"User query: {query}\n\n"
             f"Available documents:\n{json.dumps(payload, ensure_ascii=True, indent=2)}"
         )

@@ -180,12 +179,11 @@
                 content = content.split("```")[1].split("```")[0].strip()

             data = json.loads(content)
-            hashes = data.get("file_hashes", [])
+            hashes = data if isinstance(data, list) else []
             valid = {item.get("file_hash", "") for item in payload}
-            return [value for value in hashes if isinstance(value, str) and value in valid][:limit]
+            return [value for value in hashes if isinstance(value, str) and value in valid]
         except Exception:
-            # Fallback: return first N documents
-            return [doc.file_hash for doc in candidates[:limit]]
+            return [doc.file_hash for doc in candidates]

     def ensure_page_metadata_for_user(self, *, db: Session, user: User) -> None:
         docs = self.list_user_documents(db, user)
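The response-handling change above (a bare JSON array instead of a `{"file_hashes": ...}` object) can be exercised in isolation. A sketch, with `parse_hash_reply` as a hypothetical standalone rendering of the diff's logic; the `FENCE` constant avoids writing a literal code fence inside this example:

```python
import json

FENCE = "`" * 3  # stands in for the markdown code fence an LLM may wrap its reply in

def parse_hash_reply(content: str, valid_hashes: set[str]) -> list[str]:
    """Strip an optional code fence, parse a JSON array, keep only known hashes."""
    if FENCE + "json" in content:
        content = content.split(FENCE + "json", 1)[1].split(FENCE, 1)[0].strip()
    elif FENCE in content:
        content = content.split(FENCE, 1)[1].split(FENCE, 1)[0].strip()
    data = json.loads(content)
    hashes = data if isinstance(data, list) else []
    return [value for value in hashes if isinstance(value, str) and value in valid_hashes]
```

Validating against the known hash set means a hallucinated hash in the LLM reply is silently dropped rather than passed to the vector-search filter.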
app/services/vector_store.py CHANGED
@@ -1,71 +1,16 @@
1
- import hashlib
2
- import math
3
- import re
4
  from typing import Any
5
 
6
  import requests
7
- from sqlalchemy import delete, select
 
 
8
  from sqlalchemy.orm import Session
9
 
10
  from app.config import get_settings
11
  from app.models import DocumentChunk
12
 
13
 
14
- class SimpleTextSplitter:
15
- def __init__(self, *, chunk_size: int, chunk_overlap: int) -> None:
16
- self.chunk_size = chunk_size
17
- self.chunk_overlap = chunk_overlap
18
-
19
- def split_text(self, text: str) -> list[str]:
20
- normalized = text.strip()
21
- if not normalized:
22
- return []
23
- if len(normalized) <= self.chunk_size:
24
- return [normalized]
25
-
26
- chunks: list[str] = []
27
- start = 0
28
- step = max(1, self.chunk_size - self.chunk_overlap)
29
- text_length = len(normalized)
30
- while start < text_length:
31
- end = min(text_length, start + self.chunk_size)
32
- chunk = normalized[start:end].strip()
33
- if chunk:
34
- chunks.append(chunk)
35
- if end >= text_length:
36
- break
37
- start += step
38
- return chunks
39
-
40
-
41
- class LocalHashEmbeddings:
42
- def __init__(self, dimensions: int) -> None:
43
- self.dimensions = dimensions
44
-
45
- def embed_documents(self, texts: list[str]) -> list[list[float]]:
46
- return [self._embed_text(text) for text in texts]
47
-
48
- def embed_query(self, text: str) -> list[float]:
49
- return self._embed_text(text)
50
-
51
- def _embed_text(self, text: str) -> list[float]:
52
- vector = [0.0] * self.dimensions
53
- tokens = re.findall(r"\w+", text.lower())
54
- if not tokens:
55
- return vector
56
-
57
- for token in tokens:
58
- digest = hashlib.sha256(token.encode("utf-8")).digest()
59
- bucket = int.from_bytes(digest[:4], "big") % self.dimensions
60
- sign = 1.0 if digest[4] % 2 == 0 else -1.0
61
- vector[bucket] += sign
62
-
63
- norm = math.sqrt(sum(value * value for value in vector))
64
- if norm == 0:
65
- return vector
66
- return [value / norm for value in vector]
67
-
68
-
69
  class JinaEmbeddings:
70
  def __init__(self, *, api_key: str, base_url: str, model: str, dimensions: int) -> None:
71
  self.api_key = api_key
@@ -114,24 +59,161 @@ class JinaEmbeddings:
114
  return validated
115
 
116
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
117
  class VectorStoreService:
118
  def __init__(self) -> None:
119
- self.splitter = SimpleTextSplitter(chunk_size=1200, chunk_overlap=200)
120
- settings = get_settings()
121
- if settings.jina_api_key:
122
- self.embeddings = JinaEmbeddings(
123
- api_key=settings.jina_api_key,
124
- base_url=settings.jina_api_base,
125
- model=settings.jina_embedding_model,
126
- dimensions=settings.embedding_dimensions,
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
127
  )
128
- else:
129
- # Lightweight fallback when hosted embedding credentials are not configured.
130
- self.embeddings = LocalHashEmbeddings(settings.embedding_dimensions)
 
 
 
 
 
131
 
132
  def _get_embeddings(self) -> Any:
133
  return self.embeddings
134
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
135
  def add_document(self, *, db: Session, document_id: int, file_hash: str, filename: str, pages: list[tuple[int, str]]) -> None:
136
  chunk_rows: list[tuple[int | None, str]] = []
137
  for page_number, page_text in pages:
@@ -163,6 +245,14 @@ class VectorStoreService:
163
  def similarity_search(self, *, db: Session, query: str, file_hashes: list[str], k: int = 4) -> list[dict[str, Any]]:
164
  if not file_hashes:
165
  return []
 
 
 
 
 
 
 
 
166
  query_embedding = self._get_embeddings().embed_query(query)
167
  stmt = (
168
  select(
@@ -176,7 +266,7 @@ class VectorStoreService:
176
  )
177
  .where(DocumentChunk.file_hash.in_(file_hashes))
178
  .order_by(DocumentChunk.embedding.cosine_distance(query_embedding))
179
- .limit(k)
180
  )
181
  results = db.execute(stmt).all()
182
  matches: list[dict[str, Any]] = []
@@ -194,4 +284,4 @@ class VectorStoreService:
194
  "distance": row.distance,
195
  }
196
  )
197
- return matches
 
1
+ import json
 
 
2
  from typing import Any
3
 
4
  import requests
5
+ from langchain_groq import ChatGroq
6
+ from langchain_text_splitters import RecursiveCharacterTextSplitter
7
+ from sqlalchemy import delete, func, select
8
  from sqlalchemy.orm import Session
9
 
10
  from app.config import get_settings
11
  from app.models import DocumentChunk
12
 
13
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
14
  class JinaEmbeddings:
15
  def __init__(self, *, api_key: str, base_url: str, model: str, dimensions: int) -> None:
16
  self.api_key = api_key
 
59
  return validated
60
 
61
 
62
+ class JinaReranker:
63
+ def __init__(self, *, api_key: str, base_url: str, model: str) -> None:
64
+ self.api_key = api_key
65
+ self.base_url = base_url
66
+ self.model = model
67
+
68
+ def rerank(self, *, query: str, documents: list[str], top_n: int) -> list[dict[str, Any]]:
69
+ if not documents:
70
+ return []
71
+
72
+ response = requests.post(
73
+ self.base_url,
74
+ headers={
75
+ "Content-Type": "application/json",
76
+ "Authorization": f"Bearer {self.api_key}",
77
+ },
78
+ json={
79
+ "model": self.model,
80
+ "query": query,
81
+ "top_n": top_n,
82
+ "documents": documents,
83
+ "return_documents": False,
84
+ },
85
+ timeout=60,
86
+ )
87
+ response.raise_for_status()
88
+ return response.json().get("results", [])
89
+
90
+
91
  class VectorStoreService:
92
  def __init__(self) -> None:
93
+ self.settings = get_settings()
94
+        if not self.settings.jina_api_key:
+            raise RuntimeError("JINA_API_KEY is required for document embedding and retrieval.")
+
+        self.splitter = RecursiveCharacterTextSplitter(
+            chunk_size=1000,
+            chunk_overlap=150,
+            separators=[
+                "\n\n",
+                "\n",
+                ". ",
+                "? ",
+                "! ",
+                "; ",
+                ", ",
+                " ",
+                "",
+            ],
+            keep_separator=True,
+        )
+        self.embeddings = JinaEmbeddings(
+            api_key=self.settings.jina_api_key,
+            base_url=self.settings.jina_api_base,
+            model=self.settings.jina_embedding_model,
+            dimensions=self.settings.embedding_dimensions,
+        )
+        self.retrieval_router = (
+            ChatGroq(
+                api_key=self.settings.groq_api_key,
+                model=self.settings.model_name,
+                temperature=0,
             )
+            if self.settings.groq_api_key
+            else None
+        )
+        self.reranker = JinaReranker(
+            api_key=self.settings.jina_api_key,
+            base_url=self.settings.jina_reranker_api_base,
+            model=self.settings.jina_reranker_model,
+        )
 
     def _get_embeddings(self) -> Any:
         return self.embeddings
 
+    def _choose_retrieval_sizes(
+        self,
+        *,
+        db: Session,
+        query: str,
+        file_hashes: list[str],
+        requested_k: int,
+    ) -> tuple[int, int]:
+        available_chunks = db.scalar(
+            select(func.count())
+            .select_from(DocumentChunk)
+            .where(DocumentChunk.file_hash.in_(file_hashes))
+        ) or 0
+        if available_chunks <= 0:
+            return 0, 0
+
+        if self.retrieval_router is None:
+            raise RuntimeError("GROQ_API_KEY is required for LLM-based retrieval size selection.")
+
+        prompt = (
+            "You are a retrieval planner for a RAG system.\n"
+            "Choose how many chunks to keep after reranking and how many vector candidates to send to the reranker.\n"
+            "Return only valid JSON with this exact schema:\n"
+            '{"final_k": 4, "candidate_k": 12}\n\n'
+            "Rules:\n"
+            f"- final_k must be between 1 and {min(8, available_chunks)}\n"
+            f"- candidate_k must be between final_k and {min(30, available_chunks)}\n"
+            "- candidate_k should usually be around 2x to 4x final_k\n"
+            "- Use larger values for broad, comparative, or synthesis-heavy queries\n"
+            "- Use smaller values for narrow fact lookup queries\n\n"
+            f"Query: {query}\n"
+            f"Selected documents: {len(file_hashes)}\n"
+            f"Available chunks: {available_chunks}\n"
+            f"Requested final_k hint: {requested_k}\n"
+            f"Configured minimum final_k: {self.settings.retrieval_k}\n"
+            f"Configured minimum candidate_k: {self.settings.rerank_candidate_k}\n"
+        )
+
+        response = self.retrieval_router.invoke(prompt)
+        content = response.content if isinstance(response.content, str) else str(response.content)
+        if "```json" in content:
+            content = content.split("```json", 1)[1].split("```", 1)[0].strip()
+        elif "```" in content:
+            content = content.split("```", 1)[1].split("```", 1)[0].strip()
+        data = json.loads(content)
+        final_k = int(data["final_k"])
+        candidate_k = int(data["candidate_k"])
+
+        final_k = max(1, min(final_k, available_chunks, 8))
+        candidate_floor = max(final_k, self.settings.rerank_candidate_k)
+        candidate_k = max(final_k, candidate_k)
+        candidate_k = min(max(candidate_floor, candidate_k), available_chunks, 30)
+        return final_k, candidate_k
+
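The planner-reply handling above (fence stripping plus clamping) can be isolated into a pure helper. This is a hedged sketch, not repo code: `candidate_floor_setting` stands in for `settings.rerank_candidate_k`, and `FENCE` is only a device to write the literal triple-backtick delimiter inside this example.

```python
import json

FENCE = "`" * 3  # literal ``` delimiter, built up so it does not break this block


def plan_retrieval_sizes(reply, available_chunks, candidate_floor_setting):
    """Strip an optional Markdown code fence from the planner reply, then
    clamp final_k/candidate_k the same way _choose_retrieval_sizes does."""
    content = reply
    if FENCE + "json" in content:
        content = content.split(FENCE + "json", 1)[1].split(FENCE, 1)[0].strip()
    elif FENCE in content:
        content = content.split(FENCE, 1)[1].split(FENCE, 1)[0].strip()
    data = json.loads(content)
    # final_k is capped at 8 and at the number of stored chunks.
    final_k = max(1, min(int(data["final_k"]), available_chunks, 8))
    # candidate_k must sit between the configured floor and min(available, 30).
    candidate_floor = max(final_k, candidate_floor_setting)
    candidate_k = max(final_k, int(data["candidate_k"]))
    candidate_k = min(max(candidate_floor, candidate_k), available_chunks, 30)
    return final_k, candidate_k
```

Because every bound is applied after parsing, an over-eager planner reply (say `final_k=20`) still collapses to the hard cap of 8.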
+    def _rerank_matches(self, *, query: str, matches: list[dict[str, Any]], top_n: int) -> list[dict[str, Any]]:
+        if self.reranker is None or not matches:
+            return matches[:top_n]
+
+        try:
+            results = self.reranker.rerank(
+                query=query,
+                documents=[match["content"] for match in matches],
+                top_n=min(top_n, len(matches)),
+            )
+        except requests.RequestException:
+            return matches[:top_n]
+
+        reranked: list[dict[str, Any]] = []
+        for item in results:
+            index = item.get("index")
+            if not isinstance(index, int) or index < 0 or index >= len(matches):
+                continue
+            match = dict(matches[index])
+            score = item.get("relevance_score")
+            if isinstance(score, (int, float)):
+                match["rerank_score"] = float(score)
+            reranked.append(match)
+
+        return reranked or matches[:top_n]
+
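The index-mapping step in `_rerank_matches` can be sketched as a standalone function. The `results` shape here (a list of dicts with `index` and `relevance_score`) is an assumption about the reranker response, mirroring how the diff reads it:

```python
def apply_rerank(matches, results, top_n):
    """Reorder vector matches by reranker results, dropping malformed
    entries and falling back to the original order when nothing survives."""
    reranked = []
    for item in results:
        index = item.get("index")
        # Skip entries whose index does not point back into the candidate list.
        if not isinstance(index, int) or not (0 <= index < len(matches)):
            continue
        match = dict(matches[index])  # copy so the original match is untouched
        score = item.get("relevance_score")
        if isinstance(score, (int, float)):
            match["rerank_score"] = float(score)
        reranked.append(match)
    # If every entry was malformed, degrade to the plain vector ordering.
    return reranked or matches[:top_n]
```

The fallback mirrors the service's behavior on reranker network errors: the answer quality degrades to plain cosine order instead of failing the request.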
     def add_document(self, *, db: Session, document_id: int, file_hash: str, filename: str, pages: list[tuple[int, str]]) -> None:
         chunk_rows: list[tuple[int | None, str]] = []
         for page_number, page_text in pages:

     def similarity_search(self, *, db: Session, query: str, file_hashes: list[str], k: int = 4) -> list[dict[str, Any]]:
         if not file_hashes:
             return []
+        final_k, candidate_k = self._choose_retrieval_sizes(
+            db=db,
+            query=query,
+            file_hashes=file_hashes,
+            requested_k=k,
+        )
+        if final_k == 0:
+            return []
         query_embedding = self._get_embeddings().embed_query(query)
         stmt = (
             select(

             )
             .where(DocumentChunk.file_hash.in_(file_hashes))
             .order_by(DocumentChunk.embedding.cosine_distance(query_embedding))
+            .limit(candidate_k)
         )
         results = db.execute(stmt).all()
         matches: list[dict[str, Any]] = []

                 "distance": row.distance,
             }
         )
+        return self._rerank_matches(query=query, matches=matches, top_n=final_k)
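End to end, `similarity_search` is now a two-stage funnel: a vector pass keeps the `candidate_k` nearest chunks by cosine distance, then the reranker trims them to `final_k`. A minimal in-memory sketch of that funnel over toy vectors (no pgvector or Jina involved; `rerank_fn` is a hypothetical stand-in for the reranker):

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity, the same ordering pgvector's <=> operator uses."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def two_stage_search(query_vec, chunks, candidate_k, final_k, rerank_fn):
    """Stage 1: keep the candidate_k nearest chunks by cosine distance.
    Stage 2: let rerank_fn order the survivors, then keep final_k of them."""
    candidates = sorted(
        chunks, key=lambda c: cosine_distance(query_vec, c["embedding"])
    )[:candidate_k]
    return rerank_fn(candidates)[:final_k]
```

The point of the funnel is cost: the cheap vector pass over-fetches, and the expensive reranker only ever sees `candidate_k` documents.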
app/static/style.css CHANGED
@@ -25,6 +25,7 @@
25
  body {
26
  margin: 0;
27
  min-height: 100vh;
 
28
  color: var(--ink-1);
29
  font-family: "Space Grotesk", "Helvetica Neue", sans-serif;
30
  background:
@@ -64,25 +65,14 @@ body {
64
  width: min(1460px, calc(100% - 2.5rem));
65
  margin: 0 auto;
66
  padding: 1.6rem 0 3.4rem;
 
 
67
  }
68
 
69
  .hero {
70
  margin-bottom: 1rem;
71
  }
72
 
73
- .workspace-strip {
74
- margin-bottom: 0.95rem;
75
- display: flex;
76
- justify-content: space-between;
77
- align-items: flex-start;
78
- gap: 1rem;
79
- }
80
-
81
- .workspace-strip h1 {
82
- font-size: clamp(1.65rem, 3.5vw, 2.2rem);
83
- margin-top: 0.35rem;
84
- }
85
-
86
  .hero-topline {
87
  display: flex;
88
  align-items: center;
@@ -149,6 +139,14 @@ button {
149
  margin: 0.35rem 0 0;
150
  }
151
152
  .grid {
153
  display: grid;
154
  gap: 1rem;
@@ -210,23 +208,54 @@ button {
210
  grid-template-columns: minmax(360px, 430px) 1fr;
211
  gap: 1.2rem;
212
  align-items: start;
 
 
213
  }
214
 
215
  .sidebar-panel {
216
  position: sticky;
217
- top: 1rem;
 
218
  max-height: calc(100vh - 2rem);
219
- overflow: auto;
 
220
  gap: 1rem;
221
  padding: 1.55rem;
 
222
  }
223
 
224
  .sidebar-docs {
225
- max-height: 52vh;
 
226
  overflow: auto;
227
  padding-right: 0.25rem;
228
  }
229
 
 
 
 
 
 
230
  .user-email {
231
  overflow-wrap: anywhere;
232
  }
@@ -334,11 +363,16 @@ button.danger {
334
 
335
  .chat-shell {
336
  gap: 0.75rem;
 
 
337
  }
338
 
339
  .chat-panel {
340
- min-height: 78vh;
 
341
  padding: 1.55rem;
 
 
342
  }
343
 
344
  .chat-thread {
@@ -346,8 +380,8 @@ button.danger {
346
  background: rgba(255, 250, 243, 0.7);
347
  border-radius: 14px;
348
  padding: 0.9rem;
349
- min-height: 56vh;
350
- max-height: 68vh;
351
  overflow-y: auto;
352
  display: grid;
353
  gap: 0.65rem;
@@ -403,10 +437,11 @@ button.danger {
403
  padding-top: 0.35rem;
404
  border-top: 1px solid var(--line);
405
  background: linear-gradient(180deg, rgba(255, 252, 247, 0), rgba(255, 252, 247, 0.92) 35%);
 
406
  }
407
 
408
  .chat-composer textarea {
409
- min-height: 118px;
410
  font-size: 1rem;
411
  }
412
 
@@ -420,6 +455,24 @@ button.danger {
420
  margin-top: 0;
421
  }
422
 
423
  .chat-markdown p:last-child {
424
  margin-bottom: 0;
425
  }
@@ -594,14 +647,26 @@ button.danger {
594
  .doc-head {
595
  display: flex;
596
  justify-content: space-between;
597
- align-items: center;
598
  gap: 0.8rem;
599
  }
600
 
 
 
 
 
 
601
  .doc-pages {
 
 
 
602
  color: #684b29;
603
  font-size: 0.8rem;
604
- padding: 0.2rem 0.55rem;
 
 
 
 
605
  border-radius: 999px;
606
  border: 1px solid rgba(178, 74, 0, 0.24);
607
  background: rgba(255, 206, 140, 0.3);
@@ -633,10 +698,32 @@ code {
633
  }
634
  }
635
 
636
- @media (max-width: 720px) {
637
  .shell {
638
  width: min(1120px, calc(100% - 1rem));
639
  padding-top: 0.95rem;
 
 
640
  }
641
 
642
  .toolbar {
@@ -653,24 +740,26 @@ code {
653
 
654
  .app-layout {
655
  grid-template-columns: 1fr;
656
- }
657
-
658
- .workspace-strip {
659
- flex-direction: column;
660
- align-items: flex-start;
661
  }
662
 
663
  .sidebar-panel {
664
  position: static;
 
665
  max-height: none;
 
 
666
  }
667
 
668
  .sidebar-docs {
 
669
  max-height: none;
 
670
  }
671
 
672
  .chat-panel {
673
  min-height: 68vh;
 
674
  }
675
 
676
  .chat-thread {
 
25
  body {
26
  margin: 0;
27
  min-height: 100vh;
28
+ overflow: hidden;
29
  color: var(--ink-1);
30
  font-family: "Space Grotesk", "Helvetica Neue", sans-serif;
31
  background:
 
65
  width: min(1460px, calc(100% - 2.5rem));
66
  margin: 0 auto;
67
  padding: 1.6rem 0 3.4rem;
68
+ height: 100vh;
69
+ overflow: hidden;
70
  }
71
 
72
  .hero {
73
  margin-bottom: 1rem;
74
  }
75
76
  .hero-topline {
77
  display: flex;
78
  align-items: center;
 
139
  margin: 0.35rem 0 0;
140
  }
141
 
142
+ .developer-credit {
143
+ margin: 0.45rem 0 0;
144
+ color: #7b4a22;
145
+ font-size: 0.84rem;
146
+ font-weight: 600;
147
+ letter-spacing: 0.02em;
148
+ }
149
+
150
  .grid {
151
  display: grid;
152
  gap: 1rem;
 
208
  grid-template-columns: minmax(360px, 430px) 1fr;
209
  gap: 1.2rem;
210
  align-items: start;
211
+ margin-top: 0.35rem;
212
+ height: calc(100vh - 2rem);
213
  }
214
 
215
  .sidebar-panel {
216
  position: sticky;
217
+ top: 0.6rem;
218
+ height: calc(100vh - 2rem);
219
  max-height: calc(100vh - 2rem);
220
+ min-height: 0;
221
+ overflow: hidden;
222
  gap: 1rem;
223
  padding: 1.55rem;
224
+ display: flex;
225
+ flex-direction: column;
226
+ }
227
+
228
+ .sidebar-title-row {
229
+ display: flex;
230
+ justify-content: space-between;
231
+ align-items: flex-start;
232
+ gap: 0.9rem;
233
+ flex-wrap: wrap;
234
+ }
235
+
236
+ .sidebar-title {
237
+ font-size: clamp(1.6rem, 3vw, 2.25rem);
238
+ margin-top: 0.25rem;
239
+ line-height: 1.05;
240
+ }
241
+
242
+ .account-head {
243
+ padding-top: 0.2rem;
244
+ border-top: 1px solid var(--line);
245
  }
246
 
247
  .sidebar-docs {
248
+ flex: 1 1 auto;
249
+ min-height: 0;
250
  overflow: auto;
251
  padding-right: 0.25rem;
252
  }
253
 
254
+ #logout-form {
255
+ margin-top: auto;
256
+ flex-shrink: 0;
257
+ }
258
+
259
  .user-email {
260
  overflow-wrap: anywhere;
261
  }
 
363
 
364
  .chat-shell {
365
  gap: 0.75rem;
366
+ height: 100%;
367
+ min-height: 0;
368
  }
369
 
370
  .chat-panel {
371
+ min-height: 0;
372
+ height: 100%;
373
  padding: 1.55rem;
374
+ display: flex;
375
+ flex-direction: column;
376
  }
377
 
378
  .chat-thread {
 
380
  background: rgba(255, 250, 243, 0.7);
381
  border-radius: 14px;
382
  padding: 0.9rem;
383
+ min-height: 0;
384
+ flex: 1 1 auto;
385
  overflow-y: auto;
386
  display: grid;
387
  gap: 0.65rem;
 
437
  padding-top: 0.35rem;
438
  border-top: 1px solid var(--line);
439
  background: linear-gradient(180deg, rgba(255, 252, 247, 0), rgba(255, 252, 247, 0.92) 35%);
440
+ flex-shrink: 0;
441
  }
442
 
443
  .chat-composer textarea {
444
+ min-height: 104px;
445
  font-size: 1rem;
446
  }
447
 
 
455
  margin-top: 0;
456
  }
457
 
458
+ .source-meta-right {
459
+ display: inline-flex;
460
+ align-items: center;
461
+ gap: 0.45rem;
462
+ flex-wrap: wrap;
463
+ justify-content: flex-end;
464
+ }
465
+
466
+ .source-score {
467
+ border: 1px solid rgba(178, 74, 0, 0.22);
468
+ background: linear-gradient(135deg, rgba(255, 187, 107, 0.24), rgba(255, 123, 0, 0.12));
469
+ color: #8d3600;
470
+ padding: 0.18rem 0.5rem;
471
+ border-radius: 999px;
472
+ font-size: 0.74rem;
473
+ font-weight: 600;
474
+ }
475
+
476
  .chat-markdown p:last-child {
477
  margin-bottom: 0;
478
  }
 
647
  .doc-head {
648
  display: flex;
649
  justify-content: space-between;
650
+ align-items: flex-start;
651
  gap: 0.8rem;
652
  }
653
 
654
+ .doc-head h3 {
655
+ flex: 1 1 auto;
656
+ min-width: 0;
657
+ }
658
+
659
  .doc-pages {
660
+ display: inline-flex;
661
+ align-items: center;
662
+ justify-content: center;
663
  color: #684b29;
664
  font-size: 0.8rem;
665
+ min-width: max-content;
666
+ padding: 0.28rem 0.8rem;
667
+ line-height: 1.2;
668
+ white-space: nowrap;
669
+ flex-shrink: 0;
670
  border-radius: 999px;
671
  border: 1px solid rgba(178, 74, 0, 0.24);
672
  background: rgba(255, 206, 140, 0.3);
 
698
  }
699
  }
700
 
701
+ @media (max-width: 1120px) {
702
+ .app-layout {
703
+ grid-template-columns: minmax(320px, 380px) 1fr;
704
+ }
705
+
706
+ .sidebar-title-row {
707
+ flex-direction: column;
708
+ align-items: flex-start;
709
+ gap: 0.55rem;
710
+ }
711
+
712
+ .sidebar-title {
713
+ font-size: clamp(1.4rem, 5vw, 2rem);
714
+ }
715
+
716
+ .badge {
717
+ white-space: normal;
718
+ }
719
+ }
720
+
721
+ @media (max-width: 820px) {
722
  .shell {
723
  width: min(1120px, calc(100% - 1rem));
724
  padding-top: 0.95rem;
725
+ height: auto;
726
+ overflow: visible;
727
  }
728
 
729
  .toolbar {
 
740
 
741
  .app-layout {
742
  grid-template-columns: 1fr;
743
+ height: auto;
 
 
 
 
744
  }
745
 
746
  .sidebar-panel {
747
  position: static;
748
+ height: auto;
749
  max-height: none;
750
+ min-height: auto;
751
+ overflow: visible;
752
  }
753
 
754
  .sidebar-docs {
755
+ flex: initial;
756
  max-height: none;
757
+ min-height: 0;
758
  }
759
 
760
  .chat-panel {
761
  min-height: 68vh;
762
+ height: auto;
763
  }
764
 
765
  .chat-thread {
app/templates/index.html CHANGED
@@ -27,6 +27,7 @@
27
  <p class="lede">
28
  Upload PDFs, avoid duplicate reprocessing by file hash, and ask an agent that uses user-scoped document retrieval with optional web search.
29
  </p>
 
30
  {% if db_unavailable %}
31
  <p class="db-warning">
32
  Database connection is temporarily unavailable. This is usually a transient DNS/network issue with the Supabase host. Please retry shortly.
@@ -34,14 +35,6 @@
34
  {% endif %}
35
  </section>
36
  {% else %}
37
- <section class="workspace-strip card">
38
- <div>
39
- <p class="eyebrow">LangGraph Assignment</p>
40
- <h1>DocsQA Workspace</h1>
41
- <p class="muted">Private document chat with structured sources.</p>
42
- </div>
43
- <span class="badge">FastAPI + Supabase + PGVector</span>
44
- </section>
45
  {% endif %}
46
 
47
  {% if not user %}
@@ -72,6 +65,18 @@
72
  <section class="app-layout">
73
  <aside class="card panel sidebar-panel">
74
  <div class="panel-head">
 
75
  <h2 class="user-email">{{ user.email }}</h2>
76
  <p class="muted">Your uploaded docs are private to this account.</p>
77
  </div>
@@ -148,6 +153,7 @@
148
  // Session management
149
  let currentSessionId = sessionStorage.getItem("chat_session_id") || `session_${Date.now()}_${Math.random().toString(36).substr(2, 9)}`;
150
  sessionStorage.setItem("chat_session_id", currentSessionId);
 
151
 
152
  const registerForm = document.getElementById("register-form");
153
  const loginForm = document.getElementById("login-form");
@@ -162,19 +168,38 @@
162
  const docDeleteButtons = document.querySelectorAll(".doc-delete-btn");
163
  const newChatBtn = document.getElementById("new-chat-btn");
164
 
 
165
  // New Chat button handler
166
  newChatBtn?.addEventListener("click", () => {
167
  currentSessionId = `session_${Date.now()}_${Math.random().toString(36).substr(2, 9)}`;
168
  sessionStorage.setItem("chat_session_id", currentSessionId);
169
- if (chatThread) {
170
- chatThread.innerHTML = `
171
- <article class="chat-msg assistant">
172
- <div class="chat-bubble chat-bubble-assistant chat-markdown">
173
- <p>Ask anything about your uploaded PDFs and I will answer with citations from retrieved chunks.</p>
174
- </div>
175
- </article>
176
- `;
177
- }
178
  });
179
 
180
  const safeJson = async (response) => {
@@ -327,36 +352,9 @@
327
  return "_No citations available for this turn._";
328
  };
329
 
330
- const sourceStopwords = new Set([
331
- "the", "and", "for", "with", "from", "that", "this", "what", "who", "how", "are", "was", "were", "is",
332
- "of", "about", "tell", "more", "please", "can", "you", "your", "according", "resume"
333
- ]);
334
-
335
- const extractQueryTerms = (queryText) => {
336
- const raw = (queryText || "").toLowerCase().match(/[a-z0-9_]+/g) || [];
337
- const deduped = [];
338
- const seen = new Set();
339
- for (const term of raw) {
340
- if (term.length < 3 || sourceStopwords.has(term) || seen.has(term)) continue;
341
- seen.add(term);
342
- deduped.push(term);
343
- }
344
- return deduped;
345
- };
346
-
347
- const escapeRegex = (value) => value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
348
-
349
- const highlightMatches = (text, terms) => {
350
- const plain = text || "";
351
- if (!terms.length) return escapeHtml(plain);
352
- const pattern = new RegExp(`\\b(${terms.map(escapeRegex).join("|")})\\b`, "gi");
353
- return escapeHtml(plain).replace(pattern, "<mark>$1</mark>");
354
- };
355
-
356
  const renderSourcesHtml = (sources, queryText = "") => {
357
  const vectorSources = Array.isArray(sources?.vector) ? sources.vector : [];
358
  const webSources = Array.isArray(sources?.web) ? sources.web : [];
359
- const terms = extractQueryTerms(queryText);
360
 
361
  const sections = [];
362
 
@@ -366,7 +364,7 @@
366
  const documentId = (src.document_id || "").toString().trim();
367
  const doc = escapeHtml(src.document || "Unknown document");
368
  const page = escapeHtml(src.page || "unknown");
369
- const excerptHtml = highlightMatches(src.excerpt || "", terms);
370
  const pageNumber = Number.parseInt(src.page || "", 10);
371
  const pageAnchor = Number.isFinite(pageNumber) && pageNumber > 0 ? `#page=${pageNumber}` : "";
372
  const pdfUrl = documentId ? `/documents/${encodeURIComponent(documentId)}/pdf${pageAnchor}` : "";
@@ -374,7 +372,9 @@
374
  <article class="source-card">
375
  <div class="source-meta">
376
  <span class="source-doc">${doc}</span>
377
- <span class="source-page">Page ${page}</span>
 
 
378
  </div>
379
  <p class="source-excerpt">${excerptHtml || "No excerpt available."}</p>
380
  ${
@@ -434,6 +434,87 @@
434
  container.appendChild(details);
435
  };
436
437
  const appendMessage = ({ role, text, markdown = false, pending = false, isError = false }) => {
438
  if (!chatThread) return null;
439
  const row = document.createElement("article");
@@ -454,6 +535,7 @@
454
  row.appendChild(bubble);
455
  chatThread.appendChild(row);
456
  chatThread.scrollTop = chatThread.scrollHeight;
 
457
  return bubble;
458
  };
459
 
@@ -493,6 +575,7 @@
493
  event.preventDefault();
494
  const response = await fetch("/logout", { method: "POST" });
495
  if (response.ok) {
 
496
  window.location.reload();
497
  }
498
  });
@@ -528,6 +611,7 @@
528
  uploadResult.classList.toggle("error", !response.ok);
529
  setBusy(uploadForm, false);
530
  if (response.ok) {
 
531
  window.location.reload();
532
  }
533
  });
@@ -562,34 +646,23 @@
562
  if (queryInput) queryInput.value = "";
563
  setBusy(askForm, true);
564
  const pendingBubble = appendMessage({ role: "assistant", text: "Thinking...", markdown: false, pending: true });
565
-
566
- const response = await fetch("/ask", {
567
- method: "POST",
568
- headers: {
569
- "Content-Type": "application/json",
570
- "X-Session-Id": currentSessionId
571
- },
572
- body: JSON.stringify({ query }),
573
- });
574
- const body = await safeJson(response);
575
  const target = pendingBubble || appendMessage({ role: "assistant", text: "", markdown: false });
576
  if (!target) {
577
  setBusy(askForm, false);
578
  return;
579
  }
580
- target.classList.remove("chat-pending");
581
-
582
- if (response.ok) {
583
- const answerText = body.answer || "Response received.";
584
- target.classList.add("chat-markdown");
585
- renderAssistantResponse(target, answerText, body.sources || null, query);
586
- } else {
587
- const message = prettyError(body);
588
  target.classList.add("chat-error");
589
  target.textContent = message;
 
 
 
 
590
  }
591
- chatThread.scrollTop = chatThread.scrollHeight;
592
- setBusy(askForm, false);
593
  });
594
 
595
  queryInput?.addEventListener("keydown", (event) => {
 
27
  <p class="lede">
28
  Upload PDFs, avoid duplicate reprocessing by file hash, and ask an agent that uses user-scoped document retrieval with optional web search.
29
  </p>
30
+ <p class="developer-credit">Developed by Baba Kattubadi</p>
31
  {% if db_unavailable %}
32
  <p class="db-warning">
33
  Database connection is temporarily unavailable. This is usually a transient DNS/network issue with the Supabase host. Please retry shortly.
 
35
  {% endif %}
36
  </section>
37
  {% else %}
38
  {% endif %}
39
 
40
  {% if not user %}
 
65
  <section class="app-layout">
66
  <aside class="card panel sidebar-panel">
67
  <div class="panel-head">
68
+ <p class="eyebrow">LangGraph Assignment</p>
69
+ <div class="sidebar-title-row">
70
+ <div>
71
+ <h1 class="sidebar-title">DocsQA Workspace</h1>
72
+ <p class="muted">Private document chat with structured sources.</p>
73
+ <p class="developer-credit">Developed by Baba Kattubadi</p>
74
+ </div>
75
+ <span class="badge">FastAPI + Supabase + PGVector</span>
76
+ </div>
77
+ </div>
78
+
79
+ <div class="panel-head account-head">
80
  <h2 class="user-email">{{ user.email }}</h2>
81
  <p class="muted">Your uploaded docs are private to this account.</p>
82
  </div>
 
153
  // Session management
154
  let currentSessionId = sessionStorage.getItem("chat_session_id") || `session_${Date.now()}_${Math.random().toString(36).substr(2, 9)}`;
155
  sessionStorage.setItem("chat_session_id", currentSessionId);
156
+ const chatStorageKey = () => `chat_thread_${currentSessionId}`;
157
 
158
  const registerForm = document.getElementById("register-form");
159
  const loginForm = document.getElementById("login-form");
 
168
  const docDeleteButtons = document.querySelectorAll(".doc-delete-btn");
169
  const newChatBtn = document.getElementById("new-chat-btn");
170
 
171
+ const saveChatThread = () => {
172
+ if (!chatThread) return;
173
+ sessionStorage.setItem(chatStorageKey(), chatThread.innerHTML);
174
+ };
175
+
176
+ const restoreChatThread = () => {
177
+ if (!chatThread) return;
178
+ const savedThread = sessionStorage.getItem(chatStorageKey());
179
+ if (savedThread) {
180
+ chatThread.innerHTML = savedThread;
181
+ }
182
+ };
183
+
184
+ const resetChatThread = () => {
185
+ if (!chatThread) return;
186
+ chatThread.innerHTML = `
187
+ <article class="chat-msg assistant">
188
+ <div class="chat-bubble chat-bubble-assistant chat-markdown">
189
+ <p>Ask anything about your uploaded PDFs and I will answer with citations from retrieved chunks.</p>
190
+ </div>
191
+ </article>
192
+ `;
193
+ saveChatThread();
194
+ };
195
+
196
+ restoreChatThread();
197
+
198
  // New Chat button handler
199
  newChatBtn?.addEventListener("click", () => {
200
  currentSessionId = `session_${Date.now()}_${Math.random().toString(36).substr(2, 9)}`;
201
  sessionStorage.setItem("chat_session_id", currentSessionId);
202
+ resetChatThread();
203
  });
204
 
205
  const safeJson = async (response) => {
 
352
  return "_No citations available for this turn._";
353
  };
354
 
355
  const renderSourcesHtml = (sources, queryText = "") => {
356
  const vectorSources = Array.isArray(sources?.vector) ? sources.vector : [];
357
  const webSources = Array.isArray(sources?.web) ? sources.web : [];
 
358
 
359
  const sections = [];
360
 
 
364
  const documentId = (src.document_id || "").toString().trim();
365
  const doc = escapeHtml(src.document || "Unknown document");
366
  const page = escapeHtml(src.page || "unknown");
367
+ const excerptHtml = escapeHtml(src.excerpt || "");
368
  const pageNumber = Number.parseInt(src.page || "", 10);
369
  const pageAnchor = Number.isFinite(pageNumber) && pageNumber > 0 ? `#page=${pageNumber}` : "";
370
  const pdfUrl = documentId ? `/documents/${encodeURIComponent(documentId)}/pdf${pageAnchor}` : "";
 
372
  <article class="source-card">
373
  <div class="source-meta">
374
  <span class="source-doc">${doc}</span>
375
+ <div class="source-meta-right">
376
+ <span class="source-page">Page ${page}</span>
377
+ </div>
378
  </div>
379
  <p class="source-excerpt">${excerptHtml || "No excerpt available."}</p>
380
  ${
 
434
  container.appendChild(details);
435
  };
436
 
437
+ const readStreamingAnswer = async ({ query, target }) => {
438
+ const response = await fetch("/ask/stream", {
439
+ method: "POST",
440
+ headers: {
441
+ "Content-Type": "application/json",
442
+ "X-Session-Id": currentSessionId
443
+ },
444
+ body: JSON.stringify({ query }),
445
+ });
446
+
447
+ if (!response.ok || !response.body) {
448
+ const body = await safeJson(response);
449
+ throw new Error(prettyError(body));
450
+ }
451
+
452
+ const reader = response.body.getReader();
453
+ const decoder = new TextDecoder();
454
+ let buffer = "";
455
+ let answerText = "";
456
+ let sources = null;
457
+
458
+ const processEvent = (rawEvent) => {
459
+ const lines = rawEvent.split("\n");
460
+ let eventName = "message";
461
+ const dataLines = [];
462
+
463
+ for (const line of lines) {
464
+ if (line.startsWith("event:")) {
465
+ eventName = line.slice(6).trim();
466
+ } else if (line.startsWith("data:")) {
467
+ dataLines.push(line.slice(5).trim());
468
+ }
469
+ }
470
+
471
+ if (!dataLines.length) return;
472
+ const payload = JSON.parse(dataLines.join("\n"));
473
+
474
+ if (eventName === "token") {
475
+ answerText += payload.content || "";
476
+ target.classList.remove("chat-pending");
477
+ target.classList.add("chat-markdown");
478
+ target.innerHTML = renderMarkdown(answerText || "Thinking...");
479
+ chatThread.scrollTop = chatThread.scrollHeight;
480
+ return;
481
+ }
482
+
483
+ if (eventName === "sources") {
484
+ sources = payload.sources || null;
485
+ return;
486
+ }
487
+
488
+ if (eventName === "done") {
489
+ answerText = payload.answer || answerText || "Response received.";
490
+ target.classList.remove("chat-pending");
491
+ target.classList.add("chat-markdown");
492
+ renderAssistantResponse(target, answerText, sources, query);
493
+ chatThread.scrollTop = chatThread.scrollHeight;
494
+ return;
495
+ }
496
+
497
+ if (eventName === "error") {
498
+ throw new Error(payload.detail || "Streaming failed.");
499
+ }
500
+ };
501
+
502
+ while (true) {
503
+ const { value, done } = await reader.read();
504
+ if (done) break;
505
+ buffer += decoder.decode(value, { stream: true });
506
+ const events = buffer.split("\n\n");
507
+ buffer = events.pop() || "";
508
+ for (const rawEvent of events) {
509
+ if (rawEvent.trim()) processEvent(rawEvent);
510
+ }
511
+ }
512
+
513
+ buffer += decoder.decode();
514
+ if (buffer.trim()) processEvent(buffer);
515
+ return { answer: answerText, sources };
516
+ };
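The client above splits the stream on blank lines and parses `event:`/`data:` fields by hand. The matching frame grammar can be sketched from the server side in Python; `format_sse` and `parse_sse` are hypothetical helpers written to mirror this client's `processEvent`, not functions from the repo:

```python
import json


def format_sse(event_name, payload):
    """Serialize one server-sent event frame the way the client expects:
    an event: line, a data: line carrying JSON, and a blank-line terminator."""
    return f"event: {event_name}\ndata: {json.dumps(payload)}\n\n"


def parse_sse(raw_event):
    """Mirror of the client-side processEvent: recover (event_name, payload)."""
    event_name = "message"  # SSE default when no event: field is present
    data_lines = []
    for line in raw_event.split("\n"):
        if line.startswith("event:"):
            event_name = line[6:].strip()
        elif line.startswith("data:"):
            data_lines.append(line[5:].strip())
    payload = json.loads("\n".join(data_lines)) if data_lines else None
    return event_name, payload
```

Keeping the terminator as a blank line is what lets the client buffer partial reads and split on `"\n\n"`, exactly as `readStreamingAnswer` does.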
517
+
518
  const appendMessage = ({ role, text, markdown = false, pending = false, isError = false }) => {
519
  if (!chatThread) return null;
520
  const row = document.createElement("article");
 
535
  row.appendChild(bubble);
536
  chatThread.appendChild(row);
537
  chatThread.scrollTop = chatThread.scrollHeight;
538
+ saveChatThread();
539
  return bubble;
540
  };
541
 
 
575
  event.preventDefault();
576
  const response = await fetch("/logout", { method: "POST" });
577
  if (response.ok) {
578
+ sessionStorage.removeItem(chatStorageKey());
579
  window.location.reload();
580
  }
581
  });
 
611
  uploadResult.classList.toggle("error", !response.ok);
612
  setBusy(uploadForm, false);
613
  if (response.ok) {
614
+ saveChatThread();
615
  window.location.reload();
616
  }
617
  });
 
646
  if (queryInput) queryInput.value = "";
647
  setBusy(askForm, true);
648
  const pendingBubble = appendMessage({ role: "assistant", text: "Thinking...", markdown: false, pending: true });
649
  const target = pendingBubble || appendMessage({ role: "assistant", text: "", markdown: false });
650
  if (!target) {
651
  setBusy(askForm, false);
652
  return;
653
  }
654
+ try {
655
+ await readStreamingAnswer({ query, target });
656
+ } catch (error) {
657
+ const message = error instanceof Error ? error.message : "Request failed.";
658
+ target.classList.remove("chat-pending");
 
 
 
659
  target.classList.add("chat-error");
660
  target.textContent = message;
661
+ } finally {
662
+ chatThread.scrollTop = chatThread.scrollHeight;
663
+ saveChatThread();
664
+ setBusy(askForm, false);
665
  }
 
 
666
  });
667
 
668
  queryInput?.addEventListener("keydown", (event) => {