BabaK07 committed
Commit d197c9d · 1 Parent(s): aefb7b1

Polish retrieval workflow and UI
.env.example CHANGED
@@ -14,5 +14,9 @@ EMBEDDING_DIMENSIONS=1024
 JINA_API_KEY=
 JINA_API_BASE=https://api.jina.ai/v1/embeddings
 JINA_EMBEDDING_MODEL=jina-embeddings-v3
+JINA_RERANKER_API_BASE=https://api.jina.ai/v1/rerank
+JINA_RERANKER_MODEL=jina-reranker-v3
+RETRIEVAL_K=4
+RERANK_CANDIDATE_K=12
 WEB_SEARCH_PROVIDER=duckduckgo
 TAVILY_API_KEY=
README.md CHANGED
@@ -18,6 +18,7 @@ If the uploaded documents are not enough, the agent falls back to web search and
 - FastAPI + SQLAlchemy
 - LangGraph agent
 - Groq chat model
+- Jina embeddings + Jina reranker
 - Supabase Postgres + `pgvector`
 - Railway deployment

@@ -29,23 +30,38 @@ Each chunk is stored with metadata (document, page number, chunk index) and embe
 At question time:
 1. LLM-based document filtering selects relevant documents from user's library
 2. Vector search retrieves relevant chunks from selected documents
-3. The agent answers from those chunks when possible
-4. If evidence is weak, the agent uses web search and cites external URLs
+3. Jina reranking reorders the retrieved chunks for better final relevance
+4. The agent answers from those chunks when possible
+5. If evidence is weak, the agent uses web search and cites external URLs

 ## Chunking Strategy

-- Chunk size: `1200`
-- Overlap: `200`
+- Splitter: LangChain `RecursiveCharacterTextSplitter`
+- Chunk size: `1000`
+- Overlap: `150`

 Why this setup:
-- Long, structured documents need enough contiguous context.
-- Overlap helps avoid missing content around chunk boundaries.
-- It gives a practical quality/cost balance for retrieval.
+- It prefers breaking on paragraphs and sentence boundaries before falling back to smaller separators.
+- It preserves more coherent chunks for contracts, specs, and structured PDFs.
+- A smaller overlap keeps recall while reducing duplicated context in retrieval.

 ## Retrieval Approach

-I use cosine similarity search in `pgvector` (no reranker yet).
-The top matches are turned into readable citations (document name + page + snippet), and those are shown per answer in the UI.
+I use cosine similarity search in `pgvector`, then apply Jina reranking for better final ordering.
+The system uses an LLM-based retrieval planner to choose:
+
+- the final number of chunks to keep
+- the candidate pool to rerank
+
+Those values are clamped to safe bounds before retrieval runs.
+
+The UI shows:
+
+- document name
+- page number
+- chunk excerpt
+
+for retrieved document sources.

 ## Agent Routing Logic

@@ -71,6 +87,17 @@ Each turn stores/returns source metadata separately from the answer body.
 ## Conversation Memory

 Conversation history is maintained within session scope, so follow-ups like “tell me more about that” work as expected.
+The frontend also preserves the visible chat thread per session, so upload-triggered page refreshes do not wipe the current conversation view.
+
+## Streaming UX
+
+Answers are streamed into the chat UI progressively.
+
+- the visible response is rendered chunk by chunk
+- source cards are attached after the answer completes
+- a slight pacing delay is added so the stream feels live to the user
+
+The streaming route is separate from the standard JSON `/ask` response path.

 ## Bonus Feature

@@ -88,22 +115,24 @@ I also implemented LLM-based document filtering:

 - The system sends all user documents (filename, summary, preview) to the LLM
 - LLM semantically analyzes and selects only truly relevant documents for the query
-- Returns 0 to N documents based on actual relevance (not forced to always return the max limit)
-- Fallback returns first N documents if LLM call fails
+- Returns a JSON array of relevant file hashes
+- It is not forced to return a capped number of documents
+- Fallback returns all candidate document hashes if the LLM call fails

 ## Challenges I Ran Into

 1. Heavy embedding dependencies made deployment images too large.
-   - I switched to lightweight embeddings for deployment and added Jina API embedding support.
+   - I standardized on Jina API embeddings/reranking to keep the runtime lighter while preserving retrieval quality.
 2. Source rendering got messy across multiple chat turns.
    - I separated answer text from source payloads and extracted sources per turn.
 3. Intermittent DB DNS/pooler issues during deployment.
    - I improved connection handling and standardized Supabase transaction-pooler config.
+4. UI state was getting lost after document uploads.
+   - I persisted the active chat thread in session storage so the current conversation remains visible after refresh.

 ## If I Had More Time

 - Add conversation history UI to display past chat sessions
-- Add reranking (cross-encoder) for better precision on long multi-doc queries
 - Add automated citation-faithfulness checks
 - Add Alembic migrations for cleaner schema evolution
 - Add stronger eval/observability for routing and retrieval quality

@@ -126,12 +155,16 @@ Required:
 - `GROQ_API_KEY`
 - `SECRET_KEY`
 - `DATABASE_URL`
-
-Embeddings (recommended):
 - `JINA_API_KEY`
+
+Embeddings:
 - `JINA_API_BASE` (default: `https://api.jina.ai/v1/embeddings`)
 - `JINA_EMBEDDING_MODEL` (default: `jina-embeddings-v3`)
+- `JINA_RERANKER_API_BASE` (default: `https://api.jina.ai/v1/rerank`)
+- `JINA_RERANKER_MODEL` (default: `jina-reranker-v3`)
 - `EMBEDDING_DIMENSIONS` (default: `1024`)
+- `RETRIEVAL_K` (default minimum final context size: `4`)
+- `RERANK_CANDIDATE_K` (default minimum rerank candidate pool: `12`)

 Storage:
 - `STORAGE_BACKEND=local|supabase`

@@ -144,6 +177,10 @@ Web search:
 - `WEB_SEARCH_PROVIDER=duckduckgo|tavily`
 - `TAVILY_API_KEY` (if using Tavily)

+Auth:
+- `ACCESS_TOKEN_EXPIRE_MINUTES` (default: `720`)
+- For local development, lowering this can make login/logout testing easier
+
 ## API Endpoints

 - `POST /register`

@@ -154,6 +191,7 @@ Web search:
 - `DELETE /documents/{document_id}`
 - `GET /documents/{document_id}/pdf`
 - `POST /ask`
+- `POST /ask/stream`

 ## Sample Documents
app/config.py CHANGED
@@ -24,6 +24,10 @@ class Settings(BaseSettings):
     jina_api_key: str | None = None
     jina_api_base: str = "https://api.jina.ai/v1/embeddings"
     jina_embedding_model: str = "jina-embeddings-v3"
+    jina_reranker_api_base: str = "https://api.jina.ai/v1/rerank"
+    jina_reranker_model: str = "jina-reranker-v3"
+    retrieval_k: int = 4
+    rerank_candidate_k: int = 12
     groq_api_key: str | None = None
     web_search_provider: str = "tavily"
     tavily_api_key: str | None = None
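The new fields follow the usual pydantic-settings pattern: each class attribute doubles as a default that the matching environment variable can override. A minimal stand-in using a plain dict (the `load_retrieval_settings` helper is hypothetical, not part of the app; variable names and defaults come from the diff):

```python
def load_retrieval_settings(env: dict[str, str]) -> dict[str, object]:
    """Resolve the new retrieval settings from an env mapping, with the diff's defaults."""
    return {
        "jina_reranker_api_base": env.get("JINA_RERANKER_API_BASE", "https://api.jina.ai/v1/rerank"),
        "jina_reranker_model": env.get("JINA_RERANKER_MODEL", "jina-reranker-v3"),
        "retrieval_k": int(env.get("RETRIEVAL_K", "4")),
        "rerank_candidate_k": int(env.get("RERANK_CANDIDATE_K", "12")),
    }

# An env var overrides the default, as BaseSettings would do from os.environ.
overridden = load_retrieval_settings({"RETRIEVAL_K": "6"})
```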
app/main.py CHANGED
@@ -1,4 +1,6 @@
+import json
 import re
+import time
 from typing import Any

 from fastapi import Cookie, Depends, FastAPI, File, Form, Header, HTTPException, Request, UploadFile, status

@@ -56,7 +58,10 @@ def _parse_vector_sources(tool_output: str) -> list[dict[str, str]]:
     current_page = ""

     for line in lines:
-        match = re.match(r"^\s*\d+\.\s+document_id=(.*?)\s+\|\s+document=(.*?)\s+\|\s+page=(.*?)\s+\|\s+distance=", line)
+        match = re.match(
+            r"^\s*\d+\.\s+document_id=(.*?)\s+\|\s+document=(.*?)\s+\|\s+page=(.*?)\s+\|\s+distance=.*?(?:\s+\|\s+rerank_score=(.*?))?\s*$",
+            line,
+        )
         if match:
             current_document_id = match.group(1).strip()
             current_doc = match.group(2).strip()

@@ -159,6 +164,24 @@ def _strip_sources_from_answer(answer: str) -> str:
     return "\n".join(filtered).strip()


+def _get_current_turn_messages(*, previous_messages: list[Any], all_messages: list[Any]) -> list[Any]:
+    if len(all_messages) >= len(previous_messages):
+        return all_messages[len(previous_messages):]
+    return all_messages
+
+
+def _build_agent_config(*, user: User, access_token: str | None, x_session_id: str | None) -> dict[str, Any]:
+    if x_session_id:
+        session_key = f"user:{user.id}:session:{x_session_id}"
+    else:
+        session_key = access_token or f"user:{user.id}"
+    return {"configurable": {"thread_id": session_key}}
+
+
+def _sse_event(event: str, data: dict[str, Any]) -> str:
+    return f"event: {event}\ndata: {json.dumps(data)}\n\n"
+
+
 def get_current_user(
     access_token: str | None = Cookie(default=None),
     db: Session = Depends(get_db),

@@ -349,15 +372,8 @@ def ask_question(
 ):
     document_service.ensure_page_metadata_for_user(db=db, user=user)
     agent = build_agent(db=db, user=user)
-
-    # Use session ID from header if provided, otherwise fall back to access token or user ID
-    if x_session_id:
-        session_key = f"user:{user.id}:session:{x_session_id}"
-    else:
-        session_key = access_token or f"user:{user.id}"
-
-    config = {"configurable": {"thread_id": session_key}}
-    print(f"[Agent] thread_id: {session_key}")
+    config = _build_agent_config(user=user, access_token=access_token, x_session_id=x_session_id)
+    print(f"[Agent] thread_id: {config['configurable']['thread_id']}")
     previous_messages: list[Any] = []
     try:
         state = agent.get_state(config)

@@ -374,9 +390,66 @@ def ask_question(
     answer = final_message if isinstance(final_message, str) else str(final_message)
     answer = _strip_sources_from_answer(answer)
     all_messages = result.get("messages", [])
-    if isinstance(all_messages, list) and len(all_messages) >= len(previous_messages):
-        current_turn_messages = all_messages[len(previous_messages):]
-    else:
-        current_turn_messages = all_messages if isinstance(all_messages, list) else []
+    current_turn_messages = _get_current_turn_messages(
+        previous_messages=previous_messages,
+        all_messages=all_messages if isinstance(all_messages, list) else [],
+    )
     sources = _extract_sources_from_messages(current_turn_messages)
     return AskResponse(answer=answer, sources=sources)
+
+
+@app.post("/ask/stream")
+def ask_question_stream(
+    payload: AskRequest,
+    db: Session = Depends(get_db),
+    user: User = Depends(get_current_user),
+    access_token: str | None = Cookie(default=None),
+    x_session_id: str | None = Header(default=None, alias="X-Session-Id"),
+):
+    document_service.ensure_page_metadata_for_user(db=db, user=user)
+    agent = build_agent(db=db, user=user)
+    config = _build_agent_config(user=user, access_token=access_token, x_session_id=x_session_id)
+
+    previous_messages: list[Any] = []
+    try:
+        state = agent.get_state(config)
+        values = getattr(state, "values", {}) or {}
+        maybe_messages = values.get("messages", [])
+        if isinstance(maybe_messages, list):
+            previous_messages = maybe_messages
+    except Exception:
+        previous_messages = []
+
+    def event_stream():
+        try:
+            result = agent.invoke({"messages": [("user", payload.query)]}, config=config)
+            all_messages = result.get("messages", [])
+            all_messages = all_messages if isinstance(all_messages, list) else []
+            current_turn_messages = _get_current_turn_messages(previous_messages=previous_messages, all_messages=all_messages)
+            if current_turn_messages:
+                final_message = current_turn_messages[-1].content
+                final_answer = final_message if isinstance(final_message, str) else str(final_message)
+                final_answer = _strip_sources_from_answer(final_answer)
+            else:
+                final_answer = ""
+
+            chunk_size = 24
+            for index in range(0, len(final_answer), chunk_size):
+                yield _sse_event("token", {"content": final_answer[index : index + chunk_size]})
+                time.sleep(0.03)
+
+            sources = _extract_sources_from_messages(current_turn_messages)
+            yield _sse_event("sources", {"sources": sources})
+            yield _sse_event("done", {"answer": final_answer})
+        except Exception as exc:
+            yield _sse_event("error", {"detail": str(exc)})
+
+    return StreamingResponse(
+        event_stream(),
+        media_type="text/event-stream",
+        headers={
+            "Cache-Control": "no-cache",
+            "Connection": "keep-alive",
+            "X-Accel-Buffering": "no",
+        },
+    )
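The wire format produced by the `_sse_event` helper in this diff is standard Server-Sent Events framing: `event: <name>\ndata: <json>\n\n` per event. A toy producer and parser pair (the parser stands in for what a browser `EventSource` does; it is illustrative, not part of the app):

```python
import json

def sse_event(event: str, data: dict) -> str:
    """Frame one SSE event the way _sse_event in the diff does."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

def parse_sse(stream: str) -> list[tuple[str, dict]]:
    """Toy parser: split on blank lines, read the event name and JSON payload."""
    events = []
    for block in stream.strip().split("\n\n"):
        fields = dict(line.split(": ", 1) for line in block.split("\n"))
        events.append((fields["event"], json.loads(fields["data"])))
    return events

# The /ask/stream route emits "token" events followed by "sources" and "done".
stream = (
    sse_event("token", {"content": "Hel"})
    + sse_event("token", {"content": "lo"})
    + sse_event("done", {"answer": "Hello"})
)
events = parse_sse(stream)
```

Concatenating the `token` payloads reproduces the final answer, which is why the frontend can render the response chunk by chunk and still trust the `done` event.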
app/services/agent.py CHANGED
@@ -18,10 +18,6 @@ from app.services.web_search import build_web_search_tool

 class VectorSearchInput(BaseModel):
     query: str = Field(..., description="The user question to answer from uploaded documents.")
-    file_hashes: list[str] | None = Field(
-        default=None,
-        description="Optional document hashes to filter search. Leave empty to auto-resolve relevant documents for the current user.",
-    )


 LANGGRAPH_CHECKPOINTER = MemorySaver()

@@ -43,11 +39,11 @@ def build_agent(*, db: Session, user: User):
     vector_store = VectorStoreService()
     web_search_tool = build_web_search_tool()

-    def vector_search(query: str, file_hashes: list[str] | None = None) -> str:
-        resolved_hashes = file_hashes or document_service.resolve_relevant_document_hashes(db, user=user, query=query)
+    def vector_search(query: str) -> str:
+        resolved_hashes = document_service.resolve_relevant_document_hashes(db, user=user, query=query)
         if not resolved_hashes:
             return "No uploaded documents are available for this user."
-        matches = vector_store.similarity_search(db=db, query=query, file_hashes=resolved_hashes, k=4)
+        matches = vector_store.similarity_search(db=db, query=query, file_hashes=resolved_hashes, k=settings.retrieval_k)
         if not matches:
             return f"No vector matches found for hashes: {resolved_hashes}"
         lines = ["Vector evidence (cite document + page + excerpt in final answer):"]

@@ -55,9 +51,10 @@
             page_number = match["metadata"].get("page_number")
             page_label = str(page_number) if page_number is not None else "unknown"
             document_id = match["metadata"].get("document_id")
-            lines.append(
-                f"{index}. document_id={document_id} | document={match['metadata']['filename']} | page={page_label} | distance={match['distance']:.4f}"
-            )
+            score_parts = [f"distance={match['distance']:.4f}"]
+            if "rerank_score" in match:
+                score_parts.append(f"rerank_score={match['rerank_score']:.4f}")
+            lines.append(f"{index}. document_id={document_id} | document={match['metadata']['filename']} | page={page_label} | {' | '.join(score_parts)}")
             lines.append(f"   excerpt: {match['content'][:900].replace(chr(10), ' ')}")
         return "\n\n".join(lines)

@@ -66,8 +63,7 @@
         name="vector_search",
         description=(
             "Searches the current user's uploaded documents. "
-            "If file hashes are omitted, the tool first finds the most relevant document hashes from stored metadata and summary, "
-            "then applies those hashes as a vector-search filter."
+            "The tool automatically resolves the most relevant documents for the current user before chunk retrieval."
         ),
         args_schema=VectorSearchInput,
     )
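The evidence-line format that `vector_search` emits must stay parseable by the regex in `app/main.py`, including the now-optional `rerank_score` field. A standalone sketch of that formatting (`format_evidence_line` is a hypothetical extraction of the diff's inline logic):

```python
def format_evidence_line(index: int, match: dict) -> str:
    """Build one evidence line; rerank_score is appended only when present."""
    meta = match["metadata"]
    page_number = meta.get("page_number")
    page_label = str(page_number) if page_number is not None else "unknown"
    score_parts = [f"distance={match['distance']:.4f}"]
    if "rerank_score" in match:
        score_parts.append(f"rerank_score={match['rerank_score']:.4f}")
    return (
        f"{index}. document_id={meta['document_id']} | document={meta['filename']} "
        f"| page={page_label} | {' | '.join(score_parts)}"
    )
```

Keeping `rerank_score` as a trailing optional segment means the old `distance=`-anchored parser still matches lines from either code path.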
app/services/document_service.py CHANGED
@@ -131,17 +131,17 @@ class DocumentService:
             "deleted_shared_document": deleted_shared_document,
         }

-    def resolve_relevant_document_hashes(self, db: Session, *, user: User, query: str, limit: int = 5) -> list[str]:
+    def resolve_relevant_document_hashes(self, db: Session, *, user: User, query: str) -> list[str]:
         docs = self.list_user_documents(db, user)
         if not docs:
             return []

         # Send all documents to LLM for semantic matching
-        matched_hashes = self._llm_filter_documents(query=query, candidates=docs, limit=limit)
+        matched_hashes = self._llm_filter_documents(query=query, candidates=docs)
         print("Documents Matched ----->", matched_hashes)
         return matched_hashes

-    def _llm_filter_documents(self, *, query: str, candidates: list[Document], limit: int) -> list[str]:
+    def _llm_filter_documents(self, *, query: str, candidates: list[Document]) -> list[str]:
         if not self.settings.groq_api_key or not candidates:
             return []
         if self.matcher_llm is None:

@@ -163,9 +163,8 @@
             "Consider semantic similarity, topic alignment, and document purpose.\n\n"
             "IMPORTANT: Only include documents that are actually relevant to answering the query.\n"
             "It's better to return fewer relevant documents than to include irrelevant ones.\n"
-            f"You may return anywhere from 0 to {limit} documents.\n\n"
-            "Return ONLY valid JSON with this exact schema:\n"
-            '{"file_hashes": ["<hash1>", "<hash2>", ...]}\n\n'
+            "Return ONLY a valid JSON array of relevant file hashes, for example:\n"
+            '["<hash1>", "<hash2>"]\n\n'
             f"User query: {query}\n\n"
             f"Available documents:\n{json.dumps(payload, ensure_ascii=True, indent=2)}"
         )

@@ -180,12 +179,11 @@
                 content = content.split("```")[1].split("```")[0].strip()

             data = json.loads(content)
-            hashes = data.get("file_hashes", [])
+            hashes = data if isinstance(data, list) else []
             valid = {item.get("file_hash", "") for item in payload}
-            return [value for value in hashes if isinstance(value, str) and value in valid][:limit]
+            return [value for value in hashes if isinstance(value, str) and value in valid]
         except Exception:
-            # Fallback: return first N documents
-            return [doc.file_hash for doc in candidates[:limit]]
+            return [doc.file_hash for doc in candidates]

     def ensure_page_metadata_for_user(self, *, db: Session, user: User) -> None:
         docs = self.list_user_documents(db, user)
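The response-handling change above (a bare JSON array instead of a `{"file_hashes": ...}` object) can be exercised in isolation. A sketch, with `parse_hash_reply` as a hypothetical standalone rendering of the diff's logic; the `FENCE` constant avoids writing a literal code fence inside this example:

```python
import json

FENCE = "`" * 3  # stands in for the markdown code fence an LLM may wrap its reply in

def parse_hash_reply(content: str, valid_hashes: set[str]) -> list[str]:
    """Strip an optional code fence, parse a JSON array, keep only known hashes."""
    if FENCE + "json" in content:
        content = content.split(FENCE + "json", 1)[1].split(FENCE, 1)[0].strip()
    elif FENCE in content:
        content = content.split(FENCE, 1)[1].split(FENCE, 1)[0].strip()
    data = json.loads(content)
    hashes = data if isinstance(data, list) else []
    return [value for value in hashes if isinstance(value, str) and value in valid_hashes]
```

Validating against the known hash set means a hallucinated hash in the LLM reply is silently dropped rather than passed to the vector-search filter.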
app/services/vector_store.py CHANGED
@@ -1,71 +1,16 @@
1
- import hashlib
2
- import math
3
- import re
4
  from typing import Any
5
 
6
  import requests
7
- from sqlalchemy import delete, select
 
 
8
  from sqlalchemy.orm import Session
9
 
10
  from app.config import get_settings
11
  from app.models import DocumentChunk
12
 
13
 
14
- class SimpleTextSplitter:
15
- def __init__(self, *, chunk_size: int, chunk_overlap: int) -> None:
16
- self.chunk_size = chunk_size
17
- self.chunk_overlap = chunk_overlap
18
-
19
- def split_text(self, text: str) -> list[str]:
20
- normalized = text.strip()
21
- if not normalized:
22
- return []
23
- if len(normalized) <= self.chunk_size:
24
- return [normalized]
25
-
26
- chunks: list[str] = []
27
- start = 0
28
- step = max(1, self.chunk_size - self.chunk_overlap)
29
- text_length = len(normalized)
30
- while start < text_length:
31
- end = min(text_length, start + self.chunk_size)
32
- chunk = normalized[start:end].strip()
33
- if chunk:
34
- chunks.append(chunk)
35
- if end >= text_length:
36
- break
37
- start += step
38
- return chunks
39
-
40
-
41
- class LocalHashEmbeddings:
42
- def __init__(self, dimensions: int) -> None:
43
- self.dimensions = dimensions
44
-
45
- def embed_documents(self, texts: list[str]) -> list[list[float]]:
46
- return [self._embed_text(text) for text in texts]
47
-
48
- def embed_query(self, text: str) -> list[float]:
49
- return self._embed_text(text)
50
-
51
- def _embed_text(self, text: str) -> list[float]:
52
- vector = [0.0] * self.dimensions
53
- tokens = re.findall(r"\w+", text.lower())
54
- if not tokens:
55
- return vector
56
-
57
- for token in tokens:
58
- digest = hashlib.sha256(token.encode("utf-8")).digest()
59
- bucket = int.from_bytes(digest[:4], "big") % self.dimensions
60
- sign = 1.0 if digest[4] % 2 == 0 else -1.0
61
- vector[bucket] += sign
62
-
63
- norm = math.sqrt(sum(value * value for value in vector))
64
- if norm == 0:
65
- return vector
66
- return [value / norm for value in vector]
67
-
68
-
69
  class JinaEmbeddings:
70
  def __init__(self, *, api_key: str, base_url: str, model: str, dimensions: int) -> None:
71
  self.api_key = api_key
@@ -114,24 +59,161 @@ class JinaEmbeddings:
114
  return validated
115
 
116
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
117
  class VectorStoreService:
118
  def __init__(self) -> None:
119
- self.splitter = SimpleTextSplitter(chunk_size=1200, chunk_overlap=200)
120
- settings = get_settings()
121
- if settings.jina_api_key:
122
- self.embeddings = JinaEmbeddings(
123
- api_key=settings.jina_api_key,
124
- base_url=settings.jina_api_base,
125
- model=settings.jina_embedding_model,
126
- dimensions=settings.embedding_dimensions,
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
127
  )
128
- else:
129
- # Lightweight fallback when hosted embedding credentials are not configured.
130
- self.embeddings = LocalHashEmbeddings(settings.embedding_dimensions)
 
 
 
 
 
131
 
132
  def _get_embeddings(self) -> Any:
133
  return self.embeddings
134
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
135
  def add_document(self, *, db: Session, document_id: int, file_hash: str, filename: str, pages: list[tuple[int, str]]) -> None:
136
  chunk_rows: list[tuple[int | None, str]] = []
137
  for page_number, page_text in pages:
@@ -163,6 +245,14 @@ class VectorStoreService:
163
  def similarity_search(self, *, db: Session, query: str, file_hashes: list[str], k: int = 4) -> list[dict[str, Any]]:
164
  if not file_hashes:
165
  return []
 
 
 
 
 
 
 
 
166
  query_embedding = self._get_embeddings().embed_query(query)
167
  stmt = (
168
  select(
@@ -176,7 +266,7 @@ class VectorStoreService:
176
  )
177
  .where(DocumentChunk.file_hash.in_(file_hashes))
178
  .order_by(DocumentChunk.embedding.cosine_distance(query_embedding))
179
- .limit(k)
180
  )
181
  results = db.execute(stmt).all()
182
  matches: list[dict[str, Any]] = []
@@ -194,4 +284,4 @@ class VectorStoreService:
194
  "distance": row.distance,
195
  }
196
  )
197
- return matches
 
1
+ import json
 
 
2
  from typing import Any
3
 
4
  import requests
5
+ from langchain_groq import ChatGroq
6
+ from langchain_text_splitters import RecursiveCharacterTextSplitter
7
+ from sqlalchemy import delete, func, select
8
  from sqlalchemy.orm import Session
9
 
10
  from app.config import get_settings
11
  from app.models import DocumentChunk
12
 
13
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
14
  class JinaEmbeddings:
15
  def __init__(self, *, api_key: str, base_url: str, model: str, dimensions: int) -> None:
16
  self.api_key = api_key
 
59
  return validated
60
 
61
 
62
+ class JinaReranker:
63
+ def __init__(self, *, api_key: str, base_url: str, model: str) -> None:
64
+ self.api_key = api_key
65
+ self.base_url = base_url
66
+ self.model = model
67
+
68
+ def rerank(self, *, query: str, documents: list[str], top_n: int) -> list[dict[str, Any]]:
69
+ if not documents:
70
+ return []
71
+
72
+ response = requests.post(
73
+ self.base_url,
74
+ headers={
75
+ "Content-Type": "application/json",
76
+ "Authorization": f"Bearer {self.api_key}",
77
+ },
78
+ json={
79
+ "model": self.model,
80
+ "query": query,
81
+ "top_n": top_n,
82
+ "documents": documents,
83
+ "return_documents": False,
84
+ },
85
+ timeout=60,
86
+ )
87
+ response.raise_for_status()
88
+ return response.json().get("results", [])
89
+
90
+
91
  class VectorStoreService:
92
  def __init__(self) -> None:
93
+ self.settings = get_settings()
94
+        if not self.settings.jina_api_key:
+            raise RuntimeError("JINA_API_KEY is required for document embedding and retrieval.")
+
+        self.splitter = RecursiveCharacterTextSplitter(
+            chunk_size=1000,
+            chunk_overlap=150,
+            separators=[
+                "\n\n",
+                "\n",
+                ". ",
+                "? ",
+                "! ",
+                "; ",
+                ", ",
+                " ",
+                "",
+            ],
+            keep_separator=True,
+        )
+        self.embeddings = JinaEmbeddings(
+            api_key=self.settings.jina_api_key,
+            base_url=self.settings.jina_api_base,
+            model=self.settings.jina_embedding_model,
+            dimensions=self.settings.embedding_dimensions,
+        )
+        self.retrieval_router = (
+            ChatGroq(
+                api_key=self.settings.groq_api_key,
+                model=self.settings.model_name,
+                temperature=0,
             )
+            if self.settings.groq_api_key
+            else None
+        )
+        self.reranker = JinaReranker(
+            api_key=self.settings.jina_api_key,
+            base_url=self.settings.jina_reranker_api_base,
+            model=self.settings.jina_reranker_model,
+        )
 
     def _get_embeddings(self) -> Any:
         return self.embeddings
 
+    def _choose_retrieval_sizes(
+        self,
+        *,
+        db: Session,
+        query: str,
+        file_hashes: list[str],
+        requested_k: int,
+    ) -> tuple[int, int]:
+        available_chunks = db.scalar(
+            select(func.count())
+            .select_from(DocumentChunk)
+            .where(DocumentChunk.file_hash.in_(file_hashes))
+        ) or 0
+        if available_chunks <= 0:
+            return 0, 0
+
+        if self.retrieval_router is None:
+            raise RuntimeError("GROQ_API_KEY is required for LLM-based retrieval size selection.")
+
+        prompt = (
+            "You are a retrieval planner for a RAG system.\n"
+            "Choose how many chunks to keep after reranking and how many vector candidates to send to the reranker.\n"
+            "Return only valid JSON with this exact schema:\n"
+            '{"final_k": 4, "candidate_k": 12}\n\n'
+            "Rules:\n"
+            f"- final_k must be between 1 and {min(8, available_chunks)}\n"
+            f"- candidate_k must be between final_k and {min(30, available_chunks)}\n"
+            "- candidate_k should usually be around 2x to 4x final_k\n"
+            "- Use larger values for broad, comparative, or synthesis-heavy queries\n"
+            "- Use smaller values for narrow fact lookup queries\n\n"
+            f"Query: {query}\n"
+            f"Selected documents: {len(file_hashes)}\n"
+            f"Available chunks: {available_chunks}\n"
+            f"Requested final_k hint: {requested_k}\n"
+            f"Configured minimum final_k: {self.settings.retrieval_k}\n"
+            f"Configured minimum candidate_k: {self.settings.rerank_candidate_k}\n"
+        )
+
+        response = self.retrieval_router.invoke(prompt)
+        content = response.content if isinstance(response.content, str) else str(response.content)
+        if "```json" in content:
+            content = content.split("```json", 1)[1].split("```", 1)[0].strip()
+        elif "```" in content:
+            content = content.split("```", 1)[1].split("```", 1)[0].strip()
+        data = json.loads(content)
+        final_k = int(data["final_k"])
+        candidate_k = int(data["candidate_k"])
+
+        final_k = max(1, min(final_k, available_chunks, 8))
+        candidate_floor = max(final_k, self.settings.rerank_candidate_k)
+        candidate_k = max(final_k, candidate_k)
+        candidate_k = min(max(candidate_floor, candidate_k), available_chunks, 30)
+        return final_k, candidate_k
+
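The planner-reply handling above (fence stripping plus clamping) can be isolated into a pure helper. This is a hedged sketch, not repo code: `candidate_floor_setting` stands in for `settings.rerank_candidate_k`, and `FENCE` is only a device to write the literal triple-backtick delimiter inside this example.

```python
import json

FENCE = "`" * 3  # literal ``` delimiter, built up so it does not break this block


def plan_retrieval_sizes(reply, available_chunks, candidate_floor_setting):
    """Strip an optional Markdown code fence from the planner reply, then
    clamp final_k/candidate_k the same way _choose_retrieval_sizes does."""
    content = reply
    if FENCE + "json" in content:
        content = content.split(FENCE + "json", 1)[1].split(FENCE, 1)[0].strip()
    elif FENCE in content:
        content = content.split(FENCE, 1)[1].split(FENCE, 1)[0].strip()
    data = json.loads(content)
    # final_k is capped at 8 and at the number of stored chunks.
    final_k = max(1, min(int(data["final_k"]), available_chunks, 8))
    # candidate_k must sit between the configured floor and min(available, 30).
    candidate_floor = max(final_k, candidate_floor_setting)
    candidate_k = max(final_k, int(data["candidate_k"]))
    candidate_k = min(max(candidate_floor, candidate_k), available_chunks, 30)
    return final_k, candidate_k
```

Because every bound is applied after parsing, an over-eager planner reply (say `final_k=20`) still collapses to the hard cap of 8.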
+    def _rerank_matches(self, *, query: str, matches: list[dict[str, Any]], top_n: int) -> list[dict[str, Any]]:
+        if self.reranker is None or not matches:
+            return matches[:top_n]
+
+        try:
+            results = self.reranker.rerank(
+                query=query,
+                documents=[match["content"] for match in matches],
+                top_n=min(top_n, len(matches)),
+            )
+        except requests.RequestException:
+            return matches[:top_n]
+
+        reranked: list[dict[str, Any]] = []
+        for item in results:
+            index = item.get("index")
+            if not isinstance(index, int) or index < 0 or index >= len(matches):
+                continue
+            match = dict(matches[index])
+            score = item.get("relevance_score")
+            if isinstance(score, (int, float)):
+                match["rerank_score"] = float(score)
+            reranked.append(match)
+
+        return reranked or matches[:top_n]
+
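The index-mapping step in `_rerank_matches` can be sketched as a standalone function. The `results` shape here (a list of dicts with `index` and `relevance_score`) is an assumption about the reranker response, mirroring how the diff reads it:

```python
def apply_rerank(matches, results, top_n):
    """Reorder vector matches by reranker results, dropping malformed
    entries and falling back to the original order when nothing survives."""
    reranked = []
    for item in results:
        index = item.get("index")
        # Skip entries whose index does not point back into the candidate list.
        if not isinstance(index, int) or not (0 <= index < len(matches)):
            continue
        match = dict(matches[index])  # copy so the original match is untouched
        score = item.get("relevance_score")
        if isinstance(score, (int, float)):
            match["rerank_score"] = float(score)
        reranked.append(match)
    # If every entry was malformed, degrade to the plain vector ordering.
    return reranked or matches[:top_n]
```

The fallback mirrors the service's behavior on reranker network errors: the answer quality degrades to plain cosine order instead of failing the request.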
     def add_document(self, *, db: Session, document_id: int, file_hash: str, filename: str, pages: list[tuple[int, str]]) -> None:
         chunk_rows: list[tuple[int | None, str]] = []
         for page_number, page_text in pages:

     def similarity_search(self, *, db: Session, query: str, file_hashes: list[str], k: int = 4) -> list[dict[str, Any]]:
         if not file_hashes:
             return []
+        final_k, candidate_k = self._choose_retrieval_sizes(
+            db=db,
+            query=query,
+            file_hashes=file_hashes,
+            requested_k=k,
+        )
+        if final_k == 0:
+            return []
         query_embedding = self._get_embeddings().embed_query(query)
         stmt = (
             select(

             )
             .where(DocumentChunk.file_hash.in_(file_hashes))
             .order_by(DocumentChunk.embedding.cosine_distance(query_embedding))
+            .limit(candidate_k)
         )
         results = db.execute(stmt).all()
         matches: list[dict[str, Any]] = []

                 "distance": row.distance,
             }
         )
+        return self._rerank_matches(query=query, matches=matches, top_n=final_k)
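End to end, `similarity_search` is now a two-stage funnel: a vector pass keeps the `candidate_k` nearest chunks by cosine distance, then the reranker trims them to `final_k`. A minimal in-memory sketch of that funnel over toy vectors (no pgvector or Jina involved; `rerank_fn` is a hypothetical stand-in for the reranker):

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity, the same ordering pgvector's <=> operator uses."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def two_stage_search(query_vec, chunks, candidate_k, final_k, rerank_fn):
    """Stage 1: keep the candidate_k nearest chunks by cosine distance.
    Stage 2: let rerank_fn order the survivors, then keep final_k of them."""
    candidates = sorted(
        chunks, key=lambda c: cosine_distance(query_vec, c["embedding"])
    )[:candidate_k]
    return rerank_fn(candidates)[:final_k]
```

The point of the funnel is cost: the cheap vector pass over-fetches, and the expensive reranker only ever sees `candidate_k` documents.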
app/static/style.css CHANGED
@@ -25,6 +25,7 @@
25
  body {
26
  margin: 0;
27
  min-height: 100vh;
 
28
  color: var(--ink-1);
29
  font-family: "Space Grotesk", "Helvetica Neue", sans-serif;
30
  background:
@@ -64,25 +65,14 @@ body {
64
  width: min(1460px, calc(100% - 2.5rem));
65
  margin: 0 auto;
66
  padding: 1.6rem 0 3.4rem;
 
 
67
  }
68
 
69
  .hero {
70
  margin-bottom: 1rem;
71
  }
72
 
73
- .workspace-strip {
74
- margin-bottom: 0.95rem;
75
- display: flex;
76
- justify-content: space-between;
77
- align-items: flex-start;
78
- gap: 1rem;
79
- }
80
-
81
- .workspace-strip h1 {
82
- font-size: clamp(1.65rem, 3.5vw, 2.2rem);
83
- margin-top: 0.35rem;
84
- }
85
-
86
  .hero-topline {
87
  display: flex;
88
  align-items: center;
@@ -149,6 +139,14 @@ button {
149
  margin: 0.35rem 0 0;
150
  }
151
152
  .grid {
153
  display: grid;
154
  gap: 1rem;
@@ -210,23 +208,54 @@ button {
210
  grid-template-columns: minmax(360px, 430px) 1fr;
211
  gap: 1.2rem;
212
  align-items: start;
 
 
213
  }
214
 
215
  .sidebar-panel {
216
  position: sticky;
217
- top: 1rem;
 
218
  max-height: calc(100vh - 2rem);
219
- overflow: auto;
 
220
  gap: 1rem;
221
  padding: 1.55rem;
 
222
  }
223
 
224
  .sidebar-docs {
225
- max-height: 52vh;
 
226
  overflow: auto;
227
  padding-right: 0.25rem;
228
  }
229
 
 
 
 
 
 
230
  .user-email {
231
  overflow-wrap: anywhere;
232
  }
@@ -334,11 +363,16 @@ button.danger {
334
 
335
  .chat-shell {
336
  gap: 0.75rem;
 
 
337
  }
338
 
339
  .chat-panel {
340
- min-height: 78vh;
 
341
  padding: 1.55rem;
 
 
342
  }
343
 
344
  .chat-thread {
@@ -346,8 +380,8 @@ button.danger {
346
  background: rgba(255, 250, 243, 0.7);
347
  border-radius: 14px;
348
  padding: 0.9rem;
349
- min-height: 56vh;
350
- max-height: 68vh;
351
  overflow-y: auto;
352
  display: grid;
353
  gap: 0.65rem;
@@ -403,10 +437,11 @@ button.danger {
403
  padding-top: 0.35rem;
404
  border-top: 1px solid var(--line);
405
  background: linear-gradient(180deg, rgba(255, 252, 247, 0), rgba(255, 252, 247, 0.92) 35%);
 
406
  }
407
 
408
  .chat-composer textarea {
409
- min-height: 118px;
410
  font-size: 1rem;
411
  }
412
 
@@ -420,6 +455,24 @@ button.danger {
420
  margin-top: 0;
421
  }
422
 
423
  .chat-markdown p:last-child {
424
  margin-bottom: 0;
425
  }
@@ -594,14 +647,26 @@ button.danger {
594
  .doc-head {
595
  display: flex;
596
  justify-content: space-between;
597
- align-items: center;
598
  gap: 0.8rem;
599
  }
600
 
 
 
 
 
 
601
  .doc-pages {
 
 
 
602
  color: #684b29;
603
  font-size: 0.8rem;
604
- padding: 0.2rem 0.55rem;
 
 
 
 
605
  border-radius: 999px;
606
  border: 1px solid rgba(178, 74, 0, 0.24);
607
  background: rgba(255, 206, 140, 0.3);
@@ -633,10 +698,32 @@ code {
633
  }
634
  }
635
 
636
- @media (max-width: 720px) {
637
  .shell {
638
  width: min(1120px, calc(100% - 1rem));
639
  padding-top: 0.95rem;
 
 
640
  }
641
 
642
  .toolbar {
@@ -653,24 +740,26 @@ code {
653
 
654
  .app-layout {
655
  grid-template-columns: 1fr;
656
- }
657
-
658
- .workspace-strip {
659
- flex-direction: column;
660
- align-items: flex-start;
661
  }
662
 
663
  .sidebar-panel {
664
  position: static;
 
665
  max-height: none;
 
 
666
  }
667
 
668
  .sidebar-docs {
 
669
  max-height: none;
 
670
  }
671
 
672
  .chat-panel {
673
  min-height: 68vh;
 
674
  }
675
 
676
  .chat-thread {
 
25
  body {
26
  margin: 0;
27
  min-height: 100vh;
28
+ overflow: hidden;
29
  color: var(--ink-1);
30
  font-family: "Space Grotesk", "Helvetica Neue", sans-serif;
31
  background:
 
65
  width: min(1460px, calc(100% - 2.5rem));
66
  margin: 0 auto;
67
  padding: 1.6rem 0 3.4rem;
68
+ height: 100vh;
69
+ overflow: hidden;
70
  }
71
 
72
  .hero {
73
  margin-bottom: 1rem;
74
  }
75
76
  .hero-topline {
77
  display: flex;
78
  align-items: center;
 
139
  margin: 0.35rem 0 0;
140
  }
141
 
142
+ .developer-credit {
143
+ margin: 0.45rem 0 0;
144
+ color: #7b4a22;
145
+ font-size: 0.84rem;
146
+ font-weight: 600;
147
+ letter-spacing: 0.02em;
148
+ }
149
+
150
  .grid {
151
  display: grid;
152
  gap: 1rem;
 
208
  grid-template-columns: minmax(360px, 430px) 1fr;
209
  gap: 1.2rem;
210
  align-items: start;
211
+ margin-top: 0.35rem;
212
+ height: calc(100vh - 2rem);
213
  }
214
 
215
  .sidebar-panel {
216
  position: sticky;
217
+ top: 0.6rem;
218
+ height: calc(100vh - 2rem);
219
  max-height: calc(100vh - 2rem);
220
+ min-height: 0;
221
+ overflow: hidden;
222
  gap: 1rem;
223
  padding: 1.55rem;
224
+ display: flex;
225
+ flex-direction: column;
226
+ }
227
+
228
+ .sidebar-title-row {
229
+ display: flex;
230
+ justify-content: space-between;
231
+ align-items: flex-start;
232
+ gap: 0.9rem;
233
+ flex-wrap: wrap;
234
+ }
235
+
236
+ .sidebar-title {
237
+ font-size: clamp(1.6rem, 3vw, 2.25rem);
238
+ margin-top: 0.25rem;
239
+ line-height: 1.05;
240
+ }
241
+
242
+ .account-head {
243
+ padding-top: 0.2rem;
244
+ border-top: 1px solid var(--line);
245
  }
246
 
247
  .sidebar-docs {
248
+ flex: 1 1 auto;
249
+ min-height: 0;
250
  overflow: auto;
251
  padding-right: 0.25rem;
252
  }
253
 
254
+ #logout-form {
255
+ margin-top: auto;
256
+ flex-shrink: 0;
257
+ }
258
+
259
  .user-email {
260
  overflow-wrap: anywhere;
261
  }
 
363
 
364
  .chat-shell {
365
  gap: 0.75rem;
366
+ height: 100%;
367
+ min-height: 0;
368
  }
369
 
370
  .chat-panel {
371
+ min-height: 0;
372
+ height: 100%;
373
  padding: 1.55rem;
374
+ display: flex;
375
+ flex-direction: column;
376
  }
377
 
378
  .chat-thread {
 
380
  background: rgba(255, 250, 243, 0.7);
381
  border-radius: 14px;
382
  padding: 0.9rem;
383
+ min-height: 0;
384
+ flex: 1 1 auto;
385
  overflow-y: auto;
386
  display: grid;
387
  gap: 0.65rem;
 
437
  padding-top: 0.35rem;
438
  border-top: 1px solid var(--line);
439
  background: linear-gradient(180deg, rgba(255, 252, 247, 0), rgba(255, 252, 247, 0.92) 35%);
440
+ flex-shrink: 0;
441
  }
442
 
443
  .chat-composer textarea {
444
+ min-height: 104px;
445
  font-size: 1rem;
446
  }
447
 
 
455
  margin-top: 0;
456
  }
457
 
458
+ .source-meta-right {
459
+ display: inline-flex;
460
+ align-items: center;
461
+ gap: 0.45rem;
462
+ flex-wrap: wrap;
463
+ justify-content: flex-end;
464
+ }
465
+
466
+ .source-score {
467
+ border: 1px solid rgba(178, 74, 0, 0.22);
468
+ background: linear-gradient(135deg, rgba(255, 187, 107, 0.24), rgba(255, 123, 0, 0.12));
469
+ color: #8d3600;
470
+ padding: 0.18rem 0.5rem;
471
+ border-radius: 999px;
472
+ font-size: 0.74rem;
473
+ font-weight: 600;
474
+ }
475
+
476
  .chat-markdown p:last-child {
477
  margin-bottom: 0;
478
  }
 
647
  .doc-head {
648
  display: flex;
649
  justify-content: space-between;
650
+ align-items: flex-start;
651
  gap: 0.8rem;
652
  }
653
 
654
+ .doc-head h3 {
655
+ flex: 1 1 auto;
656
+ min-width: 0;
657
+ }
658
+
659
  .doc-pages {
660
+ display: inline-flex;
661
+ align-items: center;
662
+ justify-content: center;
663
  color: #684b29;
664
  font-size: 0.8rem;
665
+ min-width: max-content;
666
+ padding: 0.28rem 0.8rem;
667
+ line-height: 1.2;
668
+ white-space: nowrap;
669
+ flex-shrink: 0;
670
  border-radius: 999px;
671
  border: 1px solid rgba(178, 74, 0, 0.24);
672
  background: rgba(255, 206, 140, 0.3);
 
698
  }
699
  }
700
 
701
+ @media (max-width: 1120px) {
702
+ .app-layout {
703
+ grid-template-columns: minmax(320px, 380px) 1fr;
704
+ }
705
+
706
+ .sidebar-title-row {
707
+ flex-direction: column;
708
+ align-items: flex-start;
709
+ gap: 0.55rem;
710
+ }
711
+
712
+ .sidebar-title {
713
+ font-size: clamp(1.4rem, 5vw, 2rem);
714
+ }
715
+
716
+ .badge {
717
+ white-space: normal;
718
+ }
719
+ }
720
+
721
+ @media (max-width: 820px) {
722
  .shell {
723
  width: min(1120px, calc(100% - 1rem));
724
  padding-top: 0.95rem;
725
+ height: auto;
726
+ overflow: visible;
727
  }
728
 
729
  .toolbar {
 
740
 
741
  .app-layout {
742
  grid-template-columns: 1fr;
743
+ height: auto;
 
 
 
 
744
  }
745
 
746
  .sidebar-panel {
747
  position: static;
748
+ height: auto;
749
  max-height: none;
750
+ min-height: auto;
751
+ overflow: visible;
752
  }
753
 
754
  .sidebar-docs {
755
+ flex: initial;
756
  max-height: none;
757
+ min-height: 0;
758
  }
759
 
760
  .chat-panel {
761
  min-height: 68vh;
762
+ height: auto;
763
  }
764
 
765
  .chat-thread {
app/templates/index.html CHANGED
@@ -27,6 +27,7 @@
27
  <p class="lede">
28
  Upload PDFs, avoid duplicate reprocessing by file hash, and ask an agent that uses user-scoped document retrieval with optional web search.
29
  </p>
 
30
  {% if db_unavailable %}
31
  <p class="db-warning">
32
  Database connection is temporarily unavailable. This is usually a transient DNS/network issue with the Supabase host. Please retry shortly.
@@ -34,14 +35,6 @@
34
  {% endif %}
35
  </section>
36
  {% else %}
37
- <section class="workspace-strip card">
38
- <div>
39
- <p class="eyebrow">LangGraph Assignment</p>
40
- <h1>DocsQA Workspace</h1>
41
- <p class="muted">Private document chat with structured sources.</p>
42
- </div>
43
- <span class="badge">FastAPI + Supabase + PGVector</span>
44
- </section>
45
  {% endif %}
46
 
47
  {% if not user %}
@@ -72,6 +65,18 @@
72
  <section class="app-layout">
73
  <aside class="card panel sidebar-panel">
74
  <div class="panel-head">
 
75
  <h2 class="user-email">{{ user.email }}</h2>
76
  <p class="muted">Your uploaded docs are private to this account.</p>
77
  </div>
@@ -148,6 +153,7 @@
148
  // Session management
149
  let currentSessionId = sessionStorage.getItem("chat_session_id") || `session_${Date.now()}_${Math.random().toString(36).substr(2, 9)}`;
150
  sessionStorage.setItem("chat_session_id", currentSessionId);
 
151
 
152
  const registerForm = document.getElementById("register-form");
153
  const loginForm = document.getElementById("login-form");
@@ -162,19 +168,38 @@
162
  const docDeleteButtons = document.querySelectorAll(".doc-delete-btn");
163
  const newChatBtn = document.getElementById("new-chat-btn");
164
 
 
165
  // New Chat button handler
166
  newChatBtn?.addEventListener("click", () => {
167
  currentSessionId = `session_${Date.now()}_${Math.random().toString(36).substr(2, 9)}`;
168
  sessionStorage.setItem("chat_session_id", currentSessionId);
169
- if (chatThread) {
170
- chatThread.innerHTML = `
171
- <article class="chat-msg assistant">
172
- <div class="chat-bubble chat-bubble-assistant chat-markdown">
173
- <p>Ask anything about your uploaded PDFs and I will answer with citations from retrieved chunks.</p>
174
- </div>
175
- </article>
176
- `;
177
- }
178
  });
179
 
180
  const safeJson = async (response) => {
@@ -327,36 +352,9 @@
327
  return "_No citations available for this turn._";
328
  };
329
 
330
- const sourceStopwords = new Set([
331
- "the", "and", "for", "with", "from", "that", "this", "what", "who", "how", "are", "was", "were", "is",
332
- "of", "about", "tell", "more", "please", "can", "you", "your", "according", "resume"
333
- ]);
334
-
335
- const extractQueryTerms = (queryText) => {
336
- const raw = (queryText || "").toLowerCase().match(/[a-z0-9_]+/g) || [];
337
- const deduped = [];
338
- const seen = new Set();
339
- for (const term of raw) {
340
- if (term.length < 3 || sourceStopwords.has(term) || seen.has(term)) continue;
341
- seen.add(term);
342
- deduped.push(term);
343
- }
344
- return deduped;
345
- };
346
-
347
- const escapeRegex = (value) => value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
348
-
349
- const highlightMatches = (text, terms) => {
350
- const plain = text || "";
351
- if (!terms.length) return escapeHtml(plain);
352
- const pattern = new RegExp(`\\b(${terms.map(escapeRegex).join("|")})\\b`, "gi");
353
- return escapeHtml(plain).replace(pattern, "<mark>$1</mark>");
354
- };
355
-
356
  const renderSourcesHtml = (sources, queryText = "") => {
357
  const vectorSources = Array.isArray(sources?.vector) ? sources.vector : [];
358
  const webSources = Array.isArray(sources?.web) ? sources.web : [];
359
- const terms = extractQueryTerms(queryText);
360
 
361
  const sections = [];
362
 
@@ -366,7 +364,7 @@
366
  const documentId = (src.document_id || "").toString().trim();
367
  const doc = escapeHtml(src.document || "Unknown document");
368
  const page = escapeHtml(src.page || "unknown");
369
- const excerptHtml = highlightMatches(src.excerpt || "", terms);
370
  const pageNumber = Number.parseInt(src.page || "", 10);
371
  const pageAnchor = Number.isFinite(pageNumber) && pageNumber > 0 ? `#page=${pageNumber}` : "";
372
  const pdfUrl = documentId ? `/documents/${encodeURIComponent(documentId)}/pdf${pageAnchor}` : "";
@@ -374,7 +372,9 @@
374
  <article class="source-card">
375
  <div class="source-meta">
376
  <span class="source-doc">${doc}</span>
377
- <span class="source-page">Page ${page}</span>
 
 
378
  </div>
379
  <p class="source-excerpt">${excerptHtml || "No excerpt available."}</p>
380
  ${
@@ -434,6 +434,87 @@
434
  container.appendChild(details);
435
  };
436
437
  const appendMessage = ({ role, text, markdown = false, pending = false, isError = false }) => {
438
  if (!chatThread) return null;
439
  const row = document.createElement("article");
@@ -454,6 +535,7 @@
454
  row.appendChild(bubble);
455
  chatThread.appendChild(row);
456
  chatThread.scrollTop = chatThread.scrollHeight;
 
457
  return bubble;
458
  };
459
 
@@ -493,6 +575,7 @@
493
  event.preventDefault();
494
  const response = await fetch("/logout", { method: "POST" });
495
  if (response.ok) {
 
496
  window.location.reload();
497
  }
498
  });
@@ -528,6 +611,7 @@
528
  uploadResult.classList.toggle("error", !response.ok);
529
  setBusy(uploadForm, false);
530
  if (response.ok) {
 
531
  window.location.reload();
532
  }
533
  });
@@ -562,34 +646,23 @@
562
  if (queryInput) queryInput.value = "";
563
  setBusy(askForm, true);
564
  const pendingBubble = appendMessage({ role: "assistant", text: "Thinking...", markdown: false, pending: true });
565
-
566
- const response = await fetch("/ask", {
567
- method: "POST",
568
- headers: {
569
- "Content-Type": "application/json",
570
- "X-Session-Id": currentSessionId
571
- },
572
- body: JSON.stringify({ query }),
573
- });
574
- const body = await safeJson(response);
575
  const target = pendingBubble || appendMessage({ role: "assistant", text: "", markdown: false });
576
  if (!target) {
577
  setBusy(askForm, false);
578
  return;
579
  }
580
- target.classList.remove("chat-pending");
581
-
582
- if (response.ok) {
583
- const answerText = body.answer || "Response received.";
584
- target.classList.add("chat-markdown");
585
- renderAssistantResponse(target, answerText, body.sources || null, query);
586
- } else {
587
- const message = prettyError(body);
588
  target.classList.add("chat-error");
589
  target.textContent = message;
 
 
 
 
590
  }
591
- chatThread.scrollTop = chatThread.scrollHeight;
592
- setBusy(askForm, false);
593
  });
594
 
595
  queryInput?.addEventListener("keydown", (event) => {
 
27
  <p class="lede">
28
  Upload PDFs, avoid duplicate reprocessing by file hash, and ask an agent that uses user-scoped document retrieval with optional web search.
29
  </p>
30
+ <p class="developer-credit">Developed by Baba Kattubadi</p>
31
  {% if db_unavailable %}
32
  <p class="db-warning">
33
  Database connection is temporarily unavailable. This is usually a transient DNS/network issue with the Supabase host. Please retry shortly.
 
35
  {% endif %}
36
  </section>
37
  {% else %}
38
  {% endif %}
39
 
40
  {% if not user %}
 
65
  <section class="app-layout">
66
  <aside class="card panel sidebar-panel">
67
  <div class="panel-head">
68
+ <p class="eyebrow">LangGraph Assignment</p>
69
+ <div class="sidebar-title-row">
70
+ <div>
71
+ <h1 class="sidebar-title">DocsQA Workspace</h1>
72
+ <p class="muted">Private document chat with structured sources.</p>
73
+ <p class="developer-credit">Developed by Baba Kattubadi</p>
74
+ </div>
75
+ <span class="badge">FastAPI + Supabase + PGVector</span>
76
+ </div>
77
+ </div>
78
+
79
+ <div class="panel-head account-head">
80
  <h2 class="user-email">{{ user.email }}</h2>
81
  <p class="muted">Your uploaded docs are private to this account.</p>
82
  </div>
 
153
  // Session management
154
  let currentSessionId = sessionStorage.getItem("chat_session_id") || `session_${Date.now()}_${Math.random().toString(36).substr(2, 9)}`;
155
  sessionStorage.setItem("chat_session_id", currentSessionId);
156
+ const chatStorageKey = () => `chat_thread_${currentSessionId}`;
157
 
158
  const registerForm = document.getElementById("register-form");
159
  const loginForm = document.getElementById("login-form");
 
168
  const docDeleteButtons = document.querySelectorAll(".doc-delete-btn");
169
  const newChatBtn = document.getElementById("new-chat-btn");
170
 
171
+ const saveChatThread = () => {
172
+ if (!chatThread) return;
173
+ sessionStorage.setItem(chatStorageKey(), chatThread.innerHTML);
174
+ };
175
+
176
+ const restoreChatThread = () => {
177
+ if (!chatThread) return;
178
+ const savedThread = sessionStorage.getItem(chatStorageKey());
179
+ if (savedThread) {
180
+ chatThread.innerHTML = savedThread;
181
+ }
182
+ };
183
+
184
+ const resetChatThread = () => {
185
+ if (!chatThread) return;
186
+ chatThread.innerHTML = `
187
+ <article class="chat-msg assistant">
188
+ <div class="chat-bubble chat-bubble-assistant chat-markdown">
189
+ <p>Ask anything about your uploaded PDFs and I will answer with citations from retrieved chunks.</p>
190
+ </div>
191
+ </article>
192
+ `;
193
+ saveChatThread();
194
+ };
195
+
196
+ restoreChatThread();
197
+
198
  // New Chat button handler
199
  newChatBtn?.addEventListener("click", () => {
200
  currentSessionId = `session_${Date.now()}_${Math.random().toString(36).substr(2, 9)}`;
201
  sessionStorage.setItem("chat_session_id", currentSessionId);
202
+ resetChatThread();
203
  });
204
 
205
  const safeJson = async (response) => {
 
352
  return "_No citations available for this turn._";
353
  };
354
 
355
  const renderSourcesHtml = (sources, queryText = "") => {
356
  const vectorSources = Array.isArray(sources?.vector) ? sources.vector : [];
357
  const webSources = Array.isArray(sources?.web) ? sources.web : [];
 
358
 
359
  const sections = [];
360
 
 
364
  const documentId = (src.document_id || "").toString().trim();
365
  const doc = escapeHtml(src.document || "Unknown document");
366
  const page = escapeHtml(src.page || "unknown");
367
+ const excerptHtml = escapeHtml(src.excerpt || "");
368
  const pageNumber = Number.parseInt(src.page || "", 10);
369
  const pageAnchor = Number.isFinite(pageNumber) && pageNumber > 0 ? `#page=${pageNumber}` : "";
370
  const pdfUrl = documentId ? `/documents/${encodeURIComponent(documentId)}/pdf${pageAnchor}` : "";
 
372
  <article class="source-card">
373
  <div class="source-meta">
374
  <span class="source-doc">${doc}</span>
375
+ <div class="source-meta-right">
376
+ <span class="source-page">Page ${page}</span>
377
+ </div>
378
  </div>
379
  <p class="source-excerpt">${excerptHtml || "No excerpt available."}</p>
380
  ${
 
434
  container.appendChild(details);
435
  };
436
 
437
+ const readStreamingAnswer = async ({ query, target }) => {
438
+ const response = await fetch("/ask/stream", {
439
+ method: "POST",
440
+ headers: {
441
+ "Content-Type": "application/json",
442
+ "X-Session-Id": currentSessionId
443
+ },
444
+ body: JSON.stringify({ query }),
445
+ });
446
+
447
+ if (!response.ok || !response.body) {
448
+ const body = await safeJson(response);
449
+ throw new Error(prettyError(body));
450
+ }
451
+
452
+ const reader = response.body.getReader();
453
+ const decoder = new TextDecoder();
454
+ let buffer = "";
455
+ let answerText = "";
456
+ let sources = null;
457
+
458
+ const processEvent = (rawEvent) => {
459
+ const lines = rawEvent.split("\n");
460
+ let eventName = "message";
461
+ const dataLines = [];
462
+
463
+ for (const line of lines) {
464
+ if (line.startsWith("event:")) {
465
+ eventName = line.slice(6).trim();
466
+ } else if (line.startsWith("data:")) {
467
+ dataLines.push(line.slice(5).trim());
468
+ }
469
+ }
470
+
471
+ if (!dataLines.length) return;
472
+ const payload = JSON.parse(dataLines.join("\n"));
473
+
474
+ if (eventName === "token") {
475
+ answerText += payload.content || "";
476
+ target.classList.remove("chat-pending");
477
+ target.classList.add("chat-markdown");
478
+ target.innerHTML = renderMarkdown(answerText || "Thinking...");
479
+ chatThread.scrollTop = chatThread.scrollHeight;
480
+ return;
481
+ }
482
+
483
+ if (eventName === "sources") {
484
+ sources = payload.sources || null;
485
+ return;
486
+ }
487
+
488
+ if (eventName === "done") {
489
+ answerText = payload.answer || answerText || "Response received.";
490
+ target.classList.remove("chat-pending");
491
+ target.classList.add("chat-markdown");
492
+ renderAssistantResponse(target, answerText, sources, query);
493
+ chatThread.scrollTop = chatThread.scrollHeight;
494
+ return;
495
+ }
496
+
497
+ if (eventName === "error") {
498
+ throw new Error(payload.detail || "Streaming failed.");
499
+ }
500
+ };
501
+
502
+ while (true) {
503
+ const { value, done } = await reader.read();
504
+ if (done) break;
505
+ buffer += decoder.decode(value, { stream: true });
506
+ const events = buffer.split("\n\n");
507
+ buffer = events.pop() || "";
508
+ for (const rawEvent of events) {
509
+ if (rawEvent.trim()) processEvent(rawEvent);
510
+ }
511
+ }
512
+
513
+ buffer += decoder.decode();
514
+ if (buffer.trim()) processEvent(buffer);
515
+ return { answer: answerText, sources };
516
+ };
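The client above splits the stream on blank lines and parses `event:`/`data:` fields by hand. The matching frame grammar can be sketched from the server side in Python; `format_sse` and `parse_sse` are hypothetical helpers written to mirror this client's `processEvent`, not functions from the repo:

```python
import json


def format_sse(event_name, payload):
    """Serialize one server-sent event frame the way the client expects:
    an event: line, a data: line carrying JSON, and a blank-line terminator."""
    return f"event: {event_name}\ndata: {json.dumps(payload)}\n\n"


def parse_sse(raw_event):
    """Mirror of the client-side processEvent: recover (event_name, payload)."""
    event_name = "message"  # SSE default when no event: field is present
    data_lines = []
    for line in raw_event.split("\n"):
        if line.startswith("event:"):
            event_name = line[6:].strip()
        elif line.startswith("data:"):
            data_lines.append(line[5:].strip())
    payload = json.loads("\n".join(data_lines)) if data_lines else None
    return event_name, payload
```

Keeping the terminator as a blank line is what lets the client buffer partial reads and split on `"\n\n"`, exactly as `readStreamingAnswer` does.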
517
+
518
  const appendMessage = ({ role, text, markdown = false, pending = false, isError = false }) => {
519
  if (!chatThread) return null;
520
  const row = document.createElement("article");
 
535
  row.appendChild(bubble);
536
  chatThread.appendChild(row);
537
  chatThread.scrollTop = chatThread.scrollHeight;
538
+ saveChatThread();
539
  return bubble;
540
  };
541
 
 
575
  event.preventDefault();
576
  const response = await fetch("/logout", { method: "POST" });
577
  if (response.ok) {
578
+ sessionStorage.removeItem(chatStorageKey());
579
  window.location.reload();
580
  }
581
  });
 
611
  uploadResult.classList.toggle("error", !response.ok);
612
  setBusy(uploadForm, false);
613
  if (response.ok) {
614
+ saveChatThread();
615
  window.location.reload();
616
  }
617
  });
 
646
  if (queryInput) queryInput.value = "";
647
  setBusy(askForm, true);
648
  const pendingBubble = appendMessage({ role: "assistant", text: "Thinking...", markdown: false, pending: true });
649
  const target = pendingBubble || appendMessage({ role: "assistant", text: "", markdown: false });
650
  if (!target) {
651
  setBusy(askForm, false);
652
  return;
653
  }
654
+ try {
655
+ await readStreamingAnswer({ query, target });
656
+ } catch (error) {
657
+ const message = error instanceof Error ? error.message : "Request failed.";
658
+ target.classList.remove("chat-pending");
 
 
 
659
  target.classList.add("chat-error");
660
  target.textContent = message;
661
+ } finally {
662
+ chatThread.scrollTop = chatThread.scrollHeight;
663
+ saveChatThread();
664
+ setBusy(askForm, false);
665
  }
 
 
666
  });
667
 
668
  queryInput?.addEventListener("keydown", (event) => {