Shouvik Choudhury committed on
Commit fa3aafe · unverified · 2 parent(s): 3611fcd26a5301

Merge pull request #4 from Shouvik599/feature-multi-turn-converse

Files changed (4)
  1. README.md +102 -18
  2. app.py +127 -42
  3. frontend/index.html +302 -437
  4. rag_chain.py +178 -198
README.md CHANGED
@@ -10,7 +10,7 @@ pinned: false
 
 # 🕊️ Sacred Texts RAG — Multi-Religion Knowledge Base
 
- A Retrieval-Augmented Generation (RAG) application that answers spiritual queries using Bhagavad Gita, Quran, Bible and the Guru Granth Sahib as the sole knowledge sources.
 
 ---
 
@@ -22,10 +22,10 @@ sacred-texts-rag/
 ├── requirements.txt
 ├── .env.example
 ├── ingest.py # Step 1: Load PDFs → chunk → embed → store
- ├── rag_chain.py # Core RAG chain logic
 ├── app.py # FastAPI backend server
 └── frontend/
-     └── index.html # Chat UI (open in browser)
 ```
 
 ---
@@ -49,7 +49,7 @@ Place your PDF files in a `books/` folder:
 books/
 ├── bhagavad_gita.pdf
 ├── quran.pdf
- └── bible.pdf
 └── guru_granth_sahib.pdf
 ```
 
@@ -67,20 +67,24 @@ This will:
 ```bash
 python app.py
 ```
- Server runs at: `http://localhost:8000`
 
 ### 6. Open the Frontend
- Open `frontend/index.html` in your browser — no server needed for the UI.
 
 ---
 
 ## 🔑 Environment Variables
 
- | Variable | Description |
- |---|---|
- | `NVIDIA_API_KEY` | Your NVIDIA API key |
- | `CHROMA_DB_PATH` | Path to ChromaDB storage (default: `./chroma_db`) |
- | `CHUNKS_PER_BOOK` | Number of chunks to retrieve per query (default: `3`) |
 
 ---
 
@@ -90,30 +94,110 @@ Open `frontend/index.html` in your browser — no server needed for the UI.
 User Query
     │
     ▼
- [Embedding Model] ←── NVIDIA llama-nemotron-embed-vl-1b-v2
     │
     ▼
- [ChromaDB Vector Store] ←── Semantic similarity search
-     │ (retrieves top-K chunks from Gita, Quran, Bible, and the Guru Granth Sahib)
     │
     ▼
- [Prompt with Context]
     │
     ▼
 [Llama-3.3-70b-instruct] ←── Answer grounded ONLY in retrieved texts
     │
     ▼
- Response with source citations (book + chapter/verse)
 ```
 
 ---
 
 ## 📝 Notes
 
 - The LLM is instructed **never** to answer from outside the provided texts
- - Each response includes **source citations** (which book the answer came from)
 - Responses synthesize wisdom **across all books** when relevant
 
 ## 🎬 Demo
 
- App Link : https://shouvik99-lifeguide.hf.space/
 
 # 🕊️ Sacred Texts RAG — Multi-Religion Knowledge Base
 
+ A Retrieval-Augmented Generation (RAG) application that answers spiritual queries using the Bhagavad Gita, Quran, Bible, and Guru Granth Sahib as the sole knowledge sources. Now with **multi-turn conversation memory** — ask follow-up questions naturally, just like a real dialogue.
 
 ---
 
 ├── requirements.txt
 ├── .env.example
 ├── ingest.py # Step 1: Load PDFs → chunk → embed → store
+ ├── rag_chain.py # Core RAG chain logic (with session memory)
 ├── app.py # FastAPI backend server
 └── frontend/
+     └── index.html # Chat UI (served by FastAPI)
 ```
 
 ---
 
 books/
 ├── bhagavad_gita.pdf
 ├── quran.pdf
+ ├── bible.pdf
 └── guru_granth_sahib.pdf
 ```
 
 ```bash
 python app.py
 ```
+ Server runs at: `http://localhost:7860`
 
 ### 6. Open the Frontend
+ Navigate to `http://localhost:7860` in your browser — the FastAPI server serves the UI directly.
 
 ---
 
 ## 🔑 Environment Variables
 
+ | Variable | Description | Default |
+ |---|---|---|
+ | `NVIDIA_API_KEY` | Your NVIDIA API key | — |
+ | `CHROMA_DB_PATH` | Path to ChromaDB storage | `./chroma_db` |
+ | `COLLECTION_NAME` | ChromaDB collection name | `sacred_texts` |
+ | `CHUNKS_PER_BOOK` | Chunks retrieved per book per query | `3` |
+ | `MAX_HISTORY_TURNS` | Max conversation turns kept in memory per session | `6` |
+ | `HOST` | Server bind host | `0.0.0.0` |
+ | `PORT` | Server port | `7860` |
 
 ---
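The table above can be read directly with `os.getenv`; this is a minimal sketch of how the defaults might be applied at startup (variable names come from the table, but the project's actual loading code in `app.py`/`rag_chain.py` is not shown in this diff, and `.env` loading via `python-dotenv` is assumed to happen first):

```python
import os

# Defaults mirror the table above; only NVIDIA_API_KEY has no fallback.
NVIDIA_API_KEY = os.getenv("NVIDIA_API_KEY")  # required
CHROMA_DB_PATH = os.getenv("CHROMA_DB_PATH", "./chroma_db")
COLLECTION_NAME = os.getenv("COLLECTION_NAME", "sacred_texts")
CHUNKS_PER_BOOK = int(os.getenv("CHUNKS_PER_BOOK", "3"))
MAX_HISTORY_TURNS = int(os.getenv("MAX_HISTORY_TURNS", "6"))
HOST = os.getenv("HOST", "0.0.0.0")
PORT = int(os.getenv("PORT", "7860"))
```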
 
 User Query
     │
     ▼
+ [Session Memory] ←── Injects prior conversation turns into LLM context
     │
     ▼
+ [Query Augmentation] ←── Short follow-ups are enriched with previous question
     │
     ▼
+ [Hybrid Retrieval: BM25 + Vector Search] ←── Per-book guaranteed slots
+     │
+     ▼
+ [NVIDIA Reranker] ←── llama-3.2-nv-rerankqa-1b-v2 re-scores pooled candidates
+     │
+     ▼
+ [Semantic Cache Check] ←── Skip LLM if a similar question was answered before
+     │
+     ▼
+ [Prompt with Context + History]
     │
     ▼
 [Llama-3.3-70b-instruct] ←── Answer grounded ONLY in retrieved texts
     │
     ▼
+ Streamed response with source citations (book + chapter/verse)
+ ```
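The "per-book guaranteed slots" idea in the diagram can be sketched as a pooling step over already-scored BM25 and vector hits. The function name, tuple shape, and slot count below are illustrative assumptions, not the repo's actual retrieval API:

```python
from collections import defaultdict

def merge_with_book_slots(candidates, slots_per_book=3):
    """Pool retriever hits while guaranteeing each book keeps its best chunks.

    `candidates` is a list of (book, score, chunk) tuples scored by either
    retriever; higher score = better. Returns at most `slots_per_book`
    chunks per book, ordered globally by score.
    """
    by_book = defaultdict(list)
    for book, score, chunk in candidates:
        by_book[book].append((score, chunk))

    pooled = []
    for book, hits in by_book.items():
        hits.sort(key=lambda h: h[0], reverse=True)   # best hits first
        for score, chunk in hits[:slots_per_book]:    # guaranteed slots
            pooled.append((book, score, chunk))

    pooled.sort(key=lambda p: p[1], reverse=True)     # global ordering
    return pooled
```

This keeps a popular book from crowding every slot, which matters here because the prompt is expected to synthesize across all four texts.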
+
+ ---
+
+ ## 💬 Multi-Turn Conversation
+
+ The app maintains per-session conversation history so you can ask natural follow-up questions:
+
+ ```
+ You: "What do the scriptures say about forgiveness?"
+ AI:  [Answer citing Gita, Quran, Bible, Guru Granth Sahib]
+
+ You: "Elaborate on the second point"            ← follow-up, no context needed
+ AI:  [Continues from previous answer]
+
+ You: "What does the Bible say specifically?"    ← drill-down
+ AI:  [Focuses on Bible passages from the thread]
+ ```
+
+ **How sessions work:**
+ - A session ID is created automatically on your first question and persisted in the browser's `localStorage`
+ - The server keeps the last `MAX_HISTORY_TURNS` (default: 6) human+AI pairs in memory
+ - Click **↺ New Conversation** in the header to clear history and start fresh
+ - Sessions are scoped to the server process — they reset on server restart
+
+ ---
+
+ ## 🌐 API Endpoints
+
+ | Method | Endpoint | Description |
+ |---|---|---|
+ | `POST` | `/ask` | Ask a question; streams NDJSON response |
+ | `POST` | `/clear` | Clear conversation history for a session |
+ | `GET` | `/history` | Inspect conversation history for a session |
+ | `GET` | `/books` | List all books indexed in the knowledge base |
+ | `GET` | `/health` | Health check |
+ | `GET` | `/` | Serves the frontend UI |
+ | `GET` | `/docs` | Swagger UI |
+
+ ### `/ask` Request Body
+ ```json
+ {
+   "question": "What do the scriptures say about compassion?",
+   "session_id": "optional-uuid-string"
+ }
+ ```
+
+ ### `/ask` Response (streamed NDJSON)
+ ```json
+ {"type": "token", "data": "The Bhagavad Gita teaches..."}
+ {"type": "token", "data": " compassion as..."}
+ {"type": "sources", "data": [{"book": "Bhagavad Gita 2:47", "page": "2:47", "snippet": "..."}]}
 ```
+ Cache hits return a single `{"type": "cache", "data": {"answer": "...", "sources": [...]}}` line.
 
 ---
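A client consumes this stream one line at a time; the sketch below folds the three event types shown above into a final answer and source list. It is pure parsing (no HTTP), so pair it with any streaming client you like:

```python
import json

def parse_ndjson_events(lines):
    """Fold NDJSON event lines into (answer_text, sources).

    "token" fragments are concatenated, "sources" carries the citation
    list, and a "cache" hit delivers the whole answer in one event.
    """
    answer, sources = "", []
    for line in lines:
        if not line.strip():
            continue  # tolerate keep-alive blank lines
        event = json.loads(line)
        if event["type"] == "token":
            answer += event["data"]
        elif event["type"] == "sources":
            sources = event["data"]
        elif event["type"] == "cache":
            answer = event["data"]["answer"]
            sources = event["data"]["sources"]
    return answer, sources
```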
 
 ## 📝 Notes
 
 - The LLM is instructed **never** to answer from outside the provided texts
+ - Each response includes **source citations** (book + chapter/verse where available)
 - Responses synthesize wisdom **across all books** when relevant
+ - The semantic cache skips the LLM for repeated or near-identical questions (cosine distance < 0.35)
+ - Follow-up retrieval automatically augments vague short queries with the previous question for better semantic matching
+
+ ---
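The cache test in the notes (cosine distance < 0.35) reduces to comparing the new question's embedding against cached ones. A dependency-free sketch under the assumption that questions are already embedded (the project presumably uses the NVIDIA embedding model for that step):

```python
import math

CACHE_DISTANCE_THRESHOLD = 0.35  # from the note above; assumed tunable

def cosine_distance(a, b):
    """1 - cosine similarity of two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def cache_lookup(query_vec, cache):
    """Return the cached answer nearest to `query_vec` within threshold, else None.

    `cache` is a list of (embedding, answer) pairs.
    """
    best_answer, best_dist = None, CACHE_DISTANCE_THRESHOLD
    for cached_vec, answer in cache:
        dist = cosine_distance(query_vec, cached_vec)
        if dist < best_dist:
            best_answer, best_dist = answer, dist
    return best_answer
```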
+
+ ## 🗺️ Planned Features
+
+ - Contextual chunk expansion (fetch ±1 surrounding chunks)
+ - HyDE — Hypothetical Document Embedding for abstract queries
+ - Answer faithfulness scoring (LLM-as-judge)
+ - Query rewriting for vague inputs
+ - Snippet preview on source hover
+ - Query suggestions after each answer
+ - Compare mode — side-by-side view across books
+ - Hallucination guardrail
+ - Out-of-scope detection
+ - Rate limiting & API key hardening
+
+ ---
 
 ## 🎬 Demo
 
+ App Link: https://shouvik99-lifeguide.hf.space/
app.py CHANGED
@@ -2,7 +2,9 @@
 app.py — FastAPI backend server for the Sacred Texts RAG application.
 
 Endpoints:
-     POST /ask    — Ask a question, get an answer with sources
     GET  /health — Health check
     GET  /books  — List books currently in the knowledge base
 
@@ -11,13 +13,20 @@ Run with:
 """
 
 import os
- from fastapi import FastAPI, HTTPException
 from fastapi.middleware.cors import CORSMiddleware
 from pydantic import BaseModel, Field
 from dotenv import load_dotenv
- from fastapi.responses import StreamingResponse, FileResponse
- from rag_chain import query_sacred_texts, get_embeddings, get_vector_store  # ← FIXED
- from starlette.concurrency import run_in_threadpool
 
 load_dotenv()
 
@@ -26,34 +35,54 @@ load_dotenv()
 app = FastAPI(
     title="Sacred Texts RAG API",
     description="Ask questions answered exclusively from Bhagavad Gita, Quran, Bible, and Guru Granth Sahib",
-     version="1.0.0",
 )
 
- # Allow requests from the local frontend (index.html opened as file://)
 app.add_middleware(
     CORSMiddleware,
-     allow_origins=["*"],  # Restrict in production
     allow_credentials=True,
     allow_methods=["*"],
     allow_headers=["*"],
 )
 
 # ─── Request / Response Models ────────────────────────────────────────────────
 
 class AskRequest(BaseModel):
     question: str = Field(..., min_length=3, max_length=1000,
                           example="What do the scriptures say about compassion?")
-
- class Source(BaseModel):
-     book: str
-     page: int | str
-     snippet: str
-
- class AskResponse(BaseModel):
-     question: str
-     answer: str
-     sources: list[Source]
 
 class HealthResponse(BaseModel):
     status: str
@@ -63,49 +92,67 @@ class BooksResponse(BaseModel):
     books: list[str]
     total_chunks: int
 
 # ─── Routes ───────────────────────────────────────────────────────────────────
 
 @app.get("/health", response_model=HealthResponse, tags=["System"])
 def health_check():
-     """Check that the API is running."""
     return {"status": "ok", "message": "Sacred Texts RAG is running 🕊️"}
 
 
 @app.get("/books", response_model=BooksResponse, tags=["Knowledge Base"])
 def list_books():
-     """List all books currently indexed in the knowledge base."""
     try:
-         embeddings = get_embeddings()                 # ← FIXED Step 1
-         vector_store = get_vector_store(embeddings)   # ← FIXED Step 2
-         collection = vector_store._collection
-         results = collection.get(include=["metadatas"])
-         metadatas = results.get("metadatas", [])
-
-         books = sorted(set(
-             m.get("book", "Unknown")
-             for m in metadatas
-             if m  # guard against None
-         ))
         return {"books": books, "total_chunks": len(metadatas)}
     except Exception as e:
         raise HTTPException(status_code=500, detail=f"Could not read knowledge base: {e}")
 
 
 @app.post("/ask", tags=["Query"])
- async def ask(request: AskRequest):
     """
     Ask a spiritual or philosophical question.
-     The answer is grounded strictly in the sacred texts.
     """
-     if not request.question.strip():
         raise HTTPException(status_code=400, detail="Question cannot be empty.")
 
     try:
-
         return StreamingResponse(
-             query_sacred_texts(request.question),
-             media_type="application/json"
         )
     except FileNotFoundError:
         raise HTTPException(
@@ -115,26 +162,64 @@ async def ask(request: AskRequest):
     except Exception as e:
         raise HTTPException(status_code=500, detail=str(e))
 
 @app.get("/", include_in_schema=False)
 async def serve_frontend():
-     """Serves the static frontend HTML file."""
     frontend_path = "frontend/index.html"
     if os.path.exists(frontend_path):
         return FileResponse(frontend_path)
     return {"message": "Sacred Texts RAG API is live. Visit /docs for Swagger UI."}
 
 # ─── Entry Point ──────────────────────────────────────────────────────────────
 
 if __name__ == "__main__":
     import uvicorn
 
-     # HF Spaces uses 7860 by default
     host = os.getenv("HOST", "0.0.0.0")
-     port = int(os.getenv("PORT", "7860"))
 
-     print(f"\n🕊️ Sacred Texts RAG — API Server")
     print(f"{'─' * 40}")
     print(f"🌐 Running at : http://{host}:{port}")
     print(f"{'─' * 40}\n")
 
-     uvicorn.run("app:app", host=host, port=port, reload=False)  # reload=False for production
 app.py — FastAPI backend server for the Sacred Texts RAG application.
 
 Endpoints:
+     POST /ask     — Ask a question, get a streamed answer with sources
+     POST /clear   — Clear conversation history for a session
+     GET  /history — Retrieve conversation history for a session
     GET  /health  — Health check
     GET  /books   — List books currently in the knowledge base
 
 """
 
 import os
+ import uuid
+ from fastapi import FastAPI, HTTPException, Request, Response
 from fastapi.middleware.cors import CORSMiddleware
 from pydantic import BaseModel, Field
 from dotenv import load_dotenv
+ from fastapi.responses import StreamingResponse, FileResponse, JSONResponse
+ from rag_chain import (
+     query_sacred_texts,
+     get_embeddings,
+     get_vector_store,
+     clear_session,
+     get_history,
+ )
+ from langchain_core.messages import HumanMessage, AIMessage
 
 load_dotenv()
 
 app = FastAPI(
     title="Sacred Texts RAG API",
     description="Ask questions answered exclusively from Bhagavad Gita, Quran, Bible, and Guru Granth Sahib",
+     version="2.0.0",
 )
 
 app.add_middleware(
     CORSMiddleware,
+     allow_origins=["*"],
     allow_credentials=True,
     allow_methods=["*"],
     allow_headers=["*"],
+     expose_headers=["X-Session-Id"],
 )
 
+ SESSION_COOKIE = "rag_session_id"
+
+
+ # ─── Helpers ─────────────────────────────────────────────────────────────────
+
+ def get_or_create_session(request: Request, response: Response) -> str:
+     """
+     Read the session ID from the cookie (or X-Session-Id header).
+     If absent, generate a new one and set it on the response cookie.
+     """
+     session_id = (
+         request.cookies.get(SESSION_COOKIE)
+         or request.headers.get("X-Session-Id")
+     )
+     if not session_id:
+         session_id = str(uuid.uuid4())
+         response.set_cookie(
+             key=SESSION_COOKIE,
+             value=session_id,
+             httponly=True,
+             samesite="lax",
+             max_age=60 * 60 * 24,  # 24 hours
+         )
+     return session_id
+
 
 # ─── Request / Response Models ────────────────────────────────────────────────
 
 class AskRequest(BaseModel):
     question: str = Field(..., min_length=3, max_length=1000,
                           example="What do the scriptures say about compassion?")
+     session_id: str | None = Field(
+         default=None,
+         description="Optional session ID for multi-turn conversations. "
+                     "If omitted, the server reads/creates one via cookie.",
+     )
 
 class HealthResponse(BaseModel):
     status: str
 
     books: list[str]
     total_chunks: int
 
+ class ClearRequest(BaseModel):
+     session_id: str | None = None
+
+ class HistoryItem(BaseModel):
+     role: str  # "human" | "ai"
+     content: str
+
+ class HistoryResponse(BaseModel):
+     session_id: str
+     turns: int
+     messages: list[HistoryItem]
+
 
 # ─── Routes ───────────────────────────────────────────────────────────────────
 
 @app.get("/health", response_model=HealthResponse, tags=["System"])
 def health_check():
     return {"status": "ok", "message": "Sacred Texts RAG is running 🕊️"}
 
 
 @app.get("/books", response_model=BooksResponse, tags=["Knowledge Base"])
 def list_books():
     try:
+         embeddings = get_embeddings()
+         vector_store = get_vector_store(embeddings)
+         collection = vector_store._collection
+         results = collection.get(include=["metadatas"])
+         metadatas = results.get("metadatas", [])
+         books = sorted(set(m.get("book", "Unknown") for m in metadatas if m))
         return {"books": books, "total_chunks": len(metadatas)}
     except Exception as e:
         raise HTTPException(status_code=500, detail=f"Could not read knowledge base: {e}")
 
 
 @app.post("/ask", tags=["Query"])
+ async def ask(request_body: AskRequest, request: Request, response: Response):
     """
     Ask a spiritual or philosophical question.
+     Streams the answer as NDJSON (one JSON object per line).
+     Maintains per-session conversation history automatically via cookie or
+     the `session_id` field in the request body.
     """
+     if not request_body.question.strip():
         raise HTTPException(status_code=400, detail="Question cannot be empty.")
 
+     # Resolve session: body field > cookie/header > new
+     if request_body.session_id:
+         session_id = request_body.session_id
+     else:
+         session_id = get_or_create_session(request, response)
+
     try:
+         stream = query_sacred_texts(request_body.question, session_id=session_id)
+
+         # We need to forward the session_id so the frontend can persist it
+         headers = {"X-Session-Id": session_id}
+
         return StreamingResponse(
+             stream,
+             media_type="application/x-ndjson",
+             headers=headers,
         )
     except FileNotFoundError:
         raise HTTPException(
 
     except Exception as e:
         raise HTTPException(status_code=500, detail=str(e))
 
+
+ @app.post("/clear", tags=["Session"])
+ async def clear_conversation(body: ClearRequest, request: Request, response: Response):
+     """
+     Clear the conversation history for the given session.
+     If session_id is omitted, clears the session identified by cookie.
+     """
+     session_id = body.session_id or request.cookies.get(SESSION_COOKIE)
+     if not session_id:
+         raise HTTPException(status_code=400, detail="No session to clear.")
+     clear_session(session_id)
+     return {"status": "cleared", "session_id": session_id}
+
+
+ @app.get("/history", response_model=HistoryResponse, tags=["Session"])
+ async def conversation_history(session_id: str | None = None, request: Request = None):
+     """
+     Return the conversation history for a session (for debugging / display).
+     """
+     sid = session_id or (request.cookies.get(SESSION_COOKIE) if request else None)
+     if not sid:
+         raise HTTPException(status_code=400, detail="Provide session_id query param or cookie.")
+
+     messages = get_history(sid)
+     items = []
+     for msg in messages:
+         if isinstance(msg, HumanMessage):
+             items.append(HistoryItem(role="human", content=msg.content))
+         elif isinstance(msg, AIMessage):
+             items.append(HistoryItem(role="ai", content=msg.content))
+
+     return HistoryResponse(
+         session_id=sid,
+         turns=len(items) // 2,
+         messages=items,
+     )
+
+
 @app.get("/", include_in_schema=False)
 async def serve_frontend():
     frontend_path = "frontend/index.html"
     if os.path.exists(frontend_path):
         return FileResponse(frontend_path)
     return {"message": "Sacred Texts RAG API is live. Visit /docs for Swagger UI."}
 
+
 # ─── Entry Point ──────────────────────────────────────────────────────────────
 
 if __name__ == "__main__":
     import uvicorn
 
     host = os.getenv("HOST", "0.0.0.0")
+     port = int(os.getenv("PORT", "7860"))
 
+     print(f"\n🕊️ Sacred Texts RAG — API Server v2.0")
     print(f"{'─' * 40}")
     print(f"🌐 Running at : http://{host}:{port}")
+     print(f"🧠 Multi-turn conversation: ENABLED")
     print(f"{'─' * 40}\n")
 
+     uvicorn.run("app:app", host=host, port=port, reload=False)
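The `/ask` handler above resolves the session in a fixed priority order: body field, then cookie, then `X-Session-Id` header, then a fresh UUID. Isolated as a pure function for clarity (a restatement of the diff's logic, not extra behavior; the dict arguments stand in for FastAPI's cookie/header mappings):

```python
import uuid

SESSION_COOKIE = "rag_session_id"  # name used by app.py above

def resolve_session_id(body_session_id, cookies, headers):
    """Body field wins, then cookie, then X-Session-Id header, else a new UUID."""
    session_id = (
        body_session_id
        or cookies.get(SESSION_COOKIE)
        or headers.get("X-Session-Id")
    )
    return session_id or str(uuid.uuid4())
```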
frontend/index.html CHANGED
@@ -13,13 +13,7 @@
13
 
14
  <style>
15
  /* ── Reset & Base ─────────────────────────────────────────── */
16
- *,
17
- *::before,
18
- *::after {
19
- box-sizing: border-box;
20
- margin: 0;
21
- padding: 0;
22
- }
23
 
24
  :root {
25
  --bg: #0d0b07;
@@ -32,68 +26,13 @@
32
  --cream: #f0e6cc;
33
  --muted: #7a6a4a;
34
  --gita: #e07b3b;
35
- /* saffron */
36
  --quran: #3bba85;
37
- /* green */
38
  --bible: #5b8ce0;
39
- /* blue */
40
  --granth: #b07ce0;
41
- /* violet β€” Sikh royal purple */
42
- }
43
-
44
- /* Animated Thinking state for streaming */
45
- .thinking-dots {
46
- display: inline-flex;
47
- gap: 4px;
48
- margin-left: 4px;
49
- }
50
-
51
- .thinking-dots span {
52
- width: 4px;
53
- height: 4px;
54
- background: var(--gold);
55
- border-radius: 50%;
56
- animation: bounce 1.4s infinite ease-in-out;
57
- }
58
-
59
- @keyframes bounce {
60
-
61
- 0%,
62
- 80%,
63
- 100% {
64
- transform: scale(0);
65
- }
66
-
67
- 40% {
68
- transform: scale(1);
69
- }
70
  }
71
 
72
- /* Make streaming text fade in slightly for smoothness */
73
- #currentStreamingMsg p {
74
- animation: fadeIn 0.3s ease-in;
75
- }
76
-
77
- @keyframes fadeIn {
78
- from {
79
- opacity: 0.7;
80
- }
81
-
82
- to {
83
- opacity: 1;
84
- }
85
- }
86
-
87
- /* Ensure the bubble has a minimum height so it doesn't look like a "small block" */
88
- .msg-bubble:empty::before {
89
- content: "Writing wisdom...";
90
- color: var(--muted);
91
- font-style: italic;
92
- font-size: 0.9rem;
93
- }
94
-
95
- html,
96
- body {
97
  height: 100%;
98
  background: var(--bg);
99
  color: var(--cream);
@@ -103,15 +42,14 @@
103
  overflow: hidden;
104
  }
105
 
106
- /* ── Background texture ───────────────────────────────────── */
107
  body::before {
108
  content: '';
109
  position: fixed;
110
  inset: 0;
111
  background:
112
- radial-gradient(ellipse 80% 60% at 20% 10%, rgba(201, 153, 58, .07) 0%, transparent 60%),
113
- radial-gradient(ellipse 60% 80% at 80% 90%, rgba(91, 140, 224, .05) 0%, transparent 60%),
114
- radial-gradient(ellipse 50% 50% at 50% 50%, rgba(176, 124, 224, .04) 0%, transparent 60%),
115
  url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='400' height='400'%3E%3Cfilter id='n'%3E%3CfeTurbulence type='fractalNoise' baseFrequency='0.75' numOctaves='4' stitchTiles='stitch'/%3E%3CfeColorMatrix type='saturate' values='0'/%3E%3C/filter%3E%3Crect width='400' height='400' filter='url(%23n)' opacity='0.04'/%3E%3C/svg%3E");
116
  pointer-events: none;
117
  z-index: 0;
@@ -131,106 +69,121 @@
131
 
132
  /* ── Header ───────────────────────────────────────────────── */
133
  header {
134
- padding: 28px 0 18px;
135
  text-align: center;
136
  border-bottom: 1px solid var(--border);
 
137
  }
138
 
139
  .mandala {
140
- font-size: 2rem;
141
  letter-spacing: .5rem;
142
  color: var(--gold);
143
  opacity: .6;
144
- margin-bottom: 8px;
145
  animation: spin 60s linear infinite;
146
  display: inline-block;
147
  }
148
-
149
- @keyframes spin {
150
- to {
151
- transform: rotate(360deg);
152
- }
153
- }
154
 
155
  h1 {
156
  font-family: 'Cinzel Decorative', serif;
157
- font-size: clamp(1.2rem, 3vw, 1.9rem);
158
  font-weight: 400;
159
  color: var(--gold-pale);
160
  letter-spacing: .12em;
161
- text-shadow: 0 0 40px rgba(201, 153, 58, .3);
162
  }
163
 
164
  .subtitle {
165
  font-family: 'IM Fell English', serif;
166
  font-style: italic;
167
- font-size: .95rem;
168
  color: var(--muted);
169
- margin-top: 4px;
170
  }
171
 
172
  .badges {
173
  display: flex;
174
  justify-content: center;
175
- gap: 12px;
176
- margin-top: 12px;
177
  flex-wrap: wrap;
178
  }
179
 
180
  .badge {
181
- font-size: .72rem;
182
  letter-spacing: .1em;
183
  text-transform: uppercase;
184
- padding: 3px 10px;
185
  border-radius: 20px;
186
  border: 1px solid;
187
  font-family: 'Cormorant Garamond', serif;
188
  font-weight: 600;
189
  }
 
 
 
 
190
 
191
- .badge-gita {
192
- color: var(--gita);
193
- border-color: var(--gita);
194
- background: rgba(224, 123, 59, .1);
 
 
 
 
 
 
 
 
 
195
  }
196
 
197
- .badge-quran {
198
- color: var(--quran);
199
- border-color: var(--quran);
200
- background: rgba(59, 186, 133, .1);
 
201
  }
202
 
203
- .badge-bible {
204
- color: var(--bible);
205
- border-color: var(--bible);
206
- background: rgba(91, 140, 224, .1);
207
  }
208
 
209
- .badge-granth {
210
- color: var(--granth);
211
- border-color: var(--granth);
212
- background: rgba(176, 124, 224, .1);
 
 
 
 
 
 
 
 
 
 
 
 
 
213
  }
214
 
215
  /* ── Chat Window ──────────────────────────────────────────── */
216
  .chat-window {
217
  overflow-y: auto;
218
- padding: 28px 0;
219
  display: flex;
220
  flex-direction: column;
221
  gap: 24px;
222
  scrollbar-width: thin;
223
  scrollbar-color: var(--border) transparent;
224
  }
225
-
226
- .chat-window::-webkit-scrollbar {
227
- width: 4px;
228
- }
229
-
230
- .chat-window::-webkit-scrollbar-thumb {
231
- background: var(--border);
232
- border-radius: 4px;
233
- }
234
 
235
  /* ── Welcome State ────────────────────────────────────────── */
236
  .welcome {
@@ -239,84 +192,46 @@
239
  padding: 20px;
240
  max-width: 500px;
241
  }
242
-
243
- .welcome-icon {
244
- font-size: 3.5rem;
245
- margin-bottom: 16px;
246
- filter: drop-shadow(0 0 20px rgba(201, 153, 58, .4));
247
- }
248
-
249
  .welcome h2 {
250
  font-family: 'IM Fell English', serif;
251
  font-style: italic;
252
- font-size: 1.5rem;
253
  color: var(--gold-light);
254
- margin-bottom: 10px;
255
- }
256
-
257
- .welcome p {
258
- font-size: .95rem;
259
- color: var(--muted);
260
- line-height: 1.8;
261
- }
262
-
263
- .suggested-queries {
264
- margin-top: 24px;
265
- display: flex;
266
- flex-direction: column;
267
- gap: 8px;
268
  }
 
269
 
 
270
  .suggested-queries button {
271
  background: var(--surface);
272
  border: 1px solid var(--border);
273
  color: var(--cream);
274
- padding: 10px 16px;
275
  border-radius: 8px;
276
  font-family: 'Cormorant Garamond', serif;
277
- font-size: .95rem;
278
  font-style: italic;
279
  cursor: pointer;
280
  transition: all .2s;
281
  text-align: left;
282
  }
283
-
284
- .suggested-queries button:hover {
285
- border-color: var(--gold);
286
- color: var(--gold-pale);
287
- background: var(--surface-2);
288
- }
289
 
290
  /* ── Messages ─────────────────────────────────────────────── */
291
  .message {
292
  display: flex;
293
  flex-direction: column;
294
- gap: 8px;
295
  animation: fadeUp .4s ease both;
296
  }
 
297
 
298
- @keyframes fadeUp {
299
- from {
300
- opacity: 0;
301
- transform: translateY(12px);
302
- }
303
-
304
- to {
305
- opacity: 1;
306
- transform: translateY(0);
307
- }
308
- }
309
-
310
- .message-user {
311
- align-items: flex-end;
312
- }
313
-
314
- .message-assistant {
315
- align-items: flex-start;
316
- }
317
 
318
  .msg-label {
319
- font-size: .7rem;
320
  letter-spacing: .15em;
321
  text-transform: uppercase;
322
  color: var(--muted);
@@ -326,7 +241,7 @@
326
 
327
  .msg-bubble {
328
  max-width: 92%;
329
- padding: 16px 20px;
330
  border-radius: 12px;
331
  line-height: 1.75;
332
  }
@@ -336,40 +251,40 @@
336
  border: 1px solid var(--border);
337
  color: var(--cream);
338
  font-style: italic;
339
- font-size: 1rem;
340
  border-bottom-right-radius: 4px;
341
  }
342
 
343
  .message-assistant .msg-bubble {
344
- background: linear-gradient(135deg, var(--surface) 0%, rgba(30, 26, 17, .95) 100%);
345
- border: 1px solid rgba(201, 153, 58, .2);
346
  color: var(--cream);
347
- font-size: 1rem;
348
  border-bottom-left-radius: 4px;
349
- box-shadow: 0 4px 24px rgba(0, 0, 0, .4), inset 0 1px 0 rgba(201, 153, 58, .1);
350
- }
351
-
352
- .msg-bubble p {
353
- margin-bottom: 1em;
354
  }
355
 
356
- .msg-bubble p:last-child {
357
- margin-bottom: 0;
358
- }
359
 
360
- .msg-bubble strong {
361
- color: var(--gold-light);
362
- font-weight: 600;
 
 
 
 
 
 
 
 
363
  }
364
 
365
  /* ── Sources Panel ────────────────────────────────────────── */
366
- .sources {
367
- max-width: 92%;
368
- margin-top: 4px;
369
- }
370
-
371
  .sources-label {
372
- font-size: .72rem;
373
  letter-spacing: .12em;
374
  text-transform: uppercase;
375
  color: var(--muted);
@@ -378,27 +293,12 @@
378
  align-items: center;
379
  gap: 6px;
380
  }
 
 
381
 
382
- .sources-label::before,
383
- .sources-label::after {
384
- content: '';
385
- flex: 1;
386
- height: 1px;
387
- background: var(--border);
388
- }
389
-
390
- .sources-label::before {
391
- max-width: 20px;
392
- }
393
-
394
- .source-tags {
395
- display: flex;
396
- flex-wrap: wrap;
397
- gap: 6px;
398
- }
399
-
400
  .source-tag {
401
- font-size: .78rem;
402
  padding: 4px 10px;
403
  border-radius: 6px;
404
  border: 1px solid;
@@ -406,101 +306,55 @@
406
  cursor: default;
407
  transition: all .2s;
408
  }
 
409
 
410
- .source-tag:hover {
411
- transform: translateY(-1px);
412
- filter: brightness(1.2);
413
- }
414
-
415
- .source-gita {
416
- color: var(--gita);
417
- border-color: rgba(224, 123, 59, .4);
418
- background: rgba(224, 123, 59, .08);
419
- }
420
-
421
- .source-quran {
422
- color: var(--quran);
423
- border-color: rgba(59, 186, 133, .4);
424
- background: rgba(59, 186, 133, .08);
425
- }
426
-
427
- .source-bible {
428
- color: var(--bible);
429
- border-color: rgba(91, 140, 224, .4);
430
- background: rgba(91, 140, 224, .08);
431
- }
432
-
433
- .source-granth {
434
- color: var(--granth);
435
- border-color: rgba(176, 124, 224, .4);
436
- background: rgba(176, 124, 224, .08);
437
- }
438
-
439
- .source-other {
440
- color: var(--gold-light);
441
- border-color: rgba(201, 153, 58, .4);
442
- background: rgba(201, 153, 58, .08);
443
- }
444
 
445
  /* ── Loading ──────────────────────────────────────────────── */
446
  .loading {
447
  display: flex;
448
  align-items: center;
449
- gap: 12px;
450
- padding: 14px 18px;
451
- border: 1px solid rgba(201, 153, 58, .15);
452
  border-radius: 12px;
453
  background: var(--surface);
454
  width: fit-content;
455
  max-width: 280px;
456
  }
457
-
458
- .loading-dots {
459
- display: flex;
460
- gap: 5px;
461
- }
462
-
463
  .loading-dots span {
464
- width: 6px;
465
- height: 6px;
466
  border-radius: 50%;
467
  background: var(--gold);
468
  animation: dot-pulse 1.4s ease-in-out infinite;
469
  }
470
-
471
- .loading-dots span:nth-child(2) {
472
- animation-delay: .2s;
473
- }
474
-
475
- .loading-dots span:nth-child(3) {
476
- animation-delay: .4s;
477
- }
478
-
479
  @keyframes dot-pulse {
480
-
481
- 0%,
482
- 80%,
483
- 100% {
484
- opacity: .2;
485
- transform: scale(.8);
486
- }
487
-
488
- 40% {
489
- opacity: 1;
490
- transform: scale(1.1);
491
- }
492
  }
 
493
 
494
- .loading-text {
495
- font-size: .85rem;
496
- font-style: italic;
497
- color: var(--muted);
 
 
 
498
  }
 
499
 
500
  /* ── Error ────────────────────────────────────────────────── */
501
  .error-bubble {
502
- background: rgba(180, 60, 60, .1);
503
- border: 1px solid rgba(180, 60, 60, .3);
504
  color: #e08080;
505
  padding: 12px 16px;
506
  border-radius: 10px;
@@ -509,52 +363,38 @@
  }

  /* ── Input Area ───────────────────────────────────────────── */
- .input-area {
- padding: 16px 0 24px;
- border-top: 1px solid var(--border);
- }
-
- .input-row {
- display: flex;
- gap: 10px;
- align-items: flex-end;
- }

  textarea {
  flex: 1;
  background: var(--surface);
  border: 1px solid var(--border);
  color: var(--cream);
- padding: 14px 16px;
  border-radius: 12px;
  font-family: 'Cormorant Garamond', serif;
- font-size: 1rem;
  line-height: 1.6;
  resize: none;
- min-height: 52px;
- max-height: 140px;
  outline: none;
  transition: border-color .2s, box-shadow .2s;
  }
-
- textarea::placeholder {
- color: var(--muted);
- font-style: italic;
- }
-
  textarea:focus {
- border-color: rgba(201, 153, 58, .5);
- box-shadow: 0 0 0 3px rgba(201, 153, 58, .08);
  }

  .send-btn {
- width: 52px;
- height: 52px;
  border-radius: 12px;
- border: 1px solid rgba(201, 153, 58, .4);
- background: linear-gradient(135deg, rgba(201, 153, 58, .2), rgba(201, 153, 58, .05));
  color: var(--gold);
- font-size: 1.3rem;
  cursor: pointer;
  transition: all .2s;
  display: flex;
@@ -562,36 +402,15 @@
  justify-content: center;
  flex-shrink: 0;
  }
-
  .send-btn:hover:not(:disabled) {
- background: linear-gradient(135deg, rgba(201, 153, 58, .35), rgba(201, 153, 58, .15));
  border-color: var(--gold);
  transform: translateY(-1px);
- box-shadow: 0 4px 16px rgba(201, 153, 58, .2);
  }

- .send-btn:disabled {
- opacity: .3;
- cursor: not-allowed;
- transform: none;
- }
-
- .input-hint {
- font-size: .72rem;
- color: var(--muted);
- margin-top: 8px;
- text-align: center;
- font-style: italic;
- }
-
- /* ── Divider line ─────────────────────────────────────────── */
- .ornament {
- text-align: center;
- color: var(--border);
- font-size: .8rem;
- letter-spacing: .4em;
- margin: 4px 0;
- }
  </style>
  </head>

@@ -609,6 +428,16 @@
  <span class="badge badge-bible">Bible</span>
  <span class="badge badge-granth">Guru Granth Sahib</span>
  </div>
  </header>

  <!-- Chat Window -->
@@ -616,15 +445,18 @@
  <div class="welcome" id="welcomePane">
  <div class="welcome-icon">πŸ•ŠοΈ</div>
  <h2>"Seek, and it shall be given unto you"</h2>
- <p>Ask any spiritual or philosophical question. Answers are drawn exclusively from the Bhagavad Gita, Quran,
- Bible, and Guru Granth Sahib.</p>
  <div class="suggested-queries">
  <button onclick="askSuggested(this)">What do the scriptures say about forgiveness?</button>
  <button onclick="askSuggested(this)">How should one face fear and death?</button>
  <button onclick="askSuggested(this)">What is the purpose of prayer and worship?</button>
  <button onclick="askSuggested(this)">What is the nature of the soul according to each religion?</button>
- <button onclick="askSuggested(this)">What do the scriptures teach about humility and selfless
- service?</button>
  </div>
  </div>
  </div>
@@ -632,26 +464,88 @@
  <!-- Input -->
  <div class="input-area">
  <div class="input-row">
- <textarea id="questionInput" placeholder="Ask a question from the sacred texts…" rows="1"
- onkeydown="handleKey(event)" oninput="autoResize(this)"></textarea>
- <button class="send-btn" id="sendBtn" onclick="sendQuestion()" title="Ask (Enter)">
- ✦
- </button>
  </div>
- <p class="input-hint">Press Enter to ask Β· Shift+Enter for new line Β· Answers grounded strictly in the sacred
- texts</p>
  </div>

  </div>

  <script>
  const API_BASE = window.location.origin;
- let isLoading = false;

- // ── Helpers ────────────────────────────────────────────────
  function getSourceClass(book) {
  const b = book.toLowerCase();
- if (b.includes("gita")) return "source-gita";
  if (b.includes("quran") || b.includes("koran")) return "source-quran";
  if (b.includes("bible") || b.includes("testament")) return "source-bible";
  if (b.includes("granth") || b.includes("guru")) return "source-granth";
@@ -670,23 +564,28 @@
  function autoResize(el) {
  el.style.height = "auto";
- el.style.height = Math.min(el.scrollHeight, 140) + "px";
  }

  function formatAnswer(text) {
- // Convert markdown-ish bold (**text**) to <strong>
  text = text.replace(/\*\*(.*?)\*\*/g, "<strong>$1</strong>");
- // Wrap paragraphs
  return text.split(/\n\n+/).filter(p => p.trim()).map(p => `<p>${p.trim()}</p>`).join("");
  }

- // ── Append message to chat ─────────────────────────────────
- function appendUserMessage(question) {
  const w = document.getElementById("chatWindow");
  const div = document.createElement("div");
  div.className = "message message-user";
  div.innerHTML = `
- <span class="msg-label">You</span>
  <div class="msg-bubble">${escapeHtml(question)}</div>
  `;
  w.appendChild(div);
@@ -710,63 +609,46 @@
  return div;
  }

- function replaceLoadingWithAnswer(loadingEl, data) {
- const w = document.getElementById("chatWindow");
-
- // Build source tags
- const sourceTags = (data.sources || []).map(s => {
  const cls = getSourceClass(s.book);
- return `<span class="source-tag ${cls}" title="Page ${s.page}">πŸ“– ${s.book}</span>`;
  }).join("");
-
- const sourcesHtml = sourceTags ? `
- <div class="sources">
- <div class="sources-label">References</div>
- <div class="source-tags">${sourceTags}</div>
- </div>
- ` : "";
-
- loadingEl.innerHTML = `
- <span class="msg-label">Sacred Texts</span>
- <div class="msg-bubble">${formatAnswer(data.answer)}</div>
- ${sourcesHtml}
- `;
- scrollToBottom();
- }
-
- function replaceLoadingWithError(loadingEl, msg) {
- loadingEl.innerHTML = `
- <span class="msg-label">Error</span>
- <div class="error-bubble">⚠️ ${escapeHtml(msg)}</div>
- `;
- scrollToBottom();
- }
-
- function escapeHtml(str) {
- return str.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
  }

- // ── Send question ──────────────────────────────────────────
  async function sendQuestion() {
  if (isLoading) return;
- const input = document.getElementById("questionInput");
  const question = input.value.trim();
  if (!question) return;

  hideWelcome();
  isLoading = true;
  document.getElementById("sendBtn").disabled = true;
  input.value = "";
  input.style.height = "auto";

- appendUserMessage(question);
  const loadingEl = appendLoading();

  try {
  const res = await fetch(`${API_BASE}/ask`, {
- method: "POST",
  headers: { "Content-Type": "application/json" },
- body: JSON.stringify({ question }),
  });

  if (!res.ok) {
@@ -774,36 +656,36 @@
  throw new Error(err.detail || "Server error");
  }

- // Initialize variables to build the UI
- const reader = res.body.getReader();
- const decoder = new TextDecoder();
- let fullAnswer = "";
- let buffer = "";

- // Prepare the assistant UI bubble immediately
  loadingEl.innerHTML = `
- <span class="msg-label">Sacred Texts</span>
- <div class="msg-bubble" id="currentStreamingMsg">
- <div class="loading-text">The scriptures are being revealed<span class="thinking-dots"><span></span><span></span><span></span></span></div>
- </div>
- <div id="currentStreamingSources"></div>
- `;
- const bubble = document.getElementById("currentStreamingMsg");
  const sourcesContainer = document.getElementById("currentStreamingSources");
- let firstTokenReceived = false;

  while (true) {
  const { done, value } = await reader.read();
  if (done) break;

- // Append new data to the buffer
  buffer += decoder.decode(value, { stream: true });
-
- // Split by newline
  const lines = buffer.split("\n");
-
- buffer = lines.pop();
-

  for (const line of lines) {
  if (!line.trim()) continue;
@@ -811,20 +693,13 @@
  const parsed = JSON.parse(line);

  if (parsed.type === "token") {
- //Remove the loading text as soon as the first word arrives
- if (!firstTokenReceived) {
- bubble.innerHTML = "";
- firstTokenReceived = true;
- }
-
  fullAnswer += parsed.data;
- // Dynamically update the bubble with formatted markdown/paragraphs
  bubble.innerHTML = formatAnswer(fullAnswer);
  scrollToBottom();
  }
  else if (parsed.type === "sources") {
- sourcesData = parsed.data;
- renderSourcesInPlace(sourcesContainer, sourcesData);
  }
  else if (parsed.type === "cache") {
  bubble.innerHTML = formatAnswer(parsed.data.answer);
@@ -832,18 +707,24 @@
  scrollToBottom();
  }
  } catch (e) {
- console.error("Stream parsing error", e);
  }
  }
  }

- // Clean up IDs once done so next messages don't conflict
  bubble.removeAttribute("id");
  sourcesContainer.removeAttribute("id");

  } catch (err) {
- let msg = err.message;
- replaceLoadingWithError(loadingEl, msg);
  } finally {
  isLoading = false;
  document.getElementById("sendBtn").disabled = false;
@@ -851,27 +732,9 @@
  }
  }

- // Helper to render sources inside the streaming flow
- function renderSourcesInPlace(container, sources) {
- const sourceTags = (sources || []).map(s => {
- const cls = getSourceClass(s.book);
- // Use verse citations as the primary text
- return `<span class="source-tag ${cls}" title="${s.snippet}">πŸ“– ${s.book}</span>`;
- }).join("");
-
- if (sourceTags) {
- container.innerHTML = `
- <div class="sources">
- <div class="sources-label">Citations</div>
- <div class="source-tags">${sourceTags}</div>
- </div>
- `;
- }
- }
-
  function askSuggested(btn) {
  const input = document.getElementById("questionInput");
- input.value = btn.textContent;
  autoResize(input);
  sendQuestion();
  }
@@ -882,7 +745,9 @@
  sendQuestion();
  }
  }
  </script>
  </body>
-
  </html>
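The additions in this merge make the frontend send a `session_id` with every `/ask` request and POST `{ "session_id": ... }` to `/clear` when the user starts a new conversation. `app.py` is not shown in this diff, so the following is only a minimal sketch of the server-side session store that contract implies; the class name, history shape, and method names are assumptions, not the actual backend code.

```python
import uuid

# Minimal in-memory session store sketch matching the contract the new
# frontend expects: a session_id accompanies each /ask request, and
# POST /clear drops that session's history. All names here are assumed.
class SessionStore:
    def __init__(self):
        # session_id -> list of (question, answer) turns
        self._histories = {}

    def get_or_create(self, session_id=None):
        """Return a known session id unchanged, or mint a fresh one."""
        if session_id and session_id in self._histories:
            return session_id
        new_id = uuid.uuid4().hex
        self._histories[new_id] = []
        return new_id

    def append_turn(self, session_id, question, answer):
        """Record one completed question/answer turn."""
        self._histories.setdefault(session_id, []).append((question, answer))

    def history(self, session_id):
        return self._histories.get(session_id, [])

    def clear(self, session_id):
        """Server-side counterpart of the frontend's POST /clear."""
        self._histories.pop(session_id, None)

store = SessionStore()
sid = store.get_or_create()
store.append_turn(sid, "What is forgiveness?", "The texts say...")
store.clear(sid)
```

The minted id would be returned to the browser (the frontend reads it from an `X-Session-Id` response header) so follow-up requests can reuse the same history.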
 
  <style>
  /* ── Reset & Base ─────────────────────────────────────────── */
+ *, *::before, *::after { box-sizing: border-box; margin: 0; padding: 0; }

  :root {
  --bg: #0d0b07;
  --cream: #f0e6cc;
  --muted: #7a6a4a;
  --gita: #e07b3b;
  --quran: #3bba85;
  --bible: #5b8ce0;
  --granth: #b07ce0;
+ --danger: #e06060;
  }

+ html, body {
  height: 100%;
  background: var(--bg);
  color: var(--cream);
  overflow: hidden;
  }

  body::before {
  content: '';
  position: fixed;
  inset: 0;
  background:
+ radial-gradient(ellipse 80% 60% at 20% 10%, rgba(201,153,58,.07) 0%, transparent 60%),
+ radial-gradient(ellipse 60% 80% at 80% 90%, rgba(91,140,224,.05) 0%, transparent 60%),
+ radial-gradient(ellipse 50% 50% at 50% 50%, rgba(176,124,224,.04) 0%, transparent 60%),
  url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='400' height='400'%3E%3Cfilter id='n'%3E%3CfeTurbulence type='fractalNoise' baseFrequency='0.75' numOctaves='4' stitchTiles='stitch'/%3E%3CfeColorMatrix type='saturate' values='0'/%3E%3C/filter%3E%3Crect width='400' height='400' filter='url(%23n)' opacity='0.04'/%3E%3C/svg%3E");
  pointer-events: none;
  z-index: 0;

  /* ── Header ───────────────────────────────────────────────── */
  header {
+ padding: 20px 0 14px;
  text-align: center;
  border-bottom: 1px solid var(--border);
+ position: relative;
  }

  .mandala {
+ font-size: 1.8rem;
  letter-spacing: .5rem;
  color: var(--gold);
  opacity: .6;
+ margin-bottom: 6px;
  animation: spin 60s linear infinite;
  display: inline-block;
  }
+ @keyframes spin { to { transform: rotate(360deg); } }

  h1 {
  font-family: 'Cinzel Decorative', serif;
+ font-size: clamp(1.1rem, 3vw, 1.7rem);
  font-weight: 400;
  color: var(--gold-pale);
  letter-spacing: .12em;
+ text-shadow: 0 0 40px rgba(201,153,58,.3);
  }

  .subtitle {
  font-family: 'IM Fell English', serif;
  font-style: italic;
+ font-size: .9rem;
  color: var(--muted);
+ margin-top: 3px;
  }

  .badges {
  display: flex;
  justify-content: center;
+ gap: 10px;
+ margin-top: 10px;
  flex-wrap: wrap;
  }

  .badge {
+ font-size: .7rem;
  letter-spacing: .1em;
  text-transform: uppercase;
+ padding: 2px 9px;
  border-radius: 20px;
  border: 1px solid;
  font-family: 'Cormorant Garamond', serif;
  font-weight: 600;
  }
+ .badge-gita { color: var(--gita); border-color: var(--gita); background: rgba(224,123,59,.1); }
+ .badge-quran { color: var(--quran); border-color: var(--quran); background: rgba(59,186,133,.1); }
+ .badge-bible { color: var(--bible); border-color: var(--bible); background: rgba(91,140,224,.1); }
+ .badge-granth { color: var(--granth); border-color: var(--granth); background: rgba(176,124,224,.1); }

+ /* ── Session bar ──────────────────────────────────────────── */
+ .session-bar {
+ display: none; /* hidden until a conversation starts */
+ align-items: center;
+ justify-content: space-between;
+ gap: 8px;
+ margin-top: 10px;
+ padding: 5px 10px;
+ border: 1px solid var(--border);
+ border-radius: 8px;
+ background: var(--surface);
+ font-size: .75rem;
+ color: var(--muted);
  }

+ .session-bar.visible { display: flex; }
+
+ .session-turn-count {
+ font-family: 'Cormorant Garamond', serif;
+ font-style: italic;
  }

+ .session-turn-count span {
+ color: var(--gold-light);
+ font-weight: 600;
  }

+ .new-convo-btn {
+ display: flex;
+ align-items: center;
+ gap: 5px;
+ background: none;
+ border: 1px solid var(--border);
+ color: var(--muted);
+ padding: 3px 10px;
+ border-radius: 6px;
+ font-family: 'Cormorant Garamond', serif;
+ font-size: .75rem;
+ cursor: pointer;
+ transition: all .2s;
+ }
+ .new-convo-btn:hover {
+ border-color: var(--danger);
+ color: var(--danger);
  }

  /* ── Chat Window ──────────────────────────────────────────── */
  .chat-window {
  overflow-y: auto;
+ padding: 24px 0;
  display: flex;
  flex-direction: column;
  gap: 24px;
  scrollbar-width: thin;
  scrollbar-color: var(--border) transparent;
  }
+ .chat-window::-webkit-scrollbar { width: 4px; }
+ .chat-window::-webkit-scrollbar-thumb { background: var(--border); border-radius: 4px; }

  /* ── Welcome State ────────────────────────────────────────── */
  .welcome {
  padding: 20px;
  max-width: 500px;
  }
+ .welcome-icon { font-size: 3.2rem; margin-bottom: 14px; filter: drop-shadow(0 0 20px rgba(201,153,58,.4)); }
  .welcome h2 {
  font-family: 'IM Fell English', serif;
  font-style: italic;
+ font-size: 1.4rem;
  color: var(--gold-light);
+ margin-bottom: 8px;
  }
+ .welcome p { font-size: .92rem; color: var(--muted); line-height: 1.8; }

+ .suggested-queries { margin-top: 20px; display: flex; flex-direction: column; gap: 7px; }
  .suggested-queries button {
  background: var(--surface);
  border: 1px solid var(--border);
  color: var(--cream);
+ padding: 9px 14px;
  border-radius: 8px;
  font-family: 'Cormorant Garamond', serif;
+ font-size: .92rem;
  font-style: italic;
  cursor: pointer;
  transition: all .2s;
  text-align: left;
  }
+ .suggested-queries button:hover { border-color: var(--gold); color: var(--gold-pale); background: var(--surface-2); }

  /* ── Messages ─────────────────────────────────────────────── */
  .message {
  display: flex;
  flex-direction: column;
+ gap: 6px;
  animation: fadeUp .4s ease both;
  }
+ @keyframes fadeUp { from { opacity: 0; transform: translateY(10px); } to { opacity: 1; transform: translateY(0); } }

+ .message-user { align-items: flex-end; }
+ .message-assistant { align-items: flex-start; }

  .msg-label {
+ font-size: .68rem;
  letter-spacing: .15em;
  text-transform: uppercase;
  color: var(--muted);

  .msg-bubble {
  max-width: 92%;
+ padding: 14px 18px;
  border-radius: 12px;
  line-height: 1.75;
  }
  border: 1px solid var(--border);
  color: var(--cream);
  font-style: italic;
+ font-size: .97rem;
  border-bottom-right-radius: 4px;
  }

  .message-assistant .msg-bubble {
+ background: linear-gradient(135deg, var(--surface) 0%, rgba(30,26,17,.95) 100%);
+ border: 1px solid rgba(201,153,58,.2);
  color: var(--cream);
+ font-size: .97rem;
  border-bottom-left-radius: 4px;
+ box-shadow: 0 4px 24px rgba(0,0,0,.4), inset 0 1px 0 rgba(201,153,58,.1);
  }

+ .msg-bubble p { margin-bottom: 1em; }
+ .msg-bubble p:last-child { margin-bottom: 0; }
+ .msg-bubble strong { color: var(--gold-light); font-weight: 600; }

+ /* Follow-up continuation pill */
+ .followup-pill {
+ font-size: .68rem;
+ padding: 2px 8px;
+ border-radius: 10px;
+ background: rgba(201,153,58,.08);
+ border: 1px solid rgba(201,153,58,.2);
+ color: var(--muted);
+ margin-left: 6px;
+ font-style: italic;
+ vertical-align: middle;
  }

  /* ── Sources Panel ────────────────────────────────────────── */
+ .sources { max-width: 92%; margin-top: 4px; }
  .sources-label {
+ font-size: .7rem;
  letter-spacing: .12em;
  text-transform: uppercase;
  color: var(--muted);
  align-items: center;
  gap: 6px;
  }
+ .sources-label::before, .sources-label::after { content: ''; flex: 1; height: 1px; background: var(--border); }
+ .sources-label::before { max-width: 20px; }

+ .source-tags { display: flex; flex-wrap: wrap; gap: 6px; }
  .source-tag {
+ font-size: .76rem;
  padding: 4px 10px;
  border-radius: 6px;
  border: 1px solid;
  cursor: default;
  transition: all .2s;
  }
+ .source-tag:hover { transform: translateY(-1px); filter: brightness(1.2); }

+ .source-gita { color: var(--gita); border-color: rgba(224,123,59,.4); background: rgba(224,123,59,.08); }
+ .source-quran { color: var(--quran); border-color: rgba(59,186,133,.4); background: rgba(59,186,133,.08); }
+ .source-bible { color: var(--bible); border-color: rgba(91,140,224,.4); background: rgba(91,140,224,.08); }
+ .source-granth { color: var(--granth); border-color: rgba(176,124,224,.4); background: rgba(176,124,224,.08); }
+ .source-other { color: var(--gold-light); border-color: rgba(201,153,58,.4); background: rgba(201,153,58,.08); }

  /* ── Loading ──────────────────────────────────────────────── */
  .loading {
  display: flex;
  align-items: center;
+ gap: 10px;
+ padding: 12px 16px;
+ border: 1px solid rgba(201,153,58,.15);
  border-radius: 12px;
  background: var(--surface);
  width: fit-content;
  max-width: 280px;
  }
+ .loading-dots { display: flex; gap: 5px; }
  .loading-dots span {
+ width: 6px; height: 6px;
  border-radius: 50%;
  background: var(--gold);
  animation: dot-pulse 1.4s ease-in-out infinite;
  }
+ .loading-dots span:nth-child(2) { animation-delay: .2s; }
+ .loading-dots span:nth-child(3) { animation-delay: .4s; }

  @keyframes dot-pulse {
+ 0%,80%,100% { opacity: .2; transform: scale(.8); }
+ 40% { opacity: 1; transform: scale(1.1); }
  }
+ .loading-text { font-size: .82rem; font-style: italic; color: var(--muted); }

+ /* ── Thinking dots (streaming) ────────────────────────────── */
+ .thinking-dots { display: inline-flex; gap: 4px; margin-left: 4px; }
+ .thinking-dots span {
+ width: 4px; height: 4px;
+ background: var(--gold);
+ border-radius: 50%;
+ animation: bounce 1.4s infinite ease-in-out;
  }
+ @keyframes bounce { 0%,80%,100% { transform: scale(0); } 40% { transform: scale(1); } }

  /* ── Error ────────────────────────────────────────────────── */
  .error-bubble {
+ background: rgba(180,60,60,.1);
+ border: 1px solid rgba(180,60,60,.3);
  color: #e08080;
  padding: 12px 16px;
  border-radius: 10px;
  }

  /* ── Input Area ───────────────────────────────────────────── */
+ .input-area { padding: 14px 0 22px; border-top: 1px solid var(--border); }
+ .input-row { display: flex; gap: 10px; align-items: flex-end; }

  textarea {
  flex: 1;
  background: var(--surface);
  border: 1px solid var(--border);
  color: var(--cream);
+ padding: 13px 15px;
  border-radius: 12px;
  font-family: 'Cormorant Garamond', serif;
+ font-size: .97rem;
  line-height: 1.6;
  resize: none;
+ min-height: 50px;
+ max-height: 130px;
  outline: none;
  transition: border-color .2s, box-shadow .2s;
  }
+ textarea::placeholder { color: var(--muted); font-style: italic; }
  textarea:focus {
+ border-color: rgba(201,153,58,.5);
+ box-shadow: 0 0 0 3px rgba(201,153,58,.08);
  }

  .send-btn {
+ width: 50px; height: 50px;
  border-radius: 12px;
+ border: 1px solid rgba(201,153,58,.4);
+ background: linear-gradient(135deg, rgba(201,153,58,.2), rgba(201,153,58,.05));
  color: var(--gold);
+ font-size: 1.25rem;
  cursor: pointer;
  transition: all .2s;
  display: flex;
  justify-content: center;
  flex-shrink: 0;
  }
  .send-btn:hover:not(:disabled) {
+ background: linear-gradient(135deg, rgba(201,153,58,.35), rgba(201,153,58,.15));
  border-color: var(--gold);
  transform: translateY(-1px);
+ box-shadow: 0 4px 16px rgba(201,153,58,.2);
  }
+ .send-btn:disabled { opacity: .3; cursor: not-allowed; transform: none; }

+ .input-hint { font-size: .7rem; color: var(--muted); margin-top: 7px; text-align: center; font-style: italic; }
  </style>
  </head>

  <span class="badge badge-bible">Bible</span>
  <span class="badge badge-granth">Guru Granth Sahib</span>
  </div>
+
+ <!-- Session status bar β€” visible once conversation starts -->
+ <div class="session-bar" id="sessionBar">
+ <span class="session-turn-count" id="turnCountLabel">
+ Turn <span id="turnCount">0</span>
+ </span>
+ <button class="new-convo-btn" onclick="startNewConversation()" title="Clear history and start fresh">
+ β†Ί New Conversation
+ </button>
+ </div>
  </header>

  <!-- Chat Window -->
  <div class="welcome" id="welcomePane">
  <div class="welcome-icon">πŸ•ŠοΈ</div>
  <h2>"Seek, and it shall be given unto you"</h2>
+ <p>Ask any spiritual or philosophical question. Answers are drawn exclusively from the
+ Bhagavad Gita, Quran, Bible, and Guru Granth Sahib.<br><br>
+ <em style="color:var(--gold-light); font-size:.9rem;">
+ You can now ask follow-up questions β€” the guide remembers the conversation.
+ </em>
+ </p>
  <div class="suggested-queries">
  <button onclick="askSuggested(this)">What do the scriptures say about forgiveness?</button>
  <button onclick="askSuggested(this)">How should one face fear and death?</button>
  <button onclick="askSuggested(this)">What is the purpose of prayer and worship?</button>
  <button onclick="askSuggested(this)">What is the nature of the soul according to each religion?</button>
+ <button onclick="askSuggested(this)">What do the scriptures teach about humility and selfless service?</button>
  </div>
  </div>
  </div>

  <!-- Input -->
  <div class="input-area">
  <div class="input-row">
+ <textarea id="questionInput"
+ placeholder="Ask a question, or follow up on the previous answer…"
+ rows="1"
+ onkeydown="handleKey(event)"
+ oninput="autoResize(this)"></textarea>
+ <button class="send-btn" id="sendBtn" onclick="sendQuestion()" title="Ask (Enter)">✦</button>
  </div>
+ <p class="input-hint">Enter to ask Β· Shift+Enter for new line Β· Follow-ups like "elaborate on point 2" work!</p>
  </div>

  </div>

 
479
  <script>
480
  const API_BASE = window.location.origin;
481
+ let isLoading = false;
482
+ let sessionId = null; // persisted across the page session
483
+ let turnCount = 0; // how many full turns this session
484
+
485
+ // ── Session helpers ────────────────────────────────────────
486
+ function loadSession() {
487
+ sessionId = localStorage.getItem("rag_session_id") || null;
488
+ }
489
+
490
+ function saveSession(id) {
491
+ sessionId = id;
492
+ localStorage.setItem("rag_session_id", id);
493
+ }
494
 
495
+ function updateSessionBar() {
496
+ const bar = document.getElementById("sessionBar");
497
+ const count = document.getElementById("turnCount");
498
+ if (turnCount > 0) {
499
+ bar.classList.add("visible");
500
+ count.textContent = turnCount;
501
+ } else {
502
+ bar.classList.remove("visible");
503
+ }
504
+ }
505
+
506
+ async function startNewConversation() {
507
+ if (!sessionId) return;
508
+ if (turnCount > 0 && !confirm("Start a new conversation? This will clear all history.")) return;
509
+
510
+ try {
511
+ await fetch(`${API_BASE}/clear`, {
512
+ method: "POST",
513
+ headers: { "Content-Type": "application/json" },
514
+ body: JSON.stringify({ session_id: sessionId }),
515
+ });
516
+ } catch (_) {}
517
+
518
+ // Reset everything
519
+ sessionId = null;
520
+ turnCount = 0;
521
+ localStorage.removeItem("rag_session_id");
522
+ updateSessionBar();
523
+
524
+ const chatWindow = document.getElementById("chatWindow");
525
+ chatWindow.innerHTML = `
526
+ <div class="welcome" id="welcomePane">
527
+ <div class="welcome-icon">πŸ•ŠοΈ</div>
528
+ <h2>"Seek, and it shall be given unto you"</h2>
529
+ <p>Ask any spiritual or philosophical question. Answers are drawn exclusively from the
530
+ Bhagavad Gita, Quran, Bible, and Guru Granth Sahib.<br><br>
531
+ <em style="color:var(--gold-light); font-size:.9rem;">
532
+ You can now ask follow-up questions β€” the guide remembers the conversation.
533
+ </em>
534
+ </p>
535
+ <div class="suggested-queries">
536
+ <button onclick="askSuggested(this)">What do the scriptures say about forgiveness?</button>
537
+ <button onclick="askSuggested(this)">How should one face fear and death?</button>
538
+ <button onclick="askSuggested(this)">What is the purpose of prayer and worship?</button>
539
+ <button onclick="askSuggested(this)">What is the nature of the soul according to each religion?</button>
540
+ <button onclick="askSuggested(this)">What do the scriptures teach about humility and selfless service?</button>
541
+ </div>
542
+ </div>`;
543
+ }
544
+
545
+ // ── DOM Helpers ────────────────────────────────────────────
546
  function getSourceClass(book) {
547
  const b = book.toLowerCase();
548
+ if (b.includes("gita")) return "source-gita";
549
  if (b.includes("quran") || b.includes("koran")) return "source-quran";
550
  if (b.includes("bible") || b.includes("testament")) return "source-bible";
551
  if (b.includes("granth") || b.includes("guru")) return "source-granth";
 
564
 
565
  function autoResize(el) {
566
  el.style.height = "auto";
567
+ el.style.height = Math.min(el.scrollHeight, 130) + "px";
568
  }
569
 
570
  function formatAnswer(text) {
 
571
  text = text.replace(/\*\*(.*?)\*\*/g, "<strong>$1</strong>");
 
572
  return text.split(/\n\n+/).filter(p => p.trim()).map(p => `<p>${p.trim()}</p>`).join("");
573
  }
574
 
575
+ function escapeHtml(str) {
576
+ return str.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
577
+ }
578
+
579
+ // ── Message rendering ──────────────────────────────────────
580
+ function appendUserMessage(question, isFollowup) {
581
  const w = document.getElementById("chatWindow");
582
  const div = document.createElement("div");
583
  div.className = "message message-user";
584
+ const pill = isFollowup
585
+ ? `<span class="followup-pill">follow-up</span>`
586
+ : "";
587
  div.innerHTML = `
588
+ <span class="msg-label">You${pill}</span>
589
  <div class="msg-bubble">${escapeHtml(question)}</div>
590
  `;
591
  w.appendChild(div);
 
609
  return div;
610
  }
611
 
612
+ function renderSourcesInPlace(container, sources) {
613
+ const sourceTags = (sources || []).map(s => {
 
 
 
614
  const cls = getSourceClass(s.book);
615
+ return `<span class="source-tag ${cls}" title="${escapeHtml(s.snippet || '')}">πŸ“– ${escapeHtml(s.book)}</span>`;
616
  }).join("");
617
+ if (sourceTags) {
618
+ container.innerHTML = `
619
+ <div class="sources">
620
+ <div class="sources-label">Citations</div>
621
+ <div class="source-tags">${sourceTags}</div>
622
+ </div>`;
623
+ }
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
624
  }
625
 
626
+ // ── Core send flow ─────────────────────────────────────────
  async function sendQuestion() {
  if (isLoading) return;
+ const input = document.getElementById("questionInput");
  const question = input.value.trim();
  if (!question) return;

  hideWelcome();
+ const isFollowup = turnCount > 0;
+
  isLoading = true;
  document.getElementById("sendBtn").disabled = true;
  input.value = "";
  input.style.height = "auto";

+ appendUserMessage(question, isFollowup);
  const loadingEl = appendLoading();

  try {
+ const payload = { question };
+ if (sessionId) payload.session_id = sessionId;
+
  const res = await fetch(`${API_BASE}/ask`, {
+ method: "POST",
  headers: { "Content-Type": "application/json" },
+ body: JSON.stringify(payload),
  });

  if (!res.ok) {
  throw new Error(err.detail || "Server error");
  }

+ // Capture session ID returned by the server
+ const returnedSession = res.headers.get("X-Session-Id");
+ if (returnedSession) saveSession(returnedSession);

+ // Set up streaming bubble
  loadingEl.innerHTML = `
+ <span class="msg-label">Sacred Texts</span>
+ <div class="msg-bubble" id="currentStreamingMsg">
+ <div class="loading-text">The scriptures are being revealed
+ <span class="thinking-dots"><span></span><span></span><span></span></span>
+ </div>
+ </div>
+ <div id="currentStreamingSources"></div>`;
+
+ const bubble = document.getElementById("currentStreamingMsg");
  const sourcesContainer = document.getElementById("currentStreamingSources");
+ let fullAnswer = "";
+ let buffer = "";
+ let firstToken = false;
+
+ const reader = res.body.getReader();
+ const decoder = new TextDecoder();

  while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
+ buffer = lines.pop(); // keep incomplete line in buffer

  for (const line of lines) {
  if (!line.trim()) continue;
  const parsed = JSON.parse(line);

  if (parsed.type === "token") {
+ if (!firstToken) { bubble.innerHTML = ""; firstToken = true; }
  fullAnswer += parsed.data;
  bubble.innerHTML = formatAnswer(fullAnswer);
  scrollToBottom();
  }
  else if (parsed.type === "sources") {
+ renderSourcesInPlace(sourcesContainer, parsed.data);
  }
  else if (parsed.type === "cache") {
  bubble.innerHTML = formatAnswer(parsed.data.answer);
  scrollToBottom();
  }
  } catch (e) {
+ console.warn("Stream parse error:", e);
  }
  }
  }

+ // Increment turn counter
+ turnCount++;
+ updateSessionBar();
+
+ // Clean up streaming IDs
  bubble.removeAttribute("id");
  sourcesContainer.removeAttribute("id");

  } catch (err) {
+ loadingEl.innerHTML = `
+ <span class="msg-label">Error</span>
+ <div class="error-bubble">⚠️ ${escapeHtml(err.message)}</div>`;
+ scrollToBottom();
  } finally {
  isLoading = false;
  document.getElementById("sendBtn").disabled = false;
  }
  }

  function askSuggested(btn) {
  const input = document.getElementById("questionInput");
+ input.value = btn.textContent.trim();
  autoResize(input);
  sendQuestion();
  }
  sendQuestion();
  }
  }
+
+ // ── Init ───────────────────────────────────────────────────
+ loadSession();
751
  </script>
752
  </body>
 
753
  </html>
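Note on the stream protocol: the frontend loop above consumes the backend's newline-delimited JSON (NDJSON) stream, splitting on `"\n"` and keeping any incomplete trailing line buffered until the next network chunk arrives. A stdlib-only Python sketch of that same parsing logic (the function name `parse_ndjson_stream` is illustrative, not part of the codebase):

```python
import json

def parse_ndjson_stream(chunks):
    """Incrementally parse NDJSON text chunks, keeping any incomplete
    trailing line buffered until the next chunk arrives."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        lines = buffer.split("\n")
        buffer = lines.pop()  # last element may be a partial line
        for line in lines:
            if not line.strip():
                continue
            yield json.loads(line)
    if buffer.strip():  # flush any final unterminated line
        yield json.loads(buffer)

# Example: a token event split across two network chunks
events = list(parse_ndjson_stream([
    '{"type": "token", "da',
    'ta": "Om"}\n{"type": "sources", "data": []}\n',
]))
```

The `buffer = lines.pop()` step is the same trick the JS code uses: `split("\n")` leaves the partial line (or an empty string) as the last element, so popping it keeps the parser safe against JSON objects that straddle chunk boundaries.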
rag_chain.py CHANGED
@@ -1,43 +1,38 @@
 """
-rag_chain.py β€” Core RAG chain using LangChain + Gemini.
-
-KEY FIX: Uses per-book retrieval (guaranteed slots per scripture) instead of
-a single similarity search β€” so no book gets starved from the context window
-when the query is semantically closer to another book's language.
-
-This module exposes a single function:
-    answer = query_sacred_texts(user_question)
-
-Returns a dict with:
-    {
-      "answer": "...",
-      "sources": [
-        {"book": "Bhagavad Gita", "page": 42, "snippet": "..."},
-        ...
-      ]
-    }
 """
 
 import os
-from pydoc import doc
 from dotenv import load_dotenv
 from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings, ChatNVIDIA, NVIDIARerank
 from langchain_chroma import Chroma
-from langchain_core.prompts import ChatPromptTemplate
 from langchain_core.output_parsers import StrOutputParser
 from langchain_community.retrievers import BM25Retriever
 from langchain_classic.retrievers import EnsembleRetriever, ContextualCompressionRetriever
-load_dotenv()
-import json
 
-NVIDIA_API_KEY = os.getenv("NVIDIA_API_KEY")
-CHROMA_DB_PATH = os.getenv("CHROMA_DB_PATH", "./chroma_db")
-COLLECTION_NAME = os.getenv("COLLECTION_NAME", "sacred_texts")
 
-# Chunks retrieved PER BOOK β€” guarantees every scripture contributes to the answer
-CHUNKS_PER_BOOK = int(os.getenv("CHUNKS_PER_BOOK", "3"))
 
-# All books currently in the knowledge base β€” add new books here as you ingest them
 KNOWN_BOOKS = [
     "Bhagavad Gita",
     "Quran",
@@ -45,8 +40,32 @@ KNOWN_BOOKS = [
     "Guru Granth Sahib",
 ]
 
-# Create a separate collection for semantic cache
-CACHE_COLLECTION = "semantic_cache"
 
 # ─── System Prompt ────────────────────────────────────────────────────────────
 
@@ -62,6 +81,10 @@ STRICT RULES you must ALWAYS follow:
    address EACH of those books separately, then synthesise the common thread.
 6. Be respectful and neutral toward all faiths β€” treat each text with equal reverence.
 7. Do NOT speculate, invent verses, or add information beyond the context.
 
 FORMAT your response as:
 - A clear, thoughtful answer (2–4 paragraphs)
@@ -73,8 +96,6 @@ Context passages from the sacred texts (guaranteed passages from each book):
 ────────────────────────────────────────
 """
 
-HUMAN_PROMPT = "Question: {question}"
-
 
 # ─── Embeddings & Vector Store ────────────────────────────────────────────────
 
@@ -94,42 +115,17 @@ def get_vector_store(embeddings):
     )
 
 
-# ─── Per-Book Retrieval ───────────────────────────────────────────────────────
-
-def get_reranked_retriever(base_retriever):
-    """
-    Wraps your Hybrid/Per-Book retriever with a Reranking layer.
-    """
-    # 1. Initialize the NVIDIA Reranker (NIM or API Catalog)
-    # Using nvidia/llama-3.2-nv-rerankqa-1b-v2 or similar
-    reranker = NVIDIARerank(
-        model="nvidia/llama-3.2-nv-rerankqa-1b-v2",
-        api_key=NVIDIA_API_KEY,
-        top_n=5  # Only send the top 5 most relevant chunks to the LLM
-    )
-
-    # 2. Wrap the base retriever
-    compression_retriever = ContextualCompressionRetriever(
-        base_compressor=reranker,
-        base_retriever=base_retriever
-    )
-
-    return compression_retriever
 
 def retrieve_per_book(question: str, vector_store: Chroma) -> list:
     """
-    Retrieve CHUNKS_PER_BOOK chunks from EACH known book independently,
-    using a metadata filter. This guarantees every scripture is represented
-    in the context β€” no book can be crowded out by higher-scoring chunks
-    from another book.
     """
     all_candidates = []
-
-    # Detect if user is asking about a specific book
-    target_books = []
     question_lower = question.lower()
-
-    # Check for keywords in the question
     if any(kw in question_lower for kw in ["gita", "bhagavad", "hindu", "hinduism"]):
         target_books.append("Bhagavad Gita")
     if any(kw in question_lower for kw in ["quran", "koran", "islam", "muslim", "muhammad"]):
@@ -138,63 +134,52 @@ def retrieve_per_book(question: str, vector_store: Chroma) -> list:
         target_books.append("Bible")
     if any(kw in question_lower for kw in ["granth", "guru", "sikh", "sikhism", "nanak"]):
         target_books.append("Guru Granth Sahib")
-
-    # If no specific book is detected, use all books
     books_to_search = target_books if target_books else KNOWN_BOOKS
-
     print(f"🎯 Routing query to: {books_to_search}")
-
     for book in books_to_search:
         try:
-            # Increase k for the base retrieval to 10
-            CANDIDATE_COUNT = 10
-
-            # Get the full collection of documents for this book to build BM25
-            # For small demo, we can pull into memory; for larger corpora, consider a more efficient approach
             book_data = vector_store.get(where={"book": book})
-            book_docs = []
-            from langchain_core.documents import Document
-            book_docs = [Document(page_content=d, metadata=m)
-                         for d, m in zip(book_data["documents"], book_data["metadatas"])]
             if not book_docs:
                 continue
-
-
-            # Setup BM25
             bm25_retriever = BM25Retriever.from_documents(book_docs)
            bm25_retriever.k = CANDIDATE_COUNT
-
-
-            # Setup vector retriever
-            vector_retriever = vector_store.as_retriever(search_kwargs={"k": CANDIDATE_COUNT, "filter": {"book": book}})
-
-
-            # Combine into ensemble retriever
-            ensemble_retriver = EnsembleRetriever(retrievers=[bm25_retriever, vector_retriever], weights=[0.5, 0.5])
-
-            # Colect candidates without reranking yet
-            book_candidates = ensemble_retriver.invoke(question)
             all_candidates.extend(book_candidates)
-            print(f"   πŸ“¦ {book}: Found {len(book_candidates)} candidates")
-
         except Exception as e:
             print(f"   ❌ {book}: retrieval error β€” {e}")
-
-
-    # Rerank the entire pool at once
     if not all_candidates:
         return []
-
     print(f"πŸš€ Reranking {len(all_candidates)} total candidates...")
     reranker = NVIDIARerank(
-        model="nvidia/llama-3.2-nv-rerankqa-1b-v2",
         api_key=NVIDIA_API_KEY,
-        top_n=5  # Final count for LLM context
     )
-
-    # Use the reranker directly to compress the full list
-    final_docs = reranker.compress_documents(all_candidates, question)
-
     for i, doc in enumerate(final_docs):
         score = doc.metadata.get("relevance_score", "N/A")
         print(f"Rank {i+1} [{doc.metadata['book']}]: Score {score}")
@@ -205,11 +190,6 @@ def retrieve_per_book(question: str, vector_store: Chroma) -> list:
 
 # ─── Format Retrieved Docs ────────────────────────────────────────────────────
 
 def format_docs(docs: list) -> str:
-    """
-    Format retrieved documents grouped by book for clarity.
-    Each chunk is labelled with book and page number.
-    """
-    # Group by book to keep context readable
     by_book: dict[str, list] = {}
     for doc in docs:
         book = doc.metadata.get("book", "Unknown")
@@ -220,19 +200,16 @@ def format_docs(docs: list) -> str:
         header = f"═══ {book} ═══"
         chunks = []
         for i, doc in enumerate(book_docs, 1):
-            page = doc.metadata.get("page", "?")
-            ch = doc.metadata.get("chapter")
-            vs = doc.metadata.get("verse")
             ang = doc.metadata.get("ang")
-
-            # Create a clean citation string
             if ang:
                 citation = f"Ang {ang}"
             elif ch and vs:
                 citation = f"{ch}:{vs}"
             else:
                 citation = f"Page {doc.metadata.get('page', '?')}"
-            chunks.append(f"  [{i}] ({citation}): {doc.page_content.strip()}")
         sections.append(header + "\n" + "\n\n".join(chunks))
 
     return "\n\n".join(sections)
@@ -241,8 +218,7 @@ def format_docs(docs: list) -> str:
 
 # ─── Build the RAG Chain ──────────────────────────────────────────────────────
 
 def build_chain():
-    """Build and return the LLM chain and vector store."""
-    embeddings = get_embeddings()
     vector_store = get_vector_store(embeddings)
 
     llm = ChatNVIDIA(
@@ -253,137 +229,141 @@ def build_chain():
         max_output_tokens=2048,
     )
 
     prompt = ChatPromptTemplate.from_messages([
         ("system", SYSTEM_PROMPT),
-        ("human", HUMAN_PROMPT),
     ])
 
-    # Chain: prompt β†’ LLM β†’ string output
-    # (retrieval is handled manually in query_sacred_texts for per-book control)
     llm_chain = prompt | llm | StrOutputParser()
-
     return llm_chain, vector_store
 
 
-# ─── Public API ───────────────────────────────────────────────────────────────
 
-_llm_chain = None
 _vector_store = None
 
 
-def query_sacred_texts(question: str):
-    """
-    Query the sacred texts knowledge base with guaranteed per-book retrieval.
 
-    Args:
-        question: The user's spiritual/philosophical question.
 
-    Returns:
-        {
-          "answer": str,
-          "sources": list[dict]  # [{book, page, snippet}, ...]
-        }
     """
     global _llm_chain, _vector_store
 
     if _llm_chain is None:
         print("πŸ”§ Initialising RAG chain (first call)...")
         _llm_chain, _vector_store = build_chain()
-
-    # --- Semantic cache check ---
-    cache_coll = _vector_store._client.get_or_create_collection(CACHE_COLLECTION)
-    cache_results = cache_coll.query(
-        query_texts=[question],
-        n_results=1
-    )
 
-    THRESHOLD = 0.35
-    # FIXED: Added check for cache_results['ids'] and ensuring distances is not empty
-    if cache_results['ids'] and cache_results['ids'][0]:
-        distance = cache_results['distances'][0][0]
-        if distance < THRESHOLD:  # Similarity threshold
-            print(f"⚑️ Semantic Cache Hit! (Distance: {distance:.4f})")
-            yield json.dumps({"type": "cache","data": json.loads(cache_results['metadatas'][0][0]['response_json'])}) + "\n"
-            return
-
-    # Step 1: Retrieve per-book (guaranteed slots for every scripture)
-    print(f"\nπŸ” Retrieving {CHUNKS_PER_BOOK} chunks per book for: '{question}'")
-    source_docs = retrieve_per_book(question, _vector_store)
 
     if not source_docs:
         yield json.dumps({"type": "token", "data": "No content found in the knowledge base."}) + "\n"
         return
 
-    # 3. Step 2: Format sources for the UI immediately
-    seen_sources = set()
     sources = []
    for doc in source_docs:
         book = doc.metadata.get("book", "Unknown")
-        ch = doc.metadata.get("chapter")
-        vs = doc.metadata.get("verse")
-        ang = doc.metadata.get("ang")
-
         if ang:
             cite_val = f"Ang {ang}"
         elif ch and vs:
             cite_val = f"{ch}:{vs}"
         else:
             cite_val = f"p. {doc.metadata.get('page', '?')}"
-
         display_name = f"{book} {cite_val}"
         snippet = doc.page_content[:200].strip() + "..."
         if display_name not in seen_sources:
             seen_sources.add(display_name)
-            print("Display name:", display_name)
-            print("Page:", cite_val)
             sources.append({"book": display_name, "page": cite_val, "snippet": snippet})
-    # Step 2: Format context grouped by book
-    context = format_docs(source_docs)
-    full_answer =""
 
-    # Step 3: Stream from the chain:
-    for chunk in _llm_chain.invoke({"context": context, "question": question}):
         full_answer += chunk
-        yield json.dumps({"type": "token", "data": chunk}) + "\n"  # Stream the answer as it's generated
-
-
-    # Filter sources to only those the LLM actually referenced
-    final_sources = []
-    ansnwer_lower = full_answer.lower()
-
-    for s in sources:
-        if s["book"].lower() in ansnwer_lower:
-            final_sources.append(s)
-
-    # If the LLM didn't explicitly reference any sources, we can optionally include all retrieved ones or none
-    display_sources = final_sources if final_sources else []
-
-    # Step 4: After streaming is done, save to semantic cache for future similar queries
-    result = {
-        "answer": full_answer,
-        "sources": display_sources,
-    }
-
-    cache_coll.add(
-        documents=[question],
-        metadatas=[{"response_json": json.dumps(result)}],
-        ids=[question]
-    )
-
-    # Send sources as a final message after the answer is fully streamed
     yield json.dumps({"type": "sources", "data": sources}) + "\n"
-
 
 
 # ─── Quick CLI Test ───────────────────────────────────────────────────────────
 
 if __name__ == "__main__":
-    test_q = "In what aspects do the Quran and Gita teach the same thing?"
     print(f"\nπŸ” Test query: {test_q}\n")
-    result = query_sacred_texts(test_q)
-    print("πŸ“ Answer:\n")
-    print(result["answer"])
-    print("\nπŸ“š Sources retrieved:")
-    for s in result["sources"]:
-        print(f"   - {s['book']} (page {s['page']})")
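The central change in this file is the prompt: the fixed `HUMAN_PROMPT` constant is removed and a history placeholder is injected between the system prompt and the current question, so prior turns become visible to the model. A stdlib-only sketch of the message assembly the new chain performs (plain tuples stand in for LangChain's `HumanMessage`/`AIMessage` objects and for `MessagesPlaceholder`; the helper name `build_messages` is illustrative):

```python
# System prompt shortened for the sketch; the real one is SYSTEM_PROMPT above.
SYSTEM_PROMPT = "You answer only from the provided scripture context."

def build_messages(history, question):
    """Assemble the per-request message list: system prompt first,
    then prior turns oldest-first, then the current question."""
    messages = [("system", SYSTEM_PROMPT)]
    messages.extend(history)              # ← MessagesPlaceholder equivalent
    messages.append(("human", question))
    return messages

msgs = build_messages(
    [("human", "What does the Gita say about duty?"),
     ("ai", "The Bhagavad Gita frames duty as svadharma...")],
    "And what about the Quran?",
)
```

This is why follow-ups like "and what about the Quran?" resolve correctly: the model sees the earlier question and answer immediately before the new one.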
 
 """
+rag_chain.py β€” Core RAG chain using LangChain + NVIDIA.
+
+KEY FEATURES:
+- Per-book retrieval (guaranteed slots per scripture)
+- Hybrid BM25 + vector search with NVIDIA reranking
+- Semantic cache for repeated/similar questions
+- Multi-turn conversation memory (session-based ConversationBufferMemory)
+
+Public API:
+    query_sacred_texts(question, session_id) -> Generator[str, None, None]
+    clear_session(session_id)
 """
 
 import os
+import json
 from dotenv import load_dotenv
 from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings, ChatNVIDIA, NVIDIARerank
 from langchain_chroma import Chroma
+from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
 from langchain_core.output_parsers import StrOutputParser
+from langchain_core.messages import HumanMessage, AIMessage
 from langchain_community.retrievers import BM25Retriever
 from langchain_classic.retrievers import EnsembleRetriever, ContextualCompressionRetriever
+from langchain_core.documents import Document
 
+load_dotenv()
 
+NVIDIA_API_KEY = os.getenv("NVIDIA_API_KEY")
+CHROMA_DB_PATH = os.getenv("CHROMA_DB_PATH", "./chroma_db")
+COLLECTION_NAME = os.getenv("COLLECTION_NAME", "sacred_texts")
+CHUNKS_PER_BOOK = int(os.getenv("CHUNKS_PER_BOOK", "3"))
+CACHE_COLLECTION = "semantic_cache"
+MAX_HISTORY_TURNS = int(os.getenv("MAX_HISTORY_TURNS", "6"))  # last N human+AI pairs kept
 
 KNOWN_BOOKS = [
     "Bhagavad Gita",
     "Quran",
 
     "Guru Granth Sahib",
 ]
 
+# ─── In-memory session store ──────────────────────────────────────────────────
+# { session_id: [HumanMessage | AIMessage, ...] }
+_session_store: dict[str, list] = {}
+
+
+def get_history(session_id: str) -> list:
+    return _session_store.get(session_id, [])
+
+
+def append_turn(session_id: str, human_msg: str, ai_msg: str):
+    history = _session_store.setdefault(session_id, [])
+    history.append(HumanMessage(content=human_msg))
+    history.append(AIMessage(content=ai_msg))
+    # Trim to last MAX_HISTORY_TURNS pairs (each pair = 2 messages)
+    if len(history) > MAX_HISTORY_TURNS * 2:
+        _session_store[session_id] = history[-(MAX_HISTORY_TURNS * 2):]
+
+
+def clear_session(session_id: str):
+    """Wipe the conversation history for a session."""
+    _session_store.pop(session_id, None)
+
+
+def list_sessions() -> list[str]:
+    return list(_session_store.keys())
+
 
 # ─── System Prompt ────────────────────────────────────────────────────────────
 
    address EACH of those books separately, then synthesise the common thread.
 6. Be respectful and neutral toward all faiths β€” treat each text with equal reverence.
 7. Do NOT speculate, invent verses, or add information beyond the context.
+8. You have access to the conversation history. Use it to:
+   - Understand follow-up questions (e.g. "elaborate on the second point", "what about the Bible?")
+   - Maintain continuity across turns without repeating yourself unnecessarily
+   - Resolve pronouns and references ("it", "that teaching", "the verse you mentioned") from history
 
 FORMAT your response as:
 - A clear, thoughtful answer (2–4 paragraphs)
 
 ────────────────────────────────────────
 """
 
 
 # ─── Embeddings & Vector Store ────────────────────────────────────────────────
 
     )
 
+# ─── Per-Book Hybrid Retrieval ────────────────────────────────────────────────
 
 def retrieve_per_book(question: str, vector_store: Chroma) -> list:
     """
+    Retrieve CHUNKS_PER_BOOK chunks from EACH known book independently using
+    a hybrid BM25+vector ensemble, then rerank the pooled candidates.
     """
     all_candidates = []
     question_lower = question.lower()
+
+    target_books = []
     if any(kw in question_lower for kw in ["gita", "bhagavad", "hindu", "hinduism"]):
         target_books.append("Bhagavad Gita")
     if any(kw in question_lower for kw in ["quran", "koran", "islam", "muslim", "muhammad"]):
 
         target_books.append("Bible")
     if any(kw in question_lower for kw in ["granth", "guru", "sikh", "sikhism", "nanak"]):
         target_books.append("Guru Granth Sahib")
+
     books_to_search = target_books if target_books else KNOWN_BOOKS
     print(f"🎯 Routing query to: {books_to_search}")
+
+    CANDIDATE_COUNT = 10
+
     for book in books_to_search:
         try:
             book_data = vector_store.get(where={"book": book})
+            book_docs = [
+                Document(page_content=d, metadata=m)
+                for d, m in zip(book_data["documents"], book_data["metadatas"])
+            ]
             if not book_docs:
                 continue
+
             bm25_retriever = BM25Retriever.from_documents(book_docs)
             bm25_retriever.k = CANDIDATE_COUNT
+
+            vector_retriever = vector_store.as_retriever(
+                search_kwargs={"k": CANDIDATE_COUNT, "filter": {"book": book}}
+            )
+
+            ensemble = EnsembleRetriever(
+                retrievers=[bm25_retriever, vector_retriever],
+                weights=[0.5, 0.5],
+            )
+
+            book_candidates = ensemble.invoke(question)
             all_candidates.extend(book_candidates)
+            print(f"   πŸ“¦ {book}: {len(book_candidates)} candidates")
+
         except Exception as e:
             print(f"   ❌ {book}: retrieval error β€” {e}")
+
     if not all_candidates:
         return []
+
     print(f"πŸš€ Reranking {len(all_candidates)} total candidates...")
     reranker = NVIDIARerank(
+        model="nvidia/llama-3.2-nv-rerankqa-1b-v2",
         api_key=NVIDIA_API_KEY,
+        top_n=5,
     )
+    final_docs = reranker.compress_documents(all_candidates, question)
+
     for i, doc in enumerate(final_docs):
         score = doc.metadata.get("relevance_score", "N/A")
         print(f"Rank {i+1} [{doc.metadata['book']}]: Score {score}")
 
 # ─── Format Retrieved Docs ────────────────────────────────────────────────────
 
 def format_docs(docs: list) -> str:
     by_book: dict[str, list] = {}
     for doc in docs:
         book = doc.metadata.get("book", "Unknown")
 
         header = f"═══ {book} ═══"
         chunks = []
         for i, doc in enumerate(book_docs, 1):
             ang = doc.metadata.get("ang")
+            ch = doc.metadata.get("chapter")
+            vs = doc.metadata.get("verse")
             if ang:
                 citation = f"Ang {ang}"
             elif ch and vs:
                 citation = f"{ch}:{vs}"
             else:
                 citation = f"Page {doc.metadata.get('page', '?')}"
+            chunks.append(f"  [{i}] ({citation}): {doc.page_content.strip()}")
         sections.append(header + "\n" + "\n\n".join(chunks))
 
     return "\n\n".join(sections)
 
 # ─── Build the RAG Chain ──────────────────────────────────────────────────────
 
 def build_chain():
+    embeddings = get_embeddings()
     vector_store = get_vector_store(embeddings)
 
     llm = ChatNVIDIA(
 
         max_output_tokens=2048,
     )
 
+    # Prompt now includes a chat-history placeholder so prior turns are visible
     prompt = ChatPromptTemplate.from_messages([
         ("system", SYSTEM_PROMPT),
+        MessagesPlaceholder(variable_name="history"),  # ← injected per-request
+        ("human", "{question}"),
     ])
 
     llm_chain = prompt | llm | StrOutputParser()
     return llm_chain, vector_store
 
 
+# ─── Singleton init ───────────────────────────────────────────────────────────
 
+_llm_chain = None
 _vector_store = None
 
 
+# ─── Public API ───────────────────────────────────────────────────────────────
 
+def query_sacred_texts(question: str, session_id: str = "default"):
+    """
+    Stream an answer grounded in the sacred texts, maintaining per-session
+    conversation history for natural follow-up questions.
 
+    Yields JSON-lines of the form:
+        {"type": "token", "data": "<chunk>"}
+        {"type": "sources", "data": [...]}
+        {"type": "cache", "data": {"answer": "...", "sources": [...]}}
     """
     global _llm_chain, _vector_store
 
     if _llm_chain is None:
         print("πŸ”§ Initialising RAG chain (first call)...")
         _llm_chain, _vector_store = build_chain()
 
+    # ── Semantic cache check (skip for follow-ups that reference history) ──
+    history = get_history(session_id)
+    is_followup = len(history) > 0
+
+    if not is_followup:
+        cache_coll = _vector_store._client.get_or_create_collection(CACHE_COLLECTION)
+        cache_results = cache_coll.query(query_texts=[question], n_results=1)
+
+        THRESHOLD = 0.35
+        if cache_results["ids"] and cache_results["ids"][0]:
+            distance = cache_results["distances"][0][0]
+            if distance < THRESHOLD:
+                print(f"⚑️ Semantic Cache Hit! (Distance: {distance:.4f})")
+                cached = json.loads(cache_results["metadatas"][0][0]["response_json"])
+                # Store this cache hit in session memory too
+                append_turn(session_id, question, cached["answer"])
+                yield json.dumps({"type": "cache", "data": cached}) + "\n"
+                return
+
+    # ── Retrieval ──────────────────────────────────────────────────────────
+    # For follow-ups, augment the question with the last human turn for better
+    # semantic search (the follow-up itself may be too short/vague)
+    retrieval_query = question
+    if is_followup and len(question.split()) < 8:
+        last_human = next(
+            (m.content for m in reversed(history) if isinstance(m, HumanMessage)), ""
+        )
+        retrieval_query = f"{last_human} {question}".strip()
+        print(f"πŸ” Follow-up detected β€” augmented retrieval query: '{retrieval_query}'")
+
+    print(f"\nπŸ” Retrieving chunks for: '{retrieval_query}'")
+    source_docs = retrieve_per_book(retrieval_query, _vector_store)
 
     if not source_docs:
         yield json.dumps({"type": "token", "data": "No content found in the knowledge base."}) + "\n"
         return
 
+    # ── Build sources list ─────────────────────────────────────────────────
+    seen_sources: set[str] = set()
     sources = []
     for doc in source_docs:
         book = doc.metadata.get("book", "Unknown")
+        ang = doc.metadata.get("ang")
+        ch = doc.metadata.get("chapter")
+        vs = doc.metadata.get("verse")
 
         if ang:
             cite_val = f"Ang {ang}"
         elif ch and vs:
             cite_val = f"{ch}:{vs}"
         else:
             cite_val = f"p. {doc.metadata.get('page', '?')}"
 
         display_name = f"{book} {cite_val}"
         snippet = doc.page_content[:200].strip() + "..."
         if display_name not in seen_sources:
             seen_sources.add(display_name)
             sources.append({"book": display_name, "page": cite_val, "snippet": snippet})
 
+    context = format_docs(source_docs)
+    full_answer = ""
+
+    # ── Stream LLM response (history injected here) ────────────────────────
+    for chunk in _llm_chain.stream({
+        "context": context,
+        "question": question,
+        "history": history,  # ← the conversation so far
+    }):
         full_answer += chunk
+        yield json.dumps({"type": "token", "data": chunk}) + "\n"
+
+    # ── Filter sources to those actually cited in the answer ───────────────
+    answer_lower = full_answer.lower()
+    final_sources = [s for s in sources if s["book"].lower() in answer_lower] or []
+
+    # ── Persist this turn into session memory ─────────────────────────────
+    append_turn(session_id, question, full_answer)
+    print(f"πŸ’Ύ Session '{session_id}': {len(get_history(session_id)) // 2} turn(s) stored")
+
+    # ── Cache first-turn answers only ─────────────────────────────────────
+    if not is_followup:
+        result_to_cache = {"answer": full_answer, "sources": final_sources}
+        try:
+            cache_coll = _vector_store._client.get_or_create_collection(CACHE_COLLECTION)
+            cache_coll.add(
+                documents=[question],
+                metadatas=[{"response_json": json.dumps(result_to_cache)}],
+                ids=[question],
+            )
+        except Exception as e:
+            print(f"⚠️ Cache write failed: {e}")
+
     yield json.dumps({"type": "sources", "data": sources}) + "\n"
 
 
 # ─── Quick CLI Test ───────────────────────────────────────────────────────────
 
 if __name__ == "__main__":
+    test_q = "What do the scriptures say about forgiveness?"
     print(f"\nπŸ” Test query: {test_q}\n")
+    for line in query_sacred_texts(test_q, session_id="cli-test"):
+        obj = json.loads(line)
+        if obj["type"] == "token":
+            print(obj["data"], end="", flush=True)
+    print("\n")