Shouvik599 committed on
Commit 56da115 · 1 Parent(s): 3611fcd

Added the multi-turn conversation feature

Files changed (4):
  1. README.md +314 -41
  2. app.py +127 -42
  3. frontend/index.html +302 -437
  4. rag_chain.py +178 -198
README.md CHANGED
@@ -1,3 +1,190 @@
 ---
 title: Sacred Texts RAG
 emoji: 🕊️
@@ -7,52 +194,52 @@ sdk: docker
 app_port: 7860
 pinned: false
 ---
-
 # 🕊️ Sacred Texts RAG — Multi-Religion Knowledge Base
-
- A Retrieval-Augmented Generation (RAG) application that answers spiritual queries using Bhagavad Gita, Quran, Bible and the Guru Granth Sahib as the sole knowledge sources.
-
 ---
-
 ## 📁 Project Structure
-
 ```
 sacred-texts-rag/
 ├── README.md
 ├── requirements.txt
 ├── .env.example
 ├── ingest.py        # Step 1: Load PDFs → chunk → embed → store
- ├── rag_chain.py    # Core RAG chain logic
 ├── app.py           # FastAPI backend server
 └── frontend/
-     └── index.html  # Chat UI (open in browser)
 ```
-
 ---
-
 ## ⚙️ Setup Instructions
-
 ### 1. Install Dependencies
 ```bash
 pip install -r requirements.txt
 ```
-
 ### 2. Configure Environment
 ```bash
 cp .env.example .env
 # Edit .env and add your NVIDIA_API_KEY
 ```
-
 ### 3. Add Your PDF Books
 Place your PDF files in a `books/` folder:
 ```
 books/
 ├── bhagavad_gita.pdf
 ├── quran.pdf
- └── bible.pdf
 └── guru_granth_sahib.pdf
 ```
-
 ### 4. Ingest the Books (Run Once)
 ```bash
 python ingest.py
@@ -62,58 +249,144 @@ This will:
 - Split into semantic chunks
 - Create embeddings using NVIDIA's `llama-nemotron-embed-vl-1b-v2` model
 - Store in a local ChromaDB vector store (`./chroma_db/`)
-
 ### 5. Start the Backend
 ```bash
 python app.py
 ```
- Server runs at: `http://localhost:8000`
-
 ### 6. Open the Frontend
- Open `frontend/index.html` in your browser — no server needed for the UI.
-
 ---
-
 ## 🔑 Environment Variables
-
- | Variable | Description |
- |---|---|
- | `NVIDIA_API_KEY` | Your NVIDIA API key |
- | `CHROMA_DB_PATH` | Path to ChromaDB storage (default: `./chroma_db`) |
- | `CHUNKS_PER_BOOK` | Number of chunks to retrieve per query (default: `3`) |
-
 ---
-
 ## 🧠 How It Works
-
 ```
 User Query
   │
  ▼
- [Embedding Model] ←── NVIDIA llama-nemotron-embed-vl-1b-v2
  │
  ▼
- [ChromaDB Vector Store] ←── Semantic similarity search
- │ (retrieves top-K chunks from Gita, Quran, Bible, and the Guru Granth Sahib)
  │
  ▼
- [Prompt with Context]
  │
  ▼
 [Llama-3.3-70b-instruct] ←── Answer grounded ONLY in retrieved texts
  │
  ▼
- Response with source citations (book + chapter/verse)
 ```
-
 ---
-
 ## 📝 Notes
-
 - The LLM is instructed **never** to answer from outside the provided texts
- - Each response includes **source citations** (which book the answer came from)
 - Responses synthesize wisdom **across all books** when relevant
-
 ## 🎬 Demo

- App Link : https://shouvik99-lifeguide.hf.space/
 
+
+ Can you connect to my github account i have added as a connector
+
+ 12:50 AM
+ Connector search
+ 1 connector
+ Microsoft Learn
+ It looks like GitHub isn't showing up as a connected or available connector in the registry right now. A few things to check:
+
+ Make sure it's enabled in the Tools menu — Click the tools/connector icon in the chat interface and confirm GitHub is toggled on for this conversation.
+ Re-authenticate — Sometimes connectors need to be reconnected. Go to your connector settings and try reconnecting your GitHub account.
+ Check the connector was added to the right workspace — If you added it under a different account or session, it may not carry over.
+ Once it's connected and enabled, I'll be able to interact with your GitHub repos, issues, PRs, and more. Let me know if you run into any trouble setting it up!
+
+ Shouvik599/LifeGuide
+ main
+
+ GITHUB
+
+ from the mentioned list of features to add, choose and work on it
+
+ 12:51 AM
+ Looking at your features_to_add.txt, let me pick the best candidate based on impact vs. implementation effort for your current stack.
+
+ The strongest choice is Multi-turn conversation — it directly improves UX, fits naturally into your existing streaming architecture, and the LangChain ConversationBufferMemory pattern slots cleanly into rag_chain.py without touching ingestion or the vector store.
+
+ Let me also check what's in your features list more carefully before diving in.
+
+ The features listed are:
+
+ - Contextual chunk expansion
+ - HyDE (Hypothetical Document Embedding)
+ - Multi-turn conversation ← best ROI right now
+ - Answer faithfulness scoring
+ - Query rewriting
+ - Multi-language support
+ - Incremental ingestion
+ - Book versioning
+ - Snippet preview on hover
+ - Query suggestions
+ - Topic explorer
+ - Compare mode
+ - Hallucination guardrail
+ - Out-of-scope detection
+ - Rate limiting
+ - API key security
+
+ I'll implement Multi-turn conversation — here's what changes:
+
+ - rag_chain.py — session-based ConversationBufferMemory, history injected into the prompt
+ - app.py — session ID via header/cookie, memory store per session
+ - frontend/index.html — passes session_id, shows conversation continuity
+
+ Let me build it:
+
+ Check skills available
+ Now let me write the updated files. Starting with rag_chain.py:
+
+ Created 2 files
+ Now the frontend — this is the most involved change. Adding session persistence, a "New Conversation" button, and follow-up UX cues:
+
+ Created a file, ran a command, read a file
+
+ Updated frontend index.html with multi-turn conversation support
+ index.html
+
+ Copy files to outputs
+
+ Script
+
+ Presented 3 files
+ Done
+ Here's everything that changed and why:
+
+ rag_chain.py
+
+ The core addition is an in-memory _session_store dict mapping session_id → [HumanMessage, AIMessage, ...]. The prompt template now uses MessagesPlaceholder(variable_name="history") so prior turns get injected directly into the LLM's context. A few important nuances: for follow-up questions that are too short or vague (under 8 words, like "elaborate on point 2"), the retrieval query is augmented with the previous human turn so ChromaDB can find relevant chunks. The semantic cache is also bypassed for follow-ups, since the same question in a different context should retrieve fresh context. History is trimmed to the last 6 turns (configurable via MAX_HISTORY_TURNS in .env) to avoid context window bloat.
+
+ app.py
+
+ Two new endpoints: POST /clear to reset a session's history, and GET /history for debugging/inspection. The /ask endpoint now accepts an optional session_id in the request body and also reads/sets it via an httponly cookie as fallback. The session ID is echoed back in an X-Session-Id response header so the frontend can capture it.
+
+ frontend/index.html
+
+ The session ID is persisted to localStorage across page reloads. A session bar appears in the header once a conversation starts, showing the turn count and a "↺ New Conversation" button that calls /clear and resets the UI. User messages get a subtle follow-up pill badge after turn 1, and the textarea placeholder changes to hint that follow-ups work. The session_id is included in every /ask payload.
+
+ Rag chain
+ PY
+ App
+ PY
+ Index
+ Code · HTML
+ create a feature branch and push the changes to that branch
98
+
99
+ 12:57 AM
100
+ Connector search
101
+ 1 connector
102
+ Microsoft Learn
103
+ GitHub isn't available as a connected MCP tool, so I can't push directly. However, I can give you the exact commands to do it yourself β€” it'll take about 30 seconds.
104
+
105
+ bash
106
+ # 1. Navigate to your project root
107
+ cd /path/to/sacred-texts-rag
108
+
109
+ # 2. Make sure you're on main and up to date
110
+ git checkout main
111
+ git pull origin main
112
+
113
+ # 3. Create and switch to the feature branch
114
+ git checkout -b feature/multi-turn-conversation
115
+
116
+ # 4. Copy the updated files in (if you downloaded them)
117
+ # Or just overwrite in-place if you edited directly
118
+
119
+ # 5. Stage the three changed files
120
+ git add rag_chain.py app.py frontend/index.html
121
+
122
+ # 6. Commit
123
+ git commit -m "feat: add multi-turn conversation memory
124
+
125
+ - Add session-based ConversationBufferMemory in rag_chain.py
126
+ - Inject chat history via MessagesPlaceholder into LLM prompt
127
+ - Augment retrieval query for short follow-up questions
128
+ - Bypass semantic cache for follow-up turns
129
+ - Add /clear and /history endpoints in app.py
130
+ - Session ID via cookie + X-Session-Id header + request body
131
+ - Frontend: session bar, turn counter, follow-up pill badge,
132
+ localStorage session persistence, New Conversation button"
133
+
134
+ # 7. Push the branch
135
+ git push -u origin feature/multi-turn-conversation
136
+ Once pushed, GitHub will show a "Compare & pull request" banner on the repo page if you want to open a PR into main.
137
+
138
+ If you want to connect your GitHub account so I can do this directly next time, you can add the GitHub connector from the Tools menu in settings and I'll be able to create branches, commit, and push on your behalf.
139
+
+ Any update for the README file
+
+ 1:05 AM
+ Yes, the README needs updating to reflect the new multi-turn conversation feature, the two new API endpoints, and the new environment variable. Here's the updated file:
+
+ Created a file, ran a command, read a file
+
+ Updated README.md with multi-turn conversation feature docs
+ README.md
+
+ Copy to outputs
+
+ Script
+
+ Presented file
+ Done
+ Here's what was updated vs. the original:
+
+ Added / changed sections:
+
+ - The intro line now mentions multi-turn conversation
+ - MAX_HISTORY_TURNS added to the environment variables table
+ - The "How It Works" diagram now shows the two new steps — Session Memory and Query Augmentation — at the top of the pipeline
+ - A new 💬 Multi-Turn Conversation section explains the follow-up UX with an example dialogue, how session IDs work, and the "New Conversation" button
+ - The API Endpoints table now includes /clear and /history, and has a proper request/response example for /ask showing the NDJSON streaming format including cache hits
+ - The notes section now documents the cache threshold and follow-up query augmentation behaviour
+ - The features list at the bottom is cleaned up and deduplicated against what's already built
+
+ To push this along with the other files:
+
+ ```bash
+ git add README.md
+ git commit -m "docs: update README for multi-turn conversation feature"
+ git push
+ ```
+
+ Readme
+ Document · MD
+
178
+
179
+ You've used 90% of your session limit
180
+ Upgrade
181
+
182
+
183
+
184
+ Claude is AI and can make mistakes. Please double-check responses.
185
+ Readme Β· MD
186
+ Copy
187
+
188
  ---
189
  title: Sacred Texts RAG
190
  emoji: πŸ•ŠοΈ
 
194
  app_port: 7860
195
  pinned: false
196
  ---
197
+
198
  # πŸ•ŠοΈ Sacred Texts RAG β€” Multi-Religion Knowledge Base
199
+
200
+ A Retrieval-Augmented Generation (RAG) application that answers spiritual queries using the Bhagavad Gita, Quran, Bible, and Guru Granth Sahib as the sole knowledge sources. Now with **multi-turn conversation memory** β€” ask follow-up questions naturally, just like a real dialogue.
201
+
202
  ---
203
+
204
  ## πŸ“ Project Structure
205
+
206
  ```
207
  sacred-texts-rag/
208
  β”œβ”€β”€ README.md
209
  β”œβ”€β”€ requirements.txt
210
  β”œβ”€β”€ .env.example
211
  β”œβ”€β”€ ingest.py # Step 1: Load PDFs β†’ chunk β†’ embed β†’ store
212
+ β”œβ”€β”€ rag_chain.py # Core RAG chain logic (with session memory)
213
  β”œβ”€β”€ app.py # FastAPI backend server
214
  └── frontend/
215
+ └── index.html # Chat UI (served by FastAPI)
216
  ```
217
+
218
  ---
219
+
220
  ## βš™οΈ Setup Instructions
221
+
222
  ### 1. Install Dependencies
223
  ```bash
224
  pip install -r requirements.txt
225
  ```
226
+
227
  ### 2. Configure Environment
228
  ```bash
229
  cp .env.example .env
230
  # Edit .env and add your NVIDIA_API_KEY
231
  ```
232
+
233
  ### 3. Add Your PDF Books
234
  Place your PDF files in a `books/` folder:
235
  ```
236
  books/
237
  β”œβ”€β”€ bhagavad_gita.pdf
238
  β”œβ”€β”€ quran.pdf
239
+ β”œβ”€β”€ bible.pdf
240
  └── guru_granth_sahib.pdf
241
  ```
242
+
243
  ### 4. Ingest the Books (Run Once)
244
  ```bash
245
  python ingest.py
 
249
  - Split into semantic chunks
250
  - Create embeddings using NVIDIA's `llama-nemotron-embed-vl-1b-v2` model
251
  - Store in a local ChromaDB vector store (`./chroma_db/`)
252
+
253
  ### 5. Start the Backend
254
  ```bash
255
  python app.py
256
  ```
257
+ Server runs at: `http://localhost:7860`
258
+
259
  ### 6. Open the Frontend
260
+ Navigate to `http://localhost:7860` in your browser β€” the FastAPI server serves the UI directly.
261
+
262
  ---
263
+
264
  ## πŸ”‘ Environment Variables
265
+
266
+ | Variable | Description | Default |
267
+ |---|---|---|
268
+ | `NVIDIA_API_KEY` | Your NVIDIA API key | β€” |
269
+ | `CHROMA_DB_PATH` | Path to ChromaDB storage | `./chroma_db` |
270
+ | `COLLECTION_NAME` | ChromaDB collection name | `sacred_texts` |
271
+ | `CHUNKS_PER_BOOK` | Chunks retrieved per book per query | `3` |
272
+ | `MAX_HISTORY_TURNS` | Max conversation turns kept in memory per session | `6` |
273
+ | `HOST` | Server bind host | `0.0.0.0` |
274
+ | `PORT` | Server port | `7860` |
275
+
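Read through `os.environ`-style lookups, the defaults in the table above correspond to roughly this resolution logic (a sketch only; the committed code may load these settings differently):

```python
def load_config(env: dict) -> dict:
    """Resolve settings from a mapping such as os.environ, applying the
    defaults listed in the table above."""
    return {
        "NVIDIA_API_KEY": env.get("NVIDIA_API_KEY"),  # required, no default
        "CHROMA_DB_PATH": env.get("CHROMA_DB_PATH", "./chroma_db"),
        "COLLECTION_NAME": env.get("COLLECTION_NAME", "sacred_texts"),
        "CHUNKS_PER_BOOK": int(env.get("CHUNKS_PER_BOOK", "3")),
        "MAX_HISTORY_TURNS": int(env.get("MAX_HISTORY_TURNS", "6")),
        "HOST": env.get("HOST", "0.0.0.0"),
        "PORT": int(env.get("PORT", "7860")),
    }
```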
 ---
+
 ## 🧠 How It Works
+
 ```
 User Query
   │
  ▼
+ [Session Memory] ←── Injects prior conversation turns into LLM context
+  │
+  ▼
+ [Query Augmentation] ←── Short follow-ups are enriched with previous question
+  │
+  ▼
+ [Hybrid Retrieval: BM25 + Vector Search] ←── Per-book guaranteed slots
  │
  ▼
+ [NVIDIA Reranker] ←── llama-3.2-nv-rerankqa-1b-v2 re-scores pooled candidates
  │
  ▼
+ [Semantic Cache Check] ←── Skip LLM if a similar question was answered before
+  │
+  ▼
+ [Prompt with Context + History]
  │
  ▼
 [Llama-3.3-70b-instruct] ←── Answer grounded ONLY in retrieved texts
  │
  ▼
+ Streamed response with source citations (book + chapter/verse)
 ```
+
 ---
+
+ ## 💬 Multi-Turn Conversation
+
+ The app maintains per-session conversation history so you can ask natural follow-up questions:
+
+ ```
+ You: "What do the scriptures say about forgiveness?"
+ AI: [Answer citing Gita, Quran, Bible, Guru Granth Sahib]
+
+ You: "Elaborate on the second point"    ← follow-up, no context needed
+ AI: [Continues from previous answer]
+
+ You: "What does the Bible say specifically?"    ← drill-down
+ AI: [Focuses on Bible passages from the thread]
+ ```
+
+ **How sessions work:**
+ - A session ID is created automatically on your first question and persisted in the browser's `localStorage`
+ - The server keeps the last `MAX_HISTORY_TURNS` (default: 6) human+AI pairs in memory
+ - Click **↺ New Conversation** in the header to clear history and start fresh
+ - Sessions are scoped to the server process — they reset on server restart
+
+ ---
+
+ ## 🌐 API Endpoints
+
+ | Method | Endpoint | Description |
+ |---|---|---|
+ | `POST` | `/ask` | Ask a question; streams NDJSON response |
+ | `POST` | `/clear` | Clear conversation history for a session |
+ | `GET` | `/history` | Inspect conversation history for a session |
+ | `GET` | `/books` | List all books indexed in the knowledge base |
+ | `GET` | `/health` | Health check |
+ | `GET` | `/` | Serves the frontend UI |
+ | `GET` | `/docs` | Swagger UI |
+
+ ### `/ask` Request Body
+ ```json
+ {
+   "question": "What do the scriptures say about compassion?",
+   "session_id": "optional-uuid-string"
+ }
+ ```
+
+ ### `/ask` Response (streamed NDJSON)
+ ```json
+ {"type": "token", "data": "The Bhagavad Gita teaches..."}
+ {"type": "token", "data": " compassion as..."}
+ {"type": "sources", "data": [{"book": "Bhagavad Gita 2:47", "page": "2:47", "snippet": "..."}]}
+ ```
+ Cache hits return a single `{"type": "cache", "data": {"answer": "...", "sources": [...]}}` line.
+
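A client can reassemble the stream above line by line; this sketch (not part of the repo, names illustrative) handles both the token/sources shape and the single-line cache-hit shape:

```python
import json

def consume_stream(lines):
    """Assemble a streamed /ask NDJSON response (format shown above)
    into (answer, sources). A cache hit returns immediately."""
    answer_parts, sources = [], []
    for line in lines:
        if not line.strip():
            continue  # ignore keep-alive blank lines
        event = json.loads(line)
        if event["type"] == "token":
            answer_parts.append(event["data"])
        elif event["type"] == "sources":
            sources = event["data"]
        elif event["type"] == "cache":
            return event["data"]["answer"], event["data"]["sources"]
    return "".join(answer_parts), sources
```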
+ ---
+
 ## 📝 Notes
+
 - The LLM is instructed **never** to answer from outside the provided texts
+ - Each response includes **source citations** (book + chapter/verse where available)
 - Responses synthesize wisdom **across all books** when relevant
+ - The semantic cache skips the LLM for repeated or near-identical questions (cosine distance < 0.35)
+ - Follow-up retrieval automatically augments vague short queries with the previous question for better semantic matching
+
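The cache-threshold note can be illustrated with a minimal lookup. This is illustrative only: the function names and the flat-list scan are assumptions; only the 0.35 cosine-distance threshold comes from the note above.

```python
import math

CACHE_DISTANCE_THRESHOLD = 0.35  # threshold stated in the note above

def cosine_distance(a, b):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def cache_lookup(query_embedding, cache):
    """cache: list of (embedding, cached_response) pairs. Returns the
    cached response closest to the query, if within the threshold."""
    best, best_dist = None, CACHE_DISTANCE_THRESHOLD
    for emb, response in cache:
        d = cosine_distance(query_embedding, emb)
        if d < best_dist:
            best, best_dist = response, d
    return best
```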
+ ---
+
+ ## 🗺️ Planned Features
+
+ - Contextual chunk expansion (fetch ±1 surrounding chunks)
+ - HyDE — Hypothetical Document Embedding for abstract queries
+ - Answer faithfulness scoring (LLM-as-judge)
+ - Query rewriting for vague inputs
+ - Snippet preview on source hover
+ - Query suggestions after each answer
+ - Compare mode — side-by-side view across books
+ - Hallucination guardrail
+ - Out-of-scope detection
+ - Rate limiting & API key hardening
+
+ ---
+
 ## 🎬 Demo
+
+ App Link: https://shouvik99-lifeguide.hf.space/
+
 
app.py CHANGED
@@ -2,7 +2,9 @@
 app.py — FastAPI backend server for the Sacred Texts RAG application.

 Endpoints:
-     POST /ask — Ask a question, get an answer with sources
     GET /health — Health check
     GET /books — List books currently in the knowledge base

@@ -11,13 +13,20 @@ Run with:
 """

 import os
- from fastapi import FastAPI, HTTPException
 from fastapi.middleware.cors import CORSMiddleware
 from pydantic import BaseModel, Field
 from dotenv import load_dotenv
- from fastapi.responses import StreamingResponse, FileResponse
- from rag_chain import query_sacred_texts, get_embeddings, get_vector_store  # ← FIXED
- from starlette.concurrency import run_in_threadpool

 load_dotenv()

@@ -26,34 +35,54 @@ load_dotenv()
 app = FastAPI(
     title="Sacred Texts RAG API",
     description="Ask questions answered exclusively from Bhagavad Gita, Quran, Bible, and Guru Granth Sahib",
-     version="1.0.0",
 )

- # Allow requests from the local frontend (index.html opened as file://)
 app.add_middleware(
     CORSMiddleware,
-     allow_origins=["*"],  # Restrict in production
     allow_credentials=True,
     allow_methods=["*"],
     allow_headers=["*"],
 )

 # ─── Request / Response Models ────────────────────────────────────────────────

 class AskRequest(BaseModel):
     question: str = Field(..., min_length=3, max_length=1000,
                           example="What do the scriptures say about compassion?")
-
- class Source(BaseModel):
-     book: str
-     page: int | str
-     snippet: str
-
- class AskResponse(BaseModel):
-     question: str
-     answer: str
-     sources: list[Source]

 class HealthResponse(BaseModel):
     status: str

@@ -63,49 +92,67 @@ class BooksResponse(BaseModel):
     books: list[str]
     total_chunks: int

 # ─── Routes ───────────────────────────────────────────────────────────────────

 @app.get("/health", response_model=HealthResponse, tags=["System"])
 def health_check():
-     """Check that the API is running."""
     return {"status": "ok", "message": "Sacred Texts RAG is running 🕊️"}

 @app.get("/books", response_model=BooksResponse, tags=["Knowledge Base"])
 def list_books():
-     """List all books currently indexed in the knowledge base."""
     try:
-         embeddings = get_embeddings()  # ← FIXED Step 1
-         vector_store = get_vector_store(embeddings)  # ← FIXED Step 2
-         collection = vector_store._collection
-         results = collection.get(include=["metadatas"])
-         metadatas = results.get("metadatas", [])
-
-         books = sorted(set(
-             m.get("book", "Unknown")
-             for m in metadatas
-             if m  # guard against None
-         ))
         return {"books": books, "total_chunks": len(metadatas)}
     except Exception as e:
         raise HTTPException(status_code=500, detail=f"Could not read knowledge base: {e}")

 @app.post("/ask", tags=["Query"])
- async def ask(request: AskRequest):
     """
     Ask a spiritual or philosophical question.
-     The answer is grounded strictly in the sacred texts.
     """
-     if not request.question.strip():
         raise HTTPException(status_code=400, detail="Question cannot be empty.")

     try:
-
         return StreamingResponse(
-             query_sacred_texts(request.question),
-             media_type="application/json"
         )
     except FileNotFoundError:
         raise HTTPException(

@@ -115,26 +162,64 @@ async def ask(request: AskRequest):
     except Exception as e:
         raise HTTPException(status_code=500, detail=str(e))

 @app.get("/", include_in_schema=False)
 async def serve_frontend():
-     """Serves the static frontend HTML file."""
     frontend_path = "frontend/index.html"
     if os.path.exists(frontend_path):
         return FileResponse(frontend_path)
     return {"message": "Sacred Texts RAG API is live. Visit /docs for Swagger UI."}

 # ─── Entry Point ──────────────────────────────────────────────────────────────

 if __name__ == "__main__":
     import uvicorn

-     # HF Spaces uses 7860 by default
     host = os.getenv("HOST", "0.0.0.0")
-     port = int(os.getenv("PORT", "7860"))

-     print(f"\n🕊️ Sacred Texts RAG — API Server")
     print(f"{'─' * 40}")
     print(f"🌐 Running at : http://{host}:{port}")
     print(f"{'─' * 40}\n")

-     uvicorn.run("app:app", host=host, port=port, reload=False)  # reload=False for production
 app.py — FastAPI backend server for the Sacred Texts RAG application.

 Endpoints:
+     POST /ask — Ask a question, get a streamed answer with sources
+     POST /clear — Clear conversation history for a session
+     GET /history — Retrieve conversation history for a session
     GET /health — Health check
     GET /books — List books currently in the knowledge base

 """

 import os
+ import uuid
+ from fastapi import FastAPI, HTTPException, Request, Response
 from fastapi.middleware.cors import CORSMiddleware
 from pydantic import BaseModel, Field
 from dotenv import load_dotenv
+ from fastapi.responses import StreamingResponse, FileResponse, JSONResponse
+ from rag_chain import (
+     query_sacred_texts,
+     get_embeddings,
+     get_vector_store,
+     clear_session,
+     get_history,
+ )
+ from langchain_core.messages import HumanMessage, AIMessage

 load_dotenv()

 app = FastAPI(
     title="Sacred Texts RAG API",
     description="Ask questions answered exclusively from Bhagavad Gita, Quran, Bible, and Guru Granth Sahib",
+     version="2.0.0",
 )

 app.add_middleware(
     CORSMiddleware,
+     allow_origins=["*"],
     allow_credentials=True,
     allow_methods=["*"],
     allow_headers=["*"],
+     expose_headers=["X-Session-Id"],
 )

+ SESSION_COOKIE = "rag_session_id"
+
+
+ # ─── Helpers ─────────────────────────────────────────────────────────────────
+
+ def get_or_create_session(request: Request, response: Response) -> str:
+     """
+     Read the session ID from the cookie (or X-Session-Id header).
+     If absent, generate a new one and set it on the response cookie.
+     """
+     session_id = (
+         request.cookies.get(SESSION_COOKIE)
+         or request.headers.get("X-Session-Id")
+     )
+     if not session_id:
+         session_id = str(uuid.uuid4())
+         response.set_cookie(
+             key=SESSION_COOKIE,
+             value=session_id,
+             httponly=True,
+             samesite="lax",
+             max_age=60 * 60 * 24,  # 24 hours
+         )
+     return session_id
+

 # ─── Request / Response Models ────────────────────────────────────────────────

 class AskRequest(BaseModel):
     question: str = Field(..., min_length=3, max_length=1000,
                           example="What do the scriptures say about compassion?")
+     session_id: str | None = Field(
+         default=None,
+         description="Optional session ID for multi-turn conversations. "
+                     "If omitted, the server reads/creates one via cookie.",
+     )

 class HealthResponse(BaseModel):
     status: str

     books: list[str]
     total_chunks: int

+ class ClearRequest(BaseModel):
+     session_id: str | None = None
+
+ class HistoryItem(BaseModel):
+     role: str  # "human" | "ai"
+     content: str
+
+ class HistoryResponse(BaseModel):
+     session_id: str
+     turns: int
+     messages: list[HistoryItem]
+

 # ─── Routes ───────────────────────────────────────────────────────────────────

 @app.get("/health", response_model=HealthResponse, tags=["System"])
 def health_check():
     return {"status": "ok", "message": "Sacred Texts RAG is running 🕊️"}

 @app.get("/books", response_model=BooksResponse, tags=["Knowledge Base"])
 def list_books():
     try:
+         embeddings = get_embeddings()
+         vector_store = get_vector_store(embeddings)
+         collection = vector_store._collection
+         results = collection.get(include=["metadatas"])
+         metadatas = results.get("metadatas", [])
+         books = sorted(set(m.get("book", "Unknown") for m in metadatas if m))
         return {"books": books, "total_chunks": len(metadatas)}
     except Exception as e:
         raise HTTPException(status_code=500, detail=f"Could not read knowledge base: {e}")

 @app.post("/ask", tags=["Query"])
+ async def ask(request_body: AskRequest, request: Request, response: Response):
     """
     Ask a spiritual or philosophical question.
+     Streams the answer as NDJSON (one JSON object per line).
+     Maintains per-session conversation history automatically via cookie or
+     the `session_id` field in the request body.
     """
+     if not request_body.question.strip():
         raise HTTPException(status_code=400, detail="Question cannot be empty.")

+     # Resolve session: body field > cookie/header > new
+     if request_body.session_id:
+         session_id = request_body.session_id
+     else:
+         session_id = get_or_create_session(request, response)
+
     try:
+         stream = query_sacred_texts(request_body.question, session_id=session_id)
+
+         # We need to forward the session_id so the frontend can persist it
+         headers = {"X-Session-Id": session_id}
+
         return StreamingResponse(
+             stream,
+             media_type="application/x-ndjson",
+             headers=headers,
         )
     except FileNotFoundError:
         raise HTTPException(

     except Exception as e:
         raise HTTPException(status_code=500, detail=str(e))

+
+ @app.post("/clear", tags=["Session"])
+ async def clear_conversation(body: ClearRequest, request: Request, response: Response):
+     """
+     Clear the conversation history for the given session.
+     If session_id is omitted, clears the session identified by cookie.
+     """
+     session_id = body.session_id or request.cookies.get(SESSION_COOKIE)
+     if not session_id:
+         raise HTTPException(status_code=400, detail="No session to clear.")
+     clear_session(session_id)
+     return {"status": "cleared", "session_id": session_id}
+
+
+ @app.get("/history", response_model=HistoryResponse, tags=["Session"])
+ async def conversation_history(session_id: str | None = None, request: Request = None):
+     """
+     Return the conversation history for a session (for debugging / display).
+     """
+     sid = session_id or (request.cookies.get(SESSION_COOKIE) if request else None)
+     if not sid:
+         raise HTTPException(status_code=400, detail="Provide session_id query param or cookie.")
+
+     messages = get_history(sid)
+     items = []
+     for msg in messages:
+         if isinstance(msg, HumanMessage):
+             items.append(HistoryItem(role="human", content=msg.content))
+         elif isinstance(msg, AIMessage):
+             items.append(HistoryItem(role="ai", content=msg.content))
+
+     return HistoryResponse(
+         session_id=sid,
+         turns=len(items) // 2,
+         messages=items,
+     )
+
+
 @app.get("/", include_in_schema=False)
 async def serve_frontend():
     frontend_path = "frontend/index.html"
     if os.path.exists(frontend_path):
         return FileResponse(frontend_path)
     return {"message": "Sacred Texts RAG API is live. Visit /docs for Swagger UI."}

+
 # ─── Entry Point ──────────────────────────────────────────────────────────────

 if __name__ == "__main__":
     import uvicorn

     host = os.getenv("HOST", "0.0.0.0")
+     port = int(os.getenv("PORT", "7860"))

+     print(f"\n🕊️ Sacred Texts RAG — API Server v2.0")
     print(f"{'─' * 40}")
     print(f"🌐 Running at : http://{host}:{port}")
+     print(f"🧠 Multi-turn conversation: ENABLED")
     print(f"{'─' * 40}\n")

+     uvicorn.run("app:app", host=host, port=port, reload=False)
frontend/index.html CHANGED
@@ -13,13 +13,7 @@
 <style>
 /* ── Reset & Base ─────────────────────────────────────────── */
- *,
- *::before,
- *::after {
-     box-sizing: border-box;
-     margin: 0;
-     padding: 0;
- }

 :root {
     --bg: #0d0b07;
@@ -32,68 +26,13 @@
     --cream: #f0e6cc;
     --muted: #7a6a4a;
     --gita: #e07b3b;
-     /* saffron */
     --quran: #3bba85;
-     /* green */
     --bible: #5b8ce0;
-     /* blue */
     --granth: #b07ce0;
-     /* violet — Sikh royal purple */
- }
-
- /* Animated Thinking state for streaming */
- .thinking-dots {
-     display: inline-flex;
-     gap: 4px;
-     margin-left: 4px;
- }
-
- .thinking-dots span {
-     width: 4px;
-     height: 4px;
-     background: var(--gold);
-     border-radius: 50%;
-     animation: bounce 1.4s infinite ease-in-out;
- }
-
- @keyframes bounce {
-     0%,
-     80%,
-     100% {
-         transform: scale(0);
-     }
-     40% {
-         transform: scale(1);
-     }
 }

- /* Make streaming text fade in slightly for smoothness */
- #currentStreamingMsg p {
-     animation: fadeIn 0.3s ease-in;
- }
-
- @keyframes fadeIn {
-     from {
-         opacity: 0.7;
-     }
-     to {
-         opacity: 1;
-     }
- }
-
- /* Ensure the bubble has a minimum height so it doesn't look like a "small block" */
- .msg-bubble:empty::before {
-     content: "Writing wisdom...";
-     color: var(--muted);
-     font-style: italic;
-     font-size: 0.9rem;
- }
-
- html,
- body {
     height: 100%;
     background: var(--bg);
     color: var(--cream);
@@ -103,15 +42,14 @@
     overflow: hidden;
 }

- /* ── Background texture ───────────────────────────────────── */
 body::before {
     content: '';
     position: fixed;
     inset: 0;
     background:
-         radial-gradient(ellipse 80% 60% at 20% 10%, rgba(201, 153, 58, .07) 0%, transparent 60%),
-         radial-gradient(ellipse 60% 80% at 80% 90%, rgba(91, 140, 224, .05) 0%, transparent 60%),
-         radial-gradient(ellipse 50% 50% at 50% 50%, rgba(176, 124, 224, .04) 0%, transparent 60%),
         url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='400' height='400'%3E%3Cfilter id='n'%3E%3CfeTurbulence type='fractalNoise' baseFrequency='0.75' numOctaves='4' stitchTiles='stitch'/%3E%3CfeColorMatrix type='saturate' values='0'/%3E%3C/filter%3E%3Crect width='400' height='400' filter='url(%23n)' opacity='0.04'/%3E%3C/svg%3E");
     pointer-events: none;
     z-index: 0;
@@ -131,106 +69,121 @@
 /* ── Header ───────────────────────────────────────────────── */
 header {
-     padding: 28px 0 18px;
     text-align: center;
     border-bottom: 1px solid var(--border);
 }

 .mandala {
-     font-size: 2rem;
     letter-spacing: .5rem;
     color: var(--gold);
     opacity: .6;
-     margin-bottom: 8px;
     animation: spin 60s linear infinite;
146
  display: inline-block;
147
  }
148
-
149
- @keyframes spin {
150
- to {
151
- transform: rotate(360deg);
152
- }
153
- }
154
 
155
  h1 {
156
  font-family: 'Cinzel Decorative', serif;
157
- font-size: clamp(1.2rem, 3vw, 1.9rem);
158
  font-weight: 400;
159
  color: var(--gold-pale);
160
  letter-spacing: .12em;
161
- text-shadow: 0 0 40px rgba(201, 153, 58, .3);
162
  }
163
 
164
  .subtitle {
165
  font-family: 'IM Fell English', serif;
166
  font-style: italic;
167
- font-size: .95rem;
168
  color: var(--muted);
169
- margin-top: 4px;
170
  }
171
 
172
  .badges {
173
  display: flex;
174
  justify-content: center;
175
- gap: 12px;
176
- margin-top: 12px;
177
  flex-wrap: wrap;
178
  }
179
 
180
  .badge {
181
- font-size: .72rem;
182
  letter-spacing: .1em;
183
  text-transform: uppercase;
184
- padding: 3px 10px;
185
  border-radius: 20px;
186
  border: 1px solid;
187
  font-family: 'Cormorant Garamond', serif;
188
  font-weight: 600;
189
  }
 
 
 
 
190
 
191
- .badge-gita {
192
- color: var(--gita);
193
- border-color: var(--gita);
194
- background: rgba(224, 123, 59, .1)
195
  }
196
 
197
- .badge-quran {
198
- color: var(--quran);
199
- border-color: var(--quran);
200
- background: rgba(59, 186, 133, .1);
 
201
  }
202
 
203
- .badge-bible {
204
- color: var(--bible);
205
- border-color: var(--bible);
206
- background: rgba(91, 140, 224, .1);
207
  }
208
 
209
- .badge-granth {
210
- color: var(--granth);
211
- border-color: var(--granth);
212
- background: rgba(176, 124, 224, .1)
213
  }
214
 
215
  /* ── Chat Window ──────────────────────────────────────────── */
216
  .chat-window {
217
  overflow-y: auto;
218
- padding: 28px 0;
219
  display: flex;
220
  flex-direction: column;
221
  gap: 24px;
222
  scrollbar-width: thin;
223
  scrollbar-color: var(--border) transparent;
224
  }
225
-
226
- .chat-window::-webkit-scrollbar {
227
- width: 4px;
228
- }
229
-
230
- .chat-window::-webkit-scrollbar-thumb {
231
- background: var(--border);
232
- border-radius: 4px;
233
- }
234
 
235
  /* ── Welcome State ────────────────────────────────────────── */
236
  .welcome {
@@ -239,84 +192,46 @@
239
  padding: 20px;
240
  max-width: 500px;
241
  }
242
-
243
- .welcome-icon {
244
- font-size: 3.5rem;
245
- margin-bottom: 16px;
246
- filter: drop-shadow(0 0 20px rgba(201, 153, 58, .4));
247
- }
248
-
249
  .welcome h2 {
250
  font-family: 'IM Fell English', serif;
251
  font-style: italic;
252
- font-size: 1.5rem;
253
  color: var(--gold-light);
254
- margin-bottom: 10px;
255
- }
256
-
257
- .welcome p {
258
- font-size: .95rem;
259
- color: var(--muted);
260
- line-height: 1.8;
261
- }
262
-
263
- .suggested-queries {
264
- margin-top: 24px;
265
- display: flex;
266
- flex-direction: column;
267
- gap: 8px;
268
  }
 
269
 
 
270
  .suggested-queries button {
271
  background: var(--surface);
272
  border: 1px solid var(--border);
273
  color: var(--cream);
274
- padding: 10px 16px;
275
  border-radius: 8px;
276
  font-family: 'Cormorant Garamond', serif;
277
- font-size: .95rem;
278
  font-style: italic;
279
  cursor: pointer;
280
  transition: all .2s;
281
  text-align: left;
282
  }
283
-
284
- .suggested-queries button:hover {
285
- border-color: var(--gold);
286
- color: var(--gold-pale);
287
- background: var(--surface-2);
288
- }
289
 
290
  /* ── Messages ─────────────────────────────────────────────── */
291
  .message {
292
  display: flex;
293
  flex-direction: column;
294
- gap: 8px;
295
  animation: fadeUp .4s ease both;
296
  }
 
297
 
298
- @keyframes fadeUp {
299
- from {
300
- opacity: 0;
301
- transform: translateY(12px);
302
- }
303
-
304
- to {
305
- opacity: 1;
306
- transform: translateY(0);
307
- }
308
- }
309
-
310
- .message-user {
311
- align-items: flex-end;
312
- }
313
-
314
- .message-assistant {
315
- align-items: flex-start;
316
- }
317
 
318
  .msg-label {
319
- font-size: .7rem;
320
  letter-spacing: .15em;
321
  text-transform: uppercase;
322
  color: var(--muted);
@@ -326,7 +241,7 @@
326
 
327
  .msg-bubble {
328
  max-width: 92%;
329
- padding: 16px 20px;
330
  border-radius: 12px;
331
  line-height: 1.75;
332
  }
@@ -336,40 +251,40 @@
336
  border: 1px solid var(--border);
337
  color: var(--cream);
338
  font-style: italic;
339
- font-size: 1rem;
340
  border-bottom-right-radius: 4px;
341
  }
342
 
343
  .message-assistant .msg-bubble {
344
- background: linear-gradient(135deg, var(--surface) 0%, rgba(30, 26, 17, .95) 100%);
345
- border: 1px solid rgba(201, 153, 58, .2);
346
  color: var(--cream);
347
- font-size: 1rem;
348
  border-bottom-left-radius: 4px;
349
- box-shadow: 0 4px 24px rgba(0, 0, 0, .4), inset 0 1px 0 rgba(201, 153, 58, .1);
350
- }
351
-
352
- .msg-bubble p {
353
- margin-bottom: 1em;
354
  }
355
 
356
- .msg-bubble p:last-child {
357
- margin-bottom: 0;
358
- }
359
 
360
- .msg-bubble strong {
361
- color: var(--gold-light);
362
- font-weight: 600;
 
 
 
 
 
 
 
 
363
  }
364
 
365
  /* ── Sources Panel ────────────────────────────────────────── */
366
- .sources {
367
- max-width: 92%;
368
- margin-top: 4px;
369
- }
370
-
371
  .sources-label {
372
- font-size: .72rem;
373
  letter-spacing: .12em;
374
  text-transform: uppercase;
375
  color: var(--muted);
@@ -378,27 +293,12 @@
378
  align-items: center;
379
  gap: 6px;
380
  }
 
 
381
 
382
- .sources-label::before,
383
- .sources-label::after {
384
- content: '';
385
- flex: 1;
386
- height: 1px;
387
- background: var(--border);
388
- }
389
-
390
- .sources-label::before {
391
- max-width: 20px;
392
- }
393
-
394
- .source-tags {
395
- display: flex;
396
- flex-wrap: wrap;
397
- gap: 6px;
398
- }
399
-
400
  .source-tag {
401
- font-size: .78rem;
402
  padding: 4px 10px;
403
  border-radius: 6px;
404
  border: 1px solid;
@@ -406,101 +306,55 @@
406
  cursor: default;
407
  transition: all .2s;
408
  }
 
409
 
410
- .source-tag:hover {
411
- transform: translateY(-1px);
412
- filter: brightness(1.2);
413
- }
414
-
415
- .source-gita {
416
- color: var(--gita);
417
- border-color: rgba(224, 123, 59, .4);
418
- background: rgba(224, 123, 59, .08);
419
- }
420
-
421
- .source-quran {
422
- color: var(--quran);
423
- border-color: rgba(59, 186, 133, .4);
424
- background: rgba(59, 186, 133, .08);
425
- }
426
-
427
- .source-bible {
428
- color: var(--bible);
429
- border-color: rgba(91, 140, 224, .4);
430
- background: rgba(91, 140, 224, .08);
431
- }
432
-
433
- .source-granth {
434
- color: var(--granth);
435
- border-color: rgba(176, 124, 224, .4);
436
- background: rgba(176, 124, 224, .08);
437
- }
438
-
439
- .source-other {
440
- color: var(--gold-light);
441
- border-color: rgba(201, 153, 58, .4);
442
- background: rgba(201, 153, 58, .08);
443
- }
444
 
445
  /* ── Loading ──────────────────────────────────────────────── */
446
  .loading {
447
  display: flex;
448
  align-items: center;
449
- gap: 12px;
450
- padding: 14px 18px;
451
- border: 1px solid rgba(201, 153, 58, .15);
452
  border-radius: 12px;
453
  background: var(--surface);
454
  width: fit-content;
455
  max-width: 280px;
456
  }
457
-
458
- .loading-dots {
459
- display: flex;
460
- gap: 5px;
461
- }
462
-
463
  .loading-dots span {
464
- width: 6px;
465
- height: 6px;
466
  border-radius: 50%;
467
  background: var(--gold);
468
  animation: dot-pulse 1.4s ease-in-out infinite;
469
  }
470
-
471
- .loading-dots span:nth-child(2) {
472
- animation-delay: .2s;
473
- }
474
-
475
- .loading-dots span:nth-child(3) {
476
- animation-delay: .4s;
477
- }
478
-
479
  @keyframes dot-pulse {
480
-
481
- 0%,
482
- 80%,
483
- 100% {
484
- opacity: .2;
485
- transform: scale(.8);
486
- }
487
-
488
- 40% {
489
- opacity: 1;
490
- transform: scale(1.1);
491
- }
492
  }
 
493
 
494
- .loading-text {
495
- font-size: .85rem;
496
- font-style: italic;
497
- color: var(--muted);
 
 
 
498
  }
 
499
 
500
  /* ── Error ────────────────────────────────────────────────── */
501
  .error-bubble {
502
- background: rgba(180, 60, 60, .1);
503
- border: 1px solid rgba(180, 60, 60, .3);
504
  color: #e08080;
505
  padding: 12px 16px;
506
  border-radius: 10px;
@@ -509,52 +363,38 @@
509
  }
510
 
511
  /* ── Input Area ───────────────────────────────────────────── */
512
- .input-area {
513
- padding: 16px 0 24px;
514
- border-top: 1px solid var(--border);
515
- }
516
-
517
- .input-row {
518
- display: flex;
519
- gap: 10px;
520
- align-items: flex-end;
521
- }
522
 
523
  textarea {
524
  flex: 1;
525
  background: var(--surface);
526
  border: 1px solid var(--border);
527
  color: var(--cream);
528
- padding: 14px 16px;
529
  border-radius: 12px;
530
  font-family: 'Cormorant Garamond', serif;
531
- font-size: 1rem;
532
  line-height: 1.6;
533
  resize: none;
534
- min-height: 52px;
535
- max-height: 140px;
536
  outline: none;
537
  transition: border-color .2s, box-shadow .2s;
538
  }
539
-
540
- textarea::placeholder {
541
- color: var(--muted);
542
- font-style: italic;
543
- }
544
-
545
  textarea:focus {
546
- border-color: rgba(201, 153, 58, .5);
547
- box-shadow: 0 0 0 3px rgba(201, 153, 58, .08);
548
  }
549
 
550
  .send-btn {
551
- width: 52px;
552
- height: 52px;
553
  border-radius: 12px;
554
- border: 1px solid rgba(201, 153, 58, .4);
555
- background: linear-gradient(135deg, rgba(201, 153, 58, .2), rgba(201, 153, 58, .05));
556
  color: var(--gold);
557
- font-size: 1.3rem;
558
  cursor: pointer;
559
  transition: all .2s;
560
  display: flex;
@@ -562,36 +402,15 @@
562
  justify-content: center;
563
  flex-shrink: 0;
564
  }
565
-
566
  .send-btn:hover:not(:disabled) {
567
- background: linear-gradient(135deg, rgba(201, 153, 58, .35), rgba(201, 153, 58, .15));
568
  border-color: var(--gold);
569
  transform: translateY(-1px);
570
- box-shadow: 0 4px 16px rgba(201, 153, 58, .2);
571
  }
 
572
 
573
- .send-btn:disabled {
574
- opacity: .3;
575
- cursor: not-allowed;
576
- transform: none;
577
- }
578
-
579
- .input-hint {
580
- font-size: .72rem;
581
- color: var(--muted);
582
- margin-top: 8px;
583
- text-align: center;
584
- font-style: italic;
585
- }
586
-
587
- /* ── Divider line ─────────────────────────────────────────── */
588
- .ornament {
589
- text-align: center;
590
- color: var(--border);
591
- font-size: .8rem;
592
- letter-spacing: .4em;
593
- margin: 4px 0;
594
- }
595
  </style>
596
  </head>
597
 
@@ -609,6 +428,16 @@
609
  <span class="badge badge-bible">Bible</span>
610
  <span class="badge badge-granth">Guru Granth Sahib</span>
611
  </div>
612
  </header>
613
 
614
  <!-- Chat Window -->
@@ -616,15 +445,18 @@
616
  <div class="welcome" id="welcomePane">
617
  <div class="welcome-icon">πŸ•ŠοΈ</div>
618
  <h2>"Seek, and it shall be given unto you"</h2>
619
- <p>Ask any spiritual or philosophical question. Answers are drawn exclusively from the Bhagavad Gita, Quran,
620
- Bible, and Guru Granth Sahib.</p>
 
 
 
 
621
  <div class="suggested-queries">
622
  <button onclick="askSuggested(this)">What do the scriptures say about forgiveness?</button>
623
  <button onclick="askSuggested(this)">How should one face fear and death?</button>
624
  <button onclick="askSuggested(this)">What is the purpose of prayer and worship?</button>
625
  <button onclick="askSuggested(this)">What is the nature of the soul according to each religion?</button>
626
- <button onclick="askSuggested(this)">What do the scriptures teach about humility and selfless
627
- service?</button>
628
  </div>
629
  </div>
630
  </div>
@@ -632,26 +464,88 @@
632
  <!-- Input -->
633
  <div class="input-area">
634
  <div class="input-row">
635
- <textarea id="questionInput" placeholder="Ask a question from the sacred texts…" rows="1"
636
- onkeydown="handleKey(event)" oninput="autoResize(this)"></textarea>
637
- <button class="send-btn" id="sendBtn" onclick="sendQuestion()" title="Ask (Enter)">
638
- ✦
639
- </button>
 
640
  </div>
641
- <p class="input-hint">Press Enter to ask Β· Shift+Enter for new line Β· Answers grounded strictly in the sacred
642
- texts</p>
643
  </div>
644
 
645
  </div>
646
 
647
  <script>
648
  const API_BASE = window.location.origin;
649
- let isLoading = false;
650
 
651
- // ── Helpers ────────────────────────────────────────────────
652
  function getSourceClass(book) {
653
  const b = book.toLowerCase();
654
- if (b.includes("gita")) return "source-gita";
655
  if (b.includes("quran") || b.includes("koran")) return "source-quran";
656
  if (b.includes("bible") || b.includes("testament")) return "source-bible";
657
  if (b.includes("granth") || b.includes("guru")) return "source-granth";
@@ -670,23 +564,28 @@
670
 
671
  function autoResize(el) {
672
  el.style.height = "auto";
673
- el.style.height = Math.min(el.scrollHeight, 140) + "px";
674
  }
675
 
676
  function formatAnswer(text) {
677
- // Convert markdown-ish bold (**text**) to <strong>
678
  text = text.replace(/\*\*(.*?)\*\*/g, "<strong>$1</strong>");
679
- // Wrap paragraphs
680
  return text.split(/\n\n+/).filter(p => p.trim()).map(p => `<p>${p.trim()}</p>`).join("");
681
  }
682
 
683
- // ── Append message to chat ─────────────────────────────────
684
- function appendUserMessage(question) {
 
 
 
 
685
  const w = document.getElementById("chatWindow");
686
  const div = document.createElement("div");
687
  div.className = "message message-user";
 
 
 
688
  div.innerHTML = `
689
- <span class="msg-label">You</span>
690
  <div class="msg-bubble">${escapeHtml(question)}</div>
691
  `;
692
  w.appendChild(div);
@@ -710,63 +609,46 @@
710
  return div;
711
  }
712
 
713
- function replaceLoadingWithAnswer(loadingEl, data) {
714
- const w = document.getElementById("chatWindow");
715
-
716
- // Build source tags
717
- const sourceTags = (data.sources || []).map(s => {
718
  const cls = getSourceClass(s.book);
719
- return `<span class="source-tag ${cls}" title="Page ${s.page}">πŸ“– ${s.book}</span>`;
720
  }).join("");
721
-
722
- const sourcesHtml = sourceTags ? `
723
- <div class="sources">
724
- <div class="sources-label">References</div>
725
- <div class="source-tags">${sourceTags}</div>
726
- </div>
727
- ` : "";
728
-
729
- loadingEl.innerHTML = `
730
- <span class="msg-label">Sacred Texts</span>
731
- <div class="msg-bubble">${formatAnswer(data.answer)}</div>
732
- ${sourcesHtml}
733
- `;
734
- scrollToBottom();
735
- }
736
-
737
- function replaceLoadingWithError(loadingEl, msg) {
738
- loadingEl.innerHTML = `
739
- <span class="msg-label">Error</span>
740
- <div class="error-bubble">⚠️ ${escapeHtml(msg)}</div>
741
- `;
742
- scrollToBottom();
743
- }
744
-
745
- function escapeHtml(str) {
746
- return str.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
747
  }
748
 
749
- // ── Send question ──────────────────────────────────────────
750
  async function sendQuestion() {
751
  if (isLoading) return;
752
- const input = document.getElementById("questionInput");
753
  const question = input.value.trim();
754
  if (!question) return;
755
 
756
  hideWelcome();
 
 
757
  isLoading = true;
758
  document.getElementById("sendBtn").disabled = true;
759
  input.value = "";
760
  input.style.height = "auto";
761
 
762
- appendUserMessage(question);
763
  const loadingEl = appendLoading();
764
 
765
  try {
 
 
 
766
  const res = await fetch(`${API_BASE}/ask`, {
767
- method: "POST",
768
  headers: { "Content-Type": "application/json" },
769
- body: JSON.stringify({ question }),
770
  });
771
 
772
  if (!res.ok) {
@@ -774,36 +656,36 @@
774
  throw new Error(err.detail || "Server error");
775
  }
776
 
777
- // Initialize variables to build the UI
778
- const reader = res.body.getReader();
779
- const decoder = new TextDecoder();
780
- let fullAnswer = "";
781
- let buffer = "";
782
 
783
- // Prepare the assistant UI bubble immediately
784
  loadingEl.innerHTML = `
785
- <span class="msg-label">Sacred Texts</span>
786
- <div class="msg-bubble" id="currentStreamingMsg">
787
- <div class="loading-text">The scriptures are being revealed<span class="thinking-dots"><span></span><span></span><span></span></span></div>
788
- </div>
789
- <div id="currentStreamingSources"></div>
790
- `;
791
- const bubble = document.getElementById("currentStreamingMsg");
 
 
792
  const sourcesContainer = document.getElementById("currentStreamingSources");
793
- let firstTokenReceived = false;
 
 
 
 
 
794
 
795
  while (true) {
796
  const { done, value } = await reader.read();
797
  if (done) break;
798
 
799
- // Append new data to the buffer
800
  buffer += decoder.decode(value, { stream: true });
801
-
802
- // Split by newline
803
  const lines = buffer.split("\n");
804
-
805
- buffer = lines.pop();
806
-
807
 
808
  for (const line of lines) {
809
  if (!line.trim()) continue;
@@ -811,20 +693,13 @@
811
  const parsed = JSON.parse(line);
812
 
813
  if (parsed.type === "token") {
814
- //Remove the loading text as soon as the first word arrives
815
- if (!firstTokenReceived) {
816
- bubble.innerHTML = "";
817
- firstTokenReceived = true;
818
- }
819
-
820
  fullAnswer += parsed.data;
821
- // Dynamically update the bubble with formatted markdown/paragraphs
822
  bubble.innerHTML = formatAnswer(fullAnswer);
823
  scrollToBottom();
824
  }
825
  else if (parsed.type === "sources") {
826
- sourcesData = parsed.data;
827
- renderSourcesInPlace(sourcesContainer, sourcesData);
828
  }
829
  else if (parsed.type === "cache") {
830
  bubble.innerHTML = formatAnswer(parsed.data.answer);
@@ -832,18 +707,24 @@
832
  scrollToBottom();
833
  }
834
  } catch (e) {
835
- console.error("Stream parsing error", e);
836
  }
837
  }
838
  }
839
 
840
- // Clean up IDs once done so next messages don't conflict
 
 
 
 
841
  bubble.removeAttribute("id");
842
  sourcesContainer.removeAttribute("id");
843
 
844
  } catch (err) {
845
- let msg = err.message;
846
- replaceLoadingWithError(loadingEl, msg);
 
 
847
  } finally {
848
  isLoading = false;
849
  document.getElementById("sendBtn").disabled = false;
@@ -851,27 +732,9 @@
851
  }
852
  }
853
 
854
- // Helper to render sources inside the streaming flow
855
- function renderSourcesInPlace(container, sources) {
856
- const sourceTags = (sources || []).map(s => {
857
- const cls = getSourceClass(s.book);
858
- // Use verse citations as the primary text
859
- return `<span class="source-tag ${cls}" title="${s.snippet}">πŸ“– ${s.book}</span>`;
860
- }).join("");
861
-
862
- if (sourceTags) {
863
- container.innerHTML = `
864
- <div class="sources">
865
- <div class="sources-label">Citations</div>
866
- <div class="source-tags">${sourceTags}</div>
867
- </div>
868
- `;
869
- }
870
- }
871
-
872
  function askSuggested(btn) {
873
  const input = document.getElementById("questionInput");
874
- input.value = btn.textContent;
875
  autoResize(input);
876
  sendQuestion();
877
  }
@@ -882,7 +745,9 @@
882
  sendQuestion();
883
  }
884
  }
 
 
 
885
  </script>
886
  </body>
887
-
888
  </html>
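The streaming loop in the script above splits each decoded chunk on newlines, parses every complete line as a JSON event (`token`, `sources`, or `cache`), and holds the partial trailing line back via `buffer = lines.pop()`. The same buffering strategy sketched in Python; the chunk boundaries below are hypothetical, chosen to fall mid-line as they can over a real network:

```python
import json

def parse_ndjson_chunks(chunks):
    """Accumulate chunks, emit complete JSON lines, and hold back the
    partial trailing line, mirroring `buffer = lines.pop()` in the JS."""
    buffer = ""
    events = []
    for chunk in chunks:
        buffer += chunk
        lines = buffer.split("\n")
        buffer = lines.pop()          # last piece may be incomplete
        for line in lines:
            if line.strip():
                events.append(json.loads(line))
    return events

# Chunk boundaries fall mid-line, as they can over a network:
chunks = ['{"type": "token", "da', 'ta": "Seek"}\n{"type": "sour',
          'ces", "data": []}\n']
events = parse_ndjson_chunks(chunks)
print([e["type"] for e in events])  # ['token', 'sources']
```

Without the `lines.pop()` hold-back, a token split across two chunks would raise a JSON parse error instead of being reassembled.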
 
13
 
14
  <style>
15
  /* ── Reset & Base ─────────────────────────────────────────── */
16
+ *, *::before, *::after { box-sizing: border-box; margin: 0; padding: 0; }
 
 
 
 
 
 
17
 
18
  :root {
19
  --bg: #0d0b07;
 
26
  --cream: #f0e6cc;
27
  --muted: #7a6a4a;
28
  --gita: #e07b3b;
 
29
  --quran: #3bba85;
 
30
  --bible: #5b8ce0;
 
31
  --granth: #b07ce0;
32
+ --danger: #e06060;
33
  }
34
 
35
+ html, body {
36
  height: 100%;
37
  background: var(--bg);
38
  color: var(--cream);
 
42
  overflow: hidden;
43
  }
44
 
 
45
  body::before {
46
  content: '';
47
  position: fixed;
48
  inset: 0;
49
  background:
50
+ radial-gradient(ellipse 80% 60% at 20% 10%, rgba(201,153,58,.07) 0%, transparent 60%),
51
+ radial-gradient(ellipse 60% 80% at 80% 90%, rgba(91,140,224,.05) 0%, transparent 60%),
52
+ radial-gradient(ellipse 50% 50% at 50% 50%, rgba(176,124,224,.04) 0%, transparent 60%),
53
  url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='400' height='400'%3E%3Cfilter id='n'%3E%3CfeTurbulence type='fractalNoise' baseFrequency='0.75' numOctaves='4' stitchTiles='stitch'/%3E%3CfeColorMatrix type='saturate' values='0'/%3E%3C/filter%3E%3Crect width='400' height='400' filter='url(%23n)' opacity='0.04'/%3E%3C/svg%3E");
54
  pointer-events: none;
55
  z-index: 0;
 
69
 
70
  /* ── Header ───────────────────────────────────────────────── */
71
  header {
72
+ padding: 20px 0 14px;
73
  text-align: center;
74
  border-bottom: 1px solid var(--border);
75
+ position: relative;
76
  }
77
 
78
  .mandala {
79
+ font-size: 1.8rem;
80
  letter-spacing: .5rem;
81
  color: var(--gold);
82
  opacity: .6;
83
+ margin-bottom: 6px;
84
  animation: spin 60s linear infinite;
85
  display: inline-block;
86
  }
87
+ @keyframes spin { to { transform: rotate(360deg); } }
 
 
 
 
 
88
 
89
  h1 {
90
  font-family: 'Cinzel Decorative', serif;
91
+ font-size: clamp(1.1rem, 3vw, 1.7rem);
92
  font-weight: 400;
93
  color: var(--gold-pale);
94
  letter-spacing: .12em;
95
+ text-shadow: 0 0 40px rgba(201,153,58,.3);
96
  }
97
 
98
  .subtitle {
99
  font-family: 'IM Fell English', serif;
100
  font-style: italic;
101
+ font-size: .9rem;
102
  color: var(--muted);
103
+ margin-top: 3px;
104
  }
105
 
106
  .badges {
107
  display: flex;
108
  justify-content: center;
109
+ gap: 10px;
110
+ margin-top: 10px;
111
  flex-wrap: wrap;
112
  }
113
 
114
  .badge {
115
+ font-size: .7rem;
116
  letter-spacing: .1em;
117
  text-transform: uppercase;
118
+ padding: 2px 9px;
119
  border-radius: 20px;
120
  border: 1px solid;
121
  font-family: 'Cormorant Garamond', serif;
122
  font-weight: 600;
123
  }
124
+ .badge-gita { color: var(--gita); border-color: var(--gita); background: rgba(224,123,59,.1); }
125
+ .badge-quran { color: var(--quran); border-color: var(--quran); background: rgba(59,186,133,.1); }
126
+ .badge-bible { color: var(--bible); border-color: var(--bible); background: rgba(91,140,224,.1); }
127
+ .badge-granth { color: var(--granth); border-color: var(--granth); background: rgba(176,124,224,.1); }
128
 
129
+ /* ── Session bar ──────────────────────────────────────────── */
130
+ .session-bar {
131
+ display: none; /* hidden until a conversation starts */
132
+ align-items: center;
133
+ justify-content: space-between;
134
+ gap: 8px;
135
+ margin-top: 10px;
136
+ padding: 5px 10px;
137
+ border: 1px solid var(--border);
138
+ border-radius: 8px;
139
+ background: var(--surface);
140
+ font-size: .75rem;
141
+ color: var(--muted);
142
  }
143
 
144
+ .session-bar.visible { display: flex; }
145
+
146
+ .session-turn-count {
147
+ font-family: 'Cormorant Garamond', serif;
148
+ font-style: italic;
149
  }
150
 
151
+ .session-turn-count span {
152
+ color: var(--gold-light);
153
+ font-weight: 600;
 
154
  }
155
 
156
+ .new-convo-btn {
157
+ display: flex;
158
+ align-items: center;
159
+ gap: 5px;
160
+ background: none;
161
+ border: 1px solid var(--border);
162
+ color: var(--muted);
163
+ padding: 3px 10px;
164
+ border-radius: 6px;
165
+ font-family: 'Cormorant Garamond', serif;
166
+ font-size: .75rem;
167
+ cursor: pointer;
168
+ transition: all .2s;
169
+ }
170
+ .new-convo-btn:hover {
171
+ border-color: var(--danger);
172
+ color: var(--danger);
173
  }
174
 
175
  /* ── Chat Window ──────────────────────────────────────────── */
176
  .chat-window {
177
  overflow-y: auto;
178
+ padding: 24px 0;
179
  display: flex;
180
  flex-direction: column;
181
  gap: 24px;
182
  scrollbar-width: thin;
183
  scrollbar-color: var(--border) transparent;
184
  }
185
+ .chat-window::-webkit-scrollbar { width: 4px; }
186
+ .chat-window::-webkit-scrollbar-thumb { background: var(--border); border-radius: 4px; }
 
 
 
 
 
 
 
187
 
188
  /* ── Welcome State ────────────────────────────────────────── */
189
  .welcome {
 
192
  padding: 20px;
193
  max-width: 500px;
194
  }
195
+ .welcome-icon { font-size: 3.2rem; margin-bottom: 14px; filter: drop-shadow(0 0 20px rgba(201,153,58,.4)); }
 
 
 
 
 
 
196
  .welcome h2 {
197
  font-family: 'IM Fell English', serif;
198
  font-style: italic;
199
+ font-size: 1.4rem;
200
  color: var(--gold-light);
201
+ margin-bottom: 8px;
202
  }
203
+ .welcome p { font-size: .92rem; color: var(--muted); line-height: 1.8; }
204
 
205
+ .suggested-queries { margin-top: 20px; display: flex; flex-direction: column; gap: 7px; }
206
  .suggested-queries button {
207
  background: var(--surface);
208
  border: 1px solid var(--border);
209
  color: var(--cream);
210
+ padding: 9px 14px;
211
  border-radius: 8px;
212
  font-family: 'Cormorant Garamond', serif;
213
+ font-size: .92rem;
214
  font-style: italic;
215
  cursor: pointer;
216
  transition: all .2s;
217
  text-align: left;
218
  }
219
+ .suggested-queries button:hover { border-color: var(--gold); color: var(--gold-pale); background: var(--surface-2); }
 
 
 
 
 
220
 
221
  /* ── Messages ─────────────────────────────────────────────── */
222
  .message {
223
  display: flex;
224
  flex-direction: column;
225
+ gap: 6px;
226
  animation: fadeUp .4s ease both;
227
  }
228
+ @keyframes fadeUp { from { opacity: 0; transform: translateY(10px); } to { opacity: 1; transform: translateY(0); } }
229
 
230
+ .message-user { align-items: flex-end; }
231
+ .message-assistant { align-items: flex-start; }
232
 
233
  .msg-label {
234
+ font-size: .68rem;
235
  letter-spacing: .15em;
236
  text-transform: uppercase;
237
  color: var(--muted);
 
241
 
242
  .msg-bubble {
243
  max-width: 92%;
244
+ padding: 14px 18px;
245
  border-radius: 12px;
246
  line-height: 1.75;
247
  }
 
251
  border: 1px solid var(--border);
252
  color: var(--cream);
253
  font-style: italic;
254
+ font-size: .97rem;
255
  border-bottom-right-radius: 4px;
256
  }
257
 
258
  .message-assistant .msg-bubble {
259
+ background: linear-gradient(135deg, var(--surface) 0%, rgba(30,26,17,.95) 100%);
260
+ border: 1px solid rgba(201,153,58,.2);
261
  color: var(--cream);
262
+ font-size: .97rem;
263
  border-bottom-left-radius: 4px;
264
+ box-shadow: 0 4px 24px rgba(0,0,0,.4), inset 0 1px 0 rgba(201,153,58,.1);
 
 
 
 
265
  }
266
 
267
+ .msg-bubble p { margin-bottom: 1em; }
268
+ .msg-bubble p:last-child { margin-bottom: 0; }
269
+ .msg-bubble strong { color: var(--gold-light); font-weight: 600; }
270
 
271
+ /* Follow-up continuation pill */
272
+ .followup-pill {
273
+ font-size: .68rem;
274
+ padding: 2px 8px;
275
+ border-radius: 10px;
276
+ background: rgba(201,153,58,.08);
277
+ border: 1px solid rgba(201,153,58,.2);
278
+ color: var(--muted);
279
+ margin-left: 6px;
280
+ font-style: italic;
281
+ vertical-align: middle;
282
  }
283
 
284
  /* ── Sources Panel ────────────────────────────────────────── */
285
+ .sources { max-width: 92%; margin-top: 4px; }
 
 
 
 
286
  .sources-label {
287
+ font-size: .7rem;
288
  letter-spacing: .12em;
289
  text-transform: uppercase;
290
  color: var(--muted);
 
293
  align-items: center;
294
  gap: 6px;
295
  }
296
+ .sources-label::before, .sources-label::after { content: ''; flex: 1; height: 1px; background: var(--border); }
297
+ .sources-label::before { max-width: 20px; }
298
 
299
+ .source-tags { display: flex; flex-wrap: wrap; gap: 6px; }
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
300
  .source-tag {
301
+ font-size: .76rem;
302
  padding: 4px 10px;
303
  border-radius: 6px;
304
  border: 1px solid;
 
306
  cursor: default;
307
  transition: all .2s;
308
  }
309
+ .source-tag:hover { transform: translateY(-1px); filter: brightness(1.2); }
310
 
311
+ .source-gita { color: var(--gita); border-color: rgba(224,123,59,.4); background: rgba(224,123,59,.08); }
312
+ .source-quran { color: var(--quran); border-color: rgba(59,186,133,.4); background: rgba(59,186,133,.08); }
313
+ .source-bible { color: var(--bible); border-color: rgba(91,140,224,.4); background: rgba(91,140,224,.08); }
314
+ .source-granth { color: var(--granth); border-color: rgba(176,124,224,.4); background: rgba(176,124,224,.08); }
315
+ .source-other { color: var(--gold-light); border-color: rgba(201,153,58,.4); background: rgba(201,153,58,.08); }
316
 
317
  /* ── Loading ──────────────────────────────────────────────── */
318
  .loading {
319
  display: flex;
320
  align-items: center;
321
+ gap: 10px;
322
+ padding: 12px 16px;
323
+ border: 1px solid rgba(201,153,58,.15);
324
  border-radius: 12px;
325
  background: var(--surface);
326
  width: fit-content;
327
  max-width: 280px;
328
  }
329
+ .loading-dots { display: flex; gap: 5px; }
 
 
 
 
 
330
  .loading-dots span {
331
+ width: 6px; height: 6px;
 
332
  border-radius: 50%;
333
  background: var(--gold);
334
  animation: dot-pulse 1.4s ease-in-out infinite;
335
  }
336
+ .loading-dots span:nth-child(2) { animation-delay: .2s; }
337
+ .loading-dots span:nth-child(3) { animation-delay: .4s; }
 
 
 
 
 
 
 
338
  @keyframes dot-pulse {
339
+ 0%,80%,100% { opacity: .2; transform: scale(.8); }
340
+ 40% { opacity: 1; transform: scale(1.1); }
 
 
 
 
 
 
 
 
 
 
341
  }
342
+ .loading-text { font-size: .82rem; font-style: italic; color: var(--muted); }
343
 
344
+ /* ── Thinking dots (streaming) ────────────────────────────── */
345
+ .thinking-dots { display: inline-flex; gap: 4px; margin-left: 4px; }
346
+ .thinking-dots span {
347
+ width: 4px; height: 4px;
348
+ background: var(--gold);
349
+ border-radius: 50%;
350
+ animation: bounce 1.4s infinite ease-in-out;
351
  }
352
+ @keyframes bounce { 0%,80%,100% { transform: scale(0); } 40% { transform: scale(1); } }
353
 
354
  /* ── Error ────────────────────────────────────────────────── */
355
  .error-bubble {
356
+ background: rgba(180,60,60,.1);
357
+ border: 1px solid rgba(180,60,60,.3);
358
  color: #e08080;
359
  padding: 12px 16px;
360
  border-radius: 10px;
 
363
  }
364
 
365
  /* ── Input Area ───────────────────────────────────────────── */
366
+ .input-area { padding: 14px 0 22px; border-top: 1px solid var(--border); }
367
+ .input-row { display: flex; gap: 10px; align-items: flex-end; }
 
 
 
 
 
 
 
 
368
 
369
  textarea {
370
  flex: 1;
371
  background: var(--surface);
372
  border: 1px solid var(--border);
373
  color: var(--cream);
374
+ padding: 13px 15px;
375
  border-radius: 12px;
376
  font-family: 'Cormorant Garamond', serif;
377
+ font-size: .97rem;
378
  line-height: 1.6;
379
  resize: none;
380
+ min-height: 50px;
381
+ max-height: 130px;
382
  outline: none;
383
  transition: border-color .2s, box-shadow .2s;
384
  }
385
+ textarea::placeholder { color: var(--muted); font-style: italic; }
 
 
 
 
 
386
  textarea:focus {
387
+ border-color: rgba(201,153,58,.5);
388
+ box-shadow: 0 0 0 3px rgba(201,153,58,.08);
389
  }
390
 
391
  .send-btn {
392
+ width: 50px; height: 50px;
 
393
  border-radius: 12px;
394
+ border: 1px solid rgba(201,153,58,.4);
395
+ background: linear-gradient(135deg, rgba(201,153,58,.2), rgba(201,153,58,.05));
396
  color: var(--gold);
397
+ font-size: 1.25rem;
398
  cursor: pointer;
399
  transition: all .2s;
400
  display: flex;
 
402
  justify-content: center;
403
  flex-shrink: 0;
404
  }
 
405
  .send-btn:hover:not(:disabled) {
406
+ background: linear-gradient(135deg, rgba(201,153,58,.35), rgba(201,153,58,.15));
407
  border-color: var(--gold);
408
  transform: translateY(-1px);
409
+ box-shadow: 0 4px 16px rgba(201,153,58,.2);
410
  }
411
+ .send-btn:disabled { opacity: .3; cursor: not-allowed; transform: none; }
412
 
413
+ .input-hint { font-size: .7rem; color: var(--muted); margin-top: 7px; text-align: center; font-style: italic; }
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
414
  </style>
415
  </head>
416
 
 
  <span class="badge badge-bible">Bible</span>
  <span class="badge badge-granth">Guru Granth Sahib</span>
  </div>
+
+ <!-- Session status bar — visible once conversation starts -->
+ <div class="session-bar" id="sessionBar">
+   <span class="session-turn-count" id="turnCountLabel">
+     Turn <span id="turnCount">0</span>
+   </span>
+   <button class="new-convo-btn" onclick="startNewConversation()" title="Clear history and start fresh">
+     ↺ New Conversation
+   </button>
+ </div>
  </header>

  <!-- Chat Window -->
  <div class="welcome" id="welcomePane">
    <div class="welcome-icon">🕊️</div>
    <h2>"Seek, and it shall be given unto you"</h2>
+   <p>Ask any spiritual or philosophical question. Answers are drawn exclusively from the
+     Bhagavad Gita, Quran, Bible, and Guru Granth Sahib.<br><br>
+     <em style="color:var(--gold-light); font-size:.9rem;">
+       You can now ask follow-up questions — the guide remembers the conversation.
+     </em>
+   </p>
    <div class="suggested-queries">
      <button onclick="askSuggested(this)">What do the scriptures say about forgiveness?</button>
      <button onclick="askSuggested(this)">How should one face fear and death?</button>
      <button onclick="askSuggested(this)">What is the purpose of prayer and worship?</button>
      <button onclick="askSuggested(this)">What is the nature of the soul according to each religion?</button>
+     <button onclick="askSuggested(this)">What do the scriptures teach about humility and selfless service?</button>
    </div>
  </div>
  </div>

  <!-- Input -->
  <div class="input-area">
    <div class="input-row">
+     <textarea id="questionInput"
+       placeholder="Ask a question, or follow up on the previous answer…"
+       rows="1"
+       onkeydown="handleKey(event)"
+       oninput="autoResize(this)"></textarea>
+     <button class="send-btn" id="sendBtn" onclick="sendQuestion()" title="Ask (Enter)">✦</button>
    </div>
+   <p class="input-hint">Enter to ask · Shift+Enter for new line · Follow-ups like "elaborate on point 2" work!</p>
  </div>

  </div>

  <script>
  const API_BASE = window.location.origin;
+ let isLoading = false;
+ let sessionId = null;   // persisted across the page session
+ let turnCount = 0;      // how many full turns this session
+
+ // ── Session helpers ────────────────────────────────────────
+ function loadSession() {
+   sessionId = localStorage.getItem("rag_session_id") || null;
+ }
+
+ function saveSession(id) {
+   sessionId = id;
+   localStorage.setItem("rag_session_id", id);
+ }
+
+ function updateSessionBar() {
+   const bar = document.getElementById("sessionBar");
+   const count = document.getElementById("turnCount");
+   if (turnCount > 0) {
+     bar.classList.add("visible");
+     count.textContent = turnCount;
+   } else {
+     bar.classList.remove("visible");
+   }
+ }
+
+ async function startNewConversation() {
+   if (!sessionId) return;
+   if (turnCount > 0 && !confirm("Start a new conversation? This will clear all history.")) return;
+
+   try {
+     await fetch(`${API_BASE}/clear`, {
+       method: "POST",
+       headers: { "Content-Type": "application/json" },
+       body: JSON.stringify({ session_id: sessionId }),
+     });
+   } catch (_) {}
+
+   // Reset everything
+   sessionId = null;
+   turnCount = 0;
+   localStorage.removeItem("rag_session_id");
+   updateSessionBar();
+
+   const chatWindow = document.getElementById("chatWindow");
+   chatWindow.innerHTML = `
+     <div class="welcome" id="welcomePane">
+       <div class="welcome-icon">🕊️</div>
+       <h2>"Seek, and it shall be given unto you"</h2>
+       <p>Ask any spiritual or philosophical question. Answers are drawn exclusively from the
+         Bhagavad Gita, Quran, Bible, and Guru Granth Sahib.<br><br>
+         <em style="color:var(--gold-light); font-size:.9rem;">
+           You can now ask follow-up questions — the guide remembers the conversation.
+         </em>
+       </p>
+       <div class="suggested-queries">
+         <button onclick="askSuggested(this)">What do the scriptures say about forgiveness?</button>
+         <button onclick="askSuggested(this)">How should one face fear and death?</button>
+         <button onclick="askSuggested(this)">What is the purpose of prayer and worship?</button>
+         <button onclick="askSuggested(this)">What is the nature of the soul according to each religion?</button>
+         <button onclick="askSuggested(this)">What do the scriptures teach about humility and selfless service?</button>
+       </div>
+     </div>`;
+ }
+
+ // ── DOM Helpers ────────────────────────────────────────────
  function getSourceClass(book) {
    const b = book.toLowerCase();
+   if (b.includes("gita")) return "source-gita";
    if (b.includes("quran") || b.includes("koran")) return "source-quran";
    if (b.includes("bible") || b.includes("testament")) return "source-bible";
    if (b.includes("granth") || b.includes("guru")) return "source-granth";

  function autoResize(el) {
    el.style.height = "auto";
+   el.style.height = Math.min(el.scrollHeight, 130) + "px";
  }

  function formatAnswer(text) {
    text = text.replace(/\*\*(.*?)\*\*/g, "<strong>$1</strong>");
    return text.split(/\n\n+/).filter(p => p.trim()).map(p => `<p>${p.trim()}</p>`).join("");
  }

+ function escapeHtml(str) {
+   return str.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
+ }
+
+ // ── Message rendering ──────────────────────────────────────
+ function appendUserMessage(question, isFollowup) {
    const w = document.getElementById("chatWindow");
    const div = document.createElement("div");
    div.className = "message message-user";
+   const pill = isFollowup
+     ? `<span class="followup-pill">follow-up</span>`
+     : "";
    div.innerHTML = `
+     <span class="msg-label">You${pill}</span>
      <div class="msg-bubble">${escapeHtml(question)}</div>
    `;
    w.appendChild(div);

    return div;
  }

+ function renderSourcesInPlace(container, sources) {
+   const sourceTags = (sources || []).map(s => {
      const cls = getSourceClass(s.book);
+     return `<span class="source-tag ${cls}" title="${escapeHtml(s.snippet || '')}">📖 ${escapeHtml(s.book)}</span>`;
    }).join("");
+   if (sourceTags) {
+     container.innerHTML = `
+       <div class="sources">
+         <div class="sources-label">Citations</div>
+         <div class="source-tags">${sourceTags}</div>
+       </div>`;
+   }
  }

+ // ── Core send flow ─────────────────────────────────────────
  async function sendQuestion() {
    if (isLoading) return;
+   const input = document.getElementById("questionInput");
    const question = input.value.trim();
    if (!question) return;

    hideWelcome();
+   const isFollowup = turnCount > 0;
+
    isLoading = true;
    document.getElementById("sendBtn").disabled = true;
    input.value = "";
    input.style.height = "auto";

+   appendUserMessage(question, isFollowup);
    const loadingEl = appendLoading();

    try {
+     const payload = { question };
+     if (sessionId) payload.session_id = sessionId;
+
      const res = await fetch(`${API_BASE}/ask`, {
+       method: "POST",
        headers: { "Content-Type": "application/json" },
+       body: JSON.stringify(payload),
      });

      if (!res.ok) {
        throw new Error(err.detail || "Server error");
      }

+     // Capture session ID returned by the server
+     const returnedSession = res.headers.get("X-Session-Id");
+     if (returnedSession) saveSession(returnedSession);

+     // Set up streaming bubble
      loadingEl.innerHTML = `
+       <span class="msg-label">Sacred Texts</span>
+       <div class="msg-bubble" id="currentStreamingMsg">
+         <div class="loading-text">The scriptures are being revealed
+           <span class="thinking-dots"><span></span><span></span><span></span></span>
+         </div>
+       </div>
+       <div id="currentStreamingSources"></div>`;
+
+     const bubble = document.getElementById("currentStreamingMsg");
      const sourcesContainer = document.getElementById("currentStreamingSources");
+     let fullAnswer = "";
+     let buffer = "";
+     let firstToken = false;
+
+     const reader = res.body.getReader();
+     const decoder = new TextDecoder();

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split("\n");
+       buffer = lines.pop(); // keep incomplete line in buffer

        for (const line of lines) {
          if (!line.trim()) continue;
          const parsed = JSON.parse(line);

          if (parsed.type === "token") {
+           if (!firstToken) { bubble.innerHTML = ""; firstToken = true; }
            fullAnswer += parsed.data;
            bubble.innerHTML = formatAnswer(fullAnswer);
            scrollToBottom();
          }
          else if (parsed.type === "sources") {
+           renderSourcesInPlace(sourcesContainer, parsed.data);
          }
          else if (parsed.type === "cache") {
            bubble.innerHTML = formatAnswer(parsed.data.answer);
            scrollToBottom();
          }
        } catch (e) {
+         console.warn("Stream parse error:", e);
        }
      }
    }

+   // Increment turn counter
+   turnCount++;
+   updateSessionBar();
+
+   // Clean up streaming IDs
    bubble.removeAttribute("id");
    sourcesContainer.removeAttribute("id");

  } catch (err) {
+   loadingEl.innerHTML = `
+     <span class="msg-label">Error</span>
+     <div class="error-bubble">⚠️ ${escapeHtml(err.message)}</div>`;
+   scrollToBottom();
  } finally {
    isLoading = false;
    document.getElementById("sendBtn").disabled = false;
  }
  }

  function askSuggested(btn) {
    const input = document.getElementById("questionInput");
+   input.value = btn.textContent.trim();
    autoResize(input);
    sendQuestion();
  }

    sendQuestion();
  }
  }
+
+ // ── Init ───────────────────────────────────────────────────
+ loadSession();
  </script>
  </body>
  </html>
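The `<script>` above consumes the backend's response as newline-delimited JSON events of type `token`, `sources`, or `cache`, buffering any incomplete trailing line between chunks. The same accumulation logic can be sketched in Python against an illustrative event list (the `stream` data below is made up for the example, not real backend output):

```python
import json

# Hypothetical NDJSON chunks in the shape the backend yields, one event per line.
stream = [
    '{"type": "token", "data": "Forgiveness "}\n',
    '{"type": "token", "data": "is central."}\n',
    '{"type": "sources", "data": [{"book": "Bible 6:14", "page": "6:14"}]}\n',
]

answer, sources = "", []
buffer = ""
for chunk in stream:          # in the browser this comes from res.body.getReader()
    buffer += chunk
    lines = buffer.split("\n")
    buffer = lines.pop()      # keep any incomplete trailing line for the next chunk
    for line in lines:
        if not line.strip():
            continue
        event = json.loads(line)
        if event["type"] == "token":
            answer += event["data"]          # append streamed answer text
        elif event["type"] == "sources":
            sources = event["data"]          # final citation list
        elif event["type"] == "cache":
            answer = event["data"]["answer"]  # full cached answer in one event
            sources = event["data"]["sources"]

print(answer)  # Forgiveness is central.
```

Splitting on `\n` and popping the last element is what makes the parser safe against events that arrive split across network chunks.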
rag_chain.py CHANGED
@@ -1,43 +1,38 @@
  """
- rag_chain.py — Core RAG chain using LangChain + Gemini.
-
- KEY FIX: Uses per-book retrieval (guaranteed slots per scripture) instead of
- a single similarity search — so no book gets starved from the context window
- when the query is semantically closer to another book's language.
-
- This module exposes a single function:
-     answer = query_sacred_texts(user_question)
-
- Returns a dict with:
-     {
-       "answer": "...",
-       "sources": [
-           {"book": "Bhagavad Gita", "page": 42, "snippet": "..."},
-           ...
-       ]
-     }
  """

  import os
- from pydoc import doc
  from dotenv import load_dotenv
  from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings, ChatNVIDIA, NVIDIARerank
  from langchain_chroma import Chroma
- from langchain_core.prompts import ChatPromptTemplate
  from langchain_core.output_parsers import StrOutputParser
  from langchain_community.retrievers import BM25Retriever
  from langchain_classic.retrievers import EnsembleRetriever, ContextualCompressionRetriever
- load_dotenv()
- import json

- NVIDIA_API_KEY = os.getenv("NVIDIA_API_KEY")
- CHROMA_DB_PATH = os.getenv("CHROMA_DB_PATH", "./chroma_db")
- COLLECTION_NAME = os.getenv("COLLECTION_NAME", "sacred_texts")

- # Chunks retrieved PER BOOK — guarantees every scripture contributes to the answer
- CHUNKS_PER_BOOK = int(os.getenv("CHUNKS_PER_BOOK", "3"))

- # All books currently in the knowledge base — add new books here as you ingest them
  KNOWN_BOOKS = [
      "Bhagavad Gita",
      "Quran",
@@ -45,8 +40,32 @@ KNOWN_BOOKS = [
      "Guru Granth Sahib",
  ]

- # Create a separate collection for semantic cache
- CACHE_COLLECTION = "semantic_cache"

  # ─── System Prompt ────────────────────────────────────────────────────────────

@@ -62,6 +81,10 @@ STRICT RULES you must ALWAYS follow:
     address EACH of those books separately, then synthesise the common thread.
  6. Be respectful and neutral toward all faiths — treat each text with equal reverence.
  7. Do NOT speculate, invent verses, or add information beyond the context.

  FORMAT your response as:
  - A clear, thoughtful answer (2–4 paragraphs)
@@ -73,8 +96,6 @@ Context passages from the sacred texts (guaranteed passages from each book):
  ────────────────────────────────────────
  """

- HUMAN_PROMPT = "Question: {question}"
-

  # ─── Embeddings & Vector Store ────────────────────────────────────────────────

@@ -94,42 +115,17 @@ def get_vector_store(embeddings):
      )


- # ─── Per-Book Retrieval ───────────────────────────────────────────────────────
-
- def get_reranked_retriever(base_retriever):
-     """
-     Wraps your Hybrid/Per-Book retriever with a Reranking layer.
-     """
-     # 1. Initialize the NVIDIA Reranker (NIM or API Catalog)
-     # Using nvidia/llama-3.2-nv-rerankqa-1b-v2 or similar
-     reranker = NVIDIARerank(
-         model="nvidia/llama-3.2-nv-rerankqa-1b-v2",
-         api_key=NVIDIA_API_KEY,
-         top_n=5  # Only send the top 5 most relevant chunks to the LLM
-     )
-
-     # 2. Wrap the base retriever
-     compression_retriever = ContextualCompressionRetriever(
-         base_compressor=reranker,
-         base_retriever=base_retriever
-     )
-
-     return compression_retriever

  def retrieve_per_book(question: str, vector_store: Chroma) -> list:
      """
-     Retrieve CHUNKS_PER_BOOK chunks from EACH known book independently,
-     using a metadata filter. This guarantees every scripture is represented
-     in the context — no book can be crowded out by higher-scoring chunks
-     from another book.
      """
      all_candidates = []
-
-     # Detect if user is asking about a specific book
-     target_books = []
      question_lower = question.lower()
-
-     # Check for keywords in the question
      if any(kw in question_lower for kw in ["gita", "bhagavad", "hindu", "hinduism"]):
          target_books.append("Bhagavad Gita")
      if any(kw in question_lower for kw in ["quran", "koran", "islam", "muslim", "muhammad"]):
@@ -138,63 +134,52 @@ def retrieve_per_book(question: str, vector_store: Chroma) -> list:
          target_books.append("Bible")
      if any(kw in question_lower for kw in ["granth", "guru", "sikh", "sikhism", "nanak"]):
          target_books.append("Guru Granth Sahib")
-
-     # If no specific book is detected, use all books
      books_to_search = target_books if target_books else KNOWN_BOOKS
-
      print(f"🎯 Routing query to: {books_to_search}")
-
      for book in books_to_search:
          try:
-             # Increase k for the base retrieval to 10
-             CANDIDATE_COUNT = 10
-
-             # Get the full collection of documents for this book to build BM25
-             # For small demo, we can pull into memory; for larger corpora, consider a more efficient approach
              book_data = vector_store.get(where={"book": book})
-             book_docs = []
-             from langchain_core.documents import Document
-             book_docs = [Document(page_content=d, metadata=m)
-                          for d, m in zip(book_data["documents"], book_data["metadatas"])]
              if not book_docs:
                  continue
-
-             # Setup BM25
              bm25_retriever = BM25Retriever.from_documents(book_docs)
              bm25_retriever.k = CANDIDATE_COUNT
-
-             # Setup vector retriever
-             vector_retriever = vector_store.as_retriever(search_kwargs={"k": CANDIDATE_COUNT, "filter": {"book": book}})
-
-             # Combine into ensemble retriever
-             ensemble_retriver = EnsembleRetriever(retrievers=[bm25_retriever, vector_retriever], weights=[0.5, 0.5])
-
-             # Colect candidates without reranking yet
-             book_candidates = ensemble_retriver.invoke(question)
              all_candidates.extend(book_candidates)
-             print(f"  📦 {book}: Found {len(book_candidates)} candidates")
-
          except Exception as e:
              print(f"  ❌ {book}: retrieval error — {e}")
-
-     # Rerank the entire pool at once
      if not all_candidates:
          return []
-
      print(f"🚀 Reranking {len(all_candidates)} total candidates...")
      reranker = NVIDIARerank(
-         model="nvidia/llama-3.2-nv-rerankqa-1b-v2",
          api_key=NVIDIA_API_KEY,
-         top_n=5  # Final count for LLM context
      )
-
-     # Use the reranker directly to compress the full list
-     final_docs = reranker.compress_documents(all_candidates, question)
-
      for i, doc in enumerate(final_docs):
          score = doc.metadata.get("relevance_score", "N/A")
          print(f"Rank {i+1} [{doc.metadata['book']}]: Score {score}")
@@ -205,11 +190,6 @@ def retrieve_per_book(question: str, vector_store: Chroma) -> list:
  # ─── Format Retrieved Docs ────────────────────────────────────────────────────

  def format_docs(docs: list) -> str:
-     """
-     Format retrieved documents grouped by book for clarity.
-     Each chunk is labelled with book and page number.
-     """
-     # Group by book to keep context readable
      by_book: dict[str, list] = {}
      for doc in docs:
          book = doc.metadata.get("book", "Unknown")
@@ -220,19 +200,16 @@ def format_docs(docs: list) -> str:
          header = f"═══ {book} ═══"
          chunks = []
          for i, doc in enumerate(book_docs, 1):
-             page = doc.metadata.get("page", "?")
-             ch = doc.metadata.get("chapter")
-             vs = doc.metadata.get("verse")
              ang = doc.metadata.get("ang")
-
-             # Create a clean citation string
              if ang:
                  citation = f"Ang {ang}"
              elif ch and vs:
                  citation = f"{ch}:{vs}"
              else:
                  citation = f"Page {doc.metadata.get('page', '?')}"
-             chunks.append(f"  [{i}] ({citation}): {doc.page_content.strip()}")
          sections.append(header + "\n" + "\n\n".join(chunks))

      return "\n\n".join(sections)
@@ -241,8 +218,7 @@ def format_docs(docs: list) -> str:
  # ─── Build the RAG Chain ──────────────────────────────────────────────────────

  def build_chain():
-     """Build and return the LLM chain and vector store."""
-     embeddings = get_embeddings()
      vector_store = get_vector_store(embeddings)

      llm = ChatNVIDIA(
@@ -253,137 +229,141 @@ def build_chain():
          max_output_tokens=2048,
      )

      prompt = ChatPromptTemplate.from_messages([
          ("system", SYSTEM_PROMPT),
-         ("human", HUMAN_PROMPT),
      ])

-     # Chain: prompt → LLM → string output
-     # (retrieval is handled manually in query_sacred_texts for per-book control)
      llm_chain = prompt | llm | StrOutputParser()
-
      return llm_chain, vector_store


- # ─── Public API ───────────────────────────────────────────────────────────────

- _llm_chain = None
  _vector_store = None


- def query_sacred_texts(question: str):
-     """
-     Query the sacred texts knowledge base with guaranteed per-book retrieval.

-     Args:
-         question: The user's spiritual/philosophical question.

-     Returns:
-         {
-           "answer": str,
-           "sources": list[dict]   # [{book, page, snippet}, ...]
-         }
      """
      global _llm_chain, _vector_store

      if _llm_chain is None:
          print("🔧 Initialising RAG chain (first call)...")
          _llm_chain, _vector_store = build_chain()
-
-     # --- Semantic cache check ---
-     cache_coll = _vector_store._client.get_or_create_collection(CACHE_COLLECTION)
-     cache_results = cache_coll.query(
-         query_texts=[question],
-         n_results=1
-     )

-     THRESHOLD = 0.35
-     # FIXED: Added check for cache_results['ids'] and ensuring distances is not empty
-     if cache_results['ids'] and cache_results['ids'][0]:
-         distance = cache_results['distances'][0][0]
-         if distance < THRESHOLD:  # Similarity threshold
-             print(f"⚡️ Semantic Cache Hit! (Distance: {distance:.4f})")
-             yield json.dumps({"type": "cache","data": json.loads(cache_results['metadatas'][0][0]['response_json'])}) + "\n"
-             return
-
-     # Step 1: Retrieve per-book (guaranteed slots for every scripture)
-     print(f"\n🔍 Retrieving {CHUNKS_PER_BOOK} chunks per book for: '{question}'")
-     source_docs = retrieve_per_book(question, _vector_store)

      if not source_docs:
          yield json.dumps({"type": "token", "data": "No content found in the knowledge base."}) + "\n"
          return

-     # 3. Step 2: Format sources for the UI immediately
-     seen_sources = set()
      sources = []
      for doc in source_docs:
          book = doc.metadata.get("book", "Unknown")
-         ch = doc.metadata.get("chapter")
-         vs = doc.metadata.get("verse")
-         ang = doc.metadata.get("ang")
-
          if ang:
              cite_val = f"Ang {ang}"
          elif ch and vs:
              cite_val = f"{ch}:{vs}"
          else:
              cite_val = f"p. {doc.metadata.get('page', '?')}"
-
          display_name = f"{book} {cite_val}"
          snippet = doc.page_content[:200].strip() + "..."
          if display_name not in seen_sources:
              seen_sources.add(display_name)
-             print("Display name:", display_name)
-             print("Page:", cite_val)
              sources.append({"book": display_name, "page": cite_val, "snippet": snippet})
-     # Step 2: Format context grouped by book
-     context = format_docs(source_docs)
-     full_answer = ""

-     # Step 3: Stream from the chain:
-     for chunk in _llm_chain.invoke({"context": context, "question": question}):
          full_answer += chunk
-         yield json.dumps({"type": "token", "data": chunk}) + "\n"  # Stream the answer as it's generated
-
-     # Filter sources to only those the LLM actually referenced
-     final_sources = []
-     ansnwer_lower = full_answer.lower()
-
-     for s in sources:
-         if s["book"].lower() in ansnwer_lower:
-             final_sources.append(s)
-
-     # If the LLM didn't explicitly reference any sources, we can optionally include all retrieved ones or none
-     display_sources = final_sources if final_sources else []
-
-     # Step 4: After streaming is done, save to semantic cache for future similar queries
-     result = {
-         "answer": full_answer,
-         "sources": display_sources,
-     }
-
-     cache_coll.add(
-         documents=[question],
-         metadatas=[{"response_json": json.dumps(result)}],
-         ids=[question]
-     )
-
-     # Send sources as a final message after the answer is fully streamed
      yield json.dumps({"type": "sources", "data": sources}) + "\n"
-

  # ─── Quick CLI Test ───────────────────────────────────────────────────────────

  if __name__ == "__main__":
-     test_q = "In what aspects do the Quran and Gita teach the same thing?"
      print(f"\n🔍 Test query: {test_q}\n")
-     result = query_sacred_texts(test_q)
-     print("📝 Answer:\n")
-     print(result["answer"])
-     print("\n📚 Sources retrieved:")
-     for s in result["sources"]:
-         print(f"  - {s['book']} (page {s['page']})")
  """
+ rag_chain.py — Core RAG chain using LangChain + NVIDIA.
+
+ KEY FEATURES:
+ - Per-book retrieval (guaranteed slots per scripture)
+ - Hybrid BM25 + vector search with NVIDIA reranking
+ - Semantic cache for repeated/similar questions
+ - Multi-turn conversation memory (session-based ConversationBufferMemory)
+
+ Public API:
+     query_sacred_texts(question, session_id) -> Generator[str, None, None]
+     clear_session(session_id)
  """

  import os
+ import json
  from dotenv import load_dotenv
  from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings, ChatNVIDIA, NVIDIARerank
  from langchain_chroma import Chroma
+ from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
  from langchain_core.output_parsers import StrOutputParser
+ from langchain_core.messages import HumanMessage, AIMessage
  from langchain_community.retrievers import BM25Retriever
  from langchain_classic.retrievers import EnsembleRetriever, ContextualCompressionRetriever
+ from langchain_core.documents import Document

+ load_dotenv()

+ NVIDIA_API_KEY = os.getenv("NVIDIA_API_KEY")
+ CHROMA_DB_PATH = os.getenv("CHROMA_DB_PATH", "./chroma_db")
+ COLLECTION_NAME = os.getenv("COLLECTION_NAME", "sacred_texts")
+ CHUNKS_PER_BOOK = int(os.getenv("CHUNKS_PER_BOOK", "3"))
+ CACHE_COLLECTION = "semantic_cache"
+ MAX_HISTORY_TURNS = int(os.getenv("MAX_HISTORY_TURNS", "6"))  # last N human+AI pairs kept

  KNOWN_BOOKS = [
      "Bhagavad Gita",
      "Quran",
      "Guru Granth Sahib",
  ]

+ # ─── In-memory session store ──────────────────────────────────────────────────
+ # { session_id: [HumanMessage | AIMessage, ...] }
+ _session_store: dict[str, list] = {}
+
+
+ def get_history(session_id: str) -> list:
+     return _session_store.get(session_id, [])
+
+
+ def append_turn(session_id: str, human_msg: str, ai_msg: str):
+     history = _session_store.setdefault(session_id, [])
+     history.append(HumanMessage(content=human_msg))
+     history.append(AIMessage(content=ai_msg))
+     # Trim to last MAX_HISTORY_TURNS pairs (each pair = 2 messages)
+     if len(history) > MAX_HISTORY_TURNS * 2:
+         _session_store[session_id] = history[-(MAX_HISTORY_TURNS * 2):]
+
+
+ def clear_session(session_id: str):
+     """Wipe the conversation history for a session."""
+     _session_store.pop(session_id, None)
+
+
+ def list_sessions() -> list[str]:
+     return list(_session_store.keys())
 
70
  # ─── System Prompt ────────────────────────────────────────────────────────────
71
 
 
81
  address EACH of those books separately, then synthesise the common thread.
82
  6. Be respectful and neutral toward all faiths β€” treat each text with equal reverence.
83
  7. Do NOT speculate, invent verses, or add information beyond the context.
84
+ 8. You have access to the conversation history. Use it to:
85
+ - Understand follow-up questions (e.g. "elaborate on the second point", "what about the Bible?")
86
+ - Maintain continuity across turns without repeating yourself unnecessarily
87
+ - Resolve pronouns and references ("it", "that teaching", "the verse you mentioned") from history
88
 
89
  FORMAT your response as:
90
  - A clear, thoughtful answer (2–4 paragraphs)
 
96
  ────────────────────────────────────────
97
  """
98
 
 
 
99
 
100
  # ─── Embeddings & Vector Store ────────────────────────────────────────────────
101
 
 
115
  )
116
 
117
 
118
+ # ─── Per-Book Hybrid Retrieval ────────────────────────────────────────────────
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
119
 
120
  def retrieve_per_book(question: str, vector_store: Chroma) -> list:
121
  """
122
+ Retrieve CHUNKS_PER_BOOK chunks from EACH known book independently using
123
+ a hybrid BM25+vector ensemble, then rerank the pooled candidates.
 
 
124
  """
125
  all_candidates = []
 
 
 
126
  question_lower = question.lower()
127
+
128
+ target_books = []
129
  if any(kw in question_lower for kw in ["gita", "bhagavad", "hindu", "hinduism"]):
130
  target_books.append("Bhagavad Gita")
131
  if any(kw in question_lower for kw in ["quran", "koran", "islam", "muslim", "muhammad"]):
 
134
  target_books.append("Bible")
135
  if any(kw in question_lower for kw in ["granth", "guru", "sikh", "sikhism", "nanak"]):
136
  target_books.append("Guru Granth Sahib")
137
+
 
138
  books_to_search = target_books if target_books else KNOWN_BOOKS
 
139
  print(f"🎯 Routing query to: {books_to_search}")
140
+
141
+ CANDIDATE_COUNT = 10
142
+
143
  for book in books_to_search:
144
  try:
 
 
 
 
 
145
  book_data = vector_store.get(where={"book": book})
146
+ book_docs = [
147
+ Document(page_content=d, metadata=m)
148
+ for d, m in zip(book_data["documents"], book_data["metadatas"])
149
+ ]
150
  if not book_docs:
151
  continue
152
+
 
 
153
  bm25_retriever = BM25Retriever.from_documents(book_docs)
154
  bm25_retriever.k = CANDIDATE_COUNT
155
+
156
+ vector_retriever = vector_store.as_retriever(
157
+ search_kwargs={"k": CANDIDATE_COUNT, "filter": {"book": book}}
158
+ )
159
+
160
+ ensemble = EnsembleRetriever(
161
+ retrievers=[bm25_retriever, vector_retriever],
162
+ weights=[0.5, 0.5],
163
+ )
164
+
165
+ book_candidates = ensemble.invoke(question)
166
  all_candidates.extend(book_candidates)
167
+ print(f" πŸ“¦ {book}: {len(book_candidates)} candidates")
168
+
169
  except Exception as e:
170
  print(f" ❌ {book}: retrieval error β€” {e}")
171
+
 
 
172
  if not all_candidates:
173
  return []
174
+
175
  print(f"πŸš€ Reranking {len(all_candidates)} total candidates...")
176
  reranker = NVIDIARerank(
177
+ model="nvidia/llama-3.2-nv-rerankqa-1b-v2",
178
  api_key=NVIDIA_API_KEY,
179
+ top_n=5,
180
  )
181
+ final_docs = reranker.compress_documents(all_candidates, question)
182
+
 
 
183
  for i, doc in enumerate(final_docs):
184
  score = doc.metadata.get("relevance_score", "N/A")
185
  print(f"Rank {i+1} [{doc.metadata['book']}]: Score {score}")
 
 # ─── Format Retrieved Docs ────────────────────────────────────────────────────

 def format_docs(docs: list) -> str:

     by_book: dict[str, list] = {}
     for doc in docs:
         book = doc.metadata.get("book", "Unknown")

         header = f"═══ {book} ═══"
         chunks = []
         for i, doc in enumerate(book_docs, 1):
             ang = doc.metadata.get("ang")
+            ch = doc.metadata.get("chapter")
+            vs = doc.metadata.get("verse")
             if ang:
                 citation = f"Ang {ang}"
             elif ch and vs:
                 citation = f"{ch}:{vs}"
             else:
                 citation = f"Page {doc.metadata.get('page', '?')}"
+            chunks.append(f"  [{i}] ({citation}): {doc.page_content.strip()}")
         sections.append(header + "\n" + "\n\n".join(chunks))

     return "\n\n".join(sections)
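The same citation fallback (a Guru Granth Sahib "Ang" wins, then `chapter:verse`, then the PDF page) is reused later when building the sources list. A minimal standalone sketch of that priority logic; the `make_citation` helper name is illustrative, not a function in this repo:

```python
# Hypothetical helper mirroring the citation priority used in format_docs().
def make_citation(metadata: dict) -> str:
    ang = metadata.get("ang")
    ch = metadata.get("chapter")
    vs = metadata.get("verse")
    if ang:
        return f"Ang {ang}"        # Guru Granth Sahib pages are called Angs
    if ch and vs:
        return f"{ch}:{vs}"        # chapter-and-verse books (Gita, Bible, Quran)
    return f"Page {metadata.get('page', '?')}"  # last resort: raw PDF page


if __name__ == "__main__":
    print(make_citation({"ang": 917}))                     # Ang 917
    print(make_citation({"chapter": "2", "verse": "47"}))  # 2:47
    print(make_citation({"page": 12}))                     # Page 12
```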
 
 # ─── Build the RAG Chain ──────────────────────────────────────────────────────

 def build_chain():
+    embeddings = get_embeddings()
     vector_store = get_vector_store(embeddings)

     llm = ChatNVIDIA(

         max_output_tokens=2048,
     )

+    # Prompt now includes a chat-history placeholder so prior turns are visible
     prompt = ChatPromptTemplate.from_messages([
         ("system", SYSTEM_PROMPT),
+        MessagesPlaceholder(variable_name="history"),  # ← injected per-request
+        ("human", "{question}"),
     ])

     llm_chain = prompt | llm | StrOutputParser()
     return llm_chain, vector_store
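The chain below calls `get_history` and `append_turn`, which are defined elsewhere in the file and not shown in this diff. A minimal in-memory sketch of what such session-memory helpers could look like; the dict-based store, the 10-message cap, and the dataclass stand-ins for LangChain's `HumanMessage`/`AIMessage` are my assumptions, not the repo's actual implementation:

```python
from dataclasses import dataclass


# Sketch only: the real file stores LangChain message objects; these tiny
# stand-ins keep the example dependency-free.
@dataclass
class HumanMessage:
    content: str


@dataclass
class AIMessage:
    content: str


_SESSIONS: dict[str, list] = {}  # session_id β†’ alternating Human/AI messages
MAX_MESSAGES = 10                # assumed cap: keep only the last 5 turns


def get_history(session_id: str) -> list:
    return _SESSIONS.get(session_id, [])


def append_turn(session_id: str, question: str, answer: str) -> None:
    history = _SESSIONS.setdefault(session_id, [])
    history.append(HumanMessage(question))
    history.append(AIMessage(answer))
    # Trim the oldest messages so the prompt never grows without bound
    del history[:-MAX_MESSAGES]
```

Trimming in whole turns (two messages at a time) keeps the human/AI alternation intact, which matters when the history is replayed into a `MessagesPlaceholder`.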
+# ─── Singleton init ───────────────────────────────────────────────────────────

+_llm_chain = None
 _vector_store = None


+# ─── Public API ───────────────────────────────────────────────────────────────

+def query_sacred_texts(question: str, session_id: str = "default"):
+    """
+    Stream an answer grounded in the sacred texts, maintaining per-session
+    conversation history for natural follow-up questions.
+
+    Yields JSON-lines of the form:
+        {"type": "token", "data": "<chunk>"}
+        {"type": "sources", "data": [...]}
+        {"type": "cache", "data": {"answer": "...", "sources": [...]}}

     """
     global _llm_chain, _vector_store

     if _llm_chain is None:
         print("πŸ”§ Initialising RAG chain (first call)...")
         _llm_chain, _vector_store = build_chain()
 
 
 
 
 
 
 
+    # ── Semantic cache check (skip for follow-ups that reference history) ──
+    history = get_history(session_id)
+    is_followup = len(history) > 0
+
+    if not is_followup:
+        cache_coll = _vector_store._client.get_or_create_collection(CACHE_COLLECTION)
+        cache_results = cache_coll.query(query_texts=[question], n_results=1)
+
+        THRESHOLD = 0.35
+        if cache_results["ids"] and cache_results["ids"][0]:
+            distance = cache_results["distances"][0][0]
+            if distance < THRESHOLD:
+                print(f"⚑️ Semantic Cache Hit! (Distance: {distance:.4f})")
+                cached = json.loads(cache_results["metadatas"][0][0]["response_json"])
+                # Store this cache hit in session memory too
+                append_turn(session_id, question, cached["answer"])
+                yield json.dumps({"type": "cache", "data": cached}) + "\n"
+                return
+
+    # ── Retrieval ──────────────────────────────────────────────────────────
+    # For follow-ups, augment the question with the last human turn for better
+    # semantic search (the follow-up itself may be too short/vague)
+    retrieval_query = question
+    if is_followup and len(question.split()) < 8:
+        last_human = next(
+            (m.content for m in reversed(history) if isinstance(m, HumanMessage)), ""
+        )
+        retrieval_query = f"{last_human} {question}".strip()
+        print(f"πŸ” Follow-up detected β€” augmented retrieval query: '{retrieval_query}'")
+
+    print(f"\nπŸ” Retrieving chunks for: '{retrieval_query}'")
+    source_docs = retrieve_per_book(retrieval_query, _vector_store)
 
     if not source_docs:
         yield json.dumps({"type": "token", "data": "No content found in the knowledge base."}) + "\n"
         return

+    # ── Build sources list ─────────────────────────────────────────────────
+    seen_sources: set[str] = set()
     sources = []
     for doc in source_docs:
         book = doc.metadata.get("book", "Unknown")
+        ang = doc.metadata.get("ang")
+        ch = doc.metadata.get("chapter")
+        vs = doc.metadata.get("verse")
         if ang:
             cite_val = f"Ang {ang}"
         elif ch and vs:
             cite_val = f"{ch}:{vs}"
         else:
             cite_val = f"p. {doc.metadata.get('page', '?')}"
         display_name = f"{book} {cite_val}"
         snippet = doc.page_content[:200].strip() + "..."
         if display_name not in seen_sources:
             seen_sources.add(display_name)
             sources.append({"book": display_name, "page": cite_val, "snippet": snippet})
 
 
 
+    context = format_docs(source_docs)
+    full_answer = ""
+
+    # ── Stream LLM response (history injected here) ────────────────────────
+    for chunk in _llm_chain.stream({
+        "context": context,
+        "question": question,
+        "history": history,  # ← the conversation so far
+    }):
         full_answer += chunk
+        yield json.dumps({"type": "token", "data": chunk}) + "\n"
+
+    # ── Filter sources to those actually cited in the answer ───────────────
+    answer_lower = full_answer.lower()
+    final_sources = [s for s in sources if s["book"].lower() in answer_lower]
+
+    # ── Persist this turn into session memory ──────────────────────────────
+    append_turn(session_id, question, full_answer)
+    print(f"πŸ’Ύ Session '{session_id}': {len(get_history(session_id)) // 2} turn(s) stored")
+
+    # ── Cache first-turn answers only ──────────────────────────────────────
+    if not is_followup:
+        result_to_cache = {"answer": full_answer, "sources": final_sources}
+        try:
+            cache_coll = _vector_store._client.get_or_create_collection(CACHE_COLLECTION)
+            cache_coll.add(
+                documents=[question],
+                metadatas=[{"response_json": json.dumps(result_to_cache)}],
+                ids=[question],
+            )
+        except Exception as e:
+            print(f"⚠️ Cache write failed: {e}")

     yield json.dumps({"type": "sources", "data": final_sources}) + "\n"
 
 # ─── Quick CLI Test ───────────────────────────────────────────────────────────

 if __name__ == "__main__":
+    test_q = "What do the scriptures say about forgiveness?"
     print(f"\nπŸ” Test query: {test_q}\n")
+    for line in query_sacred_texts(test_q, session_id="cli-test"):
+        obj = json.loads(line)
+        if obj["type"] == "token":
+            print(obj["data"], end="", flush=True)
+    print("\n")
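Any consumer of `query_sacred_texts` (such as the FastAPI endpoint in app.py, or a test) can fold the JSON-lines protocol back into a complete answer plus citations. A sketch of that client-side loop; `collect_stream` is an illustrative name, and a real caller would iterate an HTTP response body rather than a list:

```python
import json


def collect_stream(lines):
    """Fold a query_sacred_texts() JSON-lines stream into (answer, sources)."""
    answer, sources = "", []
    for line in lines:
        obj = json.loads(line)
        if obj["type"] == "token":
            answer += obj["data"]           # streamed answer fragment
        elif obj["type"] == "sources":
            sources = obj["data"]           # final citation list
        elif obj["type"] == "cache":        # semantic-cache hit: whole payload at once
            answer = obj["data"]["answer"]
            sources = obj["data"]["sources"]
    return answer, sources
```

Because a cache hit short-circuits the generator with a single `cache` line, clients must handle all three message types, not just `token`.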