Contextual chunk expansion: when a chunk is retrieved, also fetch the surrounding chunks (±1) so that verses cut off at a chunk boundary do not lose their meaning.
Hypothetical Document Embedding (HyDE): generate a hypothetical ideal answer first, embed that, then search; this can substantially improve recall for abstract questions.
Multi-turn conversation: add chat history using LangChain ConversationBufferMemory so users can ask follow-up questions like "Elaborate on the second point".
Answer faithfulness scoring: use an LLM-as-judge step to check whether the answer is actually grounded in the retrieved chunks before returning it.
Query rewriting: if the user query is vague, have the LLM rephrase it into a better search query before retrieval to improve semantic matching.
Multi-language support: ingest the Arabic Quran and the Sanskrit Gita alongside the English translations; embed both and let users query in their preferred language.
Incremental ingestion: track which PDFs have already been ingested (via a manifest file) so that re-running ingest.py processes only new books, not the whole library.
Book versioning: support multiple translations of the same book (e.g. KJV vs NIV Bible) and let users choose between them.
Snippet preview on hover: show the actual retrieved passage when the user hovers over a source badge in the UI.
Query suggestions: after each answer, suggest 2-3 related follow-up questions.
Topic explorer: a sidebar with pre-grouped themes (Death & Afterlife, Compassion, Duty, Prayer) that users can browse.
Compare mode: a dedicated side-by-side view for questions like "How do Book A and Book B address X?"
Hallucination guardrail: run a separate verification pass that checks whether every claim in the answer maps back to a retrieved chunk; flag or remove unsupported claims.
Out-of-scope detection: classify queries before retrieval and politely decline non-spiritual requests (e.g. "Write me code") with a prompt-level or classifier-level guard.
Rate limiting: add per-IP request throttling in FastAPI to prevent API key exhaustion.
API key security: keep keys in server-side storage only; never expose NVIDIA_API_KEY or GEMINI_API_KEY in frontend calls.

Need to debug:
1. General questions are not citing verses.
2. For exact-verse queries, the cache threshold score comes back the same for chapter 2 verse 4 and chapter 1 verse 10.
3.
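For the contextual chunk expansion item, a minimal sketch is below. It assumes chunks are stored under hypothetical (book, chunk_index) keys with consecutive indices per book; `expand_hits` and that store layout are illustrations, not existing project code. Overlapping windows from nearby hits are merged so the same chunk is not sent to the LLM twice.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical chunk store layout: (book, chunk_index) -> chunk text.
ChunkKey = Tuple[str, int]


def expand_hits(
    hits: List[ChunkKey],
    store: Dict[ChunkKey, str],
    window: int = 1,
) -> List[str]:
    """Return each retrieved chunk together with its +/-window neighbours.

    Contiguous indices from the same book are joined into one passage,
    and overlapping windows are deduplicated.
    """
    wanted: Dict[str, Set[int]] = {}
    for book, idx in hits:
        for j in range(idx - window, idx + window + 1):
            if (book, j) in store:  # skip indices past either end of the book
                wanted.setdefault(book, set()).add(j)

    passages: List[str] = []
    for book, indices in wanted.items():
        ordered = sorted(indices)
        # Split into contiguous runs so far-apart hits stay separate passages.
        run: List[int] = [ordered[0]]
        for j in ordered[1:]:
            if j == run[-1] + 1:
                run.append(j)
            else:
                passages.append(" ".join(store[(book, k)] for k in run))
                run = [j]
        passages.append(" ".join(store[(book, k)] for k in run))
    return passages
```

For example, a hit on chunk 2 of a six-chunk book returns chunks 1 to 3 joined into a single passage, while hits on chunks 0 and 5 stay as two separate passages.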
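A HyDE retrieval step could look roughly like this. The `generate` and `embed` callables are hypothetical stand-ins for whatever LLM and embedding clients the app actually uses, and the brute-force cosine ranking stands in for the real vector store query. The point it illustrates is that the hypothetical answer, not the raw question, is what gets embedded and searched.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hyde_search(
    question: str,
    generate: Callable[[str], str],    # hypothetical LLM call
    embed: Callable[[str], Vector],    # hypothetical embedding call
    corpus: List[Tuple[str, Vector]],  # (chunk text, precomputed embedding)
    k: int = 3,
) -> List[str]:
    # 1. Ask the LLM for an idealised answer passage (it may be factually wrong;
    #    it only needs to "sound like" the passages we want to find).
    prompt = f"Write a short scripture-style passage that answers: {question}"
    hypothetical = generate(prompt)
    # 2. Embed the hypothetical answer instead of the question.
    query_vec = embed(hypothetical)
    # 3. Rank real chunks by similarity to that embedding.
    ranked = sorted(corpus, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```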
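For incremental ingestion, one option (an assumption about how ingest.py could be structured) is to record a SHA-256 digest per PDF in a JSON manifest. Hashing file contents rather than only checking names also catches a book that was replaced with a corrected file. The manifest file name and all function names here are hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, List

MANIFEST_NAME = "ingest_manifest.json"  # hypothetical manifest file name


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, read in 64 KiB blocks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1 << 16), b""):
            h.update(block)
    return h.hexdigest()


def load_manifest(library: Path) -> Dict[str, str]:
    manifest = library / MANIFEST_NAME
    if manifest.exists():
        return json.loads(manifest.read_text(encoding="utf-8"))
    return {}


def pending_pdfs(library: Path) -> List[Path]:
    """PDFs that are new, or whose contents changed since they were recorded."""
    seen = load_manifest(library)
    return [p for p in sorted(library.glob("*.pdf")) if seen.get(p.name) != file_digest(p)]


def mark_ingested(library: Path, paths: List[Path]) -> None:
    """Record the current digest of each successfully ingested PDF."""
    seen = load_manifest(library)
    for p in paths:
        seen[p.name] = file_digest(p)
    (library / MANIFEST_NAME).write_text(json.dumps(seen, indent=2), encoding="utf-8")
```

Calling `mark_ingested` only after the embeddings for those files have been written means a crash mid-run leaves them pending for the next run instead of silently skipping them.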
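The hallucination guardrail can be sketched as a claim-by-claim loop. Here `is_supported` is a hypothetical judge callable (for example, an LLM prompted with one claim plus the retrieved chunks that answers yes or no), and splitting on sentence-ending punctuation is a simplifying assumption; answers containing citations or abbreviations may need a better splitter.

```python
import re
from typing import Callable, List, Tuple


def split_claims(answer: str) -> List[str]:
    """Naive sentence split on . ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]


def guard_answer(
    answer: str,
    chunks: List[str],
    is_supported: Callable[[str, List[str]], bool],  # hypothetical LLM-as-judge call
    mode: str = "flag",
) -> Tuple[str, List[str]]:
    """Check each claim against the retrieved chunks.

    mode="flag" keeps unsupported claims but marks them; mode="remove" drops them.
    Returns the filtered answer and the list of unsupported claims.
    """
    if mode not in ("flag", "remove"):
        raise ValueError(f"unknown mode: {mode!r}")
    kept: List[str] = []
    unsupported: List[str] = []
    for claim in split_claims(answer):
        if is_supported(claim, chunks):
            kept.append(claim)
        else:
            unsupported.append(claim)
            if mode == "flag":
                kept.append(f"{claim} [unverified]")
    return " ".join(kept), unsupported
```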
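For per-IP throttling, a sliding-window limiter is one simple approach. The class below is a hypothetical in-memory sketch; in FastAPI it could be called from a middleware or dependency keyed on `request.client.host`, returning HTTP 429 when `allow` is False. Because state lives in process memory, it only holds within a single worker process; multiple workers would need a shared store such as Redis.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each client key.

    Note: entries for idle clients are never evicted in this sketch.
    """

    def __init__(self, limit: int, window: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so tests can control time
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```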
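On debug item 2, one plausible cause (an assumption, since the cache code is not shown here): if the cache matches queries by embedding similarity, "chapter 2 verse 4" and "chapter 1 verse 10" differ only in a few digits, so their embeddings can be close enough that both clear the same threshold. One workaround is to parse explicit verse references and key the cache on them exactly, falling back to the semantic cache only when no reference is found. The regex, function names, and key format below are hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches "chapter 2 verse 4", "chapter 1, verse 10", "ch. 2 v. 4" and "2:47".
# Hypothetical pattern; extend it for other citation styles the app sees.
VERSE_RE = re.compile(
    r"\b(?:chapter|ch\.?)\s*(\d+)\s*,?\s*(?:verse|v\.?)\s*(\d+)"
    r"|\b(\d+)\s*:\s*(\d+)\b",
    re.IGNORECASE,
)


def verse_reference(query: str) -> Optional[Tuple[int, int]]:
    """Return (chapter, verse) for the first explicit reference, else None."""
    m = VERSE_RE.search(query)
    if not m:
        return None
    chapter = m.group(1) or m.group(3)
    verse = m.group(2) or m.group(4)
    return int(chapter), int(verse)


def cache_key(query: str, book: str) -> Optional[str]:
    """Exact cache key for verse lookups; None means 'use the semantic cache'."""
    ref = verse_reference(query)
    if ref is None:
        return None
    return f"{book.lower()}:{ref[0]}:{ref[1]}"
```

With exact keys, two different verse lookups can never collide in the cache, while open-ended questions still benefit from similarity matching.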