LifeGuide / features_to_add.txt
Shouvik599
Next set of feature improvements
Contextual chunk expansion - when a chunk is retrieved, also fetch the surrounding chunks (±1) so that cut-off verses do not lose their meaning
Hypothetical Document Embedding (HyDE) - generate a hypothetical ideal answer first, embed that, then search with it; this can improve recall for abstract questions
Multi-turn conversation - add chat history using LangChain ConversationBufferMemory so users can ask follow-up questions like "Elaborate on the second point"
Answer faithfulness scoring - use an LLM-as-judge step to self-check whether the answer is actually grounded in the retrieved chunks before returning it
Query rewriting - if the user query is vague, have the LLM rephrase it into a better search query before retrieval (improves semantic matching)
Multi-language support - ingest Arabic Quran + Sanskrit Gita alongside English translations; embed both and let users query in their preferred language
Incremental ingestion - track which PDFs have been ingested (via a manifest file) so re-running ingest.py only processes new books, not the whole library
Book versioning - support multiple translations of the same book (e.g. KJV vs NIV Bible) and let users choose
Snippet preview on hover - show the actual retrieved passage when hovering over a source badge in the UI
Query suggestions - after each answer, suggest 2-3 related follow-up questions
Topic explorer - a sidebar with pre-grouped themes (Death & Afterlife, Compassion, Duty, Prayer) that users can browse
Compare mode - a dedicated side-by-side view for "How does Book A vs Book B address X"
Hallucination guardrail - run a separate verification pass checking every claim in the answer maps back to a retrieved chunk; flag or remove unsupported claims
Out-of-scope detection - classify queries before retrieval; politely decline non-spiritual questions (e.g. "Write me code") with a prompt-level or classifier-level guard
Rate limiting - add per-IP request throttling in FastAPI to prevent API key exhaustion
API key security - keep API keys in server-side storage only; never expose NVIDIA_API_KEY or GEMINI_API_KEY in frontend calls
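
Sketch for the contextual chunk expansion item at the top of this list. This is only a rough idea, assuming the chunks of one book are stored in reading order and each retrieved hit is known by its integer position. The names expand_with_neighbors and build_context are hypothetical and not taken from the existing code.

```python
from typing import List, Sequence


def expand_with_neighbors(
    hit_indices: Sequence[int], total_chunks: int, window: int = 1
) -> List[int]:
    """Return the hit positions plus `window` neighbours on each side.

    Assumption: chunks of one book are numbered 0..total_chunks-1 in
    reading order. Overlapping ranges are merged and the result is sorted.
    """
    expanded = set()
    for idx in hit_indices:
        lo = max(0, idx - window)                  # clamp at start of book
        hi = min(total_chunks - 1, idx + window)   # clamp at end of book
        expanded.update(range(lo, hi + 1))
    return sorted(expanded)


def build_context(
    chunks: Sequence[str], hit_indices: Sequence[int], window: int = 1
) -> str:
    """Join the expanded chunks, in reading order, into one context string."""
    positions = expand_with_neighbors(hit_indices, len(chunks), window)
    return "\n".join(chunks[i] for i in positions)
```

Sorting and de-duplicating the positions keeps adjacent hits from repeating the same passage in the prompt. If chunks from several books share one index, the expansion would also need to be limited to neighbours from the same book.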
Need to debug -
1. Answers to general questions do not cite verses
2. For exact-verse queries, the cache threshold score comes back the same for chapter 2 verse 4 and chapter 1 verse 10
3.
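
Possible guard for debug item 2. The root cause is not confirmed. This sketch assumes the cache compares queries by embedding similarity, so two queries that differ only in their numbers can score nearly the same. The idea is to pull the chapter and verse out of both queries and refuse a cache hit when they differ. The regex and the names extract_verse_ref and cache_hit_allowed are hypothetical, and the 0.9 threshold is a placeholder.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: matches "chapter 2 verse 4", "ch 2 v 4", or "2:4".
_VERSE_RE = re.compile(
    r"\b(?:chapter|ch\.?)\s*(\d+)\s*,?\s*(?:verse|v\.?)\s*(\d+)"
    r"|\b(\d+)\s*:\s*(\d+)\b",
    re.IGNORECASE,
)


def extract_verse_ref(query: str) -> Optional[Tuple[int, int]]:
    """Return (chapter, verse) if the query names an exact verse, else None."""
    m = _VERSE_RE.search(query)
    if m is None:
        return None
    if m.group(1) is not None:   # "chapter N verse M" form
        return int(m.group(1)), int(m.group(2))
    return int(m.group(3)), int(m.group(4))   # "N:M" form


def cache_hit_allowed(
    new_query: str, cached_query: str, score: float, threshold: float = 0.9
) -> bool:
    """Accept a cache hit only if the score passes AND the verse refs agree.

    Two queries with no verse reference both yield None, so general
    questions still go through the normal threshold check.
    """
    if score < threshold:
        return False
    return extract_verse_ref(new_query) == extract_verse_ref(cached_query)
```

With this check, "chapter 2 verse 4" and "chapter 1 verse 10" would never share a cache entry, whatever the similarity score says. It does not tell books apart, so a book name would need to be added to the comparison as well.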