Spaces: pkgprateek/ai-rag-document

pkgprateek committed on Dec 28, 2025

Commit

fa8d5c5

unverified ·

2 Parent(s): 29b217b 643f470

Merge pull request #7 from pkgprateek/feature/multi-upload-streaming

Files changed (4)

README-HF.md +32 -27
README.md +76 -83
app/main.py +120 -53
app/rag_pipeline.py +100 -0

README-HF.md CHANGED

@@ -12,45 +12,52 @@ short_description: Document intelligence for Legal, Research, FinOps
 full_width: true
 ---
-# 🚀 Enterprise RAG Platform
-**Question your documents. Get cited answers in seconds.**
-Upload contracts, research papers, or financial reports → Ask questions in plain English → Get precise answers with page citations.
 ---
 ## How It Works
-```mermaid
-graph LR
-    A["📄 Upload"] --> B["✂️ Chunk"]
-    B --> C["🧠 Embed"]
-    C --> D["💬 Ask"]
-    D --> E["✨ Cited Answer"]
 ```
-**3 steps**: Upload → Ask → Get answers with citations.
 ---
 ## Try It Now
-1. **Select a vertical** (Legal, Research, or FinOps) — pre-loaded samples ready
-2. **Ask a sample question** or type your own
-3. **See the magic** — cited answers in seconds
-No signup required. Your documents are processed locally and auto-deleted after 7 days.
 ---
 ## Features
-- **Multi-format**: PDF, DOCX, TXT
-- **Citations**: Every answer references source documents
-- **Domain demos**: Legal, Research, FinOps pre-loaded
-- **Privacy-first**: Local processing, auto-delete after 7 days
-- **Fast**: 1-3 second response time
 ---
@@ -60,30 +67,28 @@ No signup required. Your documents are processed locally and auto-deleted after
 git clone https://github.com/pkgprateek/rag-document-qa-workflow.git
 cd rag-document-qa-workflow
 echo "GROQ_API_KEY=your_key" > .env
-echo "OPENROUTER_API_KEY=your_key" >> .env
 docker compose up
 # → http://localhost:7860
 ```
-**Get Free API Keys:** [Groq](https://console.groq.com/keys) (Required) · [OpenRouter](https://openrouter.ai/keys) (Optional)
-[View source on GitHub](https://github.com/pkgprateek/rag-document-qa-workflow)
 ---
 ## 🔒 Privacy
-- Documents processed locally (never sent externally)
-- Stored in encrypted ChromaDB
 - Auto-deleted after 7 days
-- Never used for model training
 ---
 ## Enterprise Pilots
-**2-week paid pilots** for teams ready to deploy RAG on their documents.
-📅 [Book discovery call](https://cal.com/your-link)
 ---

 full_width: true
 ---
+# Enterprise RAG Platform
+**Turn documents into answers. Instantly.**
+Upload contracts, research papers, or financial reports → Ask questions → Get cited answers in seconds.
+---
+## ✨ What's New
+- **Multi-document upload** — Process multiple files at once
+- **Streaming answers** — Watch responses generate in real-time
+- **Thinking indicator** — See "🔍 Analyzing documents..." before streaming starts
 ---
 ## How It Works
+```
+📄 Upload → ✂️ Chunk → 🧠 Embed → 💬 Ask → ✨ Cited Answer
 ```
+**3 steps**: Upload your documents → Ask questions → Get answers with page citations.
 ---
 ## Try It Now
+1. **Select a vertical** — Legal, Research, or FinOps samples pre-loaded
+2. **Or upload your own** — PDF, DOCX, TXT supported (batch upload enabled)
+3. **Ask anything** — Natural language questions
+4. **Get streaming answers** — Watch the AI think and respond in real-time
+No signup required. Documents auto-deleted after 7 days.
 ---
 ## Features
+| Feature | Description |
+|---------|-------------|
+| **Multi-upload** | Upload multiple files at once |
+| **Streaming** | Real-time token-by-token answers |
+| **Citations** | Every answer links to source + page |
+| **3 AI models** | GPT-OSS 120B, Llama 3.3, Gemma 3 |
+| **Privacy** | Session isolation, 7-day auto-delete |
 ---
 git clone https://github.com/pkgprateek/rag-document-qa-workflow.git
 cd rag-document-qa-workflow
 echo "GROQ_API_KEY=your_key" > .env
 docker compose up
 # → http://localhost:7860
 ```
+**API Keys:** [Groq](https://console.groq.com/keys) (Required) · [OpenRouter](https://openrouter.ai/keys) (Optional)
 ---
 ## 🔒 Privacy
+- Documents processed locally
+- Session-isolated storage
 - Auto-deleted after 7 days
+- Never used for training
 ---
 ## Enterprise Pilots
+**2-week paid pilots** for teams ready to deploy RAG on their infrastructure.
+📅 [Book discovery call](https://cal.com/prateekgoel/30m-discovery-call)
 ---

README.md CHANGED

@@ -1,80 +1,101 @@
-# QA Enterprise RAG Platform
-**Question your documents. Get cited answers in seconds. Secure, Scalable, Agentic Document Intelligence for the Modern Enterprise.**
 [![Live Demo](https://img.shields.io/badge/🔴_LIVE-Try_Demo-blue?style=for-the-badge)](https://pkgprateek-ai-rag-document.hf.space/)
 [![Deploy](https://github.com/pkgprateek/ai-rag-document/actions/workflows/deploy-to-hf.yml/badge.svg)](https://github.com/pkgprateek/ai-rag-document/actions/workflows/deploy-to-hf.yml)
 [![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
-[![MIT License](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
-<!-- Replace with actual screenshot: assets/demo-screenshot.png -->
-<p align="center">
-  <a href="https://pkgprateek-ai-rag-document.hf.space/">
-    <img src="assets/demo-screenshot.jpeg" alt="Enterprise RAG Demo" width="700"/>
-  </a>
-</p>
 ---
-## Why This Matters
-Knowledge workers **spend 2.5 hours daily** searching for information buried in documents. Enterprise RAG eliminates that friction—upload your contracts, research papers, or financial reports, ask questions in plain English, and get precise answers with page citations in under 5 seconds.
 ---
 ## Architecture
 ```mermaid
-flowchart TB
-    subgraph Ingestion ["📥 Ingestion"]
-        A["📄 PDF / DOCX / TXT"]
-        B["✂️ RecursiveTextSplitter<br/>1000 chars · 200 overlap"]
-        A --> B
     end
-    subgraph Indexing ["📊 Indexing"]
-        C["🧠 bge-small-en-v1.5<br/>384-dim embeddings"]
-        D[("💾 ChromaDB<br/>Persistent")]
-        B --> C --> D
     end
-    subgraph Retrieval ["🔍 Retrieval"]
-        E["💬 Question"]
-        F["🎯 Top-4 Similarity"]
-        E --> F
-        D --> F
     end
-    subgraph Generation ["✨ Generation"]
-        G["🤖 Multi-Provider LLM<br/>GPT-OSS 120B (default)<br/>Llama 3.3 70B · Gemma 3 27B"]
-        H["📝 Cited Answer"]
-        F --> G --> H
-    end
 ```
-**Stack**: LangChain 1.0.7 · ChromaDB 1.3.4 · sentence-transformers · Groq + OpenRouter
 ---
-## One-Minute Quickstart
 ```bash
-# Clone and enter
 git clone https://github.com/pkgprateek/rag-document-qa-workflow.git
 cd rag-document-qa-workflow
-# Set your API keys (both free)
-echo "GROQ_API_KEY=your_key_here" > .env
-echo "OPENROUTER_API_KEY=your_key_here" >> .env
-# Run with Docker (recommended)
 docker compose up
 ```
-Open **http://localhost:7860** → Done.
-<details>
-<summary>Alternative: UV (10× faster than pip)</summary>
 ```bash
 uv venv && source .venv/bin/activate
@@ -82,32 +103,9 @@ uv pip install -r requirements.txt
 python app/main.py
 ```
-</details>
-🔑 **Get Your Free API Keys**
-- [Groq API key](https://console.groq.com/keys) (Required - GPT-OSS & Llama models)
-- [OpenRouter API key](https://openrouter.ai/keys) (Optional - Gemma model)
----
-## Production Features Checklist
-> 10 criteria for enterprise-grade RAG. Each is satisfied by this platform.
-| Feature | Description |
-|----------|----------|
-| **Multi-format ingestion** | PDF, DOCX, TXT with intelligent parsing |
-| **Semantic chunking** | 1000-char chunks, 200-char overlap |
-| **Production embeddings** | bge-small-en-v1.5 (MTEB optimized) |
-| **Persistent storage** | ChromaDB survives restarts |
-| **Citation tracking** | Every answer links to source chunks |
-| **Rate limiting** | 10 queries/hour (configurable) |
-| **Privacy controls** | Auto-delete after 7 days |
-| **Monitoring hooks** | Health checks, error logging |
-| **Fast** | 50-200ms response time (p50) |
-| **Portable** | Docker-ready, one-command deploy |
-**[Design Decisions →](docs/DESIGN_DECISIONS.md)** — Deep dive into architectural choices.
 ---
@@ -115,30 +113,27 @@ python app/main.py
 | Metric | Value |
 |--------|-------|
-| **End-to-end Latency (p95)** | 50-200ms |
-| **Latency (p99)** | 200-400ms |
-| **100-page contract** | 3-4s process, 150ms query |
 | **Citation accuracy** | 93-96% relevance |
-| **Throughput** | 1000+ requests/min |
-*Powered by Groq's lightning-fast inference and optimized retrieval*
 ---
-## Consulting & Pilots
-**2-week paid pilots** for enterprise teams:
 | Week | Deliverables |
 |------|--------------|
-| **Week 1** | Ingest your documents, tune chunking for your domain |
-| **Week 2** | Deploy on your infrastructure, team training, ROI analysis |
-**Includes**: Custom RAG system · Performance benchmarks · 30-day support
 <p align="center">
-  <a href="https://cal.com/your-link">
-    <img src="https://img.shields.io/badge/📅_Book_Discovery_Call-blue?style=for-the-badge" alt="Book Call"/>
   </a>
 </p>
@@ -148,14 +143,12 @@ python app/main.py
 **Prateek Kumar Goel**
-[![Live Demo](https://img.shields.io/badge/🚀_Demo-HuggingFace-yellow)](https://huggingface.co/spaces/pkgprateek/ai-rag-document)
 [![GitHub](https://img.shields.io/badge/💻_Code-GitHub-black)](https://github.com/pkgprateek)
 [![HuggingFace](https://img.shields.io/badge/🤗_Profile-HuggingFace-orange)](https://huggingface.co/pkgprateek)
 ---
 <p align="center">
-  <sub>
-    MIT License · Built with production-grade MLOps practices
-  </sub>
 </p>

+# Enterprise RAG Platform
+<div align="center">
+**Turn documents into answers. Instantly.**
+Upload contracts, research papers, or financial reports. Ask questions in plain English. Get precise, cited answers in seconds.
 [![Live Demo](https://img.shields.io/badge/🔴_LIVE-Try_Demo-blue?style=for-the-badge)](https://pkgprateek-ai-rag-document.hf.space/)
 [![Deploy](https://github.com/pkgprateek/ai-rag-document/actions/workflows/deploy-to-hf.yml/badge.svg)](https://github.com/pkgprateek/ai-rag-document/actions/workflows/deploy-to-hf.yml)
 [![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
+[![MIT License](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)
+<a href="https://pkgprateek-ai-rag-document.hf.space/">
+  <img src="assets/demo-screenshot.jpeg" alt="Enterprise RAG Demo" width="700"/>
+</a>
+</div>
+---
+## The Problem
+Knowledge workers spend **2.5 hours daily** searching for information buried in documents. Legal teams review contracts manually. Researchers dig through papers. Finance teams hunt for clauses in agreements.
+## The Solution
+**Enterprise RAG** eliminates that friction:
+```
+Upload documents → Ask questions → Get cited answers in <5 seconds
+```
+No more Ctrl+F. No more reading 50 pages to find one clause. Just ask.
 ---
+## Features
+| Feature | What You Get |
+|---------|--------------|
+| **Multi-document upload** | Process multiple files at once with batch progress |
+| **Streaming answers** | Watch answers generate in real-time with thinking indicator |
+| **Inline citations** | Every claim linked to source document + page number |
+| **3 AI models** | GPT-OSS 120B, Llama 3.3 70B, Gemma 3 27B |
+| **Session isolation** | Your documents are private to your session |
+| **Auto-cleanup** | Documents auto-deleted after 7 days |
 ---
 ## Architecture
 ```mermaid
+flowchart LR
+    subgraph Input
+        A[📄 PDF / DOCX / TXT]
     end
+    subgraph Processing
+        B[✂️ Chunk<br/>1000 chars]
+        C[🧠 Embed<br/>bge-small-en-v1.5]
+        D[(💾 ChromaDB)]
     end
+    subgraph Query
+        E[💬 Question]
+        F[🎯 Top-4 Retrieval]
+        G[🤖 LLM Stream]
+        H[📝 Cited Answer]
     end
+    A --> B --> C --> D
+    E --> F --> G --> H
+    D --> F
 ```
+**Stack:** LangChain · ChromaDB · sentence-transformers · Groq + OpenRouter
 ---
+## Quick Start
+### Docker (Recommended)
 ```bash
 git clone https://github.com/pkgprateek/rag-document-qa-workflow.git
 cd rag-document-qa-workflow
+# Add your API keys
+echo "GROQ_API_KEY=your_key" > .env
+echo "OPENROUTER_API_KEY=your_key" >> .env
 docker compose up
 ```
+Open **http://localhost:7860**
+### Local Development
 ```bash
 uv venv && source .venv/bin/activate
 python app/main.py
 ```
+**Get Free API Keys:**
+- [Groq](https://console.groq.com/keys) — Required (GPT-OSS, Llama)
+- [OpenRouter](https://openrouter.ai/keys) — Optional (Gemma)
 ---
 | Metric | Value |
 |--------|-------|
+| **Query latency** | 50-200ms (p95) |
+| **Document processing** | 3-4s for 100 pages |
 | **Citation accuracy** | 93-96% relevance |
+| **Streaming** | First token in <500ms |
 ---
+## Enterprise Pilots
+**2-week paid pilots** for teams ready to deploy RAG on their infrastructure:
 | Week | Deliverables |
 |------|--------------|
+| **Week 1** | Document ingestion, chunking tuned for your domain |
+| **Week 2** | Deployment, team training, ROI analysis |
+**Includes:** Custom RAG system · Performance benchmarks · 30-day support
 <p align="center">
+  <a href="https://cal.com/prateekgoel/30m-discovery-call">
+    <img src="https://img.shields.io/badge/📅_Book_Discovery_Call-00C853?style=for-the-badge" alt="Book Call"/>
   </a>
 </p>
 **Prateek Kumar Goel**
+[![HuggingFace Demo](https://img.shields.io/badge/🚀_Demo-HuggingFace-yellow)](https://huggingface.co/spaces/pkgprateek/ai-rag-document)
 [![GitHub](https://img.shields.io/badge/💻_Code-GitHub-black)](https://github.com/pkgprateek)
 [![HuggingFace](https://img.shields.io/badge/🤗_Profile-HuggingFace-orange)](https://huggingface.co/pkgprateek)
 ---
 <p align="center">
+  <sub>MIT License · Built with ❤️ for enterprise document intelligence</sub>
 </p>
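
The architecture diagram in this README lists a chunking stage of 1000 characters, and the previous revision also named a 200-character overlap. As a rough illustration of what overlapping chunking means, here is a minimal sketch in plain Python. It is not the project's splitter: the earlier diagram names a recursive splitter that also respects separators, while this sketch only slides a fixed window. The function name `chunk_text` and its defaults are assumptions made for the example.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper for illustration only; it ignores sentence and
    paragraph boundaries, which a recursive splitter would try to keep.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "x" * 2500
pieces = chunk_text(sample)
```

With these defaults each chunk repeats the last 200 characters of the previous one, so a sentence that straddles a boundary is still fully present in at least one chunk.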

app/main.py CHANGED

@@ -104,47 +104,65 @@ class DocumentRagApp:
         except Exception as e:
             yield f"❌ Error: {str(e)}", loaded_docs
-    def process_file(self, file, session_id, current_docs):
-        """Process uploaded file with live progress updates"""
         loaded_docs = list(current_docs) if current_docs else []
-        if not file:
             yield "⚠️ Please upload a file", loaded_docs
             return
         try:
-            filename = os.path.basename(file.name)
-            yield f"Processing {filename}...", loaded_docs
-            ext = os.path.splitext(file.name)[1].lower()
-            if ext == ".pdf":
-                chunks = self.processor.process_pdf(file.name)
-            elif ext == ".txt":
-                chunks = self.processor.process_txt(file.name)
-            elif ext == ".docx":
-                chunks = self.processor.process_docx(file.name)
-            else:
-                yield (
-                    "❌ Unsupported format. Please upload PDF, DOCX, or TXT files.",
-                    loaded_docs,
-                )
-                return
-            yield f"✂️ Created {len(chunks)} smart chunks...", loaded_docs
-            yield "Building secure search index...", loaded_docs
-            # Pass session_id for user document isolation
-            self.rag_pipeline.add_documents(
-                chunks, session_id=session_id, is_sample=False
-            )
-            if filename not in loaded_docs:
-                loaded_docs.append(filename)
-            yield (
-                f"✓ Success! {filename} ready for questions ({len(chunks)} searchable chunks)",
-                loaded_docs,
-            )
         except Exception as e:
             yield (
                 f"❌ Error: {str(e)}. Please try again or contact support.",
@@ -181,6 +199,24 @@ class DocumentRagApp:
         except Exception as e:
             return f"Error: {str(e)}"
     def delete_document(self, doc_to_delete, session_id, current_docs):
         """
         Delete a document from the session.
@@ -696,8 +732,9 @@ with gr.Blocks(css=css, theme=gr.themes.Base(), title="Enterprise RAG") as demo:
                     gr.Markdown("### OR UPLOAD DOCUMENTS", elem_classes="card-header")
                     file_upload = gr.File(
                         file_types=[".pdf", ".docx", ".txt"],
                         show_label=True,
-                        height=240,  # Increased height
                     )
                     # Security Badge
@@ -755,7 +792,7 @@ with gr.Blocks(css=css, theme=gr.themes.Base(), title="Enterprise RAG") as demo:
                         elem_classes="doc-checkbox-group",
                     )
                     # Spacing before delete button
-                    gr.HTML('<div style="height: 0.10rem;"></div>')
                     with gr.Row():
                         remove_docs_btn = gr.Button(
                             "🗑️ Delete Selected Documents",
@@ -888,16 +925,35 @@ with gr.Blocks(css=css, theme=gr.themes.Base(), title="Enterprise RAG") as demo:
     )
     # File upload
-    def process_file_wrapper(file, session_data, current_docs):
         session_id = get_session_id(session_data)
-        for status, docs in app.process_file(file, session_id, current_docs):
             checkbox_update, btn_update = update_doc_ui(docs)
-            yield status, docs, checkbox_update, btn_update
     process_btn.click(
         fn=process_file_wrapper,
         inputs=[file_upload, session_state, docs_state],
-        outputs=[upload_status, docs_state, doc_checkboxes, remove_docs_btn],
     )
     # Document deletion (batch removal via checkboxes)
@@ -933,50 +989,61 @@ with gr.Blocks(css=css, theme=gr.themes.Base(), title="Enterprise RAG") as demo:
         fn=app.switch_model, inputs=model_selector, outputs=model_status
     )
-    # Question answering - explicit functions for each quick question
-    def ask_termination(session_data, current_docs):
         session_id = get_session_id(session_data)
-        return app.ask("What are the termination conditions?", session_id, current_docs)
-    def ask_payment(session_data, current_docs):
         session_id = get_session_id(session_data)
-        return app.ask("Summarize payment terms", session_id, current_docs)
-    def ask_findings(session_data, current_docs):
         session_id = get_session_id(session_data)
-        return app.ask("Summarize key findings", session_id, current_docs)
-    def ask_risks(session_data, current_docs):
         session_id = get_session_id(session_data)
-        return app.ask("What are the key risks mentioned?", session_id, current_docs)
-    def ask_custom(question, session_data, current_docs):
         session_id = get_session_id(session_data)
-        return app.ask(question, session_id, current_docs)
     q1.click(
-        fn=ask_termination,
         inputs=[session_state, docs_state],
         outputs=answer,
     )
     q2.click(
-        fn=ask_payment,
         inputs=[session_state, docs_state],
         outputs=answer,
     )
     q3.click(
-        fn=ask_findings,
         inputs=[session_state, docs_state],
         outputs=answer,
     )
     q4.click(
-        fn=ask_risks,
         inputs=[session_state, docs_state],
         outputs=answer,
     )
     ask_btn.click(
-        fn=ask_custom, inputs=[question, session_state, docs_state], outputs=answer
     )
 if __name__ == "__main__":

         except Exception as e:
             yield f"❌ Error: {str(e)}", loaded_docs
+    def process_file(self, files, session_id, current_docs):
+        """Process uploaded file(s) with live progress updates. Supports single or multiple files."""
         loaded_docs = list(current_docs) if current_docs else []
+        if not files:
             yield "⚠️ Please upload a file", loaded_docs
             return
+        # Normalize to list (handles both single file and list of files)
+        file_list = files if isinstance(files, list) else [files]
+        total_files = len(file_list)
+        total_chunks = 0
+        processed_files = []
         try:
+            for idx, file in enumerate(file_list, 1):
+                filename = os.path.basename(file.name)
+                yield f"📄 Processing {idx}/{total_files}: {filename}...", loaded_docs
+                ext = os.path.splitext(file.name)[1].lower()
+                if ext == ".pdf":
+                    chunks = self.processor.process_pdf(file.name)
+                elif ext == ".txt":
+                    chunks = self.processor.process_txt(file.name)
+                elif ext == ".docx":
+                    chunks = self.processor.process_docx(file.name)
+                else:
+                    yield (
+                        f"⚠️ Skipped {filename}: Unsupported format (use PDF, DOCX, or TXT)",
+                        loaded_docs,
+                    )
+                    continue
+                yield f"✂️ {filename}: Created {len(chunks)} chunks...", loaded_docs
+                # Pass session_id for user document isolation
+                self.rag_pipeline.add_documents(
+                    chunks, session_id=session_id, is_sample=False
+                )
+                if filename not in loaded_docs:
+                    loaded_docs.append(filename)
+                total_chunks += len(chunks)
+                processed_files.append(filename)
+            # Final success message
+            if processed_files:
+                if len(processed_files) == 1:
+                    yield (
+                        f"✓ Success! {processed_files[0]} ready ({total_chunks} searchable chunks)",
+                        loaded_docs,
+                    )
+                else:
+                    yield (
+                        f"✓ Success! {len(processed_files)} documents processed ({total_chunks} total chunks)",
+                        loaded_docs,
+                    )
+            else:
+                yield "⚠️ No valid documents to process", loaded_docs
         except Exception as e:
             yield (
                 f"❌ Error: {str(e)}. Please try again or contact support.",
         except Exception as e:
             return f"Error: {str(e)}"
+    def ask_stream(self, question, session_id, current_docs):
+        """Stream answer with thinking indicator for real-time display."""
+        if not current_docs:
+            yield "Please load documents first"
+            return
+        if not question.strip():
+            yield "Please enter a question"
+            return
+        # Thinking indicator
+        yield "🔍 Analyzing documents..."
+        try:
+            for answer_text in self.rag_pipeline.query_stream(question, session_id):
+                yield answer_text
+        except Exception as e:
+            yield f"Error: {str(e)}"
     def delete_document(self, doc_to_delete, session_id, current_docs):
         """
         Delete a document from the session.
                     gr.Markdown("### OR UPLOAD DOCUMENTS", elem_classes="card-header")
                     file_upload = gr.File(
                         file_types=[".pdf", ".docx", ".txt"],
+                        file_count="multiple",  # Enable multi-file selection
                         show_label=True,
+                        height=240,
                     )
                     # Security Badge
                         elem_classes="doc-checkbox-group",
                     )
                     # Spacing before delete button
+                    gr.HTML('<div style="height: 0.01rem;"></div>')
                     with gr.Row():
                         remove_docs_btn = gr.Button(
                             "🗑️ Delete Selected Documents",
     )
     # File upload
+    def process_file_wrapper(files, session_data, current_docs):
         session_id = get_session_id(session_data)
+        # Process files and yield progress
+        final_docs = current_docs
+        for status, docs in app.process_file(files, session_id, current_docs):
             checkbox_update, btn_update = update_doc_ui(docs)
+            final_docs = docs
+            # During processing, keep file visible
+            yield status, docs, checkbox_update, btn_update, gr.update()
+        # After processing, clear the file upload for new uploads
+        checkbox_update, btn_update = update_doc_ui(final_docs)
+        yield (
+            gr.update(value=""),
+            final_docs,
+            checkbox_update,
+            btn_update,
+            gr.update(value=None),
+        )
     process_btn.click(
         fn=process_file_wrapper,
         inputs=[file_upload, session_state, docs_state],
+        outputs=[
+            upload_status,
+            docs_state,
+            doc_checkboxes,
+            remove_docs_btn,
+            file_upload,
+        ],
     )
     # Document deletion (batch removal via checkboxes)
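
The batch upload path in this hunk (the new `process_file` plus its wrapper) follows a common generator pattern: normalize the input to a list, report progress per file, skip unsupported files without aborting the batch, and finish with a summary. The sketch below reproduces that flow in plain Python so it can be run without Gradio. It is an approximation under stated assumptions: `process_batch`, `fake_chunker`, and the `handlers` dispatch table are hypothetical names, paths are plain strings rather than uploaded file objects, and the extension lookup uses a dictionary where the commit uses an `if`/`elif` chain.

```python
import os
from typing import Callable, Dict, Iterator, List, Tuple, Union

# Hypothetical stand-in for the document processor: returns a list of chunks.
Chunker = Callable[[str], List[str]]


def process_batch(
    paths: Union[str, List[str], None],
    handlers: Dict[str, Chunker],
) -> Iterator[Tuple[str, List[str]]]:
    """Yield (status, loaded_names) pairs while working through a batch."""
    loaded: List[str] = []
    if not paths:
        yield "Please upload a file", loaded
        return

    # Accept a single path or a list of paths.
    path_list = paths if isinstance(paths, list) else [paths]
    total_chunks = 0

    for idx, path in enumerate(path_list, 1):
        name = os.path.basename(path)
        yield f"Processing {idx}/{len(path_list)}: {name}", loaded

        handler = handlers.get(os.path.splitext(path)[1].lower())
        if handler is None:
            # Skip this file but keep going with the rest of the batch.
            yield f"Skipped {name}: unsupported format", loaded
            continue

        chunks = handler(path)
        total_chunks += len(chunks)
        if name not in loaded:
            loaded.append(name)

    if loaded:
        yield f"Done: {len(loaded)} document(s), {total_chunks} chunks", loaded
    else:
        yield "No valid documents to process", loaded


def fake_chunker(path: str) -> List[str]:
    """Pretend every file splits into three chunks."""
    return [f"{path}#{i}" for i in range(3)]


handlers = {".pdf": fake_chunker, ".txt": fake_chunker, ".docx": fake_chunker}
statuses = [s for s, _ in process_batch(["a.pdf", "b.exe", "c.txt"], handlers)]
```

Because the function is a generator, a UI layer can forward each status as it is produced instead of waiting for the whole batch to finish.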
         fn=app.switch_model, inputs=model_selector, outputs=model_status
     )
+    # Question answering - streaming handlers for all questions
+    def ask_termination_stream(session_data, current_docs):
         session_id = get_session_id(session_data)
+        for text in app.ask_stream(
+            "What are the termination conditions?", session_id, current_docs
+        ):
+            yield text
+    def ask_payment_stream(session_data, current_docs):
         session_id = get_session_id(session_data)
+        for text in app.ask_stream("Summarize payment terms", session_id, current_docs):
+            yield text
+    def ask_findings_stream(session_data, current_docs):
         session_id = get_session_id(session_data)
+        for text in app.ask_stream("Summarize key findings", session_id, current_docs):
+            yield text
+    def ask_risks_stream(session_data, current_docs):
         session_id = get_session_id(session_data)
+        for text in app.ask_stream(
+            "What are the key risks mentioned?", session_id, current_docs
+        ):
+            yield text
+    def ask_custom_stream(question, session_data, current_docs):
         session_id = get_session_id(session_data)
+        for text in app.ask_stream(question, session_id, current_docs):
+            yield text
     q1.click(
+        fn=ask_termination_stream,
         inputs=[session_state, docs_state],
         outputs=answer,
     )
     q2.click(
+        fn=ask_payment_stream,
         inputs=[session_state, docs_state],
         outputs=answer,
     )
     q3.click(
+        fn=ask_findings_stream,
         inputs=[session_state, docs_state],
         outputs=answer,
     )
     q4.click(
+        fn=ask_risks_stream,
         inputs=[session_state, docs_state],
         outputs=answer,
     )
     ask_btn.click(
+        fn=ask_custom_stream,
+        inputs=[question, session_state, docs_state],
+        outputs=answer,
     )
 if __name__ == "__main__":
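
The five streaming handlers wired up in this file differ only in the question text they pass along. One possible way to express the same wiring with less repetition is a small closure factory, sketched below. This is an alternative shape, not what the commit does: `make_stream_handler` and `fake_ask_stream` are hypothetical, and `get_session_id` is reduced to a dictionary lookup so the example runs on its own.

```python
from typing import Callable, Iterator, List


def fake_ask_stream(question: str, session_id: str, docs: List[str]) -> Iterator[str]:
    """Hypothetical stand-in for app.ask_stream: indicator first, then growing text."""
    yield "Analyzing documents..."
    answer = ""
    for word in ["Answer", "to:", question]:
        answer = (answer + " " + word).strip()
        yield answer


def get_session_id(session_data: dict) -> str:
    """Simplified stand-in for the app's session lookup."""
    return session_data["id"]


def make_stream_handler(question: str) -> Callable[[dict, List[str]], Iterator[str]]:
    """Build a generator handler bound to one preset question."""

    def handler(session_data: dict, current_docs: List[str]) -> Iterator[str]:
        session_id = get_session_id(session_data)
        # Pass every partial answer through to the caller unchanged.
        yield from fake_ask_stream(question, session_id, current_docs)

    return handler


ask_payment = make_stream_handler("Summarize payment terms")
updates = list(ask_payment({"id": "s1"}, ["contract.pdf"]))
```

The inner `handler` contains a `yield`, so it is itself a generator function and keeps the same `(session_data, current_docs)` signature as the explicit wrappers in the diff.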

app/rag_pipeline.py CHANGED

@@ -388,6 +388,106 @@ Answer:""",
         return {"answer": answer_text}
     def _extract_citations(self, source_documents: List[Document]) -> List[dict]:
         """
         Extract formatted citations from source documents with page numbers and previews.

         return {"answer": answer_text}
+    def query_stream(self, question: str, session_id: str = None):
+        """
+        Stream answer tokens for real-time display.
+        Yields tokens as they arrive from the LLM.
+        Args:
+            question: User's question string
+            session_id: User's session ID for filtering results
+        Yields:
+            str: Accumulated answer text (each yield contains full answer so far)
+        """
+        # Check rate limit
+        if not self._check_rate_limit():
+            yield "⚠️ Rate limit exceeded. You can only ask 10 questions per hour. Please try again later."
+            return
+        # Set session ID for filtered retrieval
+        self._current_session_id = session_id
+        # Get documents using retriever (non-streaming part)
+        retriever = self.vector_store.as_retriever(search_kwargs={"k": 4})
+        docs = retriever.invoke(question)
+        # Filter by session
+        if session_id:
+            docs = [
+                d
+                for d in docs
+                if d.metadata.get("session_id") == session_id
+                or d.metadata.get("is_sample", False)
+            ]
+        if not docs:
+            yield "I couldn't find relevant information in your documents. Please try rephrasing your question."
+            return
+        # Build context and sources
+        context = "\n\n".join([d.page_content for d in docs])
+        sources = ", ".join(
+            list(set([d.metadata.get("source", "").split("/")[-1] for d in docs]))
+        )
+        # Format prompt
+        prompt = self._format_prompt(context, sources, question)
+        # Stream from LLM
+        full_answer = ""
+        for chunk in self.llm.stream(prompt):
+            if hasattr(chunk, "content"):
+                full_answer += chunk.content
+            else:
+                full_answer += str(chunk)
+            yield full_answer
+    def _format_prompt(self, context: str, sources: str, question: str) -> str:
+        """
+        Format the RAG prompt with context, sources, and question.
+        Args:
+            context: Retrieved document content
+            sources: Comma-separated source filenames
+            question: User's question
+        Returns:
+            str: Formatted prompt string
+        """
+        return f"""You are an expert AI assistant specializing in document analysis. Your goal is to provide comprehensive, accurate, and well-cited answers.
+Available Documents: {sources}
+Context from Documents:
+{context}
+User Question: {question}
+INSTRUCTIONS FOR YOUR RESPONSE:
+1. **Analyze Thoroughly**: Read the context carefully and identify all relevant information
+2. **Answer Comprehensively**: Provide a complete, detailed answer that fully addresses the question
+3. **Use Proper Structure**:
+   - Start with a clear, direct answer
+   - Follow with supporting details and explanation
+   - Use markdown formatting (headings, bullet points, bold) for readability
+4. **Cite Sources Inline**: As you make specific claims, cite the source immediately
+   - Format: (Source: filename, Page X) or (Source: filename) if page unknown
+   - Example: "The termination period is 30 days (Source: service_agreement.pdf, Page 3)"
+   - Be specific about which document and page number whenever possible
+5. **Include a Sources Section**: At the end of your answer, add:
+   **Sources Referenced:**
+   • filename (Page X) - Brief note about what info came from here
+   • filename2 (Page Y) - Brief note
+6. **Quality Standards**:
+   - Be specific and precise with facts, numbers, dates, and terms
+   - Quote exact phrases when important (use quotation marks)
+   - If information is unclear or missing, state what's uncertain
+   - Connect related points to create a cohesive narrative
+Answer:"""
     def _extract_citations(self, source_documents: List[Document]) -> List[dict]:
         """
         Extract formatted citations from source documents with page numbers and previews.
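
Two pieces of `query_stream` are easy to exercise in isolation: the session filter applied to the retrieved documents, and the loop that yields the accumulated answer after each chunk. The sketch below mirrors both with stand-in types. `Doc`, `Chunk`, `filter_by_session`, and `accumulate` are hypothetical names introduced for illustration; the real method works with LangChain documents and whatever `self.llm.stream` returns.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List, Optional


@dataclass
class Doc:
    """Minimal stand-in for a retrieved document (hypothetical)."""
    page_content: str
    metadata: Dict[str, object] = field(default_factory=dict)


@dataclass
class Chunk:
    """Minimal stand-in for a streamed LLM message chunk (hypothetical)."""
    content: str


def filter_by_session(docs: List[Doc], session_id: Optional[str]) -> List[Doc]:
    """Keep documents owned by the session, plus shared sample documents."""
    if not session_id:
        return docs
    return [
        d for d in docs
        if d.metadata.get("session_id") == session_id
        or d.metadata.get("is_sample", False)
    ]


def accumulate(chunks: Iterable[object]) -> Iterator[str]:
    """Yield the full answer so far after each incoming chunk."""
    full_answer = ""
    for chunk in chunks:
        # Message-like chunks expose .content; anything else is stringified.
        full_answer += chunk.content if hasattr(chunk, "content") else str(chunk)
        yield full_answer


retrieved = [
    Doc("clause A", {"session_id": "s1"}),
    Doc("clause B", {"session_id": "s2"}),
    Doc("sample", {"is_sample": True}),
]
visible = filter_by_session(retrieved, "s1")
partials = list(accumulate([Chunk("Termination "), Chunk("is 30 days"), "."]))
```

Yielding the accumulated text rather than individual tokens suits a UI component that replaces its value on every update. One consequence of filtering after a top-4 retrieval is that fewer than four chunks, possibly none, can remain when chunks from other sessions rank higher; applying the filter inside the vector store query would avoid that, though whether the project does so elsewhere is not visible in this diff.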