Spaces: pkgprateek/ai-rag-document

pkgprateek committed on Dec 17, 2025

Commit bb9f87e · 1 Parent(s): 604aa46

feat: Add multi-provider LLM support with UI model selector

- Add Groq + OpenRouter provider abstraction
- Support 3 models: GPT-OSS 120B (default), Llama 3.3 70B, Gemma 3 27B
- Premium UI model selector with 45:55 column layout
- Update docs and configuration for multi-provider setup

Files changed (6)

.env.example +7 -2
README-HF.md +4 -2
README.md +8 -5
app/main.py +104 -14
app/rag_pipeline.py +112 -17
docs/DESIGN_DECISIONS.md +11 -1

.env.example CHANGED

@@ -1,6 +1,11 @@
 # Environment Variables
-# OpenRouter API Key (Required)
 # Get your FREE key at: https://openrouter.ai/keys
-# Using free tier with google/gemma-3-4b-it:free model
 OPENROUTER_API_KEY=your_openrouter_api_key_here

 # Environment Variables
+# Groq API Key (Required - Default Provider)
+# Get your FREE key at: https://console.groq.com/keys
+# Provides access to GPT-OSS 120B (default) and Llama 3.3 70B models
+GROQ_API_KEY=your_groq_api_key_here
+# OpenRouter API Key (Optional - For Gemma Model)
 # Get your FREE key at: https://openrouter.ai/keys
+# Using free tier with google/gemma-3-27b-it:free model
 OPENROUTER_API_KEY=your_openrouter_api_key_here
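
The key requirements above (Groq required, OpenRouter optional) could be checked at startup to see which models are usable. A minimal sketch, assuming a hypothetical helper `available_models` and a hypothetical `REQUIRED_ENV` table that are not part of this commit; the model-to-variable mapping follows the comments in `.env.example`:

```python
import os

# Hypothetical mapping for illustration: which environment variable each
# model needs, following the comments in .env.example.
REQUIRED_ENV = {
    "gpt-oss-120b": "GROQ_API_KEY",
    "llama-3.3-70b": "GROQ_API_KEY",
    "gemma-3-27b": "OPENROUTER_API_KEY",
}


def available_models(env=None):
    """Return the model keys whose API key is set to a non-empty value."""
    env = os.environ if env is None else env
    return [key for key, var in REQUIRED_ENV.items() if env.get(var)]


if __name__ == "__main__":
    print(available_models())
```

With only `GROQ_API_KEY` set, the two Groq-hosted models are reported and the Gemma entry is omitted, which matches the "Optional" label on the OpenRouter key.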

README-HF.md CHANGED

@@ -59,12 +59,14 @@ No signup required. Your documents are processed locally and auto-deleted after
 ```bash
 git clone https://github.com/pkgprateek/rag-document-qa-workflow.git
 cd rag-document-qa-workflow
-echo "OPENROUTER_API_KEY=your_key" > .env
 docker compose up
 # → http://localhost:7860
 ```
-[Get free API key](https://openrouter.ai/keys) · [View source on GitHub](https://github.com/pkgprateek/rag-document-qa-workflow)
 ---

 ```bash
 git clone https://github.com/pkgprateek/rag-document-qa-workflow.git
 cd rag-document-qa-workflow
+echo "GROQ_API_KEY=your_key" > .env
+echo "OPENROUTER_API_KEY=your_key" >> .env
 docker compose up
 # → http://localhost:7860
 ```
+**Get Free API Keys:** [Groq](https://console.groq.com/keys) (Required) · [OpenRouter](https://openrouter.ai/keys) (Optional)
+[View source on GitHub](https://github.com/pkgprateek/rag-document-qa-workflow)
 ---

README.md CHANGED

@@ -46,13 +46,13 @@ flowchart TB
     end
     subgraph Generation ["✨ Generation"]
-        G["🤖 Gemma 3-4B-IT"]
         H["📝 Cited Answer"]
         F --> G --> H
     end
 ```
-**Stack**: LangChain 1.0.7 · ChromaDB 1.3.4 · sentence-transformers · OpenRouter
 ---
@@ -63,8 +63,9 @@ flowchart TB
 git clone https://github.com/pkgprateek/rag-document-qa-workflow.git
 cd rag-document-qa-workflow
-# Set your API key (free from OpenRouter)
-echo "OPENROUTER_API_KEY=your_key_here" > .env
 # Run with Docker (recommended)
 docker compose up
@@ -83,7 +84,9 @@ python app/main.py
 </details>
-🔑 [Get free OpenRouter API key](https://openrouter.ai/keys)
 ---

     end
     subgraph Generation ["✨ Generation"]
+        G["🤖 Multi-Provider LLM<br/>GPT-OSS 120B (default)<br/>Llama 3.3 70B · Gemma 3 27B"]
         H["📝 Cited Answer"]
         F --> G --> H
     end
 ```
+**Stack**: LangChain 1.0.7 · ChromaDB 1.3.4 · sentence-transformers · Groq + OpenRouter
 ---
 git clone https://github.com/pkgprateek/rag-document-qa-workflow.git
 cd rag-document-qa-workflow
+# Set your API keys (both free)
+echo "GROQ_API_KEY=your_key_here" > .env
+echo "OPENROUTER_API_KEY=your_key_here" >> .env
 # Run with Docker (recommended)
 docker compose up
 </details>
+🔑 **Get Your Free API Keys**
+- [Groq API key](https://console.groq.com/keys) (Required - GPT-OSS & Llama models)
+- [OpenRouter API key](https://openrouter.ai/keys) (Optional - Gemma model)
 ---

app/main.py CHANGED

@@ -61,6 +61,25 @@ class DocumentRagApp:
         except Exception as e:
             return f"Error: {str(e)}"
     def ask(self, question):
         if not self.loaded_documents:
             return "Please load documents first"
@@ -168,10 +187,10 @@ span, p, div { font-family: var(--font-body); }
     -webkit-backdrop-filter: blur(12px);
     border: 1px solid var(--border-glass) !important;
     border-radius: 20px !important;
-    padding: 2rem !important; /* Internal padding for the card content */
     margin-bottom: 2rem !important;
     box-shadow: 0 20px 40px -10px rgba(0,0,0,0.5) !important;
-    height: 100% !important; /* Attempt to stretch */
     display: flex !important;
     flex-direction: column !important;
 }
@@ -197,9 +216,15 @@ span, p, div { font-family: var(--font-body); }
 /* Upload Area specific */
 .gradio-file {
-    background-color: rgba(0, 0, 0, 0.2) !important;
-    border: 2px dashed rgba(255, 255, 255, 0.3) !important; /* Brighter border */
     border-radius: 12px !important;
 }
 .gradio-dropdown:hover, .gradio-textbox textarea:hover {
@@ -294,6 +319,47 @@ span, p, div { font-family: var(--font-body); }
     background: rgba(16, 185, 129, 0.25);
     box-shadow: 0 0 20px rgba(16, 185, 129, 0.2);
 }
 """
 with gr.Blocks(css=css, theme=gr.themes.Base(), title="Enterprise RAG") as demo:
@@ -311,9 +377,9 @@ with gr.Blocks(css=css, theme=gr.themes.Base(), title="Enterprise RAG") as demo:
             </div>
         """)
-        with gr.Row(equal_height=True):  # Force Row to try to equalize height
-            # --- LEFT: SETUP CARD ---
-            with gr.Column(scale=4):
                 with gr.Group(elem_classes="glass-card"):
                     gr.Markdown(
                         "### SELECT SAMPLE DOCUMENTS", elem_classes="card-header"
@@ -342,13 +408,13 @@ with gr.Blocks(css=css, theme=gr.themes.Base(), title="Enterprise RAG") as demo:
                     # Visible Divider - Increased Opacity
                     gr.HTML(
-                        '<div style="margin: 2rem 0; height: 1px; background: rgba(255,255,255,0.3);"></div>'
                     )
                     gr.Markdown("### OR UPLOAD FILES", elem_classes="card-header")
                     file_upload = gr.File(
                         file_types=[".pdf", ".docx", ".txt"],
-                        show_label=False,
                         height=240,  # Increased height
                     )
@@ -360,13 +426,32 @@ with gr.Blocks(css=css, theme=gr.themes.Base(), title="Enterprise RAG") as demo:
                     )
                     upload_status = gr.Markdown("")
-                    # Spacer to fill height if needed
-                    gr.HTML('<div style="flex-grow: 1;"></div>')
-            # --- RIGHT: INTERACTION CARD ---
-            with gr.Column(scale=6):
                 with gr.Group(elem_classes="glass-card"):
-                    gr.Markdown("### INTELLIGENT ANALYSIS", elem_classes="card-header")
                     # Question Input
                     question = gr.Textbox(
@@ -418,6 +503,11 @@ with gr.Blocks(css=css, theme=gr.themes.Base(), title="Enterprise RAG") as demo:
     process_btn.click(fn=app.process_file, inputs=file_upload, outputs=upload_status)
     q1.click(
         fn=lambda: f"**Query:** Termination Terms\n\n{app.ask('What are the termination conditions?')}",
         outputs=answer,

         except Exception as e:
             return f"Error: {str(e)}"
+    def switch_model(self, model_choice):
+        """Handle model switching from UI radio button"""
+        # Map UI choices to model keys
+        model_map = {
+            "GPT-OSS 120B (OpenAI) - Default": "gpt-oss-120b",
+            "Llama 3.3 70B (Meta)": "llama-3.3-70b",
+            "Gemma 3 27B (Google)": "gemma-3-27b",
+        }
+        model_key = model_map.get(model_choice)
+        if not model_key:
+            return "❌ Invalid model selection"
+        try:
+            display_name = self.rag_pipeline.switch_model(model_key)
+            return f"✓ Switched to {display_name}"
+        except Exception as e:
+            return f"❌ Error switching model: {str(e)}"
     def ask(self, question):
         if not self.loaded_documents:
             return "Please load documents first"
     -webkit-backdrop-filter: blur(12px);
     border: 1px solid var(--border-glass) !important;
     border-radius: 20px !important;
+    padding: 2rem 2rem 1.5rem 2rem !important; /* Reduced bottom padding */
     margin-bottom: 2rem !important;
     box-shadow: 0 20px 40px -10px rgba(0,0,0,0.5) !important;
+    height: 100% !important;
     display: flex !important;
     flex-direction: column !important;
 }
 /* Upload Area specific */
 .gradio-file {
+    background-color: rgba(0, 0, 0, 0.15) !important;
+    border: 2px dashed rgba(255, 255, 255, 0.3) !important;
     border-radius: 12px !important;
+    padding: 1rem !important;
+}
+.gradio-file:hover {
+    background-color: rgba(0, 0, 0, 0.2) !important;
+    border-color: var(--accent) !important;
 }
 .gradio-dropdown:hover, .gradio-textbox textarea:hover {
     background: rgba(16, 185, 129, 0.25);
     box-shadow: 0 0 20px rgba(16, 185, 129, 0.2);
 }
+/* --- MODEL SELECTOR --- */
+.model-selector {
+    background: rgba(0, 0, 0, 0.15) !important;
+    border-radius: 8px !important;
+    padding: 0.75rem !important;
+    margin-bottom: 1rem !important;
+    border: 1px solid var(--border-glass) !important;
+}
+.model-selector label {
+    background: rgba(255, 255, 255, 0.05) !important;
+    border: 1px solid var(--border-glass) !important;
+    padding: 0.5rem 0.75rem !important;
+    border-radius: 6px !important;
+    transition: all 0.2s !important;
+    cursor: pointer !important;
+    margin: 0.2rem 0 !important;
+    display: block !important;
+    font-size: 0.875rem !important;
+}
+.model-selector label:hover {
+    background: rgba(255, 255, 255, 0.1) !important;
+    border-color: var(--accent) !important;
+    transform: translateX(3px) !important;
+}
+.model-selector input:checked + label {
+    background: var(--primary-gradient) !important;
+    border-color: transparent !important;
+    font-weight: 600 !important;
+    box-shadow: 0 3px 12px rgba(16, 185, 129, 0.3) !important;
+}
+.model-status {
+    font-size: 0.8rem;
+    color: var(--text-secondary);
+    padding: 0.25rem 0.5rem;
+    margin-top: 0.1rem;
+}
 """
 with gr.Blocks(css=css, theme=gr.themes.Base(), title="Enterprise RAG") as demo:
             </div>
         """)
+        with gr.Row(equal_height=True):
+            # --- LEFT: SETUP CARD (45%) ---
+            with gr.Column(scale=9):
                 with gr.Group(elem_classes="glass-card"):
                     gr.Markdown(
                         "### SELECT SAMPLE DOCUMENTS", elem_classes="card-header"
                     # Visible Divider - Increased Opacity
                     gr.HTML(
+                        '<div style="margin: 2rem 0; height: 1px; background: rgba(255,255,255,0.5);"></div>'
                     )
                     gr.Markdown("### OR UPLOAD FILES", elem_classes="card-header")
                     file_upload = gr.File(
                         file_types=[".pdf", ".docx", ".txt"],
+                        show_label=True,
                         height=240,  # Increased height
                     )
                     )
                     upload_status = gr.Markdown("")
+                    # Divider
+                    gr.HTML(
+                        '<div style="margin: 1rem 0; height: 1px; background: rgba(255,255,255,0.15);"></div>'
+                    )
+                    # Model Selector (Compact)
+                    gr.Markdown("**🤖 AI Model**", elem_classes="card-subheader")
+                    model_selector = gr.Radio(
+                        choices=[
+                            "GPT-OSS 120B (OpenAI) - Default",
+                            "Llama 3.3 70B (Meta)",
+                            "Gemma 3 27B (Google)",
+                        ],
+                        value="GPT-OSS 120B (OpenAI) - Default",
+                        elem_classes="model-selector",
+                        show_label=False,
+                    )
+                    model_status = gr.Markdown(
+                        "_GPT-OSS 120B active_",
+                        elem_classes="model-status",
+                    )
+            # --- RIGHT: INTERACTION CARD (55%) ---
+            with gr.Column(scale=11):
                 with gr.Group(elem_classes="glass-card"):
+                    gr.Markdown("### ASK ANYTHING", elem_classes="card-header")
                     # Question Input
                     question = gr.Textbox(
     process_btn.click(fn=app.process_file, inputs=file_upload, outputs=upload_status)
+    # Model switching
+    model_selector.change(
+        fn=app.switch_model, inputs=model_selector, outputs=model_status
+    )
     q1.click(
         fn=lambda: f"**Query:** Termination Terms\n\n{app.ask('What are the termination conditions?')}",
         outputs=answer,
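
The handler in this file repeats the radio labels in both `model_map` and the `gr.Radio` choices, so the two lists can drift apart. One possible way to keep them in sync is to derive the labels from the `display` field of `MODEL_CONFIG`. This is a sketch under that assumption: `build_label_map`, `resolve_model_key`, `LABEL_TO_KEY`, and `DEFAULT_MODEL` are hypothetical names, not part of the commit.

```python
# Trimmed copy of MODEL_CONFIG from app/rag_pipeline.py (display names only).
MODEL_CONFIG = {
    "gpt-oss-120b": {"display": "GPT-OSS 120B (OpenAI)"},
    "llama-3.3-70b": {"display": "Llama 3.3 70B (Meta)"},
    "gemma-3-27b": {"display": "Gemma 3 27B (Google)"},
}
DEFAULT_MODEL = "gpt-oss-120b"


def build_label_map(config, default_key):
    """Map each UI label to its model key, marking the default model."""
    labels = {}
    for key, entry in config.items():
        suffix = " - Default" if key == default_key else ""
        labels[entry["display"] + suffix] = key
    return labels


LABEL_TO_KEY = build_label_map(MODEL_CONFIG, DEFAULT_MODEL)


def resolve_model_key(label):
    """Return the model key for a UI label, or None if the label is unknown."""
    return LABEL_TO_KEY.get(label)


if __name__ == "__main__":
    # The dict keys can double as the choices list for the radio component.
    print(list(LABEL_TO_KEY))
```

The labels produced here are the same three strings the commit hard-codes, so the radio choices and the lookup could both read from `LABEL_TO_KEY`.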

app/rag_pipeline.py CHANGED

@@ -15,13 +15,39 @@ os.environ["TOKENIZERS_PARALLELISM"] = "false"
 class RAGPipeline:
-    def __init__(self, persist_directory: str = "./data/chroma_db"):
         """
-        Initialize RAG pipeline with embeddings, vector store, and LLM.
-        Sets up rate limiting (10 queries/hour) and uses OpenRouter API with free Gemma model.
         Args:
             persist_directory: Path to store ChromaDB vector database (default: ./data/chroma_db)
         """
         # Initialize better embeddings (BAAI/bge-small-en-v1.5)
         self.embeddings = HuggingFaceEmbeddings(
@@ -47,25 +73,94 @@ class RAGPipeline:
         # Auto-cleanup on initialization
         self._cleanup_old_documents()
-        # Initialize LLM using OpenRouter (cheapest free option)
-        openrouter_key = os.getenv("OPENROUTER_API_KEY")
-        if not openrouter_key:
             raise ValueError(
-                "OPENROUTER_API_KEY environment variable not set. "
-                "Get one free at https://openrouter.ai/keys"
             )
-        # Using google/gemma-3-4b-it:free - free tier on OpenRouter
-        self.llm = ChatOpenAI(
-            model="google/gemma-3-4b-it:free",
-            openai_api_key=openrouter_key,
-            openai_api_base="https://openrouter.ai/api/v1",
-            temperature=0.1,
-            max_tokens=512,
-        )
-        # Create RAG chain
         self.rag_chain = self.create_rag_chain()
     def create_rag_chain(self):
         """

 class RAGPipeline:
+    # Model configuration for multi-provider support
+    MODEL_CONFIG = {
+        "gpt-oss-120b": {
+            "provider": "groq",
+            "model": "openai/gpt-oss-120b",
+            "display": "GPT-OSS 120B (OpenAI)",
+            "temperature": 0.1,
+            "max_tokens": 1024,
+        },
+        "llama-3.3-70b": {
+            "provider": "groq",
+            "model": "llama-3.3-70b-versatile",
+            "display": "Llama 3.3 70B (Meta)",
+            "temperature": 0.1,
+            "max_tokens": 1024,
+        },
+        "gemma-3-27b": {
+            "provider": "openrouter",
+            "model": "google/gemma-3-27b-it:free",
+            "display": "Gemma 3 27B (Google)",
+            "temperature": 0.1,
+            "max_tokens": 512,
+        },
+    }
+    def __init__(self, persist_directory: str = "./data/chroma_db", default_model: str = "gpt-oss-120b"):
         """
+        Initialize RAG pipeline with embeddings, vector store, and multi-provider LLM support.
+        Sets up rate limiting (10 queries/hour) and supports Groq + OpenRouter APIs.
         Args:
             persist_directory: Path to store ChromaDB vector database (default: ./data/chroma_db)
+            default_model: Model key from MODEL_CONFIG (default: gpt-oss-120b)
         """
         # Initialize better embeddings (BAAI/bge-small-en-v1.5)
         self.embeddings = HuggingFaceEmbeddings(
         # Auto-cleanup on initialization
         self._cleanup_old_documents()
+        # Initialize LLM with default model
+        self.current_model = default_model
+        self.llm = self._initialize_llm(default_model)
+        # Create RAG chain
+        self.rag_chain = self.create_rag_chain()
+    def _initialize_llm(self, model_key: str):
+        """
+        Initialize LLM based on provider and model configuration.
+        Supports both Groq and OpenRouter providers.
+        Args:
+            model_key: Key from MODEL_CONFIG dictionary
+        Returns:
+            ChatOpenAI: Configured LLM instance
+        Raises:
+            ValueError: If model_key is invalid or required API key is missing
+        """
+        if model_key not in self.MODEL_CONFIG:
             raise ValueError(
+                f"Invalid model key: {model_key}. "
+                f"Available models: {', '.join(self.MODEL_CONFIG.keys())}"
+            )
+        config = self.MODEL_CONFIG[model_key]
+        provider = config["provider"]
+        if provider == "groq":
+            # Groq API configuration
+            groq_key = os.getenv("GROQ_API_KEY")
+            if not groq_key:
+                raise ValueError(
+                    "GROQ_API_KEY environment variable not set. "
+                    "Get one free at https://console.groq.com/keys"
+                )
+            return ChatOpenAI(
+                model=config["model"],
+                openai_api_key=groq_key,
+                openai_api_base="https://api.groq.com/openai/v1",
+                temperature=config["temperature"],
+                max_tokens=config["max_tokens"],
             )
+        elif provider == "openrouter":
+            # OpenRouter API configuration
+            openrouter_key = os.getenv("OPENROUTER_API_KEY")
+            if not openrouter_key:
+                raise ValueError(
+                    "OPENROUTER_API_KEY environment variable not set. "
+                    "Get one free at https://openrouter.ai/keys"
+                )
+            return ChatOpenAI(
+                model=config["model"],
+                openai_api_key=openrouter_key,
+                openai_api_base="https://openrouter.ai/api/v1",
+                temperature=config["temperature"],
+                max_tokens=config["max_tokens"],
+            )
+        else:
+            raise ValueError(f"Unknown provider: {provider}")
+    def switch_model(self, model_key: str) -> str:
+        """
+        Dynamically switch to a different LLM model and recreate the RAG chain.
+        Args:
+            model_key: Key from MODEL_CONFIG dictionary
+        Returns:
+            str: Display name of the switched model
+        Raises:
+            ValueError: If model_key is invalid or API key is missing
+        """
+        # Initialize new LLM
+        self.llm = self._initialize_llm(model_key)
+        self.current_model = model_key
+        # Recreate RAG chain with new LLM
         self.rag_chain = self.create_rag_chain()
+        return self.MODEL_CONFIG[model_key]["display"]
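
The two provider branches in `_initialize_llm` differ only in the environment variable, the base URL, and the sign-up link. A table-driven variant is one way to remove that duplication. The sketch below is an assumption, not the commit's code: `PROVIDERS` and `build_llm_kwargs` are hypothetical names, and the function returns the keyword arguments that would be passed to `ChatOpenAI` rather than constructing the client, so it runs without LangChain installed.

```python
import os

# Provider endpoints and key locations as they appear in the diff.
PROVIDERS = {
    "groq": {
        "env_var": "GROQ_API_KEY",
        "base_url": "https://api.groq.com/openai/v1",
        "signup": "https://console.groq.com/keys",
    },
    "openrouter": {
        "env_var": "OPENROUTER_API_KEY",
        "base_url": "https://openrouter.ai/api/v1",
        "signup": "https://openrouter.ai/keys",
    },
}


def build_llm_kwargs(config, env=None):
    """Build ChatOpenAI-style keyword arguments for one MODEL_CONFIG entry."""
    env = os.environ if env is None else env
    provider = PROVIDERS.get(config["provider"])
    if provider is None:
        raise ValueError(f"Unknown provider: {config['provider']}")
    api_key = env.get(provider["env_var"])
    if not api_key:
        raise ValueError(
            f"{provider['env_var']} environment variable not set. "
            f"Get one free at {provider['signup']}"
        )
    return {
        "model": config["model"],
        "openai_api_key": api_key,
        "openai_api_base": provider["base_url"],
        "temperature": config["temperature"],
        "max_tokens": config["max_tokens"],
    }
```

Under this layout, supporting another OpenAI-compatible provider would mean adding one entry to `PROVIDERS` instead of another `elif` branch.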
     def create_rag_chain(self):
         """

docs/DESIGN_DECISIONS.md CHANGED

@@ -8,12 +8,22 @@
 | **Embeddings** | bge-small-en-v1.5 | Best quality/speed ratio on MTEB |
 | **Vector DB** | ChromaDB | Embedded, persistent, no server |
 | **Retrieval** | Top-4 cosine | k=4 tested optimal (vs k=2,8,16) |
-| **LLM** | Gemma 3-4B via OpenRouter | Free tier, citation-friendly |
 | **Rate limit** | 10/hour | Prevents API abuse |
 | **Cleanup** | 7-day auto-delete | Privacy without user friction |
 ---
 ## Trade-offs Acknowledged
 - **Speed vs Quality**: Using smaller embeddings (384-dim) trades ~2% accuracy for 3x speed

 | **Embeddings** | bge-small-en-v1.5 | Best quality/speed ratio on MTEB |
 | **Vector DB** | ChromaDB | Embedded, persistent, no server |
 | **Retrieval** | Top-4 cosine | k=4 tested optimal (vs k=2,8,16) |
+| **LLM** | GPT-OSS 120B (default), Llama 3.3 70B, Gemma 3 27B | Multi-provider flexibility via Groq + OpenRouter |
 | **Rate limit** | 10/hour | Prevents API abuse |
 | **Cleanup** | 7-day auto-delete | Privacy without user friction |
 ---
+## Model Selection Rationale
+| Model | Provider | Use Case | Strengths |
+|-------|----------|----------|------------|
+| **GPT-OSS 120B** (Default) | Groq | General enterprise Q&A | Best quality, fast inference, OpenAI architecture |
+| **Llama 3.3 70B** | Groq | Complex reasoning | Open-source, strong context understanding |
+| **Gemma 3 27B** | OpenRouter | Cost-optimized | Free tier, Google-trained, efficient |
+---
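
The 10/hour rate limit listed in the decisions table is not shown in this diff. As a rough illustration only, a sliding-window limiter could look like the following. The class name `SlidingWindowLimiter` and its interface are assumptions, and the project's actual implementation may differ.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls within any `window` seconds."""

    def __init__(self, limit=10, window=3600.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so tests can control time
        self.calls = deque()

    def allow(self):
        """Record a call and return True, or return False if over the limit."""
        now = self.clock()
        # Drop timestamps that have fallen out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) >= self.limit:
            return False
        self.calls.append(now)
        return True
```

A caller would check `limiter.allow()` before forwarding a question to the LLM and return a "limit reached" message otherwise.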
 ## Trade-offs Acknowledged
 - **Speed vs Quality**: Using smaller embeddings (384-dim) trades ~2% accuracy for 3x speed