pkgprateek committed on
Commit bb9f87e · 1 Parent(s): 604aa46

feat: Add multi-provider LLM support with UI model selector

- Add Groq + OpenRouter provider abstraction
- Support 3 models: GPT-OSS 120B (default), Llama 3.3 70B, Gemma 3 27B
- Premium UI model selector with 45:55 column layout
- Update docs and configuration for multi-provider setup

Files changed (6)
  1. .env.example +7 -2
  2. README-HF.md +4 -2
  3. README.md +8 -5
  4. app/main.py +104 -14
  5. app/rag_pipeline.py +112 -17
  6. docs/DESIGN_DECISIONS.md +11 -1
.env.example CHANGED
@@ -1,6 +1,11 @@
  # Environment Variables

- # OpenRouter API Key (Required)
+ # Groq API Key (Required - Default Provider)
+ # Get your FREE key at: https://console.groq.com/keys
+ # Provides access to GPT-OSS 120B (default) and Llama 3.3 70B models
+ GROQ_API_KEY=your_groq_api_key_here
+
+ # OpenRouter API Key (Optional - For Gemma Model)
  # Get your FREE key at: https://openrouter.ai/keys
- # Using free tier with google/gemma-3-4b-it:free model
+ # Using free tier with google/gemma-3-27b-it:free model
  OPENROUTER_API_KEY=your_openrouter_api_key_here
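The comments in `.env.example` mark `GROQ_API_KEY` as required and `OPENROUTER_API_KEY` as optional. A minimal sketch of a startup check that reflects that split is shown here. The helpers `available_providers` and `check_required` and the `PROVIDER_ENV_VARS` table are hypothetical and not part of this commit.

```python
import os

# Hypothetical mapping (not in the commit): provider name -> environment variable.
PROVIDER_ENV_VARS = {
    "groq": "GROQ_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}


def available_providers(env=None):
    """Return the providers whose API key is set to a non-empty value."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_ENV_VARS.items() if env.get(var)]


def check_required(env=None):
    """Raise if the default provider (Groq) has no key; return usable providers."""
    providers = available_providers(env)
    if "groq" not in providers:
        raise ValueError("GROQ_API_KEY environment variable not set.")
    return providers
```

Note that this only checks for a non-empty value, so a placeholder such as `your_groq_api_key_here` would still count as set.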
README-HF.md CHANGED
@@ -59,12 +59,14 @@ No signup required. Your documents are processed locally and auto-deleted after
  ```bash
  git clone https://github.com/pkgprateek/rag-document-qa-workflow.git
  cd rag-document-qa-workflow
- echo "OPENROUTER_API_KEY=your_key" > .env
+ echo "GROQ_API_KEY=your_key" > .env
+ echo "OPENROUTER_API_KEY=your_key" >> .env
  docker compose up
  # → http://localhost:7860
  ```

- [Get free API key](https://openrouter.ai/keys) · [View source on GitHub](https://github.com/pkgprateek/rag-document-qa-workflow)
+ **Get Free API Keys:** [Groq](https://console.groq.com/keys) (Required) · [OpenRouter](https://openrouter.ai/keys) (Optional)
+ [View source on GitHub](https://github.com/pkgprateek/rag-document-qa-workflow)

  ---
README.md CHANGED
@@ -46,13 +46,13 @@ flowchart TB
  end

  subgraph Generation ["✨ Generation"]
- G["🤖 Gemma 3-4B-IT"]
+ G["🤖 Multi-Provider LLM<br/>GPT-OSS 120B (default)<br/>Llama 3.3 70B · Gemma 3 27B"]
  H["📝 Cited Answer"]
  F --> G --> H
  end
  ```

- **Stack**: LangChain 1.0.7 · ChromaDB 1.3.4 · sentence-transformers · OpenRouter
+ **Stack**: LangChain 1.0.7 · ChromaDB 1.3.4 · sentence-transformers · Groq + OpenRouter

  ---

@@ -63,8 +63,9 @@ flowchart TB
  git clone https://github.com/pkgprateek/rag-document-qa-workflow.git
  cd rag-document-qa-workflow

- # Set your API key (free from OpenRouter)
- echo "OPENROUTER_API_KEY=your_key_here" > .env
+ # Set your API keys (both free)
+ echo "GROQ_API_KEY=your_key_here" > .env
+ echo "OPENROUTER_API_KEY=your_key_here" >> .env

  # Run with Docker (recommended)
  docker compose up
@@ -83,7 +84,9 @@ python app/main.py

  </details>

- 🔑 [Get free OpenRouter API key](https://openrouter.ai/keys)
+ 🔑 **Get Your Free API Keys**
+ - [Groq API key](https://console.groq.com/keys) (Required - GPT-OSS & Llama models)
+ - [OpenRouter API key](https://openrouter.ai/keys) (Optional - Gemma model)

  ---
app/main.py CHANGED
@@ -61,6 +61,25 @@ class DocumentRagApp:
          except Exception as e:
              return f"Error: {str(e)}"

+     def switch_model(self, model_choice):
+         """Handle model switching from UI radio button"""
+         # Map UI choices to model keys
+         model_map = {
+             "GPT-OSS 120B (OpenAI) - Default": "gpt-oss-120b",
+             "Llama 3.3 70B (Meta)": "llama-3.3-70b",
+             "Gemma 3 27B (Google)": "gemma-3-27b",
+         }
+
+         model_key = model_map.get(model_choice)
+         if not model_key:
+             return f"❌ Invalid model selection"
+
+         try:
+             display_name = self.rag_pipeline.switch_model(model_key)
+             return f"✓ Switched to {display_name}"
+         except Exception as e:
+             return f"❌ Error switching model: {str(e)}"
+
      def ask(self, question):
          if not self.loaded_documents:
              return "Please load documents first"
@@ -168,10 +187,10 @@ span, p, div { font-family: var(--font-body); }
      -webkit-backdrop-filter: blur(12px);
      border: 1px solid var(--border-glass) !important;
      border-radius: 20px !important;
-     padding: 2rem !important; /* Internal padding for the card content */
+     padding: 2rem 2rem 1.5rem 2rem !important; /* Reduced bottom padding */
      margin-bottom: 2rem !important;
      box-shadow: 0 20px 40px -10px rgba(0,0,0,0.5) !important;
-     height: 100% !important; /* Attempt to stretch */
+     height: 100% !important;
      display: flex !important;
      flex-direction: column !important;
  }
@@ -197,9 +216,15 @@ span, p, div { font-family: var(--font-body); }

  /* Upload Area specific */
  .gradio-file {
-     background-color: rgba(0, 0, 0, 0.2) !important;
-     border: 2px dashed rgba(255, 255, 255, 0.3) !important; /* Brighter border */
+     background-color: rgba(0, 0, 0, 0.15) !important;
+     border: 2px dashed rgba(255, 255, 255, 0.3) !important;
      border-radius: 12px !important;
+     padding: 1rem !important;
+ }
+
+ .gradio-file:hover {
+     background-color: rgba(0, 0, 0, 0.2) !important;
+     border-color: var(--accent) !important;
  }

  .gradio-dropdown:hover, .gradio-textbox textarea:hover {
@@ -294,6 +319,47 @@ span, p, div { font-family: var(--font-body); }
      background: rgba(16, 185, 129, 0.25);
      box-shadow: 0 0 20px rgba(16, 185, 129, 0.2);
  }
+
+ /* --- MODEL SELECTOR --- */
+ .model-selector {
+     background: rgba(0, 0, 0, 0.15) !important;
+     border-radius: 8px !important;
+     padding: 0.75rem !important;
+     margin-bottom: 1rem !important;
+     border: 1px solid var(--border-glass) !important;
+ }
+
+ .model-selector label {
+     background: rgba(255, 255, 255, 0.05) !important;
+     border: 1px solid var(--border-glass) !important;
+     padding: 0.5rem 0.75rem !important;
+     border-radius: 6px !important;
+     transition: all 0.2s !important;
+     cursor: pointer !important;
+     margin: 0.2rem 0 !important;
+     display: block !important;
+     font-size: 0.875rem !important;
+ }
+
+ .model-selector label:hover {
+     background: rgba(255, 255, 255, 0.1) !important;
+     border-color: var(--accent) !important;
+     transform: translateX(3px) !important;
+ }
+
+ .model-selector input:checked + label {
+     background: var(--primary-gradient) !important;
+     border-color: transparent !important;
+     font-weight: 600 !important;
+     box-shadow: 0 3px 12px rgba(16, 185, 129, 0.3) !important;
+ }
+
+ .model-status {
+     font-size: 0.8rem;
+     color: var(--text-secondary);
+     padding: 0.25rem 0.5rem;
+     margin-top: 0.1rem;
+ }
  """

  with gr.Blocks(css=css, theme=gr.themes.Base(), title="Enterprise RAG") as demo:
@@ -311,9 +377,9 @@ with gr.Blocks(css=css, theme=gr.themes.Base(), title="Enterprise RAG") as demo:
      </div>
      """)

-     with gr.Row(equal_height=True): # Force Row to try to equalize height
-         # --- LEFT: SETUP CARD ---
-         with gr.Column(scale=4):
+     with gr.Row(equal_height=True):
+         # --- LEFT: SETUP CARD (45%) ---
+         with gr.Column(scale=9):
              with gr.Group(elem_classes="glass-card"):
                  gr.Markdown(
                      "### SELECT SAMPLE DOCUMENTS", elem_classes="card-header"
@@ -342,13 +408,13 @@ with gr.Blocks(css=css, theme=gr.themes.Base(), title="Enterprise RAG") as demo:

                  # Visible Divider - Increased Opacity
                  gr.HTML(
-                     '<div style="margin: 2rem 0; height: 1px; background: rgba(255,255,255,0.3);"></div>'
+                     '<div style="margin: 2rem 0; height: 1px; background: rgba(255,255,255,0.5);"></div>'
                  )

                  gr.Markdown("### OR UPLOAD FILES", elem_classes="card-header")
                  file_upload = gr.File(
                      file_types=[".pdf", ".docx", ".txt"],
-                     show_label=False,
+                     show_label=True,
                      height=240, # Increased height
                  )

@@ -360,13 +426,32 @@ with gr.Blocks(css=css, theme=gr.themes.Base(), title="Enterprise RAG") as demo:
                  )
                  upload_status = gr.Markdown("")

-                 # Spacer to fill height if needed
-                 gr.HTML('<div style="flex-grow: 1;"></div>')
+                 # Divider
+                 gr.HTML(
+                     '<div style="margin: 1rem 0; height: 1px; background: rgba(255,255,255,0.15);"></div>'
+                 )
+
+                 # Model Selector (Compact)
+                 gr.Markdown("**🤖 AI Model**", elem_classes="card-subheader")
+                 model_selector = gr.Radio(
+                     choices=[
+                         "GPT-OSS 120B (OpenAI) - Default",
+                         "Llama 3.3 70B (Meta)",
+                         "Gemma 3 27B (Google)",
+                     ],
+                     value="GPT-OSS 120B (OpenAI) - Default",
+                     elem_classes="model-selector",
+                     show_label=False,
+                 )
+                 model_status = gr.Markdown(
+                     "_GPT-OSS 120B active_",
+                     elem_classes="model-status",
+                 )

-         # --- RIGHT: INTERACTION CARD ---
-         with gr.Column(scale=6):
+         # --- RIGHT: INTERACTION CARD (55%) ---
+         with gr.Column(scale=11):
              with gr.Group(elem_classes="glass-card"):
-                 gr.Markdown("### INTELLIGENT ANALYSIS", elem_classes="card-header")
+                 gr.Markdown("### ASK ANYTHING", elem_classes="card-header")

                  # Question Input
                  question = gr.Textbox(
@@ -418,6 +503,11 @@ with gr.Blocks(css=css, theme=gr.themes.Base(), title="Enterprise RAG") as demo:

      process_btn.click(fn=app.process_file, inputs=file_upload, outputs=upload_status)

+     # Model switching
+     model_selector.change(
+         fn=app.switch_model, inputs=model_selector, outputs=model_status
+     )
+
      q1.click(
          fn=lambda: f"**Query:** Termination Terms\n\n{app.ask('What are the termination conditions?')}",
          outputs=answer,
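In this diff the UI labels appear twice in `app/main.py`: once in `model_map` inside `switch_model` and once in the `gr.Radio` choices. One way to keep them in sync, sketched here under the assumption that the display names from `MODEL_CONFIG` in `app/rag_pipeline.py` are reused, is to build both from a single table. `MODEL_DISPLAY`, `DEFAULT_MODEL`, and `build_label_map` are hypothetical names, not code from the commit.

```python
# Display names copied from MODEL_CONFIG in app/rag_pipeline.py (illustrative subset).
MODEL_DISPLAY = {
    "gpt-oss-120b": "GPT-OSS 120B (OpenAI)",
    "llama-3.3-70b": "Llama 3.3 70B (Meta)",
    "gemma-3-27b": "Gemma 3 27B (Google)",
}
DEFAULT_MODEL = "gpt-oss-120b"


def build_label_map(display_names, default_key):
    """Map each UI label to its model key, marking the default entry."""
    label_map = {}
    for key, display in display_names.items():
        label = f"{display} - Default" if key == default_key else display
        label_map[label] = key
    return label_map


LABEL_TO_KEY = build_label_map(MODEL_DISPLAY, DEFAULT_MODEL)
# The same labels could then be passed to the radio component as its choices.
RADIO_CHOICES = list(LABEL_TO_KEY)
```

With this shape, adding a model would mean touching one table instead of two string lists.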
app/rag_pipeline.py CHANGED
@@ -15,13 +15,39 @@ os.environ["TOKENIZERS_PARALLELISM"] = "false"


  class RAGPipeline:
-     def __init__(self, persist_directory: str = "./data/chroma_db"):
+     # Model configuration for multi-provider support
+     MODEL_CONFIG = {
+         "gpt-oss-120b": {
+             "provider": "groq",
+             "model": "openai/gpt-oss-120b",
+             "display": "GPT-OSS 120B (OpenAI)",
+             "temperature": 0.1,
+             "max_tokens": 1024,
+         },
+         "llama-3.3-70b": {
+             "provider": "groq",
+             "model": "llama-3.3-70b-versatile",
+             "display": "Llama 3.3 70B (Meta)",
+             "temperature": 0.1,
+             "max_tokens": 1024,
+         },
+         "gemma-3-27b": {
+             "provider": "openrouter",
+             "model": "google/gemma-3-27b-it:free",
+             "display": "Gemma 3 27B (Google)",
+             "temperature": 0.1,
+             "max_tokens": 512,
+         },
+     }
+
+     def __init__(self, persist_directory: str = "./data/chroma_db", default_model: str = "gpt-oss-120b"):
          """
-         Initialize RAG pipeline with embeddings, vector store, and LLM.
-         Sets up rate limiting (10 queries/hour) and uses OpenRouter API with free Gemma model.
+         Initialize RAG pipeline with embeddings, vector store, and multi-provider LLM support.
+         Sets up rate limiting (10 queries/hour) and supports Groq + OpenRouter APIs.

          Args:
              persist_directory: Path to store ChromaDB vector database (default: ./data/chroma_db)
+             default_model: Model key from MODEL_CONFIG (default: gpt-oss-120b)
          """
          # Initialize better embeddings (BAAI/bge-small-en-v1.5)
          self.embeddings = HuggingFaceEmbeddings(
@@ -47,25 +73,94 @@ class RAGPipeline:
          # Auto-cleanup on initialization
          self._cleanup_old_documents()

-         # Initialize LLM using OpenRouter (cheapest free option)
-         openrouter_key = os.getenv("OPENROUTER_API_KEY")
-         if not openrouter_key:
+         # Initialize LLM with default model
+         self.current_model = default_model
+         self.llm = self._initialize_llm(default_model)
+
+         # Create RAG chain
+         self.rag_chain = self.create_rag_chain()
+
+     def _initialize_llm(self, model_key: str):
+         """
+         Initialize LLM based on provider and model configuration.
+         Supports both Groq and OpenRouter providers.
+
+         Args:
+             model_key: Key from MODEL_CONFIG dictionary
+
+         Returns:
+             ChatOpenAI: Configured LLM instance
+
+         Raises:
+             ValueError: If model_key is invalid or required API key is missing
+         """
+         if model_key not in self.MODEL_CONFIG:
              raise ValueError(
-                 "OPENROUTER_API_KEY environment variable not set. "
-                 "Get one free at https://openrouter.ai/keys"
+                 f"Invalid model key: {model_key}. "
+                 f"Available models: {', '.join(self.MODEL_CONFIG.keys())}"
+             )
+
+         config = self.MODEL_CONFIG[model_key]
+         provider = config["provider"]
+
+         if provider == "groq":
+             # Groq API configuration
+             groq_key = os.getenv("GROQ_API_KEY")
+             if not groq_key:
+                 raise ValueError(
+                     "GROQ_API_KEY environment variable not set. "
+                     "Get one free at https://console.groq.com/keys"
+                 )
+
+             return ChatOpenAI(
+                 model=config["model"],
+                 openai_api_key=groq_key,
+                 openai_api_base="https://api.groq.com/openai/v1",
+                 temperature=config["temperature"],
+                 max_tokens=config["max_tokens"],
              )
+
+         elif provider == "openrouter":
+             # OpenRouter API configuration
+             openrouter_key = os.getenv("OPENROUTER_API_KEY")
+             if not openrouter_key:
+                 raise ValueError(
+                     "OPENROUTER_API_KEY environment variable not set. "
+                     "Get one free at https://openrouter.ai/keys"
+                 )
+
+             return ChatOpenAI(
+                 model=config["model"],
+                 openai_api_key=openrouter_key,
+                 openai_api_base="https://openrouter.ai/api/v1",
+                 temperature=config["temperature"],
+                 max_tokens=config["max_tokens"],
+             )
+
+         else:
+             raise ValueError(f"Unknown provider: {provider}")
+
+     def switch_model(self, model_key: str) -> str:
+         """
+         Dynamically switch to a different LLM model and recreate the RAG chain.

-         # Using google/gemma-3-4b-it:free - free tier on OpenRouter
-         self.llm = ChatOpenAI(
-             model="google/gemma-3-4b-it:free",
-             openai_api_key=openrouter_key,
-             openai_api_base="https://openrouter.ai/api/v1",
-             temperature=0.1,
-             max_tokens=512,
-         )
+         Args:
+             model_key: Key from MODEL_CONFIG dictionary

-         # Create RAG chain
+         Returns:
+             str: Display name of the switched model
+
+         Raises:
+             ValueError: If model_key is invalid or API key is missing
+         """
+         # Initialize new LLM
+         self.llm = self._initialize_llm(model_key)
+         self.current_model = model_key
+
+         # Recreate RAG chain with new LLM
          self.rag_chain = self.create_rag_chain()
+
+         return self.MODEL_CONFIG[model_key]["display"]

      def create_rag_chain(self):
          """
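The two provider branches in `_initialize_llm` differ only in the environment variable, the base URL, and the key-signup link. A minimal sketch of a table-driven alternative follows, assuming a hypothetical `PROVIDERS` table and `build_llm_kwargs` helper (neither is in the commit). It returns plain keyword arguments rather than constructing `ChatOpenAI`, so it runs without LangChain installed.

```python
import os

# Provider settings copied from the diff above; the table itself is hypothetical.
PROVIDERS = {
    "groq": {
        "env_var": "GROQ_API_KEY",
        "base_url": "https://api.groq.com/openai/v1",
        "keys_url": "https://console.groq.com/keys",
    },
    "openrouter": {
        "env_var": "OPENROUTER_API_KEY",
        "base_url": "https://openrouter.ai/api/v1",
        "keys_url": "https://openrouter.ai/keys",
    },
}


def build_llm_kwargs(config, env=None):
    """Return keyword arguments for the LLM client given one MODEL_CONFIG entry."""
    env = os.environ if env is None else env
    provider = PROVIDERS.get(config["provider"])
    if provider is None:
        raise ValueError(f"Unknown provider: {config['provider']}")
    api_key = env.get(provider["env_var"])
    if not api_key:
        raise ValueError(
            f"{provider['env_var']} environment variable not set. "
            f"Get one free at {provider['keys_url']}"
        )
    # Keyword names mirror the ChatOpenAI arguments used in the commit.
    return {
        "model": config["model"],
        "openai_api_key": api_key,
        "openai_api_base": provider["base_url"],
        "temperature": config["temperature"],
        "max_tokens": config["max_tokens"],
    }
```

Under these assumptions, `_initialize_llm` could reduce to `ChatOpenAI(**build_llm_kwargs(config))`, and adding a third OpenAI-compatible provider would only require a new table entry.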
docs/DESIGN_DECISIONS.md CHANGED
@@ -8,12 +8,22 @@
  | **Embeddings** | bge-small-en-v1.5 | Best quality/speed ratio on MTEB |
  | **Vector DB** | ChromaDB | Embedded, persistent, no server |
  | **Retrieval** | Top-4 cosine | k=4 tested optimal (vs k=2,8,16) |
- | **LLM** | Gemma 3-4B via OpenRouter | Free tier, citation-friendly |
+ | **LLM** | GPT-OSS 120B (default), Llama 3.3 70B, Gemma 3 27B | Multi-provider flexibility via Groq + OpenRouter |
  | **Rate limit** | 10/hour | Prevents API abuse |
  | **Cleanup** | 7-day auto-delete | Privacy without user friction |

  ---

+ ## Model Selection Rationale
+
+ | Model | Provider | Use Case | Strengths |
+ |-------|----------|----------|------------|
+ | **GPT-OSS 120B** (Default) | Groq | General enterprise Q&A | Best quality, fast inference, OpenAI architecture |
+ | **Llama 3.3 70B** | Groq | Complex reasoning | Open-source, strong context understanding |
+ | **Gemma 3 27B** | OpenRouter | Cost-optimized | Free tier, Google-trained, efficient |
+
+ ---
+
  ## Trade-offs Acknowledged

  - **Speed vs Quality**: Using smaller embeddings (384-dim) trades ~2% accuracy for 3x speed