Unified Data Source Priority System
🎯 Updated Priority Order
The system now uses a cost-optimized fallback chain, checking cheaper/faster sources first:
Priority Chain (first checked to last)
1. FAQ Database
   ├─ Source: Database
   ├─ Cost: FREE (just a DB query)
   ├─ Speed: FASTEST (~1ms)
   └─ Use: Exact/similar FAQ matches
2. Website Scraped Content
   ├─ Source: Database
   ├─ Cost: FREE (DB query + text search)
   ├─ Speed: FAST (~5ms)
   └─ Use: Business-specific content
3. Industry Knowledge Base
   ├─ Source: MedQuAD, CourseQ, STACKED datasets
   ├─ Cost: FREE (local data)
   ├─ Speed: FAST (~10ms)
   └─ Use: Healthcare/Education queries
4. Unanswered History
   ├─ Source: Database (learning from past queries)
   ├─ Cost: FREE
   ├─ Speed: FAST (~5ms)
   └─ Use: Similar previously asked questions
5. Small LLM (your existing system)
   ├─ Source: OpenAI/local model
   ├─ Cost: LOW ($)
   ├─ Speed: MODERATE (~1-2s)
   └─ Use: General Q&A fallback
6. Gemini AI (LAST RESORT)
   ├─ Source: Google Gemini
   ├─ Cost: HIGH ($$$$$)
   ├─ Speed: MODERATE (~2-3s)
   ├─ Limit: 50 queries/user/day
   └─ Use: ONLY when nothing else works
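The six-step chain above can be sketched as a simple cascade. The checker names below (`check_faq`, `check_scraped`, etc.) are hypothetical stand-ins for the real lookups, stubbed here so the control flow runs end-to-end:

```python
# Minimal sketch of the fallback chain. All source checkers are illustrative
# stubs, not the project's actual functions.

def check_faq(q):         return None  # 1. FAQ database (stub: always misses)
def check_scraped(q):     return None  # 2. Website scraped content (stub)
def check_industry_kb(q):              # 3. Industry KB (stub: matches "fever")
    return "Possible: Flu" if "fever" in q else None
def check_history(q):     return None  # 4. Unanswered history (stub)
def small_llm(q):         return f"LLM answer to: {q}"     # 5. Cheap paid model
def gemini(q):            return f"Gemini answer to: {q}"  # 6. Expensive model

def under_daily_limit(user_id, used=0, limit=50):
    return used < limit  # stand-in for the per-user daily counter

def answer_query(query, user_id="anon"):
    # Free sources first (cost: $0), in priority order; stop at the first hit.
    for source in (check_faq, check_scraped, check_industry_kb, check_history):
        answer = source(query)
        if answer is not None:
            return answer, "free"
    # Paid fallbacks, cheapest first.
    answer = small_llm(query)
    if answer is not None:
        return answer, "small_llm"
    # Gemini only as a last resort, and only while under the daily limit.
    if under_daily_limit(user_id):
        return gemini(query), "gemini"
    return "Daily limit reached", "blocked"
```

A free-source hit short-circuits the chain, so the paid models are never called for questions the local data can answer.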
💰 Cost Optimization Strategy
Free Sources (Check First)
FAQ → Scraped Content → Industry KB → History
 ↓            ↓              ↓           ↓
 0¢           0¢             0¢          0¢
Paid Sources (Check Last)
Small LLM → Gemini AI
    ↓           ↓
 $0.002       $0.01
 (cheap)   (expensive)
🔄 Example Query Flow
User Query: "mujhe bukhar hai" (Hindi: "I have a fever")

Step 1: FAQ Database
  Query: "fever"
  Result: ❌ Not found
  → Continue

Step 2: Website Scraped Content
  Query: "fever"
  Result: ❌ No scraped content
  → Continue

Step 3: Industry Knowledge Base (Healthcare)
  Query: "fever"
  Result: ✅ FOUND!
  Source: SymCAT + MedQuAD
  Answer: "Symptoms analyzed: fever. Possible: Flu, COVID-19..."
  → STOP (answer found, no LLM needed!)

Cost: $0 ✅
Another Query: "Write me a poem about health"

Step 1: FAQ Database → ❌ Not found
Step 2: Scraped Content → ❌ Not found
Step 3: Industry KB → ❌ Not a medical Q&A
Step 4: History → ❌ Not asked before
Step 5: Small LLM → ✅ FOUND!
  Answer: Generated poem

Cost: $0.002 ✅ (avoided expensive Gemini)
Edge Case: Gemini Usage

Steps 1-4: All checked → ❌ Not found
Step 5: Small LLM → ❌ Failed/unavailable
Step 6: Gemini AI → ✅ Used (last resort)
  Query count: 45/50 (user still has 5 left today)
  Answer: Generated response

Cost: $0.01 ⚠️ (expensive, but necessary)
🛡️ Query Limit Protection
Gemini has daily limits to protect costs:

# Per user per day
DEFAULT_LIMIT = 50 queries/day

# Tracking
- By session_id (anonymous users)
- By visitor_email (logged-in users)

# Behavior
Query 1-49:  ✅ Normal Gemini response
Query 50:    ✅ Last allowed
Query 51+:   ⚠️ "Daily limit reached" message
             → Falls back to generic response
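A sketch of that limit check, assuming the two tracking keys described above. A real implementation would persist counts in the database; this in-memory version only illustrates the logic:

```python
# Hedged sketch of the per-user daily Gemini limit. The function name and
# in-memory store are illustrative, not the project's actual code.
from collections import defaultdict
from datetime import date

DEFAULT_LIMIT = 50
_usage = defaultdict(int)  # key: (user_key, ISO date) -> queries used today

def allow_gemini(session_id=None, visitor_email=None, limit=DEFAULT_LIMIT):
    # Logged-in users are tracked by email, anonymous users by session_id.
    user_key = visitor_email or session_id or "anonymous"
    key = (user_key, date.today().isoformat())
    if _usage[key] >= limit:
        return False  # query 51+: caller shows "Daily limit reached"
    _usage[key] += 1  # count this query against today's quota
    return True
```

Because the counter key includes the date, quotas reset naturally at midnight without a cleanup job.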
📊 Data Source Comparison
| Source | Speed | Cost | Accuracy | Availability |
|---|---|---|---|---|
| FAQ | ⚡⚡⚡⚡⚡ | FREE | ⭐⭐⭐⭐⭐ | If data exists |
| Scraped | ⚡⚡⚡⚡ | FREE | ⭐⭐⭐⭐ | If data exists |
| Industry KB | ⚡⚡⚡⚡ | FREE | ⭐⭐⭐⭐⭐ | Always |
| History | ⚡⚡⚡⚡ | FREE | ⭐⭐⭐ | If data exists |
| Small LLM | ⚡⚡⚡ | $ | ⭐⭐⭐⭐ | Always |
| Gemini | ⚡⚡ | $$$$$ | ⭐⭐⭐⭐⭐ | If enabled + under limit |
🎯 Selection Logic
Intent-Based Priority Weights
FAQ Intent:
FAQ: 100% weight
Scraped: 80%
Industry: 60%
Small LLM: 50%
Gemini: 50%
Industry Knowledge Intent:
Industry KB: 100% weight
FAQ: 70%
Small LLM: 65%
Scraped: 50%
Gemini: 50%
Business-Specific Intent:
Scraped: 100% weight
FAQ: 80%
Small LLM: 65%
Industry: 40%
Gemini: 50%
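One way to apply the weights above is to score each candidate source by weight × match confidence and pick the highest. The scoring formula and function names here are assumptions; only the weight tables come from the doc:

```python
# Intent-based source ranking sketch. INTENT_WEIGHTS mirrors the tables
# above; rank_sources and its scoring rule are illustrative assumptions.

INTENT_WEIGHTS = {
    "faq":      {"faq": 1.0, "scraped": 0.8, "industry": 0.6,
                 "small_llm": 0.5, "gemini": 0.5},
    "industry": {"industry": 1.0, "faq": 0.7, "small_llm": 0.65,
                 "scraped": 0.5, "gemini": 0.5},
    "business": {"scraped": 1.0, "faq": 0.8, "small_llm": 0.65,
                 "industry": 0.4, "gemini": 0.5},
}

def rank_sources(intent, candidates):
    """candidates: {source_name: match confidence in [0, 1]}.
    Returns (source, score) pairs, best first."""
    weights = INTENT_WEIGHTS[intent]
    scored = {s: weights.get(s, 0.0) * conf for s, conf in candidates.items()}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
```

With equal match confidence, the intent's 100%-weighted source always wins, which is exactly the behavior the tables describe.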
✅ Benefits
- Cost Savings: Check free sources first
- Speed: Faster responses from local data
- Accuracy: Industry datasets more accurate than generic LLM
- Scalability: Limits on expensive Gemini prevent runaway costs
- Reliability: Multiple fallbacks ensure an answer is always provided
🔧 Configuration
Environment Variables
# Small LLM (existing)
OPENAI_API_KEY=your_openai_key
# Gemini AI (expensive fallback)
GEMINI_API_KEY=your_gemini_key
GEMINI_DAILY_LIMIT_PER_USER=50 # Adjust based on budget
Disable Gemini (Save Money)
# Just don't set GEMINI_API_KEY
# System will use Small LLM as final fallback
📉 Expected Cost Impact
Before (if using Gemini for everything):
1000 queries/day × $0.01 = $10/day = $300/month ❌

After (with smart fallbacks):
700 FAQ/KB (free)        = $0
200 Small LLM × $0.002   = $0.40
100 Gemini × $0.01       = $1.00
─────────────────────────────
Total: $1.40/day = $42/month ✅

Savings: 86% ($258/month) 🎉
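The arithmetic above checks out (the 1000-query/day split is the doc's own assumption):

```python
# Reproducing the cost estimate: 1000 queries/day, 30-day month.
before = 1000 * 0.01                         # $10.00/day, everything via Gemini
after = 700 * 0 + 200 * 0.002 + 100 * 0.01   # $1.40/day with smart fallbacks
monthly_savings = (before - after) * 30      # $258/month
savings_pct = round((before - after) / before * 100)  # 86%
```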
📋 Summary
Priority: Free sources → Small LLM → Gemini (last)
Protection: Daily query limits per user
Savings: ~86% cost reduction
Speed: Faster with local data first
Accuracy: Better with industry datasets

Gemini is now truly the LAST resort! 🎯