┌─────────────────────────────────────────────────────────────┐
│ Step 1: Language Detection                                  │
└─────────────────────────────────────────────────────────────┘
Input: "mujhe bukhar hai"
→ Detected: Roman Urdu (confidence: 0.95)

┌─────────────────────────────────────────────────────────────┐
│ Step 2: Translation to English (for processing)             │
└─────────────────────────────────────────────────────────────┘
Roman Urdu: "mujhe bukhar hai"
→ English: "I have fever"

┌─────────────────────────────────────────────────────────────┐
│ Step 3: Intent Classification                               │
└─────────────────────────────────────────────────────────────┘
Query: "I have fever"
Industry: Healthcare
→ Intent: INDUSTRY_KNOWLEDGE (confidence: 0.85)

┌─────────────────────────────────────────────────────────────┐
│ Step 4: Check Data Source Availability                      │
└─────────────────────────────────────────────────────────────┘
Checking website_id: 123

✅ FAQ: Has 15 FAQs
❌ Scraped Content: Empty (no data)
✅ Industry KB: Healthcare module available
❌ Unanswered History: No history yet
✅ LLM Fallback: Always available

Available: [FAQ, Industry KB, LLM]

┌─────────────────────────────────────────────────────────────┐
│ Step 5: Query All Available Sources                         │
└─────────────────────────────────────────────────────────────┘

Source 1: FAQ Database
  Query: "I have fever"
  Result: No exact match
  Confidence: 0.0
  ❌ Not found

Source 2: Industry Knowledge Base (Healthcare)
  Query: "I have fever"
  Result: Found symptom checker
  Confidence: 0.90
  ✅ Found! (Fever → Flu, COVID-19, Infection)

Source 3: LLM Fallback
  Query: "I have fever"
  Result: Generic response
  Confidence: 0.60
  ✅ Available

┌─────────────────────────────────────────────────────────────┐
│ Step 6: Select Best Result                                  │
└─────────────────────────────────────────────────────────────┘

Results ranked by: Confidence × Priority Weight

For INDUSTRY_KNOWLEDGE intent:
- Industry KB: 0.90 × 1.0 = 0.90 ✅ WINNER
- LLM Fallback: 0.60 × 0.5 = 0.30

Selected: Industry Knowledge Base

┌─────────────────────────────────────────────────────────────┐
│ Step 7: Translate Answer Back to User's Language            │
└─────────────────────────────────────────────────────────────┘

Answer (English): "You have fever. Possible conditions: Flu, COVID-19..."
Target Language: Roman Urdu
→ Translated: "Aapko bukhar hai. Mumkina bimariyan: Flu, COVID-19..."

┌─────────────────────────────────────────────────────────────┐
│ Final Response                                               │
└─────────────────────────────────────────────────────────────┘
{
  "answer": "Aapko bukhar hai. Mumkina bimariyan: Flu, COVID-19...",
  "source": "industry_knowledge",
  "confidence": 0.90,
  "language_detected": "ur-roman",
  "intent": "INDUSTRY_KNOWLEDGE"
}
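
The ranking rule in Step 6 can be sketched in a few lines. This is a minimal illustration, not the service's actual implementation: the names `PRIORITY_WEIGHTS` and `select_best` are hypothetical, and the weights are the ones this document lists for the INDUSTRY_KNOWLEDGE intent.

```python
# Hypothetical sketch of Step 6: rank results by confidence x priority weight.
# Weights are taken from this document's INDUSTRY_KNOWLEDGE priority order.
PRIORITY_WEIGHTS = {
    "industry_knowledge": 1.0,
    "faq": 0.7,
    "website_scraped": 0.5,
    "llm_fallback": 0.5,
}


def select_best(results: dict[str, float]) -> tuple[str, float]:
    """Return (source, weighted_score) for the highest-scoring source.

    `results` maps a source name to its raw confidence in [0, 1].
    """
    scored = {
        source: confidence * PRIORITY_WEIGHTS[source]
        for source, confidence in results.items()
    }
    best = max(scored, key=scored.get)
    return best, scored[best]


# Confidences from Step 5 of the walkthrough.
best_source, score = select_best(
    {"faq": 0.0, "industry_knowledge": 0.90, "llm_fallback": 0.60}
)
print(best_source, round(score, 2))  # industry_knowledge 0.9
```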

🎯 Priority System by Intent

1. FAQ Intent

Priority Order:
1. FAQ Database (100%) ← Try first
2. Website Scraped (80%)
3. Industry KB (60%)
4. LLM Fallback (50%)

Example: "What are your hours?"

Checks the FAQ first (highest priority)
If the answer is not in the FAQ, checks website content
Falls back to the LLM

2. Industry Knowledge Intent

Priority Order:
1. Industry KB (100%) ← Try first
2. FAQ Database (70%)
3. Website Scraped (50%)
4. LLM Fallback (50%)

Example: "What is diabetes?"

Healthcare module first
Falls back to FAQ if medical answer not found
LLM as last resort

3. Business-Specific Intent

Priority Order:
1. Website Scraped (100%) ← Try first
2. FAQ Database (80%)
3. Industry KB (40%)
4. LLM Fallback (50%)

Example: "Who is Dr. Khan?"

Checks scraped website content first
FAQ second
LLM if not found

🌐 Multilingual Support

Language Flow

User Query (Any Language)
    ↓
Detect Language (en/ur/ur-roman)
    ↓
Translate to English (if needed)
    ↓
Process in English (all data sources)
    ↓
Translate Answer Back (to user's language)
    ↓
Return in Original Language
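
The flow above can be written as one function with detection, translation and answering passed in as callables. This is a sketch under assumptions: `answer_in_user_language` and its parameters are hypothetical names, and the lambdas at the bottom are stubs that exist only to make the example runnable, since this document does not say which detection or translation components the service uses.

```python
from typing import Callable


# Hypothetical sketch of the language flow. Detection, translation and
# answering are injected so that any real components can be plugged in.
def answer_in_user_language(
    query: str,
    detect: Callable[[str], str],
    to_english: Callable[[str, str], str],
    from_english: Callable[[str, str], str],
    answer_english: Callable[[str], str],
) -> dict:
    language = detect(query)
    # Translate only when the query is not already English.
    english_query = query if language == "en" else to_english(query, language)
    english_answer = answer_english(english_query)
    # Translate the answer back only for non-English users.
    answer = (
        english_answer
        if language == "en"
        else from_english(english_answer, language)
    )
    return {"answer": answer, "language_detected": language}


# Stub components standing in for real models (illustration only).
demo = answer_in_user_language(
    "fee kitni hai?",
    detect=lambda text: "ur-roman",
    to_english=lambda text, lang: "How much is the fee?",
    from_english=lambda text, lang: "Saalana fee $5000 hai",
    answer_english=lambda text: "Annual fee is $5000",
)
print(demo)
```

Keeping all data sources English-only means each new language needs only a translator pair, not a second copy of the FAQ or knowledge base.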

Examples

Scenario 1: Urdu Query

Input:  "داخلے کی شرائط کیا ہیں؟"
Detect: Urdu
Translate: "What are the admission requirements?"
Process: Query FAQ + Education KB
Answer: "Minimum GPA 3.0..."
Translate: "کم از کم GPA 3.0..."
Output: "کم از کم GPA 3.0..."

Scenario 2: Roman Urdu Query

Input:  "fee kitni hai?"
Detect: Roman Urdu
Translate: "How much is the fee?"
Process: Query FAQ first
Answer: "Annual fee is $5000"
Translate: "Saalana fee $5000 hai"
Output: "Saalana fee $5000 hai"
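
To make the detection step in these scenarios concrete, here is a deliberately simple heuristic detector. It is an assumption for illustration, not the detector the service uses: the function name `detect_language` and the hint-word list are hypothetical, and a production system would more likely rely on a trained classifier.

```python
import re

# Illustrative hint words only; a real detector would need a far larger
# lexicon or a trained classifier.
ROMAN_URDU_HINTS = {"hai", "mujhe", "kitni", "kya", "aap", "nahi", "kaise"}


def detect_language(text: str) -> str:
    """Return 'ur', 'ur-roman' or 'en' using simple heuristics."""
    # Urdu is written in the Arabic script (Unicode block U+0600-U+06FF).
    if any("\u0600" <= ch <= "\u06ff" for ch in text):
        return "ur"
    # Latin-script text containing a known Roman Urdu word.
    words = set(re.findall(r"[a-z]+", text.lower()))
    if words & ROMAN_URDU_HINTS:
        return "ur-roman"
    return "en"


for sample in ["داخلے کی شرائط کیا ہیں؟", "fee kitni hai?", "What are your hours?"]:
    print(detect_language(sample))
```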

🔄 Handling Empty Data Sources

Case 1: No FAQ Data

# System automatically detects
available_sources[FAQ] = False  # No FAQs in database

# Skip FAQ, move to next priority
→ Checks: Website Scraped → Industry KB → LLM

Case 2: No Scraped Content

# Website has no scraped content
available_sources[WEBSITE_SCRAPED] = False

# Skip scraped content
→ Checks: FAQ → Industry KB → LLM

Case 3: All Business Data Empty (FAQ + Scraped both empty)

# Fallback chain:
1. FAQ → Empty ❌
2. Scraped → Empty ❌
3. Industry KB → Available ✅ (if industry query)
4. LLM → Always available ✅

Result: User still gets an answer!
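
The skipping behaviour in the three cases above can be sketched as a loop over a priority-ordered list. This is a simplified first-hit version with hypothetical names (`first_answer`, `lookups`). The main flow described earlier queries every available source and ranks the results, so treat this only as an illustration of how empty sources are bypassed.

```python
from typing import Callable, Optional


# Hypothetical sketch: walk a priority-ordered list of sources, skipping
# any that are marked unavailable, and return the first hit.
def first_answer(
    order: list[str],
    available: dict[str, bool],
    lookups: dict[str, Callable[[str], Optional[str]]],
    query: str,
) -> tuple[Optional[str], Optional[str]]:
    for source in order:
        if not available.get(source, False):
            continue  # empty source: skip without querying it
        answer = lookups[source](query)
        if answer is not None:
            return source, answer
    return None, None


order = ["faq", "website_scraped", "industry_knowledge", "llm_fallback"]
available = {
    "faq": False,              # Case 3: no FAQs
    "website_scraped": False,  # Case 3: nothing scraped
    "industry_knowledge": True,
    "llm_fallback": True,
}
lookups = {
    "faq": lambda q: None,
    "website_scraped": lambda q: None,
    "industry_knowledge": lambda q: None,  # pretend: not an industry query
    "llm_fallback": lambda q: f"(generated answer for: {q})",
}
print(first_answer(order, available, lookups, "What are your hours?"))
```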

📝 Smart Features

1. Automatic Availability Detection

# Before querying, system checks:
- Does website have scraped content? (query DB)
- Are there active FAQs? (count)
- Is there unanswered history? (check logs)

# Only queries available sources
# Skips empty ones automatically
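
A minimal sketch of the availability check, assuming the per-website counts have already been fetched from the database. The `SourceCounts` shape and the `detect_available_sources` function are hypothetical; the real service presumably runs the queries itself.

```python
from dataclasses import dataclass


@dataclass
class SourceCounts:
    """Row counts for one website (hypothetical shape, for illustration)."""
    active_faqs: int
    scraped_pages: int
    unanswered_logged: int
    has_industry_module: bool


def detect_available_sources(counts: SourceCounts) -> dict[str, bool]:
    """Map each source name to whether it has any data to query."""
    return {
        "faq": counts.active_faqs > 0,
        "website_scraped": counts.scraped_pages > 0,
        "industry_knowledge": counts.has_industry_module,
        "unanswered_history": counts.unanswered_logged > 0,
        "llm_fallback": True,  # described in this document as always available
    }


# Mirrors Step 4 of the walkthrough for website_id 123.
availability = detect_available_sources(
    SourceCounts(active_faqs=15, scraped_pages=0,
                 unanswered_logged=0, has_industry_module=True)
)
print([name for name, ok in availability.items() if ok])
# ['faq', 'industry_knowledge', 'llm_fallback']
```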

2. Confidence-Based Selection

# Even if the FAQ has a match, Industry KB may be selected if its weighted score is higher
# (weights shown are for the INDUSTRY_KNOWLEDGE intent)
FAQ: 0.65 × 0.7 = 0.455
Industry KB: 0.90 × 1.0 = 0.90

→ Selects Industry KB (higher weighted score)

3. Unanswered Question Logging

# If confidence < 0.5 or not found:
log_unanswered_question(query, website_id)

# Later, admin can:
- Review unanswered questions
- Add to FAQ manually
- Improve knowledge base
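
The logging rule can be sketched as follows. The 0.5 threshold and the `log_unanswered_question(query, website_id)` call come from this document. The in-memory list and the `record_if_unanswered` wrapper are assumptions standing in for whatever storage the service uses.

```python
from typing import Optional

# Threshold taken from this document; the list stands in for a database table.
UNANSWERED_THRESHOLD = 0.5
unanswered_log: list[dict] = []


def log_unanswered_question(query: str, website_id: int) -> None:
    """Stand-in implementation: append to an in-memory log."""
    unanswered_log.append({"query": query, "website_id": website_id})


def record_if_unanswered(
    query: str, website_id: int, answer: Optional[str], confidence: float
) -> bool:
    """Log the query when no answer was found or confidence is too low."""
    if answer is None or confidence < UNANSWERED_THRESHOLD:
        log_unanswered_question(query, website_id)
        return True
    return False


record_if_unanswered("Do you offer home visits?", 123, None, 0.0)
record_if_unanswered("mujhe bukhar hai", 123, "Aapko bukhar hai...", 0.90)
print(len(unanswered_log))  # 1
```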

4. Learning from History

# Future feature:
# If same question asked multiple times
# → Auto-suggest adding to FAQ
# → Learn patterns
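
Since this feature is not built yet, the following is only one possible shape for it: count normalized questions in the unanswered log and surface the ones that repeat. The function names and the `min_count=3` default are assumptions.

```python
from collections import Counter


def normalize(question: str) -> str:
    """Lowercase and collapse whitespace so near-identical questions group."""
    return " ".join(question.lower().split())


def suggest_faq_candidates(questions: list[str], min_count: int = 3) -> list[str]:
    """Return normalized questions asked at least `min_count` times."""
    counts = Counter(normalize(q) for q in questions)
    return [q for q, n in counts.most_common() if n >= min_count]


history = [
    "fee kitni hai?", "Fee kitni hai?", "fee  kitni hai?",
    "Who is Dr. Khan?",
]
print(suggest_faq_candidates(history))  # ['fee kitni hai?']
```

Exact-match counting misses paraphrases. Grouping by embedding similarity would catch more, at the cost of extra infrastructure.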

💻 Usage Example

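import asyncio

from app.services.unified_data_manager import get_unified_manager


async def main(db_session):
    # Initialize with a database session
    manager = get_unified_manager(db_session)

    # Query in any supported language
    result = await manager.query(
        user_query="mujhe bukhar hai",  # Roman Urdu
        website_id=123,
        industry="healthcare",
        session_id="session_456",
    )
    return result


# From synchronous code, with db_session supplied by your application:
# result = asyncio.run(main(db_session))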

# Result:
{
    "answer": "Aapko bukhar hai. Mumkina bimariyan: Flu...",
    "source": "industry_knowledge",  # Which source was used
    "confidence": 0.90,
    "language_detected": "ur-roman",
    "intent": "INDUSTRY_KNOWLEDGE",
    "data_sources_checked": ["faq", "industry_knowledge", "llm_fallback"]
}

🎯 Data Source Priority Matrix

Intent               1st Priority     2nd Priority    3rd Priority     Fallback
FAQ                  FAQ (1.0)        Scraped (0.8)   Industry (0.6)   LLM (0.5)
Industry Knowledge   Industry (1.0)   FAQ (0.7)       Scraped (0.5)    LLM (0.5)
Business Specific    Scraped (1.0)    FAQ (0.8)       Industry (0.4)   LLM (0.5)
Creative             LLM (1.0)        -               -                -
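
The matrix can be kept as plain data so that priorities stay configurable. In this sketch the intent keys other than `INDUSTRY_KNOWLEDGE`, the source identifiers, and the helpers `sources_for` and `weight_for` are hypothetical names chosen for illustration.

```python
# The matrix above as data. Each intent keeps its sources in the listed
# column order; key and source names are hypothetical identifiers.
PRIORITY_MATRIX: dict[str, list[tuple[str, float]]] = {
    "FAQ": [("faq", 1.0), ("website_scraped", 0.8),
            ("industry_knowledge", 0.6), ("llm_fallback", 0.5)],
    "INDUSTRY_KNOWLEDGE": [("industry_knowledge", 1.0), ("faq", 0.7),
                           ("website_scraped", 0.5), ("llm_fallback", 0.5)],
    "BUSINESS_SPECIFIC": [("website_scraped", 1.0), ("faq", 0.8),
                          ("industry_knowledge", 0.4), ("llm_fallback", 0.5)],
    "CREATIVE": [("llm_fallback", 1.0)],
}


def sources_for(intent: str) -> list[str]:
    """Source names for an intent, in the matrix's column order."""
    return [name for name, _ in PRIORITY_MATRIX[intent]]


def weight_for(intent: str, source: str) -> float:
    """Priority weight, or 0.0 if the source is not used for the intent."""
    return dict(PRIORITY_MATRIX[intent]).get(source, 0.0)


print(sources_for("BUSINESS_SPECIFIC"))
print(weight_for("INDUSTRY_KNOWLEDGE", "llm_fallback"))  # 0.5
```

The order is stored explicitly rather than derived by sorting on weight, because for Business Specific the LLM weight (0.5) is higher than the Industry weight (0.4) even though Industry is listed first.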

✅ Benefits

1. Graceful Degradation

If primary source empty → automatic fallback
User always gets an answer
No errors shown to user

2. Multilingual Support

The same code path handles all three languages (English, Urdu, Roman Urdu)
Automatic translation in and out
Language-aware responses

3. Context-Aware Routing

Intent determines priority
Industry influences search
Confidence-based selection

4. Learn and Improve

Logs unanswered questions
Tracks what users ask
Identifies knowledge gaps

5. Flexible Architecture

Easy to add new data sources
Configurable priorities
Modular components

🚀 Future Enhancements

Vector Search: Use embeddings for better matching
Hybrid Retrieval: Combine keyword + semantic search
Answer Fusion: Merge answers from multiple sources
Learning Loop: Auto-improve from unanswered questions
Caching: Cache frequently asked questions
Analytics: Track which sources perform best

📊 Summary

Your Problem: Several data sources, any of which may be empty; three languages; and a need for smart routing between them

Solution: Unified Data Source Manager

✅ Automatically checks all sources
✅ Prioritizes by intent
✅ Handles empty data gracefully
✅ Works in all three supported languages (English, Urdu, Roman Urdu)
✅ Always provides an answer (LLM fallback)
✅ Logs unanswered for learning

Result: One unified interface that intelligently handles everything!