---
title: QModel
emoji: 🕌
colorFrom: green
colorTo: blue
sdk: docker
app_port: 8000
license: mit
tags:
  - quran
  - hadith
  - islamic
  - rag
  - faiss
  - nlp
  - arabic
language:
  - ar
  - en
---

# QModel v6 — Islamic RAG System

**Specialized Qur'an & Hadith Knowledge System with Dual LLM Support**

> A production-ready Retrieval-Augmented Generation system specialized exclusively in authenticated Islamic knowledge. No hallucinations, no outside knowledge — only content from verified sources.

![Version](https://img.shields.io/badge/version-6.0.0-blue) ![Backend](https://img.shields.io/badge/backend-ollama%20%7C%20huggingface-green) ![Status](https://img.shields.io/badge/status-production--ready-success)

---

## Features

### 📖 Qur'an Capabilities

- **Verse Lookup**: Find verses by topic or keyword
- **Word Frequency**: Count occurrences with a per-Surah breakdown
- **Bilingual**: Full Arabic + English translation support
- **Tafsir Integration**: AI-powered contextual interpretation

### 📚 Hadith Capabilities

- **Authenticity Verification**: Check whether a Hadith appears in the authenticated collections
- **Grade Display**: Show Sahih/Hasan/Da'if authenticity levels
- **Topic Search**: Find relevant Hadiths across 9 major collections
- **Collection Navigation**: Filter by Bukhari, Muslim, Abu Dawud, etc.
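The word-frequency capability above can be sketched in a few lines of Python. This is an illustrative toy over made-up data, not the project's actual implementation; the real system reads from `metadata.json`.

```python
from collections import Counter

# Toy corpus of (surah_number, verse_text) pairs -- hypothetical sample data.
verses = [
    (1, "in the name of god the most merciful the especially merciful"),
    (2, "and god is forgiving and merciful"),
    (12, "he is the most merciful of the merciful"),
]

def word_frequency(word, verses):
    """Count occurrences of `word`, with a per-Surah breakdown."""
    per_surah = Counter()
    for surah, text in verses:
        per_surah[surah] += text.split().count(word)
    return sum(per_surah.values()), dict(per_surah)

total, breakdown = word_frequency("merciful", verses)
# total == 5; breakdown == {1: 2, 2: 1, 12: 2}
```

The real endpoint (`/quran/word-frequency?word=mercy`) returns the same shape of answer: a total count plus the Surah-by-Surah distribution.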
### 🛡️ Safety Features

- **Confidence Gating**: Low-confidence queries return "not found" instead of guesses
- **Source Attribution**: Every answer cites the exact verse/Hadith reference
- **Verbatim Quotes**: Text is copied directly from the data, never paraphrased
- **Anti-Hallucination**: Hardened prompts with few-shot "not found" examples

### 🚀 Integration

- **OpenAI-Compatible API**: Use with Open-WebUI, LangChain, or any OpenAI client
- **OpenAI Schema**: Full support for `/v1/chat/completions` and `/v1/models`
- **Streaming Responses**: SSE streaming for long-form answers

### ⚙️ Technical

- **Dual LLM Backend**: Ollama (dev) + HuggingFace (prod)
- **Hybrid Search**: Dense (FAISS) + sparse (BM25) scoring
- **Async API**: FastAPI with async/await throughout
- **Caching**: TTL-based LRU cache for frequent queries
- **Scale**: 6,236 Qur'anic verses + 41,390 Hadiths indexed

---

## Quick Start

### Prerequisites

- Python 3.10+
- 16 GB RAM minimum (for embeddings + LLM)
- GPU recommended for the HuggingFace backend
- Ollama installed (for local development) OR internet access (for HuggingFace)

### Installation

```bash
# Clone and enter project
git clone https://github.com/Logicsoft/QModel.git && cd QModel
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# Configure (choose one backend)

# Option A — Ollama (local development):
export LLM_BACKEND=ollama
export OLLAMA_MODEL=llama2
# Make sure Ollama is running: ollama serve

# Option B — HuggingFace (production):
export LLM_BACKEND=hf
export HF_MODEL_NAME=Qwen/Qwen2-7B-Instruct

# Run
python main.py

# Query
curl "http://localhost:8000/ask?q=What%20does%20Islam%20say%20about%20mercy?"
```

API docs: http://localhost:8000/docs

### Data & Index

Pre-built data files are included:

- `metadata.json` — 47,626 documents (6,236 Qur'an verses + 41,390 Hadiths from 9 canonical collections)
- `QModel.index` — FAISS search index

To rebuild after dataset changes:

```bash
python build_index.py
```

---

## Example Queries

```bash
# Basic question
curl "http://localhost:8000/ask?q=What%20does%20Islam%20say%20about%20mercy?"

# Word frequency
curl "http://localhost:8000/ask?q=How%20many%20times%20is%20mercy%20mentioned?"

# Authentic Hadiths only
curl "http://localhost:8000/ask?q=prayer&source_type=hadith&grade_filter=sahih"

# Quran text search
curl "http://localhost:8000/quran/search?q=bismillah"

# Quran topic search
curl "http://localhost:8000/quran/topic?topic=patience&top_k=5"

# Quran word frequency
curl "http://localhost:8000/quran/word-frequency?word=mercy"

# Single chapter
curl "http://localhost:8000/quran/chapter/2"

# Exact verse
curl "http://localhost:8000/quran/verse/2:255"

# Hadith text search
curl "http://localhost:8000/hadith/search?q=actions+are+judged+by+intentions"

# Hadith topic search (Sahih only)
curl "http://localhost:8000/hadith/topic?topic=fasting&grade_filter=sahih"

# Verify Hadith authenticity
curl "http://localhost:8000/hadith/verify?q=Actions%20are%20judged%20by%20intentions"

# Browse a collection
curl "http://localhost:8000/hadith/collection/bukhari?limit=5"

# Streaming (OpenAI-compatible)
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"QModel","messages":[{"role":"user","content":"What does Islam say about charity?"}],"stream":true}'
```

---

## Configuration

All configuration is via environment variables (`.env` file or exported directly):

### Backend Selection

| Backend | Pros | Cons | When to Use |
|---------|------|------|-------------|
| **Ollama** | Fast setup, no GPU, free | Smaller models | Development, testing |
| **HuggingFace** | Larger models, better quality | Requires GPU or significant RAM | Production |

### Ollama Backend (Development)

```bash
LLM_BACKEND=ollama
OLLAMA_HOST=http://localhost:11434
OLLAMA_MODEL=llama2  # or: mistral, neural-chat, orca-mini
```

Requires `ollama serve` running and the model pulled (`ollama pull llama2`).

### HuggingFace Backend (Production)

```bash
LLM_BACKEND=hf
HF_MODEL_NAME=Qwen/Qwen2-7B-Instruct
HF_DEVICE=auto  # auto | cuda | cpu
HF_MAX_NEW_TOKENS=2048
```

### All Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| **Backend** | | |
| `LLM_BACKEND` | `hf` | `ollama` or `hf` |
| `OLLAMA_HOST` | `http://localhost:11434` | Ollama server URL |
| `OLLAMA_MODEL` | `llama2` | Ollama model name |
| `HF_MODEL_NAME` | `Qwen/Qwen2-7B-Instruct` | HuggingFace model ID |
| `HF_DEVICE` | `auto` | `auto`, `cuda`, or `cpu` |
| `HF_MAX_NEW_TOKENS` | `2048` | Max output length |
| **Embedding & Data** | | |
| `EMBED_MODEL` | `intfloat/multilingual-e5-large` | Embedding model |
| `FAISS_INDEX` | `QModel.index` | Index file path |
| `METADATA_FILE` | `metadata.json` | Dataset file |
| **Retrieval** | | |
| `TOP_K_SEARCH` | `20` | Candidate pool (5–100) |
| `TOP_K_RETURN` | `5` | Results shown to user (1–20) |
| `RERANK_ALPHA` | `0.6` | Dense vs. sparse weight (0.0–1.0) |
| **Generation** | | |
| `TEMPERATURE` | `0.2` | Creativity (0.0–1.0; use 0.1–0.2 for religious content) |
| `MAX_TOKENS` | `2048` | Max response length |
| **Safety** | | |
| `CONFIDENCE_THRESHOLD` | `0.30` | Min score to call the LLM (higher = fewer hallucinations) |
| `HADITH_BOOST` | `0.08` | Score boost for Hadiths on Hadith queries |
| **Other** | | |
| `CACHE_SIZE` | `512` | Query response cache entries |
| `CACHE_TTL` | `3600` | Cache expiry in seconds |
| `ALLOWED_ORIGINS` | `*` | CORS origins |
| `MAX_EXAMPLES` | `3` | Few-shot examples in the system prompt |

### Configuration Examples

**Development (Ollama)**

```bash
LLM_BACKEND=ollama
OLLAMA_HOST=http://localhost:11434
OLLAMA_MODEL=llama2
TEMPERATURE=0.2
CONFIDENCE_THRESHOLD=0.30
ALLOWED_ORIGINS=*
```

**Production (HuggingFace + GPU)**

```bash
LLM_BACKEND=hf
HF_MODEL_NAME=Qwen/Qwen2-7B-Instruct
HF_DEVICE=cuda
TOP_K_SEARCH=30
TEMPERATURE=0.1
CONFIDENCE_THRESHOLD=0.35
ALLOWED_ORIGINS=yourdomain.com,api.yourdomain.com
```

### Tuning Tips

- **Better results**: Increase `TOP_K_SEARCH`, lower `CONFIDENCE_THRESHOLD`, use `TEMPERATURE=0.1`
- **Faster performance**: Lower `TOP_K_SEARCH` and `TOP_K_RETURN`, reduce `MAX_TOKENS`, use Ollama
- **More conservative**: Increase `CONFIDENCE_THRESHOLD`, lower `TEMPERATURE`

---

## Docker Deployment

### Docker Compose (Recommended)

```bash
cp .env.example .env
# Configure backend (see Configuration section)
docker-compose up
```

### Docker CLI

```bash
docker build -t qmodel .

# With Ollama backend
docker run -p 8000:8000 \
  --env-file .env \
  --add-host host.docker.internal:host-gateway \
  qmodel

# With HuggingFace backend
docker run -p 8000:8000 \
  --env-file .env \
  --env HF_TOKEN=your_token_here \
  qmodel
```

### Docker with Ollama

```bash
# .env
LLM_BACKEND=ollama
OLLAMA_HOST=http://host.docker.internal:11434
OLLAMA_MODEL=llama2
```

Requires Ollama running on the host (`ollama serve`).

### Docker with HuggingFace

```bash
# .env
LLM_BACKEND=hf
HF_MODEL_NAME=Qwen/Qwen2-7B-Instruct
HF_DEVICE=auto

# Pass HF token
export HF_TOKEN=hf_xxxxxxxxxxxxx
docker-compose up
```

### Docker Compose with GPU (Linux)

```yaml
services:
  qmodel:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

### Production Tips

- Remove the dev volume mount (`.:/app`) in `docker-compose.yml`
- Set `restart: on-failure:5`
- Use specific `ALLOWED_ORIGINS` instead of `*`

---

## Open-WebUI Integration

QModel is fully OpenAI-compatible and works out of the box with Open-WebUI.

### Setup

```bash
# Start QModel
python main.py

# Start Open-WebUI
docker run -d -p 3000:8080 --name open-webui ghcr.io/open-webui/open-webui:latest
```

### Connect

1. **Settings** → **Models** → **Manage Models**
2. Click **"Connect to OpenAI-compatible API"**
3. **API Base URL**: `http://localhost:8000/v1`
4. **Model Name**: `QModel`
5. **API Key**: Leave blank
6. **Save & Test** → ✅ Connected

### Docker Compose (QModel + Ollama + Open-WebUI)

```yaml
version: '3.8'
services:
  qmodel:
    build: .
    ports:
      - "8000:8000"
    environment:
      - LLM_BACKEND=ollama
      - OLLAMA_HOST=http://ollama:11434
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
  web-ui:
    image: ghcr.io/open-webui/open-webui:latest
    ports:
      - "3000:8080"
    depends_on:
      - qmodel
```

### Supported Features

| Feature | Status |
|---------|--------|
| Chat | ✅ Full support |
| Streaming | ✅ `stream: true` |
| Multi-turn context | ✅ Handled by Open-WebUI |
| Temperature | ✅ Configurable |
| Token limits | ✅ `max_tokens` |
| Model listing | ✅ `/v1/models` |
| Source attribution | ✅ `x_metadata.sources` |

---

## Architecture

### Module Structure

```
main.py            ← FastAPI app + router registration
app/
  config.py        ← Config class (env vars)
  llm.py           ← LLM providers (Ollama, HuggingFace)
  cache.py         ← TTL-LRU async cache
  arabic_nlp.py    ← Arabic normalization, stemming, language detection
  search.py        ← Hybrid FAISS+BM25, text search, query rewriting
  analysis.py      ← Intent detection, analytics, counting
  prompts.py       ← Prompt engineering (persona, anti-hallucination)
  models.py        ← Pydantic schemas
  state.py         ← AppState, lifespan, RAG pipeline
  routers/
    quran.py       ← 6 Quran endpoints
    hadith.py      ← 5 Hadith endpoints
    chat.py        ← /ask + OpenAI-compatible chat
    ops.py         ← health, models, debug scores
```

### Data Pipeline

1. **Ingest**: 47,626 documents (6,236 Qur'an verses + 41,390 Hadiths from 9 collections)
2. **Embed**: Encode with `multilingual-e5-large` (Arabic + English dual embeddings)
3. **Index**: FAISS `IndexFlatIP` for dense retrieval

### Retrieval & Ranking

1. Dense retrieval (FAISS semantic scoring)
2. Sparse retrieval (BM25 term frequency)
3. Fusion: 60% dense + 40% sparse
4. Intent-aware boost (+0.08 to Hadiths when intent=hadith)
5. Type filter (quran_only / hadith_only / authenticated_only)
6. Text search fallback (exact phrase + word overlap)

### Anti-Hallucination Measures

- Few-shot examples including a "not found" refusal path
- Hardcoded citation format rules
- Verbatim copy rules (no text reconstruction)
- Confidence threshold gating (default: 0.30)
- Post-generation citation verification
- Grade inference from collection name

### Performance

| Operation | Time | Backend |
|-----------|------|---------|
| Query (cached) | ~50 ms | Both |
| Query (Ollama) | 400–800 ms | Ollama |
| Query (HF GPU) | 500–1500 ms | CUDA |
| Query (HF CPU) | 2–5 s | CPU |

---

## Troubleshooting

### "Cannot connect to Ollama"

```bash
ollama serve  # Ensure Ollama is running on the host
# In Docker, use OLLAMA_HOST=http://host.docker.internal:11434
```

### "HuggingFace model not found"

```bash
export HF_TOKEN=hf_xxxxxxxxxxxxx  # Set a token for gated models
```

### "Out of memory"

- Use a smaller model: `HF_MODEL_NAME=mistralai/Mistral-7B-Instruct-v0.2`
- Use Ollama with `neural-chat`
- Reduce `MAX_TOKENS` to 1024
- Increase the Docker memory limit in `docker-compose.yml`

### "Assistant returns 'Not found'"

This is expected — QModel rejects low-confidence queries.
Try:

- More specific queries
- Lowering `CONFIDENCE_THRESHOLD` in `.env`
- Checking raw scores: `GET /debug/scores?q=your+query`

### "Port already in use"

```bash
docker-compose down && docker system prune
# Or change the port: ports: ["8001:8000"]
```

---

## Roadmap

- [x] Grade-based filtering
- [x] Streaming responses (SSE)
- [x] Modular architecture (4 routers, 16 endpoints)
- [x] Dual LLM backend (Ollama + HuggingFace)
- [x] Text search (exact substring + fuzzy matching)
- [ ] Chain of narrators (Isnad display)
- [ ] Synonym expansion (mercy → rahma, compassion)
- [ ] Batch processing (multiple questions per request)
- [ ] Islamic calendar integration (Hijri dates)
- [ ] Tafsir endpoint with scholar citations

---

## Data Sources

- **Qur'an**: [risan/quran-json](https://github.com/risan/quran-json) — 114 Surahs, 6,236 verses
- **Hadith**: [AhmedBaset/hadith-json](https://github.com/AhmedBaset/hadith-json) — 9 canonical collections, 41,390 Hadiths

---

## Pipeline Overview

```
User Query
   ↓
Query Rewriting & Intent Detection
   ↓
Hybrid Search (FAISS dense + BM25 sparse)
   ↓
Filtering & Ranking
   ↓
Confidence Gate (skip LLM if low-scoring)
   ↓
LLM Generation (Ollama or HuggingFace)
   ↓
Formatted Response with Sources
```

See [ARCHITECTURE.md](ARCHITECTURE.md) for detailed system design.

---

## Troubleshooting Quick Reference

| Issue | Solution |
|-------|----------|
| "Service is initializing" | Wait 60–90 s for the embedding model to load |
| Low retrieval scores | Check `/debug/scores`, try synonyms, lower the threshold |
| "Model not found" (HF) | Run `huggingface-cli login` |
| Out of memory | Use a smaller model or the CPU backend |
| No results | Verify the data files exist: `metadata.json` and `QModel.index` |

See [SETUP.md](SETUP.md) and [DOCKER.md](DOCKER.md) for more detailed troubleshooting.
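For intuition, the fusion and gating steps from the Retrieval & Ranking section can be sketched in a few lines of Python. This is illustrative only, not the project's code; the constants mirror the documented defaults (`RERANK_ALPHA=0.6`, `HADITH_BOOST=0.08`, `CONFIDENCE_THRESHOLD=0.30`) and the function names are hypothetical.

```python
# Documented defaults from the Configuration section.
RERANK_ALPHA = 0.6
HADITH_BOOST = 0.08
CONFIDENCE_THRESHOLD = 0.30

def fuse(dense, sparse, is_hadith=False, intent=None):
    """60% dense + 40% sparse, plus an intent-aware boost for Hadiths."""
    score = RERANK_ALPHA * dense + (1 - RERANK_ALPHA) * sparse
    if is_hadith and intent == "hadith":
        score += HADITH_BOOST
    return score

def passes_gate(scores):
    """Only call the LLM when the best fused score clears the threshold."""
    return max(scores, default=0.0) >= CONFIDENCE_THRESHOLD

s = fuse(0.5, 0.2, is_hadith=True, intent="hadith")
# round(s, 2) == 0.46 -- above the 0.30 gate, so the LLM would be called
```

When no candidate clears the gate, the pipeline skips generation entirely and returns "not found", which is why low-scoring queries never produce speculative answers.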
---

## What's New in v6

- ✨ **Dual LLM Backend** — Ollama (dev) + HuggingFace (prod)
- ✨ **Grade Filtering** — Return only Sahih/Hasan authenticated Hadiths
- ✨ **Source Filtering** — Qur'an-only or Hadith-only queries
- ✨ **Hadith Verification** — `/hadith/verify` endpoint
- ✨ **Enhanced Frequency** — Word counts by Surah
- ✨ **OpenAI Compatible** — Use with any OpenAI client
- ✨ **Production Ready** — Structured logging, error handling, async throughout

---

## Next Steps

1. **Get Started**: See [SETUP.md](SETUP.md)
2. **Integrate with Open-WebUI**: See [OPEN_WEBUI.md](OPEN_WEBUI.md)
3. **Deploy with Docker**: See [DOCKER.md](DOCKER.md)
4. **Understand the Architecture**: See [ARCHITECTURE.md](ARCHITECTURE.md)

---

## License

This project uses open-source data from:

- [Qur'an JSON](https://github.com/risan/quran-json) — Open source
- [Hadith JSON](https://github.com/AhmedBaset/hadith-json) — Open source

See the individual repositories for license details.

---

**Made with ❤️ for Islamic scholarship.**

Version 6.0.0 | March 2025 | Production-Ready