---
title: QModel
emoji: 🕌
colorFrom: green
colorTo: blue
sdk: docker
app_port: 8000
license: mit
tags:
  - quran
  - hadith
  - islamic
  - rag
  - faiss
  - nlp
  - arabic
language:
  - ar
  - en
---

# QModel v6 — Islamic RAG System

**Specialized Qur'an & Hadith Knowledge System with Dual LLM Support**

> A production-ready Retrieval-Augmented Generation system specialized exclusively in authenticated Islamic knowledge. No hallucinations, no outside knowledge — only content from verified sources.

![Version](https://img.shields.io/badge/version-6.0.0-blue) ![Backend](https://img.shields.io/badge/backend-ollama%20%7C%20huggingface-green) ![Status](https://img.shields.io/badge/status-production--ready-success)

---

## Features

### 📖 Qur'an Capabilities

- **Verse Lookup**: Find verses by topic or keyword
- **Word Frequency**: Count occurrences with a per-Surah breakdown
- **Bilingual**: Full Arabic + English translation support
- **Tafsir Integration**: AI-powered contextual interpretation

### 📚 Hadith Capabilities

- **Authenticity Verification**: Check whether a Hadith appears in the authenticated collections
- **Grade Display**: Show Sahih/Hasan/Da'if authenticity levels
- **Topic Search**: Find relevant Hadiths across 9 major collections
- **Collection Navigation**: Filter by Bukhari, Muslim, Abu Dawud, etc.
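The word-frequency capability above can be sketched in a few lines of Python. This is an illustrative toy over made-up data, not the project's actual implementation; the real system reads from `metadata.json`.

```python
from collections import Counter

# Toy corpus of (surah_number, verse_text) pairs -- hypothetical sample data.
verses = [
    (1, "in the name of god the most merciful the especially merciful"),
    (2, "and god is forgiving and merciful"),
    (12, "he is the most merciful of the merciful"),
]

def word_frequency(word, verses):
    """Count occurrences of `word`, with a per-Surah breakdown."""
    per_surah = Counter()
    for surah, text in verses:
        per_surah[surah] += text.split().count(word)
    return sum(per_surah.values()), dict(per_surah)

total, breakdown = word_frequency("merciful", verses)
# total == 5; breakdown == {1: 2, 2: 1, 12: 2}
```

The real endpoint (`/quran/word-frequency?word=mercy`) returns the same shape of answer: a total count plus the Surah-by-Surah distribution.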
### 🛡️ Safety Features

- **Confidence Gating**: Low-confidence queries return "not found" instead of guesses
- **Source Attribution**: Every answer cites the exact verse/Hadith reference
- **Verbatim Quotes**: Text is copied directly from the data, never paraphrased
- **Anti-Hallucination**: Hardened prompts with few-shot "not found" examples

### 🚀 Integration

- **OpenAI-Compatible API**: Use with Open-WebUI, LangChain, or any OpenAI client
- **OpenAI Schema**: Full support for `/v1/chat/completions` and `/v1/models`
- **Streaming Responses**: SSE streaming for long-form answers

### ⚙️ Technical

- **Dual LLM Backend**: Ollama (dev) + HuggingFace (prod)
- **Hybrid Search**: Dense (FAISS) + sparse (BM25) scoring
- **Async API**: FastAPI with async/await throughout
- **Caching**: TTL-based LRU cache for frequent queries
- **Scale**: 6,236 Qur'anic verses + 41,390 Hadiths indexed

---

## Quick Start

### Prerequisites

- Python 3.10+
- 16 GB RAM minimum (for embeddings + LLM)
- GPU recommended for the HuggingFace backend
- Ollama installed (for local development) OR internet access (for HuggingFace)

### Installation

```bash
# Clone and enter project
git clone https://github.com/Logicsoft/QModel.git && cd QModel
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# Configure (choose one backend)

# Option A — Ollama (local development):
export LLM_BACKEND=ollama
export OLLAMA_MODEL=llama2
# Make sure Ollama is running: ollama serve

# Option B — HuggingFace (production):
export LLM_BACKEND=hf
export HF_MODEL_NAME=Qwen/Qwen2-7B-Instruct

# Run
python main.py

# Query
curl "http://localhost:8000/ask?q=What%20does%20Islam%20say%20about%20mercy?"
```

API docs: http://localhost:8000/docs

### Data & Index

Pre-built data files are included:

- `metadata.json` — 47,626 documents (6,236 Qur'an verses + 41,390 Hadiths from 9 canonical collections)
- `QModel.index` — FAISS search index

To rebuild after dataset changes:

```bash
python build_index.py
```

---

## Example Queries

```bash
# Basic question
curl "http://localhost:8000/ask?q=What%20does%20Islam%20say%20about%20mercy?"

# Word frequency
curl "http://localhost:8000/ask?q=How%20many%20times%20is%20mercy%20mentioned?"

# Authentic Hadiths only
curl "http://localhost:8000/ask?q=prayer&source_type=hadith&grade_filter=sahih"

# Quran text search
curl "http://localhost:8000/quran/search?q=bismillah"

# Quran topic search
curl "http://localhost:8000/quran/topic?topic=patience&top_k=5"

# Quran word frequency
curl "http://localhost:8000/quran/word-frequency?word=mercy"

# Single chapter
curl "http://localhost:8000/quran/chapter/2"

# Exact verse
curl "http://localhost:8000/quran/verse/2:255"

# Hadith text search
curl "http://localhost:8000/hadith/search?q=actions+are+judged+by+intentions"

# Hadith topic search (Sahih only)
curl "http://localhost:8000/hadith/topic?topic=fasting&grade_filter=sahih"

# Verify Hadith authenticity
curl "http://localhost:8000/hadith/verify?q=Actions%20are%20judged%20by%20intentions"

# Browse a collection
curl "http://localhost:8000/hadith/collection/bukhari?limit=5"

# Streaming (OpenAI-compatible)
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"QModel","messages":[{"role":"user","content":"What does Islam say about charity?"}],"stream":true}'
```

---

## Configuration

All configuration is via environment variables (`.env` file or exported directly):

### Backend Selection

| Backend | Pros | Cons | When to Use |
|---------|------|------|-------------|
| **Ollama** | Fast setup, no GPU, free | Smaller models | Development, testing |
| **HuggingFace** | Larger models, better quality | Requires GPU or significant RAM | Production |

### Ollama Backend (Development)

```bash
LLM_BACKEND=ollama
OLLAMA_HOST=http://localhost:11434
OLLAMA_MODEL=llama2  # or: mistral, neural-chat, orca-mini
```

Requires `ollama serve` running and the model pulled (`ollama pull llama2`).

### HuggingFace Backend (Production)

```bash
LLM_BACKEND=hf
HF_MODEL_NAME=Qwen/Qwen2-7B-Instruct
HF_DEVICE=auto  # auto | cuda | cpu
HF_MAX_NEW_TOKENS=2048
```

### All Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| **Backend** | | |
| `LLM_BACKEND` | `hf` | `ollama` or `hf` |
| `OLLAMA_HOST` | `http://localhost:11434` | Ollama server URL |
| `OLLAMA_MODEL` | `llama2` | Ollama model name |
| `HF_MODEL_NAME` | `Qwen/Qwen2-7B-Instruct` | HuggingFace model ID |
| `HF_DEVICE` | `auto` | `auto`, `cuda`, or `cpu` |
| `HF_MAX_NEW_TOKENS` | `2048` | Max output length |
| **Embedding & Data** | | |
| `EMBED_MODEL` | `intfloat/multilingual-e5-large` | Embedding model |
| `FAISS_INDEX` | `QModel.index` | Index file path |
| `METADATA_FILE` | `metadata.json` | Dataset file |
| **Retrieval** | | |
| `TOP_K_SEARCH` | `20` | Candidate pool (5–100) |
| `TOP_K_RETURN` | `5` | Results shown to user (1–20) |
| `RERANK_ALPHA` | `0.6` | Dense vs. sparse weight (0.0–1.0) |
| **Generation** | | |
| `TEMPERATURE` | `0.2` | Creativity (0.0–1.0; use 0.1–0.2 for religious content) |
| `MAX_TOKENS` | `2048` | Max response length |
| **Safety** | | |
| `CONFIDENCE_THRESHOLD` | `0.30` | Min score to call the LLM (higher = fewer hallucinations) |
| `HADITH_BOOST` | `0.08` | Score boost for Hadiths on Hadith queries |
| **Other** | | |
| `CACHE_SIZE` | `512` | Query response cache entries |
| `CACHE_TTL` | `3600` | Cache expiry in seconds |
| `ALLOWED_ORIGINS` | `*` | CORS origins |
| `MAX_EXAMPLES` | `3` | Few-shot examples in the system prompt |

### Configuration Examples

**Development (Ollama)**

```bash
LLM_BACKEND=ollama
OLLAMA_HOST=http://localhost:11434
OLLAMA_MODEL=llama2
TEMPERATURE=0.2
CONFIDENCE_THRESHOLD=0.30
ALLOWED_ORIGINS=*
```

**Production (HuggingFace + GPU)**

```bash
LLM_BACKEND=hf
HF_MODEL_NAME=Qwen/Qwen2-7B-Instruct
HF_DEVICE=cuda
TOP_K_SEARCH=30
TEMPERATURE=0.1
CONFIDENCE_THRESHOLD=0.35
ALLOWED_ORIGINS=yourdomain.com,api.yourdomain.com
```

### Tuning Tips

- **Better results**: Increase `TOP_K_SEARCH`, lower `CONFIDENCE_THRESHOLD`, use `TEMPERATURE=0.1`
- **Faster performance**: Lower `TOP_K_SEARCH` and `TOP_K_RETURN`, reduce `MAX_TOKENS`, use Ollama
- **More conservative**: Increase `CONFIDENCE_THRESHOLD`, lower `TEMPERATURE`

---

## Docker Deployment

### Docker Compose (Recommended)

```bash
cp .env.example .env
# Configure backend (see Configuration section)
docker-compose up
```

### Docker CLI

```bash
docker build -t qmodel .

# With Ollama backend
docker run -p 8000:8000 \
  --env-file .env \
  --add-host host.docker.internal:host-gateway \
  qmodel

# With HuggingFace backend
docker run -p 8000:8000 \
  --env-file .env \
  --env HF_TOKEN=your_token_here \
  qmodel
```

### Docker with Ollama

```bash
# .env
LLM_BACKEND=ollama
OLLAMA_HOST=http://host.docker.internal:11434
OLLAMA_MODEL=llama2
```

Requires Ollama running on the host (`ollama serve`).

### Docker with HuggingFace

```bash
# .env
LLM_BACKEND=hf
HF_MODEL_NAME=Qwen/Qwen2-7B-Instruct
HF_DEVICE=auto

# Pass HF token
export HF_TOKEN=hf_xxxxxxxxxxxxx
docker-compose up
```

### Docker Compose with GPU (Linux)

```yaml
services:
  qmodel:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

### Production Tips

- Remove the dev volume mount (`.:/app`) in `docker-compose.yml`
- Set `restart: on-failure:5`
- Use specific `ALLOWED_ORIGINS` instead of `*`

---

## Open-WebUI Integration

QModel is fully OpenAI-compatible and works out of the box with Open-WebUI.

### Setup

```bash
# Start QModel
python main.py

# Start Open-WebUI
docker run -d -p 3000:8080 --name open-webui ghcr.io/open-webui/open-webui:latest
```

### Connect

1. **Settings** → **Models** → **Manage Models**
2. Click **"Connect to OpenAI-compatible API"**
3. **API Base URL**: `http://localhost:8000/v1`
4. **Model Name**: `QModel`
5. **API Key**: Leave blank
6. **Save & Test** → ✅ Connected

### Docker Compose (QModel + Ollama + Open-WebUI)

```yaml
version: '3.8'
services:
  qmodel:
    build: .
    ports:
      - "8000:8000"
    environment:
      - LLM_BACKEND=ollama
      - OLLAMA_HOST=http://ollama:11434
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
  web-ui:
    image: ghcr.io/open-webui/open-webui:latest
    ports:
      - "3000:8080"
    depends_on:
      - qmodel
```

### Supported Features

| Feature | Status |
|---------|--------|
| Chat | ✅ Full support |
| Streaming | ✅ `stream: true` |
| Multi-turn context | ✅ Handled by Open-WebUI |
| Temperature | ✅ Configurable |
| Token limits | ✅ `max_tokens` |
| Model listing | ✅ `/v1/models` |
| Source attribution | ✅ `x_metadata.sources` |

---

## Architecture

### Module Structure

```
main.py            ← FastAPI app + router registration
app/
  config.py        ← Config class (env vars)
  llm.py           ← LLM providers (Ollama, HuggingFace)
  cache.py         ← TTL-LRU async cache
  arabic_nlp.py    ← Arabic normalization, stemming, language detection
  search.py        ← Hybrid FAISS+BM25, text search, query rewriting
  analysis.py      ← Intent detection, analytics, counting
  prompts.py       ← Prompt engineering (persona, anti-hallucination)
  models.py        ← Pydantic schemas
  state.py         ← AppState, lifespan, RAG pipeline
  routers/
    quran.py       ← 6 Quran endpoints
    hadith.py      ← 5 Hadith endpoints
    chat.py        ← /ask + OpenAI-compatible chat
    ops.py         ← health, models, debug scores
```

### Data Pipeline

1. **Ingest**: 47,626 documents (6,236 Qur'an verses + 41,390 Hadiths from 9 collections)
2. **Embed**: Encode with `multilingual-e5-large` (Arabic + English dual embeddings)
3. **Index**: FAISS `IndexFlatIP` for dense retrieval

### Retrieval & Ranking

1. Dense retrieval (FAISS semantic scoring)
2. Sparse retrieval (BM25 term frequency)
3. Fusion: 60% dense + 40% sparse
4. Intent-aware boost (+0.08 to Hadiths when intent=hadith)
5. Type filter (quran_only / hadith_only / authenticated_only)
6. Text search fallback (exact phrase + word overlap)

### Anti-Hallucination Measures

- Few-shot examples including a "not found" refusal path
- Hardcoded citation format rules
- Verbatim copy rules (no text reconstruction)
- Confidence threshold gating (default: 0.30)
- Post-generation citation verification
- Grade inference from collection name

### Performance

| Operation | Time | Backend |
|-----------|------|---------|
| Query (cached) | ~50 ms | Both |
| Query (Ollama) | 400–800 ms | Ollama |
| Query (HF GPU) | 500–1500 ms | CUDA |
| Query (HF CPU) | 2–5 s | CPU |

---

## Troubleshooting

### "Cannot connect to Ollama"

```bash
ollama serve  # Ensure Ollama is running on the host
# In Docker, use OLLAMA_HOST=http://host.docker.internal:11434
```

### "HuggingFace model not found"

```bash
export HF_TOKEN=hf_xxxxxxxxxxxxx  # Set a token for gated models
```

### "Out of memory"

- Use a smaller model: `HF_MODEL_NAME=mistralai/Mistral-7B-Instruct-v0.2`
- Use Ollama with `neural-chat`
- Reduce `MAX_TOKENS` to 1024
- Increase the Docker memory limit in `docker-compose.yml`

### "Assistant returns 'Not found'"

This is expected — QModel rejects low-confidence queries.
Try:

- More specific queries
- Lowering `CONFIDENCE_THRESHOLD` in `.env`
- Checking raw scores: `GET /debug/scores?q=your+query`

### "Port already in use"

```bash
docker-compose down && docker system prune
# Or change the port: ports: ["8001:8000"]
```

---

## Roadmap

- [x] Grade-based filtering
- [x] Streaming responses (SSE)
- [x] Modular architecture (4 routers, 16 endpoints)
- [x] Dual LLM backend (Ollama + HuggingFace)
- [x] Text search (exact substring + fuzzy matching)
- [ ] Chain of narrators (Isnad display)
- [ ] Synonym expansion (mercy → rahma, compassion)
- [ ] Batch processing (multiple questions per request)
- [ ] Islamic calendar integration (Hijri dates)
- [ ] Tafsir endpoint with scholar citations

---

## Data Sources

- **Qur'an**: [risan/quran-json](https://github.com/risan/quran-json) — 114 Surahs, 6,236 verses
- **Hadith**: [AhmedBaset/hadith-json](https://github.com/AhmedBaset/hadith-json) — 9 canonical collections, 41,390 Hadiths

---

## Pipeline Overview

```
User Query
   ↓
Query Rewriting & Intent Detection
   ↓
Hybrid Search (FAISS dense + BM25 sparse)
   ↓
Filtering & Ranking
   ↓
Confidence Gate (skip LLM if low-scoring)
   ↓
LLM Generation (Ollama or HuggingFace)
   ↓
Formatted Response with Sources
```

See [ARCHITECTURE.md](ARCHITECTURE.md) for detailed system design.

---

## Troubleshooting Quick Reference

| Issue | Solution |
|-------|----------|
| "Service is initializing" | Wait 60–90 s for the embedding model to load |
| Low retrieval scores | Check `/debug/scores`, try synonyms, lower the threshold |
| "Model not found" (HF) | Run `huggingface-cli login` |
| Out of memory | Use a smaller model or the CPU backend |
| No results | Verify the data files exist: `metadata.json` and `QModel.index` |

See [SETUP.md](SETUP.md) and [DOCKER.md](DOCKER.md) for more detailed troubleshooting.
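For intuition, the fusion and gating steps from the Retrieval & Ranking section can be sketched in a few lines of Python. This is illustrative only, not the project's code; the constants mirror the documented defaults (`RERANK_ALPHA=0.6`, `HADITH_BOOST=0.08`, `CONFIDENCE_THRESHOLD=0.30`) and the function names are hypothetical.

```python
# Documented defaults from the Configuration section.
RERANK_ALPHA = 0.6
HADITH_BOOST = 0.08
CONFIDENCE_THRESHOLD = 0.30

def fuse(dense, sparse, is_hadith=False, intent=None):
    """60% dense + 40% sparse, plus an intent-aware boost for Hadiths."""
    score = RERANK_ALPHA * dense + (1 - RERANK_ALPHA) * sparse
    if is_hadith and intent == "hadith":
        score += HADITH_BOOST
    return score

def passes_gate(scores):
    """Only call the LLM when the best fused score clears the threshold."""
    return max(scores, default=0.0) >= CONFIDENCE_THRESHOLD

s = fuse(0.5, 0.2, is_hadith=True, intent="hadith")
# round(s, 2) == 0.46 -- above the 0.30 gate, so the LLM would be called
```

When no candidate clears the gate, the pipeline skips generation entirely and returns "not found", which is why low-scoring queries never produce speculative answers.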
---

## What's New in v6

- ✨ **Dual LLM Backend** — Ollama (dev) + HuggingFace (prod)
- ✨ **Grade Filtering** — Return only Sahih/Hasan authenticated Hadiths
- ✨ **Source Filtering** — Qur'an-only or Hadith-only queries
- ✨ **Hadith Verification** — `/hadith/verify` endpoint
- ✨ **Enhanced Frequency** — Word counts by Surah
- ✨ **OpenAI Compatible** — Use with any OpenAI client
- ✨ **Production Ready** — Structured logging, error handling, async throughout

---

## Next Steps

1. **Get Started**: See [SETUP.md](SETUP.md)
2. **Integrate with Open-WebUI**: See [OPEN_WEBUI.md](OPEN_WEBUI.md)
3. **Deploy with Docker**: See [DOCKER.md](DOCKER.md)
4. **Understand the Architecture**: See [ARCHITECTURE.md](ARCHITECTURE.md)

---

## License

This project uses open-source data from:

- [Qur'an JSON](https://github.com/risan/quran-json) — Open source
- [Hadith JSON](https://github.com/AhmedBaset/hadith-json) — Open source

See the individual repositories for license details.

---

**Made with ❤️ for Islamic scholarship.**

Version 6.0.0 | March 2025 | Production-Ready