Tadabur: Quran Speech Recognition
Fine-tuned Whisper Medium on the Tadabur dataset for Quran ASR, Surah/Ayah identification, and reciter recognition.
CS465 Machine Learning Project, Spring 2026
What This Model Does
Given a Quran audio recitation, the pipeline returns:
- Arabic transcription: 6.26% WER on unseen data
- Surah & Ayah identification: fuzzy matched against all 6,236 ayahs
- Reciter name: identified from 335 supported reciters at 98.47% accuracy
Performance
ASR Results (500 held-out test samples)
| Model | WER (%) | CER (%) |
|---|---|---|
| Whisper Medium Vanilla | 41.10% | 11.47% |
| Tadabur-Whisper-Small (Author) | 47.06% | 12.28% |
| This model | 6.26% | 4.41% |
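For reference, both metrics are edit-distance ratios: WER counts word-level substitutions, insertions, and deletions against the reference transcript, CER does the same at character level. A minimal pure-Python sketch (not the evaluation code used for the table above):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, via a rolling DP row."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            # old dp[j] = delete, new dp[j-1] = insert, prev = substitute/match
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (r != h))
    return dp[-1]

def wer(ref, hyp):
    """Word error rate: edit distance over word tokens / reference length."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)

def cer(ref, hyp):
    """Character error rate: edit distance over characters / reference length."""
    return edit_distance(ref, hyp) / len(ref)
```

So one substituted word in a three-word reference gives a WER of 1/3.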
Reciter Classifier
| Metric | Value |
|---|---|
| Supported reciters | 335 |
| Validation accuracy | 98.47% |
| Training accuracy | 98.71% |
Files in This Repository
| File | Size | Description |
|---|---|---|
| `model.safetensors` | 3.06 GB | Fine-tuned Whisper Medium weights |
| `reciter_classifier.pt` | 2.76 MB | MLP reciter classifier |
| `reciter_idx_to_id.json` | 1.25 KB | Classifier index → reciter ID |
| `reciter_id_to_idx.json` | 1.25 KB | Reciter ID → classifier index |
| `sheikh_dict.json` | 2.7 KB | Reciter ID → Arabic name |
| `surah_dict.json` | 2.7 KB | Surah index → Arabic name |
| `quran_simple.json` | ~3 MB | Full Quran text for matching |
| `supported_reciters.txt` | – | List of all 335 supported reciters |
Quick Start
Install
```bash
pip install transformers torch librosa rapidfuzz huggingface_hub
```
Transcription only
```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import librosa, torch

MODEL = "rakansuliman/tadabur-whisper-medium"
processor = WhisperProcessor.from_pretrained(MODEL)
model = WhisperForConditionalGeneration.from_pretrained(MODEL)
model.eval()

audio, _ = librosa.load("recitation.wav", sr=16000)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt").input_features

with torch.no_grad():
    ids = model.generate(
        inputs,
        language="arabic",
        task="transcribe",
        max_new_tokens=225,
        suppress_tokens=[],
        forced_decoder_ids=None,
    )
print(processor.batch_decode(ids, skip_special_tokens=True)[0])
```
Full pipeline (transcription + reciter)
```python
from huggingface_hub import hf_hub_download
import torch, torch.nn as nn, json

MODEL = "rakansuliman/tadabur-whisper-medium"

# Download classifier files
hf_hub_download(MODEL, "reciter_classifier.pt", local_dir="./")
hf_hub_download(MODEL, "reciter_idx_to_id.json", local_dir="./")
hf_hub_download(MODEL, "sheikh_dict.json", local_dir="./")

# Define classifier (must match training architecture)
class ReciterClassifier(nn.Module):
    def __init__(self, hidden_dim, num_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_dim, 512), nn.BatchNorm1d(512), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(512, 256), nn.BatchNorm1d(256), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        return self.net(x)

# Load mappings
with open("reciter_idx_to_id.json") as f:
    idx_to_id = {int(k): int(v) for k, v in json.load(f).items()}
with open("sheikh_dict.json", encoding="utf-8-sig") as f:
    sheikh = {int(v): k for k, v in json.load(f).items()}

# Load classifier
clf = ReciterClassifier(1024, len(idx_to_id))
clf.load_state_dict(torch.load("reciter_classifier.pt", map_location="cpu"))
clf.eval()

# Run encoder + classify reciter
# ("model" and "inputs" come from the transcription snippet above)
with torch.no_grad():
    encoder_out = model.model.encoder(inputs)
    embedding = encoder_out.last_hidden_state.mean(dim=1).float()
    logits = clf(embedding)

pred_idx = logits.argmax(dim=1).item()
confidence = torch.softmax(logits, dim=1).max().item()
reciter_id = idx_to_id[pred_idx]
reciter_name = sheikh.get(reciter_id, f"ID {reciter_id}")
print(f"Reciter: {reciter_name} ({confidence*100:.1f}%)")
```
Architecture
```text
Audio Input (mic / file / video)
            │
     Whisper Encoder  (runs once, shared)
      ├── Whisper Decoder → Arabic text
      └── MLP Classifier → Reciter name
            │
RapidFuzz matching against 6,236 ayahs
            │
Surah name + Ayah number + confidence
```
Reciter Classifier Architecture
```text
Linear(1024→512) → BatchNorm → ReLU → Dropout(0.3)
  → Linear(512→256) → BatchNorm → ReLU → Dropout(0.2)
  → Linear(256→335)
```
Training Details
ASR Fine-tuning
- Base model: `openai/whisper-medium`
- Dataset: 9,432 samples (1 shard of Tadabur)
- Hardware: NVIDIA RTX 4090 (24 GB VRAM)
- Batch size: 8 × 4 gradient accumulation = 32 effective
- Learning rate: 1e-5, cosine schedule with 500 warmup steps
- Precision: FP16
- Best checkpoint: step 10,000
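The hyperparameters above map onto a Hugging Face `Seq2SeqTrainingArguments` configuration roughly as follows. This is a sketch, not the exact training script; the output directory is a placeholder:

```python
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="./tadabur-whisper-medium",  # placeholder path
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,          # 8 x 4 = 32 effective batch size
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    warmup_steps=500,
    fp16=True,
    max_steps=10_000,                       # best checkpoint reported at step 10,000
)
```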
Reciter Classifier
- Training data: 500 shards (~325k samples, 335 reciters)
- Phase 1: Extract Whisper encoder embeddings shard-by-shard
- Phase 2: Train MLP on pre-extracted embeddings (15 min)
- Optimizer: AdamW with cosine annealing
- Epochs: 20, Batch size: 256
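Because embeddings are pre-extracted in Phase 1, Phase 2 reduces to training a small MLP on fixed 1024-dim vectors, which is why it finishes in minutes. A minimal sketch of Phase 2, with random tensors standing in for the real pooled encoder embeddings and labels (the epoch count is shortened here for illustration):

```python
import torch, torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-ins for Phase 1 output: mean-pooled 1024-dim Whisper encoder embeddings.
NUM_CLASSES, DIM = 335, 1024
emb = torch.randn(2048, DIM)
labels = torch.randint(0, NUM_CLASSES, (2048,))
loader = DataLoader(TensorDataset(emb, labels), batch_size=256, shuffle=True)

# Same MLP shape as the repository's reciter classifier
clf = nn.Sequential(
    nn.Linear(DIM, 512), nn.BatchNorm1d(512), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(512, 256), nn.BatchNorm1d(256), nn.ReLU(), nn.Dropout(0.2),
    nn.Linear(256, NUM_CLASSES),
)
opt = torch.optim.AdamW(clf.parameters(), lr=1e-3)  # lr is an assumption
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=20)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(2):  # the card reports 20 epochs
    for x, y in loader:
        opt.zero_grad()
        loss_fn(clf(x), y).backward()
        opt.step()
    sched.step()
```

Freezing the encoder and touching only these ~700k MLP weights is the main design choice: it keeps ASR quality intact while adding reciter recognition cheaply.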
Supported Reciters
See `supported_reciters.txt` for the full list of 335 supported reciters, including:
عبد الباسط عبد الصمد، محمد صديق المنشاوي، ياسر الدوسري، سعود الشريم، ماهر المعيقلي، عبدالرحمن السديس، and 329 more.
Limitations
- ASR trained on 1 shard only; may have reduced generalization on rare recitation styles
- Reciter classifier covers 335 of 671 total reciters in the dataset
- Surah/Ayah matching accuracy depends on transcription quality
- Model optimized for standard Hafs recitation style
Citation
```bibtex
@misc{suliman2026tadabur,
  author = {Suliman, Rakan and Mamdoh, Abdulrahman and Aldosari, Hussam},
  title  = {Tadabur: Quran ASR with Surah/Ayah Identification and Reciter Recognition},
  year   = {2026},
  url    = {https://huggingface.co/rakansuliman/tadabur-whisper-medium}
}
```
License
CC BY-NC 4.0. Research and educational use only. Please engage with Quran content respectfully.