🕌 Tadabur - Quran Speech Recognition

Fine-tuned Whisper Medium on the Tadabur dataset for Quran ASR, Surah/Ayah identification, and reciter recognition.

CS465 Machine Learning Project, Spring 2026


What This Model Does

Given a Quran audio recitation, the pipeline returns:

  1. Arabic transcription - 6.26% WER on unseen data
  2. Surah & Ayah identification - fuzzy matched against all 6,236 ayahs
  3. Reciter name - identified from 335 supported reciters at 98.47% accuracy

Performance

ASR Results (500 held-out test samples)

Model                            WER (%)   CER (%)
Whisper Medium (vanilla)         41.10     11.47
Tadabur-Whisper-Small (author)   47.06     12.28
This model                       6.26      4.41
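WER is word-level edit distance divided by the number of reference words, and CER is the same computed over characters. A minimal pure-Python sketch (the example strings are illustrative transliterations, not test-set samples):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (r != h))  # substitution
    return dp[-1]

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)

# One substituted word out of five reference words -> 0.2
print(wer("bismillah ir rahman ir rahim", "bismillah ir rahman al rahim"))  # 0.2
```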

Reciter Classifier

Metric                 Value
Supported reciters     335
Validation accuracy    98.47%
Training accuracy      98.71%

Files in This Repository

File                     Size      Description
model.safetensors        3.06 GB   Fine-tuned Whisper Medium weights
reciter_classifier.pt    2.76 MB   MLP reciter classifier
reciter_idx_to_id.json   1.25 KB   Classifier index → reciter ID
reciter_id_to_idx.json   1.25 KB   Reciter ID → classifier index
sheikh_dict.json         2.7 KB    Reciter ID → Arabic name
surah_dict.json          2.7 KB    Surah index → Arabic name
quran_simple.json        ~3 MB     Full Quran text for matching
supported_reciters.txt   -         List of all 335 supported reciters

Quick Start

Install

pip install transformers torch librosa rapidfuzz huggingface_hub

Transcription only

from transformers import WhisperProcessor, WhisperForConditionalGeneration
import librosa, torch

MODEL = "rakansuliman/tadabur-whisper-medium"
processor = WhisperProcessor.from_pretrained(MODEL)
model = WhisperForConditionalGeneration.from_pretrained(MODEL)
model.eval()

audio, _ = librosa.load("recitation.wav", sr=16000)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt").input_features

with torch.no_grad():
    ids = model.generate(
        inputs,
        language="arabic",
        task="transcribe",
        max_new_tokens=225,
        suppress_tokens=[],
        forced_decoder_ids=None,
    )

print(processor.batch_decode(ids, skip_special_tokens=True)[0])
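Whisper's encoder takes fixed 30-second windows, so recitations longer than that need to be split before the snippet above. A sketch of one way to chunk the waveform (the 2-second overlap is an arbitrary choice, not part of this repo):

```python
import numpy as np

def chunk_audio(audio: np.ndarray, sr: int = 16000,
                window_s: int = 30, overlap_s: int = 2):
    """Split a 1-D waveform into <=30 s windows with a small overlap,
    so each piece fits Whisper's fixed 30-second input."""
    step = (window_s - overlap_s) * sr
    size = window_s * sr
    return [audio[start:start + size]
            for start in range(0, max(len(audio) - overlap_s * sr, 1), step)]
```

Each chunk can then go through `processor` and `model.generate` exactly as above, with the decoded texts concatenated.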

Full pipeline (transcription + reciter)

from huggingface_hub import hf_hub_download
import torch, torch.nn as nn, json

MODEL = "rakansuliman/tadabur-whisper-medium"

# Download classifier files
hf_hub_download(MODEL, "reciter_classifier.pt",  local_dir="./")
hf_hub_download(MODEL, "reciter_idx_to_id.json", local_dir="./")
hf_hub_download(MODEL, "sheikh_dict.json",        local_dir="./")

# Define classifier (must match training architecture)
class ReciterClassifier(nn.Module):
    def __init__(self, hidden_dim, num_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_dim, 512), nn.BatchNorm1d(512), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(512, 256), nn.BatchNorm1d(256), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(256, num_classes),
        )
    def forward(self, x): return self.net(x)

# Load mappings
with open("reciter_idx_to_id.json") as f:
    idx_to_id = {int(k): int(v) for k, v in json.load(f).items()}
with open("sheikh_dict.json", encoding="utf-8-sig") as f:
    sheikh = {int(v): k for k, v in json.load(f).items()}

# Load classifier
clf = ReciterClassifier(1024, len(idx_to_id))
clf.load_state_dict(torch.load("reciter_classifier.pt", map_location="cpu"))
clf.eval()

# Run encoder + classify reciter
# (reuses `model` and `inputs` from the transcription snippet above)
with torch.no_grad():
    encoder_out = model.model.encoder(inputs)
    embedding   = encoder_out.last_hidden_state.mean(dim=1).float()
    logits      = clf(embedding)
    pred_idx    = logits.argmax(dim=1).item()
    confidence  = torch.softmax(logits, dim=1).max().item()

reciter_id   = idx_to_id[pred_idx]
reciter_name = sheikh.get(reciter_id, f"ID {reciter_id}")
print(f"Reciter: {reciter_name} ({confidence*100:.1f}%)")

Architecture

Audio Input (mic / file / video)
    ↓
Whisper Encoder  ←─ runs once, shared
    ├── Whisper Decoder  →  Arabic text
    └── MLP Classifier   →  Reciter name
    ↓
RapidFuzz matching against 6,236 ayahs
    ↓
Surah name + Ayah number + confidence
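The RapidFuzz step scores the transcription against every ayah text and keeps the best match. A dependency-free sketch using difflib's similarity ratio in place of RapidFuzz's `fuzz.ratio` (the real ayah dictionary would be built from quran_simple.json; the transliterated entries here are placeholders):

```python
from difflib import SequenceMatcher

def best_ayah_match(transcription: str, ayahs: dict):
    """Return ((surah, ayah), score) for the closest ayah text.
    `ayahs` maps (surah, ayah) -> text; difflib's ratio stands in
    for RapidFuzz's fuzz.ratio in this sketch."""
    def score(text):
        return SequenceMatcher(None, transcription, text).ratio()
    key = max(ayahs, key=lambda k: score(ayahs[k]))
    return key, score(ayahs[key])

# Toy example with transliterated placeholder texts:
ayahs = {
    (1, 1): "bismillahi r-rahmani r-rahim",
    (1, 2): "alhamdu lillahi rabbi l-alamin",
    (112, 1): "qul huwa llahu ahad",
}
key, score = best_ayah_match("qul huwa allahu ahad", ayahs)
print(key)  # (112, 1)
```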

Reciter Classifier Architecture

Linear(1024→512) → BatchNorm → ReLU → Dropout(0.3)
    → Linear(512→256) → BatchNorm → ReLU → Dropout(0.2)
    → Linear(256→335)

Training Details

ASR Fine-tuning

  • Base model: openai/whisper-medium
  • Dataset: 9,432 samples (1 shard of Tadabur)
  • Hardware: NVIDIA RTX 4090 (24GB VRAM)
  • Batch size: 8 ร— 4 gradient accumulation = 32 effective
  • Learning rate: 1e-5 cosine with 500 warmup steps
  • Precision: FP16
  • Best checkpoint: step 10,000

Reciter Classifier

  • Training data: 500 shards (~325k samples, 335 reciters)
  • Phase 1: Extract Whisper encoder embeddings shard-by-shard
  • Phase 2: Train MLP on pre-extracted embeddings (15 min)
  • Optimizer: AdamW with cosine annealing
  • Epochs: 20, Batch size: 256

Supported Reciters

See supported_reciters.txt for the full list of 335 supported reciters, including عبد الباسط عبد الصمد، محمد صديق المنشاوي، ياسر الدوسري، سعود الشريم، ماهر المعيقلي، عبدالرحمن السديس، and 329 more.


Limitations

  • ASR trained on 1 shard only โ€” may have reduced generalization on rare recitation styles
  • Reciter classifier covers 335 of 671 total reciters in the dataset
  • Surah/Ayah matching accuracy depends on transcription quality
  • Model optimized for standard Hafs recitation style

Citation

@misc{suliman2026tadabur,
  author = {Suliman, Rakan and Mamdoh, Abdulrahman and Aldosari, Hussam},
  title  = {Tadabur: Quran ASR with Surah/Ayah Identification and Reciter Recognition},
  year   = {2026},
  url    = {https://huggingface.co/rakansuliman/tadabur-whisper-medium}
}

License

CC BY-NC 4.0 - Research and educational use only. Please engage with Quran content respectfully. 🤲
