# Whisper Medium Fine-tuned for Arabic ASR on SADA
`openai/whisper-medium` fine-tuned on the full SADA22 dataset (~420 hours of Saudi Arabic speech) for Arabic automatic speech recognition (ASR).
Used as Baseline 2 in experiments on predicting the Arabic Level of Dialectness (ALDi) from speech: the transcript produced by this model is fed into a text-based ALDi classifier to obtain a dialect score.
## Training details
| Setting | Value |
|---|---|
| Base model | openai/whisper-medium (~764M parameters) |
| Dataset | SADA22 (full, ~420 h of Saudi Arabic) |
| Language | Arabic |
| Task | transcribe |
| Epochs | 4 |
| Learning rate | 1e-5 |
| Batch size | 8 |
| Gradient accumulation steps | 1 |
| Warmup ratio | 0.1 |
| FP16 | yes |
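The table above maps onto a `Seq2SeqTrainingArguments` configuration roughly as sketched below. Only the listed values come from the table; the output directory and every unlisted argument are assumptions, not the exact settings used for this model:

```python
from transformers import Seq2SeqTrainingArguments

# Values from the training-details table; everything else is an assumed default.
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-medium-sada",  # hypothetical path
    num_train_epochs=4,
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=1,
    warmup_ratio=0.1,
    fp16=True,
)
```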
## Quick Start

### 1. Install dependencies

```bash
pip install torch "transformers>=4.27" torchaudio safetensors
```
### 2. Transcribe an audio file

```python
import torch
import torchaudio
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model = WhisperForConditionalGeneration.from_pretrained("wageehkhad/whisper-medium-finetuned-sada-asr")
processor = WhisperProcessor.from_pretrained("wageehkhad/whisper-medium-finetuned-sada-asr")

def transcribe(audio_path: str, device: str = "cpu") -> str:
    """
    Transcribe an Arabic audio file.

    Accepts any format supported by torchaudio (WAV, FLAC, MP3, etc.).
    """
    wav, sr = torchaudio.load(audio_path)
    # Whisper expects 16 kHz mono input.
    wav = torchaudio.functional.resample(wav, sr, 16_000).mean(0).numpy()
    inputs = processor(
        wav,
        sampling_rate=16_000,
        return_tensors="pt",
        return_attention_mask=True,
    ).to(device)
    model.to(device)
    with torch.no_grad():
        token_ids = model.generate(
            inputs.input_features,
            attention_mask=inputs.attention_mask,
            language="arabic",
            task="transcribe",
        )
    return processor.batch_decode(token_ids, skip_special_tokens=True)[0]

# Example
print(transcribe("example.wav"))
```
If you get the error

```
ImportError: TorchCodec is required for load_with_torchcodec
```

install the missing backend:

```bash
pip install torchcodec
```
### 3. Using as part of the Baseline 2 ALDi pipeline

This model is the ASR component of a two-step pipeline:

```
Audio → [this model] → Arabic transcript → [AMR-KELEG/Sentence-ALDi] → ALDi score
```
```python
import torch
from transformers import pipeline, AutoModelForSequenceClassification, AutoTokenizer

# Step 1: transcribe
asr = pipeline(
    "automatic-speech-recognition",
    model="wageehkhad/whisper-medium-finetuned-sada-asr",
)
result = asr("speech.wav", generate_kwargs={"language": "arabic", "task": "transcribe"})
transcript = result["text"]

# Step 2: score dialect level
aldi_tok = AutoTokenizer.from_pretrained("AMR-KELEG/Sentence-ALDi")
aldi_mdl = AutoModelForSequenceClassification.from_pretrained("AMR-KELEG/Sentence-ALDi")
aldi_mdl.eval()

enc = aldi_tok(transcript, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():
    score = float(aldi_mdl(**enc).logits.squeeze())

print(f"Transcript : {transcript}")
print(f"ALDi score : {score:.3f}")  # 0.0 = MSA, 1.0 = heavy dialect
```
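For downstream reporting it can help to bucket the continuous ALDi score into coarse labels. The helper below is a sketch with purely illustrative thresholds (they are not defined by the ALDi paper); adjust the cutoffs to your data:

```python
def dialectness_label(score: float) -> str:
    """Map a continuous ALDi score in [0, 1] to a coarse, illustrative label."""
    # A regression head can emit values slightly outside [0, 1]; clamp first.
    score = max(0.0, min(1.0, score))
    if score < 0.2:
        return "mostly MSA"
    if score < 0.6:
        return "mixed"
    return "heavily dialectal"

print(dialectness_label(0.05))  # mostly MSA
print(dialectness_label(0.85))  # heavily dialectal
```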
## Limitations
- Fine-tuned on Saudi Arabic (SADA22). WER on other Arabic dialects or MSA broadcasts will be higher.
- Optimised for speech segments up to 30 seconds (Whisper's native window). Longer files are chunked automatically by the pipeline.
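For long recordings, the `transformers` ASR pipeline's chunking can also be requested explicitly via `chunk_length_s`, as in this sketch (the filename is a placeholder; this downloads the model and is not runnable offline):

```python
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="wageehkhad/whisper-medium-finetuned-sada-asr",
    chunk_length_s=30,  # Whisper's native window
)
result = asr("long_interview.wav",  # placeholder path
             generate_kwargs={"language": "arabic", "task": "transcribe"})
print(result["text"])
```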
## Citation

If you use this model, please cite the ALDi paper and the SADA dataset:
```bibtex
@inproceedings{keleg2023aldi,
  title     = {ALDi: Quantifying the Arabic Level of Dialectness of Text},
  author    = {Keleg, Amr and Goldwater, Sharon and Magdy, Walid},
  booktitle = {Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year      = {2023},
  publisher = {Association for Computational Linguistics},
  address   = {Singapore},
  url       = {https://aclanthology.org/2023.emnlp-main.655}
}

@misc{sada22,
  author       = {Al-Gamdi, Ahmed and others},
  title        = {SADA: Saudi Audio Dataset for Arabic},
  year         = {2022},
  howpublished = {\url{https://huggingface.co/datasets/MohamedRashad/SADA22}},
  note         = {Accessed 2026}
}
```