Whisper Tiny: Fine-tuned for Urdu ASR

Fine-tuned version of openai/whisper-tiny on the Common Voice Urdu dataset.

Model Details

Property                 Value
Base model               openai/whisper-tiny
Language                 Urdu (ur)
Task                     Automatic speech recognition
WER before fine-tuning   119.36%
WER after fine-tuning    48.28%
Training samples         2,939
Epochs                   30
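Word error rate (WER) is the word-level edit distance between a model transcript and a reference transcript, divided by the number of reference words. Insertions are counted, so WER can exceed 100%, which is why the base model can score 119.36%. The sketch below is a plain-Python illustration of the metric under stated assumptions (whitespace tokenization, no text normalization). It is not necessarily the tool or normalization used to produce the figures above. Libraries such as jiwer or Hugging Face evaluate are common choices in practice.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Assumes whitespace tokenization and no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words (one DP row at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution out of three reference words gives 1/3.
    print(word_error_rate("a b c", "a x c"))
    # Two substitutions plus two insertions over two reference words gives 2.0 (200%).
    print(word_error_rate("a b", "x y z w"))
```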

Usage

HuggingFace Transformers (GPU)

import torch
import torchaudio
from transformers import WhisperProcessor, WhisperForConditionalGeneration

MODEL_ID = "abidanoaman/whisper-tiny-fast-urdu-finetuned-common-voice-partaial"
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = WhisperProcessor.from_pretrained(MODEL_ID)
model = WhisperForConditionalGeneration.from_pretrained(MODEL_ID).to(device)

# Load audio, downmix to mono and resample to the 16 kHz rate Whisper expects
audio, sr = torchaudio.load("audio.wav")
audio = audio.mean(dim=0)
if sr != 16000:
    audio = torchaudio.transforms.Resample(sr, 16000)(audio)
audio = audio.numpy()

inputs = processor(audio, sampling_rate=16000, return_tensors="pt").input_features.to(device)
with torch.no_grad():
    # Force Urdu transcription instead of relying on language detection
    predicted_ids = model.generate(inputs, language="ur", task="transcribe")
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
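The Whisper feature extractor pads or truncates every input to a 30-second window (480,000 samples at 16 kHz), so the snippet above transcribes at most the first 30 seconds of audio.wav. A minimal sketch for longer recordings, assuming a mono 16 kHz NumPy waveform: split it into consecutive 30-second chunks (split_into_chunks is a hypothetical helper, not part of Transformers), run each chunk through the processor and model.generate calls above, and join the decoded strings.

```python
import numpy as np

SAMPLE_RATE = 16_000
CHUNK_SECONDS = 30                           # Whisper's fixed input window
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS  # 480,000 samples


def split_into_chunks(audio: np.ndarray, chunk_samples: int = CHUNK_SAMPLES) -> list[np.ndarray]:
    """Split a 1-D 16 kHz waveform into consecutive, non-overlapping windows.

    The last chunk may be shorter; the Whisper feature extractor pads it to 30 s.
    """
    if audio.ndim != 1:
        raise ValueError("expected a mono (1-D) waveform")
    return [audio[start:start + chunk_samples] for start in range(0, len(audio), chunk_samples)]


if __name__ == "__main__":
    seventy_seconds = np.zeros(70 * SAMPLE_RATE, dtype=np.float32)
    chunks = split_into_chunks(seventy_seconds)
    print([len(c) / SAMPLE_RATE for c in chunks])  # [30.0, 30.0, 10.0]
```

Cutting at fixed boundaries can split a word across two chunks. The Transformers automatic-speech-recognition pipeline, with its chunk_length_s option, uses overlapping strides and is a more robust alternative.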

faster-whisper (CPU, recommended for deployment)

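import os

from huggingface_hub import snapshot_download
from faster_whisper import WhisperModel

REPO_ID = "abidanoaman/whisper-tiny-fast-urdu-finetuned-common-voice-partaial"

# Download only the CTranslate2 (faster-whisper) files stored under faster-whisper/
model_path = snapshot_download(repo_id=REPO_ID, allow_patterns="faster-whisper/*")

# int8 quantization keeps CPU inference fast and memory-light
model = WhisperModel(
    os.path.join(model_path, "faster-whisper"),
    device="cpu",
    compute_type="int8",
)

segments, info = model.transcribe(
    "audio.wav",
    language="ur",
    vad_filter=True,                   # skip non-speech regions (Silero VAD)
    beam_size=5,
    condition_on_previous_text=False,  # can reduce repetition across windows
    temperature=0.0,                   # single temperature, no fallback sampling
)

# segments is a lazy generator: decoding happens while it is iterated.
# Each seg.text starts with a space, so strip before joining.
text = " ".join(seg.text.strip() for seg in segments)
print(text)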
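faster-whisper returns segments as a lazy generator of objects with start and end times (in seconds) and a text attribute. It can be iterated only once, so call list(segments) first if you need both the joined text and the timings. As an illustration, the hypothetical to_srt helper below turns such segments into SRT subtitles. FakeSegment is a stand-in used only for the demo, not a faster-whisper type.

```python
from collections import namedtuple

# Hypothetical stand-in exposing the same .start / .end / .text attributes
# as faster-whisper segments; used here only for the demo.
FakeSegment = namedtuple("FakeSegment", "start end text")


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments) -> str:
    """Build SRT text from objects exposing .start, .end (seconds) and .text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(seg.start)} --> {format_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    # SRT cues are separated by a blank line
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [FakeSegment(0.0, 2.5, " first line"), FakeSegment(2.5, 61.04, " second line")]
    print(to_srt(demo))
```

With real output, something like `segments = list(segments)` followed by `print(to_srt(segments))` would work before joining the text.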

Training Details

  • Dataset: Mozilla Common Voice (Urdu)
  • Base model: openai/whisper-tiny
  • Learning rate: 1e-5 with cosine scheduler
  • Batch size: 32 (effective)
  • Optimizer: AdamW
  • Framework: HuggingFace Transformers for training, with a CTranslate2 int8 conversion for faster-whisper inference
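The schedule implied by these settings can be sketched numerically. This sketch makes two assumptions the card does not state: there is no warmup, and the last partial batch of each epoch is kept. Under those assumptions, 2,939 samples at an effective batch size of 32 give ceil(2939 / 32) = 92 optimizer steps per epoch, or 2,760 steps over 30 epochs. The cosine scheduler then decays the learning rate from 1e-5 along a half cosine. cosine_lr is a hypothetical helper written for illustration.

```python
import math

NUM_SAMPLES = 2_939   # training samples, from the card
BATCH_SIZE = 32       # effective batch size, from the card
EPOCHS = 30
PEAK_LR = 1e-5
WARMUP_STEPS = 0      # assumption: the card does not state a warmup length

# Assumption: the last partial batch of each epoch is kept, not dropped
STEPS_PER_EPOCH = math.ceil(NUM_SAMPLES / BATCH_SIZE)  # 92
TOTAL_STEPS = STEPS_PER_EPOCH * EPOCHS                 # 2,760


def cosine_lr(step: int, total_steps: int = TOTAL_STEPS,
              warmup_steps: int = WARMUP_STEPS, peak_lr: float = PEAK_LR) -> float:
    """Linear warmup, then a single half-cosine decay from peak_lr to 0."""
    if warmup_steps > 0 and step < warmup_steps:
        return peak_lr * step / warmup_steps
    # Fraction of the decay phase completed, clamped so the rate stays at 0 afterwards
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, TOTAL_STEPS // 2, TOTAL_STEPS):
        print(step, cosine_lr(step))
```

Within the decay phase, this matches the shape of Transformers' get_cosine_schedule_with_warmup with its default num_cycles=0.5.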
Model size: 37.8M parameters (Safetensors, F32)