🎙️ Whisper-v3-Urdu-LoRA: Classical Poetry & ASR

License: MIT

📌 Project Overview

This repository features a fine-tuned OpenAI Whisper Large-v3 model utilizing PEFT/LoRA (Low-Rank Adaptation). The primary objective is to bridge the gap in Automatic Speech Recognition (ASR) for Classical Urdu Literature and Formal Academic Speech.

Unlike standard models that struggle with complex Persianized vocabulary and Nastaliq spacing, this adapter is specifically optimized for the rhythmic and lexical nuances of classical Ghazals and Nazms.

💻 Computing Environment

  • GPU: NVIDIA GeForce RTX 4060 Ti (16GB VRAM)
  • Precision: 4-bit Quantization (bitsandbytes) / Float16 Inference
  • Platform: Ubuntu 22.04 LTS (Local AI Workstation)
  • Storage Path: /mnt/data/ASR (SATA SSD)
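On a 16 GB card like this, the base model fits via the bitsandbytes 4-bit path. A minimal quantization config sketch using the standard `transformers` API (the NF4 quant type is an assumption; this card only states "4-bit" with fp16 compute):

```python
import torch
from transformers import BitsAndBytesConfig

# 4-bit quantization with fp16 compute, matching the environment above.
# bnb_4bit_quant_type="nf4" is an assumption, not stated in this card.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
# Pass as quantization_config=bnb_config to from_pretrained(...)
```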

📊 Evaluation: Classical Poetry (Shikwa by Allama Iqbal)

The model was benchmarked against the iconic verses of Shikwa. The results demonstrate high phonetic retention and an advanced understanding of formal Urdu vocabulary.

| File ID | ASR Transcription (LoRA) | Ground Truth (Reference) |
|---------|--------------------------|--------------------------|
| LJ0002 | کونزیان کاربانوں سود فراموشراہوں | کیوں زیاں کار بنوں سود فراموش رہوں |
| LJ0003 | فکر فردہ نہ کروں میں وہ غمیدوش رہوں | فکرِ فردا نہ کروں محوِ غمِ دوش رہوں |
| LJ0004 | نالِ بُلبل کے سُنوں اور رحمتاً گوشت رہوں | نالۂ بلبل کے سنوں اور ہمہ تن گوش رہوں |
| LJ0005 | ہم نوا میں بھی کوئی گُل ہوں کے خاموش رہوں | ہم نوا میں بھی کوئی گل ہوں کہ خاموش رہوں |
| LJ0006 | جورت آموز میری تاب سخن ہے مجھ کو... | جرأت آموز مری تابِ سخن ہے مجھ کو |

📈 General Benchmark: Common Voice

To verify that the model maintains general-purpose linguistic utility, it was tested on standard colloquial Urdu.

  • Ground Truth: "اب کس کی باری، آصف زرداری"
  • Model Prediction: "اب کس کی باری ااصف زرداری"
  • Result: 98% Character Accuracy.
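Character accuracy of this kind is conventionally computed as 1 − (character-level Levenshtein distance ÷ reference length). A minimal pure-Python sketch (the function name is illustrative, not part of this repository):

```python
def char_accuracy(reference: str, hypothesis: str) -> float:
    """Character accuracy = 1 - (Levenshtein distance / len(reference))."""
    m, n = len(reference), len(hypothesis)
    # Dynamic-programming edit distance over characters
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return 1.0 - prev[n] / m

print(round(char_accuracy("abcde", "abxde"), 2))  # one substitution -> 0.8
```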

⚙️ Technical Specifications

LoRA Hyperparameters

  • Rank (r): 32
  • Alpha: 64
  • Target Modules: q_proj, v_proj
  • Learning Rate: 3.2e-6
  • Training Loss: 1.019 (Final convergence)
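As a rough sanity check on adapter size, the trainable parameter count implied by these hyperparameters can be estimated. The sketch below assumes Whisper large-v3's architecture (hidden size 1280, 32 encoder and 32 decoder layers) and that `q_proj`/`v_proj` are adapted in both self- and cross-attention; these architecture details are assumptions, not stated in this card:

```python
# Sketch: estimate trainable LoRA parameters for r=32 on q_proj/v_proj.
# Architecture numbers below are assumptions about Whisper large-v3.
d_model = 1280        # hidden size
r = 32                # LoRA rank
encoder_layers = 32
decoder_layers = 32

# Each adapted d_model x d_model projection gains A (r x d) + B (d x r) weights.
params_per_matrix = 2 * r * d_model  # 81,920

# q_proj and v_proj in: encoder self-attn, decoder self-attn, decoder cross-attn
num_matrices = 2 * (encoder_layers + 2 * decoder_layers)

total = num_matrices * params_per_matrix
print(f"{num_matrices} adapted matrices, ~{total / 1e6:.1f}M trainable parameters")
```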

🚀 Usage Instructions

To achieve the results shown above, use the following inference configuration:

```python
import torch
from peft import PeftModel
from transformers import WhisperProcessor, WhisperForConditionalGeneration

# Load the base model in 4-bit and attach the LoRA adapter
adapter_id = "Khurram123/whisper-large-v3-urdu-lora"
processor = WhisperProcessor.from_pretrained("openai/whisper-large-v3")
model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v3",
    load_in_4bit=True,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)

# Extract log-mel features from your audio
# (audio_array: a 16 kHz mono waveform as a NumPy array, e.g. loaded with librosa)
input_features = processor(
    audio_array, sampling_rate=16000, return_tensors="pt"
).input_features.to(model.device, dtype=torch.float16)

# Recommended generation config
predicted_ids = model.generate(
    input_features,
    language="urdu",
    task="transcribe",
    num_beams=5,             # critical for Urdu word boundaries
    temperature=0.2,         # reduces hallucination in poetic suffixes
    repetition_penalty=1.2,  # improves spacing between words
)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
```

Evaluation results

  • Final training loss on muhammadsaadgondal/urdu-tts (Shikwa subset): 1.019 (self-reported)