# 🎙️ Whisper-v3-Urdu-LoRA: Classical Poetry & ASR

## 📌 Project Overview
This repository provides a LoRA (Low-Rank Adaptation) adapter for OpenAI's Whisper Large-v3, fine-tuned with PEFT. The primary objective is to close the gap in Automatic Speech Recognition (ASR) for classical Urdu literature and formal academic speech.
Unlike standard models that struggle with complex Persianized vocabulary and Nastaliq spacing, this adapter is specifically optimized for the rhythmic and lexical nuances of classical Ghazals and Nazms.
## 💻 Computing Environment
- GPU: NVIDIA GeForce RTX 4060 Ti (16GB VRAM)
- Precision: 4-bit Quantization (bitsandbytes) / Float16 Inference
- Platform: Ubuntu 22.04 LTS (Local AI Workstation)
- Storage Path: /mnt/data/ASR (SATA SSD)
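The 4-bit setup above corresponds to passing a `BitsAndBytesConfig` when loading the base model. The following is a sketch of one plausible configuration; the NF4 quantization type and float16 compute dtype are assumptions, not details stated in this card:

```python
import torch
from transformers import BitsAndBytesConfig, WhisperForConditionalGeneration

# Hypothetical 4-bit quantization config matching the environment above;
# nf4 and float16 compute dtype are assumptions, not stated in the card.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # float16 inference
    bnb_4bit_quant_type="nf4",
)

model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v3",
    quantization_config=bnb_config,
    device_map="auto",  # the quantized model fits comfortably in 16 GB VRAM
)
```

Loading in 4-bit keeps the ~1.5B-parameter base model well within the RTX 4060 Ti's memory budget while the LoRA adapter runs in higher precision.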
## 📊 Evaluation: Classical Poetry (Shikwa by Allama Iqbal)
The model was benchmarked against the iconic verses of Shikwa. The results demonstrate high phonetic retention and an advanced understanding of formal Urdu vocabulary.
| File ID | ASR Transcription (LoRA) | Ground Truth (Reference) |
|---|---|---|
| LJ0002 | کونزیان کاربانوں سود فراموشراہوں | کیوں زیاں کار بنوں سود فراموش رہوں |
| LJ0003 | فکر فردہ نہ کروں میں وہ غمیدوش رہوں | فکرِ فردا نہ کروں محوِ غمِ دوش رہوں |
| LJ0004 | نالِ بُلبل کے سُنوں اور رحمتاً گوشت رہوں | نالۂ بلبل کے سنوں اور ہمہ تن گوش رہوں |
| LJ0005 | ہم نوا میں بھی کوئی گُل ہوں کے خاموش رہوں | ہم نوا میں بھی کوئی گل ہوں کہ خاموش رہوں |
| LJ0006 | جورت آموز میری تاب سخن ہے مجھ کو... | جرأت آموز مری تابِ سخن ہے مجھ کو |
## 📈 General Benchmark: Common Voice
To verify that the model maintains general-purpose linguistic utility, it was tested on standard colloquial Urdu.
- Ground Truth: "اب کس کی باری، آصف زرداری"
- Model Prediction: "اب کس کی باری ااصف زرداری"
- Result: 98% Character Accuracy.
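As a rough illustration of how a character-level accuracy figure can be computed, here is a minimal sketch using Python's stdlib `difflib`. The card does not specify its exact metric, so the number this sketch produces need not match the reported 98%:

```python
from difflib import SequenceMatcher

def char_accuracy(reference: str, hypothesis: str) -> float:
    # Fraction of reference characters matched against the hypothesis
    matcher = SequenceMatcher(None, reference, hypothesis)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(reference)

ref = "اب کس کی باری، آصف زرداری"
hyp = "اب کس کی باری ااصف زرداری"
print(f"Character accuracy: {char_accuracy(ref, hyp):.1%}")
```

Here the only differences are the dropped comma and the doubled alif before آصف, so the score stays close to 1.0.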
## ⚙️ Technical Specifications

### LoRA Hyperparameters

- Rank (r): 32
- Alpha: 64
- Target Modules: `q_proj`, `v_proj`
- Learning Rate: 3.2e-6
- Training Loss: 1.019 (final convergence)
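These hyperparameters map directly onto a PEFT `LoraConfig`. A minimal sketch follows; the dropout and bias settings are assumed defaults, since the card does not state them:

```python
from peft import LoraConfig

# LoRA config mirroring the hyperparameters listed above;
# lora_dropout and bias are assumptions, not stated in the card.
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,  # assumption
    bias="none",
)
```

With alpha set to 2×r, the effective LoRA scaling factor (alpha/r) is 2, a common choice for Whisper fine-tuning.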
## 🚀 Usage Instructions
To achieve the results shown above, use the following inference configuration:
```python
import torch
from peft import PeftModel
from transformers import WhisperProcessor, WhisperForConditionalGeneration

# Load the base model in 4-bit and attach the LoRA adapter weights
adapter_id = "Khurram123/whisper-large-v3-urdu-lora"
processor = WhisperProcessor.from_pretrained("openai/whisper-large-v3")
model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v3", load_in_4bit=True
)
model = PeftModel.from_pretrained(model, adapter_id)

# input_features: log-mel features from 16 kHz audio, e.g.
# processor(audio, sampling_rate=16000, return_tensors="pt").input_features

# Recommended generation config
predicted_ids = model.generate(
    input_features,
    language="urdu",
    task="transcribe",
    num_beams=5,             # critical for Urdu word boundaries
    temperature=0.2,         # reduces hallucination in poetic suffixes
    repetition_penalty=1.2,  # improves spacing between words
)

# Decode token IDs back to text
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
```
## Evaluation results

- Final training loss on muhammadsaadgondal/urdu-tts (Shikwa subset): 1.019 (self-reported)