🎙️ Whisper-v3-Urdu-LoRA: Classical Poetry & ASR

License: MIT

📌 Project Overview

This repository features a fine-tuned OpenAI Whisper Large-v3 model utilizing PEFT/LoRA (Low-Rank Adaptation). The primary objective is to bridge the gap in Automatic Speech Recognition (ASR) for Classical Urdu Literature and Formal Academic Speech.

Unlike standard models that struggle with complex Persianized vocabulary and Nastaliq spacing, this adapter is specifically optimized for the rhythmic and lexical nuances of classical Ghazals and Nazms.

💻 Computing Environment

  • GPU: NVIDIA GeForce RTX 4060 Ti (16GB VRAM)
  • Precision: 4-bit Quantization (bitsandbytes) / Float16 Inference
  • Platform: Ubuntu 22.04 LTS (Local AI Workstation)
  • Storage Path: /mnt/data/ASR (SATA SSD)
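On a 16 GB card like this, the base model fits via the bitsandbytes 4-bit path. A minimal quantization config sketch using the standard `transformers` API (the NF4 quant type is an assumption; this card only states "4-bit" with fp16 compute):

```python
import torch
from transformers import BitsAndBytesConfig

# 4-bit quantization with fp16 compute, matching the environment above.
# bnb_4bit_quant_type="nf4" is an assumption, not stated in this card.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
# Pass as quantization_config=bnb_config to from_pretrained(...)
```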

📊 Evaluation: Classical Poetry (Shikwa by Allama Iqbal)

The model was benchmarked against the iconic verses of Shikwa. The results demonstrate high phonetic retention and an advanced understanding of formal Urdu vocabulary.

| File ID | ASR Transcription (LoRA) | Ground Truth (Reference) |
|---------|--------------------------|--------------------------|
| LJ0002 | کونزیان کاربانوں سود فراموشراہوں | کیوں زیاں کار بنوں سود فراموش رہوں |
| LJ0003 | فکر فردہ نہ کروں میں وہ غمیدوش رہوں | فکرِ فردا نہ کروں محوِ غمِ دوش رہوں |
| LJ0004 | نالِ بُلبل کے سُنوں اور رحمتاً گوشت رہوں | نالۂ بلبل کے سنوں اور ہمہ تن گوش رہوں |
| LJ0005 | ہم نوا میں بھی کوئی گُل ہوں کے خاموش رہوں | ہم نوا میں بھی کوئی گل ہوں کہ خاموش رہوں |
| LJ0006 | جورت آموز میری تاب سخن ہے مجھ کو... | جرأت آموز مری تابِ سخن ہے مجھ کو |

📈 General Benchmark: Common Voice

To verify that the model maintains general-purpose linguistic utility, it was tested on standard colloquial Urdu.

  • Ground Truth: "اب کس کی باری، آصف زرداری"
  • Model Prediction: "اب کس کی باری ااصف زرداری"
  • Result: 98% Character Accuracy.
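Character accuracy of this kind is conventionally computed as 1 − (character-level Levenshtein distance ÷ reference length). A minimal pure-Python sketch (the function name is illustrative, not part of this repository):

```python
def char_accuracy(reference: str, hypothesis: str) -> float:
    """Character accuracy = 1 - (Levenshtein distance / len(reference))."""
    m, n = len(reference), len(hypothesis)
    # Dynamic-programming edit distance over characters
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return 1.0 - prev[n] / m

print(round(char_accuracy("abcde", "abxde"), 2))  # one substitution -> 0.8
```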

⚙️ Technical Specifications

LoRA Hyperparameters

  • Rank (r): 32
  • Alpha: 64
  • Target Modules: q_proj, v_proj
  • Learning Rate: 3.2e-6
  • Training Loss: 1.019 (Final convergence)
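As a rough sanity check on adapter size, the trainable parameter count implied by these hyperparameters can be estimated. The sketch below assumes Whisper large-v3's architecture (hidden size 1280, 32 encoder and 32 decoder layers) and that `q_proj`/`v_proj` are adapted in both self- and cross-attention; these architecture details are assumptions, not stated in this card:

```python
# Sketch: estimate trainable LoRA parameters for r=32 on q_proj/v_proj.
# Architecture numbers below are assumptions about Whisper large-v3.
d_model = 1280        # hidden size
r = 32                # LoRA rank
encoder_layers = 32
decoder_layers = 32

# Each adapted d_model x d_model projection gains A (r x d) + B (d x r) weights.
params_per_matrix = 2 * r * d_model  # 81,920

# q_proj and v_proj in: encoder self-attn, decoder self-attn, decoder cross-attn
num_matrices = 2 * (encoder_layers + 2 * decoder_layers)

total = num_matrices * params_per_matrix
print(f"{num_matrices} adapted matrices, ~{total / 1e6:.1f}M trainable parameters")
```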

🚀 Usage Instructions

To achieve the results shown above, use the following inference configuration:

```python
import torch
from peft import PeftModel
from transformers import WhisperProcessor, WhisperForConditionalGeneration

# Load the base model in 4-bit and attach the LoRA adapter
adapter_id = "Khurram123/whisper-large-v3-urdu-lora"
processor = WhisperProcessor.from_pretrained("openai/whisper-large-v3")
model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v3",
    load_in_4bit=True,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)

# Extract log-mel features from your audio
# (audio_array: a 16 kHz mono waveform as a NumPy array, e.g. loaded with librosa)
input_features = processor(
    audio_array, sampling_rate=16000, return_tensors="pt"
).input_features.to(model.device, dtype=torch.float16)

# Recommended generation config
predicted_ids = model.generate(
    input_features,
    language="urdu",
    task="transcribe",
    num_beams=5,             # critical for Urdu word boundaries
    temperature=0.2,         # reduces hallucination in poetic suffixes
    repetition_penalty=1.2,  # improves spacing between words
)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
```

Evaluation results

  • Final training loss on muhammadsaadgondal/urdu-tts (Shikwa subset): 1.019 (self-reported)