Whisper Medium — Hindi High LR

A Hindi ASR model fine-tuned from openai/whisper-medium, trained as a single-stage baseline with a high learning rate on the full Hindi training corpus. It serves as the High LR baseline in the research paper "Vividh-ASR: A Complexity-Tiered Benchmark and Optimization Dynamics for Robust Indic Speech Recognition" and in the "Vividh-ASR: Diagnosing and Fixing Studio-Bias in Whisper for Indic Languages" benchmark suite.

This model is part of a set of Malayalam and Hindi Whisper models released by Adalat AI alongside the Vividh-ASR benchmark.


Model Description

The High LR baseline fine-tunes Whisper in a single stage on all available Hindi training data mixed together, without any curriculum ordering:

| Stage | Data | LR |
|---|---|---|
| 1 | All tiers: Studio + Broadcast + Spontaneous (~2190 hrs) | 2e-4 |

Training uses AdamW (weight decay 0.1), linear warmup over the first 10% of steps, and cosine annealing to zero. The model was trained on NVIDIA H100 GPUs with Hugging Face Transformers.
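
The schedule described above (linear warmup for 10% of steps, then cosine annealing to zero, with a peak LR of 2e-4) can be sketched in plain Python. This is an illustrative sketch rather than the training code: the function name `lr_at_step` and the step counts in the usage note are assumptions, since this card does not state the total number of training steps.

```python
import math


def lr_at_step(step, total_steps, peak_lr=2e-4, warmup_frac=0.10):
    """Learning rate at a given optimizer step (hypothetical helper).

    Linear warmup from 0 to peak_lr over the first warmup_frac of steps,
    then cosine annealing from peak_lr down to 0 at total_steps.
    """
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        # Linear ramp: 0 at step 0, peak_lr at step == warmup_steps.
        return peak_lr * step / warmup_steps
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half-cosine decay from peak_lr (progress = 0) to 0 (progress = 1).
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With an assumed `total_steps=1000`, the rate rises linearly until step 100, peaks at 2e-4 there, and decays to zero at step 1000. In Hugging Face Transformers, the same shape is available as the `cosine` scheduler type combined with a warmup ratio.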


Benchmark Results (Vividh-ASR)

Benchmark WER is measured using faster-whisper with 7s VAD segmentation for long-form audio. See the blogpost for full evaluation details.

| Model | Tier A (Studio) | Tier B (Broadcast) | Tier C (Spontaneous) | Tier D (Noise) | Global |
|---|---|---|---|---|---|
| whisper-medium-hi-high-lr (this model) | 13.63 | 11.33 | 18.98 | 14.05 | 15.73 |
| whisper-medium-hi-rmft | 15.82 | 10.11 | 22.71 | 17.27 | 18.14 |
| whisper-small-hi-high-lr | 16.96 | 11.05 | 23.02 | 16.77 | 18.73 |
| whisper-small-hi-rmft | 18.60 | 11.49 | 25.34 | 20.97 | 20.70 |
| indic-whisper-hi | 16.24 | 11.62 | 39.87 | 14.99 | 25.01 |
| vaani-whisper-large-v3-hindi | 12.55 | 17.61 | 28.91 | 14.52 | 21.05 |
| whisper-medium-vaani-hindi | 18.15 | 25.92 | 22.85 | 17.19 | 21.51 |
| whisper-small-vaani-hindi | 23.39 | 30.37 | 26.63 | 22.10 | 25.92 |

All values are WER (%); lower is better. See the Vividh-ASR benchmark for full evaluation details.
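
For readers new to the metric, WER is the word-level edit distance (substitutions, deletions, and insertions) between hypothesis and reference, divided by the number of reference words. The sketch below illustrates that definition only. It is not the benchmark's scoring code, the function name `word_error_rate` is hypothetical, and it applies no text normalisation, which the actual evaluation may do.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent using word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)
```

For example, scoring the hypothesis "मैं घर जा रहा हूं" against the reference "मैं घर जा रहा हूँ" counts one substitution in five reference words. This also shows why orthographic normalisation choices can matter for Hindi scoring.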


Usage

from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="adalat-ai/whisper-medium-hi-high-lr",
    chunk_length_s=30,
    device="cuda"
)

result = asr("audio.wav")
print(result["text"])

Note: For long-form audio, benchmark results use faster-whisper with 7s VAD segmentation. For short clips, the HuggingFace pipeline above will produce equivalent results.


Training Data

Training data is a superset of the Vividh-ASR benchmark evaluation splits. Sources used:

| Tier | Hours | Sources |
|---|---|---|
| A (Studio) | 272.1 | FLEURS, IndicTTS, Kathbath, Common Voice, MUCS |
| B (Broadcast) | 1359.9 | Shrutilipi |
| C (Spontaneous) | 558.7 | IndicVoices |
| Total | 2190.7 | |
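
Because all tiers are mixed in a single stage, the share of hours each tier contributes may be useful context. The short sketch below recomputes the total and the per-tier percentages from the table above. The variable names are illustrative, and the figures are simple arithmetic on the reported hours, not sampling weights confirmed by the authors.

```python
# Hours per tier, copied from the training data table.
tier_hours = {
    "A (Studio)": 272.1,
    "B (Broadcast)": 1359.9,
    "C (Spontaneous)": 558.7,
}

total_hours = sum(tier_hours.values())

# Percentage of total training hours contributed by each tier.
shares = {
    tier: 100.0 * hours / total_hours
    for tier, hours in tier_hours.items()
}

for tier, pct in shares.items():
    print(f"{tier}: {pct:.1f}% of {total_hours:.1f} h")
```

By this count, Broadcast (Shrutilipi) accounts for roughly 62% of the training hours.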

Intended Use & Limitations

This model is intended as a general-purpose Hindi ASR model optimised for verbatim transcription accuracy across diverse acoustic conditions.

Limitations:

  • Evaluated on Hindi and Malayalam only; generalisation to other Indic languages is untested
  • Tier D evaluation uses synthetic noise profiles; performance on real-world degraded audio may differ

Citation

If you use this model or the Vividh-ASR benchmark, please cite:

@misc{vividhasr2025,
  title   = {Vividh-ASR: Diagnosing and Fixing Studio-Bias in Whisper
             for Indic Languages},
  author  = {Kush Juvekar and Kavya Manohar and Kumaramanas Nethil},
  year    = {2026},
  url     = {https://huggingface.co/blog/adalat-ai/vividh-benchmark}
}
@misc{vividh2026,
  title         = {Vividh-ASR: A Complexity-Tiered Benchmark and Optimization Dynamics for Robust Indic Speech Recognition},
  author        = {Kush Juvekar and Kavya Manohar and Aditya Srinivas Menon and Arghya Bhattacharya and Kumarmanas Nethil},
  year          = {2026},
  eprint        = {2605.13087},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  url           = {https://arxiv.org/abs/2605.13087}
}

Related Models and Datasets

See the Vividh collection.


Developed by Adalat AI. Released under Apache 2.0.
