Whisper Small β€” LoRA Adapter for Dysarthric Speech

Parameter-efficient fine-tuning of OpenAI's Whisper Small for clinical speech recognition, targeting dysarthria (TORGO) with speaker-independent evaluation.

Standard ASR models fail catastrophically on clinical speech populations. This adapter reduces Whisper's error rate on dysarthric speech by ~50% using only 1.7M trainable parameters (0.73% of the model).

Quick Start

```python
from transformers import WhisperForConditionalGeneration, WhisperProcessor
from peft import PeftModel
import librosa

# Load the base model, then attach the LoRA adapter
processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
model = PeftModel.from_pretrained(model, "dbarbera/whisper-small-torgo-dysarthria-lora")

# Whisper expects 16 kHz mono audio
audio, sr = librosa.load("audio.wav", sr=16000)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
generated_ids = model.generate(**inputs)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```

Results

Overall

| Metric | Baseline (Whisper Small) | + LoRA | Relative improvement |
|---|---|---|---|
| WER | 0.3160 (31.60%) | 0.1687 (16.87%) | +46.6% |
| SemScore | 0.6970 | 0.8321 | +19.4% |

By Speech Status

| Status | Baseline WER | LoRA WER | Relative improvement | Baseline SemScore | LoRA SemScore |
|---|---|---|---|---|---|
| dysarthria (1750 utts) | 0.7750 | 0.4071 | +47.5% | 0.4675 | 0.6409 |
| healthy (3854 utts) | 0.1057 | 0.0595 | +43.7% | 0.8012 | 0.9189 |
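The relative-improvement column is the fraction of the baseline error rate removed by the adapter, i.e. (baseline − adapted) / baseline. A one-line check against the by-status numbers above:

```python
def rel_improvement(baseline: float, adapted: float) -> float:
    """Fraction of the baseline error rate removed by the adapter."""
    return (baseline - adapted) / baseline

print(f"{rel_improvement(0.7750, 0.4071):.1%}")  # dysarthric subset -> 47.5%
print(f"{rel_improvement(0.1057, 0.0595):.1%}")  # healthy subset    -> 43.7%
```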

By Speaker

| Speaker | Status | N | Baseline WER | LoRA WER | Relative improvement |
|---|---|---|---|---|---|
| F03 | dysarthria | 1084 | 0.3668 | 0.2596 | +29.2% |
| FC02 | healthy | 2187 | 0.0993 | 0.0604 | +39.2% |
| M04 | dysarthria | 666 | 1.4556 | 0.6529 | +55.1% |
| MC03 | healthy | 1667 | 0.1141 | 0.0583 | +48.8% |

Training Configuration

| Parameter | Value |
|---|---|
| Base model | openai/whisper-small |
| Method | LoRA (decoder only) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| LoRA targets | q_proj, v_proj |
| LoRA dropout | 0.05 |
| Trainable params | ~1.77M / 243M (0.73%) |
| Epochs | 10 |
| Batch size | 8 × 2 = 16 effective |
| Learning rate | 1e-4 |
| Scheduler | cosine |
| Warmup steps | 500 |
| FP16 | True |
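As a sanity check on the trainable-parameter figure: a rank-r adapter on a linear layer adds r·(d_in + d_out) weights (the two low-rank factors A and B). A sketch assuming Whisper Small's hidden size of 768 (check the model config if reproducing):

```python
d_model = 768   # Whisper Small hidden size (assumption)
r = 16          # LoRA rank

# Each adapted projection gains A (r x d_in) and B (d_out x r)
params_per_projection = r * d_model + d_model * r
print(params_per_projection)  # 24576 per adapted q_proj or v_proj

# ~1.77M trainable parameters therefore corresponds to ~72 adapted projections
print(round(1_770_000 / params_per_projection))  # 72
```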

Why LoRA on the decoder only?

The encoder's acoustic representations generalise well across speakers. The decoder's language model is where Whisper fails on clinical speech β€” it "corrects" valid dysarthric productions toward standard words. Adapting only the decoder teaches the model to listen to what was actually said.

Evaluation methodology

  • Text normalisation: Lowercase, strip punctuation, Unicode NFKD (applied to both reference and hypothesis before WER)
  • SemScore: Cosine similarity of sentence embeddings (all-MiniLM-L6-v2) β€” captures semantic correctness even when surface form differs
  • Speaker-independent splits: No speaker appears in both train and test sets
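A minimal sketch of the normalisation and WER computation described above, in plain Python (not the exact evaluation script; real runs typically use a library such as jiwer):

```python
import string
import unicodedata

def normalize(text: str) -> str:
    """Lowercase, NFKD-normalise, and strip punctuation before scoring."""
    text = unicodedata.normalize("NFKD", text.lower())
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref = normalize(reference).split()
    hyp = normalize(hypothesis).split()
    if not ref:
        return 0.0 if not hyp else 1.0
    # Single-row dynamic programme over the edit-distance matrix
    d = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev_diag, d[0] = d[0], i
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            prev_diag, d[j] = d[j], min(d[j] + 1,          # deletion
                                        d[j - 1] + 1,      # insertion
                                        prev_diag + cost)  # substitution/match
    return d[-1] / len(ref)

print(wer("The quick brown fox.", "the quick brown fox"))  # 0.0 after normalisation
print(wer("hello world", "hello word"))                    # 0.5
```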

Dataset

TORGO β€” 15 speakers (8 dysarthric, 7 control), ~13.5 hours total, 16,552 utterances.

Available on HuggingFace: abnerh/TORGO-database

Training Hardware

  • GPU: NVIDIA B200 (178.4 GB VRAM)
  • CUDA: 12.8
  • GPU peak memory: 5636 MB
  • CPU: AMD EPYC 9555 64-Core Processor
  • RAM: 2267.4 GB total, 1983.4 GB available
  • Training time: 1286.6s (21.4 min)
  • Throughput: 9.71 steps/s
  • Platform: Linux-6.8.0-101-generic-x86_64-with-glibc2.39
  • PyTorch: 2.8.0+cu128
  • Transformers: 5.3.0

Limitations

  • Trained on English dysarthric speech only (TORGO corpus, Canadian English)
  • Severe dysarthria (e.g., speaker M04) still has high WER β€” the adapter helps but cannot fully compensate
  • Not validated on other clinical populations (aphasia, hearing impairment, etc.)
  • LoRA adapter adds minimal inference overhead but requires PEFT library

Citation

@misc{barbera2026whisper-clinical,
  author = {Barbera, David},
  title = {Whisper-Clinical: Parameter-Efficient Fine-Tuning for Dysarthric Speech Recognition},
  year = {2026},
  url = {https://github.com/DavidBarbera/whisper-clinical-speech},
  note = {LoRA adapter for OpenAI Whisper Small trained on TORGO}
}

Author

David Barbera β€” PhD Cognitive Neuroscience (UCL). Specialising in speech recognition for clinical populations. Built a CE-marked Class II medical device for aphasia rehabilitation.

Website Β· Google Scholar Β· GitHub Β· HuggingFace


Last updated: 2026-03-23
