# Whisper Small – LoRA Adapter for Dysarthric Speech
Parameter-efficient fine-tuning of OpenAI's Whisper Small for clinical speech recognition, targeting dysarthria (TORGO) with speaker-independent evaluation.
Standard ASR models fail catastrophically on clinical speech populations. This adapter reduces Whisper's error rate on dysarthric speech by ~47% relative, using only ~1.77M trainable parameters (0.73% of the model).
## Quick Start
```python
from transformers import WhisperForConditionalGeneration, WhisperProcessor
from peft import PeftModel
import librosa

# Load the base model, then attach the LoRA adapter on top of it
processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
model = PeftModel.from_pretrained(model, "dbarbera/whisper-small-torgo-dysarthria-lora")
model.eval()

# Whisper expects 16 kHz mono audio
audio, sr = librosa.load("audio.wav", sr=16000)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

generated_ids = model.generate(**inputs)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```
## Results

### Overall
| Metric | Baseline (Whisper Small) | + LoRA | Δ Relative |
|---|---|---|---|
| WER | 0.3160 (31.60%) | 0.1687 (16.87%) | +46.6% |
| SemScore | 0.6970 | 0.8321 | +19.4% |
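The relative deltas in these tables are computed against the baseline, with the sign chosen so that an improvement is positive (WER is lower-is-better, SemScore is higher-is-better). A quick arithmetic check of the overall numbers:

```python
def relative_improvement(baseline, adapted, higher_is_better=False):
    """Relative change vs. baseline, signed so that an improvement is positive."""
    if higher_is_better:
        return (adapted - baseline) / baseline
    return (baseline - adapted) / baseline

wer_delta = relative_improvement(0.3160, 0.1687)        # WER: lower is better
sem_delta = relative_improvement(0.6970, 0.8321, True)  # SemScore: higher is better
print(f"{wer_delta:+.1%}  {sem_delta:+.1%}")  # +46.6%  +19.4%
```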
### By Speech Status
| Status | Baseline WER | LoRA WER | Δ Relative | Baseline SemScore | LoRA SemScore |
|---|---|---|---|---|---|
| dysarthria (1750 utts) | 0.7750 | 0.4071 | +47.5% | 0.4675 | 0.6409 |
| healthy (3854 utts) | 0.1057 | 0.0595 | +43.7% | 0.8012 | 0.9189 |
### By Speaker
| Speaker | Status | N | Baseline WER | LoRA WER | Δ Relative |
|---|---|---|---|---|---|
| F03 | dysarthria | 1084 | 0.3668 | 0.2596 | +29.2% |
| FC02 | healthy | 2187 | 0.0993 | 0.0604 | +39.2% |
| M04 | dysarthria | 666 | 1.4556 | 0.6529 | +55.1% |
| MC03 | healthy | 1667 | 0.1141 | 0.0583 | +48.8% |
## Training Configuration
| Parameter | Value |
|---|---|
| Base model | openai/whisper-small |
| Method | LoRA (decoder only) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| LoRA targets | q_proj, v_proj |
| LoRA dropout | 0.05 |
| Trainable params | ~1.77M / 243M (0.73%) |
| Epochs | 10 |
| Batch size | 8 × 2 = 16 effective |
| Learning rate | 0.0001 |
| Scheduler | cosine |
| Warmup steps | 500 |
| FP16 | True |
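As a sketch (not the training script), the settings in the table map onto a PEFT `LoraConfig` roughly as follows. The `target_modules` regex restricting adaptation to decoder projections is an assumption about Whisper's internal module naming, not taken from the original code:

```python
from peft import LoraConfig

# Sketch of the table's settings as a PEFT LoraConfig. The regex assumes
# Whisper decoder modules are named model.decoder.layers.N.*_attn.{q,v}_proj.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=r"model\.decoder\..*\.(q_proj|v_proj)",
    lora_dropout=0.05,
    bias="none",
)
```

Applied with `get_peft_model(model, lora_config)` before training; `model.print_trainable_parameters()` then reports the trainable fraction.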
## Why LoRA on the decoder only?
The encoder's acoustic representations generalise well across speakers. The decoder's language model is where Whisper fails on clinical speech – it "corrects" valid dysarthric productions toward standard words. Adapting only the decoder teaches the model to listen to what was actually said.
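Mechanically, LoRA leaves each frozen weight W untouched and adds a low-rank update scaled by alpha/r, so the adapted projection computes y = Wx + (alpha/r)·B(Ax). A minimal dependency-free sketch of that forward pass (toy dimensions, plain lists; illustrative only):

```python
def lora_forward(x, W, A, B, alpha, r):
    """y = W x + (alpha / r) * B (A x), with W frozen and only A, B trained.

    Shapes: W is d_out x d_in, A is r x d_in, B is d_out x r (nested lists).
    """
    def matvec(M, v):
        return [sum(m * vi for m, vi in zip(row, v)) for row in M]

    base = matvec(W, x)                 # frozen pretrained path
    update = matvec(B, matvec(A, x))    # low-rank adapter path
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

# Toy example: d_in = d_out = 2, rank r = 1
W = [[1.0, 0.0], [0.0, 1.0]]   # frozen identity weight
A = [[1.0, 1.0]]               # 1 x 2
B = [[0.5], [0.0]]             # 2 x 1
print(lora_forward([2.0, 3.0], W, A, B, alpha=2, r=1))  # [7.0, 3.0]
```

At the configuration above (r=16, alpha=32), only the small A and B matrices in each targeted projection are trained, which is why the trainable fraction stays below 1%.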
## Evaluation methodology
- Text normalisation: Lowercase, strip punctuation, Unicode NFKD (applied to both reference and hypothesis before WER)
- SemScore: Cosine similarity of sentence embeddings (all-MiniLM-L6-v2) – captures semantic correctness even when surface form differs
- Speaker-independent splits: No speaker appears in both train and test sets
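The normalisation and WER computation described above can be sketched as follows. `normalise` mirrors the listed steps (lowercase, strip punctuation, Unicode NFKD), and `wer` is a plain word-level edit distance; this is a minimal illustration under those assumptions, not the project's evaluation script:

```python
import string
import unicodedata

def normalise(text):
    """Lowercase, strip punctuation, apply Unicode NFKD (the steps listed above)."""
    text = unicodedata.normalize("NFKD", text.lower())
    return text.translate(str.maketrans("", "", string.punctuation))

def wer(reference, hypothesis):
    """Word error rate: Levenshtein distance over words / reference length."""
    ref = normalise(reference).split()
    hyp = normalise(hypothesis).split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[-1][-1] / len(ref)

print(wer("The cat sat.", "the cat sat"))  # 0.0 after normalisation
print(wer("open the door", "open door"))   # one deletion over three words
```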
## Dataset
TORGO – 15 speakers (8 dysarthric, 7 control), ~13.5 hours total, 16,552 utterances.
Available on HuggingFace: abnerh/TORGO-database
## Training Hardware
- GPU: NVIDIA B200 (178.4 GB VRAM)
- CUDA: 12.8
- GPU peak memory: 5636 MB
- CPU: AMD EPYC 9555 64-Core Processor
- RAM: 2267.4 GB total, 1983.4 GB available
- Training time: 1286.6s (21.4 min)
- Throughput: 9.71 steps/s
- Platform: Linux-6.8.0-101-generic-x86_64-with-glibc2.39
- PyTorch: 2.8.0+cu128
- Transformers: 5.3.0
## Limitations
- Trained on English dysarthric speech only (TORGO corpus, Canadian English)
- Severe dysarthria (e.g., speaker M04) still has high WER – the adapter helps but cannot fully compensate
- Not validated on other clinical populations (aphasia, hearing impairment, etc.)
- LoRA adapter adds minimal inference overhead but requires PEFT library
## Citation
```bibtex
@misc{barbera2026whisper-clinical,
  author = {Barbera, David},
  title  = {Whisper-Clinical: Parameter-Efficient Fine-Tuning for Dysarthric Speech Recognition},
  year   = {2026},
  url    = {https://github.com/DavidBarbera/whisper-clinical-speech},
  note   = {LoRA adapter for OpenAI Whisper Small trained on TORGO}
}
```
## Author
David Barbera – PhD in Cognitive Neuroscience (UCL), specialising in speech recognition for clinical populations. Built a CE-marked Class II medical device for aphasia rehabilitation.
Website · Google Scholar · GitHub · HuggingFace
Last updated: 2026-03-23