# Whisper Small – LoRA Adapter for Dysarthric Speech
Parameter-efficient fine-tuning of OpenAI's Whisper Small for clinical speech recognition, targeting dysarthria (TORGO) with speaker-independent evaluation.
Standard ASR models fail catastrophically on clinical speech populations. This adapter reduces Whisper's error rate on dysarthric speech by ~47% relative, using only ~1.77M trainable parameters (0.73% of the model).
## Quick Start
```python
from transformers import WhisperForConditionalGeneration, WhisperProcessor
from peft import PeftModel
import librosa

# Load the base model, then attach the LoRA adapter on top of it
processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
model = PeftModel.from_pretrained(model, "dbarbera/whisper-small-torgo-dysarthria-lora")
model.eval()

# Whisper expects 16 kHz mono audio
audio, sr = librosa.load("audio.wav", sr=16000)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

generated_ids = model.generate(**inputs)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```
## Results

### Overall
| Metric | Baseline (Whisper Small) | + LoRA | Δ Relative |
|---|---|---|---|
| WER | 0.3160 (31.60%) | 0.1687 (16.87%) | +46.6% |
| SemScore | 0.6970 | 0.8321 | +19.4% |
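The relative deltas in these tables are computed against the baseline, with the sign chosen so that an improvement is positive (WER is lower-is-better, SemScore is higher-is-better). A quick arithmetic check of the overall numbers:

```python
def relative_improvement(baseline, adapted, higher_is_better=False):
    """Relative change vs. baseline, signed so that an improvement is positive."""
    if higher_is_better:
        return (adapted - baseline) / baseline
    return (baseline - adapted) / baseline

wer_delta = relative_improvement(0.3160, 0.1687)        # WER: lower is better
sem_delta = relative_improvement(0.6970, 0.8321, True)  # SemScore: higher is better
print(f"{wer_delta:+.1%}  {sem_delta:+.1%}")  # +46.6%  +19.4%
```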
### By Speech Status
| Status | Baseline WER | LoRA WER | Δ Relative | Baseline SemScore | LoRA SemScore |
|---|---|---|---|---|---|
| dysarthria (1750 utts) | 0.7750 | 0.4071 | +47.5% | 0.4675 | 0.6409 |
| healthy (3854 utts) | 0.1057 | 0.0595 | +43.7% | 0.8012 | 0.9189 |
### By Speaker
| Speaker | Status | N | Baseline WER | LoRA WER | Δ Relative |
|---|---|---|---|---|---|
| F03 | dysarthria | 1084 | 0.3668 | 0.2596 | +29.2% |
| FC02 | healthy | 2187 | 0.0993 | 0.0604 | +39.2% |
| M04 | dysarthria | 666 | 1.4556 | 0.6529 | +55.1% |
| MC03 | healthy | 1667 | 0.1141 | 0.0583 | +48.8% |
## Training Configuration
| Parameter | Value |
|---|---|
| Base model | openai/whisper-small |
| Method | LoRA (decoder only) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| LoRA targets | q_proj, v_proj |
| LoRA dropout | 0.05 |
| Trainable params | ~1.77M / 243M (0.73%) |
| Epochs | 10 |
| Batch size | 8 × 2 = 16 effective |
| Learning rate | 0.0001 |
| Scheduler | cosine |
| Warmup steps | 500 |
| FP16 | True |
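As a sketch (not the training script), the settings in the table map onto a PEFT `LoraConfig` roughly as follows. The `target_modules` regex restricting adaptation to decoder projections is an assumption about Whisper's internal module naming, not taken from the original code:

```python
from peft import LoraConfig

# Sketch of the table's settings as a PEFT LoraConfig. The regex assumes
# Whisper decoder modules are named model.decoder.layers.N.*_attn.{q,v}_proj.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=r"model\.decoder\..*\.(q_proj|v_proj)",
    lora_dropout=0.05,
    bias="none",
)
```

Applied with `get_peft_model(model, lora_config)` before training; `model.print_trainable_parameters()` then reports the trainable fraction.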
## Why LoRA on the decoder only?
The encoder's acoustic representations generalise well across speakers. The decoder's language model is where Whisper fails on clinical speech – it "corrects" valid dysarthric productions toward standard words. Adapting only the decoder teaches the model to listen to what was actually said.
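Mechanically, LoRA leaves each frozen weight W untouched and adds a low-rank update scaled by alpha/r, so the adapted projection computes y = Wx + (alpha/r)·B(Ax). A minimal dependency-free sketch of that forward pass (toy dimensions, plain lists; illustrative only):

```python
def lora_forward(x, W, A, B, alpha, r):
    """y = W x + (alpha / r) * B (A x), with W frozen and only A, B trained.

    Shapes: W is d_out x d_in, A is r x d_in, B is d_out x r (nested lists).
    """
    def matvec(M, v):
        return [sum(m * vi for m, vi in zip(row, v)) for row in M]

    base = matvec(W, x)                 # frozen pretrained path
    update = matvec(B, matvec(A, x))    # low-rank adapter path
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

# Toy example: d_in = d_out = 2, rank r = 1
W = [[1.0, 0.0], [0.0, 1.0]]   # frozen identity weight
A = [[1.0, 1.0]]               # 1 x 2
B = [[0.5], [0.0]]             # 2 x 1
print(lora_forward([2.0, 3.0], W, A, B, alpha=2, r=1))  # [7.0, 3.0]
```

At the configuration above (r=16, alpha=32), only the small A and B matrices in each targeted projection are trained, which is why the trainable fraction stays below 1%.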
## Evaluation methodology
- Text normalisation: Lowercase, strip punctuation, Unicode NFKD (applied to both reference and hypothesis before WER)
- SemScore: Cosine similarity of sentence embeddings (all-MiniLM-L6-v2) – captures semantic correctness even when surface form differs
- Speaker-independent splits: No speaker appears in both train and test sets
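The normalisation and WER computation described above can be sketched as follows. `normalise` mirrors the listed steps (lowercase, strip punctuation, Unicode NFKD), and `wer` is a plain word-level edit distance; this is a minimal illustration under those assumptions, not the project's evaluation script:

```python
import string
import unicodedata

def normalise(text):
    """Lowercase, strip punctuation, apply Unicode NFKD (the steps listed above)."""
    text = unicodedata.normalize("NFKD", text.lower())
    return text.translate(str.maketrans("", "", string.punctuation))

def wer(reference, hypothesis):
    """Word error rate: Levenshtein distance over words / reference length."""
    ref = normalise(reference).split()
    hyp = normalise(hypothesis).split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[-1][-1] / len(ref)

print(wer("The cat sat.", "the cat sat"))  # 0.0 after normalisation
print(wer("open the door", "open door"))   # one deletion over three words
```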
## Dataset
TORGO – 15 speakers (8 dysarthric, 7 control), ~13.5 hours total, 16,552 utterances.
Available on HuggingFace: abnerh/TORGO-database
## Training Hardware
- GPU: NVIDIA B200 (178.4 GB VRAM)
- CUDA: 12.8
- GPU peak memory: 5636 MB
- CPU: AMD EPYC 9555 64-Core Processor
- RAM: 2267.4 GB total, 1983.4 GB available
- Training time: 1286.6s (21.4 min)
- Throughput: 9.71 steps/s
- Platform: Linux-6.8.0-101-generic-x86_64-with-glibc2.39
- PyTorch: 2.8.0+cu128
- Transformers: 5.3.0
## Limitations
- Trained on English dysarthric speech only (TORGO corpus, Canadian English)
- Severe dysarthria (e.g., speaker M04) still has high WER – the adapter helps but cannot fully compensate
- Not validated on other clinical populations (aphasia, hearing impairment, etc.)
- LoRA adapter adds minimal inference overhead but requires PEFT library
## Citation
```bibtex
@misc{barbera2026whisper-clinical,
  author = {Barbera, David},
  title  = {Whisper-Clinical: Parameter-Efficient Fine-Tuning for Dysarthric Speech Recognition},
  year   = {2026},
  url    = {https://github.com/DavidBarbera/whisper-clinical-speech},
  note   = {LoRA adapter for OpenAI Whisper Small trained on TORGO}
}
```
## Author
David Barbera – PhD in Cognitive Neuroscience (UCL), specialising in speech recognition for clinical populations. Built a CE-marked Class II medical device for aphasia rehabilitation.
Website · Google Scholar · GitHub · HuggingFace
Last updated: 2026-03-23