Fine-tuned Whisper Small for Non-standard Kenyan English 🇰🇪

Fine-tuned version of openai/whisper-small optimized for non-standard Kenyan English speech, including speakers with speech impairments across varying severity levels and etiologies.

Key Features

🎯 Specialized for non-standard Kenyan accents and speech patterns
📊 Trained on non-standard Kenyan English speech data from cdli/kenyan_english_nonstandard_speech_v0.9
⚡ ~12% relative improvement over baseline on test set WER
🎙️ Best performance among experimental configurations
🧠 Robust generalization from development to test set

Performance vs. Baseline

Metric	Baseline	This Model
Test WER	12.3%	10.8%
Test CER	6.5%	5.6%
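The "~12% relative improvement" figure can be checked directly from the table. The snippet below is a small sketch of that arithmetic; the helper name `relative_reduction` is illustrative and not part of any library.

```python
def relative_reduction(baseline: float, model: float) -> float:
    """Relative error reduction, expressed as a fraction of the baseline error."""
    return (baseline - model) / baseline


# Figures copied from the table above (in percent).
wer_gain = relative_reduction(12.3, 10.8)
cer_gain = relative_reduction(6.5, 5.6)

print(f"Relative WER reduction: {wer_gain:.1%}")
print(f"Relative CER reduction: {cer_gain:.1%}")
```

With the reported numbers this gives roughly 12.2% for WER and 13.8% for CER.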

Development vs. Test Set Evaluation

To assess generalization, the model was evaluated on both the development (342 examples) and test (705 examples) splits of cdli/kenyan_english_nonstandard_speech_v0.9. Error rates are lower on the test set than on the development set for nearly every metric and category.

Overall Results

Metric	Dev Set	Test Set
Overall WER	14.9%	9.8%
Overall CER	8.0%	5.0%
Avg Utterance WER	16.4%	10.6%
Avg Utterance CER	9.3%	5.4%

The model generalizes well to unseen test data, suggesting it captures robust speech patterns rather than overfitting to development examples.
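The table reports both an overall WER and an average utterance WER, and the two can differ noticeably. The sketch below shows one common way to compute each, assuming the overall figure pools errors over all reference words while the utterance figure averages per-utterance rates; the card does not state the exact procedure, and the example sentences are made up for illustration.

```python
def word_errors(reference: str, hypothesis: str) -> tuple[int, int]:
    """Return (word-level edit distance, number of reference words)."""
    ref, hyp = reference.split(), hypothesis.split()
    # Single-row dynamic-programming Levenshtein distance over words.
    row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev_diag, row[0] = row[0], i
        for j, h in enumerate(hyp, start=1):
            cur = min(
                row[j] + 1,             # deletion
                row[j - 1] + 1,         # insertion
                prev_diag + (r != h),   # substitution (or match, cost 0)
            )
            prev_diag, row[j] = row[j], cur
    return row[-1], len(ref)


# Hypothetical (reference, hypothesis) pairs, not taken from the dataset.
pairs = [
    ("please open the door", "please open the door"),
    ("turn on the kitchen light now", "turn on the chicken light"),
    ("yes", "yes please"),
]

counts = [word_errors(ref, hyp) for ref, hyp in pairs]

# Corpus-level WER: total errors divided by total reference words.
corpus_wer = sum(e for e, _ in counts) / sum(n for _, n in counts)

# Average utterance WER: mean of the per-utterance error rates.
avg_utt_wer = sum(e / n for e, n in counts) / len(counts)

print(f"Corpus-level WER:      {corpus_wer:.1%}")
print(f"Average utterance WER: {avg_utt_wer:.1%}")
```

Short utterances weigh as much as long ones in the per-utterance average, so a single error in a one-word utterance can pull that figure well above the pooled rate.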

Results by Severity

Severity	Dev WER	Test WER	Dev CER	Test CER
Mild	13%	8%	6%	4%
Moderate	23%	15%	14%	9%
Severe	16%	11%	9%	6%

Notably, Moderate severity shows the highest error rates in both splits, likely reflecting specific speaker characteristics rather than a limitation tied strictly to severity level. Error rates are lower on the test set than on the development set in every severity bucket.
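A per-category breakdown like the one above can be produced by pooling errors within each label. This is a minimal sketch, assuming per-utterance error counts and reference lengths are already available; the records are hypothetical and the function name `wer_by_group` is illustrative. The same grouping works for etiology labels.

```python
from collections import defaultdict

# Hypothetical per-utterance records: (label, word errors, reference words).
records = [
    ("Mild", 1, 12),
    ("Mild", 0, 8),
    ("Moderate", 3, 10),
    ("Severe", 2, 15),
    ("Moderate", 1, 6),
]


def wer_by_group(rows):
    """Pool errors and reference words per label, then divide."""
    totals = defaultdict(lambda: [0, 0])
    for label, errors, ref_words in rows:
        totals[label][0] += errors
        totals[label][1] += ref_words
    return {label: errs / words for label, (errs, words) in totals.items()}


per_severity = wer_by_group(records)
for label, wer in per_severity.items():
    print(f"{label}: {wer:.1%}")
```

With only a handful of speakers per label, a single speaker can dominate a bucket, so per-category figures of this kind are best read alongside the speaker counts.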

Results by Etiology

Etiology	Dev WER	Test WER	Notes
Cerebral Palsy	17%	12%	✅ Improved
Neurological Disorder	17%	10%	✅ Improved
Parkinson's Disease	26%	7%	✅ Major improvement
Multiple Sclerosis	10%	17%	❌ Regressed

Parkinson's Disease shows the largest gap between splits (26% → 7% WER), likely reflecting speaker-specific differences between the dev and test splits: the test speaker may have milder symptoms or clearer articulation. Multiple Sclerosis is the only category where the test WER is higher than the dev WER (10% → 17%); however, with very small speaker counts (often N=1 per etiology), this is best attributed to individual speaker variability rather than a systematic weakness of the model.

Summary

In nearly all categories, error rates are lower on the test set than on the development set; Multiple Sclerosis is the sole exception. The model performs best on the mild and severe impairment categories, and the spread across etiologies highlights the importance of speaker diversity in small-data settings, a known challenge in dysarthric and accented speech ASR research.

Usage

```python
from transformers import pipeline

# Load the fine-tuned checkpoint as a speech-recognition pipeline
transcriber = pipeline(
    "automatic-speech-recognition",
    model="smainye/whisper-small-kenyan-english-nonstandard",
)

# Transcribe Kenyan English audio
result = transcriber("path/to/your/audio.wav")

# Get the transcription text
print("Transcription:", result["text"])
```