Fine-tuned Whisper Small Nonstandard Kenyan English 🇰🇪
Fine-tuned version of openai/whisper-small optimized for non-standard Kenyan English speech, including speakers with speech impairments across varying severity levels and etiologies.
Key Features
- Specialized for non-standard Kenyan accents and speech patterns
- Trained on non-standard Kenyan English speech data from cdli/kenyan_english_nonstandard_speech_v0.9
- ~12% relative improvement over baseline on test set WER
- Best performance among experimental configurations
- Robust generalization from development to test set
Performance vs. Baseline
| Metric | Baseline | This Model |
|---|---|---|
| Test WER | 12.3% | 10.8% |
| Test CER | 6.5% | 5.6% |
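The relative improvement listed under Key Features follows from the figures in this table. A minimal sketch of the arithmetic (the helper name `relative_reduction` is illustrative and not part of any library):

```python
def relative_reduction(baseline: float, model: float) -> float:
    """Return the relative error reduction of `model` versus `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - model) / baseline * 100.0


# Error rates (in percent) copied from the table above.
wer_gain = relative_reduction(12.3, 10.8)
cer_gain = relative_reduction(6.5, 5.6)

print(f"Relative WER reduction: {wer_gain:.1f}%")  # 12.2%
print(f"Relative CER reduction: {cer_gain:.1f}%")  # 13.8%
```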
Development vs. Test Set Evaluation
To assess generalization, the model was evaluated on both the development (342 examples) and test (705 examples) splits of cdli/kenyan_english_nonstandard_speech_v0.9. Error rates are lower on the test set than on the development set for nearly every metric and category.
Overall Results
| Metric | Dev Set | Test Set |
|---|---|---|
| Overall WER | 14.9% | 9.8% |
| Overall CER | 8.0% | 5.0% |
| Avg Utterance WER | 16.4% | 10.6% |
| Avg Utterance CER | 9.3% | 5.4% |
The model generalizes well to unseen test data, suggesting it captures robust speech patterns rather than overfitting to development examples.
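The table reports both an overall WER and an average utterance WER. The card does not define them, so the sketch below assumes a common convention: overall WER pools word errors over all reference words, while average utterance WER is the unweighted mean of per-utterance WER. The function names and the toy sentence pairs are illustrative only and are not taken from the evaluation.

```python
from typing import Iterable, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def overall_and_avg_wer(pairs: Iterable[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (pooled WER, mean per-utterance WER) for (reference, hypothesis) pairs."""
    total_errors, total_words, per_utt = 0, 0, []
    for ref, hyp in pairs:
        ref_words, hyp_words = ref.split(), hyp.split()
        if not ref_words:
            continue  # assumption: utterances with an empty reference are skipped
        errors = edit_distance(ref_words, hyp_words)
        total_errors += errors
        total_words += len(ref_words)
        per_utt.append(errors / len(ref_words))
    if not per_utt:
        raise ValueError("no non-empty references to score")
    return total_errors / total_words, sum(per_utt) / len(per_utt)


# Toy data, invented for illustration.
toy_pairs = [
    ("please open the door", "please open the door"),  # 0 errors over 4 words
    ("thank you", "thank u"),                          # 1 error over 2 words
]
overall, avg_utt = overall_and_avg_wer(toy_pairs)
print(f"overall={overall:.3f} avg_utt={avg_utt:.3f}")  # overall=0.167 avg_utt=0.250
```

Under this assumption, short utterances weigh more heavily in the average, which is one way the average utterance WER can exceed the overall WER, as it does in the table above.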
Results by Severity
| Severity | Dev WER | Test WER | Dev CER | Test CER |
|---|---|---|---|---|
| Mild | 13% | 8% | 6% | 4% |
| Moderate | 23% | 15% | 14% | 9% |
| Severe | 16% | 11% | 9% | 6% |
Notably, Moderate severity shows the highest error rates in both splits, likely reflecting specific speaker characteristics rather than a limitation tied strictly to severity level. Error rates are lower on the test set than on the development set in every severity bucket.
Results by Etiology
| Etiology | Dev WER | Test WER | Notes |
|---|---|---|---|
| Cerebral Palsy | 17% | 12% | Improved |
| Neurological Disorder | 17% | 10% | Improved |
| Parkinson's Disease | 26% | 7% | Major improvement |
| Multiple Sclerosis | 10% | 17% | Regressed |
Parkinson's Disease shows the most dramatic improvement (26% → 7% WER), likely reflecting speaker-specific differences between the dev and test splits; the test speaker may have milder symptoms or clearer articulation. Multiple Sclerosis is the only category where performance regressed (10% → 17% WER). With very small speaker counts (often N=1 per etiology), this is best attributed to individual speaker variability rather than a systematic weakness of the model.
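Because per-etiology figures can rest on a single speaker, it helps to report the speaker count next to each WER. The sketch below is one way to do that, assuming per-utterance records that already carry a word-error count and a reference length. The field names (`etiology`, `speaker`, `errors`, `ref_words`) and the toy values are hypothetical and do not come from the dataset or the evaluation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def wer_by_group(records: Iterable[Mapping], key: str) -> Dict[str, dict]:
    """Pooled WER and distinct-speaker count for each value of `key`."""
    errors = defaultdict(int)
    words = defaultdict(int)
    speakers = defaultdict(set)
    for rec in records:
        group = rec[key]
        errors[group] += rec["errors"]
        words[group] += rec["ref_words"]
        speakers[group].add(rec["speaker"])
    return {
        group: {"wer": errors[group] / words[group], "speakers": len(speakers[group])}
        for group in words
        if words[group] > 0  # skip groups with no reference words
    }


# Toy records, invented for illustration (hypothetical field names).
toy_records = [
    {"etiology": "A", "speaker": "s1", "errors": 2, "ref_words": 10},
    {"etiology": "A", "speaker": "s2", "errors": 1, "ref_words": 20},
    {"etiology": "B", "speaker": "s3", "errors": 3, "ref_words": 12},
    {"etiology": "B", "speaker": "s3", "errors": 0, "ref_words": 8},
]

summary = wer_by_group(toy_records, "etiology")
for group, stats in sorted(summary.items()):
    flag = " (single speaker)" if stats["speakers"] == 1 else ""
    print(f"{group}: WER={stats['wer']:.2f}, speakers={stats['speakers']}{flag}")
```

A group flagged as single-speaker is a cue to read its WER as a property of that speaker, not of the etiology as a whole.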
Summary
Across nearly all categories, error rates are lower on the test set than on the development set; Multiple Sclerosis is the one exception. The model performs best on the mild and severe impairment categories, and the results across etiologies highlight the importance of speaker diversity in small-data settings, a known challenge in dysarthric and accented speech ASR research.
```python
from transformers import pipeline

# Load the fine-tuned checkpoint into an ASR pipeline
transcriber = pipeline(
    "automatic-speech-recognition",
    model="smainye/whisper-small-kenyan-english-nonstandard",
)

# Transcribe Kenyan English audio
result = transcriber("path/to/your/audio.wav")

# Get the transcription text
print("Transcription:", result["text"])
```