# Fine-tuned Whisper Small for Nonstandard Kenyan Swahili 🇰🇪
Fine-tuned version of openai/whisper-small optimized for non-standard Kenyan Swahili speech, including speakers with speech impairments across varying severity levels and etiologies.
## Key Features
- 🎯 Specialized for non-standard Kenyan Swahili accents and speech patterns
- 📊 Trained on non-standard Kenyan Swahili speech data from cdli/kenyan_swahili_nonstandard_speech_v0.9
- ⚡ ~5.7% relative improvement over baseline test-set WER
- 🏆 Best performance among experimental configurations
- 🔧 Strong generalization from development to test set
## Performance vs. Baseline
| Metric | Baseline | This Model (Run 3) |
|---|---|---|
| Test WER | 31.4% | 29.6% |
| Test CER | 12.2% | 11.8% |
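The "~5.7% relative improvement" claimed above follows directly from the table; a quick sketch of the arithmetic:

```python
# Test-set WER figures from the table above.
baseline_wer = 31.4
model_wer = 29.6

# Relative improvement = reduction in WER as a fraction of the baseline WER.
relative_improvement = (baseline_wer - model_wer) / baseline_wer * 100
print(f"Relative WER improvement: {relative_improvement:.1f}%")  # prints 5.7%
```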
## Development vs. Test Set Evaluation
The model was evaluated on both the development (272 examples) and test (554 examples) splits of cdli/kenyan_swahili_nonstandard_speech_v0.9 to assess generalization to unseen data.
### Overall Results
| Metric | Dev Set | Test Set | Change (pp) |
|---|---|---|---|
| Overall WER | 35.6% | 30.0% | -5.6 |
| Overall CER | 15.1% | 11.9% | -3.2 |
The model performs meaningfully better on the test set than the development set, suggesting it generalizes well to unseen speakers rather than overfitting to development examples. The gap may also partly reflect slightly less challenging audio conditions or impairment distributions in the test split.
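WER and CER are both edit-distance metrics, counting insertions, deletions, and substitutions against the reference at the word and character level respectively. A minimal pure-Python sketch (the actual evaluation may have used a library such as jiwer):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (one-row DP)."""
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(
                dp[j] + 1,                          # deletion
                dp[j - 1] + 1,                      # insertion
                prev + (ref[i - 1] != hyp[j - 1]),  # substitution (free if equal)
            )
            prev = cur
    return dp[n]

def wer(reference, hypothesis):
    """Word error rate: word-level edits / reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    """Character error rate: character-level edits / reference length."""
    return edit_distance(reference, hypothesis) / len(reference)

# One substitution ("ya" -> "za") out of three reference words.
print(wer("habari ya asubuhi", "habari za asubuhi"))  # prints 0.333...
```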
### Results by Severity
| Severity | Dev WER | Test WER | Trend |
|---|---|---|---|
| Mild | 33% | 25% | ✅ Improved |
| Moderate | 45% | 32% | ✅ Improved |
| Severe | 28% | 35% | ❌ Degraded |
An interesting pattern emerges across the two splits. In the development set, "Severe" cases surprisingly outperformed "Mild" ones, an anomaly likely driven by specific speaker outliers (e.g., speaker KES006). The test set corrects this, following the expected ordering of error rates: Mild < Moderate < Severe. The one concern is that performance on Severe cases actually worsened (28% → 35% WER), indicating the test set contains more challenging severe-impairment examples that the model has not fully learned to handle.
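When comparing per-severity numbers like these, the aggregation method matters: a pooled (micro-averaged) group WER weights longer utterances more heavily, while a per-utterance (macro) mean weights all utterances equally, and an outlier speaker can dominate a small group either way. An illustrative sketch of the pooled computation (the data and group labels are hypothetical, not the model card's actual results):

```python
# Hypothetical per-utterance results: (severity, word errors, reference word count).
results = [
    ("mild", 2, 10),
    ("mild", 1, 20),
    ("severe", 6, 12),
    ("severe", 2, 4),
]

def group_wer(results):
    """Pooled (micro-averaged) WER per group: total errors / total reference words."""
    totals = {}
    for severity, errors, ref_len in results:
        e, n = totals.get(severity, (0, 0))
        totals[severity] = (e + errors, n + ref_len)
    return {sev: e / n for sev, (e, n) in totals.items()}

print(group_wer(results))  # prints {'mild': 0.1, 'severe': 0.5}
```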
### Results by Etiology
| Etiology | Dev WER | Test WER | Notes |
|---|---|---|---|
| Cerebral Palsy | 40% | 34% | ✅ Improved |
| Multiple Sclerosis | 9% | 36% | ❌ Major regression |
| Neurodevelopmental Disorder | 42% | 30% | ✅ Improved |
| Parkinson's Disease | 39% | 18% | ✅ Major improvement |
Parkinson's Disease shows the most striking gain (39% → 18% WER), likely reflecting differences between the dev and test speakers rather than a systematic model strength: the test speaker may present with milder or more consistent symptoms. Conversely, Multiple Sclerosis shows the sharpest regression (9% → 36% WER). Cerebral Palsy and Neurodevelopmental Disorders follow the overall positive trend with consistent 6–12 point WER reductions.
## Summary
Across both evaluation splits, the model achieves a 30.0% overall WER on the test set. The severity anomaly present in the development data did not recur on the test set, giving a more reliable picture of how the model handles varying impairment levels. The key takeaway from the etiology analysis is that individual speaker characteristics remain the primary driver of performance variance, a known challenge in low-resource, dysarthric speech recognition where per-etiology speaker counts are small. Broader speaker diversity in future training data would be the most impactful path to reducing this variance.
## Usage
```python
from transformers import pipeline

# Load the fine-tuned model into an ASR pipeline
transcriber = pipeline(
    "automatic-speech-recognition",
    model="smainye/whisper-small-kenyan-swahili-nonstandard",
)

# Transcribe Kenyan Swahili audio
result = transcriber("path/to/your/audio.wav")

# Get the transcription text
print("Transcription:", result["text"])
```