# Fine-tuned Whisper Small for Nonstandard Kenyan Swahili 🇰🇪
Fine-tuned version of openai/whisper-small optimized for non-standard Kenyan Swahili speech, including speakers with speech impairments across varying severity levels and etiologies.
## Key Features
- 🎯 Specialized for non-standard Kenyan Swahili accents and speech patterns
- 📊 Trained on non-standard Kenyan Swahili speech data from cdli/kenyan_swahili_nonstandard_speech_v0.9
- ⚡ ~5.7% relative improvement over baseline test-set WER
- 🏆 Best performance among experimental configurations
- 🔧 Strong generalization from development to test set
## Performance vs. Baseline
| Metric | Baseline | This Model (Run 3) |
|---|---|---|
| Test WER | 31.4% | 29.6% |
| Test CER | 12.2% | 11.8% |
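The "~5.7% relative improvement" claimed above follows directly from the table; a quick sketch of the arithmetic:

```python
# Test-set WER figures from the table above.
baseline_wer = 31.4
model_wer = 29.6

# Relative improvement = reduction in WER as a fraction of the baseline WER.
relative_improvement = (baseline_wer - model_wer) / baseline_wer * 100
print(f"Relative WER improvement: {relative_improvement:.1f}%")  # prints 5.7%
```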
## Development vs. Test Set Evaluation
The model was evaluated on both the development (272 examples) and test (554 examples) splits of cdli/kenyan_swahili_nonstandard_speech_v0.9 to assess generalization to unseen data.
### Overall Results
| Metric | Dev Set | Test Set | Change (pp) |
|---|---|---|---|
| Overall WER | 35.6% | 30.0% | -5.6 |
| Overall CER | 15.1% | 11.9% | -3.2 |
The model performs meaningfully better on the test set than the development set, suggesting it generalizes well to unseen speakers rather than overfitting to development examples. The gap may also partly reflect slightly less challenging audio conditions or impairment distributions in the test split.
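WER and CER are both edit-distance metrics, counting insertions, deletions, and substitutions against the reference at the word and character level respectively. A minimal pure-Python sketch (the actual evaluation may have used a library such as jiwer):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (one-row DP)."""
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(
                dp[j] + 1,                          # deletion
                dp[j - 1] + 1,                      # insertion
                prev + (ref[i - 1] != hyp[j - 1]),  # substitution (free if equal)
            )
            prev = cur
    return dp[n]

def wer(reference, hypothesis):
    """Word error rate: word-level edits / reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    """Character error rate: character-level edits / reference length."""
    return edit_distance(reference, hypothesis) / len(reference)

# One substitution ("ya" -> "za") out of three reference words.
print(wer("habari ya asubuhi", "habari za asubuhi"))  # prints 0.333...
```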
### Results by Severity
| Severity | Dev WER | Test WER | Trend |
|---|---|---|---|
| Mild | 33% | 25% | ✅ Improved |
| Moderate | 45% | 32% | ✅ Improved |
| Severe | 28% | 35% | ❌ Degraded |
An interesting pattern emerges across the two splits. In the development set, "Severe" cases surprisingly outperformed "Mild" ones, an anomaly likely driven by specific speaker outliers (e.g., speaker KES006). The test set corrects this, following the expected ordering of error rates: Mild < Moderate < Severe. The one concern is that performance on Severe cases actually worsened (28% → 35% WER), indicating the test set contains more challenging severe-impairment examples that the model has not fully learned to handle.
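When comparing per-severity numbers like these, the aggregation method matters: a pooled (micro-averaged) group WER weights longer utterances more heavily, while a per-utterance (macro) mean weights all utterances equally, and an outlier speaker can dominate a small group either way. An illustrative sketch of the pooled computation (the data and group labels are hypothetical, not the model card's actual results):

```python
# Hypothetical per-utterance results: (severity, word errors, reference word count).
results = [
    ("mild", 2, 10),
    ("mild", 1, 20),
    ("severe", 6, 12),
    ("severe", 2, 4),
]

def group_wer(results):
    """Pooled (micro-averaged) WER per group: total errors / total reference words."""
    totals = {}
    for severity, errors, ref_len in results:
        e, n = totals.get(severity, (0, 0))
        totals[severity] = (e + errors, n + ref_len)
    return {sev: e / n for sev, (e, n) in totals.items()}

print(group_wer(results))  # prints {'mild': 0.1, 'severe': 0.5}
```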
### Results by Etiology
| Etiology | Dev WER | Test WER | Notes |
|---|---|---|---|
| Cerebral Palsy | 40% | 34% | ✅ Improved |
| Multiple Sclerosis | 9% | 36% | ❌ Major regression |
| Neurodevelopmental Disorder | 42% | 30% | ✅ Improved |
| Parkinson's Disease | 39% | 18% | ✅ Major improvement |
Parkinson's Disease shows the most striking gain (39% → 18% WER), likely reflecting differences between the dev and test speakers rather than a systematic model strength: the test speaker may present with milder or more consistent symptoms. Conversely, Multiple Sclerosis shows the sharpest regression (9% → 36% WER). Cerebral Palsy and Neurodevelopmental Disorders follow the overall positive trend with consistent 6–12 point WER reductions.
## Summary
Across both evaluation splits, the model achieves a 30.0% overall WER on the test set. The severity anomaly present in the development data did not recur on the test set, giving a more reliable picture of how the model handles varying impairment levels. The key takeaway from the etiology analysis is that individual speaker characteristics remain the primary driver of performance variance, a known challenge in low-resource, dysarthric speech recognition where per-etiology speaker counts are small. Broader speaker diversity in future training data would be the most impactful path to reducing this variance.
## Usage
```python
from transformers import pipeline

# Load the fine-tuned model into an ASR pipeline
transcriber = pipeline(
    "automatic-speech-recognition",
    model="smainye/whisper-small-kenyan-swahili-nonstandard",
)

# Transcribe Kenyan Swahili audio
result = transcriber("path/to/your/audio.wav")

# Get the transcription text
print("Transcription:", result["text"])
```