
Fine-tuned Whisper Small Nonstandard Kenyan English 🇰🇪

Fine-tuned version of openai/whisper-small optimized for non-standard Kenyan English speech, including speakers with speech impairments across varying severity levels and etiologies.

Key Features

  • 🎯 Specialized for non-standard Kenyan accents and speech patterns
  • 📊 Trained on non-standard Kenyan English speech data from cdli/kenyan_english_nonstandard_speech_v0.9
  • ⚡ ~12% relative improvement over baseline on test set WER
  • 🎙️ Best performance among experimental configurations
  • 🧠 Robust generalization from development to test set

Performance vs. Baseline

Metric      Baseline    This Model
Test WER    12.3%       10.8%
Test CER    6.5%        5.6%
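
The ~12% relative improvement quoted under Key Features follows from the WER row above. A minimal sketch of that arithmetic, where the helper name `relative_reduction` is illustrative and not part of the model's tooling:

```python
def relative_reduction(baseline: float, model: float) -> float:
    """Return the error-rate reduction as a fraction of the baseline."""
    return (baseline - model) / baseline


# Values copied from the table above (in percent).
wer_gain = relative_reduction(12.3, 10.8)
cer_gain = relative_reduction(6.5, 5.6)

print(f"Relative WER reduction: {wer_gain:.1%}")  # 12.2%
print(f"Relative CER reduction: {cer_gain:.1%}")  # 13.8%
```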

Development vs. Test Set Evaluation

To assess generalization, the model was evaluated on both the development (342 examples) and test (705 examples) splits of cdli/kenyan_english_nonstandard_speech_v0.9. Error rates are lower on the test set than on the development set across virtually all metrics and categories.

Overall Results

Metric               Dev Set    Test Set
Overall WER          14.9%      9.8%
Overall CER          8.0%       5.0%
Avg Utterance WER    16.4%      10.6%
Avg Utterance CER    9.3%       5.4%

The model generalizes well to unseen test data, suggesting it captures robust speech patterns rather than overfitting to development examples.
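
The Overall Results table reports both an overall and an average-utterance figure. The evaluation script is not shown in this card, so the following is only a sketch. It assumes the common convention that overall WER pools edit errors over all reference words, while average utterance WER is the mean of per-utterance rates. All function names and example sentences are illustrative.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def overall_wer(refs: list[str], hyps: list[str]) -> float:
    """Pooled WER: total word errors divided by total reference words."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    return errors / words


def avg_utterance_wer(refs: list[str], hyps: list[str]) -> float:
    """Mean of per-utterance WERs (assumes no reference is empty)."""
    rates = [edit_distance(r.split(), h.split()) / len(r.split())
             for r, h in zip(refs, hyps)]
    return sum(rates) / len(rates)


# Made-up example: one perfect long utterance, one short utterance with a deletion.
refs = ["please open the door", "thank you"]
hyps = ["please open the door", "thank"]

print(f"Overall WER:       {overall_wer(refs, hyps):.1%}")        # 16.7%
print(f"Avg utterance WER: {avg_utterance_wer(refs, hyps):.1%}")  # 25.0%
```

Under this convention, errors in short utterances weigh more heavily in the per-utterance average. That would be consistent with the average-utterance figures sitting above the overall ones in the table, though the card does not confirm this is the cause.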

Results by Severity

Severity    Dev WER    Test WER    Dev CER    Test CER
Mild        13%        8%          6%         4%
Moderate    23%        15%         14%        9%
Severe      16%        11%         9%         6%

Notably, Moderate severity shows the highest error rates in both splits, likely reflecting specific speaker characteristics rather than a limitation tied strictly to severity level. In every severity bucket, error rates are lower on the test set than on the development set.

Results by Etiology

Etiology                 Dev WER    Test WER    Notes
Cerebral Palsy           17%        12%         ✅ Improved
Neurological Disorder    17%        10%         ✅ Improved
Parkinson's Disease      26%        7%          ✅ Major improvement
Multiple Sclerosis       10%        17%         ❌ Regressed

Parkinson's Disease shows the largest drop from dev to test (26% → 7% WER), which likely reflects speaker-specific differences between the two splits: the test speaker may have milder symptoms or clearer articulation. Multiple Sclerosis is the only category where test performance is worse than dev (10% → 17% WER); however, with very small speaker counts (often N=1 per etiology), this is best attributed to individual speaker variability rather than a systematic weakness of the model.

Summary

Across nearly all categories, error rates are lower on the test set than on the development set, with Multiple Sclerosis as the only exception. The model performs best on the mild and severe impairment categories, and the spread of results across etiologies highlights the importance of speaker diversity in small-data settings, a known challenge in dysarthric and accented speech ASR research.

from transformers import pipeline

# Load the fine-tuned checkpoint into a speech-recognition pipeline
transcriber = pipeline(
    "automatic-speech-recognition",
    model="smainye/whisper-small-kenyan-english-nonstandard",
)

# Transcribe Kenyan English audio
result = transcriber("path/to/your/audio.wav")

# Get the transcription text
print("Transcription:", result["text"])

Contact Information

Linktree

Model size: 0.2B params (Safetensors, tensor type F32)