# sauti-whisper-small-swh
This model is a fine-tuned version of openai/whisper-small on the WAXAL (WaxalNLP) dataset (config: swa_tts).
It achieves the following results on the evaluation set:
- Loss: 1.3956
- WER: 56.1205
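The WER above is the word error rate: the word-level edit distance between hypothesis and reference, divided by the number of reference words, usually reported as a percentage. A minimal self-contained sketch (libraries such as `jiwer` or `evaluate` compute the same quantity):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Rolling-row dynamic programming over the edit-distance table
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            cur = min(d[j] + 1,            # deletion
                      d[j - 1] + 1,        # insertion
                      prev + (r != h))     # substitution or match
            prev, d[j] = d[j], cur
    return d[len(hyp)] / len(ref)

# One substituted word out of three -> WER of 1/3 (about 33.3%)
print(wer("habari ya dunia", "habari za dunia"))
```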
## Model description
openai/whisper-small fine-tuned for Swahili ASR as part of the Sauti project at MsingiAI.
## Intended uses & limitations
Intended use: Automatic speech recognition for Swahili.
Limitations:
- The training data comes from a dataset originally curated for TTS; performance may not generalize well to noisy, conversational, or code-switched audio.
- Whisper models have a maximum decoder target length (448 tokens). Long transcripts were truncated during preprocessing.
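A minimal transcription sketch using the `transformers` ASR pipeline. The audio path is a placeholder; any mono audio file works (the pipeline resamples to 16 kHz). This assumes the checkpoint is available on the Hub under this repo id.

```python
from transformers import pipeline

# Load the fine-tuned checkpoint from the Hub
asr = pipeline(
    "automatic-speech-recognition",
    model="korir8/sauti-whisper-small-swh",
)

# "audio.wav" is a placeholder path; forcing language/task avoids
# Whisper's automatic language detection
result = asr(
    "audio.wav",
    generate_kwargs={"language": "swahili", "task": "transcribe"},
)
print(result["text"])
```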
## Training and evaluation data
- Dataset: google/WaxalNLP
- Config: swa_tts
- Splits: train+validation for training, test for evaluation
## Training procedure

### Training hyperparameters
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- optimizer: AdamW
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 200
- training_steps: 1500
- mixed_precision_training: fp16 (Native AMP)
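With a linear scheduler, 200 warmup steps, and 1500 training steps, the learning rate ramps from 0 to 1e-5 and then decays linearly back to 0. A pure-Python sketch of that schedule (equivalent to what `transformers`' linear scheduler computes):

```python
def linear_schedule_lr(step: int,
                       base_lr: float = 1e-5,
                       warmup_steps: int = 200,
                       total_steps: int = 1500) -> float:
    """Linear warmup to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0, total_steps - step) / (total_steps - warmup_steps)

# Peak of 1e-5 is reached at step 200, and the rate is 0 by step 1500
print(linear_schedule_lr(200), linear_schedule_lr(1500))
```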
### Training results
| Training Loss | Epoch | Step | Validation Loss | WER |
|---|---|---|---|---|
| 1.4954 | 2.5253 | 250 | 1.4282 | 59.3974 |
| 1.1445 | 5.0505 | 500 | 1.2585 | 55.7721 |
| 0.8363 | 7.5758 | 750 | 1.2555 | 55.9981 |
| 0.6360 | 10.1010 | 1000 | 1.3102 | 55.7815 |
| 0.4057 | 12.6263 | 1250 | 1.3652 | 55.9416 |
| 0.3618 | 15.1515 | 1500 | 1.3956 | 56.1205 |
### Framework versions
- Transformers 5.2.0
- Pytorch 2.10.0+cu128
- Datasets 4.5.0
- Tokenizers 0.22.2