sauti-whisper-small-swh

This model is a fine-tuned version of openai/whisper-small on the WAXAL (WaxalNLP) dataset (config: swa_tts).

It achieves the following results on the evaluation set:

  • Loss: 1.3956
  • WER: 56.1205
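WER (word error rate) here is a percentage: word-level edit distance between the reference and predicted transcripts, divided by the number of reference words. A minimal sketch of the metric (a plain dynamic-programming edit distance, not the exact `evaluate` implementation used during training):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words, in percent."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn the first i reference words into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return 100.0 * dp[len(ref)][len(hyp)] / len(ref)

print(wer("habari ya leo", "habari za leo"))  # one substitution out of three words
```

A WER of 56.12 therefore means roughly 56 word-level edits per 100 reference words.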

Model description

openai/whisper-small fine-tuned for Swahili ASR as part of the Sauti project at MsingiAI.

Intended uses & limitations

Intended use: Automatic speech recognition for Swahili.
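A minimal transcription sketch using the `transformers` pipeline API (the `language`/`task` generation options are standard Whisper arguments; the audio path is a placeholder):

```python
from transformers import pipeline

def transcribe(audio_path: str) -> str:
    # Loads the fine-tuned checkpoint from the Hub on first call.
    asr = pipeline(
        "automatic-speech-recognition",
        model="korir8/sauti-whisper-small-swh",
    )
    # "swahili" / "transcribe" are standard Whisper generation options.
    out = asr(audio_path, generate_kwargs={"language": "swahili", "task": "transcribe"})
    return out["text"]

# text = transcribe("clip.wav")
```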

Limitations:

  • The training data comes from a dataset originally curated for TTS; performance may not generalize well to noisy, conversational, or code-switched audio.
  • Whisper models have a maximum decoder target length (448 tokens). Long transcripts were truncated during preprocessing.
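The 448-token limit comes from the Whisper decoder's `max_target_positions`. The truncation step mentioned above can be sketched as a simple preprocessing function (the `labels` field name is an assumption; it would hold token ids from the Whisper tokenizer):

```python
MAX_LABEL_LENGTH = 448  # WhisperConfig.max_target_positions for whisper-small

def truncate_labels(example: dict) -> dict:
    # example["labels"]: tokenized transcript ids (hypothetical field name);
    # anything past the decoder's 448-token window is dropped.
    example["labels"] = example["labels"][:MAX_LABEL_LENGTH]
    return example

# e.g. applied with dataset.map(truncate_labels) during preprocessing
```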

Training and evaluation data

  • Dataset: google/WaxalNLP
  • Config: swa_tts
  • Splits: train+validation for training, test for evaluation

Training procedure

Training hyperparameters

  • learning_rate: 1e-05
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • optimizer: AdamW
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 200
  • training_steps: 1500
  • mixed_precision_training: fp16 (Native AMP)
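With a linear scheduler, 200 warmup steps, and 1500 total steps, the learning rate ramps up to 1e-05 and then decays linearly to zero. A small sketch of that schedule (matching the shape of transformers' linear scheduler, written out in plain Python):

```python
def linear_lr(step: int, base_lr: float = 1e-5, warmup: int = 200, total: int = 1500) -> float:
    """Linear warmup to base_lr, then linear decay to zero at the final step."""
    if step < warmup:
        return base_lr * step / warmup
    return base_lr * max(0, total - step) / (total - warmup)

print(linear_lr(100))   # halfway through warmup: 5e-06
print(linear_lr(200))   # peak: 1e-05
print(linear_lr(1500))  # end of training: 0.0
```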

Training results

  Training Loss   Epoch     Step   Validation Loss   WER
  1.4954          2.5253    250    1.4282            59.3974
  1.1445          5.0505    500    1.2585            55.7721
  0.8363          7.5758    750    1.2555            55.9981
  0.6360          10.1010   1000   1.3102            55.7815
  0.4057          12.6263   1250   1.3652            55.9416
  0.3618          15.1515   1500   1.3956            56.1205

Framework versions

  • Transformers 5.2.0
  • Pytorch 2.10.0+cu128
  • Datasets 4.5.0
  • Tokenizers 0.22.2