# sauti-whisper-small-swh
This model is a fine-tuned version of openai/whisper-small on the WAXAL (WaxalNLP) dataset (config: swa_tts).
It achieves the following results on the evaluation set:
- Loss: 1.3956
- WER: 56.1205
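The WER above is the word error rate: the word-level edit distance between hypothesis and reference, divided by the number of reference words, usually reported as a percentage. A minimal self-contained sketch (libraries such as `jiwer` or `evaluate` compute the same quantity):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Rolling-row dynamic programming over the edit-distance table
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            cur = min(d[j] + 1,            # deletion
                      d[j - 1] + 1,        # insertion
                      prev + (r != h))     # substitution or match
            prev, d[j] = d[j], cur
    return d[len(hyp)] / len(ref)

# One substituted word out of three -> WER of 1/3 (about 33.3%)
print(wer("habari ya dunia", "habari za dunia"))
```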
## Model description
openai/whisper-small fine-tuned for Swahili ASR as part of the Sauti project at MsingiAI.
## Intended uses & limitations
Intended use: Automatic speech recognition for Swahili.
Limitations:
- The training data comes from a dataset originally curated for TTS; performance may not generalize well to noisy, conversational, or code-switched audio.
- Whisper models have a maximum decoder target length (448 tokens). Long transcripts were truncated during preprocessing.
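A minimal transcription sketch using the `transformers` ASR pipeline. The audio path is a placeholder; any mono audio file works (the pipeline resamples to 16 kHz). This assumes the checkpoint is available on the Hub under this repo id.

```python
from transformers import pipeline

# Load the fine-tuned checkpoint from the Hub
asr = pipeline(
    "automatic-speech-recognition",
    model="korir8/sauti-whisper-small-swh",
)

# "audio.wav" is a placeholder path; forcing language/task avoids
# Whisper's automatic language detection
result = asr(
    "audio.wav",
    generate_kwargs={"language": "swahili", "task": "transcribe"},
)
print(result["text"])
```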
## Training and evaluation data
- Dataset: google/WaxalNLP
- Config: swa_tts
- Splits: train+validation for training, test for evaluation
## Training procedure

### Training hyperparameters
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- optimizer: AdamW
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 200
- training_steps: 1500
- mixed_precision_training: fp16 (Native AMP)
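With a linear scheduler, 200 warmup steps, and 1500 training steps, the learning rate ramps from 0 to 1e-5 and then decays linearly back to 0. A pure-Python sketch of that schedule (equivalent to what `transformers`' linear scheduler computes):

```python
def linear_schedule_lr(step: int,
                       base_lr: float = 1e-5,
                       warmup_steps: int = 200,
                       total_steps: int = 1500) -> float:
    """Linear warmup to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0, total_steps - step) / (total_steps - warmup_steps)

# Peak of 1e-5 is reached at step 200, and the rate is 0 by step 1500
print(linear_schedule_lr(200), linear_schedule_lr(1500))
```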
### Training results
| Training Loss | Epoch | Step | Validation Loss | WER |
|---|---|---|---|---|
| 1.4954 | 2.5253 | 250 | 1.4282 | 59.3974 |
| 1.1445 | 5.0505 | 500 | 1.2585 | 55.7721 |
| 0.8363 | 7.5758 | 750 | 1.2555 | 55.9981 |
| 0.6360 | 10.1010 | 1000 | 1.3102 | 55.7815 |
| 0.4057 | 12.6263 | 1250 | 1.3652 | 55.9416 |
| 0.3618 | 15.1515 | 1500 | 1.3956 | 56.1205 |
### Framework versions
- Transformers 5.2.0
- Pytorch 2.10.0+cu128
- Datasets 4.5.0
- Tokenizers 0.22.2