Whisper Small — Kikuyu (Gikuyu) V4

Fine-tuned openai/whisper-small for Kikuyu (Gikuyu) automatic speech recognition.

Model Details

  • Base model: openai/whisper-small (244M params)
  • Fine-tuning: LoRA r=32, alpha=64, targets: q/k/v/out_proj/fc1/fc2
  • Dataset: google/WaxalNLP kik_tts (~1,226 train samples, ~6.5 hours)
  • Diacritics: Preserved (canonical ĩ, ũ)
  • Training: 1,500 steps, cosine LR schedule, SpecAugment, label smoothing 0.1
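The cosine LR schedule mentioned above can be sketched as follows. Note that the peak learning rate and warmup length are illustrative assumptions; the card only states 1,500 total steps and a cosine schedule:

```python
import math

def cosine_lr(step, total_steps=1500, peak_lr=1e-3, warmup_steps=50):
    """Linear warmup followed by cosine decay to zero.

    peak_lr and warmup_steps are hypothetical values, not taken from this card.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * peak_lr * (1 + math.cos(math.pi * progress))
```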

Results

Split   WER     CER
Eval    78.2%   40.4%
Test    83.6%   47.8%
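For reference, WER and CER are both edit-distance rates: Levenshtein distance over words (WER) or characters (CER), divided by reference length. A minimal sketch (standard dynamic-programming Levenshtein, not necessarily the exact scoring script used for these numbers):

```python
def edit_distance(ref, hyp):
    # classic DP Levenshtein: substitutions, insertions, deletions all cost 1
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,      # deletion
                           dp[i][j - 1] + 1,      # insertion
                           dp[i - 1][j - 1] + cost)  # substitution/match
    return dp[-1][-1]

def wer(ref, hyp):
    # word error rate: word-level edits / reference word count
    return edit_distance(ref.split(), hyp.split()) / len(ref.split())

def cer(ref, hyp):
    # character error rate: character-level edits / reference length
    return edit_distance(list(ref), list(hyp)) / len(ref)
```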

Usage

from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("Kiragu/whisper-small-kikuyu-v4")
model = WhisperForConditionalGeneration.from_pretrained("Kiragu/whisper-small-kikuyu-v4")

# audio_array: 1-D mono waveform resampled to 16 kHz
input_features = processor(audio_array, sampling_rate=16000, return_tensors="pt").input_features
predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]

Limitations

  • WER remains high (78-84%); treat this as an experimental model
  • Only ~6.5 hours of training data
  • Short phrases work better than long sentences
  • May hallucinate/repeat on longer audio
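A common mitigation for hallucination and repetition on long inputs is chunked inference: transcribe overlapping windows and merge the results. A minimal sketch of the windowing step, where the chunk and overlap lengths are illustrative defaults rather than values from this card:

```python
def chunk_audio(samples, sr=16000, chunk_s=30.0, overlap_s=2.0):
    """Split a waveform into overlapping windows for piecewise transcription.

    chunk_s/overlap_s are illustrative, not values stated on this card.
    """
    size = int(chunk_s * sr)           # samples per window
    step = size - int(overlap_s * sr)  # hop between window starts
    chunks = [samples[i:i + size] for i in range(0, len(samples), step)]
    # drop a trailing fragment already fully covered by the previous window
    kept = [c for c in chunks if len(c) > int(overlap_s * sr)]
    return kept or chunks[:1]
```

The transformers `automatic-speech-recognition` pipeline offers similar built-in behavior via its `chunk_length_s` parameter.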

Part of Sauti AI

This model is part of Sauti AI, a decentralized vocal data collection platform for underrepresented languages.
