License: MIT | Model on HF | Python 3.10+ | Transformers

Dikkte ASR: Wolof Speech Recognition

Wolof speech-to-text model. Built by fine-tuning openai/whisper-small with LoRA adapters, then merging everything into a single model. This repo has the full merged weights: just pip install transformers and go.

Wolof is spoken by 10M+ people across Senegal, Gambia, and Mauritania but barely has any open ASR tools.

Quick start

Install

pip install transformers torch torchaudio

Transcribe audio (3 lines)

from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="utachicodes/dikkte-wolof-asr")
print(pipe("audio.wav")["text"])

Full control

from transformers import WhisperForConditionalGeneration, WhisperProcessor
import torchaudio, torch

processor = WhisperProcessor.from_pretrained("utachicodes/dikkte-wolof-asr")
model = WhisperForConditionalGeneration.from_pretrained("utachicodes/dikkte-wolof-asr")
model.eval()

waveform, sr = torchaudio.load("your_audio.wav")
if sr != 16000:
    waveform = torchaudio.transforms.Resample(sr, 16000)(waveform)

inputs = processor(waveform.squeeze().numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    ids = model.generate(input_features=inputs.input_features)

text = processor.batch_decode(ids, skip_special_tokens=True)[0]
print(text)
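Whisper processes audio in 30-second windows, so longer recordings should be split before transcription. A minimal sketch of the splitting step (the chunk_waveform helper below is hypothetical, not part of this repo):

```python
import torch

def chunk_waveform(waveform: torch.Tensor, sr: int = 16000, chunk_s: int = 30):
    """Split a mono 1-D waveform into fixed-length chunks; the last chunk may be shorter."""
    step = sr * chunk_s
    return [waveform[i:i + step] for i in range(0, waveform.numel(), step)]

# 75 s of 16 kHz audio -> chunks of 30 s, 30 s, and 15 s
chunks = chunk_waveform(torch.zeros(75 * 16000))
# transcribe each chunk as in the snippet above, then join the texts
```

Note that cutting at fixed offsets can split a word across chunks; adding a small overlap or cutting at silences gives cleaner joins.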

GPU inference

import torch
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained(
    "utachicodes/dikkte-wolof-asr",
    torch_dtype=torch.float16,
    device_map="auto",
)
# match the inputs to the model before generating:
# inputs.input_features.to(model.device, torch.float16)

Web UI

Record from your mic and see the transcription live:

git clone https://github.com/utachicodes/dikkte_asr.git
cd dikkte_asr
pip install -r requirements.txt
python wolof_stt.py
# opens at http://127.0.0.1:7860

Metrics

| Metric | Value  |
|--------|--------|
| WER    | 57.68% |
| CER    | 37.17% |

Evaluated on 500 test samples. This is a v1 on whisper-small; accuracy improves with more data, longer training, or a bigger base model.
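For reference, WER is the word-level edit distance between hypothesis and reference divided by the reference word count; CER is the same computation at character level. A minimal sketch (not the evaluation script used here), with a made-up Wolof-flavored example:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (iterative DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution / match
        prev = cur
    return prev[-1]

def wer(ref, hyp):
    """Word error rate: word-level edits / reference word count."""
    r, h = ref.split(), hyp.split()
    return edit_distance(r, h) / len(r)

def cer(ref, hyp):
    """Character error rate: character-level edits / reference character count."""
    return edit_distance(list(ref), list(hyp)) / len(ref)

# hypothetical reference vs. hypothesis: 1 substituted word out of 3 -> ~0.33
print(wer("na nga def", "na nga dem"))
```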

How it was trained

| Setting          | Value                                  |
|------------------|----------------------------------------|
| Base model       | openai/whisper-small (244M params)     |
| Method           | LoRA (rank 32, alpha 64)               |
| Targets          | q_proj, v_proj, k_proj, o_proj         |
| Trainable params | 5.3M (2.1% of total)                   |
| Dataset          | alfaDF9/asr-wolof-dataset-processed-v1 |
| Samples          | 10,380 train / 2,598 test              |
| Effective batch  | 16 (2 x 8 grad accum)                  |
| Learning rate    | 1e-3                                   |
| Epochs           | 3                                      |
| Training loss    | 4.21 → 0.67                            |
| Precision        | fp16 mixed                             |
| Hardware         | RTX 3060 Laptop (6GB VRAM)             |
| Time             | ~6 hours                               |
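Merging is why inference needs only transformers and no peft: the low-rank update B·A, scaled by alpha/r, is folded straight into each target weight, so the merged model behaves identically to base-plus-adapter. A toy sketch of the idea (dimensions are illustrative, not whisper-small's):

```python
import torch

torch.manual_seed(0)
d, r, alpha = 8, 2, 4
W = torch.randn(d, d)          # frozen base projection weight
A = torch.randn(r, d) * 0.01   # LoRA down-projection (r x d)
B = torch.randn(d, r) * 0.01   # LoRA up-projection (d x r)
scale = alpha / r

x = torch.randn(3, d)
y_adapter = x @ W.T + (x @ A.T @ B.T) * scale   # base path + low-rank path
W_merged = W + scale * (B @ A)                  # fold the update into the weight
y_merged = x @ W_merged.T                       # single matmul, same output
```

After merging, the adapter matrices can be discarded and inference pays no extra cost over the base model.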

Retrain it

git clone https://github.com/utachicodes/dikkte_asr.git
cd dikkte_asr
pip install -r requirements.txt
python train_wolof.py

Edit the config block at the top of train_wolof.py to tweak hyperparams.

Other Wolof ASR models

| Model                  | Params | Notes                                       |
|------------------------|--------|---------------------------------------------|
| CAYTU/whosper-large-v2 | 1.5B   | LoRA on whisper-large-v2, needs 12GB+ VRAM  |
| dofbi/wolof-asr        | 244M   | Full fine-tune, 12% WER reported            |
| facebook/mms-1b-all    | 1B     | Multilingual, has a Wolof adapter           |
| dikkte (this)          | 244M   | Merged LoRA on whisper-small, 6GB VRAM      |

License

MIT

Credits
