
wav2vec2-xls-r-300m – Swahili ASR (200 hours)

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m for Swahili automatic speech recognition. It was originally trained by the ASR Africa research team and is hosted here for use as a base model for further fine-tuning on Luganda and other Bantu languages.

Model Performance

Evaluated on a combined held-out test set drawn from Common Voice (CV), FLEURS, AMMI, and ALFFA (Swahili):

| Metric | Score  |
|--------|--------|
| WER    | 13.73% |
| CER    | 4.54%  |
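For reference, WER is the word-level edit distance between hypothesis and reference, normalized by reference length (CER is the same computed over characters). A minimal pure-Python sketch, not the exact evaluation script used for the scores above:

```python
def edit_distance(ref, hyp):
    # Classic dynamic-programming Levenshtein distance over token lists.
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (r != h))
    return dp[-1]

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)

# One substitution ("ya" vs "za") out of three reference words -> 1/3
print(wer("habari ya asubuhi", "habari za asubuhi"))
```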

Training Hyperparameters

  • Learning rate: 3e-4
  • Train batch size: 8 (effective: 16 with gradient accumulation)
  • Epochs: 100
  • Optimizer: AdamW
  • Mixed precision: FP16
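The hyperparameters above can be expressed as a `transformers.TrainingArguments` sketch. The original training script is not published here, so this is an assumed reconstruction; in particular, `gradient_accumulation_steps=2` is inferred from the stated per-device batch of 8 and effective batch of 16:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wav2vec2-swahili",      # assumed output path
    learning_rate=3e-4,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,      # 8 x 2 = effective batch of 16
    num_train_epochs=100,
    fp16=True,                          # mixed precision
)
```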

How to Use

```python
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_id = "sulaimank/wav2vec2-xlsr-CV_Fleurs_AMMI_ALFFA-swahili-200hrs"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# audio_array: a 1-D float array of your audio, sampled at 16 kHz
input_values = processor(audio_array, sampling_rate=16000, return_tensors="pt").input_values

with torch.no_grad():
    logits = model(input_values).logits

predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)
print(transcription)
```
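Under the hood, `batch_decode` performs greedy CTC decoding on the argmax path: consecutive repeated ids are merged, then blank tokens are dropped, and the remaining ids are mapped to characters via the processor's vocabulary. A toy pure-Python sketch of just the collapse step (the blank id of 0 here is illustrative, not taken from this model's vocabulary):

```python
def ctc_collapse(ids, blank_id=0):
    # Merge consecutive repeats, then drop blanks: the core of greedy CTC decoding.
    out, prev = [], None
    for i in ids:
        if i != prev:
            out.append(i)
        prev = i
    return [i for i in out if i != blank_id]

# The blank between the two 5s keeps them as distinct emissions:
print(ctc_collapse([5, 5, 0, 5, 3, 3]))  # -> [5, 5, 3]
```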
Model size: 0.3B parameters (F32, Safetensors)
