# wav2vec2-xls-r-300m - Swahili ASR (200 hours)
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m for Swahili automatic speech recognition. It was originally trained by the ASR Africa research team and is hosted here for use as a base model for further fine-tuning on Luganda and other Bantu languages.
## Model Performance
Evaluated on a combined held-out test set drawn from Common Voice (CV), FLEURS, AMMI, and ALFFA (Swahili):
| Metric | Score |
|---|---|
| WER | 13.73% |
| CER | 4.54% |
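For reference, WER is the word-level edit (Levenshtein) distance between hypothesis and reference, divided by the number of reference words; CER is the same computed over characters. The reported numbers were produced during evaluation (typically with a library such as `jiwer`), not with this snippet, but a minimal pure-Python sketch of the metrics looks like:

```python
def edit_distance(ref, hyp):
    # Classic dynamic-programming Levenshtein distance over token sequences,
    # using a single rolling row for O(len(hyp)) memory.
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(
                dp[j] + 1,            # deletion
                dp[j - 1] + 1,        # insertion
                prev + (r != h),      # substitution (free if tokens match)
            )
    return dp[-1]

def wer(reference, hypothesis):
    # Word error rate: word-level edits divided by reference word count.
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    # Character error rate: character-level edits divided by reference length.
    return edit_distance(list(reference), list(hypothesis)) / len(reference)

print(wer("habari ya asubuhi", "habari za asubuhi"))  # one substituted word out of three
```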
## Training Hyperparameters
- Learning rate: 3e-4
- Train batch size: 8 (effective: 16 with gradient accumulation)
- Epochs: 100
- Optimizer: AdamW
- Mixed precision: FP16
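The hyperparameters above map onto a transformers `TrainingArguments` configuration roughly as follows. This is a hedged sketch, not the original training script (which is not published here): `output_dir` is a placeholder, and the gradient accumulation value assumes effective batch 16 = per-device batch 8 × 2 accumulation steps, as stated above.

```python
from transformers import TrainingArguments

# Sketch only: reconstructs the listed hyperparameters, not the authors' exact script.
training_args = TrainingArguments(
    output_dir="wav2vec2-swahili",    # placeholder path
    learning_rate=3e-4,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,    # assumption: 8 x 2 = effective batch 16
    num_train_epochs=100,
    fp16=True,                        # mixed precision
    optim="adamw_torch",              # AdamW
)
```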
## How to Use
```python
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_id = "sulaimank/wav2vec2-xlsr-CV_Fleurs_AMMI_ALFFA-swahili-200hrs"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)
model.eval()

# Load your audio as a 1-D float array sampled at 16 kHz
# (e.g. with torchaudio or librosa, resampling if needed).
input_values = processor(audio_array, sampling_rate=16000, return_tensors="pt").input_values

with torch.no_grad():
    logits = model(input_values).logits

# Greedy CTC decoding: take the most likely token at each frame.
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)
print(transcription)
```
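Under the hood, `batch_decode` applied to the argmax ids performs greedy CTC decoding: consecutive repeated ids are collapsed, then the CTC blank token is removed. A minimal illustration with a toy vocabulary (the real model's vocabulary and blank id come from the processor):

```python
def ctc_greedy_decode(ids, id_to_char, blank_id=0):
    # Collapse consecutive repeats, then drop CTC blank tokens.
    out = []
    prev = None
    for i in ids:
        if i != prev and i != blank_id:
            out.append(id_to_char[i])
        prev = i
    return "".join(out)

# Toy vocabulary for illustration: 0 = blank, then letters.
vocab = {1: "j", 2: "a", 3: "m", 4: "b", 5: "o"}
print(ctc_greedy_decode([1, 1, 0, 2, 2, 3, 0, 4, 5, 5], vocab))  # "jambo"
```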