Whisper Small Fine-tuned on Tunisian Dialect (TEDxTN)
This model is a fine-tuned version of openai/whisper-small on the TEDxTN dataset. It was trained to transcribe Tunisian dialect (Derja/Arabizi).
Model Description
- Base model: openai/whisper-small
- Language: Tunisian Arabic (Derja)
- Dataset: TEDxTN (~22 hours)
Evaluation Results
| Metric | Score (TEDxTN test set, self-reported) |
|---|---|
| WER | 37.99% |
| CER | 18.77% |
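For readers unfamiliar with the two metrics, the sketch below shows the standard definitions: WER is the word-level edit distance divided by the number of reference words, and CER is the same ratio at character level. This card does not state which tool or text normalization produced the scores above, so the function names here are illustrative and the code is not a reproduction of the author's evaluation.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Placeholder strings, not real transcripts:
print(wer("a b c d", "a x c"))  # one substitution + one deletion over 4 words -> 0.5
print(cer("abcd", "abxd"))      # one substitution over 4 characters -> 0.25
```

Both rates can exceed 100% when the hypothesis contains many insertions, since the denominator is always the reference length.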
Usage
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model_id = "medfadiabaidi/whisper-small-tunisian-asr"
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)

# audio_input = ...  # Load your audio here: a 1-D float array of mono audio sampled at 16 kHz
# inputs = processor(audio_input, sampling_rate=16000, return_tensors="pt")
# generated_ids = model.generate(inputs.input_features)
# transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
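The snippet above leaves audio loading open. One possible way to fill that step, using only the Python standard library, is sketched below. It assumes the input is a 16-bit PCM WAV file already sampled at 16 kHz (the rate Whisper's feature extractor expects). The helper name `load_wav_16k_mono` is hypothetical and not part of this model or of `transformers`. It does not resample, so other sample rates are rejected instead of being converted.

```python
import array
import sys
import wave

TARGET_RATE = 16000  # Whisper's feature extractor expects 16 kHz input


def load_wav_16k_mono(path):
    """Read a 16-bit PCM WAV file and return mono samples as floats in [-1, 1)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        if wav.getframerate() != TARGET_RATE:
            raise ValueError(
                f"expected {TARGET_RATE} Hz audio, got {wav.getframerate()} Hz"
            )
        channels = wav.getnchannels()
        raw = wav.readframes(wav.getnframes())

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian

    # Average the interleaved channels down to mono, then scale to [-1, 1).
    mono = [
        sum(samples[i:i + channels]) / channels
        for i in range(0, len(samples), channels)
    ]
    return [s / 32768.0 for s in mono]
```

The returned list can serve as `audio_input` when calling the processor, together with `sampling_rate=16000`. Recordings at other sample rates or in other formats would need to be converted first with an audio library of your choice.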
Training Details
- Epochs: 15
- Batch Size: 64
- Learning Rate: 1e-05
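As a rough illustration, the hyperparameters above could map onto `Seq2SeqTrainingArguments` from `transformers` as sketched below. This card does not include the training script, so this is an assumption about the setup, not the author's configuration: the output path is hypothetical, the batch size is assumed to be per device, and `predict_with_generate` is an added assumption.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-tunisian",  # hypothetical output path
    num_train_epochs=15,                    # Epochs: 15
    per_device_train_batch_size=64,         # assumes "Batch Size: 64" is per device
    learning_rate=1e-5,                     # Learning Rate: 1e-05
    predict_with_generate=True,             # assumption: decode with generate() during evaluation
)
```

If the reported batch size was instead reached through gradient accumulation or multiple devices, the per-device value would be smaller.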