Whisper Small Fine-tuned on Tunisian Dialect (TEDxTN)

This model is a fine-tuned version of openai/whisper-small on the TEDxTN dataset. It was trained to transcribe Tunisian dialect (Derja/Arabizi).

Model Description

  • Base model: openai/whisper-small
  • Language: Tunisian Arabic (Derja)
  • Dataset: TEDxTN (~22 hours)

Evaluation Results

Metric                        Score
WER (word error rate)         37.99%
CER (character error rate)    18.77%
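
WER and CER are both edit-distance ratios. Each counts the minimum number of substitutions, deletions, and insertions needed to turn the model output into the reference transcript, then divides by the reference length, over words for WER and over characters for CER. The plain-Python sketch below makes that definition concrete. It is not the script used to produce the scores above, and real evaluations usually normalize text (punctuation, casing, spelling variants) before scoring, which is not shown here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, deletions, insertions)."""
    # At the start of row i, prev[j] is the distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance divided by the reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


print(round(wer("the cat sat", "the cat sit"), 3))  # 0.333
```

Over a whole test set, WER is normally computed at corpus level: sum the edit distances of all utterances and divide by the total number of reference words, rather than averaging per-utterance ratios.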

Usage

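```python
import librosa
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model_id = "medfadiabaidi/whisper-small-tunisian-asr"
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)

# Whisper expects 16 kHz mono audio. librosa is one option; any loader that returns
# a 16 kHz float array works. "audio.wav" is a placeholder for your own recording.
audio_input, sampling_rate = librosa.load("audio.wav", sr=16000)

# Convert the waveform into log-Mel input features (padded or truncated to 30 s).
inputs = processor(audio_input, sampling_rate=sampling_rate, return_tensors="pt")
generated_ids = model.generate(inputs.input_features)
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(transcription)
```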
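
Whisper's feature extractor pads or truncates every input to a 30-second window, so a full TEDx-length recording passed through the snippet above would be cut off after its first 30 seconds. One simple workaround, sketched below as an assumption rather than the author's procedure, is to split the 16 kHz waveform into consecutive 30-second chunks and transcribe each one separately. `split_into_chunks` is a hypothetical helper, and fixed-length cuts can split a word across two chunks.

```python
SAMPLING_RATE = 16_000  # Whisper models expect 16 kHz input
CHUNK_SECONDS = 30      # length of Whisper's input window


def split_into_chunks(waveform, sampling_rate=SAMPLING_RATE, chunk_seconds=CHUNK_SECONDS):
    """Split a 1-D waveform (list or NumPy array) into consecutive fixed-length chunks.

    The last chunk may be shorter; the feature extractor pads it back to 30 s.
    """
    chunk_len = int(sampling_rate * chunk_seconds)
    return [waveform[start:start + chunk_len] for start in range(0, len(waveform), chunk_len)]


# Example: 70 s of silence at 16 kHz yields chunks of 30 s, 30 s and 10 s.
chunks = split_into_chunks([0.0] * (70 * SAMPLING_RATE))
print([len(c) / SAMPLING_RATE for c in chunks])  # [30.0, 30.0, 10.0]
```

Each chunk can then go through `processor(...)` and `model.generate(...)` exactly as in the usage snippet, with the decoded strings joined by spaces. The `transformers` automatic-speech-recognition `pipeline` also offers overlapping chunking through its `chunk_length_s` argument, which tends to handle chunk boundaries more gracefully.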

Training Details

  • Epochs: 15
  • Batch Size: 64
  • Learning Rate: 1e-05
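
To see what these hyperparameters imply for training length, the sketch below counts optimizer steps for a given number of training utterances. The card does not state the utterance count, so `num_examples` is a hypothetical input. The sketch also assumes that 64 is the effective batch size per optimizer step (per-device batch × number of devices × gradient-accumulation steps), which the card does not specify.

```python
import math

EPOCHS = 15
BATCH_SIZE = 64  # assumed effective batch size per optimizer step


def total_optimizer_steps(num_examples: int, epochs: int = EPOCHS, batch_size: int = BATCH_SIZE) -> int:
    """Optimizer steps for `epochs` passes over `num_examples`, keeping the last partial batch."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs


# Hypothetical example: 10,000 training utterances
print(total_optimizer_steps(10_000))  # 2355
```

If the model was trained with the Hugging Face `Seq2SeqTrainer` (not confirmed here), these values would map to `num_train_epochs`, `per_device_train_batch_size` (multiplied by `gradient_accumulation_steps` and the device count), and `learning_rate` in `Seq2SeqTrainingArguments`.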

Checkpoint

  • Format: Safetensors
  • Model size: 0.2B params
  • Tensor type: F32