Whisper Medium - Kreol Morisien (In-Domain Fine-Tuned)

This model is a fine-tuned version of openai/whisper-medium, trained on approximately 5 hours of in-domain Kreol Morisien (Mauritian Creole) audio.

Model Details

Parameter     Value
Base Model    openai/whisper-medium
Parameters    ~769M
Language      Kreol Morisien (mfe)
Task          Transcription
Best WER      16.49%

Training Data

  • Domain: In-domain Kreol Morisien speech
  • Total Duration: ~5 hours
  • Split: 90% train / 10% validation
  • Format: 16kHz mono WAV
  • Preprocessing: Text normalization (lowercased, whitespace-collapsed, smart quotes replaced)
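The text normalization listed above could look roughly like the following. This is a minimal sketch, not the author's actual preprocessing code: the function name `normalize_text`, the exact set of smart quotes, the order of the steps, and the final `strip()` are all assumptions, and the sample sentence is only an illustration.

```python
import re

# Assumed mapping of typographic ("smart") quotes to ASCII equivalents.
SMART_QUOTES = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
}


def normalize_text(text: str) -> str:
    """Replace smart quotes, lowercase, and collapse runs of whitespace."""
    for smart, plain in SMART_QUOTES.items():
        text = text.replace(smart, plain)
    text = text.lower()
    # Collapse any whitespace run (spaces, tabs, newlines) to one space.
    return re.sub(r"\s+", " ", text).strip()


# Illustrative input only; not taken from the training data.
print(normalize_text("  Mo  \u201cKontan\u201d\ttwa "))  # mo "kontan" twa
```

Applying the same normalization to both references and model predictions before scoring keeps casing and quote style from being counted as word errors.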

Training Configuration

Hyperparameter                 Value
Epochs                         15
Batch Size (per device)        8
Gradient Accumulation Steps    4
Effective Batch Size           32
Learning Rate                  3e-5
LR Scheduler                   Cosine
Warmup Steps                   100
Weight Decay                   0.1
Precision                      bf16
Generation Beams (eval)        1 (greedy)
Best Checkpoint Step           300
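To make the learning-rate settings concrete, the sketch below computes the rate at a given step for linear warmup followed by cosine decay. It assumes training ended at step 315 (the last logged step), that warmup starts from zero, and that the decay is a single half-cosine down to zero. The name `lr_at` is hypothetical and this is not the trainer's own code.

```python
import math

PEAK_LR = 3e-5       # learning rate from the configuration above
WARMUP_STEPS = 100   # warmup steps from the configuration above
TOTAL_STEPS = 315    # assumption: the last logged step is the end of training


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then half-cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 50, 100, 200, 315):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions roughly a third of all optimizer steps are spent in warmup, so the peak rate of 3e-5 is reached only at step 100.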

Training Results

Step    Training Loss    Validation Loss    WER
50      2.7217           0.5670             0.2318
100     0.2614           0.4051             0.1988
150     0.0864           0.4268             0.1772
200     0.0417           0.4443             0.2104
250     0.0189           0.4487             0.1703
300     0.0046           0.4557             0.1649
315     0.0046           0.4559             0.1649
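The logged results show that the choice of selection metric matters here. The sketch below copies the rows above and picks the best step by WER and by validation loss. Breaking the tie between steps 300 and 315 in favor of the earlier step is an assumption about how the checkpoint was chosen.

```python
# (step, training_loss, validation_loss, wer) rows copied from the results above.
RESULTS = [
    (50, 2.7217, 0.5670, 0.2318),
    (100, 0.2614, 0.4051, 0.1988),
    (150, 0.0864, 0.4268, 0.1772),
    (200, 0.0417, 0.4443, 0.2104),
    (250, 0.0189, 0.4487, 0.1703),
    (300, 0.0046, 0.4557, 0.1649),
    (315, 0.0046, 0.4559, 0.1649),
]

# min() returns the first minimal row, so ties go to the earlier step.
best_by_wer = min(RESULTS, key=lambda row: row[3])
best_by_val_loss = min(RESULTS, key=lambda row: row[2])

print("best step by WER:", best_by_wer[0])
print("best step by validation loss:", best_by_val_loss[0])
```

Selecting by WER gives step 300, while selecting by validation loss would give step 100, where the WER is 0.1988 rather than 0.1649.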

Usage

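from transformers import WhisperProcessor, WhisperForConditionalGeneration
import soundfile as sf

model_id = "Shagufta/whisper-medium-km-indomain5-withoutac"
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)

# Load the audio; the model expects 16 kHz mono input.
speech, sr = sf.read("audio.wav", dtype="float32")
if speech.ndim > 1:
    speech = speech.mean(axis=1)  # downmix multi-channel audio to mono
if sr != 16000:
    raise ValueError(f"Expected 16 kHz audio, got {sr} Hz; resample first.")

# Convert the waveform to log-Mel input features.
input_features = processor.feature_extractor(
    speech, sampling_rate=sr, return_tensors="pt"
).input_features

# Beam search decoding; note that the reported WER was measured with greedy decoding (num_beams=1).
predicted_ids = model.generate(
    input_features,
    max_length=256,
    num_beams=5,
)

transcription = processor.tokenizer.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)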

Model Config

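# Clear the forced decoder prompt so no language or task token is forced at generation time.
model.config.forced_decoder_ids = None
# Do not suppress any tokens during generation.
model.config.suppress_tokens = []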

Limitations

  • Trained on only about 5 hours of in-domain data, so it may not generalize well to out-of-domain Kreol Morisien speech.
  • Kreol Morisien is not an officially supported Whisper language, so no language token is set.
  • Signs of overfitting were observed: training loss approaches 0 while validation loss rises after step 100 (from 0.4051 at step 100 to 0.4559 at step 315). The best checkpoint was selected by WER, not by validation loss.

Framework

  • Transformers (Seq2SeqTrainer)
  • Evaluate (WER metric)
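For readers unfamiliar with the metric, word error rate is the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The sketch below is a pure-Python illustration for a single utterance, not the Evaluate library's implementation. The function name `word_error_rate` and the sample sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative strings only: one deleted word out of four reference words.
print(word_error_rate("mo pe al lakaz", "mo pe lakaz"))  # 0.25
```

On this scale, the reported best WER of 0.1649 corresponds to roughly one word error for every six reference words.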