Whisper Medium — Kreol Morisien (In-Domain Fine-Tuned, With AC, 10h)

This model is a fine-tuned version of openai/whisper-medium on 10 hours of in-domain Kreol Morisien (Mauritian Creole) audio data, with accent conditioning.

Model Details

| Parameter | Value |
|---|---|
| Base Model | openai/whisper-medium |
| Parameters | ~769M |
| Language | Kreol Morisien (mfe) |
| Task | Transcription |
| Best WER | 12.82% |

Training Data

  • Domain: In-domain Kreol Morisien speech
  • Total Duration: ~15 hours
  • Split: 90% train / 10% validation
  • Format: 16kHz mono WAV
  • Preprocessing: Text normalization (lowercased, whitespace-collapsed, smart quotes replaced)
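
The text normalization listed above could look roughly like the following sketch. The card does not publish the actual preprocessing code, so the function name `normalize_text`, the exact set of smart-quote characters, and the trimming of leading and trailing whitespace are all assumptions made for illustration.

```python
import re

# Hypothetical mapping: the exact characters replaced during training
# are not documented in the card.
SMART_QUOTES = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
}


def normalize_text(text: str) -> str:
    """Replace smart quotes, lowercase, and collapse runs of whitespace."""
    for smart, plain in SMART_QUOTES.items():
        text = text.replace(smart, plain)
    text = text.lower()
    # Collapse any whitespace run to one space; trimming the ends is an assumption.
    return re.sub(r"\s+", " ", text).strip()


print(normalize_text("  Mo \u2018kontan\u2019   twa  "))  # mo 'kontan' twa
```

Applying the same normalization to both references and predictions before scoring keeps WER from being inflated by casing or quote-style differences.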

Training Configuration

| Hyperparameter | Value |
|---|---|
| Epochs | 10 |
| Batch Size (per device) | 8 |
| Gradient Accumulation Steps | 4 |
| Effective Batch Size | 32 |
| Learning Rate | 3e-5 |
| LR Scheduler | Cosine |
| Warmup Steps | 200 |
| Weight Decay | 0.1 |
| Precision | bf16 |
| Generation Beams (eval) | 1 (greedy) |
| Best Checkpoint | Step 700 |
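
As a rough illustration of how these values relate, the sketch below recomputes the effective batch size and collects the hyperparameters into a dictionary whose keys follow the keyword names of `Seq2SeqTrainingArguments`. The single-device count is an assumption (the card does not state the hardware), and the dictionary is a hypothetical reconstruction, not the author's training script.

```python
# Values copied from the table above.
per_device_batch_size = 8
gradient_accumulation_steps = 4
num_devices = 1  # assumption: the card does not state the number of devices

# Effective batch size = per-device batch x accumulation steps x devices.
effective_batch_size = (
    per_device_batch_size * gradient_accumulation_steps * num_devices
)
print(effective_batch_size)  # 32

# Hypothetical keyword arguments, named after Seq2SeqTrainingArguments fields.
training_kwargs = {
    "num_train_epochs": 10,
    "per_device_train_batch_size": per_device_batch_size,
    "gradient_accumulation_steps": gradient_accumulation_steps,
    "learning_rate": 3e-5,
    "lr_scheduler_type": "cosine",
    "warmup_steps": 200,
    "weight_decay": 0.1,
    "bf16": True,
    "predict_with_generate": True,  # assumption: needed to compute WER at eval
    "generation_num_beams": 1,      # greedy decoding during evaluation
}
```

With one device, 8 x 4 = 32 matches the effective batch size reported in the table.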

Training Results

| Step | Training Loss | Validation Loss | WER |
|---|---|---|---|
| 100 | 1.9289 | 0.3816 | 0.1964 |
| 200 | 0.4691 | 0.3208 | 0.1579 |
| 300 | 0.2571 | 0.3163 | 0.1499 |
| 400 | 0.1096 | 0.3442 | 0.1494 |
| 500 | 0.0398 | 0.3280 | 0.1360 |
| 600 | 0.0179 | 0.3312 | 0.1292 |
| 700 | 0.0046 | 0.3404 | 0.1282 |
| 750 | 0.0041 | 0.3405 | 0.1317 |
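
The best checkpoint can be read off the table by taking the row with the lowest WER. The short sketch below does this with the logged values and also shows that the lowest validation loss falls at a different step, which suggests checkpoint selection was based on WER and not on loss. The variable names are illustrative only.

```python
# (step, training_loss, validation_loss, wer) rows from the table above.
results = [
    (100, 1.9289, 0.3816, 0.1964),
    (200, 0.4691, 0.3208, 0.1579),
    (300, 0.2571, 0.3163, 0.1499),
    (400, 0.1096, 0.3442, 0.1494),
    (500, 0.0398, 0.3280, 0.1360),
    (600, 0.0179, 0.3312, 0.1292),
    (700, 0.0046, 0.3404, 0.1282),
    (750, 0.0041, 0.3405, 0.1317),
]

best_by_wer = min(results, key=lambda row: row[3])
best_by_val_loss = min(results, key=lambda row: row[2])

print(f"Lowest WER: step {best_by_wer[0]} ({best_by_wer[3]:.2%})")
print(f"Lowest validation loss: step {best_by_val_loss[0]} ({best_by_val_loss[2]:.4f})")
```

Step 700 gives the 12.82% WER reported under Model Details, while validation loss bottoms out at step 300 and training loss keeps falling toward zero afterwards.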

Usage

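import soundfile as sf
from transformers import WhisperProcessor, WhisperForConditionalGeneration

model_id = "Shagufta/whisper-medium-km-indomain10-withac"
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)

# Load the audio; the model expects 16 kHz mono input.
speech, sr = sf.read("audio.wav", dtype="float32")
if speech.ndim > 1:
    # Downmix multi-channel audio to mono by averaging the channels.
    speech = speech.mean(axis=1)

# Pass the file's actual sampling rate so the feature extractor raises an
# error if the audio is not 16 kHz; resample beforehand if needed.
input_features = processor.feature_extractor(
    speech, sampling_rate=sr, return_tensors="pt"
).input_features

# Beam search with 5 beams (evaluation during training used greedy decoding).
predicted_ids = model.generate(
    input_features,
    max_length=256,
    num_beams=5,
)

transcription = processor.tokenizer.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)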

Model Config

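# Do not force any language or task prompt tokens during generation.
model.config.forced_decoder_ids = None
# Do not suppress any tokens during generation.
model.config.suppress_tokens = []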

Framework

  • Transformers (Seq2SeqTrainer)
  • Evaluate (WER metric)
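
For readers unfamiliar with the metric, WER is the word-level edit distance between reference and prediction divided by the number of reference words. The sketch below is a plain-Python illustration of that definition; it is not the Evaluate library's implementation, the function name `word_error_rate` is hypothetical, and the example strings are placeholders, not real evaluation data.

```python
def word_error_rate(references, predictions):
    """Corpus-level WER: total word edits divided by total reference words."""
    total_errors = 0
    total_words = 0
    for ref, hyp in zip(references, predictions):
        ref_words, hyp_words = ref.split(), hyp.split()
        # Levenshtein distance over words, keeping one row of the table at a time.
        prev = list(range(len(hyp_words) + 1))
        for i, ref_word in enumerate(ref_words, start=1):
            curr = [i]
            for j, hyp_word in enumerate(hyp_words, start=1):
                substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
                deletion = prev[j] + 1
                insertion = curr[j - 1] + 1
                curr.append(min(substitution, deletion, insertion))
            prev = curr
        total_errors += prev[-1]
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("references contain no words")
    return total_errors / total_words


# Placeholder sentences: one substitution out of four reference words.
print(word_error_rate(["w1 w2 w3 w4"], ["w1 w2 wX w4"]))  # 0.25
```

Summing errors and reference words over the whole set before dividing weights long utterances more than short ones, which is a design choice of this sketch.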