Whisper Medium — Kreol Morisien (In-Domain Fine-Tuned)

This model is a fine-tuned version of openai/whisper-medium, trained on about 5 hours of in-domain Kreol Morisien (Mauritian Creole) audio.

Model Details

Parameter	Value
Base Model	`openai/whisper-medium`
Parameters	~769M
Language	Kreol Morisien (mfe)
Task	Transcription
Best WER	16.49%

Training Data

Domain: In-domain Kreol Morisien speech
Total Duration: ~5 hours
Split: 90% train / 10% validation
Format: 16 kHz mono WAV
Preprocessing: Text normalization (lowercased, whitespace-collapsed, smart quotes replaced)
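
The text normalization listed above could be sketched as follows. This is a minimal illustration rather than the author's actual preprocessing script: the function name `normalize_text`, the exact set of smart-quote characters, and the order of the steps are all assumptions.

```python
import re

# Assumed mapping: curly single/double quotes to their ASCII equivalents.
SMART_QUOTES = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark / apostrophe
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})


def normalize_text(text: str) -> str:
    """Replace smart quotes, lowercase, and collapse whitespace."""
    text = text.translate(SMART_QUOTES)
    text = text.lower()
    # Collapse any run of whitespace (spaces, tabs, newlines) to one space.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    # Made-up example string, not taken from the training data.
    print(normalize_text("  \u201cBonzour\u201d   Tou\tdimounn\u2019  "))
```

Applying the same normalization to both references and predictions before scoring keeps casing and quote style from being counted as word errors.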

Training Configuration

Hyperparameter	Value
Epochs	15
Batch Size (per device)	8
Gradient Accumulation Steps	4
Effective Batch Size	32
Learning Rate	3e-5
LR Scheduler	Cosine
Warmup Steps	100
Weight Decay	0.1
Precision	bf16
Generation Beams (eval)	1 (greedy)
Best Checkpoint	Step 300
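
The arithmetic behind the effective batch size and the learning-rate schedule can be sketched in plain Python. The sketch assumes a single training device and the usual form of a cosine schedule with linear warmup; neither is confirmed by this card. It also takes the final logged step (315) from the results table as the total number of optimizer steps. The function names are illustrative.

```python
import math

# Values from the configuration table.
PER_DEVICE_BATCH = 8
GRAD_ACCUM_STEPS = 4
BASE_LR = 3e-5
WARMUP_STEPS = 100
EPOCHS = 15

# Assumptions not stated in the card.
NUM_DEVICES = 1      # single GPU assumed
TOTAL_STEPS = 315    # final logged step, taken as the schedule length


def effective_batch_size(per_device=PER_DEVICE_BATCH,
                         grad_accum=GRAD_ACCUM_STEPS,
                         num_devices=NUM_DEVICES):
    """Samples contributing to one optimizer update."""
    return per_device * grad_accum * num_devices


def cosine_lr(step, base_lr=BASE_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Linear warmup to base_lr, then cosine decay to zero at `total`."""
    if step < warmup:
        return base_lr * step / warmup
    progress = (step - warmup) / max(1, total - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size())
    print("optimizer steps per epoch:", TOTAL_STEPS // EPOCHS)
    for step in (50, 100, 200, 300, 315):
        print(f"step {step:>3}: lr = {cosine_lr(step):.2e}")
```

Under these assumptions, warmup covers roughly the first third of training, and the learning rate is already close to zero by the time the best checkpoint (step 300) is reached.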

Training Results

Step	Training Loss	Validation Loss	WER (fraction)
50	2.7217	0.5670	0.2318
100	0.2614	0.4051	0.1988
150	0.0864	0.4268	0.1772
200	0.0417	0.4443	0.2104
250	0.0189	0.4487	0.1703
300	0.0046	0.4557	0.1649
315	0.0046	0.4559	0.1649
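
To make the checkpoint choice concrete, the short sketch below picks the best row from the table by WER and, for comparison, by validation loss. The rows are copied from the table; the helper name `best_by` is illustrative, and this is not the selection code used during training.

```python
# (step, training_loss, validation_loss, wer) rows copied from the table.
RESULTS = [
    (50, 2.7217, 0.5670, 0.2318),
    (100, 0.2614, 0.4051, 0.1988),
    (150, 0.0864, 0.4268, 0.1772),
    (200, 0.0417, 0.4443, 0.2104),
    (250, 0.0189, 0.4487, 0.1703),
    (300, 0.0046, 0.4557, 0.1649),
    (315, 0.0046, 0.4559, 0.1649),
]

VAL_LOSS, WER = 2, 3  # column indices


def best_by(rows, column):
    """Return the row with the lowest value in `column` (first row wins ties)."""
    return min(rows, key=lambda row: row[column])


best_wer_row = best_by(RESULTS, WER)
best_val_row = best_by(RESULTS, VAL_LOSS)

print(f"Lowest WER: {best_wer_row[WER]:.2%} at step {best_wer_row[0]}")
# prints "Lowest WER: 16.49% at step 300"
print(f"Lowest validation loss: {best_val_row[VAL_LOSS]:.4f} at step {best_val_row[0]}")
# prints "Lowest validation loss: 0.4051 at step 100"
```

The two criteria disagree: validation loss is lowest at step 100, while WER keeps improving until step 300, which is why the selection metric matters here.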

Usage

from transformers import WhisperProcessor, WhisperForConditionalGeneration
import soundfile as sf

MODEL_ID = "Shagufta/whisper-medium-km-indomain5-withoutac"

processor = WhisperProcessor.from_pretrained(MODEL_ID)
model = WhisperForConditionalGeneration.from_pretrained(MODEL_ID)

# Load the audio as float32; the model was trained on 16 kHz mono WAV files.
speech, sr = sf.read("audio.wav", dtype="float32")

# Pass the file's actual sampling rate. The feature extractor raises an error
# if it is not the 16 kHz Whisper expects, rather than producing wrong features.
input_features = processor.feature_extractor(
    speech, sampling_rate=sr, return_tensors="pt"
).input_features

# Beam search with 5 beams; evaluation during training used greedy decoding (1 beam).
predicted_ids = model.generate(
    input_features,
    max_length=256,
    num_beams=5,
)

transcription = processor.tokenizer.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)

Model Config

model.config.forced_decoder_ids = None
model.config.suppress_tokens = []

Limitations

Trained on only about 5 hours of in-domain data, so it may not generalize well to out-of-domain Kreol Morisien speech.
Kreol Morisien is not an officially supported Whisper language, so no language token is set.
The model shows signs of overfitting: training loss approaches 0 while validation loss rises after step 100 (from 0.4051 to 0.4559 at step 315). The best checkpoint was selected by WER, not by validation loss.

Framework

Transformers (Seq2SeqTrainer)
Evaluate (WER metric)
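
WER is the number of word-level substitutions, deletions, and insertions divided by the number of reference words. The pure-Python sketch below illustrates that definition, aggregating errors over the whole corpus. It is not the `evaluate` implementation; the function names and the example sentences are illustrative assumptions.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance using a rolling one-row table."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_error_rate(references, predictions):
    """Total word errors divided by total reference words."""
    errors = 0
    total_ref_words = 0
    for ref, hyp in zip(references, predictions):
        ref_words = ref.split()
        errors += edit_distance(ref_words, hyp.split())
        total_ref_words += len(ref_words)
    return errors / total_ref_words


if __name__ == "__main__":
    # Made-up example pair: one substituted word out of four.
    refs = ["mo pe al lakaz"]
    hyps = ["mo pe al lekol"]
    print(word_error_rate(refs, hyps))  # 0.25
```

On this scale, the reported best WER of 0.1649 corresponds to roughly one word error for every six reference words.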

Safetensors

Model size: 0.8B params
Tensor type: F32
