Whisper Medium - Kreol Morisien (In-Domain Fine-Tuned)
This model is a fine-tuned version of openai/whisper-medium on 5 hours of in-domain Kreol Morisien (Mauritian Creole) audio data.
Model Details
| Property | Value |
|---|---|
| Base Model | openai/whisper-medium |
| Parameters | ~769M |
| Language | Kreol Morisien (mfe) |
| Task | Transcription |
| Best WER | 16.49% |
Training Data
- Domain: In-domain Kreol Morisien speech
- Total Duration: ~5 hours
- Split: 90% train / 10% validation
- Format: 16kHz mono WAV
- Preprocessing: Text normalization (lowercased, whitespace-collapsed, smart quotes replaced)
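The preprocessing step listed above can be sketched as follows. This is a minimal illustration, not the script used for training: the function name `normalize_text`, the exact set of smart quotes replaced, and the order of operations are all assumptions, and the sample phrase is only illustrative.

```python
import re

# Assumed mapping of typographic ("smart") quotes to ASCII equivalents.
# The model card does not list which characters were actually replaced.
SMART_QUOTES = {
    "\u2018": "'",  # left single quote
    "\u2019": "'",  # right single quote
    "\u201c": '"',  # left double quote
    "\u201d": '"',  # right double quote
}


def normalize_text(text: str) -> str:
    """Replace smart quotes, lowercase, and collapse runs of whitespace."""
    for smart, plain in SMART_QUOTES.items():
        text = text.replace(smart, plain)
    text = text.lower()
    # Collapse any whitespace run (spaces, tabs, newlines) to a single space.
    return re.sub(r"\s+", " ", text).strip()


print(normalize_text("  Mo  pe\tal  \u2018lakaz\u2019 \n"))  # mo pe al 'lakaz'
```

Applying the same normalization to both references and predictions before scoring keeps casing and spacing differences from inflating WER.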
Training Configuration
| Hyperparameter | Value |
|---|---|
| Epochs | 15 |
| Batch Size (per device) | 8 |
| Gradient Accumulation Steps | 4 |
| Effective Batch Size | 32 |
| Learning Rate | 3e-5 |
| LR Scheduler | Cosine |
| Warmup Steps | 100 |
| Weight Decay | 0.1 |
| Precision | bf16 |
| Generation Beams (eval) | 1 (greedy) |
| Best Checkpoint | Step 300 |
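As a rough sketch, the table above maps onto `Seq2SeqTrainingArguments` keyword arguments along these lines. The device count and `predict_with_generate=True` are assumptions (the card states neither), and the dictionary is kept as plain Python so it can be inspected without loading the library.

```python
# Keyword arguments mirroring the hyperparameter table; the names follow
# transformers.Seq2SeqTrainingArguments.
training_kwargs = dict(
    num_train_epochs=15,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    learning_rate=3e-5,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    weight_decay=0.1,
    bf16=True,
    predict_with_generate=True,  # assumption: generate text at eval time so WER can be computed
    generation_num_beams=1,      # greedy decoding during evaluation
)

num_devices = 1  # assumption: single-GPU training

# Effective batch size = per-device batch size x accumulation steps x devices.
effective_batch_size = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
    * num_devices
)
print(effective_batch_size)  # 32
```

These could then be passed as `Seq2SeqTrainingArguments(output_dir="...", **training_kwargs)`, where the output directory is left to the reader.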
Training Results
| Step | Training Loss | Validation Loss | WER |
|---|---|---|---|
| 50 | 2.7217 | 0.5670 | 0.2318 |
| 100 | 0.2614 | 0.4051 | 0.1988 |
| 150 | 0.0864 | 0.4268 | 0.1772 |
| 200 | 0.0417 | 0.4443 | 0.2104 |
| 250 | 0.0189 | 0.4487 | 0.1703 |
| 300 | 0.0046 | 0.4557 | 0.1649 |
| 315 | 0.0046 | 0.4559 | 0.1649 |
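The table shows why the selection criterion matters: validation loss is lowest at step 100, while WER is lowest at step 300. A small sketch over the logged values makes the difference explicit (the tuple layout and variable names are illustrative):

```python
# (step, training_loss, validation_loss, wer) copied from the table above.
results = [
    (50, 2.7217, 0.5670, 0.2318),
    (100, 0.2614, 0.4051, 0.1988),
    (150, 0.0864, 0.4268, 0.1772),
    (200, 0.0417, 0.4443, 0.2104),
    (250, 0.0189, 0.4487, 0.1703),
    (300, 0.0046, 0.4557, 0.1649),
    (315, 0.0046, 0.4559, 0.1649),
]

# min() returns the first minimal entry, so the tie between steps 300 and 315
# resolves to the earlier checkpoint.
best_by_wer = min(results, key=lambda row: row[3])
best_by_val_loss = min(results, key=lambda row: row[2])

print(best_by_wer[0])       # 300
print(best_by_val_loss[0])  # 100
```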
Usage
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import soundfile as sf

processor = WhisperProcessor.from_pretrained("Shagufta/whisper-medium-km-indomain5-withoutac")
model = WhisperForConditionalGeneration.from_pretrained("Shagufta/whisper-medium-km-indomain5-withoutac")

speech, sr = sf.read("audio.wav", dtype="float32")  # expects 16 kHz mono audio, as in training
input_features = processor.feature_extractor(
    speech, sampling_rate=sr, return_tensors="pt"  # pass the file's real rate so a mismatch is not hidden
).input_features

predicted_ids = model.generate(
    input_features,
    max_length=256,
    num_beams=5,
)
transcription = processor.tokenizer.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
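The training audio was 16 kHz mono WAV, so input at inference time should probably match. The helper below is a hypothetical sketch (the name `ensure_mono_16k` is not part of any library) that downmixes multi-channel audio by averaging and refuses anything that is not 16 kHz. It assumes the `(frames, channels)` layout that `soundfile.read` returns for multi-channel files.

```python
import numpy as np

TARGET_SR = 16_000  # sampling rate of the training audio


def ensure_mono_16k(speech: np.ndarray, sr: int) -> np.ndarray:
    """Return a 1-D float32 signal, or raise if the sampling rate is wrong."""
    if speech.ndim == 2:
        # Multi-channel input shaped (frames, channels): average the channels.
        speech = speech.mean(axis=1)
    if sr != TARGET_SR:
        # Resampling is deliberately left out of this sketch.
        raise ValueError(f"expected {TARGET_SR} Hz audio, got {sr} Hz")
    return speech.astype(np.float32)


# Synthetic stereo signal: left channel all ones, right channel all zeros.
stereo = np.stack([np.ones(4), np.zeros(4)], axis=1)
mono = ensure_mono_16k(stereo, 16_000)
print(mono.shape)  # (4,)
```

Audio recorded at another rate would need to be resampled to 16 kHz before being passed to the feature extractor.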
Model Config
model.config.forced_decoder_ids = None
model.config.suppress_tokens = []
Limitations
- Trained on only 5 hours of in-domain data, so it may not generalize well to out-of-domain Kreol Morisien speech.
- Kreol Morisien is not an officially supported Whisper language, so no language token is set.
- Signs of overfitting: training loss approaches 0 while validation loss rises after step 100. The best checkpoint was therefore selected by WER rather than by validation loss.
Framework
- Transformers (Seq2SeqTrainer)
- Evaluate (WER metric)
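For readers unfamiliar with the metric, WER is the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function below is an illustrative pure-Python sketch of that definition for a single reference/hypothesis pair. It is not the implementation used by the Evaluate library, which scores a whole evaluation set at once, and the sample phrases are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


# One of four reference words is missing from the hypothesis.
print(word_error_rate("mo pe al lakaz", "mo pe lakaz"))  # 0.25
```

On this scale, the reported best WER of 0.1649 corresponds to roughly one word error for every six reference words.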