Whisper Medium: Kreol Morisien (In-Domain Fine-Tuned, With AC, 20h)
This model is a fine-tuned version of openai/whisper-medium on 20 hours of in-domain Kreol Morisien (Mauritian Creole) audio data, with accent conditioning.
Model Details
| Parameter | Value |
|---|---|
| Base Model | openai/whisper-medium |
| Parameters | ~769M |
| Language | Kreol Morisien (mfe) |
| Task | Transcription |
| Best WER | 11.92% |
Training Data
- Domain: In-domain Kreol Morisien speech
- Total Duration: ~20 hours
- Split: 90% train / 10% validation
- Format: 16kHz mono WAV
- Preprocessing: Text normalization (lowercased, whitespace-collapsed, smart quotes replaced)
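The preprocessing step above can be sketched as follows. This is a minimal sketch, assuming "smart quotes" means the four Unicode curly quotation marks and that the same normalization should be applied to both reference transcripts and model predictions before scoring. The author's exact rules are not published, and `normalize_text` is a hypothetical helper name.

```python
import re

# Assumption: "smart quotes" = these four Unicode curly quotes; the exact
# character set used when preparing the training data is not published.
SMART_QUOTES = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
}
_QUOTE_TABLE = str.maketrans(SMART_QUOTES)


def normalize_text(text: str) -> str:
    """Replace curly quotes with straight ones, lowercase, collapse whitespace."""
    text = text.translate(_QUOTE_TABLE).lower()
    # Collapse any run of spaces, tabs or newlines into one space, trim the ends
    return re.sub(r"\s+", " ", text).strip()


print(normalize_text("  Mo  Kontan\u2019 \n Sa "))  # mo kontan' sa
```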
Training Configuration
| Hyperparameter | Value |
|---|---|
| Epochs | 7 |
| Batch Size (per device) | 8 |
| Gradient Accumulation Steps | 4 |
| Effective Batch Size | 32 |
| Learning Rate | 3e-5 |
| LR Scheduler | Cosine |
| Warmup Steps | 200 |
| Weight Decay | 0.1 |
| Precision | bf16 |
| Generation Beams (eval) | 1 (greedy) |
| Best Checkpoint | Step 700 |
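For readers reproducing the run, the table maps onto keyword arguments of `transformers.Seq2SeqTrainingArguments` roughly as below. This is a sketch, not the author's training script: the `training_kwargs` dictionary and the `effective_batch_size` helper are hypothetical, and the effective batch size of 32 assumes training on a single device (8 per device × 4 accumulation steps).

```python
# Hyperparameters from the Training Configuration table, written as keyword
# arguments for transformers.Seq2SeqTrainingArguments (sketch, not the original script).
training_kwargs = {
    "num_train_epochs": 7,
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 4,
    "learning_rate": 3e-5,
    "lr_scheduler_type": "cosine",
    "warmup_steps": 200,
    "weight_decay": 0.1,
    "bf16": True,
    "predict_with_generate": True,   # decode during evaluation so WER can be computed
    "generation_num_beams": 1,       # greedy decoding during evaluation
    "metric_for_best_model": "wer",
    "greater_is_better": False,      # lower WER is better
}


def effective_batch_size(kwargs: dict, num_devices: int = 1) -> int:
    """Examples per optimizer step = per-device batch x accumulation steps x devices."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )


print(effective_batch_size(training_kwargs))  # 32
```

These can be passed as `Seq2SeqTrainingArguments(output_dir="./output", **training_kwargs)`, where `./output` is a placeholder. Evaluating every 100 steps, as in the results table, additionally needs `eval_steps=100` and an evaluation-strategy argument, which is named `eval_strategy` in recent Transformers releases and `evaluation_strategy` in older ones.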
Training Results
| Step | Training Loss | Validation Loss | WER (fraction) |
|---|---|---|---|
| 100 | 2.1428 | 0.3598 | 0.2017 |
| 200 | 0.8615 | 0.2757 | 0.1656 |
| 300 | 0.3892 | 0.2720 | 0.1397 |
| 400 | 0.1775 | 0.2743 | 0.1377 |
| 500 | 0.0715 | 0.2739 | 0.1402 |
| 600 | 0.0256 | 0.2821 | 0.1239 |
| 700 | 0.0082 | 0.2824 | 0.1192 |
| 742 | 0.0082 | 0.2828 | 0.1192 |
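The last two rows are nearly identical, which is consistent with the cosine schedule driving the learning rate toward zero at the end of training. Below is a minimal sketch of linear warmup followed by cosine decay, the same shape as `transformers.get_cosine_schedule_with_warmup` with its default `num_cycles=0.5`. It assumes a peak learning rate of 3e-5, 200 warmup steps, and a total run length of 742 optimizer steps (taken from the last logged step); `cosine_lr` is a hypothetical helper.

```python
import math

PEAK_LR = 3e-5       # Learning Rate from the configuration table
WARMUP_STEPS = 200   # Warmup Steps from the configuration table
TOTAL_STEPS = 742    # assumption: last logged step = total optimizer steps


def cosine_lr(step: int, peak: float = PEAK_LR,
              warmup: int = WARMUP_STEPS, total: int = TOTAL_STEPS) -> float:
    """Learning rate after linear warmup, then half-cosine decay to zero."""
    if step < warmup:
        # Linear ramp from 0 up to the peak
        return peak * step / max(1, warmup)
    # Fraction of the post-warmup phase completed, in [0, 1]
    progress = (step - warmup) / max(1, total - warmup)
    return peak * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Learning rate at each logged step of the results table
for step in (100, 200, 300, 400, 500, 600, 700, 742):
    print(f"step {step:>3}: lr = {cosine_lr(step):.2e}")
```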
Usage
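```python
import soundfile as sf
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration

model_id = "Shagufta/whisper-medium-km-indomain20-withac"
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)
model.eval()

# The model was trained on 16 kHz mono audio
speech, sr = sf.read("audio.wav", dtype="float32")
if speech.ndim > 1:
    speech = speech.mean(axis=1)  # downmix multi-channel audio to mono
if sr != 16000:
    raise ValueError(f"Expected 16 kHz audio, got {sr} Hz; resample first")

input_features = processor.feature_extractor(
    speech, sampling_rate=16000, return_tensors="pt"
).input_features

with torch.no_grad():
    predicted_ids = model.generate(
        input_features,
        max_length=256,
        num_beams=5,  # beam search; the reported WER used greedy decoding (1 beam)
    )

transcription = processor.tokenizer.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
```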
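The feature extractor assumes 16 kHz input, matching the training data format. If a recording uses another sample rate or has several channels, a sketch like the following can convert it first. It assumes SciPy is installed and uses polyphase resampling via `scipy.signal.resample_poly`; `to_16k_mono` is a hypothetical helper, and another good resampler (for example librosa or torchaudio) would serve equally well.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

TARGET_SR = 16_000  # the training data was 16 kHz mono


def to_16k_mono(speech: np.ndarray, sr: int) -> np.ndarray:
    """Downmix to mono and resample to 16 kHz with a polyphase filter."""
    if speech.ndim > 1:
        speech = speech.mean(axis=1)  # soundfile returns (frames, channels)
    if sr != TARGET_SR:
        g = gcd(sr, TARGET_SR)
        # Upsample by TARGET_SR/g, then downsample by sr/g (anti-aliased)
        speech = resample_poly(speech, TARGET_SR // g, sr // g)
    return speech.astype(np.float32)
```

With this helper, `speech = to_16k_mono(*sf.read("audio.wav", dtype="float32"))` can replace the plain `sf.read` call in the usage example.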
Model Config
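```python
# No forced decoder tokens: Kreol Morisien has no Whisper language token
model.config.forced_decoder_ids = None
# Do not suppress any tokens during generation
model.config.suppress_tokens = []
```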
Limitations
- Trained on only 20 hours of in-domain data, so it may not generalize well to out-of-domain Kreol Morisien speech.
- Kreol Morisien is not an officially supported Whisper language, so no language token is forced during decoding.
- Signs of overfitting were observed: training loss falls to about 0.008 while validation loss plateaus between 0.272 and 0.283 from step 300 onward. The best checkpoint was selected by validation WER.
Framework
- Transformers (Seq2SeqTrainer)
- Evaluate (WER metric)
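WER (word error rate) counts word substitutions, deletions and insertions against the reference transcript and divides by the number of reference words, summed over the evaluation set; the results table reports it as a fraction, so 0.1192 corresponds to 11.92%. The dependency-free sketch below illustrates the computation. It is not the Evaluate library's implementation, and `word_error_rate` is a hypothetical name. Scores are only comparable when references and predictions are normalized the same way.

```python
def word_error_rate(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level WER: total word edits divided by total reference words."""
    errors = 0
    total = 0
    for ref, hyp in zip(references, hypotheses):
        r, h = ref.split(), hyp.split()
        # prev[j] = edit distance between the first i reference words and h[:j]
        prev = list(range(len(h) + 1))
        for i in range(1, len(r) + 1):
            cur = [i] + [0] * len(h)
            for j in range(1, len(h) + 1):
                cost = 0 if r[i - 1] == h[j - 1] else 1
                cur[j] = min(
                    prev[j] + 1,         # deletion
                    cur[j - 1] + 1,      # insertion
                    prev[j - 1] + cost,  # substitution or match
                )
            prev = cur
        errors += prev[len(h)]
        total += len(r)
    return errors / total


# One substitution (b -> x) and one deletion (d) over 4 reference words
print(word_error_rate(["a b c d"], ["a x c"]))  # 0.5
```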
Evaluation results
- WER on Kreol Morisien In-Domain (20h): 11.920 (self-reported)