Whisper Medium - Kreol Morisien (In-Domain Fine-Tuned, With AC, 10h)
This model is a fine-tuned version of openai/whisper-medium on 10 hours of in-domain Kreol Morisien (Mauritian Creole) audio data, with accent conditioning.
Model Details
| Parameter | Value |
|---|---|
| Base Model | openai/whisper-medium |
| Parameters | ~769M |
| Language | Kreol Morisien (mfe) |
| Task | Transcription |
| Best WER | 14.32% |
Training Data
- Domain: In-domain Kreol Morisien speech
- Total Duration: ~10 hours
- Split: 90% train / 10% validation
- Format: 16 kHz mono WAV
- Preprocessing: Text normalization (lowercased, whitespace-collapsed, smart quotes replaced)
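The preprocessing step above can be sketched as a small function. This is a minimal illustration, not the author's actual code: the function name `normalize_text`, the exact set of smart quotes handled, and the stripping of leading and trailing whitespace are all assumptions.

```python
import re

# Assumed mapping of typographic ("smart") quotes to ASCII equivalents.
SMART_QUOTES = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark / apostrophe
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
}


def normalize_text(text: str) -> str:
    """Replace smart quotes, lowercase, and collapse runs of whitespace."""
    for smart, plain in SMART_QUOTES.items():
        text = text.replace(smart, plain)
    text = text.lower()
    # Collapse any run of whitespace to one space; trimming is an assumption.
    return re.sub(r"\s+", " ", text).strip()


print(normalize_text("  Mo  pe\tal \u201cLakaz\u201d \n"))
```

Applying the same normalization to both references and model output before scoring keeps casing and quote style from counting as word errors.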
Training Configuration
| Hyperparameter | Value |
|---|---|
| Epochs | 10 |
| Batch Size (per device) | 8 |
| Gradient Accumulation Steps | 4 |
| Effective Batch Size | 32 |
| Learning Rate | 3e-5 |
| LR Scheduler | Cosine |
| Warmup Steps | 200 |
| Weight Decay | 0.1 |
| Precision | bf16 |
| Generation Beams (eval) | 1 (greedy) |
| Best Checkpoint | Step 300 |
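To make the schedule in the table concrete, here is a rough sketch of a cosine schedule with linear warmup, using the listed learning rate and warmup steps. It assumes 440 total optimizer steps (the last step logged in the training results), linear warmup from zero, and decay to zero; the trainer's real schedule may differ in those details. The effective batch size line assumes a single device.

```python
import math

PER_DEVICE_BATCH = 8   # "Batch Size (per device)" in the table
GRAD_ACCUM_STEPS = 4   # "Gradient Accumulation Steps" in the table
# Assumes one device; with more devices the product would be larger.
EFFECTIVE_BATCH_SIZE = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS

PEAK_LR = 3e-5         # "Learning Rate" in the table
WARMUP_STEPS = 200     # "Warmup Steps" in the table
TOTAL_STEPS = 440      # assumption: last step logged in the results table


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then half-cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print("effective batch size:", EFFECTIVE_BATCH_SIZE)
for step in (100, 200, 300, 400, 440):
    print(f"step {step:>3}: lr = {lr_at(step):.2e}")
```

Under these assumptions the learning rate peaks at step 200 and is already decaying at step 300, where the best checkpoint was saved.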
Training Results
| Step | Training Loss | Validation Loss | WER |
|---|---|---|---|
| 100 | 1.7493 | 0.3540 | 0.1931 |
| 200 | 0.2670 | 0.3534 | 0.1647 |
| 300 | 0.0985 | 0.3570 | 0.1432 |
| 400 | 0.0161 | 0.3488 | 0.1489 |
| 440 | 0.0161 | 0.3511 | 0.1488 |
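The WER column is a fraction, so 0.1432 corresponds to the 14.32% reported above. As a reminder of what the metric measures, the sketch below computes word error rate for a single reference/hypothesis pair as word-level edit distance divided by reference length. It is a standalone illustration, not the Evaluate library's implementation, and the function name and placeholder strings are made up for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# Placeholder tokens: one substitution and one deletion over four words.
print(word_error_rate("a b c d", "a x c"))  # 0.5
```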
Usage
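from transformers import WhisperProcessor, WhisperForConditionalGeneration
import soundfile as sf

model_id = "Shagufta/whisper-medium-km-indomain10-withac"
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)

# The model expects 16 kHz mono audio (see Training Data).
speech, sr = sf.read("audio.wav", dtype="float32")
if speech.ndim > 1:
    speech = speech.mean(axis=1)  # down-mix multi-channel audio to mono

# Pass the file's actual sampling rate so a non-16 kHz file raises an error
# instead of being silently treated as 16 kHz.
input_features = processor.feature_extractor(
    speech, sampling_rate=sr, return_tensors="pt"
).input_features

# Beam search with 5 beams; the reported WER used greedy decoding (1 beam).
predicted_ids = model.generate(
    input_features,
    max_length=256,
    num_beams=5,
)
transcription = processor.tokenizer.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)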
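Since the training data is 16 kHz mono WAV, it can help to check an input file before transcribing it. The sketch below uses only the standard-library `wave` module, which reads PCM WAV files; the helper name `is_16khz_mono` is hypothetical, and the temporary silent file exists only to make the example self-contained.

```python
import os
import tempfile
import wave


def is_16khz_mono(path: str) -> bool:
    """Return True if the PCM WAV file at `path` is single-channel, 16 kHz."""
    with wave.open(path, "rb") as wav:
        return wav.getnchannels() == 1 and wav.getframerate() == 16000


# Self-check on a generated file: one second of 16-bit silence, 16 kHz, mono.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "silence.wav")
    with wave.open(demo_path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(16000)
        wav.writeframes(b"\x00\x00" * 16000)
    demo_ok = is_16khz_mono(demo_path)

print(demo_ok)
```

A file that fails this check would need resampling or down-mixing before being passed to the feature extractor.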
Model Config
model.config.forced_decoder_ids = None
model.config.suppress_tokens = []
Limitations
- Trained on 10 hours of in-domain data, so it may not generalize well to out-of-domain Kreol Morisien speech.
- Kreol Morisien is not an officially supported Whisper language, so no language token is set.
- Training shows signs of overfitting: training loss falls to 0.0161 by step 400 while validation loss stays near 0.35. The best checkpoint (step 300) was selected by WER, not by validation loss.
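The checkpoint choice in the last point can be reproduced from the logged numbers. The sketch below copies the rows of the Training Results table and picks the best step under two criteria; the variable names are illustrative only.

```python
# (step, training_loss, validation_loss, wer), copied from Training Results.
LOGGED_EVALS = [
    (100, 1.7493, 0.3540, 0.1931),
    (200, 0.2670, 0.3534, 0.1647),
    (300, 0.0985, 0.3570, 0.1432),
    (400, 0.0161, 0.3488, 0.1489),
    (440, 0.0161, 0.3511, 0.1488),
]

best_by_wer = min(LOGGED_EVALS, key=lambda row: row[3])
best_by_val_loss = min(LOGGED_EVALS, key=lambda row: row[2])

print("best step by WER:", best_by_wer[0])                  # 300
print("best step by validation loss:", best_by_val_loss[0])  # 400
```

The two criteria disagree: validation loss is lowest at step 400, while WER is lowest at step 300, which is the checkpoint this card reports.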
Framework
- Transformers (Seq2SeqTrainer)
- Evaluate (WER metric)