Whisper Small: Bengali-English Code-Switching ASR
This model is a fine-tuned version of openai/whisper-small for automatic speech recognition (ASR) on Bengali-English code-switched audio.
It is trained to transcribe audio clips where the speaker switches between Bengali and English in natural conversation.
Model Details
- Base model: openai/whisper-small
- Languages: Bengali (bn), English (en)
- Fine-tuning task: Speech-to-text transcription
- Use case: Lecture notes, interviews, social media, bilingual speech transcription
- Training samples: 194 manually prepared code-switching audio chunks (~30s each)
Evaluation
| Metric | Score |
|---|---|
| WER | 0.4123 |
| CER | (your CER here) |
Evaluation was done on a 10% held-out validation set from the original dataset.
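For readers who want to reproduce the metrics on their own transcripts, the following is a minimal sketch of how WER and CER are commonly computed from an edit distance. It is not the evaluation script used for this model. The function names and the example sentences are hypothetical, and CER here counts Unicode code points (including spaces), which is an assumption that can differ from grapheme-based counts for Bengali script.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Hypothetical romanized code-switched example (not taken from the dataset)
reference = "ami today office jabo"
hypothesis = "ami today office jabe"
print(f"WER: {wer(reference, hypothesis):.4f}")
print(f"CER: {cer(reference, hypothesis):.4f}")
```

Over a whole validation set, the usual convention is to sum the edits and the reference lengths across all clips before dividing, rather than averaging per-clip scores.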
Files
- config.json, pytorch_model.bin: Model configuration and fine-tuned weights
- tokenizer.json, vocab.json, merges.txt: Whisper tokenizer
- preprocessor_config.json: Feature extractor config
Usage
You can use the model directly with transformers:
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torchaudio

model_id = "YOUR_USERNAME/whisper-small-benglish"
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)

# Load audio and resample to 16 kHz if needed
waveform, sr = torchaudio.load("your-audio.wav")
if sr != 16000:
    resampler = torchaudio.transforms.Resample(sr, 16000)
    waveform = resampler(waveform)

# Mix down to mono: the processor expects a 1-D waveform
waveform = waveform.mean(dim=0)

inputs = processor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
predicted_ids = model.generate(inputs.input_features)
text = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
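Whisper operates on 30-second input windows, so a recording longer than that needs to be split before it is passed to the processor. The sketch below shows one simple way to compute chunk boundaries in samples. The helper name `chunk_bounds` and its parameters are hypothetical, and fixed-length splitting is an assumption made for illustration: it can cut through a word at a boundary, so overlapping or silence-based splitting may work better in practice.

```python
def chunk_bounds(num_samples, sample_rate=16000, chunk_seconds=30):
    """Return (start, end) sample indices for consecutive fixed-length chunks.

    The last chunk may be shorter than chunk_seconds.
    """
    if sample_rate <= 0 or chunk_seconds <= 0:
        raise ValueError("sample_rate and chunk_seconds must be positive")
    chunk_len = sample_rate * chunk_seconds
    return [
        (start, min(start + chunk_len, num_samples))
        for start in range(0, num_samples, chunk_len)
    ]


# A 65-second clip at 16 kHz splits into two full chunks and one 5-second tail
for start, end in chunk_bounds(65 * 16000):
    print(start, end)
```

Each `(start, end)` pair can then be used to slice the mono waveform (for example `waveform[start:end]`), with every slice transcribed separately and the resulting texts joined in order.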