# LexiCore Wav2Vec2 XLS-R 300M CTC — শব্দতরী Bangla Dialect ASR
This model is a fine-tuned version of `arijitx/wav2vec2-xls-r-300m-bengali`
for the “শব্দতরী: Where Dialects Flow into Bangla” competition.
- Task: dialectal Bangla speech → standard Bangla text
- Data: 3,350 audio clips from 20 regions of Bangladesh (competition dataset only)
- Metric: Normalized Levenshtein Similarity (char-level)
- Decoding: CTC + 5-gram KenLM (pyctcdecode) + a small punctuation rule
- Training:
  - 20 epochs
  - LR = 1e-4
  - Batch size ≈ 8 (4 × 2 gradient accumulation)
  - Strong waveform augmentations (speed, gain, noise, time-drop)
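The character-level similarity metric can be sketched as below. This assumes the leaderboard normalizes edit distance by the length of the longer string; the official scorer's exact normalization may differ.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance over characters."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def normalized_similarity(ref: str, hyp: str) -> float:
    """1.0 for an exact match, approaching 0.0 as edits accumulate."""
    if not ref and not hyp:
        return 1.0
    return 1.0 - levenshtein(ref, hyp) / max(len(ref), len(hyp))
```

For example, `normalized_similarity("আমার সোনার বাংলা", "আমার সোনার বাংলা")` returns `1.0`, while a single character substitution in a 16-character reference costs `1/16`.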
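The waveform augmentations listed above could look roughly like the following NumPy sketch (speed perturbation, random gain, additive noise, time-drop). The ranges and the implementation here are illustrative assumptions, not the exact training pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(wave: np.ndarray) -> np.ndarray:
    # Speed perturbation: resample by a random factor via linear interpolation.
    factor = rng.uniform(0.9, 1.1)
    idx = np.arange(0, len(wave), factor)
    wave = np.interp(idx, np.arange(len(wave)), wave)

    # Random gain in dB.
    gain_db = rng.uniform(-6.0, 6.0)
    wave = wave * (10.0 ** (gain_db / 20.0))

    # Additive Gaussian noise at a low amplitude.
    wave = wave + rng.normal(0.0, 0.005, size=wave.shape)

    # Time-drop: zero out one short random chunk.
    drop = max(1, int(0.05 * len(wave)))
    start = int(rng.integers(0, max(1, len(wave) - drop)))
    wave[start:start + drop] = 0.0
    return wave.astype(np.float32)
```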
## Intended Use
- Research and experimentation on Bangla ASR for low-resource and dialectal settings
- Non-commercial applications, respecting the original competition and dataset license
## Limitations
- Trained only on short, scripted sentences from 20 Bangladeshi regions
- May not generalize to very long utterances, noisy real-world audio, or code-switching
- Output is in standard written Bangla, not dialect spelling
## Usage
```python
import torch
import torchaudio
from transformers import Wav2Vec2Processor, AutoModelForCTC

device = "cuda" if torch.cuda.is_available() else "cpu"

processor = Wav2Vec2Processor.from_pretrained("your-username/your-repo")
model = AutoModelForCTC.from_pretrained("your-username/your-repo").to(device).eval()

# Load audio and resample to the 16 kHz rate the model expects
waveform, sr = torchaudio.load("example.wav")
if sr != 16000:
    waveform = torchaudio.functional.resample(waveform, sr, 16000)

inputs = processor(waveform.squeeze().numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values.to(device)).logits

# Greedy CTC decoding
pred_ids = torch.argmax(logits, dim=-1)
transcript = processor.batch_decode(pred_ids)[0]
print(transcript)
```
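The greedy decoding above relies on the standard CTC collapse rule (merge repeated ids, then drop blanks), which `processor.batch_decode` applies internally; the released decoder additionally rescores with the 5-gram KenLM via pyctcdecode. A minimal sketch of the collapse rule, assuming blank id 0:

```python
def ctc_collapse(ids, blank_id=0):
    """Merge consecutive repeats, then remove the CTC blank token."""
    out, prev = [], None
    for i in ids:
        if i != prev and i != blank_id:
            out.append(i)
        prev = i
    return out

print(ctc_collapse([0, 3, 3, 0, 3, 5, 5]))  # → [3, 3, 5]
```

Note that a blank between two identical ids keeps both, which is how CTC represents doubled characters.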