# ECAPA-TDNN Fine-tuned for Bengali Speaker Diarization

**BUET DL Sprint 4.0 - Problem 2**

## Model Description
An ECAPA-TDNN speaker embedding model fine-tuned on Bengali talk-show audio for speaker diarization. Base model: `speechbrain/spkrec-ecapa-voxceleb`.
## Usage

```python
# In the diarization notebook, this model is loaded automatically:
import torch
from huggingface_hub import hf_hub_download

ckpt = hf_hub_download("nymtheescobar/ecapa-bengali-finetuned", "ecapa_bengali_finetuned.pt")
state_dict = torch.load(ckpt, map_location="cpu")

# `ecapa` is the SpeechBrain encoder loaded from the base model;
# strict=False tolerates keys absent from the fine-tuned checkpoint.
ecapa.mods.embedding_model.load_state_dict(state_dict, strict=False)
```
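After loading, the model's embeddings are typically compared by cosine similarity to cluster speech segments by speaker. A minimal sketch of that scoring step (pure Python with illustrative toy vectors; real ECAPA embeddings are 192-dimensional):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dim "embeddings" for illustration only.
emb_spk1_a = [0.9, 0.1, 0.0, 0.2]
emb_spk1_b = [0.8, 0.2, 0.1, 0.1]
emb_spk2   = [0.1, 0.9, 0.3, 0.0]

same = cosine_similarity(emb_spk1_a, emb_spk1_b)  # high: same speaker
diff = cosine_similarity(emb_spk1_a, emb_spk2)    # low: different speakers
```

Segments whose embeddings score above a tuned threshold are merged into the same speaker cluster.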
## Training Data
- Competition data: 10 files, ~10h, ~170 speakers (ground truth)
- Bengali Talkshow Dataset: 207 speakers from 30 files auto-labeled with pyannote 3.1
- Total: 318 speakers, 9457 segments
## Training
- Loss: Triplet (margin=0.3)
- Epochs: 5, LR: 0.0001
- Segment length: 3.0s
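The triplet loss above pulls embeddings of the same speaker together while pushing different speakers at least `margin` apart. A minimal pure-Python sketch of the standard formulation, L = max(0, d(a, p) − d(a, n) + margin), with the margin of 0.3 used here (in practice this would be `torch.nn.TripletMarginLoss` over batched tensors):

```python
import math

def euclidean(a, b):
    # Euclidean distance between two embedding vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.3):
    # Loss is zero once the negative is at least `margin`
    # farther from the anchor than the positive is.
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Toy 2-dim examples:
easy = triplet_loss([0.0, 0.0], [0.0, 0.0], [1.0, 0.0])  # negative far away -> 0.0
hard = triplet_loss([0.0, 0.0], [0.5, 0.0], [0.6, 0.0])  # negative too close -> positive loss
```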
## Results
- Pretrained DER: 0.6140
- Fine-tuned DER: 0.5668
- Improvement: 0.0472
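DER (diarization error rate) is the fraction of audio time that is wrongly attributed: the sum of false alarm, missed detection, and speaker confusion over total speech time. The improvement reported above is absolute; a quick sketch checking the numbers and the implied relative reduction (the component split is not reported, so the `der` example values below are hypothetical):

```python
def der(false_alarm, missed, confusion, total):
    # DER = (false alarm + missed detection + speaker confusion) / total speech time
    return (false_alarm + missed + confusion) / total

pretrained, finetuned = 0.6140, 0.5668
absolute = pretrained - finetuned       # 0.0472 absolute
relative = absolute / pretrained        # ~7.7% relative reduction

example = der(false_alarm=1.0, missed=2.0, confusion=3.0, total=10.0)  # hypothetical split
```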
## Dataset
https://huggingface.co/datasets/nymtheescobar/bengali-talkshow-audio