
ECAPA-TDNN Fine-tuned for Bengali Speaker Diarization

BUET DL Sprint 4.0 - Problem 2

Model Description

An ECAPA-TDNN speaker-embedding model fine-tuned on Bengali talk-show audio for speaker diarization. Base model: speechbrain/spkrec-ecapa-voxceleb.

Usage

# In the nuclear diarization notebook, this model is loaded automatically:
import torch
from huggingface_hub import hf_hub_download
from speechbrain.inference.speaker import EncoderClassifier

# Load the base ECAPA-TDNN model, then overlay the fine-tuned weights
ecapa = EncoderClassifier.from_hparams(source="speechbrain/spkrec-ecapa-voxceleb")
ckpt = hf_hub_download("nymtheescobar/ecapa-bengali-finetuned", "ecapa_bengali_finetuned.pt")
state_dict = torch.load(ckpt, map_location="cpu")
ecapa.mods.embedding_model.load_state_dict(state_dict, strict=False)
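
For diarization, embeddings from the loaded model are typically compared with cosine similarity before clustering. A minimal sketch of that scoring step — the vectors below are random placeholders standing in for real model output, not actual embeddings:

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder 192-dim vectors (ECAPA-TDNN's default embedding size)
rng = np.random.default_rng(0)
emb1, emb2 = rng.standard_normal(192), rng.standard_normal(192)
score = cosine_similarity(emb1, emb2)
```

Segments whose embeddings score above a tuned threshold would be merged into the same speaker cluster.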

Training Data

  • Competition data: 10 files, ~10h, ~170 speakers (ground truth)
  • Bengali Talkshow Dataset: 207 speakers from 30 files auto-labeled with pyannote 3.1
  • Total: 318 speakers, 9457 segments
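
Diarization labels such as the pyannote 3.1 auto-labels are commonly exchanged in RTTM format. A minimal parser sketch, assuming the standard RTTM field layout (the example line is illustrative, not from the dataset):

```python
def parse_rttm(lines):
    """Parse RTTM 'SPEAKER' lines into (speaker, start, duration) tuples."""
    segments = []
    for line in lines:
        fields = line.split()
        if fields and fields[0] == "SPEAKER":
            # RTTM: SPEAKER <file> <chan> <start> <dur> <ortho> <stype> <speaker> ...
            segments.append((fields[7], float(fields[3]), float(fields[4])))
    return segments

example = ["SPEAKER talkshow_01 1 12.50 3.20 <NA> <NA> spk_03 <NA> <NA>"]
print(parse_rttm(example))  # → [('spk_03', 12.5, 3.2)]
```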

Training

  • Loss: Triplet (margin=0.3)
  • Epochs: 5, LR: 0.0001
  • Segment length: 3.0s
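
The triplet objective above pulls same-speaker embeddings together and pushes different-speaker embeddings apart by at least the margin. A minimal NumPy sketch of the per-triplet loss (the toy 2-D vectors are illustrative only; training uses real ECAPA embeddings):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Triplet loss: positive must be closer to the anchor than the
    negative by at least `margin`, otherwise a penalty is incurred."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(d_pos - d_neg + margin, 0.0)

a = np.array([1.0, 0.0])
p = np.array([0.9, 0.1])   # same speaker: close to anchor
n = np.array([-1.0, 0.0])  # different speaker: far from anchor
print(triplet_loss(a, p, n))  # → 0.0 (margin already satisfied)
```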

Results

  • Pretrained DER: 0.6140
  • Fine-tuned DER: 0.5668
  • Improvement: 0.0472 absolute (lower DER is better)
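
DER sums missed speech, false-alarm speech, and speaker confusion over the total reference speech. A frame-level approximation for single-speaker labels — a simplified sketch, not the official metric (which handles overlap and a collar):

```python
import numpy as np

def frame_der(ref, hyp):
    """Frame-level DER approximation; label 0 means non-speech."""
    ref, hyp = np.asarray(ref), np.asarray(hyp)
    speech = ref != 0
    missed = np.sum(speech & (hyp == 0))
    false_alarm = np.sum(~speech & (hyp != 0))
    confusion = np.sum(speech & (hyp != 0) & (ref != hyp))
    return (missed + false_alarm + confusion) / np.sum(speech)

ref = [1, 1, 1, 2, 2, 0, 0, 2]
hyp = [1, 1, 2, 2, 0, 0, 1, 2]
print(frame_der(ref, hyp))  # → 0.5 (1 miss + 1 false alarm + 1 confusion over 6 speech frames)
```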

Dataset

https://huggingface.co/datasets/nymtheescobar/bengali-talkshow-audio
