
ECAPA-TDNN Fine-tuned for Bengali Speaker Diarization

BUET DL Sprint 4.0 - Problem 2

Model Description

An ECAPA-TDNN speaker-embedding model fine-tuned on Bengali talk-show audio for speaker diarization. Base model: speechbrain/spkrec-ecapa-voxceleb.

Usage

# In the nuclear diarization notebook, this model is loaded automatically:
import torch
from huggingface_hub import hf_hub_download
from speechbrain.inference.speaker import EncoderClassifier

# Load the base ECAPA-TDNN model, then overlay the fine-tuned weights
ecapa = EncoderClassifier.from_hparams(source="speechbrain/spkrec-ecapa-voxceleb")
ckpt = hf_hub_download("nymtheescobar/ecapa-bengali-finetuned", "ecapa_bengali_finetuned.pt")
state_dict = torch.load(ckpt, map_location="cpu")
ecapa.mods.embedding_model.load_state_dict(state_dict, strict=False)
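
For diarization, embeddings from the loaded model are typically compared with cosine similarity before clustering. A minimal sketch of that scoring step — the vectors below are random placeholders standing in for real model output, not actual embeddings:

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder 192-dim vectors (ECAPA-TDNN's default embedding size)
rng = np.random.default_rng(0)
emb1, emb2 = rng.standard_normal(192), rng.standard_normal(192)
score = cosine_similarity(emb1, emb2)
```

Segments whose embeddings score above a tuned threshold would be merged into the same speaker cluster.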

Training Data

  • Competition data: 10 files, ~10h, ~170 speakers (ground truth)
  • Bengali Talkshow Dataset: 207 speakers from 30 files auto-labeled with pyannote 3.1
  • Total: 318 speakers, 9457 segments
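
Diarization labels such as the pyannote 3.1 auto-labels are commonly exchanged in RTTM format. A minimal parser sketch, assuming the standard RTTM field layout (the example line is illustrative, not from the dataset):

```python
def parse_rttm(lines):
    """Parse RTTM 'SPEAKER' lines into (speaker, start, duration) tuples."""
    segments = []
    for line in lines:
        fields = line.split()
        if fields and fields[0] == "SPEAKER":
            # RTTM: SPEAKER <file> <chan> <start> <dur> <ortho> <stype> <speaker> ...
            segments.append((fields[7], float(fields[3]), float(fields[4])))
    return segments

example = ["SPEAKER talkshow_01 1 12.50 3.20 <NA> <NA> spk_03 <NA> <NA>"]
print(parse_rttm(example))  # → [('spk_03', 12.5, 3.2)]
```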

Training

  • Loss: Triplet (margin=0.3)
  • Epochs: 5, LR: 0.0001
  • Segment length: 3.0s
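
The triplet objective above pulls same-speaker embeddings together and pushes different-speaker embeddings apart by at least the margin. A minimal NumPy sketch of the per-triplet loss (the toy 2-D vectors are illustrative only; training uses real ECAPA embeddings):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Triplet loss: positive must be closer to the anchor than the
    negative by at least `margin`, otherwise a penalty is incurred."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(d_pos - d_neg + margin, 0.0)

a = np.array([1.0, 0.0])
p = np.array([0.9, 0.1])   # same speaker: close to anchor
n = np.array([-1.0, 0.0])  # different speaker: far from anchor
print(triplet_loss(a, p, n))  # → 0.0 (margin already satisfied)
```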

Results

  • Pretrained DER: 0.6140
  • Fine-tuned DER: 0.5668
  • Improvement: 0.0472 absolute (lower DER is better)
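
DER sums missed speech, false-alarm speech, and speaker confusion over the total reference speech. A frame-level approximation for single-speaker labels — a simplified sketch, not the official metric (which handles overlap and a collar):

```python
import numpy as np

def frame_der(ref, hyp):
    """Frame-level DER approximation; label 0 means non-speech."""
    ref, hyp = np.asarray(ref), np.asarray(hyp)
    speech = ref != 0
    missed = np.sum(speech & (hyp == 0))
    false_alarm = np.sum(~speech & (hyp != 0))
    confusion = np.sum(speech & (hyp != 0) & (ref != hyp))
    return (missed + false_alarm + confusion) / np.sum(speech)

ref = [1, 1, 1, 2, 2, 0, 0, 2]
hyp = [1, 1, 2, 2, 0, 0, 1, 2]
print(frame_der(ref, hyp))  # → 0.5 (1 miss + 1 false alarm + 1 confusion over 6 speech frames)
```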

Dataset

https://huggingface.co/datasets/nymtheescobar/bengali-talkshow-audio
