Pyannote Segmentation 3.0 — GGUF

Native GGUF port of pyannote/segmentation-3.0 for speaker diarization.

Model details

Property	Value
Architecture	SincNet + 4× biLSTM + Linear + LogSoftmax
Format	GGUF (F32)
Size	5.7 MB
Tensors	41
Output classes	7 powerset classes (up to 3 speakers per chunk, at most 2 active at once)
Input	10 s chunks of mono audio sampled at 16 kHz
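To make the powerset output concrete, here is a minimal Python sketch of how 7 classes can be expanded into per-speaker activity for 3 speakers. It assumes the class ordering used by upstream pyannote.audio (the empty set first, then single speakers, then pairs, each in lexicographic order). Whether this GGUF port and CrispASR keep that ordering is an assumption, and the function names are illustrative only.

```python
from itertools import combinations

NUM_SPEAKERS = 3   # maximum speakers per chunk
MAX_OVERLAP = 2    # maximum simultaneously active speakers


def build_powerset_mapping(num_speakers=NUM_SPEAKERS, max_overlap=MAX_OVERLAP):
    """Return one 0/1 speaker-activity row per powerset class.

    Assumed ordering: {}, {0}, {1}, {2}, {0,1}, {0,2}, {1,2}.
    """
    mapping = []
    for set_size in range(max_overlap + 1):
        for active in combinations(range(num_speakers), set_size):
            mapping.append([1 if s in active else 0 for s in range(num_speakers)])
    return mapping


def decode_frames(log_probs, mapping):
    """Pick the most likely class per frame and expand it to speaker activity."""
    decoded = []
    for frame in log_probs:
        best = max(range(len(frame)), key=frame.__getitem__)
        decoded.append(mapping[best])
    return decoded


mapping = build_powerset_mapping()

# Two hypothetical frames of log-probabilities over the 7 classes.
frames = [
    [-0.1, -5.0, -5.0, -5.0, -5.0, -5.0, -5.0],  # class 0: non-speech
    [-5.0, -5.0, -5.0, -5.0, -5.0, -0.1, -5.0],  # class 5: speakers 0 and 2
]
activity = decode_frames(frames, mapping)
```

Under the assumed ordering, class 0 means no speech, classes 1 to 3 mean a single active speaker, and classes 4 to 6 mean two overlapping speakers, which is how one output head covers voice activity, segmentation, and overlap at once.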

The model performs joint voice-activity detection, speaker segmentation, and overlapped-speech detection on short audio chunks. Downstream clustering then produces full-file speaker diarization.
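Because the model only sees 10 s chunks, a longer recording has to be cut into windows before inference. The Python sketch below shows one way to do that. The 50% step and the zero-padding of short input are assumptions chosen for illustration, not a description of what CrispASR does internally, and the helper names are hypothetical.

```python
SAMPLE_RATE = 16_000                          # Hz, from the model details
CHUNK_SECONDS = 10                            # chunk length the model expects
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS   # 160000 samples per chunk


def chunk_starts(num_samples, step_samples=CHUNK_SAMPLES // 2):
    """Start offsets of 10 s windows covering the signal (step is an assumption)."""
    if num_samples <= CHUNK_SAMPLES:
        return [0]
    starts = list(range(0, num_samples - CHUNK_SAMPLES + 1, step_samples))
    if starts[-1] + CHUNK_SAMPLES < num_samples:
        # Add a final window flush with the end so no audio is left uncovered.
        starts.append(num_samples - CHUNK_SAMPLES)
    return starts


def chunks(samples, step_samples=CHUNK_SAMPLES // 2):
    """Yield (start_offset, window) pairs; short input is zero-padded to 10 s."""
    for start in chunk_starts(len(samples), step_samples):
        window = list(samples[start:start + CHUNK_SAMPLES])
        window += [0.0] * (CHUNK_SAMPLES - len(window))
        yield start, window
```

Overlapping windows give the clustering stage several views of each region, at the cost of more inference calls. A non-overlapping step of `CHUNK_SAMPLES` is the cheaper alternative.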

Usage with CrispASR

crispasr \
  --diarize-method pyannote \
  --sherpa-segment-model pyannote-seg-3.0.gguf \
  audio.wav

Provenance

Weights were exported directly from the original PyTorch checkpoint (pyannote/segmentation-3.0) into GGUF format, preserving full F32 precision across all 41 tensors.