# mlx-community/diar_sortformer_4spk-v1-fp16
This model was converted to MLX format from nvidia/diar_sortformer_4spk-v1 using mlx-audio version 0.3.2.
Refer to the original model card for more details on the model.
## Use with mlx-audio

```shell
pip install -U mlx-audio
```
### Python Example: Offline Inference

```python
from mlx_audio.vad import load

model = load("mlx-community/diar_sortformer_4spk-v1-fp16")
result = model.generate("meeting.wav", threshold=0.5, verbose=True)
print(result.text)

for seg in result.segments:
    print(f"Speaker {seg.speaker}: {seg.start:.2f}s - {seg.end:.2f}s")
```
### Python Example: Streaming Inference

```python
from mlx_audio.vad import load

model = load("mlx-community/diar_sortformer_4spk-v1-fp16")

for result in model.generate_stream("meeting.wav", chunk_duration=5.0):
    for seg in result.segments:
        print(f"Speaker {seg.speaker}: {seg.start:.2f}s - {seg.end:.2f}s")
```
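Because each chunk is processed independently, consecutive chunks can report adjacent segments for the same speaker. A hedged sketch of stitching them into one timeline (`merge_segments` is my own helper operating on plain `(speaker, start, end)` tuples; the `gap` tolerance is an arbitrary choice):

```python
def merge_segments(segments, gap=0.25):
    """Merge adjacent segments of the same speaker when the pause
    between them is at most `gap` seconds.

    `segments` is an iterable of (speaker, start, end) tuples."""
    merged = []
    for spk, start, end in sorted(segments, key=lambda s: s[1]):
        if merged and merged[-1][0] == spk and start - merged[-1][2] <= gap:
            # extend the previous segment instead of opening a new one
            merged[-1] = (spk, merged[-1][1], max(end, merged[-1][2]))
        else:
            merged.append((spk, start, end))
    return merged
```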
### Python Example: Real-time Microphone Streaming

```python
from mlx_audio.vad import load

model = load("mlx-community/diar_sortformer_4spk-v1-fp16")
state = model.init_streaming_state()

for chunk in mic_stream():  # your audio source
    result, state = model.feed(chunk, state, sample_rate=16000)
    for seg in result.segments:
        print(f"Speaker {seg.speaker}: {seg.start:.2f}s - {seg.end:.2f}s")
```
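`mic_stream()` above is a placeholder for any source of mono 16 kHz samples. A minimal sketch of the chunking side in pure NumPy (the 0.5 s chunk size and zero-padding are my assumptions; a live microphone source, e.g. via the third-party sounddevice package, could feed samples in the same shape):

```python
import numpy as np

def chunk_samples(samples, sample_rate=16000, chunk_seconds=0.5):
    """Yield fixed-size mono chunks from a 1-D sample array.

    The final partial chunk is zero-padded to the full length so every
    yielded array has the same shape."""
    frames = int(sample_rate * chunk_seconds)
    for i in range(0, len(samples), frames):
        chunk = samples[i:i + frames]
        if len(chunk) < frames:
            chunk = np.pad(chunk, (0, frames - len(chunk)))
        yield chunk
```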
## Model Details

- Architecture: FastConformer (18 layers) + Transformer Encoder (18 layers) + Sortformer modules
- Mel bins: 80
- Max speakers: 4
- Input: 16 kHz mono audio
- Output: Per-frame speaker activity probabilities

Ported from NVIDIA NeMo `SortformerEncLabelModel`.
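The per-frame probabilities become segments by thresholding each speaker column and extracting contiguous runs. A minimal sketch of that post-processing (the 80 ms frame hop and 0.5 threshold are assumptions for illustration; the actual post-processing inside mlx-audio may differ):

```python
import numpy as np

def probs_to_segments(probs, frame_hop=0.08, threshold=0.5):
    """Convert a (frames, speakers) probability array into
    (speaker, start_s, end_s) tuples by thresholding each column."""
    segments = []
    active = probs >= threshold  # boolean (frames, speakers)
    for spk in range(active.shape[1]):
        col = active[:, spk].astype(int)
        # run boundaries: diff of the zero-padded activity column
        edges = np.flatnonzero(np.diff(np.r_[0, col, 0]))
        for start, end in zip(edges[::2], edges[1::2]):
            segments.append((spk, start * frame_hop, end * frame_hop))
    return sorted(segments, key=lambda s: s[1])
```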