
# Conv-TasNet Speaker Separation (ONNX)

Speaker separation model that splits overlapping speech into individual per-speaker streams. Exported from SpeechBrain's pretrained SepFormer trained on WSJ0-2Mix.

## Available Models

| File | Format | Size | SNR vs FP32 | Notes |
|------|--------|------|-------------|-------|
| `conv_tasnet_libri2mix_int8.onnx` | QUInt8 | 29.7 MB | 22.6 dB | Recommended: 71% smaller, negligible quality loss |
| `conv_tasnet_libri2mix.onnx` | FP32 | 101.2 MB | baseline | Full-precision reference |
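The "SNR vs FP32" column can be reproduced by treating the FP32 model's output as the reference signal and the int8 output as the estimate. A minimal sketch of the metric (plain SNR in dB; `snr_db` is an illustrative helper, not part of this repo):

```python
import numpy as np

def snr_db(reference: np.ndarray, estimate: np.ndarray) -> float:
    """Signal-to-noise ratio of `estimate` against `reference`, in dB.

    The "noise" is the difference between the two waveforms; a small
    epsilon avoids division by zero for identical signals.
    """
    noise = reference - estimate
    return 10.0 * np.log10(np.sum(reference ** 2) / (np.sum(noise ** 2) + 1e-12))
```

Run both models on the same mixture and feed the two `separated` arrays to this function, per source.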

## Model Details

| Property | Value |
|----------|-------|
| Architecture | SepFormer (encoder + Transformer masknet + decoder) |
| Training data | WSJ0-2Mix |
| Sample rate | 8 kHz mono |
| Input | `mixture`: `[batch, time]` float32 waveform |
| Output | `separated`: `[batch, 2, time]` float32 (2 separated sources) |
| ONNX opset | 17 |
| Parameters | ~26M |
| Quantization | Dynamic QUInt8 on MatMul ops (Transformer attention weights) |
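Given the I/O shapes above, a small guard before inference avoids the most common shape mismatch (1-D audio passed where `[batch, time]` is expected). `as_mixture` is an illustrative helper, not part of the model repo:

```python
import numpy as np

def as_mixture(waveform: np.ndarray) -> np.ndarray:
    """Coerce a 1-D or 2-D waveform into the [batch, time] float32 layout."""
    x = np.asarray(waveform, dtype=np.float32)
    if x.ndim == 1:
        x = x[np.newaxis, :]  # [time] -> [1, time]
    if x.ndim != 2:
        raise ValueError(f"expected [batch, time], got shape {x.shape}")
    return x
```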

## Usage (Python)

```python
import onnxruntime as ort
import numpy as np

# Use the int8 model (recommended)
sess = ort.InferenceSession("conv_tasnet_libri2mix_int8.onnx")

# Input: mono waveform at 8 kHz
mixture = np.random.randn(1, 8000).astype(np.float32)  # 1 second
separated = sess.run(None, {"mixture": mixture})[0]

# separated.shape == (1, 2, 8000): two speaker sources
source_1 = separated[0, 0, :]
source_2 = separated[0, 1, :]
```
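The separated sources are plain float32 waveforms, so they can be written to disk with the standard-library `wave` module. A sketch, assuming samples are roughly in [-1, 1] (`write_wav` is a hypothetical helper, not part of this repo):

```python
import wave
import numpy as np

def write_wav(path: str, samples: np.ndarray, sample_rate: int = 8000) -> None:
    """Write a mono float32 waveform in [-1, 1] as 16-bit PCM."""
    pcm = (np.clip(samples, -1.0, 1.0) * 32767.0).astype(np.int16)
    with wave.open(path, "wb") as f:
        f.setnchannels(1)      # mono
        f.setsampwidth(2)      # 16-bit
        f.setframerate(sample_rate)
        f.writeframes(pcm.tobytes())
```

For example, `write_wav("speaker1.wav", source_1)` and `write_wav("speaker2.wav", source_2)` save each stream for listening.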

## Usage (Rust / ort)

```rust
use ort::session::Session;
use ndarray::Array2;

let session = Session::builder()?.commit_from_file("conv_tasnet_libri2mix_int8.onnx")?;
let input = Array2::<f32>::zeros((1, 8000)); // 1 s at 8 kHz
let outputs = session.run(ort::inputs!["mixture" => input.view()])?;
// outputs[0] shape: [1, 2, 8000]
```

## Integration in Second Brain

Used by `core-asr` for real-time speaker separation when overlapping speech is detected:

```
System audio (16 kHz) → overlap detected?
  ├─ NO  → normal single-speaker path
  └─ YES → resample 16→8 kHz → SepFormer → 2 sources → resample 8→16 kHz
           → WeSpeaker embedding per source → cluster → per-speaker decode
```
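The two resampling hops in the pipeline above can be sketched with SciPy's polyphase resampler. This is an illustration only; the actual `core-asr` path is in Rust, and `to_8k`/`to_16k` are hypothetical names:

```python
import numpy as np
from scipy.signal import resample_poly

def to_8k(waveform_16k: np.ndarray) -> np.ndarray:
    """Downsample 16 kHz audio to the model's 8 kHz rate (polyphase filter)."""
    return resample_poly(waveform_16k, 1, 2).astype(np.float32)

def to_16k(waveform_8k: np.ndarray) -> np.ndarray:
    """Upsample separated 8 kHz sources back to the 16 kHz pipeline rate."""
    return resample_poly(waveform_8k, 2, 1).astype(np.float32)
```

Polyphase resampling applies an anti-aliasing filter, which matters here because aliasing artifacts would degrade the downstream speaker embeddings.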

Auto-downloaded via `ensure_moonshine_models()` from `Mazino0/conv-tasnet-onnx`.

## Export & Quantization

```bash
pip install speechbrain torch onnx onnxruntime

# Export FP32 model
python scripts/export_conv_tasnet.py

# Quantize to int8
python -c "
from onnxruntime.quantization import quantize_dynamic, QuantType
quantize_dynamic(
    'conv_tasnet_libri2mix.onnx',
    'conv_tasnet_libri2mix_int8.onnx',
    weight_type=QuantType.QUInt8,
    op_types_to_quantize=['MatMul'],
)
"
```

## License

The SepFormer model weights are from SpeechBrain (Apache 2.0). WSJ0-2Mix data is from the Wall Street Journal corpus (LDC license).
