# Conv-TasNet Speaker Separation (ONNX)
Speaker separation model for splitting overlapping speech into individual speaker streams. Exported from SpeechBrain's pretrained SepFormer on WSJ0-2Mix.
## Available Models

| File | Format | Size | SNR vs FP32 | Notes |
|---|---|---|---|---|
| conv_tasnet_libri2mix_int8.onnx | QUInt8 | 29.7 MB | 22.6 dB | Recommended: 71% smaller, negligible quality loss |
| conv_tasnet_libri2mix.onnx | FP32 | 101.2 MB | baseline | Full precision reference |
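The "SNR vs FP32" column measures how closely the int8 model's output tracks the full-precision reference. A minimal sketch of that metric in NumPy (the `snr_db` helper is introduced here for illustration; it is not part of the repo's tooling):

```python
import numpy as np

def snr_db(reference: np.ndarray, test: np.ndarray) -> float:
    """Signal-to-noise ratio of `test` against `reference`, in dB."""
    noise = reference - test
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(noise ** 2))

# Example: a small quantization-like perturbation on a 1 s sine wave at 8 kHz
ref = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000).astype(np.float32)
noisy = ref + np.random.default_rng(0).normal(0, 1e-3, ref.shape).astype(np.float32)
print(f"{snr_db(ref, noisy):.1f} dB")
```

In practice one would run the same mixture through both ONNX files and compare the separated outputs; at 22.6 dB the quantization noise is well below the separation artifacts themselves.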
## Model Details
| Property | Value |
|---|---|
| Architecture | SepFormer (encoder + Transformer masknet + decoder) |
| Training data | WSJ0-2Mix |
| Sample rate | 8 kHz mono |
| Input | `mixture`: [batch, time] float32 waveform |
| Output | `separated`: [batch, 2, time] float32 (2 separated sources) |
| ONNX opset | 17 |
| Parameters | ~26M |
| Quantization | Dynamic QUInt8 on MatMul ops (Transformer attention weights) |
## Usage (Python)
```python
import onnxruntime as ort
import numpy as np

# Use the int8 model (recommended)
sess = ort.InferenceSession("conv_tasnet_libri2mix_int8.onnx")

# Input: mono waveform at 8 kHz
mixture = np.random.randn(1, 8000).astype(np.float32)  # 1 second
separated = sess.run(None, {"mixture": mixture})[0]

# separated.shape == (1, 2, 8000): two speaker sources
source_1 = separated[0, 0, :]
source_2 = separated[0, 1, :]
```
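To audition the separated sources, each one can be written to disk as an 8 kHz WAV using only the standard library. A sketch (`write_wav_8k` is a hypothetical helper introduced here, not part of the model's tooling):

```python
import wave
import numpy as np

def write_wav_8k(path: str, samples: np.ndarray) -> None:
    """Write a float32 waveform in [-1, 1] as 16-bit PCM mono at 8 kHz."""
    pcm = (np.clip(samples, -1.0, 1.0) * 32767.0).astype(np.int16)
    with wave.open(path, "wb") as f:
        f.setnchannels(1)     # mono
        f.setsampwidth(2)     # 16-bit samples
        f.setframerate(8000)  # model's native rate
        f.writeframes(pcm.tobytes())

# e.g. write_wav_8k("speaker_1.wav", source_1)
```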
## Usage (Rust / ort)
```rust
use ort::session::Session;
use ndarray::Array2;

let session = Session::builder()?.commit_from_file("conv_tasnet_libri2mix_int8.onnx")?;
let input = Array2::<f32>::zeros((1, 8000)); // 1 s at 8 kHz
let outputs = session.run(ort::inputs![input.view()])?;
// outputs[0] shape: [1, 2, 8000]
```
## Integration in Second Brain
Used by core-asr for real-time speaker separation during detected speech overlap:
```
System audio (16 kHz) → overlap detected?
  ├─ NO  → normal single-speaker path
  └─ YES → resample 16→8 kHz → SepFormer → 2 sources → resample 8→16 kHz
           → WeSpeaker embedding per source → cluster → per-speaker decode
```
Auto-downloaded via `ensure_moonshine_models()` from `Mazino0/conv-tasnet-onnx`.
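The resampling legs of the pipeline above can be sketched with a naive linear interpolator. This is illustration only: a real pipeline would use an anti-aliased polyphase resampler, and `resample_linear` is a helper invented here, not part of the codebase:

```python
import numpy as np

def resample_linear(x: np.ndarray, sr_in: int, sr_out: int) -> np.ndarray:
    """Naive linear-interpolation resampler (no anti-aliasing filter)."""
    n_out = int(round(len(x) * sr_out / sr_in))
    t_in = np.arange(len(x)) / sr_in
    t_out = np.arange(n_out) / sr_out
    return np.interp(t_out, t_in, x).astype(np.float32)

audio_16k = np.random.randn(16000).astype(np.float32)  # 1 s of system audio
audio_8k = resample_linear(audio_16k, 16000, 8000)     # down to model's 8 kHz
# ... run separation on audio_8k, then bring each source back:
source_16k = resample_linear(audio_8k, 8000, 16000)    # up to pipeline's 16 kHz
```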
## Export & Quantization
```bash
pip install speechbrain torch onnx onnxruntime

# Export FP32 model
python scripts/export_conv_tasnet.py

# Quantize to int8
python -c "
from onnxruntime.quantization import quantize_dynamic, QuantType
quantize_dynamic(
    'conv_tasnet_libri2mix.onnx',
    'conv_tasnet_libri2mix_int8.onnx',
    weight_type=QuantType.QUInt8,
    op_types_to_quantize=['MatMul'],
)
"
```
## License
The SepFormer model weights are from SpeechBrain (Apache 2.0). WSJ0-2Mix data is from the Wall Street Journal corpus (LDC license).