Will there be an int8 version of the decoder?

#2
by Ryan2025A - opened

Just gave this distilled model a try on an i5 (Gen5, AVX2) laptop. The encoding phase is indeed a lot faster, while the decoding performance is more or less on par with neucodec-onnx-decoder-int8.

Model: neuphonic/neucodec-onnx-decoder-int8 Inference: 9.152593s Audio: 2.30s RTF: 3.979
Model: neuphonic/neucodec-onnx-decoder-int8 Inference: 18.279955s Audio: 6.94s RTF: 2.634
Model: neuphonic/neucodec-onnx-decoder-int8 Inference: 12.904078s Audio: 5.38s RTF: 2.399
Model: neuphonic/neucodec-onnx-decoder-int8 Inference: 22.264509s Audio: 11.48s RTF: 1.939
Model: neuphonic/neucodec-onnx-decoder-int8 Inference: 9.905630s Audio: 3.16s RTF: 3.135
Model: neuphonic/neucodec-onnx-decoder-int8 Inference: 12.576807s Audio: 5.04s RTF: 2.495
Model: neuphonic/neucodec-onnx-decoder-int8 Inference: 14.879936s Audio: 5.30s RTF: 2.808

Model: neuphonic/distill-neucodec Inference: 12.049049s Audio: 4.40s RTF: 2.738
tts error: No valid speech tokens found in the output.
Model: neuphonic/distill-neucodec Inference: 18.939708s Audio: 9.44s RTF: 2.006
Model: neuphonic/distill-neucodec Inference: 9.876328s Audio: 3.96s RTF: 2.494
Model: neuphonic/distill-neucodec Inference: 12.199324s Audio: 4.82s RTF: 2.531
Model: neuphonic/distill-neucodec Inference: 10.960573s Audio: 4.04s RTF: 2.713
tts error: No valid speech tokens found in the output.
Model: neuphonic/distill-neucodec Inference: 16.211850s Audio: 7.46s RTF: 2.173
Model: neuphonic/distill-neucodec Inference: 15.201254s Audio: 7.06s RTF: 2.153
Model: neuphonic/distill-neucodec Inference: 11.448093s Audio: 4.30s RTF: 2.662
Model: neuphonic/distill-neucodec Inference: 26.927220s Audio: 14.58s RTF: 1.847
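
For anyone who wants to compare the two runs in aggregate rather than line by line, here is a minimal sketch that parses log lines in the format shown above and computes one RTF per model. It assumes RTF is inference time divided by audio duration, which matches the logged values (e.g. 9.152593 / 2.30 ≈ 3.979). The name `summarize_rtf` and the regex are my own and not part of any neuphonic tooling. Note that it divides total inference time by total audio time, so longer clips weigh more than they would in a plain average of the per-line RTF values.

```python
import re
from collections import defaultdict

# Matches lines such as:
# Model: neuphonic/distill-neucodec Inference: 12.049049s Audio: 4.40s RTF: 2.738
LOG_LINE = re.compile(
    r"Model: (?P<model>\S+) Inference: (?P<inference>[\d.]+)s "
    r"Audio: (?P<audio>[\d.]+)s"
)


def summarize_rtf(log_text):
    """Return {model: {"runs": n, "rtf": total_inference / total_audio}}."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # inference s, audio s, runs
    for line in log_text.splitlines():
        match = LOG_LINE.search(line)
        if match is None:
            continue  # skips "tts error: ..." lines and blank lines
        entry = totals[match["model"]]
        entry[0] += float(match["inference"])
        entry[1] += float(match["audio"])
        entry[2] += 1
    return {
        model: {"runs": runs, "rtf": inference / audio}
        for model, (inference, audio, runs) in totals.items()
        if audio > 0
    }


if __name__ == "__main__":
    sample = (
        "Model: neuphonic/neucodec-onnx-decoder-int8 Inference: 9.152593s Audio: 2.30s RTF: 3.979\n"
        "tts error: No valid speech tokens found in the output.\n"
        "Model: neuphonic/distill-neucodec Inference: 12.049049s Audio: 4.40s RTF: 2.738\n"
    )
    for model, stats in summarize_rtf(sample).items():
        print(f"{model}: runs={stats['runs']} RTF={stats['rtf']:.3f}")
```

Failed generations produce no `Model:` line, so they are not counted here. The time spent on those failed attempts is not visible in the log either.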

Another weird thing with distill-neucodec is that an engine instance will sometimes fail with the random error "No valid speech tokens found in the output." Usually a simple retry is enough to resolve it. Other times, an engine instance works perfectly fine without ever hitting the issue.

Generating audio for input text: 'You: "No, sir. My... my dog is sick. I need to take him to the vet."'
tts error: No valid speech tokens found in the output.
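
As a workaround, the retry mentioned above can be sketched as a small wrapper. This is only an illustration under assumptions: `synthesize` is a hypothetical callable standing in for whatever function the engine exposes, and I am assuming the failure surfaces as a Python exception whose message contains the text from the log. If the engine only prints the error and returns nothing, the check would need to change accordingly.

```python
ERROR_MARKER = "No valid speech tokens"  # taken from the logged error message


def synthesize_with_retry(synthesize, text, max_attempts=3):
    """Call synthesize(text), retrying only on the 'no valid speech tokens' error.

    `synthesize` is a hypothetical stand-in for the real TTS call.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            return synthesize(text)
        except Exception as error:
            if ERROR_MARKER not in str(error):
                raise  # unrelated failure: do not hide it behind a retry
            last_error = error
    raise RuntimeError(
        f"still failing after {max_attempts} attempts"
    ) from last_error
```

Since each attempt costs a full generation pass, the retries add directly to the effective inference time.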

It would be nice to have a quantized version for a fair comparison.
Please keep up the excellent work, thanks.