Will there be an int8 version of the decoder?

by Ryan2025A - opened Mar 10

Just gave this distilled model a try on an i5 (Gen5, AVX2) laptop. The encoding phase is indeed a lot faster, while the decoding performance is more or less on par with neucodec-onnx-decoder-int8.

Model: neuphonic/neucodec-onnx-decoder-int8 Inference: 9.152593s Audio: 2.30s RTF: 3.979
Model: neuphonic/neucodec-onnx-decoder-int8 Inference: 18.279955s Audio: 6.94s RTF: 2.634
Model: neuphonic/neucodec-onnx-decoder-int8 Inference: 12.904078s Audio: 5.38s RTF: 2.399
Model: neuphonic/neucodec-onnx-decoder-int8 Inference: 22.264509s Audio: 11.48s RTF: 1.939
Model: neuphonic/neucodec-onnx-decoder-int8 Inference: 9.905630s Audio: 3.16s RTF: 3.135
Model: neuphonic/neucodec-onnx-decoder-int8 Inference: 12.576807s Audio: 5.04s RTF: 2.495
Model: neuphonic/neucodec-onnx-decoder-int8 Inference: 14.879936s Audio: 5.30s RTF: 2.808

Model: neuphonic/distill-neucodec Inference: 12.049049s Audio: 4.40s RTF: 2.738
tts error: No valid speech tokens found in the output.
Model: neuphonic/distill-neucodec Inference: 18.939708s Audio: 9.44s RTF: 2.006
Model: neuphonic/distill-neucodec Inference: 9.876328s Audio: 3.96s RTF: 2.494
Model: neuphonic/distill-neucodec Inference: 12.199324s Audio: 4.82s RTF: 2.531
Model: neuphonic/distill-neucodec Inference: 10.960573s Audio: 4.04s RTF: 2.713
tts error: No valid speech tokens found in the output.
Model: neuphonic/distill-neucodec Inference: 16.211850s Audio: 7.46s RTF: 2.173
Model: neuphonic/distill-neucodec Inference: 15.201254s Audio: 7.06s RTF: 2.153
Model: neuphonic/distill-neucodec Inference: 11.448093s Audio: 4.30s RTF: 2.662
Model: neuphonic/distill-neucodec Inference: 26.927220s Audio: 14.58s RTF: 1.847
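
For reference, the RTF column in these logs is inference time divided by audio duration, so anything above 1 is slower than real time. Below is a minimal sketch that recomputes it from the rows above and also reports an aggregate RTF (total inference time over total audio time), which is less sensitive to short clips than a plain mean. The helper names `rtf` and `aggregate_rtf` are my own and not part of any neucodec API.

```python
from statistics import mean

def rtf(inference_s: float, audio_s: float) -> float:
    """Real-time factor: seconds of compute per second of generated audio."""
    return inference_s / audio_s

def aggregate_rtf(rows: list[tuple[float, float]]) -> float:
    """Total inference time divided by total audio time across all rows."""
    return sum(i for i, _ in rows) / sum(a for _, a in rows)

# (inference seconds, audio seconds), copied from the log lines above.
int8_rows = [
    (9.152593, 2.30), (18.279955, 6.94), (12.904078, 5.38), (22.264509, 11.48),
    (9.905630, 3.16), (12.576807, 5.04), (14.879936, 5.30),
]
distill_rows = [
    (12.049049, 4.40), (18.939708, 9.44), (9.876328, 3.96), (12.199324, 4.82),
    (10.960573, 4.04), (16.211850, 7.46), (15.201254, 7.06), (11.448093, 4.30),
    (26.927220, 14.58),
]

for name, rows in [("decoder-int8", int8_rows), ("distill-neucodec", distill_rows)]:
    per_clip = mean(rtf(i, a) for i, a in rows)
    print(f"{name}: mean RTF {per_clip:.3f}, aggregate RTF {aggregate_rtf(rows):.3f}")
```

Note that the two runs did not generate the same clips, so this is only a rough comparison, not a controlled benchmark.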

Another weird thing with distill-neucodec is that an engine instance sometimes fails with the random error "No valid speech tokens found in the output." Usually a simple retry is enough to resolve it. Other times, an engine instance works perfectly fine without this issue.

Generating audio for input text: 'You: "No, sir. My... my dog is sick. I need to take him to the vet."'
tts error: No valid speech tokens found in the output.
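
The retry workaround mentioned above can be sketched roughly as follows. This assumes the failure surfaces as a Python exception whose message contains "No valid speech tokens", which I have not verified against the library source. `with_retry` and the `generate` callable are hypothetical names standing in for whatever call produces the audio.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Assumption: the error message seen in the log is also the exception text.
RETRYABLE_MESSAGE = "No valid speech tokens"

def with_retry(generate: Callable[[], T], attempts: int = 3, delay_s: float = 0.0) -> T:
    """Call generate() up to `attempts` times, retrying only the known flaky error."""
    last_err: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return generate()
        except Exception as err:
            # Anything other than the known intermittent failure is re-raised as-is.
            if RETRYABLE_MESSAGE not in str(err):
                raise
            last_err = err
            if attempt < attempts:
                time.sleep(delay_s)
    assert last_err is not None
    raise last_err
```

Usage would look like `audio = with_retry(lambda: engine_call(text))`, where `engine_call` is a placeholder for the actual inference function. This only hides the symptom and does not explain why some engine instances hit the error and others do not.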

It would be nice to have a quantized version for a fair comparison.
Please keep up the excellent work, thanks.
