CoreML BigVGAN decoder
Thanks for this safetensors version. I've been adapting the original scripts for lower-resource machines, loading this .safetensors to MPS. The improvement is great, including over FP32, which I suspect gets cast to FP16 during inference anyway, since the latent was exactly the same for both models... Just one problem: sometimes, after several steps/iterations, the model gets stuck. I don't know whether it's memory pressure, how MPS handles it, or something else; skipping that generation and moving to the next one makes it work fine again. Maybe it's a problem with the model itself.
Once the latent is obtained, the heavy part (for me at least) is mostly the decoding (BigVGAN). I tried converting the model to ONNX; there seems to be some improvement, but it's too unstable, particularly with CoreMLExecutionProvider, alone or mixed with CPUExecutionProvider. Worse, I saw no difference in decoding time between ONNX FP32 and FP16, and both were actually close to the original .bin (PyTorch).
So, after many attempts, I managed to convert the decoder to CoreML, which apparently leverages all compute units (CPU, GPU, ANE). Here's the result and some guidance on using it: https://huggingface.co/nhe-ai/BigVGAN_T50_48k_CoreML