This model cannot actually be run!
I was very excited to try this. I spent quite a while trying to work around the quirks of the size-64 layers, using torch_awq, gptqmodel, AutoAWQ, and every variation of support libraries I could think of, on both Win11 and Ubuntu, with zero luck. Every single attempt ran up against issues that prevented it from running, and the oddly sized layers (not divisible by 128) are my best guess as to what the problem is. VibeVoice is an odd beast, for a language model, it seems.
So, um... any hints? How have folks actually managed to load this thing?
From everything I can tell, this model CANNOT be run.
The audio projection and encoder layers were incorrectly quantized from FP16, causing a shape mismatch. VibeVoice uses 64-feature layers, but the quantized weights here need dimensions that are multiples of 128 for memory alignment. Those layers are relatively small and should simply have been excluded from quantization... the kernel literally cannot read the ones in this quantized model, and it ALWAYS crashes.
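To make the constraint concrete: group-wise quantization schemes like AWQ/GPTQ pack weights in groups along the input dimension, so a layer's `in_features` has to be divisible by the group size or the packed tensor can't be laid out. Here's a minimal sketch (names and shapes are hypothetical, not taken from the actual checkpoint) of how you'd scan a model's layer shapes for offenders that should be left in FP16:

```python
# Sketch: find layers whose in_features isn't divisible by the quantization
# group size. Such layers should be excluded from quantization (kept FP16),
# since group-wise packing can't represent them.
def find_unquantizable(layer_shapes, group_size=128):
    """layer_shapes: {layer_name: (out_features, in_features)}"""
    return {name: shape for name, shape in layer_shapes.items()
            if shape[1] % group_size != 0}

# Hypothetical example: a 64-wide audio projection fails the check,
# while an ordinary transformer MLP layer passes.
layers = {
    "audio_proj": (64, 64),            # 64 % 128 != 0 -> must stay FP16
    "decoder.layer.0.mlp": (4096, 11008),  # 11008 % 128 == 0 -> quantizable
}
print(find_unquantizable(layers))  # → {'audio_proj': (64, 64)}
```

In practice the fix at quantization time is usually a skip-list (e.g. `modules_to_not_convert` in the quantization config) naming those small audio layers so they never get packed in the first place.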
The model is not usable. I'm sorry to be the bearer of bad news here; I was really hoping someone would've contradicted me 🤷
Hm, it runs on my machine. Then again, I did make some extra changes to vLLM since I posted this.
Did you also apply the vLLM patches?