Great model quality, but desperately needs universal quantization/optimization (took 35 mins)

#3 by Spawn - opened

The image quality from this model is truly impressive, but the generation time is practically unusable right now. It took me about 35 minutes to generate a single image.

I really appreciate the effort to port this model and make it compatible with ComfyUI. However, we badly need universally compatible quantization or compute optimizations, rather than implementations that only benefit specific, latest-generation NVIDIA GPUs.

I am currently running an AMD Radeon 7900 XTX with 64GB of system RAM, and the current FP8 implementation causes massive memory swapping and severe bottlenecking. It would be amazing to see a more universally supported quantization format (like standard INT8 or GGUF) so that users with diverse hardware setups can actually utilize this fantastic model.
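To illustrate what a hardware-agnostic INT8 path looks like (as opposed to FP8, which needs recent NVIDIA hardware for native support), here is a minimal sketch using PyTorch's dynamic quantization. The tiny `nn.Sequential` model is a hypothetical stand-in, not the actual model discussed in this thread:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a model block; the real model's
# architecture is not shown in this thread.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 64))

# Dynamic INT8 quantization: weights are stored as int8 and
# activations are quantized on the fly. This runs on plain CPU
# kernels, with no vendor-specific GPU requirements.
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 64)
out = qmodel(x)
print(tuple(out.shape))
```

GGUF takes a different route (pre-quantized weight files with block-wise schemes), but the appeal is the same: the format is defined independently of any one GPU generation.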
(Attached images: BitDance_00001_, BitDance_00002_, bitgpu)
