Flash Attention and Sage Attention wheels for use with YanWenKun/ComfyUI-Docker ComfyUI images. The wheels were built against the cu130-slim image in particular, using CUDA 13, Python 3.13, and PyTorch 2.10 on Linux. My system uses a mix of Ampere and Blackwell GPUs.

To use the wheels, place them in the root folder of your container and update the pre-start script according to your needs. In your docker compose, pass either --use-flash-attention or --use-sage-attention as a CLI argument.
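As a rough sketch of what the pre-start step could look like: the snippet below installs any matching wheels found in the container root. The wheel filename patterns and the root path are assumptions; adjust them to match the wheels you actually placed there and the image's own pre-start script conventions.

```shell
#!/bin/sh
# Hypothetical pre-start fragment: install Flash/Sage Attention wheels
# that were copied into the container root. Filenames are assumptions.
set -e

for whl in /flash_attn-*.whl /sageattention-*.whl; do
    # Only attempt the install if the glob actually matched a file.
    if [ -e "$whl" ]; then
        pip install --no-deps "$whl"
    fi
done
```

In your docker compose, the attention backend is then selected via the ComfyUI CLI arguments, e.g. passing --use-sage-attention through whatever mechanism your image uses to forward arguments (check the image's documentation for the exact variable or entrypoint flag).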

WARNING: The Sage Attention 3 wheel has been built specifically for Blackwell GPUs. It also uses a different package name than previous Sage Attention versions and cannot be activated via CLI arguments, only through custom nodes. Many models are still unsupported.
