Remove flash-attn from requirements and GPU inference example
#2
by YingxuHe - opened
Remove flash-attn as a required dependency and drop attn_implementation="flash_attention_2" from the GPU inference example.
The model works with PyTorch's built-in SDPA (scaled dot-product attention), which transformers selects automatically when flash-attn is not installed.
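The fallback described above can be sketched with plain PyTorch. This is an illustrative example, not code from the PR: it calls `torch.nn.functional.scaled_dot_product_attention` (the SDPA kernel transformers uses as its attention backend when flash-attn is absent) and checks it against a manual softmax-attention reference. Tensor shapes are arbitrary; if you want to force this backend explicitly in transformers, `from_pretrained(..., attn_implementation="sdpa")` does so.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# Arbitrary illustrative shapes: (batch, heads, seq_len, head_dim)
q = torch.randn(1, 4, 8, 16)
k = torch.randn(1, 4, 8, 16)
v = torch.randn(1, 4, 8, 16)

# PyTorch's built-in fused SDPA; no flash-attn package required.
out_sdpa = F.scaled_dot_product_attention(q, k, v)

# Manual reference: softmax(Q K^T / sqrt(d)) V
scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
out_ref = scores.softmax(dim=-1) @ v

# The two agree within float32 tolerance.
print(torch.allclose(out_sdpa, out_ref, atol=1e-5))
```

Because SDPA ships with PyTorch itself, removing flash-attn from the requirements does not change the model's outputs, only the attention kernel used to compute them.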
YingxuHe changed pull request status to merged