
Remove flash-attn from requirements and GPU inference example

Discussion #1, opened by YingxuHe (MERaLiON org)

This change removes flash-attn from the required dependencies and drops attn_implementation="flash_attention_2" from the GPU inference example.
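For illustration, the revised loading call would simply omit the attention kwarg. The repo id placeholder and the remaining kwargs below are assumptions for the sketch, not taken from the model card; the point is only the absence of attn_implementation="flash_attention_2":

```python
# Hypothetical sketch of the GPU inference call after the change.
# "<repo-id>" is a placeholder; trust_remote_code reflects the model's
# custom code, and torch_dtype is passed as a string (accepted by
# recent transformers versions).
load_kwargs = {
    "trust_remote_code": True,
    "torch_dtype": "bfloat16",
}
# model = AutoModel.from_pretrained("MERaLiON/<repo-id>", **load_kwargs).to("cuda")
```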

The model works with PyTorch's built-in SDPA attention, which transformers selects automatically when flash-attn is not installed.
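The fallback can be sketched as follows. This is a simplified illustration of the selection logic, not transformers' actual code; the function name is made up for the example:

```python
import importlib.util

def pick_attn_implementation(requested=None):
    """Simplified sketch of attention-backend selection.

    Illustrates why dropping flash-attn is safe: flash_attention_2 is
    only used when explicitly requested, and the default is PyTorch's
    built-in scaled_dot_product_attention (SDPA), which needs no extra
    package.
    """
    if requested == "flash_attention_2" and importlib.util.find_spec("flash_attn") is None:
        raise ImportError("flash_attention_2 requested but flash-attn is not installed")
    # With no explicit request, fall back to SDPA, shipped with PyTorch >= 2.0.
    return requested or "sdpa"
```

So with the kwarg removed, users without flash-attn get SDPA automatically, and users who still want flash-attn can pass the kwarg themselves.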

MERaLiON org

Closing: opened as a discussion instead of a PR by mistake.

YingxuHe changed discussion status to closed
