Remove flash-attn from requirements and GPU inference example
#2
by YingxuHe - opened
Remove flash-attn as a required dependency and drop attn_implementation="flash_attention_2" from the GPU inference example.
The model works with PyTorch's built-in SDPA (scaled dot-product attention), which transformers selects automatically when flash-attn is not installed.
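The fallback described above can be sketched with plain PyTorch. This is an illustrative example, not code from the PR: it calls `torch.nn.functional.scaled_dot_product_attention` (the SDPA kernel transformers uses as its attention backend when flash-attn is absent) and checks it against a manual softmax-attention reference. Tensor shapes are arbitrary; if you want to force this backend explicitly in transformers, `from_pretrained(..., attn_implementation="sdpa")` does so.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# Arbitrary illustrative shapes: (batch, heads, seq_len, head_dim)
q = torch.randn(1, 4, 8, 16)
k = torch.randn(1, 4, 8, 16)
v = torch.randn(1, 4, 8, 16)

# PyTorch's built-in fused SDPA; no flash-attn package required.
out_sdpa = F.scaled_dot_product_attention(q, k, v)

# Manual reference: softmax(Q K^T / sqrt(d)) V
scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
out_ref = scores.softmax(dim=-1) @ v

# The two agree within float32 tolerance.
print(torch.allclose(out_sdpa, out_ref, atol=1e-5))
```

Because SDPA ships with PyTorch itself, removing flash-attn from the requirements does not change the model's outputs, only the attention kernel used to compute them.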
YingxuHe changed pull request status to merged