It says Flash Attention 2 is not available for this model. But I was wondering whether I could enable FA2 for just the Gemma part. Do you think that is possible?
@edmond you can check out TGI's flash PaliGemma implementation here. It is implemented for the vision head, but flash attention has less effect there than it does in the decoder.
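For what it's worth, a minimal sketch of the idea in transformers: recent versions accept a per-sub-model dict for `attn_implementation` on composite models, which would let the Gemma decoder use FA2 while the vision tower keeps a default backend. This assumes a recent transformers release, a CUDA GPU, and the `flash-attn` package installed; the model id and the dict-based API are assumptions to verify against your installed version.

```python
# Assumption: composite models (like PaliGemma) accept a dict mapping
# sub-config names to attention backends, so FA2 can apply to the
# Gemma decoder only while the vision tower keeps SDPA.
attn_impl = {
    "text_config": "flash_attention_2",  # Gemma decoder -> FA2
    "vision_config": "sdpa",             # vision tower -> default SDPA
}

# Actual loading requires a CUDA GPU and the flash-attn package,
# so it is shown commented out here:
#
# from transformers import PaliGemmaForConditionalGeneration
# import torch
#
# model = PaliGemmaForConditionalGeneration.from_pretrained(
#     "google/paligemma-3b-pt-224",   # hypothetical checkpoint id
#     attn_implementation=attn_impl,
#     torch_dtype=torch.bfloat16,
# )
```

If your transformers version rejects the dict form, the fallback is to load the model normally and accept whichever single backend it supports end to end.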