Hugging Face implementation is very slow during prefill
#2
by Jellyfish042 - opened
Why is that?
And the results also look very strange.
Hi @Jellyfish042
Thanks for the issue! I think you might be running the non-Mamba-kernel (slow) path; we need to upstream a fix to HF transformers. @DhiyaEddine
In the meantime, can you try running the model on a GPU, with mamba-ssm and causal-conv1d installed?
pip install mamba-ssm causal-conv1d
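Before re-running, it can help to confirm that both kernel packages are actually importable and that PyTorch sees a GPU. The script below is a minimal sketch under a few assumptions: the pip packages expose the modules `mamba_ssm` and `causal_conv1d`, and the helper names (`missing_kernel_modules`, `cuda_available`) are hypothetical. It does not reproduce the exact logic transformers uses to choose between the fast and slow paths. It only checks the prerequisites mentioned above.

```python
import importlib.util

# Modules provided by `pip install mamba-ssm causal-conv1d`.
# Note: the import names use underscores, unlike the pip package names.
KERNEL_MODULES = ("mamba_ssm", "causal_conv1d")


def missing_kernel_modules(modules=KERNEL_MODULES):
    """Return the names in `modules` that cannot be found by this interpreter."""
    # find_spec locates a top-level module without importing it.
    return [name for name in modules if importlib.util.find_spec(name) is None]


def cuda_available():
    """Return True only if PyTorch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so the check still runs without torch
    return bool(torch.cuda.is_available())


if __name__ == "__main__":
    missing = missing_kernel_modules()
    if missing:
        print("Missing kernel packages:", ", ".join(missing))
    else:
        print("mamba_ssm and causal_conv1d are importable")
    print("CUDA available:", cuda_available())
```

If either module is reported missing, or CUDA is unavailable, the model is likely to fall back to the slower non-kernel path.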
Already on it!