Hugging Face implementation is very slow during prefill

by Jellyfish042 - opened May 22, 2025


Jellyfish042

May 22, 2025

Why is that?

Jellyfish042

May 22, 2025

And the results look very strange.

ybelkada

Technology Innovation Institute org May 22, 2025

Hi @Jellyfish042
Thanks for the issue! I think you might be running the path that doesn't use the Mamba kernels. We need to upstream a fix to HF transformers, @DhiyaEddine.
Meanwhile, can you try running the model on a GPU and make sure mamba-ssm and causal-conv1d are installed?
pip install mamba-ssm causal-conv1d
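One quick way to confirm whether the kernel path is even possible in a given environment is to check that the kernel modules are importable and that a CUDA device is visible. The sketch below assumes the fast path needs the `mamba_ssm` and `causal_conv1d` modules plus a GPU, as suggested above. The helper name `fast_path_status` is hypothetical, and it only inspects the environment. It does not load the model or confirm which path transformers actually selects.

```python
import importlib.util


def fast_path_status() -> dict:
    """Report whether the prerequisites for the Mamba kernel path appear to be present.

    Assumption: the fast path needs the `mamba_ssm` and `causal_conv1d` modules
    to be importable and a CUDA device to be available. This is a hypothetical
    helper; it only checks the environment and never loads a model.
    """
    status = {
        # find_spec returns None when a top-level module is not installed
        "mamba_ssm": importlib.util.find_spec("mamba_ssm") is not None,
        "causal_conv1d": importlib.util.find_spec("causal_conv1d") is not None,
        "cuda": False,
    }
    # Only import torch if it is installed, so the check itself never crashes
    if importlib.util.find_spec("torch") is not None:
        import torch

        status["cuda"] = bool(torch.cuda.is_available())
    # The kernel path is only possible when every prerequisite is satisfied
    status["fast_path_possible"] = (
        status["mamba_ssm"] and status["causal_conv1d"] and status["cuda"]
    )
    return status


if __name__ == "__main__":
    for name, ok in fast_path_status().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

Run it in the same environment as the model. If any entry prints "no", transformers will likely fall back to the slower path, which would be consistent with slow prefill.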

DhiyaEddine

Technology Innovation Institute org May 22, 2025

Already on it!
