Add EXL2, INT8, and/or INT4 version of the model, PLEASE!
#21
by Abdelhak - opened
The model is too large to run for people with less than 24 GB of VRAM. Please make a quantized version of it.
It is taking 60 GB of RAM for me and around 15 minutes to process each prompt, running on CPU. We really need a quantized version.
There is an nf4 version here:
https://huggingface.co/mistralai/Pixtral-12B-2409/discussions/21#66f347780dc1833d4e484073
FWIW, exllamav2 doesn't support vision models, so an EXL2 quant isn't currently possible.