Any chance/timeline for a q8 version?
#3
by skyrien - opened
This model is awesome, even the q5 version! Though with an RTX 4090, I can't quite run the fp16 version properly, and offloading any layers at all seems to break it entirely.
Any chance for a q8 version?
It'll be about 8 hours.
PsiPi changed discussion status to closed
What sort of tokens/second do you get on the RTX 4090? How large are the images you're passing in?