Q8 Version

#2
by Hunterx - opened

Hi, thanks for bringing some more reasoning to this model.

Testing on a 5090 at q8_0.
Sampling: temp 0, top-k 20, repeat 1, top-p 0.95, min-p 0.
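
For anyone who wants to reproduce these settings, here is a rough sketch of them as a Python dict turned into command-line style arguments. The dict name `SAMPLING` and the helper `to_cli_args` are made up for illustration. Reading "repeat 1" as a repeat penalty of 1.0 is an assumption, and the `--flag value` spelling follows the llama.cpp convention, which may not match the runtime actually used here.

```python
# Illustrative only: SAMPLING and to_cli_args are hypothetical names,
# not something taken from the thread or from any library.
SAMPLING = {
    "temp": 0.0,
    "top_k": 20,
    "top_p": 0.95,
    "min_p": 0.0,
    # Assumption: "repeat 1" means a repeat penalty of 1.0 (no penalty).
    "repeat_penalty": 1.0,
}


def to_cli_args(settings):
    """Flatten a settings dict into ['--flag', 'value', ...] pairs."""
    args = []
    for name, value in settings.items():
        # Underscores in the key become dashes in the flag name.
        args.append("--" + name.replace("_", "-"))
        args.append(str(value))
    return args


if __name__ == "__main__":
    # Print the arguments as one space-separated string.
    print(" ".join(to_cli_args(SAMPLING)))
```

Check the flag names against your own runtime's help output before relying on them.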

This is the most coherent 27B model, even better than some 200B models now.


Thank you for your support!

How many tokens/s do you get on your hardware?
