This vs. the 24B version.
I was using the original model at Q5_K_M, but this one I can only realistically run at Q3_K_M (I'm testing at Q4_K_M for now). The results are really weird: sometimes it seems to outperform the base model even at the lower quant, other times it completely shits itself where the base model at a higher quant chugs along. I wonder what other people's experiences are with these on low-VRAM setups (16 GB). I don't doubt this model is better than base if you can run it at Q6 or higher, but when you're forced down to a lower quant than the base model, things aren't simple one way or the other.

In any case, this is a ripe opportunity for testing and experimenting. With more subjective data from lots of people, if it turns out that in general an upscale at a lower quant beats base at a higher quant, this could become the new standard for finetunes.
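For anyone wanting to sanity-check what actually fits before downloading: here's a rough back-of-the-envelope sketch. The bits-per-weight figures are approximate values for llama.cpp k-quants (assumptions, not measurements), real GGUF files add metadata and vary with the layer mix, and the 2 GB overhead allowance for KV cache/context is a guess. It also ignores partial CPU offload, which is how oversized quants get run on 16 GB in practice.

```python
# Rough GGUF size estimate: params * bits-per-weight / 8.
# Approximate effective bpw for llama.cpp k-quants (assumed, not exact):
BPW = {"Q3_K_M": 3.91, "Q4_K_M": 4.85, "Q5_K_M": 5.69, "Q6_K": 6.59}

def est_size_gb(params_billion: float, quant: str) -> float:
    """Estimated weight file size in GB for a given quant."""
    return params_billion * BPW[quant] / 8

def fits(params_billion: float, quant: str, vram_gb: float = 16.0,
         overhead_gb: float = 2.0) -> bool:
    """Crude full-offload check: weights + some room for KV cache.

    Ignores partial CPU offload, so a False here just means "tight",
    not "impossible".
    """
    return est_size_gb(params_billion, quant) + overhead_gb <= vram_gb

for q in BPW:
    print(f"24B @ {q}: ~{est_size_gb(24, q):.1f} GB,"
          f" full offload in 16 GB: {fits(24, q)}")
```

By this estimate a 24B model at Q5_K_M is already ~17 GB of weights alone, which is why the lower-quant-upscale vs. higher-quant-base tradeoff even comes up on 16 GB cards.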