Was Q8_0 quantized from original model or is it a copy of -FP8 variant?
#3
by lostmsu - opened
Would highly prefer the latter as a re-release of the official weights.
It's from the original model. Quantizing from FP8 wouldn't work nicely because we'd have to cast it back up to BF16 before quantizing anyway.
llama.cpp doesn't have an FP8 quant type, and FP8 doesn't map 1:1 to Q8_0.
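To see why the two formats don't map 1:1: FP8 is a floating-point format where each value carries its own exponent, while Q8_0 stores blocks of 32 int8 values sharing a single per-block scale. Below is a minimal NumPy sketch of Q8_0-style block quantization (the function names are illustrative, not llama.cpp API):

```python
import numpy as np

def quantize_q8_0(x: np.ndarray):
    """Q8_0-style quantization sketch: 32-value blocks,
    one float scale per block, int8 quantized values."""
    assert x.size % 32 == 0, "Q8_0 operates on blocks of 32 values"
    blocks = x.reshape(-1, 32).astype(np.float32)
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    d = amax / 127.0                 # per-block scale
    d[d == 0] = 1.0                  # avoid divide-by-zero on all-zero blocks
    q = np.round(blocks / d).astype(np.int8)
    return d, q

def dequantize_q8_0(d: np.ndarray, q: np.ndarray) -> np.ndarray:
    # Reconstruct approximate floats: value = scale * int8
    return (d * q.astype(np.float32)).reshape(-1)

rng = np.random.default_rng(0)
x = rng.standard_normal(64).astype(np.float32)
d, q = quantize_q8_0(x)
y = dequantize_q8_0(d, q)
# Rounding error per value is bounded by half the block's scale.
```

Starting from FP8 weights would mean first dequantizing them to BF16 and then re-running exactly this kind of per-block scaling, so quantizing directly from the original BF16 weights skips a lossy round-trip.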