Was Q8_0 quantized from original model or is it a copy of -FP8 variant?
#3
by lostmsu - opened
Would highly prefer the latter as a re-release of the official weights.
It's from the original model. Quantizing from FP8 wouldn't work nicely because we'd have to cast it back up to BF16 before quantizing anyway.
llama.cpp doesn't have an FP8 quant type, and FP8 doesn't map 1:1 to Q8_0.
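To see why the two formats don't map 1:1: FP8 is a floating-point format where each value carries its own exponent, while Q8_0 stores blocks of 32 int8 values sharing a single per-block scale. Below is a minimal NumPy sketch of Q8_0-style block quantization (the function names are illustrative, not llama.cpp API):

```python
import numpy as np

def quantize_q8_0(x: np.ndarray):
    """Q8_0-style quantization sketch: 32-value blocks,
    one float scale per block, int8 quantized values."""
    assert x.size % 32 == 0, "Q8_0 operates on blocks of 32 values"
    blocks = x.reshape(-1, 32).astype(np.float32)
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    d = amax / 127.0                 # per-block scale
    d[d == 0] = 1.0                  # avoid divide-by-zero on all-zero blocks
    q = np.round(blocks / d).astype(np.int8)
    return d, q

def dequantize_q8_0(d: np.ndarray, q: np.ndarray) -> np.ndarray:
    # Reconstruct approximate floats: value = scale * int8
    return (d * q.astype(np.float32)).reshape(-1)

rng = np.random.default_rng(0)
x = rng.standard_normal(64).astype(np.float32)
d, q = quantize_q8_0(x)
y = dequantize_q8_0(d, q)
# Rounding error per value is bounded by half the block's scale.
```

Starting from FP8 weights would mean first dequantizing them to BF16 and then re-running exactly this kind of per-block scaling, so quantizing directly from the original BF16 weights skips a lossy round-trip.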