This model wasn't trained with FP4 or NVFP4

#8 opened by yangus87

Obviously, this model wasn't trained with FP4 or NVFP4. Its size is half that of the original model, which is stored in FP16 precision. If it had been trained or compressed with FP4, it wouldn't weigh more than about 20 GB. This looks like FP8 compression, not NVFP4.
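
To make the size argument concrete, here is the back-of-the-envelope math, assuming roughly 31B parameters (as the model name suggests) and ignoring the small overhead of quantization scales and metadata:

```python
# Rough checkpoint sizes for a ~31B-parameter model at different precisions.
# The parameter count is an assumption based on the model name; quantization
# scale factors and metadata overhead are ignored.
params = 31e9

for name, bits_per_weight in [("FP16/BF16", 16), ("FP8", 8), ("FP4/NVFP4", 4)]:
    size_gb = params * bits_per_weight / 8 / 1e9  # bits -> bytes -> GB
    print(f"{name:10s} ~{size_gb:.0f} GB")

# Prints roughly: FP16/BF16 ~62 GB, FP8 ~31 GB, FP4/NVFP4 ~16 GB,
# which is why a checkpoint at half the FP16 size looks like FP8 rather than FP4.
```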

Exactly, that's why I further quantized it to 18.4 GB.

Here is the model card of Gemma 4 31B Turbo ⚡️
https://huggingface.co/LilaRest/gemma-4-31B-it-NVFP4-turbo
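
For anyone who wants to try the linked checkpoint, here is a minimal loading sketch with vLLM, assuming a vLLM build with compressed-tensors NVFP4 support and compatible hardware; whether this particular repo loads out of the box is an assumption, not something confirmed in this thread.

```python
# Hedged sketch: loading the linked NVFP4 checkpoint for inference with vLLM.
# Assumes vLLM with compressed-tensors NVFP4 support and suitable hardware.
from vllm import LLM, SamplingParams

llm = LLM(model="LilaRest/gemma-4-31B-it-NVFP4-turbo")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain NVFP4 quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```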
