This model wasn't trained with FP4 or NVFP4

#8 opened by yangus87

Obviously, this model wasn't trained with FP4 or NVFP4. Its size is half that of the original model, which is stored in FP16 precision. If it had been trained or compressed with FP4, it wouldn't weigh more than about 20 GB. This looks like FP8 compression, not NVFP4.
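
To make the size argument concrete, here is the back-of-the-envelope math, assuming roughly 31B parameters (as the model name suggests) and ignoring the small overhead of quantization scales and metadata:

```python
# Rough checkpoint sizes for a ~31B-parameter model at different precisions.
# The parameter count is an assumption based on the model name; quantization
# scale factors and metadata overhead are ignored.
params = 31e9

for name, bits_per_weight in [("FP16/BF16", 16), ("FP8", 8), ("FP4/NVFP4", 4)]:
    size_gb = params * bits_per_weight / 8 / 1e9  # bits -> bytes -> GB
    print(f"{name:10s} ~{size_gb:.0f} GB")

# Prints roughly: FP16/BF16 ~62 GB, FP8 ~31 GB, FP4/NVFP4 ~16 GB,
# which is why a checkpoint at half the FP16 size looks like FP8 rather than FP4.
```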

Exactly, that's why I further quantized it to 18.4 GB.

Here is the model card of Gemma 4 31B Turbo ⚡️
https://huggingface.co/LilaRest/gemma-4-31B-it-NVFP4-turbo
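
For anyone who wants to try the linked checkpoint, here is a minimal loading sketch with vLLM, assuming a vLLM build with compressed-tensors NVFP4 support and compatible hardware; whether this particular repo loads out of the box is an assumption, not something confirmed in this thread.

```python
# Hedged sketch: loading the linked NVFP4 checkpoint for inference with vLLM.
# Assumes vLLM with compressed-tensors NVFP4 support and suitable hardware.
from vllm import LLM, SamplingParams

llm = LLM(model="LilaRest/gemma-4-31B-it-NVFP4-turbo")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain NVFP4 quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```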
