Is it 158B or 284B params?

#17
by celsowm - opened

It's confusing because the model card shows a size of 158B, but the README says 284B.

I have no idea; maybe the new fp8 tensor types throw off the Hugging Face count.

It's 284B. The Hugging Face count gets confused by the compression.

It's a 158B-sized 284B model with fp8/fp4-fused weights.

What does this even mean? Genuine question: "158B-sized 284B model"?

This is a natively quantized model with about 158 GB of weight files, which is the same size as a standard 158B model in fp8 precision.
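To make the size comparison concrete, here is a rough back-of-the-envelope sketch. It assumes 1 GB = 1e9 bytes and ignores quantization scale factors and any non-weight tensors; the function name `weight_size_gb` is just for illustration.

```python
def weight_size_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes), ignoring scales and metadata."""
    return num_params * bits_per_param / 8 / 1e9

# A plain 158B model stored entirely in fp8 (8 bits per parameter).
print(weight_size_gb(158e9, 8))  # about 158 GB

# A 284B model stored entirely in fp8 vs. entirely in fp4.
print(weight_size_gb(284e9, 8))  # about 284 GB
print(weight_size_gb(284e9, 4))  # about 142 GB
```

Under these assumptions, 158 GB falls between the all-fp8 and all-fp4 sizes of a 284B model, which fits a mix of the two formats.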

Theoretically, flash is indeed a 284B-parameter model.
However, thanks to its mixed-precision quantization design and extensive use of the newer fp4 floating-point format, its files are only as large as those of a 158B model, so the HF system mistakenly reports 158B parameters.
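As a rough illustration of how the two numbers can fit together, the sketch below estimates what share of the weights would have to be fp4. It assumes only two formats are used (fp8 at one byte per parameter, fp4 packed two per byte) and that the files hold about 158e9 bytes of weights; neither assumption is confirmed in this thread, and scale factors are ignored.

```python
def fp4_fraction(total_params: float, stored_bytes: float) -> float:
    """Fraction f of parameters in fp4, assuming the rest are fp8.

    bytes = (1 - f) * N * 1.0 + f * N * 0.5   =>   f = 2 * (1 - bytes / N)
    """
    return 2 * (1 - stored_bytes / total_params)

f = fp4_fraction(total_params=284e9, stored_bytes=158e9)
print(f"Estimated share of parameters in fp4: {f:.1%}")
```

With these inputs the estimate comes out a little under 90%, so most of the weights would need to be in fp4 for a 284B model to occupy about 158 GB.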
