Weight size and VRAM usage much higher than the original model

#2 · opened by SteveImmanuel

Hi, your model seems to do better at not refusing requests. I am curious, however, why your final weights are much bigger than the original gpt-oss-120b. Is it because they are unquantized, or does your approach actually require storing more weights? Thanks
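For anyone wondering about the size gap: some back-of-the-envelope arithmetic, assuming a round ~120B parameters (the real count differs slightly) and the MXFP4 layout from the OCP Microscaling spec (4-bit elements plus one 8-bit shared scale per 32-element block):

```python
N = 120e9  # assumed parameter count for a ~120B model (round number, not exact)

# Unquantized bf16: 2 bytes per weight
bf16_gb = N * 2 / 1e9

# MXFP4: 0.5 bytes per weight + one 1-byte (E8M0) scale per 32-element block
mxfp4_gb = (N * 0.5 + N / 32) / 1e9

print(f"bf16:  {bf16_gb:.0f} GB")   # → bf16:  240 GB
print(f"mxfp4: {mxfp4_gb:.1f} GB")  # → mxfp4: 63.8 GB
print(f"ratio: {bf16_gb / mxfp4_gb:.2f}x")  # → ratio: 3.76x
```

So a finetune saved in bf16 is expected to be roughly 4x the size of the original mxfp4 checkpoint even with identical weights; the extra size doesn't by itself imply more parameters.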

If you quantize it to mxfp4 yourself, you shouldn't have any problems with that.

Has anyone got an mxfp4 version of this model? Not GGUF, I mean.
