Weight size and VRAM usage much higher than the original model
#2
by SteveImmanuel
Hi, your model seems to do better at not refusing requests. I am curious, however, why your final weights are much bigger than the original gpt-oss-120b's. Is it because they are unquantized, or does your approach actually require storing more weights? Thanks
If you quantize it to mxfp4 yourself, you shouldn't have any problems with that.
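For reference, a minimal sketch of what that could look like using the `Mxfp4Config` that transformers added for gpt-oss support. The repo id below is a placeholder, not the actual model, and on-the-fly MXFP4 quantization assumes a recent transformers release plus the triton kernels it depends on:

```python
# Sketch: load a bf16 gpt-oss-120b finetune and quantize it back to MXFP4 on load.
# "your-org/gpt-oss-120b-finetune" is a placeholder repo id, not the model in this thread.
from transformers import AutoModelForCausalLM, AutoTokenizer, Mxfp4Config

model_id = "your-org/gpt-oss-120b-finetune"  # hypothetical finetuned checkpoint

# Quantize the MoE expert weights to MXFP4 when loading
# (assumes triton + triton kernels are installed for the MXFP4 path).
quant_config = Mxfp4Config()

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Optionally save the quantized checkpoint so it loads at MXFP4 size next time.
model.save_pretrained("gpt-oss-120b-finetune-mxfp4")
tokenizer.save_pretrained("gpt-oss-120b-finetune-mxfp4")
```

Saving the re-quantized checkpoint should bring the on-disk size and VRAM footprint back in line with the original MXFP4 release.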
Has anyone got an mxfp4 version of this model? Not GGUF, I mean.