Please move the metal/ and original/ folders to a separate branch
#160
by mratsim - opened
They triple the size that needs to be downloaded.
I don't think there is a single use case where you want to run inference on the main mxfp4 weights plus metal/ plus original/.
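Until the folders are split out, downloads can skip them client-side. A minimal sketch, assuming the `huggingface_hub` `ignore_patterns` mechanism and a hypothetical repo id and file listing (the actual repo name and file names here are illustrative, not taken from the thread):

```python
from fnmatch import fnmatch

# Patterns covering the two folders this thread asks to move out.
IGNORE_PATTERNS = ["metal/*", "original/*"]

def is_ignored(path: str) -> bool:
    """Return True if a repo file path matches any ignore pattern."""
    return any(fnmatch(path, pat) for pat in IGNORE_PATTERNS)

# Hypothetical repo listing, for illustration only:
files = [
    "config.json",
    "model-00001-of-00002.safetensors",
    "metal/model.bin",
    "original/model-00001-of-00007.safetensors",
]
kept = [f for f in files if not is_ignored(f)]
print(kept)  # only the config and the mxfp4 safetensors survive

# The same patterns can be handed to huggingface_hub so the extra
# folders are never fetched at all (repo id assumed):
# from huggingface_hub import snapshot_download
# snapshot_download("openai/gpt-oss-120b", ignore_patterns=IGNORE_PATTERNS)
```

The CLI equivalent would be `huggingface-cli download <repo> --exclude "metal/*" "original/*"`, which avoids pulling roughly two thirds of the repo for a plain inference setup.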
Literally this!!
Man, I thought I had to download ~65 GB to run this model. I followed the NVIDIA playbook guides..
It downloads the whole mess and then some:
All the safetensors files -- ~65 GB total
The Apple Metal bin file -- ~65 GB total
Some more safetensors -- 6+ files, ~10 GB... I don't know what is going on. And these were NVIDIA's official TensorRT-LLM instructions. So stupid.. I should have used llama.cpp.