REAP & AutoRound
Could someone run REAP on the model weights to reduce the memory size?
https://github.com/CerebrasResearch/reap
It would also be interesting to see how well the pruned model can be quantised afterwards, e.g. with Intel AutoRound.
Which hardware are you going to deploy the model on?
I have a Mac M2 with 96 GB that I can try models on. I wish I had more RAM, though; I was going to build a PC with 512 GB, but RAM prices went through the roof, so I ended up buying a second-hand M2 Mac to test on for the moment.
I'm quite surprised that, with good quantisation, a lot of the biggest models can fit within that RAM size. I'm just pushing the limits at the moment.
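As a rough back-of-the-envelope check of that claim, the weights alone take roughly `params × bits / 8` bytes. The parameter counts below are illustrative assumptions, not measurements of any particular model, and the estimate ignores the KV cache, activations, and quantisation metadata:

```python
def quantised_weight_gb(params_billion: float, bits: float) -> float:
    """Approximate in-memory size of the weights alone:
    params * bits-per-weight / 8 bits-per-byte (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits / 8 / 1e9

# Illustrative model sizes (assumptions for the sketch):
for params_b in (70, 120, 235):
    for bits in (16, 8, 4):
        size = quantised_weight_gb(params_b, bits)
        verdict = "fits" if size < 96 else "too big"
        print(f"{params_b}B @ {bits}-bit ≈ {size:.1f} GB -> {verdict} in 96 GB")
```

So a ~120B-parameter model at 4-bit is around 60 GB of weights and fits in 96 GB on paper, while the same model at 8-bit does not. In practice the ceiling is lower, since real quantised checkpoints carry scales/zero-points and the runtime needs headroom for the KV cache and the OS.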
For application or for research? It's much more difficult to run this model on a Mac M2 with 96 GB... the memory is too small, even with two machines.
I just wanted to use the model and see if I can get it working, mostly comparing it against other models and trying to find the ideal model for running on my machine.