Is it possible to reduce the model size to 16B?

#1
by thesby - opened

A large model is very hard to run on a PC. Is it possible to reduce the model so it can run on a local PC?

Since the original 26B model was doubled to produce this more diverse 48B model, simply compressing the 48B model down to 16B would likely degrade its capabilities more than compressing the original 26B model to 16B would. If you need a 16B MoE model, we recommend trying the REAP model derived from the official model.

We are trying to prune the 26B model to make it smaller.

If it’s feasible, would you consider pruning the 31B model down to 14B as well? I think a 14B dense model could be a really good “sweet spot” for practical local use. Thanks!

MoE still works at reduced scale because each expert has its own sub-network, so whole experts can be pruned or merged away. In contrast, 31B is a dense model (a single, massive neural network); it requires a different compression method than MoE-style pruning or merging, and the most recent example is turbo quant. In other words, it won't be easy.
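To illustrate the structural point above: in a MoE layer, each expert is an independent sub-network, so removing experts shrinks the parameter count without touching the survivors; a dense layer has no such removable units. This is only a toy sketch of that idea (all class and parameter names here are hypothetical), not the actual pruning method used for this model:

```python
# Toy sketch: why expert-level pruning is structurally easy for MoE.
# Each expert is an independent sub-network, so dropping experts
# shrinks the layer without modifying the experts that remain.

class Expert:
    """Stand-in for one expert's feed-forward sub-network."""
    def __init__(self, n_params):
        self.n_params = n_params  # pretend parameter count

class MoELayer:
    def __init__(self, num_experts, params_per_expert):
        self.experts = [Expert(params_per_expert) for _ in range(num_experts)]

    def total_params(self):
        return sum(e.n_params for e in self.experts)

    def prune(self, keep_ids):
        """Keep only the experts whose index is in keep_ids.
        The kept experts are untouched -- this is why a MoE model
        can shrink gracefully, unlike a single dense network."""
        self.experts = [e for i, e in enumerate(self.experts)
                        if i in keep_ids]

layer = MoELayer(num_experts=64, params_per_expert=10)
layer.prune(keep_ids=set(range(32)))  # e.g. keep the 32 most-used experts
print(layer.total_params())  # half the original parameter count
```

In a real pruning pipeline, `keep_ids` would be chosen from routing statistics (which experts actually fire on representative data), but the structural advantage is the same.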

Thanks for the answer. I’m not a developer, so I didn’t realize it would be that difficult, but I’m hopeful we’ll see new 14B dense models in the future.
