Is it possible to reduce the model size to 16B?

#1
by thesby - opened

A large model is very hard to run on a PC. Is it possible to reduce the model so it can run on a local PC?

Since the original 26B model was doubled to produce this more diverse 48B model, simply compressing the 48B model down to 16B would likely degrade its capabilities more than compressing the original 26B model to 16B would. If you need a 16B MoE model, we recommend trying the REAP model derived from the official model.

We are trying to prune the 26B model to make it smaller.

If it’s feasible, would you consider pruning the 31B model down to 14B as well? I think a 14B dense model could be a really good “sweet spot” for practical local use. Thanks!

MoE still works at reduced scale because each expert has its own sub-network, so whole experts can be pruned or merged away. In contrast, 31B is a dense model (a single, massive neural network); it requires a different compression method than MoE-style pruning or merging, and the most recent example is turbo quant. In other words, it won't be easy.
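To illustrate the structural point above: in a MoE layer, each expert is an independent sub-network, so removing experts shrinks the parameter count without touching the survivors; a dense layer has no such removable units. This is only a toy sketch of that idea (all class and parameter names here are hypothetical), not the actual pruning method used for this model:

```python
# Toy sketch: why expert-level pruning is structurally easy for MoE.
# Each expert is an independent sub-network, so dropping experts
# shrinks the layer without modifying the experts that remain.

class Expert:
    """Stand-in for one expert's feed-forward sub-network."""
    def __init__(self, n_params):
        self.n_params = n_params  # pretend parameter count

class MoELayer:
    def __init__(self, num_experts, params_per_expert):
        self.experts = [Expert(params_per_expert) for _ in range(num_experts)]

    def total_params(self):
        return sum(e.n_params for e in self.experts)

    def prune(self, keep_ids):
        """Keep only the experts whose index is in keep_ids.
        The kept experts are untouched -- this is why a MoE model
        can shrink gracefully, unlike a single dense network."""
        self.experts = [e for i, e in enumerate(self.experts)
                        if i in keep_ids]

layer = MoELayer(num_experts=64, params_per_expert=10)
layer.prune(keep_ids=set(range(32)))  # e.g. keep the 32 most-used experts
print(layer.total_params())  # half the original parameter count
```

In a real pruning pipeline, `keep_ids` would be chosen from routing statistics (which experts actually fire on representative data), but the structural advantage is the same.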

Thanks for the answer. I’m not a developer, so I didn’t realize it would be that difficult, but I’m hopeful we’ll see new 14B dense models in the future.
