Mixture of Experts (MoE) Collection
Sometimes I fine-tune models specifically to take on expert roles in a MoE configuration; other times I find interesting models that others have fine-tuned.
An 18B-parameter Mixture of Experts model combining 8 specialized 3B experts, with 2 experts activated per token by default (configurable up to 4 at inference).
For more information about this model, including access to the safetensors files, please see theprint/theprint-moe-8x3-0126.
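A minimal sketch of loading the model and raising the number of active experts for inference, assuming a Mixtral-style Hugging Face config where `num_experts_per_tok` controls how many experts the router activates per token; the repo id is the one above, everything else here is illustrative rather than the card's official instructions.

```python
# Sketch only: assumes a Mixtral-style MoE config exposing `num_experts_per_tok`.
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

repo_id = "theprint/theprint-moe-8x3-0126"

# Load the published config, then raise the active-expert count
# from the default (2) to the stated maximum of 4.
config = AutoConfig.from_pretrained(repo_id)
config.num_experts_per_tok = 4  # assumption: attribute name follows Mixtral-style configs

model = AutoModelForCausalLM.from_pretrained(repo_id, config=config, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(repo_id)

prompt = "Explain what a Mixture of Experts model is in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Activating more experts raises compute per token, since only the selected experts' parameters run for each token; the default of 2 keeps the active parameter count well below the full 18B.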
Available quantizations (a download sketch follows the list):
- 2-bit
- 3-bit
- 4-bit
- 5-bit
- 6-bit
- 8-bit
- 16-bit
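If only one quantization level is needed locally, it can be fetched selectively instead of pulling the whole repo. A hypothetical sketch using `huggingface_hub.snapshot_download`, assuming the quantized files carry the bit width in their names; check the repo's file listing for the actual naming scheme before relying on a pattern like this.

```python
# Hypothetical sketch: the "*4bit*" pattern is an assumption about file naming,
# not a documented convention for this repo.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="theprint/theprint-moe-8x3-0126",      # repo id from the model card above
    allow_patterns=["*4bit*", "*.json", "*.txt"],  # only the 4-bit weights plus config/metadata
)
print("Downloaded to:", local_dir)
```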
Base model: theprint/theprint-moe-8x3-0126