Is this a merge?
#3
by mrfakename - opened
Hi,
Is this a merge or a pretrained model?
At its core, GemMoE comprises 8 separately fine-tuned Gemma models, with 2 experts per token
Thanks! I assume this means that each expert was fine-tuned, then merged?
We combine them using a hidden gate, via a heavily modified version of mergekit, a tool developed by the brilliant Charles Goddard.
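For anyone curious how "2 experts per token" routing works in general, here is a minimal sketch of top-2 gating. This is purely illustrative, not GemMoE's actual gate (which is baked into the merged weights); the function and variable names are made up for the example:

```python
import numpy as np

def top2_gate(hidden, gate_weights):
    """Route one token's hidden state to the top 2 of n experts.

    hidden: (d,) token hidden state
    gate_weights: (d, n_experts) gate projection (hypothetical, for illustration)
    Returns the indices of the two chosen experts and their mixing weights.
    """
    logits = hidden @ gate_weights               # (n_experts,) router scores
    top2 = np.argsort(logits)[-2:][::-1]         # indices of the 2 best experts
    scores = np.exp(logits[top2] - logits[top2].max())
    weights = scores / scores.sum()              # softmax renormalized over the top 2
    return top2, weights

# toy example: 4-dim hidden state, 8 experts (matching the 8-expert setup above)
rng = np.random.default_rng(0)
hidden = rng.normal(size=4)
gate = rng.normal(size=(4, 8))
experts, mix = top2_gate(hidden, gate)
```

The token's output is then `mix[0] * expert_a(hidden) + mix[1] * expert_b(hidden)`, so only 2 of the 8 experts run per token.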
Ah, makes sense. Thanks for the clarification!!
mrfakename changed discussion status to closed