Wow what a cool experiment
Tell me about it, I am super interested in the Mistral Nemo model and ofc I have a bunch of experiments from pruning layers and experts and making weird MoEs. You match my vibe, and I am interested in your experiments and wanna hear what intuitions you gathered from them. Let's chat sometime. :)
Love your work,
TroyDoesAI
Oh hey, thanks for the comment! I am also a big fan of Mistral Nemo, which motivated this experiment. Unfortunately, the problem with this model is that I used LoRA on it, which isn't enough to repair it beyond restoring coherence. It needs proper full training, which is currently out of my reach. This one is public, but I also have the raw untrained base on a private repo (it outputs nothing except dots and dashes UNLESS you activate all 16 experts at once, which defeats the purpose). Essentially, without training, the model needs enough active experts to replicate its source's intermediate dim to be coherent. The LoRA training did prove that the model is capable of learning and functioning, but I am currently unable to train all parameters.
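The "needs all 16 experts to be coherent" intuition can be sketched with some quick arithmetic: if a dense FFN is sliced into N experts, a token only sees the source model's full intermediate width when enough experts fire to cover it. The dims below are illustrative assumptions for the sketch, not necessarily the actual Mistral Nemo or frankenMoE sizes.

```python
# Sketch: how many sliced experts must be active per token before the
# frankenMoE's combined FFN width matches the dense source model's?
# (Dims are hypothetical examples, not the real model's config.)

def experts_needed(source_intermediate_dim: int, expert_intermediate_dim: int) -> int:
    """Minimum number of active experts whose combined intermediate
    width covers the source FFN's intermediate width (ceiling division)."""
    return -(-source_intermediate_dim // expert_intermediate_dim)

# e.g. a 14336-wide dense FFN sliced evenly into 16 experts of 896 each:
print(experts_needed(14336, 896))  # -> 16, i.e. every expert must be routed
```

This is why the untrained base only produces output with all 16 experts active: with top-k routing over a subset, the slices can't reconstruct the full intermediate projection without training to compensate.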