smaller size coming?
#1
by Crownelius - opened
This looks very enticing
Do you mean like the smaller models in the Gemma series?
Yeah, E4B is next. I'll do the MoE last, combining the best of each run.
The smaller Gemma models are good, just not as good as Qwen 3.5 on benchmarks. I'm assuming that's because of how overtrained the Qwen models are.
I also think the E4B = 8B model scheme is interesting, though the bigger models are very embedding-heavy.