gemma-4-26B-A4B-it
Could you throw these tunes (and possibly the Opus finetune) at the 26B MoE model? Allegedly it performs much better on Strix Halo/Spark systems than the 31B dense model. FWIW, the 31B dense model is already outperforming Qwen 3.5.
Separately (and this may be a local optimization problem), gemma-4 absolutely crawls once input goes past the ~30k-token range, in a way that other models like GPT-OSS/Qwen3.5 didn't. I haven't had a chance to test vanilla Gemma 4, so this is more an anecdotal FYI than a bug report.
Working on a Gemma 4 19B-A4B fine tune right now; it will be out later today, provided it pans out okay.
26B / 21B A4Bs are on the horizon.
RE: crawls: make sure your llama.cpp / AI app is up to date, as there have been major changes in the past few days to address these issues.
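If it helps to verify whether the update fixed it, here's a rough timing sketch using llama-cpp-python that measures prompt-processing speed at increasing context sizes; the model filename and dummy token ids are placeholders, not a real release. Throughput should stay roughly flat past 30k tokens on a current build.

```python
# Minimal prompt-processing benchmark sketch (assumes llama-cpp-python is
# installed; "gemma-4-26b-a4b-it.gguf" is a hypothetical local path).
import time
from llama_cpp import Llama

llm = Llama(
    model_path="gemma-4-26b-a4b-it.gguf",  # placeholder path
    n_ctx=40960,       # enough room for a 30k+ token prompt
    n_gpu_layers=-1,   # offload all layers if memory allows
    verbose=False,
)

# Time prompt ingestion at increasing lengths to see where it falls off a cliff.
for n_tokens in (4096, 16384, 32768):
    # Arbitrary valid token id repeated as filler; any in-vocab id works here.
    prompt_tokens = [llm.token_bos()] + [1000] * (n_tokens - 1)
    llm.reset()  # clear the KV cache between runs
    start = time.perf_counter()
    llm.eval(prompt_tokens)
    elapsed = time.perf_counter() - start
    print(f"{n_tokens:>6} tokens: {elapsed:6.1f}s ({n_tokens / elapsed:,.0f} tok/s)")
```

If tokens/sec collapses between the 16k and 32k runs on the latest build, that would point to something beyond the recent llama.cpp fixes.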