A very impressive model!
I've tested Qwen3-30B-A3B-Mixture-2507 using a somewhat informal code-reasoning test I devised to help me figure out which models are worth using for my work. This model performs very well, both in speed and in arriving at the correct answers. It does far better than the Qwen3 models on which it was based.
I've played around with several of your other models too, but thus far, I like this one the best. Here's what I've tried:
Qwen3-30B-A3B-CoderThinking-YOYO-linear:
I like this one too. It also performs well, both in speed and in the quality of its answers. (The "Mixture" model provides better answers, though, at least in the limited testing I've done so far.)
Qwen3-30B-A3B-YOYO-V2:
Qwen3-30B-A3B-YOYO-V3:
For both of these, when using vLLM, I had to start them like this:
VLLM_ATTENTION_BACKEND=DUAL_CHUNK_FLASH_ATTN vllm serve YOYO-AI/Qwen3-30B-A3B-YOYO-V3 -tp 2 --max-model-len 262144 --dtype bfloat16 --enforce-eager
On my hardware (dual NVIDIA RTX Pro 6000 Blackwell), this does not perform well. In fact, changing -tp 2 to -tp 1 might be slightly faster. On my reasoning test, I see 9-10 tps with these models, whereas I often see 90+ tps with your other models. That said, it could be that I just don't know the correct way to start these models.
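For anyone who wants to try the single-GPU variant I mentioned, this is roughly what I mean; it's just the launch command above with -tp 1. Treat it as a sketch rather than a tuned configuration: I only know that vLLM refused to start these two merges without the DUAL_CHUNK_FLASH_ATTN backend, and the context length and dtype are simply carried over from my command above.

```shell
# Same launch as above, but with tensor parallelism disabled (-tp 1).
# The attention-backend env var is what vLLM required for these merges;
# the remaining flags are carried over unchanged, not tuned.
VLLM_ATTENTION_BACKEND=DUAL_CHUNK_FLASH_ATTN \
vllm serve YOYO-AI/Qwen3-30B-A3B-YOYO-V3 \
  -tp 1 \
  --max-model-len 262144 \
  --dtype bfloat16 \
  --enforce-eager
```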
Regardless, I do appreciate your efforts and can't wait to see what you come up with next!