Where can I try this?
The model looks really good, but I don't have the hardware to run it. I can't find any inference providers, neither here on HF nor on OpenRouter, and Qwen themselves don't even seem to have it hosted and available?
What type of GPU would I need to rent to self-host this? The full quality model, not quantized.
2 RTX 5090s (64 GB total) will fit the full bf16 version, since 27B params at 2 bytes each is roughly 54 GB of weights.
FP8 is good enough, and it only needs half the VRAM you'd need for the full BF16.
The FP8 version is about 55% the size of BF16, so you'd still need 2 RTX 4090s to run it, which is not much cheaper to rent than 5090s.
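For anyone sizing this themselves, here's a rough back-of-the-envelope sketch of the weight memory at each precision. These are lower bounds for weights only: KV cache, activations, and framework overhead typically add another 10-30%, and a real FP8 checkpoint can land above 50% of BF16 if some layers (embeddings, norms) stay in higher precision.

```python
# Rough VRAM estimate for the weights of a dense 27B-parameter model.
# Weights only -- KV cache and runtime overhead come on top of this.

BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0}

def weight_gb(params_billion: float, dtype: str) -> float:
    """Approximate weight memory in GB at the given precision."""
    return params_billion * BYTES_PER_PARAM[dtype]

for dtype in ("bf16", "fp8"):
    print(f"{dtype}: ~{weight_gb(27, dtype):.0f} GB of weights")
# bf16 -> ~54 GB, so 2x RTX 5090 (2x32 GB) fits with headroom
# fp8  -> ~27 GB, so 2x RTX 4090 (2x24 GB) fits with headroom
```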
Qwen3.6 27B just got added to OpenRouter: https://openrouter.ai/qwen/qwen3.6-27b
Sadly, no prompt caching from anyone hosting the full model, including Alibaba.
$3.60/M output seems expensive for such a small model?
