Where can I try this?

#21
by mindplay - opened

The model looks really good, but I don't have the hardware to run it. I can't find any inference providers, neither here on HF nor on OpenRouter - Qwen themselves don't even seem to have it hosted anywhere?

What type of GPU would I need to rent to self-host this? The full quality model, not quantized.

2 RTX 5090s (32 GB each) will be able to fit the full BF16 version.


FP8 is good enough, and it needs only about half the VRAM of the full BF16 version.

The FP8 version is 55% the size of BF16, so you would still need 2 RTX 4090s to run it, which is not much cheaper to rent than 5090s.
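The arithmetic behind these sizing estimates can be sketched as below. This is a rough weights-only calculation, assuming 27B parameters from the model name; KV cache and activations need additional headroom, and a real FP8 checkpoint may keep some layers in higher precision (which is why the quoted figure is 55% rather than 50%).

```python
# Rough VRAM estimate for model weights alone (KV cache and activations
# are extra). Formula: parameter count x bytes per parameter.

GIB = 2**30

def weight_gib(n_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in GiB."""
    return n_params * bytes_per_param / GIB

params = 27e9  # 27B parameters, assumed from the model name

bf16 = weight_gib(params, 2.0)  # BF16: 2 bytes per parameter
fp8 = weight_gib(params, 1.0)   # pure FP8: 1 byte per parameter

print(f"BF16 weights: ~{bf16:.0f} GiB")  # ~50 GiB -> needs 2x 32 GB 5090s
print(f"FP8 weights:  ~{fp8:.0f} GiB")   # ~25 GiB, or ~28 GiB at the quoted
                                         # 55% ratio -> still over one 24 GB 4090
```

This matches the thread: ~50 GiB of BF16 weights fits across two 32 GB 5090s with room for the KV cache, while 55% of that (~28 GiB) still overflows a single 24 GB 4090.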

3.6 27B just got added to OpenRouter: https://openrouter.ai/qwen/qwen3.6-27b
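For anyone who wants to try it from code rather than the website, here is a minimal sketch against OpenRouter's OpenAI-compatible chat completions endpoint. The model slug is taken from the URL above; `OPENROUTER_API_KEY` is a placeholder you must supply yourself.

```python
# Minimal sketch of calling the model through OpenRouter's
# OpenAI-compatible /chat/completions endpoint (stdlib only).
import json
import os
import urllib.request

API_URL = "https://openrouter.ai/api/v1/chat/completions"
MODEL = "qwen/qwen3.6-27b"  # slug from the OpenRouter URL in the thread

def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build the POST request; kept separate so it can be inspected."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

if __name__ == "__main__":
    key = os.environ.get("OPENROUTER_API_KEY")
    if key:  # only hit the network when a key is actually configured
        with urllib.request.urlopen(build_request("Hello!", key)) as resp:
            print(json.load(resp)["choices"][0]["message"]["content"])
```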


Sadly, no prompt caching from anyone hosting the full model - including Alibaba.


$3.60/M output seems expensive for such a small model?

you can try the model on https://chat.qwen.ai
