AWQ 4-bit version of this Opus-Distilled-v2 model?

#10
by celikburak - opened

Hi,
Thank you for your excellent GGUF quantizations.
I'm using Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2 (the v2 version trained on 14k Opus samples). It's currently the best reasoning model I have for coding and agent tasks: shorter CoT and better efficiency than the base Qwen3.5-27B.

However, I'm on a single RTX 5090 and really want to run it with vLLM + FlashInfer to get MTP (multi-token prediction), continuous batching, and higher throughput.
Would you consider making an AWQ 4-bit version of this Opus-Distilled-v2 model?
The distillation dataset is public, so the data is already available. Many users with 40/50-series cards are waiting for a good AWQ quant of this specific model.
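For context, once an AWQ 4-bit quant exists, serving it on a single GPU with vLLM is a one-liner. This is a minimal sketch: the repo id below is hypothetical (substitute the actual AWQ upload), and exact flags may vary with your vLLM version:

```shell
# Sketch: serve a hypothetical AWQ 4-bit quant of the v2 model with vLLM.
# "some-user/...-AWQ" is an assumed repo id, not a real upload.
pip install vllm

vllm serve some-user/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2-AWQ \
    --quantization awq \
    --max-model-len 32768 \
    --gpu-memory-utilization 0.90
```

With FlashInfer installed, vLLM will pick it up as the attention backend automatically on supported GPUs.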
Thanks in advance!

Best regards

Thank you for your support and kind words — I’m really glad you found the model useful!

The current v2 version is trained purely with SFT, so it mainly learns to imitate the teacher model's behavior; the actual performance improvement is not very significant. Due to limited training resources at the moment, I haven't been able to provide additional quantized versions such as AWQ 4-bit.

You might want to check the community to see if others have already made an AWQ 4-bit version.

Right now, I’m working on reinforcement learning training and testing, aiming to bring a better version to the community in the future.

Thanks again for your support!

@cpatonn I believe you're the best person to help. Hope you can give it a look!

Sure will do :)

I post AWQ quantized models as an individual hobbyist, not as cyankiwi, so please visit my profile for the AWQ versions of both the v1 and v2 models :)
