Why delete Intel/Qwen3.6-35B-A3B-int4-AutoRound?
Thanks for this Intel/Qwen3.6-27B-int4-AutoRound model, but where is the existingIntel/Qwen3.6-35B-A3B-int4-AutoRound repo? Why deleted? Thanks in advance!
as some users report infinite loop issue, we are re-quantizing it. There is a backup at here https://www.modelscope.ai/models/Intel/Qwen3.6-35B-A3B-int4-AutoRound
as some users report infinite loop issue, we are re-quantizing it. There is a backup at here https://www.modelscope.ai/models/Intel/Qwen3.6-35B-A3B-int4-AutoRound
Desperately waiting for this new version. The previous model was so fast and good already, just had to limit its thinking budget because of infinite looping in reasoning. Intel/Qwen3.6-35B-A3B-int4-AutoRound is my goto model for 24GB single GPUs.
working on it, ETA 1 day
I got an infinate loop (!!!!!!) on this with 27b
Could you share the prompts you used for reproduction?
Recently, I read an evaluation report showing that the quantized version of this model is prone to hitting the max_new_tokens limit, and that linear_attn plays a key role in maintaining performance. However, linear_attn accounts for a large portion of the model, so falling back to higher precision for those layers does not seem like a good tradeoff.
I got an infinate loop (!!!!!!) on this with 27b
Thanks for the extra info. I think your last comment go into a different thread. Since the issue is highly model-specific and not easy to reproduce, I don’t think we have the resources to root-cause it at the moment. For now, I would recommend using the repetition penalty provided by the serving framework or trying alternative quantized versions instead.
working on it, ETA 1 day
Thank you for the work you are putting in!
I think the model was again put up online and taken down?
What is the current timeline in which we can expect a working model of Intel/Qwen3.6-35B-A3B-int4-AutoRound?
Thank you for making it online again.
Were you able to reduce / eliminate the infinite loop issue?
As a small team, we have limited resources for deep evaluation, especially since this issue is not easy to reproduce. Because of this, we heavily rely on community feedback. We’d love for you to give it a try and let us know if you run into any issues!
Thank you for making it online again.
Were you able to reduce / eliminate the infinite loop issue?
I don't think so since the infinite loop is present also in the base fp16 model as reported by some users.
Are we sure that the original qwen model doesn't suffer from the infinite loop?
I'm going to test it today also, if I see any more issues I will provide feedback
Are we sure that the original qwen model doesn't suffer from the infinite loop?
As I said, there reports that the original model also suffers from the same issue. Just limit the reasoning tokens budget to something like extra_body: thinking_token_budget: 8192
Ahh, thank you! I missed your previous message. I guess this thread is closed then. Thank you everyone.