Why delete Intel/Qwen3.6-35B-A3B-int4-AutoRound?

by bgeneto - opened Apr 24

Apr 24

Thanks for this Intel/Qwen3.6-27B-int4-AutoRound model, but where is the existingIntel/Qwen3.6-35B-A3B-int4-AutoRound repo? Why deleted? Thanks in advance!

wenhuach

Intel org Apr 25

as some users report infinite loop issue, we are re-quantizing it. There is a backup at here https://www.modelscope.ai/models/Intel/Qwen3.6-35B-A3B-int4-AutoRound

bgeneto

27 days ago

as some users report infinite loop issue, we are re-quantizing it. There is a backup at here https://www.modelscope.ai/models/Intel/Qwen3.6-35B-A3B-int4-AutoRound

Desperately waiting for this new version. The previous model was so fast and good already, just had to limit its thinking budget because of infinite looping in reasoning. Intel/Qwen3.6-35B-A3B-int4-AutoRound is my goto model for 24GB single GPUs.

wenhuach

Intel org 26 days ago

working on it, ETA 1 day

NoNamesLeft1

14 days ago

I got an infinate loop (!!!!!!) on this with 27b

wenhuach

Intel org 14 days ago

Could you share the prompts you used for reproduction?

Recently, I read an evaluation report showing that the quantized version of this model is prone to hitting the max_new_tokens limit, and that linear_attn plays a key role in maintaining performance. However, linear_attn accounts for a large portion of the model, so falling back to higher precision for those layers does not seem like a good tradeoff.

wenhuach

Intel org 11 days ago

•

edited 11 days ago

I got an infinate loop (!!!!!!) on this with 27b

Thanks for the extra info. I think your last comment go into a different thread. Since the issue is highly model-specific and not easy to reproduce, I don’t think we have the resources to root-cause it at the moment. For now, I would recommend using the repetition penalty provided by the serving framework or trying alternative quantized versions instead.

Mayank22201

8 days ago

working on it, ETA 1 day

Thank you for the work you are putting in!
I think the model was again put up online and taken down?
What is the current timeline in which we can expect a working model of Intel/Qwen3.6-35B-A3B-int4-AutoRound?

wenhuach

Intel org 8 days ago

https://huggingface.co/Intel/Qwen3.6-35B-A3B-int4-mixed-AutoRound

Mayank22201

4 days ago

Thank you for making it online again.

Were you able to reduce / eliminate the infinite loop issue?

wenhuach

Intel org 4 days ago

As a small team, we have limited resources for deep evaluation, especially since this issue is not easy to reproduce. Because of this, we heavily rely on community feedback. We’d love for you to give it a try and let us know if you run into any issues!

bgeneto

4 days ago

Thank you for making it online again.

Were you able to reduce / eliminate the infinite loop issue?

I don't think so since the infinite loop is present also in the base fp16 model as reported by some users.

Mayank22201

4 days ago

Are we sure that the original qwen model doesn't suffer from the infinite loop?

ep150de

4 days ago

I'm going to test it today also, if I see any more issues I will provide feedback

bgeneto

4 days ago

Are we sure that the original qwen model doesn't suffer from the infinite loop?

As I said, there reports that the original model also suffers from the same issue. Just limit the reasoning tokens budget to something like extra_body: thinking_token_budget: 8192

bgeneto changed discussion status to closed 4 days ago

bgeneto changed discussion status to open 4 days ago

Mayank22201

4 days ago

Ahh, thank you! I missed your previous message. I guess this thread is closed then. Thank you everyone.

bgeneto changed discussion status to closed 4 days ago

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment