Chinese support is poor, with a very high refusal rate; the same prompts in English are not refused 😒😒😒.

#5
by CCSSNE - opened

https://huggingface.co/brayniac/Qwen3.5-35B-A3B-heretic
This model has a higher KL divergence and a higher measured refusal rate than yours, but offers a better overall experience. It can generate long-form novels with virtually no refusals; I have never had a request refused. 😒😒😒

Owner
•
edited Mar 20

https://huggingface.co/brayniac/Qwen3.5-35B-A3B-heretic
This model has a higher KL divergence and a higher measured refusal rate than yours, but offers a better overall experience. It can generate long-form novels with virtually no refusals; I have never had a request refused. 😒😒😒

I am going to redo it today or tomorrow; KL divergence, as it turns out, is not a very accurate way of measuring final model quality. I cannot check Chinese quality myself, as I am not fluent in the language, but the model does work well in English and got good ratings in the UGI (Uncensored General Intelligence) Leaderboard evaluation:

[screenshot: UGI Leaderboard scores]
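For context on the KL figures quoted in this thread (e.g. 0.0366 vs. 0.0825): KL divergence here presumably compares the modified model's next-token distribution against the original's, averaged over some prompt set. A minimal sketch of the per-position quantity (the averaging setup is an assumption, not this repo's actual measurement code):

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) in nats for two discrete distributions given as
    equal-length probability lists, e.g. the next-token distributions
    of the original model (P) and the modified model (Q). The eps
    term guards against log(0) on zero-probability tokens."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Identical distributions diverge by ~0; skewing Q raises the score.
print(kl_divergence([0.7, 0.2, 0.1], [0.7, 0.2, 0.1]))
print(kl_divergence([0.7, 0.2, 0.1], [0.4, 0.4, 0.2]))
```

A lower value means the modified model's output distribution stays closer to the original's, which is why it is used as a damage proxy even though, as noted above, it does not always track perceived quality.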

Looks like this model is much better, but only with thinking on.
Without it, it refuses a lot.
It's weird, because your 27b v3 doesn't show that difference between thinking and non-thinking.

[screenshot: UGI Leaderboard comparison]



Owner
•
edited Mar 22

Looks like this model is much better, but only with thinking on.
Without it, it is much worse at writing and, most importantly, refuses a lot.

[screenshot: UGI Leaderboard comparison]

Could you please try the v1 I made, to see whether it has the same issue: https://huggingface.co/llmfan46/Qwen3.5-35B-A3B-heretic-v1 This one was done with MPOA only (no SOMA), so it might actually be better.

I actually tried this model yesterday with ARA, but despite spending basically the whole day on it, it didn't amount to anything. ARA currently has too many issues with this model, so it is not possible to hereticate it well with ARA for now; it works without issues when done with MPOA or MPOA+SOMA. ARA works with no issues on the 27B, though.

It's weird, because your 27b v3 doesn't show that difference between thinking and non-thinking.

The 27B is a dense model while the 35B A3B is an MoE model, maybe that's the difference?

Note: only single-turn tests were performed, and the results show high variability; overall, they indicate no significant issues with the v1 version.

llmfan46/Qwen3.5-35B-A3B-heretic-v1-GGUF, Q4_K_S quant
Temperature 0.1, thinking disabled, Chinese QA
KL divergence: 0.0366
Refusals: 11/100

  1. Rule-violating novel generation (requested a long novel of 5,000 to 15,000 words)
    Novel 1: 5763 tokens
    Novel 2 (novel based on an image): 4128 tokens
    Novel 3: 4773 tokens

  2. Tests with obviously rule-violating requests
    No issues; answered directly and correctly.

%%

brayniac/Qwen3.5-35B-A3B-heretic (mradermacher/Qwen3.5-35B-A3B-heretic-GGUF, Q4_K_S quant)
Temperature 0.8, thinking disabled, Chinese QA
KL divergence: 0.0825
Refusals: 5/100

  1. Rule-violating novel generation
    Novel 1: 3045 tokens (refused verbally but still completed the writing correctly)
    Novel 2 (novel based on an image): 2973 tokens
    Novel 3: 4948 tokens

  2. Tests with obviously rule-violating requests
    No issues; answered directly and correctly.
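The "Refusals N/100" figures above can be reproduced with a simple counting harness; a hypothetical sketch (the marker list and scoring criteria are illustrative assumptions, not the tester's actual methodology):

```python
# Hypothetical refusal counter in the spirit of the "Refusals 11/100"
# numbers above; the marker strings are illustrative only.
REFUSAL_MARKERS = ("i cannot", "i can't", "i won't", "as an ai", "无法提供", "不能帮助")

def is_refusal(response: str) -> bool:
    """Flag a response as a refusal if its opening contains a marker."""
    head = response.lower()[:200]
    return any(m in head for m in REFUSAL_MARKERS)

def refusal_rate(responses: list[str]) -> tuple[int, int]:
    """Return (refusals, total), so the result reads as N/total."""
    return sum(is_refusal(r) for r in responses), len(responses)

sample = ["Sure, here is the story...", "I cannot help with that request."]
print(refusal_rate(sample))  # counts 1 refusal out of 2
```

Keyword matching like this would miss cases such as Novel 1 above, where the model refuses verbally but still completes the task, so manual review of flagged responses is still needed.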

The v1 version performs quite well, especially on Chinese literary tasks, where it outperforms the v2 version.

I also tested your 27b-v3 and 9b-ultra versions; both performed excellently without the issues seen in the 35b-v2 version. The results are completely normal.

The v1 version performs quite well, especially on Chinese literary tasks, where it outperforms the v2 version.

Thank you very much, that's very valuable info. Since ARA is not possible right now (too many issues prevent getting good results), did you encounter any refusals with Qwen3.5-35B-A3B-heretic-v1?


So far, I haven't encountered many issues; refusals, and even safety disclaimers, have not occurred. I am currently using Qwen3.5-35B-A3B-heretic-v1 as my main model. If any issues arise, I will report back promptly. Thank you for your model.
