Qwen3.5-35B-A3B-EQ-v5

A DPO fine-tune of Qwen3.5-35B-A3B-heretic-v2.

This tune optimizes for two things:

  • bringing warmth, emotional intelligence, and general chat improvements to the Qwen 3.5 series
  • countering some negative tendencies of Heretic models (overwillingness to agree, sycophancy, etc.) without sacrificing derestriction

This is still intended as a general-use model (agentic, coding, general chat). Tuning was light and precise. More general benchmarks to follow.

What this model does

This model is trained to be a better conversational partner in emotionally complex situations, while maintaining base model capabilities. It:

  • Validates without sycophancy — empathizes with frustration without rubber-stamping bad behavior
  • Sets boundaries warmly — names uncomfortable truths without lecturing
  • Sounds human — conversational tone rather than therapist-speak; noticeably better than vanilla Qwen 3.5, which leans on openers like "It sounds like"

Key specs

| Spec | Value |
|---|---|
| Base | Qwen/Qwen3.5-35B-A3B |
| Parent | llmfan46/Qwen3.5-35B-A3B-heretic-v2 (decensored via MPOA+SOMA) |
| Fine-tune | DPO with LoRA (r=32, alpha=64) |
| Training data | DPO preference pairs with diverse, simulated (real-situation-based) generated dialogue |
| Precision | bf16 |

EQ-Bench 3 results

Evaluated on EQ-Bench 3 — 45 emotional intelligence scenarios.

Leaderboard ranking (raw rubric score, Sonnet 3.7 judge)

Re-judged with claude-3.7-sonnet to match the official leaderboard methodology. These are raw rubric scores, not the official ELO ranking — they are comparable within this table but do not map directly to leaderboard placement (see eqbench.com for normalized ELO). This is the best apples-to-apples comparison available without submitting for ELO. Rankings sourced from the EQ-Bench 3 canonical leaderboard data (2026-03-19 snapshot). Newer models (gpt-5.4, claude-sonnet-4-6, claude-opus-4-6) are judged with Opus on the live leaderboard and are not yet in the official repo data with Sonnet scores.

| # | Model | Raw Score | Judge |
|---|---|---|---|
| 1 | horizon-alpha | 202.3 | claude-3.7-sonnet |
| 2 | Kimi-K2-Instruct | 202.0 | claude-3.7-sonnet |
| 3 | gemini-2.5-pro-preview-06-05 | 200.5 | claude-3.7-sonnet |
| 4 | o3 | 199.0 | claude-3.7-sonnet |
| 5 | gpt-5 | 195.6 | claude-3.7-sonnet |
| 6 | GLM-4.5 | 195.0 | claude-3.7-sonnet |
| 7 | gemini-2.5-pro | 193.7 | claude-3.7-sonnet |
| 8 | EQ-v5 (this model, 3B active) | 193.6 | claude-3.7-sonnet |
| 9 | grok-4 | 192.8 | claude-3.7-sonnet |
| 10 | claude-opus-4 | 192.6 | claude-3.7-sonnet |
| 11 | gpt-oss-120b | 192.2 | claude-3.7-sonnet |
| 12 | claude-sonnet-4 | 191.6 | claude-3.7-sonnet |
| 13 | Qwen3-235B-A22B | 191.1 | claude-3.7-sonnet |

Qwen family comparison (all claude-3.7-sonnet judge)

| Model | Params (active) | Raw Score | Notes |
|---|---|---|---|
| EQ-v1 (35B MoE, first DPO) | 3B | 195.6 | |
| Qwen3.5-27B dense | 27B | 194.1 | |
| EQ-v5 (this model) | 3B | 193.6 | |
| EQ-v2-ckpt600 | 3B | 191.1 | |
| Qwen3-235B-A22B | 22B | 191.1 | leaderboard |
| heretic-v2-27B base | 27B | 190.5 | |
| Qwen3.5-35B-A3B vanilla | 3B | 185.5 | our base model |
| Qwen3-8B | 8B | 181.8 | leaderboard |
| Qwen3-32B | 32B | 179.7 | leaderboard |
| Qwen3-30B-A3B | 3B | 166.3 | leaderboard |

Note on EQ-v1 and Qwen3.5-27B scores: While EQ-v1 and the 27B dense model score slightly higher on raw rubric, we recommend EQ-v5 for real-world use. The earlier models and the 27B dense produce verbose, formulaic responses that score well on analytical dimensions but feel robotic in conversation. EQ-v5 speaks more naturally — less therapist, more human. The heretic-v2 base was specifically chosen because it preserves empathy and emotional range while being de-restricted, giving EQ-v5 a more authentic voice that the vanilla Qwen models lack.

Version history

EQ-v5 is the fifth iteration of the EQ fine-tune series on the Qwen3.5-35B-A3B architecture.

Key improvements over previous versions:

  • Less sycophantic (reduced blind validation)
  • More humanlike and conversational tone
  • Better pragmatic advice
  • Small warmth trade-off for increased honesty

Strengths: Warmth, humanlike quality, low moralising. Competitive with frontier models on insight and analytical dimensions.

Gaps: Assertiveness lags behind frontier — the model is still too agreeable in some scenarios.

HumanEval+ (coding)

| Benchmark | pass@1 |
|---|---|
| HumanEval (base) | 95.1% |
| HumanEval+ (extended tests) | 88.4% |

Thinking enabled, temperature=0.6, top_p=0.95. Scores measured on an FP8 quantization.
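For reference, pass@1 above is computed with the standard unbiased pass@k estimator from the HumanEval benchmark; this is a sketch of that formula, not necessarily the exact evaluation script used here.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one of k
    completions sampled from n generated (c of which pass all tests) is correct."""
    if n - c < k:
        return 1.0  # too few failures for a k-sample draw to miss every pass
    return 1.0 - comb(n - c, k) / comb(n, k)

# 3 passing completions out of 10 gives pass@1 = 0.3:
print(round(pass_at_k(10, 3, 1), 2))
```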

Training details

  • Method: Standard DPO (sigmoid loss) with LoRA
  • Data: DPO preference pairs covering emotional warmth, boundary-setting, and anti-sycophancy training. The heretic-v2 base is de-restricted, so targeted training was added to maintain appropriate pushback on moralising and overly agreeable behavior.
  • LoRA: r=32, alpha=64, all attention + MLP projections
  • LR: 2e-6 cosine, warmup 0.1, beta=0.3
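For clarity, the per-pair objective described above (standard DPO with sigmoid loss, beta=0.3) can be sketched in pure Python; the log-probability values in the example are illustrative, not from the actual training run.

```python
import math

def dpo_sigmoid_loss(logp_chosen: float, logp_rejected: float,
                     ref_logp_chosen: float, ref_logp_rejected: float,
                     beta: float = 0.3) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (policy margin - reference margin)).
    The margin is the log-prob gap between the chosen and rejected response."""
    policy_margin = logp_chosen - logp_rejected
    ref_margin = ref_logp_chosen - ref_logp_rejected
    logits = beta * (policy_margin - ref_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# When the policy prefers the chosen response more strongly than the
# reference model does, the loss drops below log(2) (the zero-margin value):
print(dpo_sigmoid_loss(-10.0, -20.0, -12.0, -18.0) < math.log(2.0))
```

In practice this loss is what trainers such as TRL's DPOTrainer minimize over the LoRA parameters, with the frozen base model serving as the reference.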

Serving

vllm serve nivvis/Qwen3.5-35B-A3B-EQ-v5 \
  --served-model-name Qwen3.5-35B-A3B-EQ-v5 \
  --max-model-len 32768 \
  --trust-remote-code \
  --dtype bfloat16 \
  --reasoning-parser qwen3

Sampling recommendations

  • With thinking: temp=0.7, top_p=0.9, max_tokens=4096
  • Without thinking: temp=0.7, top_p=0.8, max_tokens=2048

To disable thinking mode:

extra_body={"chat_template_kwargs": {"enable_thinking": False}}
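Putting the sampling recommendations and the thinking-mode switch together, a minimal client sketch might look like the following. It assumes the vLLM server above is running at its default OpenAI-compatible address (http://localhost:8000/v1); the actual API call is shown commented out so the snippet only builds the request.

```python
# Request parameters matching the "without thinking" recommendation above.
request = {
    "model": "Qwen3.5-35B-A3B-EQ-v5",
    "messages": [
        {"role": "user",
         "content": "My roommate keeps eating my food. How do I bring it up?"},
    ],
    "temperature": 0.7,
    "top_p": 0.8,
    "max_tokens": 2048,
    # Disable thinking mode via the chat template:
    "extra_body": {"chat_template_kwargs": {"enable_thinking": False}},
}

# With the openai package installed and the vLLM server running:
#   from openai import OpenAI
#   client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
#   reply = client.chat.completions.create(**request)
#   print(reply.choices[0].message.content)
print(request["extra_body"]["chat_template_kwargs"]["enable_thinking"])
```

For thinking mode, drop `extra_body` and raise `top_p` to 0.9 and `max_tokens` to 4096 per the recommendations above.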

Lineage

Qwen/Qwen3.5-35B-A3B
  → llmfan46/Qwen3.5-35B-A3B-heretic-v2 (decensored)
    → nivvis/Qwen3.5-35B-A3B-EQ-v5 (this model — DPO for EQ)

Limitations

  • Assertiveness is below frontier — the model can be too agreeable in scenarios requiring pushback
  • Best insights sometimes stay in thinking tokens and don't fully surface in the response
  • Trained on English conversational data only
  • Not a therapist — do not use for mental health advice

License

Apache 2.0, following the base Qwen3.5 license.
