# Qwen3.5-35B-A3B-EQ-v5

A DPO fine-tune of Qwen3.5-35B-A3B-heretic-v2.

The tune optimizes for two things:

- bringing warmth, emotional intelligence, and general chat improvements to the Qwen 3.5 series
- countering some negative tendencies of Heretic models (overwillingness to agree, sycophancy, etc.) without sacrificing derestriction

This is still intended as a general-use model (agentic, coding, general chat). Tuning was light and targeted. More general benchmarks to follow.
## What this model does
This model is trained to be a better conversational partner in emotionally complex situations, while maintaining base model capabilities. It:
- Validates without sycophancy — empathizes with frustration without rubber-stamping bad behavior
- Sets boundaries warmly — names uncomfortable truths without lecturing
- Sounds human — conversational tone rather than therapist-speak, with fewer canned openers like "It sounds like..." than vanilla Qwen 3.5
## Key specs

| Spec | Value |
|---|---|
| Base | Qwen/Qwen3.5-35B-A3B |
| Parent | llmfan46/Qwen3.5-35B-A3B-heretic-v2 (decensored via MPOA+SOMA) |
| Fine-tune | DPO with LoRA (r=32, alpha=64) |
| Training data | DPO preference pairs with diverse, simulated (real-situation-based) generated dialogue |
| Precision | bf16 |
## EQ-Bench 3 results
Evaluated on EQ-Bench 3 — 45 emotional intelligence scenarios.
### Leaderboard ranking (raw rubric score, Sonnet 3.7 judge)

Re-judged with claude-3.7-sonnet to match the official leaderboard methodology. These are raw rubric scores, not the official ELO rankings; raw scores order models consistently under the same judge but do not map directly onto the normalized ELO (see eqbench.com). This is the best apples-to-apples comparison available without submitting for ELO. Rankings are sourced from the EQ-Bench 3 canonical leaderboard data (2026-03-19 snapshot). Newer models (gpt-5.4, claude-sonnet-4-6, claude-opus-4-6) are judged with Opus on the live leaderboard and are not yet in the official repo data with Sonnet scores.
| # | Model | Raw Score | Judge |
|---|---|---|---|
| 1 | horizon-alpha | 202.3 | claude-3.7-sonnet |
| 2 | Kimi-K2-Instruct | 202.0 | claude-3.7-sonnet |
| 3 | gemini-2.5-pro-preview-06-05 | 200.5 | claude-3.7-sonnet |
| 4 | o3 | 199.0 | claude-3.7-sonnet |
| 5 | gpt-5 | 195.6 | claude-3.7-sonnet |
| 6 | GLM-4.5 | 195.0 | claude-3.7-sonnet |
| 7 | gemini-2.5-pro | 193.7 | claude-3.7-sonnet |
| 8 | EQ-v5 (this model, 3B active) | 193.6 | claude-3.7-sonnet |
| 9 | grok-4 | 192.8 | claude-3.7-sonnet |
| 10 | claude-opus-4 | 192.6 | claude-3.7-sonnet |
| 11 | gpt-oss-120b | 192.2 | claude-3.7-sonnet |
| 12 | claude-sonnet-4 | 191.6 | claude-3.7-sonnet |
| 13 | Qwen3-235B-A22B | 191.1 | claude-3.7-sonnet |
### Qwen family comparison (all claude-3.7-sonnet judge)
| Model | Params (active) | Raw Score | Notes |
|---|---|---|---|
| EQ-v1 (35B MoE, first DPO) | 3B | 195.6 | |
| Qwen3.5-27B dense | 27B | 194.1 | |
| EQ-v5 (this model) | 3B | 193.6 | |
| EQ-v2-ckpt600 | 3B | 191.1 | |
| Qwen3-235B-A22B | 22B | 191.1 | leaderboard |
| heretic-v2-27B base | 27B | 190.5 | |
| Qwen3.5-35B-A3B vanilla | 3B | 185.5 | our base model |
| Qwen3-8B | 8B | 181.8 | leaderboard |
| Qwen3-32B | 32B | 179.7 | leaderboard |
| Qwen3-30B-A3B | 3B | 166.3 | leaderboard |
Note on EQ-v1 and Qwen3.5-27B scores: While EQ-v1 and the 27B dense model score slightly higher on raw rubric, we recommend EQ-v5 for real-world use. The earlier models and the 27B dense produce verbose, formulaic responses that score well on analytical dimensions but feel robotic in conversation. EQ-v5 speaks more naturally — less therapist, more human. The heretic-v2 base was specifically chosen because it preserves empathy and emotional range while being de-restricted, giving EQ-v5 a more authentic voice that the vanilla Qwen models lack.
## Version history
EQ-v5 is the fifth iteration of the EQ fine-tune series on the Qwen3.5-35B-A3B architecture.
Key improvements over previous versions:
- Less sycophantic (reduced blind validation)
- More humanlike and conversational tone
- Better pragmatic advice
- Small warmth trade-off for increased honesty
Strengths: warmth, humanlike quality, low moralising; competitive with frontier models on insight and analytical dimensions.

Gaps: assertiveness lags behind frontier models; the model is still too agreeable in some scenarios.
## HumanEval+ (coding)
| Benchmark | pass@1 |
|---|---|
| HumanEval (base) | 95.1% |
| HumanEval+ (extended tests) | 88.4% |
Evaluated with thinking enabled, temperature=0.6, top_p=0.95. Scores were measured on the FP8 quantization.
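For reference, pass@1 here is the fraction of problems whose first sample passes all tests. The standard unbiased pass@k estimator generalizes this when multiple samples are drawn per problem (an illustrative sketch, not the exact evaluation harness used above):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples drawn (without replacement) from n generations is correct,
    given that c of the n generations pass the tests."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# With one sample per problem (n=1, k=1), pass@1 is just the solve rate;
# e.g. 95.1% on HumanEval means the single sample passed on 95.1% of problems.
print(pass_at_k(1, 1, 1))   # 1.0 for a solved problem
print(pass_at_k(10, 3, 1))  # ~0.3: expected first-sample success rate
```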
## Training details
- Method: Standard DPO (sigmoid loss) with LoRA
- Data: DPO preference pairs covering emotional warmth, boundary-setting, and anti-sycophancy training. The heretic-v2 base is de-restricted, so targeted training was added to maintain appropriate pushback on moralising and overly agreeable behavior.
- LoRA: r=32, alpha=64, all attention + MLP projections
- LR: 2e-6 with cosine schedule, warmup ratio 0.1, DPO beta=0.3
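For intuition, the per-pair sigmoid DPO objective with the beta above can be sketched in plain Python (a minimal illustration with a hypothetical `dpo_loss` helper, not the actual training code):

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.3) -> float:
    """Standard DPO (sigmoid) loss for one preference pair.

    Each argument is a sequence log-probability (sum of token log-probs)
    under the policy or the frozen reference model. beta=0.3 matches the
    training config above."""
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), computed stably as softplus(-margin)
    return math.log1p(math.exp(-margin)) if margin > -30 else -margin

# A policy that favors the chosen response more than the reference does
# gets a positive margin, pushing the loss below log(2):
print(dpo_loss(-10.0, -20.0, -12.0, -18.0))
```

The loss equals log(2) when the policy and reference agree exactly, and shrinks as the policy widens the chosen-over-rejected margin.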
## Serving

```shell
vllm serve nivvis/Qwen3.5-35B-A3B-EQ-v5 \
  --served-model-name Qwen3.5-35B-A3B-EQ-v5 \
  --max-model-len 32768 \
  --trust-remote-code \
  --dtype bfloat16 \
  --reasoning-parser qwen3
```
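Once the server is up, it can be queried like any OpenAI-compatible endpoint. A minimal stdlib sketch, assuming vLLM's default port 8000:

```python
import json
import urllib.request

# Assumption: vllm serve exposes an OpenAI-compatible API on
# http://localhost:8000 by default.
payload = {
    "model": "Qwen3.5-35B-A3B-EQ-v5",  # matches --served-model-name
    "messages": [{"role": "user", "content": "Rough day. Can we talk?"}],
}
request = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# Uncomment once the server above is running:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```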
## Sampling recommendations

- With thinking: temp=0.7, top_p=0.9, max_tokens=4096
- Without thinking: temp=0.7, top_p=0.8, max_tokens=2048

To disable thinking mode:

```python
extra_body={"chat_template_kwargs": {"enable_thinking": False}}
```
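With an OpenAI-compatible client, the two presets and the thinking toggle can be bundled into one helper (a sketch; `completion_kwargs` is a hypothetical helper name):

```python
# Recommended sampling presets, keyed by whether thinking is enabled.
SAMPLING = {
    True:  {"temperature": 0.7, "top_p": 0.9, "max_tokens": 4096},  # thinking
    False: {"temperature": 0.7, "top_p": 0.8, "max_tokens": 2048},  # no thinking
}

def completion_kwargs(thinking: bool) -> dict:
    """Keyword arguments for client.chat.completions.create(...)."""
    return {
        **SAMPLING[thinking],
        # vLLM reads chat_template_kwargs from the request body to toggle
        # the chat template's thinking mode.
        "extra_body": {"chat_template_kwargs": {"enable_thinking": thinking}},
    }

print(completion_kwargs(False))
```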
## Lineage

```
Qwen/Qwen3.5-35B-A3B
  → llmfan46/Qwen3.5-35B-A3B-heretic-v2 (decensored)
    → nivvis/Qwen3.5-35B-A3B-EQ-v5 (this model — DPO for EQ)
```
## Limitations
- Assertiveness is below frontier — the model can be too agreeable in scenarios requiring pushback
- Best insights sometimes stay in thinking tokens and don't fully surface in the response
- Trained on English conversational data only
- Not a therapist — do not use for mental health advice
## License
Apache 2.0, following the base Qwen3.5 license.