Qwen3.5-35B-A3B-EQ-v5

A DPO fine-tune of Qwen3.5-35B-A3B-heretic-v2.

The tune optimizes for two things:

  • bringing warmth, emotional intelligence, and general chat improvements to the Qwen 3.5 series
  • countering some negative tendencies of Heretic models (overwillingness to agree, sycophancy, etc.)

This is still intended as a general-use model (agentic, coding, general chat). Tuning was light and precise. More general benchmarks to follow.

What this model does

This model is trained to be a better conversational partner in emotionally complex situations, while maintaining base model capabilities. It:

  • Validates without sycophancy — empathizes with frustration without rubber-stamping bad behavior
  • Sets boundaries warmly — names uncomfortable truths without lecturing
  • Sounds human — conversational tone, not therapist-speak; noticeably better tone than vanilla Qwen 3.5, which leans on openers like "It sounds like"


Key specs

Base Qwen/Qwen3.5-35B-A3B
Parent llmfan46/Qwen3.5-35B-A3B-heretic-v2 (decensored via MPOA+SOMA)
Architecture MoE — 35B total, 3B active (256 experts, 8+1 routed)
Fine-tune DPO with LoRA (r=32, alpha=64)
Training data DPO preference pairs with diverse system prompts
Precision FP8 (quantized from bf16)
Size ~35GB (vs ~66GB bf16)
Context 262k (native), trained at 4096
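The size figures above follow from parameter count times bytes per weight. A rough sanity check (real checkpoints land slightly under the naive number, e.g. ~66 GB rather than 70 GB for bf16, since parameter counts are approximate and some tensors may be kept at other precisions):

```python
def model_size_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate on-disk size: parameters x bytes per weight, in GB."""
    return n_params * bytes_per_param / 1e9

total = 35e9  # 35B total parameters (only ~3B active per token in the MoE)

print(model_size_gb(total, 2))  # bf16: 2 bytes/weight -> ~70 GB
print(model_size_gb(total, 1))  # FP8 (E4M3): 1 byte/weight -> ~35 GB
```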

EQ-Bench 3 results

See Qwen3.5-35B-A3B-EQ-v5 for full benchmark results. Scores below are from the bf16 model; FP8 quantization is expected to have minimal impact.

Leaderboard ranking (raw rubric score, claude-3.7-sonnet judge)

# Model Raw Score
7 gemini-2.5-pro 193.7
8 EQ-v5 (3B active) 193.6
9 grok-4 192.8
10 claude-opus-4 192.6

Rankings sourced from the EQ-Bench 3 canonical leaderboard data (2026-03-19 snapshot). These are raw rubric scores, not the official ELO ranking; a higher raw score does not guarantee a better official placement (see eqbench.com for normalized ELO). Newer models (gpt-5.4, claude-sonnet-4-6, claude-opus-4-6) are judged with Opus on the live leaderboard and are not yet in the official repo data with Sonnet scores.

Qwen family comparison (all claude-3.7-sonnet judge)

Model Params (active) Raw Score
EQ-v5 (this model) 3B 193.6
Qwen3-235B-A22B 22B 191.1
Qwen3.5-35B-A3B vanilla 3B 185.5
Qwen3-30B-A3B 3B 166.3

HumanEval+ (coding)

Benchmark pass@1
HumanEval (base) 95.1%
HumanEval+ (extended tests) 88.4%

Thinking enabled, temperature=0.6, top_p=0.95.

Serving

vllm serve nivvis/Qwen3.5-35B-A3B-EQ-v5-FP8 \
  --served-model-name Qwen3.5-35B-A3B-EQ-v5 \
  --reasoning-parser qwen3 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder \
  --max-num-seqs 32 \
  --max-model-len 262144 \
  --gpu-memory-utilization 0.95 \
  --trust-remote-code \
  --host 0.0.0.0 \
  --port 30000

Sampling recommendations

  • With thinking: temp=0.7, top_p=0.9, max_tokens=4096
  • Without thinking: temp=0.7, top_p=0.8, max_tokens=2048

To disable thinking mode:

extra_body={"chat_template_kwargs": {"enable_thinking": False}}
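Putting the sampling recommendations and the thinking toggle together, here is a minimal request sketch against the server started above, using raw HTTP to the OpenAI-compatible endpoint (the host, port, and served model name assume the `vllm serve` flags shown earlier; adjust for your deployment):

```python
import json
import urllib.request

def build_request(prompt: str, thinking: bool = True) -> dict:
    """Chat-completions payload using the recommended sampling settings."""
    return {
        "model": "Qwen3.5-35B-A3B-EQ-v5",  # matches --served-model-name above
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
        "top_p": 0.9 if thinking else 0.8,        # per the recommendations above
        "max_tokens": 4096 if thinking else 2048,
        # vLLM forwards this to the chat template; False disables thinking mode
        "chat_template_kwargs": {"enable_thinking": thinking},
    }

def send(payload: dict, base_url: str = "http://localhost:30000/v1") -> dict:
    """POST to the OpenAI-compatible endpoint (requires the server to be running)."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example (uncomment with a live server):
# reply = send(build_request("A friend keeps canceling plans. How do I bring it up?"))
# print(reply["choices"][0]["message"]["content"])
```

With the official OpenAI Python client, the same `chat_template_kwargs` are passed via `extra_body` as in the one-liner above, rather than at the top level of the request body.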

Lineage

Qwen/Qwen3.5-35B-A3B
  → llmfan46/Qwen3.5-35B-A3B-heretic-v2 (decensored)
    → nivvis/Qwen3.5-35B-A3B-EQ-v5 (DPO for EQ, bf16)
      → nivvis/Qwen3.5-35B-A3B-EQ-v5-FP8 (this model)

Limitations

  • Assertiveness is below frontier models — the model can be too agreeable in scenarios requiring pushback
  • Best insights sometimes stay in thinking tokens and don't fully surface in the response
  • Trained on English conversational data only
  • Not a therapist — do not use for mental health advice

License

Apache 2.0, following the base Qwen3.5 license.
