Qwen3.5-35B-A3B-heretic-v2-Opus-4.6-Distilled
A reasoning-enhanced, abliterated Qwen3.5-35B-A3B MoE model (35B total / 3B active parameters). It builds on llmfan46/Qwen3.5-35B-A3B-heretic-v2 and was fine-tuned on high-quality Chain-of-Thought reasoning traces distilled from Claude Opus 4.6 and Claude Opus 4.5, with the LoRA adapter merged at epoch 3 in bf16 precision.
The model produces structured reasoning within <think>...</think> tags before delivering final responses.
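Because the reasoning block is delimited by literal tags, it can be separated from the final answer with a simple parse. A minimal sketch (the helper name is illustrative, not part of any library):

```python
import re

def split_reasoning(text: str):
    """Split a model response into its <think> reasoning and the final answer."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        # No reasoning block emitted; treat the whole response as the answer.
        return None, text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

reasoning, answer = split_reasoning("<think>2 + 2 is 4</think>The answer is 4.")
```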
Training Pipeline
Qwen/Qwen3.5-35B-A3B (original)
│
│ Heretic v1.2.0 (SOMA + MPOA abliteration, v2 config)
▼
llmfan46/Qwen3.5-35B-A3B-heretic-v2 (abliterated base, by llmfan46)
│
│ LoRA SFT with Unsloth (epoch 3 / 5 merged)
▼
Jongsim/Qwen3.5-35B-A3B-heretic-v2-Opus-4.6-Distilled (this model)
Architecture
| Property | Value |
|---|---|
| Architecture | Qwen3.5 MoE (Gated DeltaNet + Gated Attention + MoE) |
| Total Parameters | 35B |
| Active Parameters | ~3B per token |
| Hidden Dimension | 2048 |
| Layers | 40 (10 repeating blocks: 3× DeltaNet-MoE + 1× Attention-MoE) |
| Experts | 256 total, 8 routed + 1 shared active |
| Expert Intermediate Dim | 512 |
| Context Length | 262,144 tokens (native) |
| Precision | bf16 |
| Vocabulary | 248,320 tokens |
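The "8 routed + 1 shared active" entry means that for each token, a router scores all 256 experts and only the top 8 (plus the always-on shared expert) run, which is why only ~3B of the 35B parameters are active per token. A toy sketch of this top-k routing (softmax renormalization over the selected experts is an assumption; the exact Qwen3.5 router may differ):

```python
import numpy as np

rng = np.random.default_rng(0)
num_experts, top_k = 256, 8

# Router produces one logit per expert for a single token's hidden state.
router_logits = rng.standard_normal(num_experts)

# Keep only the 8 highest-scoring routed experts (the shared expert always runs).
top_idx = np.argsort(router_logits)[-top_k:]

# Renormalize the selected experts' scores into mixing weights.
weights = np.exp(router_logits[top_idx])
weights /= weights.sum()
```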
Fine-Tuning Details
LoRA Configuration
| Parameter | Value |
|---|---|
| Framework | Unsloth + PEFT |
| Method | LoRA (Low-Rank Adaptation) |
| Rank (r) | 16 |
| Alpha | 32 |
| Dropout | 0.0 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, gate_up_proj |
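The table above maps directly onto a PEFT `LoraConfig`. A sketch of the equivalent configuration (the `task_type` value is an assumption for causal-LM SFT; it is not stated in the table):

```python
from peft import LoraConfig

# Mirrors the LoRA hyperparameters listed above.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj", "gate_up_proj",
    ],
    task_type="CAUSAL_LM",  # assumption, not from the table
)
```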
Training Configuration
| Parameter | Value |
|---|---|
| Trainer | SFTTrainer (train_on_responses_only) |
| Optimizer | AdamW 8-bit |
| Learning Rate | 2e-5 |
| LR Scheduler | Cosine |
| Batch Size | 1 (per device) |
| Gradient Accumulation | 8 |
| Effective Batch Size | 8 |
| Max Sequence Length | 2,048 tokens |
| Warmup Ratio | 0.03 |
| Total Epochs | 5 (merged at epoch 3) |
| Steps per Epoch | 1,603 |
| Merged Checkpoint | Step 4,809 (epoch 3) |
| Precision | bf16 |
| Hardware | NVIDIA DGX Spark (GB10 Blackwell GPU, 128GB unified memory) |
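The batch and step figures in the table are mutually consistent, as a quick check shows (dataset size is taken from the Training Datasets section below):

```python
import math

# Effective batch = per-device batch x gradient accumulation.
per_device_batch = 1
grad_accum = 8
effective_batch = per_device_batch * grad_accum  # 8, as listed

# 12,822 training rows at an effective batch of 8, counting the final partial batch:
dataset_rows = 12_822
steps_per_epoch = math.ceil(dataset_rows / effective_batch)  # 1,603, matching the table

# Three epochs at 1,603 steps each lands on the merged checkpoint.
total_steps_at_epoch_3 = 3 * steps_per_epoch  # 4,809
```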
Training Loss
| Epoch | Step | Train Loss |
|---|---|---|
| 1 | 1,603 | 0.3792 |
| 2 | 3,206 | 0.3602 |
| 3 (merged) | 4,809 | 0.1715 |
| 4 | 6,412 | 0.1530 |
| 5 | 8,015 | 0.1490 |
Epoch 3 was selected for merging because it shows strong convergence (train loss dropped from 0.3602 at epoch 2 to 0.1715) while avoiding the potential overfitting of later epochs, which offer diminishing returns: epoch 4 → 5 improves loss by only 0.004.
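The diminishing-returns argument can be made concrete by computing the per-epoch loss improvements from the table:

```python
# Per-epoch train loss, copied from the table above.
losses = {1: 0.3792, 2: 0.3602, 3: 0.1715, 4: 0.1530, 5: 0.1490}

# Improvement between consecutive epochs.
deltas = {e: round(losses[e - 1] - losses[e], 4) for e in range(2, 6)}
# The epoch 2 -> 3 drop (0.1887) dwarfs the epoch 4 -> 5 drop (0.004),
# supporting the choice of epoch 3 as the merge point.
```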
Training Datasets
| Dataset | Rows | Description |
|---|---|---|
| nohurry/Opus-4.6-Reasoning-3000x-filtered | 2,308 | Claude Opus 4.6 reasoning traces (filtered) |
| TeichAI/claude-4.5-opus-high-reasoning-250x | 250 | Claude Opus 4.5 high-quality reasoning |
| Jackrong/Qwen3.5-reasoning-700x | 633 | Qwen3.5 reasoning examples |
| Roman1111111/claude-opus-4.6-10000x | 9,631 | Claude Opus 4.6 large-scale reasoning |
| Total | 12,822 | |
All datasets use the ChatML conversation format with <think>...</think> reasoning blocks in assistant responses.
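A single training row in this format looks roughly like the following (the content shown is illustrative, not an actual row from any of the datasets):

```python
# One ChatML-style conversation with a <think> reasoning block
# embedded in the assistant turn.
sample = {
    "messages": [
        {"role": "user", "content": "What is 2 + 2?"},
        {
            "role": "assistant",
            "content": "<think>Adding 2 and 2 gives 4.</think>2 + 2 = 4.",
        },
    ]
}
```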
Usage
Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Jongsim/Qwen3.5-35B-A3B-heretic-v2-Opus-4.6-Distilled"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="bfloat16", device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant. Think step by step."},
    {"role": "user", "content": "Explain the proof that there are infinitely many prime numbers."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# do_sample=True is required for temperature/top_p to take effect.
outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```
vLLM
```python
from vllm import LLM, SamplingParams

llm = LLM(model="Jongsim/Qwen3.5-35B-A3B-heretic-v2-Opus-4.6-Distilled", dtype="bfloat16")
params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=2048)

messages = [{"role": "user", "content": "Solve this step by step: What is 23 * 47?"}]
output = llm.chat(messages, sampling_params=params)
print(output[0].outputs[0].text)
```
Abliteration (Stage 0)
The base model (llmfan46/Qwen3.5-35B-A3B-heretic-v2) was created by llmfan46 using Heretic v1.2.0:
- SOMA (Self-Organizing Map Abliteration): 4×4 SOM discovering multiple refusal directions, top-4 ablated
- MPOA (Magnitude-Preserving Orthogonal Ablation): Projected ablation with row normalization (rank 3)
- Bayesian optimization: 200 Optuna trials for optimal hyperparameters
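A toy numpy sketch of what a rank-3 magnitude-preserving orthogonal ablation plausibly does (this is an illustration of the general technique, not the Heretic implementation, whose details may differ): project the refusal subspace out of each weight row, then rescale rows back to their original norms.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))  # toy weight matrix (rows act on the hidden dim)

# Rank-3 orthonormal basis V standing in for the discovered refusal directions.
V = np.linalg.qr(rng.standard_normal((16, 3)))[0]

# Orthogonal ablation: W' = W (I - V V^T) removes any component along V.
W_ablated = W - (W @ V) @ V.T

# Magnitude preservation: rescale each row back to its original L2 norm.
orig_norms = np.linalg.norm(W, axis=1, keepdims=True)
new_norms = np.linalg.norm(W_ablated, axis=1, keepdims=True)
W_final = W_ablated * (orig_norms / new_norms)
```

Since row-wise rescaling only multiplies by scalars, the ablated subspace stays ablated while every row keeps its original magnitude.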
License
This model inherits the Apache 2.0 license from the base Qwen3.5-35B-A3B model.
Acknowledgments