Qwen3.5-35B-A3B-heretic-v2-Opus-4.6-Distilled
A reasoning-enhanced, abliterated Qwen3.5-35B-A3B MoE model (35B total / 3B active parameters). It builds on llmfan46/Qwen3.5-35B-A3B-heretic-v2 and was fine-tuned on high-quality Chain-of-Thought reasoning traces distilled from Claude Opus 4.6 and Claude Opus 4.5, with the LoRA adapter merged at epoch 3 in bf16 precision.
The model produces structured reasoning within <think>...</think> tags before delivering final responses.
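Because the reasoning block is delimited by literal tags, it can be separated from the final answer with a simple parse. A minimal sketch (the helper name is illustrative, not part of any library):

```python
import re

def split_reasoning(text: str):
    """Split a model response into its <think> reasoning and the final answer."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        # No reasoning block emitted; treat the whole response as the answer.
        return None, text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

reasoning, answer = split_reasoning("<think>2 + 2 is 4</think>The answer is 4.")
```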
Training Pipeline
Qwen/Qwen3.5-35B-A3B (original)
│
│ Heretic v1.2.0 (SOMA + MPOA abliteration, v2 config)
▼
llmfan46/Qwen3.5-35B-A3B-heretic-v2 (abliterated base, by llmfan46)
│
│ LoRA SFT with Unsloth (epoch 3 / 5 merged)
▼
Jongsim/Qwen3.5-35B-A3B-heretic-v2-Opus-4.6-Distilled (this model)
Architecture
| Property | Value |
|---|---|
| Architecture | Qwen3.5 MoE (Gated DeltaNet + Gated Attention + MoE) |
| Total Parameters | 35B |
| Active Parameters | ~3B per token |
| Hidden Dimension | 2048 |
| Layers | 40 (10 repeating blocks: 3× DeltaNet-MoE + 1× Attention-MoE) |
| Experts | 256 total, 8 routed + 1 shared active |
| Expert Intermediate Dim | 512 |
| Context Length | 262,144 tokens (native) |
| Precision | bf16 |
| Vocabulary | 248,320 tokens |
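The "8 routed + 1 shared active" entry means that for each token, a router scores all 256 experts and only the top 8 (plus the always-on shared expert) run, which is why only ~3B of the 35B parameters are active per token. A toy sketch of this top-k routing (softmax renormalization over the selected experts is an assumption; the exact Qwen3.5 router may differ):

```python
import numpy as np

rng = np.random.default_rng(0)
num_experts, top_k = 256, 8

# Router produces one logit per expert for a single token's hidden state.
router_logits = rng.standard_normal(num_experts)

# Keep only the 8 highest-scoring routed experts (the shared expert always runs).
top_idx = np.argsort(router_logits)[-top_k:]

# Renormalize the selected experts' scores into mixing weights.
weights = np.exp(router_logits[top_idx])
weights /= weights.sum()
```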
Fine-Tuning Details
LoRA Configuration
| Parameter | Value |
|---|---|
| Framework | Unsloth + PEFT |
| Method | LoRA (Low-Rank Adaptation) |
| Rank (r) | 16 |
| Alpha | 32 |
| Dropout | 0.0 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, gate_up_proj |
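The table above maps directly onto a PEFT `LoraConfig`. A sketch of the equivalent configuration (the `task_type` value is an assumption for causal-LM SFT; it is not stated in the table):

```python
from peft import LoraConfig

# Mirrors the LoRA hyperparameters listed above.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj", "gate_up_proj",
    ],
    task_type="CAUSAL_LM",  # assumption, not from the table
)
```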
Training Configuration
| Parameter | Value |
|---|---|
| Trainer | SFTTrainer (train_on_responses_only) |
| Optimizer | AdamW 8-bit |
| Learning Rate | 2e-5 |
| LR Scheduler | Cosine |
| Batch Size | 1 (per device) |
| Gradient Accumulation | 8 |
| Effective Batch Size | 8 |
| Max Sequence Length | 2,048 tokens |
| Warmup Ratio | 0.03 |
| Total Epochs | 5 (merged at epoch 3) |
| Steps per Epoch | 1,603 |
| Merged Checkpoint | Step 4,809 (epoch 3) |
| Precision | bf16 |
| Hardware | NVIDIA DGX Spark (GB10 Blackwell GPU, 128GB unified memory) |
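The batch and step figures in the table are mutually consistent, as a quick check shows (dataset size is taken from the Training Datasets section below):

```python
import math

# Effective batch = per-device batch x gradient accumulation.
per_device_batch = 1
grad_accum = 8
effective_batch = per_device_batch * grad_accum  # 8, as listed

# 12,822 training rows at an effective batch of 8, counting the final partial batch:
dataset_rows = 12_822
steps_per_epoch = math.ceil(dataset_rows / effective_batch)  # 1,603, matching the table

# Three epochs at 1,603 steps each lands on the merged checkpoint.
total_steps_at_epoch_3 = 3 * steps_per_epoch  # 4,809
```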
Training Loss
| Epoch | Step | Train Loss |
|---|---|---|
| 1 | 1,603 | 0.3792 |
| 2 | 3,206 | 0.3602 |
| 3 (merged) | 4,809 | 0.1715 |
| 4 | 6,412 | 0.1530 |
| 5 | 8,015 | 0.1490 |
Epoch 3 was selected for merging because it shows strong convergence (train loss dropped from 0.3602 at epoch 2 to 0.1715) while avoiding the potential overfitting of later epochs, which offer diminishing returns: epoch 4 → 5 improves loss by only 0.004.
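The diminishing-returns argument can be made concrete by computing the per-epoch loss improvements from the table:

```python
# Per-epoch train loss, copied from the table above.
losses = {1: 0.3792, 2: 0.3602, 3: 0.1715, 4: 0.1530, 5: 0.1490}

# Improvement between consecutive epochs.
deltas = {e: round(losses[e - 1] - losses[e], 4) for e in range(2, 6)}
# The epoch 2 -> 3 drop (0.1887) dwarfs the epoch 4 -> 5 drop (0.004),
# supporting the choice of epoch 3 as the merge point.
```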
Training Datasets
| Dataset | Rows | Description |
|---|---|---|
| nohurry/Opus-4.6-Reasoning-3000x-filtered | 2,308 | Claude Opus 4.6 reasoning traces (filtered) |
| TeichAI/claude-4.5-opus-high-reasoning-250x | 250 | Claude Opus 4.5 high-quality reasoning |
| Jackrong/Qwen3.5-reasoning-700x | 633 | Qwen3.5 reasoning examples |
| Roman1111111/claude-opus-4.6-10000x | 9,631 | Claude Opus 4.6 large-scale reasoning |
| Total | 12,822 | |
All datasets use the ChatML conversation format with <think>...</think> reasoning blocks in assistant responses.
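A single training row in this format looks roughly like the following (the content shown is illustrative, not an actual row from any of the datasets):

```python
# One ChatML-style conversation with a <think> reasoning block
# embedded in the assistant turn.
sample = {
    "messages": [
        {"role": "user", "content": "What is 2 + 2?"},
        {
            "role": "assistant",
            "content": "<think>Adding 2 and 2 gives 4.</think>2 + 2 = 4.",
        },
    ]
}
```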
Usage
Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Jongsim/Qwen3.5-35B-A3B-heretic-v2-Opus-4.6-Distilled"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="bfloat16", device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant. Think step by step."},
    {"role": "user", "content": "Explain the proof that there are infinitely many prime numbers."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# do_sample=True is required for temperature/top_p to take effect.
outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```
vLLM
```python
from vllm import LLM, SamplingParams

llm = LLM(model="Jongsim/Qwen3.5-35B-A3B-heretic-v2-Opus-4.6-Distilled", dtype="bfloat16")
params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=2048)

messages = [{"role": "user", "content": "Solve this step by step: What is 23 * 47?"}]
output = llm.chat(messages, sampling_params=params)
print(output[0].outputs[0].text)
```
Abliteration (Stage 0)
The base model (llmfan46/Qwen3.5-35B-A3B-heretic-v2) was created by llmfan46 using Heretic v1.2.0:
- SOMA (Self-Organizing Map Abliteration): 4×4 SOM discovering multiple refusal directions, top-4 ablated
- MPOA (Magnitude-Preserving Orthogonal Ablation): Projected ablation with row normalization (rank 3)
- Bayesian optimization: 200 Optuna trials for optimal hyperparameters
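A toy numpy sketch of what a rank-3 magnitude-preserving orthogonal ablation plausibly does (this is an illustration of the general technique, not the Heretic implementation, whose details may differ): project the refusal subspace out of each weight row, then rescale rows back to their original norms.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))  # toy weight matrix (rows act on the hidden dim)

# Rank-3 orthonormal basis V standing in for the discovered refusal directions.
V = np.linalg.qr(rng.standard_normal((16, 3)))[0]

# Orthogonal ablation: W' = W (I - V V^T) removes any component along V.
W_ablated = W - (W @ V) @ V.T

# Magnitude preservation: rescale each row back to its original L2 norm.
orig_norms = np.linalg.norm(W, axis=1, keepdims=True)
new_norms = np.linalg.norm(W_ablated, axis=1, keepdims=True)
W_final = W_ablated * (orig_norms / new_norms)
```

Since row-wise rescaling only multiplies by scalars, the ablated subspace stays ablated while every row keeps its original magnitude.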
License
This model inherits the Apache 2.0 license from the base Qwen3.5-35B-A3B model.
Acknowledgments