# Qwen3.5-35B-A3B-heretic-Opus-4.6-Distilled — GGUF

GGUF quantizations of Jongsim/Qwen3.5-35B-A3B-heretic-Opus-4.6-Distilled, a reasoning-enhanced, abliterated variant of Qwen3.5-35B-A3B (35B total / 3B active parameters, MoE).

All quantizations were produced with an importance matrix (imatrix) computed from wiki.train.raw (200 chunks) to better preserve quality at lower bit widths.
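
The 200-chunk figure corresponds to a fixed calibration token budget. A back-of-the-envelope sketch; the 512-token chunk size is an assumption (llama.cpp's `llama-imatrix` chunks the calibration text by its configured context length, commonly 512):

```python
# ASSUMPTION: 512 tokens per imatrix chunk.
def n_calibration_tokens(n_chunks: int, chunk_size: int = 512) -> int:
    """Total calibration tokens consumed by n_chunks fixed-size chunks."""
    return n_chunks * chunk_size

print(n_calibration_tokens(200))  # 102400 tokens of wiki.train.raw
```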

## Available Quantizations

| Quantization | File | Size | BPW | Description |
|---|---|---|---|---|
| Q8_0 | Qwen3.5-35B-A3B-heretic-Opus-4.6-Distilled-Q8_0.gguf | 36.9 GB | 8.52 | Highest quality, near-lossless |
| Q6_K | Qwen3.5-35B-A3B-heretic-Opus-4.6-Distilled-Q6_K.gguf | 28.5 GB | 6.57 | Excellent quality, good balance |
| Q5_K_M | Qwen3.5-35B-A3B-heretic-Opus-4.6-Distilled-Q5_K_M.gguf | 23.2 GB | 5.33 | Good quality, recommended for most users |

## Choosing a Quantization

- **Q8_0**: Use when you have ample memory and want minimal quality loss. Best for evaluation and comparison.
- **Q6_K**: Recommended balance of quality and size. Suitable for 64 GB+ systems.
- **Q5_K_M**: Best for memory-constrained setups. The MoE architecture (only 3B active parameters per token) makes this model particularly efficient at lower quantizations.

**Note:** Because this is a Mixture-of-Experts model with only 3B of its 35B parameters active per token, the full weight file must still be loaded into memory, but inference speed is governed by the active parameter count.
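
The sizes in the table above can be sanity-checked from the BPW figures with simple arithmetic. A rough sketch (pure Python, treating 1 GB as 10^9 bytes; real files add a little metadata overhead):

```python
def gguf_size_gb(n_params: float, bpw: float) -> float:
    """Approximate GGUF file size in GB: parameters x bits-per-weight / 8 bits per byte."""
    return n_params * bpw / 8 / 1e9

# 35B total parameters at Q5_K_M's 5.33 bits per weight
print(round(gguf_size_gb(35e9, 5.33), 1))  # 23.3, close to the 23.2 GB listed above
```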

## Quantization Details

| Parameter | Value |
|---|---|
| Source | BF16 GGUF (65 GB) |
| Quantizer | llama-quantize (llama.cpp) |
| Importance Matrix | imatrix.dat computed from wiki.train.raw |
| imatrix Chunks | 200 |
| imatrix Source Quant | Q8_0 (intermediate) |

## Model Overview

This model is built in two stages:

```
Qwen/Qwen3.5-35B-A3B (original)
 │
 │ Heretic v1.2.0 (SOMA + MPOA abliteration)
 ▼
Jongsim/Qwen3.5-35B-A3B-heretic (abliterated base)
 │
 │ Supervised Fine-Tuning (LoRA + Unsloth)
 ▼
Jongsim/Qwen3.5-35B-A3B-heretic-Opus-4.6-Distilled (this model)
```

## Key Features

- **Abliterated (Uncensored)**: Censorship removed via directional ablation (93.4% refusal reduction, KL divergence = 0.064)
- **Reasoning-Enhanced**: SFT on ~3,191 chain-of-thought samples distilled from Claude 4.6 Opus
- **Structured Output**: Generates reasoning within `<think>...</think>` tags before final responses
- **Efficient MoE**: 35B total parameters, only 3B active per token (256 experts, 8 routed + 1 shared)
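
The expert selection in the last point (8 of 256 routed experts per token, plus one always-active shared expert) can be sketched as top-k routing over per-token router logits. An illustrative numpy sketch, not the model's actual router code:

```python
import numpy as np

def route_token(router_logits: np.ndarray, k: int = 8):
    """Select the top-k routed experts for one token and softmax their weights."""
    topk = np.argsort(router_logits)[-k:]               # indices of the k largest logits
    w = np.exp(router_logits[topk] - router_logits[topk].max())
    return topk, w / w.sum()                            # expert ids, mixing weights

rng = np.random.default_rng(0)
experts, weights = route_token(rng.normal(size=256))    # 256 routed experts, 8 chosen
# The shared expert is always active in addition to these 8.
print(len(experts), round(float(weights.sum()), 6))  # 8 1.0
```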

## Architecture

| Attribute | Value |
|---|---|
| Architecture | Qwen3.5 MoE (Gated DeltaNet + Gated Attention + MoE) |
| Total Parameters | 35B |
| Active Parameters | 3B per token |
| Hidden Dimension | 2,048 |
| Layers | 40 |
| Experts | 256 total; 8 routed + 1 shared active |
| Context Length | 262,144 tokens (native) |
| Vocabulary | 248,320 tokens |

## Usage with llama.cpp

```bash
# Interactive chat
./llama-cli -m Qwen3.5-35B-A3B-heretic-Opus-4.6-Distilled-Q5_K_M.gguf \
    -ngl 99 -c 8192 --chat-template chatml -cnv

# Server mode
./llama-server -m Qwen3.5-35B-A3B-heretic-Opus-4.6-Distilled-Q5_K_M.gguf \
    -ngl 99 -c 8192 --chat-template chatml --port 8080
```
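
In server mode, llama-server exposes an OpenAI-compatible `/v1/chat/completions` endpoint. A minimal client sketch using only the standard library (assumes the server command above is running on localhost:8080):

```python
import json
import urllib.request

# Request body following the OpenAI chat-completions schema that llama-server accepts.
payload = {
    "messages": [{"role": "user", "content": "Briefly explain MoE routing."}],
    "temperature": 1.0,  # "Thinking (general)" settings recommended in this card
    "top_p": 0.95,
}

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server is up:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```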

## Recommended Sampling Parameters

| Use Case | Temperature | Top-P | Top-K | Min-P |
|---|---|---|---|---|
| Thinking (general) | 1.0 | 0.95 | 20 | 0.0 |
| Thinking (coding) | 0.6 | 0.95 | 20 | 0.0 |
| Non-thinking (general) | 0.7 | 0.8 | 20 | 0.0 |
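
These knobs compose: logits are divided by temperature, top-k keeps the 20 most likely tokens, and top-p then keeps the smallest nucleus covering 95% of the remaining mass (min-p is disabled at 0.0). An illustrative numpy sketch of that pipeline, not llama.cpp's actual sampler implementation:

```python
import numpy as np

def filter_logits(logits, temperature=1.0, top_k=20, top_p=0.95):
    """Probability distribution after temperature, top-k, and top-p filtering."""
    logits = np.asarray(logits, float) / temperature
    if top_k < len(logits):                        # top-k: drop all but the k largest
        kth = np.sort(logits)[-top_k]
        logits = np.where(logits >= kth, logits, -np.inf)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]                # top-p: smallest set with mass >= top_p
    keep = order[: np.searchsorted(np.cumsum(probs[order]), top_p) + 1]
    out = np.zeros_like(probs)
    out[keep] = probs[keep]
    return out / out.sum()

p = filter_logits(np.linspace(0.0, 9.9, 100), temperature=0.7)
print(int(np.count_nonzero(p)))  # at most 20 tokens survive the filters
```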

## Example Output Format

The model produces structured reasoning:

```
<think>
Let me analyze this problem step by step.

1. First, I need to identify the core question being asked.
2. Then, I'll consider the relevant constraints and conditions.
3. Next, I'll work through the logic systematically.
4. Finally, I'll verify my reasoning for consistency.

[detailed reasoning follows...]
</think>

[final answer here]
```
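
When consuming raw completions, the reasoning block can be separated from the final answer with a simple regex. A minimal sketch:

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split a completion into (reasoning, answer) around the <think>...</think> block."""
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not m:
        return "", text.strip()  # no think block: everything is the answer
    return m.group(1).strip(), text[m.end():].strip()

thinking, answer = split_reasoning("<think>\nStep 1...\n</think>\n\n42")
print(answer)  # 42
```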

## Training Details

### Stage 1: Abliteration

| Parameter | Value |
|---|---|
| Method | SOMA + MPOA (Heretic v1.2.0) |
| Refusal Reduction | 93.4% (91 → 6 refusals out of 100) |
| KL Divergence | 0.0638 |
| Selected Trial | Trial 84 of 200 (Bayesian search) |
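
Directional ablation works by projecting a learned "refusal direction" out of weight matrices. A minimal numpy sketch of that core operation (the actual SOMA/MPOA optimization in Heretic is considerably more involved):

```python
import numpy as np

def ablate_direction(W: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Remove the component of W's output space along direction v: W' = (I - v v^T) W."""
    v = v / np.linalg.norm(v)        # normalize to a unit direction
    return W - np.outer(v, v) @ W

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))
v = rng.normal(size=64)
W_abl = ablate_direction(W, v)
# Outputs of the ablated matrix have no component along v (up to float error)
print(round(float(np.abs((v / np.linalg.norm(v)) @ W_abl).max()), 8))  # 0.0
```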

### Stage 2: Reasoning SFT

| Parameter | Value |
|---|---|
| Framework | Unsloth 2026.3.3 + TRL SFTTrainer |
| Method | LoRA (r=16, α=32) |
| Datasets | 3,191 samples from 3 reasoning distillation datasets |
| Epochs | 5 |
| Best Checkpoint | checkpoint-1995 (12% GSM8K, 50 samples) |
| Learning Rate | 2e-4 (linear decay) |
| Max Seq Length | 2,048 |
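
LoRA with r=16 and α=32 means the fine-tune learns a rank-16 update that is scaled by α/r = 2 when merged into the base weights. A numpy sketch of the merge arithmetic (illustrative, not Unsloth's implementation):

```python
import numpy as np

def merge_lora(W: np.ndarray, A: np.ndarray, B: np.ndarray,
               r: int = 16, alpha: int = 32) -> np.ndarray:
    """Merge a LoRA update into the base weight: W' = W + (alpha / r) * B @ A."""
    return W + (alpha / r) * (B @ A)

rng = np.random.default_rng(0)
d = 128
W = rng.normal(size=(d, d))
A = rng.normal(size=(16, d))   # down-projection, rank r = 16
B = np.zeros((d, 16))          # up-projection, conventionally initialized to zero
# With B = 0 the merged weight equals the base weight: LoRA starts as a no-op.
print(np.allclose(merge_lora(W, A, B), W))  # True
```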

## Related Models

| Model | Description |
|---|---|
| Jongsim/Qwen3.5-35B-A3B-heretic-Opus-4.6-Distilled | Full BF16 safetensors (source model) |
| Jongsim/Qwen3.5-35B-A3B-heretic | Abliterated base (no reasoning SFT) |
| Jongsim/Qwen3.5-35B-A3B-heretic-GGUF | GGUF quants of the abliterated base |
