# Qwen3.5-2B-Distilled-OPUS-Heretic-MLX-VLM (fp16)
MLX-VLM fp16 conversion of Jackrong/Qwen3.5-2B-Claude-4.6-Opus-Reasoning-Distilled, with Heretic abliteration applied to remove refusal/censorship behavior.
## Features
- Reasoning: Claude Opus 4.6 reasoning distilled with `<think>` chain-of-thought tags
- Uncensored: Heretic abliteration removes refusal directions from all 24 transformer layers
- Multimodal: Full vision support (24-layer ViT, 1024 hidden, patch 16) preserved
- MLX-VLM: Optimized for Apple Silicon inference via mlx-vlm
## Architecture
Qwen3.5-2B hybrid architecture:
- 24 decoder layers (23 GatedDeltaNet linear attention + 1 full self-attention)
- 2048 hidden size, 6144 intermediate size
- Vision tower: 24-layer ViT with 1024 hidden size
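The GatedDeltaNet layers replace softmax attention with a linear recurrence over a fixed-size state matrix, so decode cost stays constant in sequence length. A minimal single-head NumPy sketch of the gated delta rule (illustrative only; the model's actual kernels, multi-head layout, and gate parameterization differ):

```python
import numpy as np

def gated_delta_rule(q, k, v, alpha, beta):
    """Illustrative single-head gated delta rule.

    q, k, v: (T, d) query/key/value sequences.
    alpha, beta: (T,) per-step decay and write gates in (0, 1).
    The state S is a fixed d x d matrix; output is (T, d).
    """
    T, d = q.shape
    S = np.zeros((d, d))
    out = np.zeros((T, d))
    for t in range(T):
        S = alpha[t] * S                       # decay old state
        err = S @ k[t] - v[t]                  # delta-rule prediction error at key k_t
        S = S - beta[t] * np.outer(err, k[t])  # error-correcting write
        out[t] = S @ q[t]                      # read out with the query
    return out

rng = np.random.default_rng(0)
T, d = 8, 16
o = gated_delta_rule(rng.normal(size=(T, d)), rng.normal(size=(T, d)),
                     rng.normal(size=(T, d)),
                     rng.uniform(0.8, 1.0, T), rng.uniform(0.1, 0.5, T))
print(o.shape)  # (8, 16)
```

With `alpha = beta = 1` the update writes `v_t` at key `k_t` exactly (the classic delta rule); the gates let the model decay and partially overwrite state instead.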
## Abliteration Details
Custom abliteration applied using directional ablation (Arditi et al. 2024):
- 256 harmful + 256 harmless prompt residuals collected
- Refusal direction computed per layer as normalized mean difference
- Direction removed from attention output projections (`linear_attn.out_proj` / `self_attn.o_proj`) and `mlp.down_proj`
- Scale: 1.0
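The steps above amount to projecting the refusal direction out of each target weight matrix. A toy NumPy sketch under assumed shapes (`W`, `r`, and the random residuals here are hypothetical stand-ins; the Heretic tooling operates on the actual MLX weights):

```python
import numpy as np

def ablate_direction(W, r, scale=1.0):
    """Remove the component of W's output along unit direction r.

    W: (d_out, d_in) weight of an output projection (e.g. o_proj / down_proj).
    r: (d_out,) refusal direction for this layer (normalized inside).
    """
    r = r / np.linalg.norm(r)
    return W - scale * np.outer(r, r @ W)  # W' = (I - scale * r r^T) W

# Refusal direction: normalized mean difference of residual activations
rng = np.random.default_rng(1)
harmful = rng.normal(size=(256, 64))    # toy residuals, 256 harmful prompts
harmless = rng.normal(size=(256, 64))   # toy residuals, 256 harmless prompts
r = harmful.mean(axis=0) - harmless.mean(axis=0)
r /= np.linalg.norm(r)

W = rng.normal(size=(64, 64))
W_abl = ablate_direction(W, r, scale=1.0)
print(np.abs(r @ W_abl).max())  # ~0: no output component remains along r
```

At scale 1.0 the projection is removed completely; smaller scales would only attenuate it.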
## Usage
```python
from mlx_vlm import load, generate

model, processor = load("andrevp/Qwen3.5-2B-Distilled-OPUS-Heretic-MLX-VLM-fp16")

# Text-only
prompt = processor.apply_chat_template(
    [{"role": "user", "content": "Explain quantum entanglement step by step."}],
    add_generation_prompt=True,
)
result = generate(model, processor, prompt, max_tokens=500)

# With image
prompt = "<|im_start|>user\n<|vision_start|><|image_pad|><|vision_end|>Describe this image.<|im_end|>\n<|im_start|>assistant\n"
result = generate(model, processor, prompt, image=["path/to/image.jpg"], max_tokens=200)
```