# Qwen3.5-2B-Distilled-OPUS-Heretic-MLX-VLM (fp16)
MLX-VLM fp16 conversion of Jackrong/Qwen3.5-2B-Claude-4.6-Opus-Reasoning-Distilled, with Heretic abliteration applied to remove refusal/censorship behavior.
## Features
- Reasoning: Claude Opus 4.6 reasoning distilled with `<think>` chain-of-thought tags
- Uncensored: Heretic abliteration removes refusal directions from all 24 transformer layers
- Multimodal: Full vision support (24-layer ViT, 1024 hidden, patch 16) preserved
- MLX-VLM: Optimized for Apple Silicon inference via mlx-vlm
## Architecture
Qwen3.5-2B hybrid architecture:
- 24 decoder layers (23 GatedDeltaNet linear attention + 1 full self-attention)
- 2048 hidden size, 6144 intermediate size
- Vision tower: 24-layer ViT with 1024 hidden size
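The GatedDeltaNet layers replace softmax attention with a linear recurrence over a fixed-size state matrix, so decode cost stays constant in sequence length. A minimal single-head NumPy sketch of the gated delta rule (illustrative only; the model's actual kernels, multi-head layout, and gate parameterization differ):

```python
import numpy as np

def gated_delta_rule(q, k, v, alpha, beta):
    """Illustrative single-head gated delta rule.

    q, k, v: (T, d) query/key/value sequences.
    alpha, beta: (T,) per-step decay and write gates in (0, 1).
    The state S is a fixed d x d matrix; output is (T, d).
    """
    T, d = q.shape
    S = np.zeros((d, d))
    out = np.zeros((T, d))
    for t in range(T):
        S = alpha[t] * S                       # decay old state
        err = S @ k[t] - v[t]                  # delta-rule prediction error at key k_t
        S = S - beta[t] * np.outer(err, k[t])  # error-correcting write
        out[t] = S @ q[t]                      # read out with the query
    return out

rng = np.random.default_rng(0)
T, d = 8, 16
o = gated_delta_rule(rng.normal(size=(T, d)), rng.normal(size=(T, d)),
                     rng.normal(size=(T, d)),
                     rng.uniform(0.8, 1.0, T), rng.uniform(0.1, 0.5, T))
print(o.shape)  # (8, 16)
```

With `alpha = beta = 1` the update writes `v_t` at key `k_t` exactly (the classic delta rule); the gates let the model decay and partially overwrite state instead.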
## Abliteration Details
Custom abliteration applied using directional ablation (Arditi et al. 2024):
- 256 harmful + 256 harmless prompt residuals collected
- Refusal direction computed per layer as normalized mean difference
- Direction removed from attention output projections (`linear_attn.out_proj` / `self_attn.o_proj`) and `mlp.down_proj`
- Scale: 1.0
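The steps above amount to projecting the refusal direction out of each target weight matrix. A toy NumPy sketch under assumed shapes (`W`, `r`, and the random residuals here are hypothetical stand-ins; the Heretic tooling operates on the actual MLX weights):

```python
import numpy as np

def ablate_direction(W, r, scale=1.0):
    """Remove the component of W's output along unit direction r.

    W: (d_out, d_in) weight of an output projection (e.g. o_proj / down_proj).
    r: (d_out,) refusal direction for this layer (normalized inside).
    """
    r = r / np.linalg.norm(r)
    return W - scale * np.outer(r, r @ W)  # W' = (I - scale * r r^T) W

# Refusal direction: normalized mean difference of residual activations
rng = np.random.default_rng(1)
harmful = rng.normal(size=(256, 64))    # toy residuals, 256 harmful prompts
harmless = rng.normal(size=(256, 64))   # toy residuals, 256 harmless prompts
r = harmful.mean(axis=0) - harmless.mean(axis=0)
r /= np.linalg.norm(r)

W = rng.normal(size=(64, 64))
W_abl = ablate_direction(W, r, scale=1.0)
print(np.abs(r @ W_abl).max())  # ~0: no output component remains along r
```

At scale 1.0 the projection is removed completely; smaller scales would only attenuate it.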
## Usage
```python
from mlx_vlm import load, generate

model, processor = load("andrevp/Qwen3.5-2B-Distilled-OPUS-Heretic-MLX-VLM-fp16")

# Text-only
prompt = processor.apply_chat_template(
    [{"role": "user", "content": "Explain quantum entanglement step by step."}],
    add_generation_prompt=True,
)
result = generate(model, processor, prompt, max_tokens=500)

# With image
prompt = "<|im_start|>user\n<|vision_start|><|image_pad|><|vision_end|>Describe this image.<|im_end|>\n<|im_start|>assistant\n"
result = generate(model, processor, prompt, image=["path/to/image.jpg"], max_tokens=200)
```