Qwen3.5-122B-A10B-abliterated-4bit-vlm-mlx-cs2764-final

4-bit MLX VLM release of a Qwen3.5-122B-A10B abliterated checkpoint, with a targeted cs2764/mlx-abliteration pass applied on top of the direct 4-bit MLX base.

This repository is intended for Apple Silicon / MLX workflows and preserves the full Qwen3.5 VLM asset set, including the vision tower and processor files.

Summary

  • Architecture: qwen3_5_moe / Qwen3_5MoeForConditionalGeneration
  • Modality: vision-language model
  • Quantization: 4-bit MLX (group_size=64, mode=affine)
  • Model size on disk: about 65 GB
  • Weight shards: 14
  • Toolkit used for the extra ablation pass: cs2764/mlx-abliteration
  • Goal of this release: keep more abliteration effect than the direct 4-bit MLX conversion, while avoiding the heavy speed penalty of the broader pass2 release
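The roughly 65 GB figure is consistent with back-of-envelope arithmetic for 4-bit quantization at group_size=64. The sketch below assumes each group of 64 weights carries an fp16 scale and fp16 bias (a common MLX affine layout; this layout is an assumption, not something read from this repo's files):

```python
# Back-of-envelope disk-size estimate for 4-bit affine quantization with
# group_size=64. Assumes an fp16 scale and fp16 bias per group of 64
# weights, i.e. 32 extra bits per 64 weights.
params = 122e9                        # nominal parameter count
bits_per_weight = 4 + (16 + 16) / 64  # = 4.5 effective bits per weight
est_gb = params * bits_per_weight / 8 / 1e9
print(round(est_gb, 1))               # 68.6, the same ballpark as ~65 GB
```

The small gap between the estimate and the actual size on disk is expected, since not every tensor in the checkpoint is necessarily quantized the same way.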

Lineage

The release chain for this repository is:

  1. Foundation model: Qwen/Qwen3.5-122B-A10B
  2. Input checkpoint: a user-supplied abliterated Qwen3.5-122B-A10B VLM checkpoint
  3. MLX VLM conversion: quantized to 4-bit MLX while preserving vision_config, preprocessor_config.json, processor_config.json, and video_preprocessor_config.json
  4. Extra MLX abliteration pass: run with cs2764/mlx-abliteration on the converted MLX model

How This Differs From The Earlier pass2 Repo

Compared with the earlier repository vanch007/Qwen3.5-122B-A10B-abliterated-4bit-vlm-mlx-cs2764-pass2:

  • Extra ablation scope: targeted hot-path pass on layers 36..47 (final) vs. broader pass over layers 0..47 (pass2)
  • Exact target count: 105 weight matrices (final) vs. a broad full-pass hot-path update (pass2)
  • Intent: balance refusal reduction and throughput (final) vs. maximize refusal reduction (pass2)
  • Local behavior spot checks: weaker refusal than the direct 4-bit base (final) vs. stronger refusal weakening than this final repo (pass2)
  • Local short-generation speed: about 24.0 tok/s (final) vs. about 12.6 tok/s (pass2)
Recommendation:
  • Use this final repo for the better speed/behavior tradeoff.
  • Use pass2 for the more aggressive refusal-removal behavior, at the cost of roughly half the generation speed.

What Is In This Repo

This repository contains a complete MLX VLM checkpoint:

  • config.json with vision_config
  • model.safetensors.index.json
  • model-00001-of-00014.safetensors through model-00014-of-00014.safetensors
  • tokenizer.json, tokenizer_config.json, vocab.json
  • preprocessor_config.json, processor_config.json, video_preprocessor_config.json
  • abliteration_log.json for the extra MLX abliteration run

Note that ablation_meta.json is inherited from the input checkpoint. For the extra MLX pass in this repository, use abliteration_log.json and this model card as the authoritative record.
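Whether every shard referenced by model.safetensors.index.json is actually present on disk can be checked with a few lines. The helper below and the repo path are illustrative, not part of any toolkit:

```python
import json
from pathlib import Path

# Sketch: report shard files named in a safetensors index that are missing
# on disk. A safetensors index maps tensor names to shard filenames under
# its "weight_map" key.
def missing_shards(index: dict, present: set[str]) -> set[str]:
    referenced = set(index["weight_map"].values())
    return referenced - present

# Placeholder path for a local checkout; uncomment to run against real files:
repo = Path("Qwen3.5-122B-A10B-abliterated-4bit-vlm-mlx-cs2764-final")
# index = json.loads((repo / "model.safetensors.index.json").read_text())
# print(missing_shards(index, {p.name for p in repo.glob("*.safetensors")}))
```

An empty set means all 14 shards listed above are accounted for.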

Targeted Abliteration Configuration

The additional MLX abliteration pass was run with the following settings:

  • Toolkit: cs2764/mlx-abliteration
  • Base MLX model: direct 4-bit MLX VLM conversion of the abliterated checkpoint
  • Refusal vector policy: per-layer
  • Ablation vector source: per-layer
  • Ablation strength: 2.0
  • Refusal direction method: projected
  • Probed layers: 36..47
  • Exact target window: layers 36..47 attention + shared-expert hot paths
  • Exact target count: 105
  • Adaptive search: False
  • Attention only: False
  • MoE safe mode: True
  • Probe batch size: 4
  • Timestamp: 2026-03-07T11:45:41Z
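The core directional-ablation update, as commonly implemented in abliteration tooling, removes the component of each weight column along a unit refusal direction. The sketch below illustrates that generic technique; it is not the cs2764/mlx-abliteration implementation itself:

```python
# Generic directional ablation in plain Python: for a unit refusal
# direction v, W' = W - alpha * v (v^T W). alpha = 1.0 removes the
# direction from every column of W; alpha = 2.0 (the strength listed
# above) reflects it past zero. Illustration only.
def ablate(W, v, alpha):
    d, k = len(v), len(W[0])
    # s[j] = v . W[:, j], the component of column j along v
    s = [sum(v[i] * W[i][j] for i in range(d)) for j in range(k)]
    return [[W[i][j] - alpha * v[i] * s[j] for j in range(k)] for i in range(d)]

# With v aligned to the first axis, alpha = 1.0 zeroes the first row:
print(ablate([[1.0, 2.0], [3.0, 4.0]], [1.0, 0.0], 1.0))
# [[0.0, 0.0], [3.0, 4.0]]
```

With strength 1.0 the refusal direction is projected out exactly; the 2.0 used here pushes the component through zero to its negation.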

Compatibility

This repository is for MLX / Apple Silicon usage. It is not a standard Transformers-only release.

Verified locally:

  • mlx_vlm.load(..., lazy=True) loads successfully
  • processor_class = Qwen3VLProcessor
  • has_vision_tower = True
  • config.json retains vision_config
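The config-related checks above can be reproduced on a parsed config.json. `check_config` is a hypothetical helper written for this card, not part of mlx-vlm:

```python
import json

# Sketch of the sanity checks listed above, applied to a parsed config.json.
# Hypothetical helper; the key names follow the usual MLX quantization
# layout ("quantization": {"bits": ..., "group_size": ...}).
def check_config(cfg: dict) -> list[str]:
    problems = []
    if "vision_config" not in cfg:
        problems.append("missing vision_config")
    q = cfg.get("quantization") or {}
    if (q.get("bits"), q.get("group_size")) != (4, 64):
        problems.append("unexpected quantization settings")
    return problems

# cfg = json.load(open("config.json"))
# print(check_config(cfg))  # expect [] for this repo
```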

Local Validation Notes

Local spot checks used during conversion and validation showed the following pattern:

  • the direct 4-bit MLX base still fell into explicit refusal reasoning on the tested proxy prompts
  • this final release no longer fell into the same explicit refusal phrasing on those same proxy prompts
  • the earlier pass2 release remained more aggressive, but was much slower

This is not a formal benchmark. Treat it as an informal local smoke-test result, not as a guarantee of behavior on any specific prompt distribution.
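A minimal version of such a smoke test can be phrased as a substring check. The marker phrases and helper below are illustrative assumptions, not the prompts or criteria actually used:

```python
# Illustrative refusal detector in the spirit of the spot checks above.
# The marker list is an assumption, not the actual evaluation criteria.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "i won't")

def looks_like_refusal(text: str) -> bool:
    lowered = text.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)

print(looks_like_refusal("I'm sorry, but I can't help with that."))  # True
print(looks_like_refusal("Sure. Here is a brief description."))      # False
```

Substring checks of this kind are crude; they miss soft refusals and can false-positive on quoted text, which is another reason to treat the spot checks as informal.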

Usage

Python with mlx-vlm

from mlx_vlm import load, generate

# lazy=True defers weight evaluation, keeping peak memory lower while the
# ~65 GB checkpoint is mapped in.
model, processor = load(
    "vanch007/Qwen3.5-122B-A10B-abliterated-4bit-vlm-mlx-cs2764-final",
    lazy=True,
)

result = generate(
    model,
    processor,
    prompt="Describe the image briefly.",
    image="/absolute/path/to/example.jpg",  # replace with a real image path
    max_tokens=128,
    verbose=False,
)

print(result.text)

Local path

from mlx_vlm import load

model, processor = load("/path/to/Qwen3.5-122B-A10B-abliterated-4bit-vlm-mlx-cs2764-final")

Safety Notice

This repository contains a model with reduced refusal behavior. It may produce harmful, offensive, or unsafe content.

Do not use it in consumer-facing or safety-sensitive systems without independent safety controls.

You are responsible for ensuring compliance with applicable law, policy, and platform rules.

Reproducibility Notes

  • The extra abliteration step was performed with cs2764/mlx-abliteration.
  • This release uses a narrower targeted MLX pass than the earlier pass2 repository.
  • This repository publishes the resulting weights, not a full training or evaluation pipeline.