Qwen3.5-122B-A10B-abliterated-4bit-vlm-mlx-cs2764-final

4-bit MLX VLM release of a Qwen3.5-122B-A10B abliterated checkpoint, with a targeted cs2764/mlx-abliteration pass applied on top of the direct 4-bit MLX base.

This repository is intended for Apple Silicon / MLX workflows and preserves the full Qwen3.5 VLM asset set, including the vision tower and processor files.

Summary

  • Architecture: qwen3_5_moe / Qwen3_5MoeForConditionalGeneration
  • Modality: vision-language model
  • Quantization: 4-bit MLX (group_size=64, mode=affine)
  • Model size on disk: about 65 GB
  • Weight shards: 14
  • Toolkit used for the extra ablation pass: cs2764/mlx-abliteration
  • Goal of this release: keep more abliteration effect than the direct 4-bit MLX conversion, while avoiding the heavy speed penalty of the broader pass2 release
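The roughly 65 GB figure is consistent with back-of-envelope arithmetic for 4-bit quantization at group_size=64. The sketch below assumes each group of 64 weights carries an fp16 scale and fp16 bias (a common MLX affine layout; this layout is an assumption, not something read from this repo's files):

```python
# Back-of-envelope disk-size estimate for 4-bit affine quantization with
# group_size=64. Assumes an fp16 scale and fp16 bias per group of 64
# weights, i.e. 32 extra bits per 64 weights.
params = 122e9                        # nominal parameter count
bits_per_weight = 4 + (16 + 16) / 64  # = 4.5 effective bits per weight
est_gb = params * bits_per_weight / 8 / 1e9
print(round(est_gb, 1))               # 68.6, the same ballpark as ~65 GB
```

The small gap between the estimate and the actual size on disk is expected, since not every tensor in the checkpoint is necessarily quantized the same way.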

Lineage

The release chain for this repository is:

  1. Foundation model: Qwen/Qwen3.5-122B-A10B
  2. Input checkpoint: a user-supplied abliterated Qwen3.5-122B-A10B VLM checkpoint
  3. MLX VLM conversion: quantized to 4-bit MLX while preserving vision_config, preprocessor_config.json, processor_config.json, and video_preprocessor_config.json
  4. Extra MLX abliteration pass: run with cs2764/mlx-abliteration on the converted MLX model

How This Differs From The Earlier pass2 Repo

Compared with the earlier repository vanch007/Qwen3.5-122B-A10B-abliterated-4bit-vlm-mlx-cs2764-pass2:

  • Extra ablation scope: targeted hot-path pass on layers 36..47 (final) vs. broader pass over layers 0..47 (pass2)
  • Exact target count: 105 weight matrices (final) vs. a broad full-pass hot-path update (pass2)
  • Intent: balance refusal reduction and throughput (final) vs. maximize refusal reduction (pass2)
  • Local behavior spot checks: weaker refusal than the direct 4-bit base (final) vs. stronger refusal weakening than this final repo (pass2)
  • Local short-generation speed: about 24.0 tok/s (final) vs. about 12.6 tok/s (pass2)
Recommendation:
  • Use this final repo for the better speed/behavior tradeoff.
  • Use pass2 for the more aggressive refusal-removal behavior, at the cost of roughly half the generation speed.

What Is In This Repo

This repository contains a complete MLX VLM checkpoint:

  • config.json with vision_config
  • model.safetensors.index.json
  • model-00001-of-00014.safetensors through model-00014-of-00014.safetensors
  • tokenizer.json, tokenizer_config.json, vocab.json
  • preprocessor_config.json, processor_config.json, video_preprocessor_config.json
  • abliteration_log.json for the extra MLX abliteration run

Note that ablation_meta.json is inherited from the input checkpoint. For the extra MLX pass in this repository, use abliteration_log.json and this model card as the authoritative record.
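Whether every shard referenced by model.safetensors.index.json is actually present on disk can be checked with a few lines. The helper below and the repo path are illustrative, not part of any toolkit:

```python
import json
from pathlib import Path

# Sketch: report shard files named in a safetensors index that are missing
# on disk. A safetensors index maps tensor names to shard filenames under
# its "weight_map" key.
def missing_shards(index: dict, present: set[str]) -> set[str]:
    referenced = set(index["weight_map"].values())
    return referenced - present

# Placeholder path for a local checkout; uncomment to run against real files:
repo = Path("Qwen3.5-122B-A10B-abliterated-4bit-vlm-mlx-cs2764-final")
# index = json.loads((repo / "model.safetensors.index.json").read_text())
# print(missing_shards(index, {p.name for p in repo.glob("*.safetensors")}))
```

An empty set means all 14 shards listed above are accounted for.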

Targeted Abliteration Configuration

The additional MLX abliteration pass was run with the following settings:

  • Toolkit: cs2764/mlx-abliteration
  • Base MLX model: direct 4-bit MLX VLM conversion of the abliterated checkpoint
  • Refusal vector policy: per-layer
  • Ablation vector source: per-layer
  • Ablation strength: 2.0
  • Refusal direction method: projected
  • Probed layers: 36..47
  • Exact target window: layers 36..47 attention + shared-expert hot paths
  • Exact target count: 105
  • Adaptive search: False
  • Attention only: False
  • MoE safe mode: True
  • Probe batch size: 4
  • Timestamp: 2026-03-07T11:45:41Z
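The core directional-ablation update, as commonly implemented in abliteration tooling, removes the component of each weight column along a unit refusal direction. The sketch below illustrates that generic technique; it is not the cs2764/mlx-abliteration implementation itself:

```python
# Generic directional ablation in plain Python: for a unit refusal
# direction v, W' = W - alpha * v (v^T W). alpha = 1.0 removes the
# direction from every column of W; alpha = 2.0 (the strength listed
# above) reflects it past zero. Illustration only.
def ablate(W, v, alpha):
    d, k = len(v), len(W[0])
    # s[j] = v . W[:, j], the component of column j along v
    s = [sum(v[i] * W[i][j] for i in range(d)) for j in range(k)]
    return [[W[i][j] - alpha * v[i] * s[j] for j in range(k)] for i in range(d)]

# With v aligned to the first axis, alpha = 1.0 zeroes the first row:
print(ablate([[1.0, 2.0], [3.0, 4.0]], [1.0, 0.0], 1.0))
# [[0.0, 0.0], [3.0, 4.0]]
```

With strength 1.0 the refusal direction is projected out exactly; the 2.0 used here pushes the component through zero to its negation.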

Compatibility

This repository is for MLX / Apple Silicon usage. It is not a standard Transformers-only release.

Verified locally:

  • mlx_vlm.load(..., lazy=True) loads successfully
  • processor_class = Qwen3VLProcessor
  • has_vision_tower = True
  • config.json retains vision_config
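The config-related checks above can be reproduced on a parsed config.json. `check_config` is a hypothetical helper written for this card, not part of mlx-vlm:

```python
import json

# Sketch of the sanity checks listed above, applied to a parsed config.json.
# Hypothetical helper; the key names follow the usual MLX quantization
# layout ("quantization": {"bits": ..., "group_size": ...}).
def check_config(cfg: dict) -> list[str]:
    problems = []
    if "vision_config" not in cfg:
        problems.append("missing vision_config")
    q = cfg.get("quantization") or {}
    if (q.get("bits"), q.get("group_size")) != (4, 64):
        problems.append("unexpected quantization settings")
    return problems

# cfg = json.load(open("config.json"))
# print(check_config(cfg))  # expect [] for this repo
```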

Local Validation Notes

Local spot checks used during conversion and validation showed the following pattern:

  • the direct 4-bit MLX base still fell into explicit refusal reasoning on the tested proxy prompts
  • this final release no longer fell into the same explicit refusal phrasing on those same proxy prompts
  • the earlier pass2 release remained more aggressive, but was much slower

This is not a formal benchmark. Treat it as an informal local smoke-test result, not as a guarantee of behavior on any specific prompt distribution.
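A minimal version of such a smoke test can be phrased as a substring check. The marker phrases and helper below are illustrative assumptions, not the prompts or criteria actually used:

```python
# Illustrative refusal detector in the spirit of the spot checks above.
# The marker list is an assumption, not the actual evaluation criteria.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "i won't")

def looks_like_refusal(text: str) -> bool:
    lowered = text.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)

print(looks_like_refusal("I'm sorry, but I can't help with that."))  # True
print(looks_like_refusal("Sure. Here is a brief description."))      # False
```

Substring checks of this kind are crude; they miss soft refusals and can false-positive on quoted text, which is another reason to treat the spot checks as informal.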

Usage

Python with mlx-vlm

from mlx_vlm import load, generate

# lazy=True defers weight evaluation, keeping peak memory lower while the
# ~65 GB checkpoint is mapped in.
model, processor = load(
    "vanch007/Qwen3.5-122B-A10B-abliterated-4bit-vlm-mlx-cs2764-final",
    lazy=True,
)

result = generate(
    model,
    processor,
    prompt="Describe the image briefly.",
    image="/absolute/path/to/example.jpg",  # replace with a real image path
    max_tokens=128,
    verbose=False,
)

print(result.text)

Local path

from mlx_vlm import load

model, processor = load("/path/to/Qwen3.5-122B-A10B-abliterated-4bit-vlm-mlx-cs2764-final")

Safety Notice

This repository contains a model with reduced refusal behavior. It may produce harmful, offensive, or unsafe content.

Do not use it in consumer-facing or safety-sensitive systems without independent safety controls.

You are responsible for ensuring compliance with applicable law, policy, and platform rules.

Reproducibility Notes

  • The extra abliteration step was performed with cs2764/mlx-abliteration.
  • This release uses a narrower targeted MLX pass than the earlier pass2 repository.
  • This repository publishes the resulting weights, not a full training or evaluation pipeline.