# Qwen3.5-122B-A10B-abliterated-4bit-vlm-mlx-cs2764-final
4-bit MLX VLM release of a Qwen3.5-122B-A10B abliterated checkpoint, with a targeted `cs2764/mlx-abliteration` pass applied on top of the direct 4-bit MLX base.

This repository is intended for Apple Silicon / MLX workflows and preserves the full Qwen3.5 VLM asset set, including the vision tower and processor files.
## Summary

- Architecture: `qwen3_5_moe` / `Qwen3_5MoeForConditionalGeneration`
- Modality: vision-language model
- Quantization: 4-bit MLX (`group_size=64`, `mode=affine`)
- Model size on disk: about 65 GB
- Weight shards: 14
- Toolkit used for the extra ablation pass: `cs2764/mlx-abliteration`
- Goal of this release: keep more of the abliteration effect than the direct 4-bit MLX conversion, while avoiding the heavy speed penalty of the broader `pass2` release
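As a sanity check on the numbers above, the on-disk size can be estimated from the quantization settings. This is a back-of-the-envelope sketch, not a measurement: it assumes MLX affine 4-bit quantization stores one fp16 scale and one fp16 bias per group of 64 weights, and that essentially all 122B parameters are quantized.

```python
# Estimate disk size of a 4-bit affine-quantized 122B-parameter model.
# Assumptions (not taken from the repo): fp16 scale + fp16 bias per
# group of 64 weights; all parameters quantized.
params = 122e9
bits_per_weight = 4 + (16 + 16) / 64   # 4-bit payload + per-group overhead
size_gib = params * bits_per_weight / 8 / 2**30
print(f"{bits_per_weight} bits/weight -> ~{size_gib:.0f} GiB")  # ~64 GiB
```

The ~64 GiB estimate is consistent with the "about 65 GB" figure reported above.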
## Lineage

The release chain for this repository is:

- Foundation model: `Qwen/Qwen3.5-122B-A10B`
- Input checkpoint: a user-supplied abliterated Qwen3.5-122B-A10B VLM checkpoint
- MLX VLM conversion: quantized to 4-bit MLX while preserving `vision_config`, `preprocessor_config.json`, `processor_config.json`, and `video_preprocessor_config.json`
- Extra MLX abliteration pass: run with `cs2764/mlx-abliteration` on the converted MLX model
## How This Differs From The Earlier pass2 Repo

Compared with the earlier repository `vanch007/Qwen3.5-122B-A10B-abliterated-4bit-vlm-mlx-cs2764-pass2`:

| Aspect | This final repo | Earlier pass2 repo |
|---|---|---|
| Extra ablation scope | targeted hot-path pass on layers 36..47 | broader pass over layers 0..47 |
| Exact target count | 105 weight matrices | broad full-pass hot-path update |
| Intent | balance refusal reduction and throughput | maximize refusal reduction |
| Local behavior spot checks | weaker refusal than direct 4-bit base | stronger refusal weakening than this final repo |
| Local short generation speed | about 24.0 tok/s | about 12.6 tok/s |

Recommendation:

- Use `final` if you want the better speed / behavior tradeoff.
- Use `pass2` if you want the more aggressive refusal-removal behavior and accept a much slower model.
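The reported spot-check speeds imply roughly a 1.9x throughput difference between the two releases. A quick sketch of what that means for short generations, using only the informal numbers from the table above:

```python
# Rough tradeoff implied by the local spot-check speeds
# (informal local numbers, not a benchmark).
final_tps, pass2_tps = 24.0, 12.6
speedup = final_tps / pass2_tps
tokens = 128
print(f"final is ~{speedup:.1f}x faster")  # ~1.9x
print(f"{tokens} tokens: {tokens / final_tps:.1f}s (final) "
      f"vs {tokens / pass2_tps:.1f}s (pass2)")
```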
## What Is In This Repo

This repository contains a complete MLX VLM checkpoint:

- `config.json` with `vision_config`
- `model.safetensors.index.json`
- `model-00001-of-00014.safetensors` through `model-00014-of-00014.safetensors`
- `tokenizer.json`, `tokenizer_config.json`, `vocab.json`
- `preprocessor_config.json`, `processor_config.json`, `video_preprocessor_config.json`
- `abliteration_log.json` for the extra MLX abliteration run

Note that `ablation_meta.json` is inherited from the input checkpoint. For the extra MLX pass in this repository, use `abliteration_log.json` and this model card as the authoritative record.
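After downloading, you can verify that all 14 shards arrived intact by cross-checking the index against the files on disk. This is a stdlib-only sketch, assuming the standard safetensors sharded-index layout in which `weight_map` maps tensor names to shard filenames:

```python
import json
from pathlib import Path

def missing_shards(checkpoint_dir: str) -> list[str]:
    """Return shard filenames referenced by model.safetensors.index.json
    that are absent from the checkpoint directory."""
    root = Path(checkpoint_dir)
    index = json.loads((root / "model.safetensors.index.json").read_text())
    shards = sorted(set(index["weight_map"].values()))
    return [s for s in shards if not (root / s).exists()]
```

An empty return value means every shard listed in the index is present locally.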
## Targeted Abliteration Configuration

The additional MLX abliteration pass was run with the following settings:
| Parameter | Value |
|---|---|
| Toolkit | cs2764/mlx-abliteration |
| Base MLX model | direct 4-bit MLX VLM conversion of the abliterated checkpoint |
| Refusal vector policy | per-layer |
| Ablation vector source | per-layer |
| Ablation strength | 2.0 |
| Refusal direction method | projected |
| Probed layers | 36..47 |
| Exact target window | layers 36..47 attention + shared-expert hot paths |
| Exact target count | 105 |
| Adaptive search | False |
| Attention only | False |
| MoE safe mode | True |
| Probe batch size | 4 |
| Timestamp | 2026-03-07T11:45:41Z |
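For scripted comparisons against other runs, the settings in the table above can be expressed as plain data. The key names below are illustrative only, not the toolkit's actual configuration schema; the main point is that the probed window 36..47 is inclusive, covering 12 layers:

```python
# Targeted-pass settings from the table above, as a plain dict.
# Key names are illustrative, not cs2764/mlx-abliteration's schema.
ablation_config = {
    "ablation_strength": 2.0,
    "refusal_direction_method": "projected",
    "probed_layers": list(range(36, 48)),  # layers 36..47 inclusive
    "adaptive_search": False,
    "attention_only": False,
    "moe_safe_mode": True,
    "probe_batch_size": 4,
}
print(len(ablation_config["probed_layers"]))  # 12
```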
## Compatibility

This repository is for MLX / Apple Silicon usage. It is not a standard Transformers-only release.

Verified locally:

- `mlx_vlm.load(..., lazy=True)` loads successfully
- `processor_class = Qwen3VLProcessor`
- `has_vision_tower = True`
- `config.json` retains `vision_config`
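The last check above can be reproduced without loading the model at all. A minimal stdlib-only sketch (the function name is ours, not part of any toolkit) that confirms `config.json` still carries the `vision_config` block, which some conversion pipelines drop:

```python
import json
from pathlib import Path

def retains_vision_config(checkpoint_dir: str) -> bool:
    """Check that the checkpoint's config.json still carries a
    vision_config block (some conversion pipelines strip it)."""
    config = json.loads((Path(checkpoint_dir) / "config.json").read_text())
    return isinstance(config.get("vision_config"), dict)
```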
## Local Validation Notes

Local spot checks used during conversion and validation showed the following pattern:

- the direct 4-bit MLX base still fell into explicit refusal reasoning on the tested proxy prompts
- this `final` release no longer fell into the same explicit refusal phrasing on those same proxy prompts
- the earlier `pass2` release remained more aggressive, but was much slower
This is not a formal benchmark. Treat it as an informal local smoke-test result, not as a guarantee of behavior on any specific prompt distribution.
## Usage

### Python with mlx-vlm

```python
from mlx_vlm import load, generate

model, processor = load(
    "vanch007/Qwen3.5-122B-A10B-abliterated-4bit-vlm-mlx-cs2764-final",
    lazy=True,
)

result = generate(
    model,
    processor,
    prompt="Describe the image briefly.",
    image="/absolute/path/to/example.jpg",
    max_tokens=128,
    verbose=False,
)
print(result.text)
```

### Local path

```python
from mlx_vlm import load

model, processor = load("/path/to/Qwen3.5-122B-A10B-abliterated-4bit-vlm-mlx-cs2764-final")
```
## Safety Notice
This repository contains a model with reduced refusal behavior. It may produce harmful, offensive, or unsafe content.
Do not use it in consumer-facing or safety-sensitive systems without independent safety controls.
You are responsible for ensuring compliance with applicable law, policy, and platform rules.
## Reproducibility Notes

- The extra abliteration step was performed with `cs2764/mlx-abliteration`.
- This release uses a narrower targeted MLX pass than the earlier `pass2` repository.
- This repository publishes the resulting weights, not a full training or evaluation pipeline.