LibraxisAI/Qwen3.5-VL-122B-A10B-mlx-crk-mxfp4

A LibraxisAI MLX rebuild of the public CRACK release of Qwen 3.5 VL 122B-A10B, prepared for high-memory Apple Silicon machines and quantized to mxfp4.

Overview

This repository contains a large multimodal Qwen 3.5 VL model in MLX/VLM format for local inference on Apple Silicon. It was rebuilt from the public dealignai/Qwen3.5-VL-122B-A10B-8bit-MLX-CRACK release by dequantizing to a local bfloat16 master and then requantizing to mxfp4.

This is a rebuild, not the original full-precision CRACK checkpoint. The goal of this release is simple: keep the public CRACK lineage available in a form that is practical for local serving, batching, and multimodal workloads on Mac hardware.

What This Model Includes

  • MLX/VLM weights for Apple Silicon
  • Qwen 3.5 VL 122B-A10B multimodal architecture
  • mxfp4 default quantization with group_size=16
  • LibraxisAI chat-template defaults for serving behavior
  • support for text, image, and video inputs in compatible MLX/VLM stacks

Lineage

  • Base architecture: Qwen/Qwen3.5-122B-A10B
  • Public CRACK source used for rebuild: dealignai/Qwen3.5-VL-122B-A10B-8bit-MLX-CRACK
  • Rebuild path: 8-bit MLX -> bfloat16 local master -> mxfp4 MLX

Quantization Notes

The default quantization mode in this release is mxfp4 with group_size=16.

Some modules are still recorded with per-layer 8-bit settings in config.json. That is expected for this rebuild: it reflects the mixed quantization metadata actually produced by conversion, not a claim that every tensor is uniformly mxfp4.

If you want the exact quantization layout, inspect the following keys in config.json:

  • quantization
  • quantization_config
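
A minimal sketch of how to separate the top-level defaults from per-layer overrides once you have the config locally. The dict below is a hypothetical excerpt standing in for config.json (the layer name and exact keys are placeholders, not the repo's actual contents):

```python
# Hypothetical stand-in for the "quantization" section of config.json.
# Top-level scalar keys are the defaults (mxfp4, group_size=16); nested
# dicts are per-layer overrides, e.g. modules kept at 8-bit.
config = {
    "quantization": {
        "mode": "mxfp4",
        "group_size": 16,
        "model.layers.0.mlp.gate": {"bits": 8, "group_size": 64},
    },
}

def summarize(quant: dict) -> dict:
    """Split top-level defaults from per-layer override entries."""
    defaults = {k: v for k, v in quant.items() if not isinstance(v, dict)}
    overrides = {k: v for k, v in quant.items() if isinstance(v, dict)}
    return {"defaults": defaults, "overrides": overrides}

summary = summarize(config["quantization"])
print(summary["defaults"])
print(sorted(summary["overrides"]))
```

For the real repository, load config.json with `json.load` and pass its `quantization` (or `quantization_config`) entry to the same helper.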

Chat Template Defaults

This variant includes the LibraxisAI instruction layer in chat_template.jinja. By default it:

  • answers in Polish
  • identifies itself as created by LibraxisAI when asked
  • stays concise and concrete
  • avoids emoji
  • uses kaomoji naturally in responses

These are serving and persona defaults only. The underlying weights still follow the public CRACK lineage described above.
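
In an OpenAI-compatible serving stack, these defaults can usually be steered per request with an explicit system message. The payload below follows the standard chat-completions shape and is not specific to this repo; whether it fully overrides the defaults depends on how chat_template.jinja merges a request-level system message:

```json
{
  "model": "LibraxisAI/Qwen3.5-VL-122B-A10B-mlx-crk-mxfp4",
  "messages": [
    {"role": "system", "content": "Answer in English. Be concise."},
    {"role": "user", "content": "Describe this image."}
  ],
  "max_tokens": 200
}
```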

Intended Use

This model is meant for local multimodal inference on high-memory Apple Silicon systems. It is a good fit for:

  • local text, image, and video understanding
  • custom OpenAI-compatible serving stacks
  • batch-oriented inference pipelines
  • workstation-class deployments where a single strong local VLM is preferred over multiple smaller replicas

It is not intended for low-memory Macs.
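
As a rough sanity check on the memory requirement: mxfp4 stores about 4 bits per weight plus a shared per-group scale. Assuming an 8-bit scale per group of 16 (an assumption, not a figure from this repo) and all ~122B parameters at the mxfp4 default, the weights alone land near 69 GB, before activations and the KV cache:

```python
# Back-of-envelope weight-memory estimate for this release.
# Assumptions (not from the repo): one 8-bit shared scale per group of 16,
# and all 122B total parameters quantized at the mxfp4 default.
total_params = 122e9
bits_per_weight = 4 + 8 / 16  # 4-bit values + 8-bit scale / 16 weights = 4.5

weight_bytes = total_params * bits_per_weight / 8
print(f"~{weight_bytes / 1e9:.0f} GB of weights")
```

The real footprint will differ somewhat because some modules are kept at 8-bit (see Quantization Notes), but the estimate explains why this release targets high-memory machines.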

Quick Start with mlx-vlm

pip install -U mlx-vlm
python -m mlx_vlm.generate \
  --model LibraxisAI/Qwen3.5-VL-122B-A10B-mlx-crk-mxfp4 \
  --prompt "Opisz zawartość obrazu." \
  --image /path/to/image.png \
  --max-tokens 200

Rebuild Reference

python -m mlx_vlm.convert \
  --hf-path dealignai/Qwen3.5-VL-122B-A10B-8bit-MLX-CRACK \
  --dequantize --dtype bfloat16 \
  --mlx-path /path/to/Qwen3.5-VL-122B-A10B-mlx-abl

python -m mlx_vlm.convert \
  --hf-path /path/to/Qwen3.5-VL-122B-A10B-mlx-abl \
  -q --q-mode mxfp4 --q-group-size 16 --dtype bfloat16 \
  --mlx-path /path/to/output

License

Apache 2.0, following the applicable upstream and derivative distribution terms of the source model family.
