LibraxisAI/Qwen3.5-VL-122B-A10B-mlx-crk-mxfp4

A LibraxisAI MLX rebuild of the public CRACK release of Qwen 3.5 VL 122B-A10B, prepared for high-memory Apple Silicon machines and quantized to mxfp4.

Overview

This repository contains a large multimodal Qwen 3.5 VL model in MLX/VLM format for local inference on Apple Silicon. It was rebuilt from the public dealignai/Qwen3.5-VL-122B-A10B-8bit-MLX-CRACK release by dequantizing to a local bfloat16 master and then requantizing to mxfp4.

This is a rebuild, not the original full-precision CRACK checkpoint. The goal of this release is simple: keep the public CRACK lineage available in a form that is practical for local serving, batching, and multimodal workloads on Mac hardware.

What This Model Includes

  • MLX/VLM weights for Apple Silicon
  • Qwen 3.5 VL 122B-A10B multimodal architecture
  • mxfp4 default quantization with group_size=16
  • LibraxisAI chat-template defaults for serving behavior
  • support for text, image, and video inputs in compatible MLX/VLM stacks

Lineage

  • Base architecture: Qwen/Qwen3.5-122B-A10B
  • Public CRACK source used for rebuild: dealignai/Qwen3.5-VL-122B-A10B-8bit-MLX-CRACK
  • Rebuild path: 8-bit MLX -> bfloat16 local master -> mxfp4 MLX

Quantization Notes

The default quantization mode in this release is mxfp4 with group_size=16.

Some modules are still recorded with per-layer 8-bit settings in config.json. That is expected for this rebuild: it reflects the mixed quantization metadata actually produced by conversion, not a claim that every tensor is uniformly mxfp4.

If you want the exact quantization layout, inspect the following keys in config.json:

  • quantization
  • quantization_config
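
A minimal sketch of how to separate the top-level defaults from per-layer overrides once you have the config locally. The dict below is a hypothetical excerpt standing in for config.json (the layer name and exact keys are placeholders, not the repo's actual contents):

```python
# Hypothetical stand-in for the "quantization" section of config.json.
# Top-level scalar keys are the defaults (mxfp4, group_size=16); nested
# dicts are per-layer overrides, e.g. modules kept at 8-bit.
config = {
    "quantization": {
        "mode": "mxfp4",
        "group_size": 16,
        "model.layers.0.mlp.gate": {"bits": 8, "group_size": 64},
    },
}

def summarize(quant: dict) -> dict:
    """Split top-level defaults from per-layer override entries."""
    defaults = {k: v for k, v in quant.items() if not isinstance(v, dict)}
    overrides = {k: v for k, v in quant.items() if isinstance(v, dict)}
    return {"defaults": defaults, "overrides": overrides}

summary = summarize(config["quantization"])
print(summary["defaults"])
print(sorted(summary["overrides"]))
```

For the real repository, load config.json with `json.load` and pass its `quantization` (or `quantization_config`) entry to the same helper.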

Chat Template Defaults

This variant includes the LibraxisAI instruction layer in chat_template.jinja. By default it:

  • answers in Polish
  • identifies itself as created by LibraxisAI when asked
  • stays concise and concrete
  • avoids emoji
  • uses kaomoji naturally in responses

These are serving and persona defaults only. The underlying weights still follow the public CRACK lineage described above.
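
In an OpenAI-compatible serving stack, these defaults can usually be steered per request with an explicit system message. The payload below follows the standard chat-completions shape and is not specific to this repo; whether it fully overrides the defaults depends on how chat_template.jinja merges a request-level system message:

```json
{
  "model": "LibraxisAI/Qwen3.5-VL-122B-A10B-mlx-crk-mxfp4",
  "messages": [
    {"role": "system", "content": "Answer in English. Be concise."},
    {"role": "user", "content": "Describe this image."}
  ],
  "max_tokens": 200
}
```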

Intended Use

This model is meant for local multimodal inference on high-memory Apple Silicon systems. It is a good fit for:

  • local text, image, and video understanding
  • custom OpenAI-compatible serving stacks
  • batch-oriented inference pipelines
  • workstation-class deployments where a single strong local VLM is preferred over multiple smaller replicas

It is not intended for low-memory Macs.
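
As a rough sanity check on the memory requirement: mxfp4 stores about 4 bits per weight plus a shared per-group scale. Assuming an 8-bit scale per group of 16 (an assumption, not a figure from this repo) and all ~122B parameters at the mxfp4 default, the weights alone land near 69 GB, before activations and the KV cache:

```python
# Back-of-envelope weight-memory estimate for this release.
# Assumptions (not from the repo): one 8-bit shared scale per group of 16,
# and all 122B total parameters quantized at the mxfp4 default.
total_params = 122e9
bits_per_weight = 4 + 8 / 16  # 4-bit values + 8-bit scale / 16 weights = 4.5

weight_bytes = total_params * bits_per_weight / 8
print(f"~{weight_bytes / 1e9:.0f} GB of weights")
```

The real footprint will differ somewhat because some modules are kept at 8-bit (see Quantization Notes), but the estimate explains why this release targets high-memory machines.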

Quick Start with mlx-vlm

pip install -U mlx-vlm
python -m mlx_vlm.generate \
  --model LibraxisAI/Qwen3.5-VL-122B-A10B-mlx-crk-mxfp4 \
  --prompt "Opisz zawartość obrazu." \
  --image /path/to/image.png \
  --max-tokens 200

Rebuild Reference

python -m mlx_vlm.convert \
  --hf-path dealignai/Qwen3.5-VL-122B-A10B-8bit-MLX-CRACK \
  --dequantize --dtype bfloat16 \
  --mlx-path /path/to/Qwen3.5-VL-122B-A10B-mlx-abl

python -m mlx_vlm.convert \
  --hf-path /path/to/Qwen3.5-VL-122B-A10B-mlx-abl \
  -q --q-mode mxfp4 --q-group-size 16 --dtype bfloat16 \
  --mlx-path /path/to/output

License

Apache 2.0, following the applicable upstream and derivative distribution terms of the source model family.
