LibraxisAI/Qwen3.5-VL-122B-A10B-mlx-crk-mxfp4
A LibraxisAI MLX rebuild of the public CRACK release of Qwen 3.5 VL 122B-A10B, prepared for high-memory Apple Silicon machines and quantized to mxfp4.
Overview
This repository contains a large multimodal Qwen 3.5 VL model in MLX/VLM format for local inference on Apple Silicon.
It was rebuilt from the public dealignai/Qwen3.5-VL-122B-A10B-8bit-MLX-CRACK release: the 8-bit weights were dequantized to a local bfloat16 master and then requantized to mxfp4.
This is a rebuild, not the original full-precision CRACK checkpoint. The goal of this release is simple: keep the public CRACK lineage available in a form that is practical for local serving, batching, and multimodal workloads on Mac hardware.
What This Model Includes
- MLX/VLM weights for Apple Silicon
- Qwen 3.5 VL 122B-A10B multimodal architecture
- mxfp4 default quantization with group_size=16
- LibraxisAI chat-template defaults for serving behavior
- support for text, image, and video inputs in compatible MLX/VLM stacks
Lineage
- Base architecture: Qwen/Qwen3.5-122B-A10B
- Public CRACK source used for the rebuild: dealignai/Qwen3.5-VL-122B-A10B-8bit-MLX-CRACK
- Rebuild path: 8-bit MLX -> bfloat16 local master -> mxfp4 MLX
Quantization Notes
The default quantization mode in this release is mxfp4 with group_size=16.
Some modules remain recorded with per-layer 8-bit settings in config.json.
This is expected for this rebuild: it reflects the actual mixed quantization metadata produced by conversion, rather than claiming that every tensor is uniformly mxfp4.
If you want the exact quantization layout, inspect the quantization and quantization_config entries in config.json.
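A minimal sketch of walking that layout in Python. The inline config dict is an illustrative stand-in in the shape MLX conversion typically emits (the layer name and key names here are assumptions, not copied from this repo):

```python
import json

# In practice, load the real file from the downloaded repo:
# with open("config.json") as f:
#     config = json.load(f)

# Illustrative stand-in: a default mode plus one per-layer override
# (hypothetical layer name, not an actual entry from this model).
config = {
    "quantization": {
        "mode": "mxfp4",
        "group_size": 16,
        "model.layers.0.mlp.gate": {"bits": 8, "group_size": 64},
    },
}

quant = config["quantization"]
# Per-layer overrides are the dict-valued entries alongside the defaults.
overrides = {k: v for k, v in quant.items() if isinstance(v, dict)}

print("default:", quant.get("mode"), "group_size:", quant.get("group_size"))
for name, setting in overrides.items():
    print(f"override {name}: {setting}")
```

Entries that stay dict-valued are the modules recorded with their own per-layer settings, such as the 8-bit modules mentioned above.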
Chat Template Defaults
This variant includes the LibraxisAI instruction layer in chat_template.jinja.
By default it:
- answers in Polish
- identifies itself as created by LibraxisAI when asked
- stays concise and concrete
- avoids emoji
- uses kaomoji naturally in responses
These are serving and persona defaults only. The underlying weights still follow the public CRACK lineage described above.
Intended Use
This model is meant for local multimodal inference on high-memory Apple Silicon systems. It is a good fit for:
- local text, image, and video understanding
- custom OpenAI-compatible serving stacks
- batch-oriented inference pipelines
- workstation-class deployments where a single strong local VLM is preferred over multiple smaller replicas
It is not intended for low-memory Macs.
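For the OpenAI-compatible serving case above, a request body would follow the standard chat-completions shape. A sketch of building one (the endpoint and image URL are placeholders; only the model id matches this repo):

```python
import json

# Standard OpenAI-style chat-completions payload with one image part.
payload = {
    "model": "LibraxisAI/Qwen3.5-VL-122B-A10B-mlx-crk-mxfp4",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                # Placeholder URL, not a real asset.
                {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
            ],
        }
    ],
    "max_tokens": 200,
}

body = json.dumps(payload)
print(body[:80])
```

POST this body to your stack's chat-completions endpoint (commonly /v1/chat/completions); the exact route depends on the serving stack you run.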
Quick Start with mlx-vlm
```shell
pip install -U mlx-vlm

python -m mlx_vlm.generate \
  --model LibraxisAI/Qwen3.5-VL-122B-A10B-mlx-crk-mxfp4 \
  --prompt "Opisz zawartość obrazu." \
  --image /path/to/image.png \
  --max-tokens 200
```
Rebuild Reference
```shell
# Step 1: dequantize the public 8-bit CRACK release to a bfloat16 master
python -m mlx_vlm.convert \
  --hf-path dealignai/Qwen3.5-VL-122B-A10B-8bit-MLX-CRACK \
  --dequantize --dtype bfloat16 \
  --mlx-path /path/to/Qwen3.5-VL-122B-A10B-mlx-abl

# Step 2: requantize the bfloat16 master to mxfp4 with group_size=16
python -m mlx_vlm.convert \
  --hf-path /path/to/Qwen3.5-VL-122B-A10B-mlx-abl \
  -q --q-mode mxfp4 --q-group-size 16 --dtype bfloat16 \
  --mlx-path /path/to/output
```
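A rough sanity check on why high memory is required: mxfp4 stores 4-bit values plus a shared scale per group, so with group_size=16 and an 8-bit shared scale (the scale width is an assumption here) the effective cost is about 4.5 bits per weight. A back-of-the-envelope estimate for the 122B total parameters:

```python
params = 122e9                   # total parameters (122B, per the model name)
bits_per_weight = 4 + 8 / 16     # 4-bit values + assumed 8-bit scale per group of 16
gb = params * bits_per_weight / 8 / 1e9  # bits -> bytes -> GB

print(f"~{gb:.0f} GB of weights")  # → ~69 GB of weights
```

Activations, the KV cache, and the per-layer 8-bit overrides noted earlier all push the real footprint somewhat higher, hence the high-memory requirement.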
License
Apache 2.0, following the applicable upstream and derivative distribution terms of the source model family.