# OsaurusAI/Mistral-Medium-3.5-128B-JANGTQ
Quantized mistralai/Mistral-Medium-3.5-128B for Apple Silicon (MLX) — dense 128B parameter multimodal LLM with image input + 256K context.
| | |
| --- | --- |
| Source | mistralai/Mistral-Medium-3.5-128B |
| Architecture | `mistral3` wrapper — `ministral3` text decoder (88L × 12288, GQA 96/8) + `pixtral` vision encoder (48L × 1664) |
| Quant format | JANGTQ (TurboQuant 2-bit + Hadamard, `group_size=64`) |
| Bundle size on disk | 40.76 GB (37 safetensors shards) |
| License | Apache-2.0 (inherits from upstream) |
| Modalities | Text + image in / text out (no audio, no video) |
| Context | 262,144 tokens (YaRN, factor=64 from orig=4096) |
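As a sanity check on the bundle size, a rough back-of-envelope estimate is possible from the table above. The parameter split below (decoder 2-bit mass, fp16 passthrough mass, 8-bit embedding mass) is an assumption for illustration, not read from the actual shard manifest:

```python
# Back-of-envelope bundle size (a sketch: the parameter split below is an
# estimate, not the exact manifest of this bundle).
GROUP = 64

def packed_bytes(params, bits):
    # quantized codes plus one fp16 scale per group of 64 weights
    return params * (bits + 16 / GROUP) / 8

decoder_2bit = 123e9   # assumed: 88-layer decoder linears (q/k/v/o + gate/up/down)
fp16_pass    = 3.5e9   # assumed: vision tower + projector + lm_head + norms
embed_8bit   = 1.6e9   # assumed: ~131k-vocab x 12288 embed_tokens

total = packed_bytes(decoder_2bit, 2) + fp16_pass * 2 + packed_bytes(embed_8bit, 8)
print(f"~{total / 1e9:.1f} GB")  # prints "~43.2 GB" — same ballpark as 40.76 GB on disk
```

The estimate lands within roughly 10% of the reported on-disk size, which is consistent with most of the 128B parameters being stored at ~2.25 bits/weight.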
## What's quantized

- Text decoder linears (88 layers × 7 projections: q/k/v/o + gate/up/down) → TurboQuant 2-bit with Hadamard pre-rotation (`group_size=64`)
- `embed_tokens` → affine 8-bit (`mx.quantize`)
- All RMSNorms (input/post-attention) → fp16 passthrough
- `vision_tower.*`, `multi_modal_projector.*`, `lm_head` → fp16 passthrough (matches upstream `modules_to_not_convert`)
Vision tower (model.vision_tower.*, 48 layers, 1664 hidden, patch=14, image_size=1540, spatial_merge=2) — kept bf16 → fp16 passthrough, matching upstream quantization_config.modules_to_not_convert. Images dispatch to the same pixtral encoder; embeddings are folded into the LM via a 2-layer GELU multimodal projector (also fp16 passthrough).
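From the vision specs above, the visual token count per full-resolution image follows directly, assuming `spatial_merge=2` folds each 2×2 patch grid into one token (a worked check, not runtime code):

```python
# Visual tokens per full-size image, from the card's vision specs.
# Assumes spatial_merge=2 merges 2x2 patch neighborhoods into one token.
image_size, patch, merge = 1540, 14, 2

per_side = image_size // patch        # 110 patches per side
tokens = (per_side // merge) ** 2     # 55 x 55 after merging
print(tokens)  # 3025
```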
lm_head is fp16 passthrough (matches upstream ignored set) — no quantization noise on the final logits.
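The decoder-linear path above (Hadamard pre-rotation, then group-wise 2-bit codes with a per-group scale) can be sketched in numpy. This is an illustrative toy of the technique, not the actual JANGTQ codec or bitstream:

```python
import numpy as np

def hadamard(n):
    # Sylvester construction (n must be a power of two); scaled orthonormal
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def quant_2bit(w, group_size=64):
    """Hadamard pre-rotation + symmetric 2-bit group quantization (sketch)."""
    H = hadamard(group_size)
    g = w.reshape(-1, group_size) @ H                   # rotation spreads outliers
    scale = np.abs(g).max(axis=1, keepdims=True) / 1.5  # levels at ±0.5s, ±1.5s
    codes = np.clip(np.round(g / scale - 0.5), -2, 1)   # 4 integer levels
    return codes, scale

def dequant_2bit(codes, scale, group_size=64):
    g = (codes + 0.5) * scale                           # back to level values
    return (g @ hadamard(group_size).T).ravel()         # undo the rotation

rng = np.random.default_rng(0)
w = rng.standard_normal(8192)
w_hat = dequant_2bit(*quant_2bit(w))
cos = w @ w_hat / (np.linalg.norm(w) * np.linalg.norm(w_hat))
print(f"cosine ≈ {cos:.2f}")
```

The rotation is why 2-bit survives: after multiplying each group by an orthonormal Hadamard matrix, weight outliers are spread across the group, so a single per-group scale wastes far less range.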
## Codec round-trip validation (this bundle)

7/7 sampled tensors PASS at cosine ≥ 0.94 across layers L0 → L87, covering attn + MLP + GQA k/v projections — the expected 2-bit TurboQuant noise floor. Source FP8 e4m3 weights are dequantized using the per-tensor `weight_scale_inv` scale; vision/projector/lm_head tensors pass through unchanged.
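The FP8 source dequant mentioned above can be sketched as follows. The field name `weight_scale_inv` comes from the card; the bit layout is the standard e4m3fn format (1 sign, 4 exponent, 3 mantissa bits, bias 7), with NaN encodings ignored for brevity:

```python
import numpy as np

def dequant_e4m3(raw, weight_scale_inv):
    """Decode raw e4m3 bytes to fp32, then apply the per-tensor scale (sketch)."""
    b = np.asarray(raw, dtype=np.uint8)
    sign = np.where(b & 0x80, -1.0, 1.0)
    exp = ((b >> 3) & 0x0F).astype(np.float64)
    man = (b & 0x07).astype(np.float64) / 8.0
    mag = np.where(exp > 0,
                   (1.0 + man) * np.exp2(exp - 7.0),  # normal: 1.m * 2^(e-7)
                   man * np.exp2(-6.0))               # subnormal: 0.m * 2^-6
    return (sign * mag * weight_scale_inv).astype(np.float32)

# 0x38 encodes +1.0 (exp=7, mantissa=0); 0xB8 is the same with the sign bit set
print(dequant_e4m3([0x38, 0xB8], 2.0))  # [ 2. -2.]
```

A cosine ≥ 0.94 gate is then just the normalized dot product between each dequantized-source tensor and its 2-bit round-trip reconstruction.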
## Run on Apple Silicon

```bash
pip install mlx safetensors transformers pillow
```

```bash
python -m jang_tools.mistral3.runtime \
  --src ~/.mlxstudio/models/OsaurusAI/Mistral-Medium-3.5-128B-JANGTQ \
  --prompt "Describe this image." \
  --image /path/to/photo.jpg \
  --max-new 64
```
The runtime auto-detects `weight_format` and dispatches accordingly; image preprocessing matches the pixtral spec (`jang_tools.vl.pixtral.PixtralImageProcessor`).
## Build

```bash
python -m jang_tools.convert_mistral3_jangtq \
  ~/.mlxstudio/models/_sources/Mistral-Medium-3.5-128B \
  ~/.mlxstudio/models/JANGQ-AI/Mistral-Medium-3.5-128B-JANGTQ JANGTQ2
```
## Credits
Quantized by Jinho Jang (eric@osaurus.ai). MLX-native pipeline; 88 dense decoder layers + 48 pixtral vision layers run on M-series Macs.