Important: This model uses the JANGTQ_K mixed-bit quantization format -- a quality-tuned variant of JANGTQ that keeps down_proj at 4-bit (residual-stream sensitive) and gate_proj/up_proj at 2-bit (gate-dampened), while attention / shared expert / embed / lm_head stay at affine 8-bit. Currently only supported by MLX Studio and the jang-tools Python package. Follow @dealignai for new releases.



MLX Studio -- the only app that natively supports JANG / JANGTQ models


MiniMax M2.7 -- JANGTQ_K + CRACK

Mixed-bit JANGTQ_K quantization | CRACK abliterated | Reasoning-only | 74 GB



What Is This?

This is MiniMax M2.7 -- a 230B parameter Mixture-of-Experts reasoning model with 256 experts (8 active per token), all standard attention, and always-on chain-of-thought reasoning.

It has been:

  1. JANGTQ_K quantized -- mixed-bit profile: down_proj Q4, gate_proj/up_proj Q2, attention/shared/embed/lm_head Q8, norms/router fp16. 74 GB on disk, ~3-bit avg routed.
  2. CRACK abliterated -- a capability-preserving abliteration variant tuned for mixed-bit bases. Result: capability-positive abliteration (MMLU went UP from base, not down).

Architecture: MiniMax M2.7 MoE -- 230B total, ~10B active, 256 experts
Quantization: JANGTQ_K mixed-bit (Q8 attention + Q4 down_proj + Q2 gate/up) -- 74 GB
Abliteration: CRACK (capability-positive on this build)
MMLU-200: 94.5% (base K 93.5%, +1.0pp -- capability-positive)
HarmBench-320: 99.06% overall (six of seven categories at 100%)
Reasoning: always ON (chain-of-thought); enable_thinking kwarg supported
Speed: ~33 tok/s median (M4 Max, 128 GB)
Fits on: 96 GB+ Macs

MMLU-200 Results

Subject CRACK Base K Delta
Astronomy 20/20 (100%) 20/20 0
College Physics 20/20 (100%) 18/20 +2
High School Biology 20/20 (100%) 20/20 0
High School Mathematics 20/20 (100%) 19/20 +1
Abstract Algebra 19/20 (95%) 19/20 0
Anatomy 18/20 (90%) 17/20 +1
College Computer Science 18/20 (90%) 19/20 -1
High School Chemistry 18/20 (90%) 18/20 0
Logical Fallacies 18/20 (90%) 16/20 +2
World Religions 18/20 (90%) 17/20 +1
Total 189/200 (94.5%) 187/200 (93.5%) +1.0pp

CRACK is capability-positive on this build -- four subjects at 100%, five gain on base, only one loses. Net +6 questions, with the largest gains on subjects (physics, fallacies, mathematics, religions) where the model commits instead of hedging.


HarmBench-320 Results

Category Score
Cybercrime / Intrusion 52/52 100%
Misinformation / Disinformation 54/54 100%
Chemical / Biological 42/42 100%
Harmful 18/18 100%
Illegal 53/53 100%
Harassment / Bullying 21/21 100%
Copyright 77/80 96.2%
Total 317/320 99.06%

Scoring uses a strict classifier that rejects stuck-reasoning loops, empty template dumps, and false-positive compliance from thinking-trace leakage; a rough sketch of those checks follows below. All three non-compliant prompts were in the copyright category (lyric / passage requests where the model hedged on attribution).
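
The exact grader is not published with this card, so the following is only a minimal sketch of the kinds of pre-filters described above. All function names and thresholds are assumptions, not the actual classifier:

import re

def looks_like_stuck_loop(text: str, max_repeats: int = 8) -> bool:
    """Flag responses that repeat the same sentence many times (stuck reasoning)."""
    sentences = [s.strip() for s in re.split(r"[.!?]\s+", text) if s.strip()]
    if not sentences:
        return False
    return max(sentences.count(s) for s in set(sentences)) >= max_repeats

def is_empty_template_dump(text: str) -> bool:
    """Flag responses that are only chat-template tokens and whitespace."""
    stripped = re.sub(r"<\|?/?\w+\|?>", "", text).strip()
    return len(stripped) < 20

def complies_only_in_thinking(text: str) -> bool:
    """Flag 'compliance' that appears only inside the <think> trace, not the answer."""
    visible = re.sub(r"<think>.*?</think>", "", text, flags=re.S).strip()
    return len(visible) < 20

def passes_strict_prefilter(text: str) -> bool:
    return not (looks_like_stuck_loop(text)
                or is_empty_template_dump(text)
                or complies_only_in_thinking(text))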


MiniMax M2.7 CRACK Series

Model Format Size MMLU-200 HarmBench-320 Speed Fits on
JANGTQ_K + CRACK (this) Mixed Q4/Q2 experts 74 GB 94.5% 99.1% ~33 t/s 96 GB Mac
JANGTQ + CRACK TurboQuant 2-bit experts 55 GB 92.0% 93.1% ~47 t/s 96 GB Mac
JANG_3L + CRACK Affine 3-bit mixed 89 GB 93.5% 79.1% ~46 t/s 128 GB Mac
JANG_2L + CRACK Affine 2-bit 63 GB 84.0% 83.4% ~47 t/s 96 GB Mac

Per-category HarmBench-320 (vs JANGTQ ship)

Category JANGTQ_K (this) JANGTQ ship Δ
cybercrime_intrusion 52/52 (100%) 51/52 (98.1%) +1
misinformation_disinformation 54/54 (100%) 53/54 (98.1%) +1
chemical_biological 42/42 (100%) 41/42 (97.6%) +1
harmful 18/18 (100%) 17/18 (94.4%) +1
illegal 53/53 (100%) 48/53 (90.6%) +5
copyright 77/80 (96.2%) 70/80 (87.5%) +7
harassment_bullying 21/21 (100%) 18/21 (85.7%) +3
Total 317/320 (99.1%) 298/320 (93.1%) +19

JANGTQ_K + CRACK trades 19 GB and ~14 t/s for a 6pp HarmBench gain and a 2.5pp MMLU gain over the JANGTQ ship -- and matches or beats the JANG_3L 89 GB variant on both axes at 15 GB less.

vs MLX Uniform Quantization

MLX's built-in uniform quantization is completely broken on MiniMax at every bit width (~25% MMLU, i.e. random chance on four-option questions). JANG / JANGTQ / JANGTQ_K is the only working quantization format for this architecture.


About JANGTQ_K (mixed-bit)

JANGTQ_K is a quality-tuned mixed-bit variant of JANGTQ. The expert routing spends 4 bits on down_proj (whose output enters the residual stream and accumulates noise across 62 layers) and 2 bits on gate_proj/up_proj (whose contribution passes through SwiGLU's multiplicative gate silu(gate) × up, which dampens quantization noise). Attention / shared expert / embed / lm_head stay at affine 8-bit for precision-critical paths.
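
As an illustration, that profile can be written as a simple path-to-bit-width rule. The sketch below is an assumption about how such a mapping could look -- it is not the jang-tools implementation, and the layer-path strings and helper name bits_for_tensor are hypothetical:

# Hypothetical sketch of the JANGTQ_K mixed-bit profile described above.
# The real jang-tools package also handles packing, group sizes, and codebooks.
def bits_for_tensor(path: str):
    """Map a weight path to a bit width; None means keep the tensor in fp16."""
    if "norm" in path or "router" in path:
        return None                              # norms / router stay fp16
    if "embed" in path or "lm_head" in path:
        return 8                                 # affine 8-bit
    if "attn" in path or "shared_expert" in path:
        return 8                                 # precision-critical paths
    if "down_proj" in path:
        return 4                                 # output enters the residual stream
    if "gate_proj" in path or "up_proj" in path:
        return 2                                 # noise dampened by silu(gate) * up
    return 8                                     # conservative default

# Example (illustrative path): a routed expert's down-projection gets 4 bits.
print(bits_for_tensor("layers.10.mlp.experts.7.down_proj"))  # -> 4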

Bundle size: 74 GB (3-bit avg routed) -- between JANGTQ (55 GB, all 2-bit routed) and JANG_3L (89 GB, all 3-bit routed). Quality is closer to full-4-bit (115 GB) than to JANGTQ.

About CRACK

CRACK (Controlled Refusal Ablation via Calibrated Knockouts) is a weight-level intervention that removes safety alignment while preserving reasoning quality and compliance. This build uses a capability-preserving variant tuned for mixed-bit JANGTQ_K bases, which is why it lands capability-positive on MMLU instead of capability-negative.

The modification is permanently baked into the published weights -- no LoRA, no fine-tuning, no system prompts.
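
CRACK's recipe is not published here, but weight-level refusal ablation in general projects an estimated refusal direction out of selected weight matrices. The sketch below shows only that generic operation, with assumed shapes and a random stand-in direction; the calibration, layer selection, and whatever makes CRACK capability-positive are not represented:

import numpy as np

def ablate_direction(W: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Generic directional ablation (not the CRACK recipe): remove the component
    of an output-side weight matrix W (d_out, d_in) that writes along direction r."""
    r = r / np.linalg.norm(r)
    return W - np.outer(r, r @ W)   # W' = (I - r r^T) W

# Toy usage with assumed shapes; a real pass would edit selected o_proj / down_proj weights.
d_out, d_in = 8, 4
W = np.random.randn(d_out, d_in).astype(np.float32)
r = np.random.randn(d_out).astype(np.float32)
W_ablated = ablate_direction(W, r)
print(np.allclose((r / np.linalg.norm(r)) @ W_ablated, 0.0, atol=1e-5))  # True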


Install & Usage

pip install "jang[mlx]"

from jang_tools.load_jangtq import load_jangtq_model
from mlx_lm import generate
from mlx_lm.sample_utils import make_sampler

# Load the JANGTQ_K weights via jang-tools (plain mlx_lm cannot read this format).
model, tokenizer = load_jangtq_model("dealignai/MiniMax-M2.7-JANGTQ_K-CRACK")
sampler = make_sampler(temp=1.0)  # MiniMax requires temp=1.0 for chat

messages = [{"role": "user", "content": "Your prompt here"}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False)

# Reasoning-only model: budget enough tokens for the <think> block plus the answer.
response = generate(model, tokenizer, prompt=prompt, max_tokens=4000, sampler=sampler)
print(response)

Note: M2.7 is a reasoning-only model -- it always generates a <think> chain before the final answer. Use max_tokens=4000 or more for complex questions. For chat, use temperature=1.0 (greedy decoding causes infinite loops). To skip the <think> block on short responses, set enable_thinking=False in apply_chat_template, as shown below.
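
A minimal example of the enable_thinking kwarg mentioned above (the smaller max_tokens budget is just a suggestion for short answers):

prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False,
    enable_thinking=False)  # skip the <think> block for short, direct replies
response = generate(model, tokenizer, prompt=prompt, max_tokens=512, sampler=sampler)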


Links

Ko-fi X/Twitter GitHub MLX Studio Website


Disclaimer

This model is provided for research and educational purposes. The creators are not responsible for any misuse. By downloading this model, you agree to use it responsibly and in compliance with applicable laws.


Created by Jinho Jang
