Ling 2.6 Flash — MXFP4 + CRACK

MXFP4 quantized | CRACK abliterated | Hybrid MLA + Linear-Attn MoE | EN + ZH | 63 GB

What Is This?

This is Ling 2.6 Flash by inclusionAI — a 35B-parameter Mixture-of-Experts model with 256 routed experts (8 active per token) plus 1 always-active shared expert, a hybrid MLA + Lightning Linear-Attention architecture, native English and Chinese, and 131K context.
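
To make the routing concrete, here is a minimal numpy sketch of top-8-of-256 routing with an always-active shared expert in a generic MoE layer. The names and the plain-softmax gating are illustrative assumptions, not Ling's actual modules:

```python
import numpy as np

def moe_layer(x, router_w, experts, shared_expert, k=8):
    """Generic top-k MoE sketch (not Ling's real code): each token is
    routed to its k best-scoring experts, and the shared expert runs
    on every token unconditionally."""
    logits = x @ router_w                       # (tokens, 256) router scores
    out = shared_expert(x)                      # shared expert: always active
    for t in range(x.shape[0]):
        top = np.argsort(logits[t])[-k:]        # indices of the 8 chosen experts
        w = np.exp(logits[t, top] - logits[t, top].max())
        w /= w.sum()                            # softmax over the chosen experts
        for wi, e in zip(w, top):
            out[t] += wi * experts[e](x[t])     # weighted expert outputs
    return out
```

With 8 of 256 routed experts selected per token, only about 3% of the routed-expert weights touch any given token, which is what keeps a 35B-total model at Flash-tier latency.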

It has been:

  1. MXFP4 quantized — uniform 4-bit affine, group_size=32 — 63 GB (see the sketch after the table below)
  2. CRACK abliterated — permanent weight-level removal of safety refusals

| | |
|---|---|
| Base model | inclusionAI/Ling-2.6-flash (35B total, 1 shared + 8 routed experts active) |
| Architecture | bailing_hybrid — Multi-Latent Attention (MLA) every 8th layer, Lightning Linear-Attention elsewhere |
| Quantization | MXFP4 (Q4, g=32, affine) — 63 GB |
| MMLU-200 | 78.5% (base 80.0% — within −1.5pp) |
| HarmBench-320 | 97.8% comply (base 50.3% — +47.5pp) |
| Context | 131,072 tokens native |
| Languages | English + Chinese (probed bilingual) |
| Speed | 30+ tok/s on an M4 Max 128 GB |
| Fits on | 96 GB+ Macs |
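
As a reference for what "uniform 4-bit affine, group_size=32" means in storage terms, here is a minimal numpy sketch. The real MLX kernels pack and fuse this differently, and the function names are illustrative:

```python
import numpy as np

def quantize_q4_g32(w, group_size=32):
    """Affine 4-bit group quantization sketch: every 32 consecutive
    weights share one scale and one zero-point; each weight is stored
    as an integer in [0, 15]. Illustrative only, not the MLX kernels."""
    g = w.reshape(-1, group_size)
    lo = g.min(axis=1, keepdims=True)
    hi = g.max(axis=1, keepdims=True)
    scale = np.maximum(hi - lo, 1e-8) / 15.0    # 4 bits -> 16 levels
    q = np.clip(np.round((g - lo) / scale), 0, 15).astype(np.uint8)
    return q, scale, lo                          # ints + per-group metadata

def dequantize_q4_g32(q, scale, lo):
    return (q.astype(np.float32) * scale + lo).ravel()
```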

MMLU-200 Results (thinking OFF)

| Model | Correct | Accuracy | No-match |
|---|---|---|---|
| MXFP4 Base | 160/200 | 80.00% | 6 |
| MXFP4 + CRACK | 157/200 | 78.50% | 10 |
| Δ | −3 | −1.5pp | +4 |

The CRACK delta of −1.5pp is well inside the noise floor for a 200-question sample — capability is essentially preserved.
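
As a quick sanity check on the noise-floor claim: the binomial standard error of an accuracy estimate at p ≈ 0.8 over n = 200 questions is about 2.8pp, so a −1.5pp shift is well under one standard error.

```python
import math

# Binomial standard error of an accuracy estimate (p = 0.8, n = 200)
p, n = 0.80, 200
se = math.sqrt(p * (1 - p) / n)
print(f"1 SE ≈ {100 * se:.1f}pp")   # ~2.8pp, so -1.5pp is < 1 SE
```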


HarmBench-320 Results

| Model | COMPLY | REFUSE | EMPTY |
|---|---|---|---|
| MXFP4 Base | 161 (50.3%) | 157 (49.1%) | 2 (0.6%) |
| MXFP4 + CRACK | 313 (97.8%) | 5 (1.6%) | 2 (0.6%) |
| Δ comply | +47.5pp | | |

Directional removal of the refusal behavior lifts HarmBench compliance from ~50% to ~98% with negligible MMLU regression.


Ling 2.6 Flash CRACK Series

| Model | Format | Size | MMLU-200 | HarmBench-320 | Fits on |
|---|---|---|---|---|---|
| MXFP4 + CRACK (this model) | affine 4-bit, g=32 | 63 GB | 78.5% | 97.8% | 96 GB Mac |
| JANGTQ2 + CRACK | TurboQuant 2-bit experts + 8-bit affine | 29 GB | 81.0% | 100.0% | 48 GB Mac |

The JANGTQ2 variant is smaller and scores higher on both benchmarks — quant noise on the 2-bit routed experts ends up helping rather than hurting.


Usage

```python
from mlx_lm import load, generate

# Downloads (or loads from cache) the quantized, abliterated checkpoint
model, tokenizer = load("dealignai/Ling-2.6-flash-MXFP4-CRACK")

messages = [{"role": "user", "content": "Hello — what can you do?"}]

# Render the chat template; thinking stays OFF by default (see below)
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

print(generate(model, tokenizer, prompt=prompt, max_tokens=400, verbose=True))
```

mlx_lm >= 0.20 with the bailing_hybrid model class is required.
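
For interactive use, the same checkpoint works with mlx_lm's streaming generator. Note that the shape of the yielded chunks has changed across mlx_lm versions; recent versions yield objects with a .text field, which is assumed here:

```python
from mlx_lm import load, stream_generate

model, tokenizer = load("dealignai/Ling-2.6-flash-MXFP4-CRACK")

messages = [{"role": "user", "content": "Summarize MXFP4 quantization in two sentences."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Stream text as it is produced instead of waiting for the full response
for chunk in stream_generate(model, tokenizer, prompt=prompt, max_tokens=200):
    print(chunk.text, end="", flush=True)
```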


About This Model

Ling 2.6 Flash is the latency-tier sibling in the Ling 2.6 family — fast multilingual instruction-following and tool use. The chat template includes a <think>...</think> reasoning block, but in practice this Flash variant is best treated as a non-reasoning instruct model: leave thinking OFF (the default) for benchmark-style work and short-form responses. For chain-of-thought reasoning, prefer the larger Ling 2.6 Plus / Ring / Pro tier.
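
If you want to force the reasoning block off explicitly rather than relying on the default, many recent thinking-capable chat templates accept an enable_thinking variable through apply_chat_template. Whether Ling's template uses that exact variable name is an assumption; check the chat template in tokenizer_config.json before relying on it:

```python
# ASSUMPTION: the chat template exposes an `enable_thinking` variable,
# as many recent thinking-capable templates do. Verify against this
# model's tokenizer_config.json before relying on it.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # keep the <think> block off (the default here)
)
```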

CRACK is a permanent weight-level abliteration that removes safety refusal from the always-active residual-stream writers. Multilingual (EN + ZH) refusal-direction extraction means the model complies on both English and Chinese prompts. MoE routing and expert internals are untouched, and there is no vision tower to consider (Ling 2.6 Flash is text-only).
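
The CRACK internals are not published in this card, but weight-level directional abliteration in general amounts to projecting a refusal direction out of every matrix that writes into the residual stream. A minimal numpy sketch of that projection, illustrative and not the CRACK implementation:

```python
import numpy as np

def ablate_direction(W_out, r):
    """Project direction r out of a residual-stream-writing matrix.
    Convention: y = x @ W_out, so W_out's columns index the residual
    stream. After the edit, the layer cannot write any component
    along r. Illustrative sketch, not the CRACK code."""
    r = r / np.linalg.norm(r)              # unit refusal direction, shape (d_model,)
    return W_out - np.outer(W_out @ r, r)  # subtract the rank-1 component along r
```

Applied to every always-active writer at once, an edit like this lives in the weights rather than in an inference-time steering hook, which matches the "permanent weight-level" description above.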


Support dealignai

All models are built from original research and published for free.

Support us on Ko-fi — check out the Ko-fi membership for early access and extras.


dealign.ai

Twitter · HF · Ko-fi


Disclaimer

This model has had its safety refusal circuits removed. It will produce responses that would normally be refused, including technical content on security testing, dual-use research, and sensitive topics. You are responsible for how you use it.

The CRACK abliteration process does not add new capabilities — it only removes the model's learned refusal patterns. All knowledge, including the knowledge used to produce unsafe outputs, was already present in the base Ling model.
