Gemma-4-26B-A4B-JANG_2L-SAFE

A 9.9 GB mixed-precision quantized derivative of the Gemma 4 26B MoE model, with selected safety-related tensors restored from the original Google release.

Experimental derivative release. Despite the repository name, this model is not presented as fully safety-validated. It is a partial safety-restoration experiment built on top of a community JANG/CRACK derivative.

Compatibility

| Runtime | Status |
| --- | --- |
| vMLX (1.3.26+) | ✅ Recommended — native JANG format support |
| mlx_vlm.server | ❌ Not compatible — requires uniform quantization; JANG uses mixed precision (2/6/8-bit) |
| mlx-lm | ❌ Not compatible — same reason as mlx_vlm |

Important: This model uses JANG v2.0 mixed-precision quantization (2/6/8-bit per layer). Standard MLX tools (mlx-lm, mlx_vlm) only support uniform bit-width and cannot load this format. Use vMLX or compatible inference engines with JANG support.
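The incompatibility comes down to a single invariant: uniform-only loaders assume one bit-width for the whole checkpoint. A minimal sketch of that check (the per-tensor bit assignment below is illustrative, not the actual JANG metadata layout):

```python
# Why uniform-only loaders reject JANG checkpoints: they assume a
# single bit-width across all layers. The mapping below is a
# hypothetical per-tensor bit assignment, not real JANG metadata.

def is_uniform(per_tensor_bits: dict) -> bool:
    """A uniform-only loader can accept the checkpoint only if
    every quantized tensor uses the same bit-width."""
    return len(set(per_tensor_bits.values())) <= 1

# Illustrative JANG-style assignment: 8-bit attention, 6-bit
# gate/up projections, 2-bit expert down projections.
jang_bits = {
    "self_attn.o_proj": 8,
    "mlp.gate_proj": 6,
    "mlp.up_proj": 6,
    "mlp.experts.down_proj": 2,
}

print(is_uniform(jang_bits))                 # False — mixed precision
print(is_uniform({"a": 4, "b": 4, "c": 4}))  # True — uniform 4-bit
```

A loader built around a single global `bits` parameter fails the first case, which is why mlx-lm and mlx_vlm cannot open this model.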

Why this exists

dealignai created an excellent mixed-precision quantization (JANG_2L) that fits the 26B MoE model into 9.9 GB. However, that release also strips the safety guardrails (the CRACK modification).

This model partially restores safer behavior by replacing the 11 modified tensors with originals from google/gemma-4-26b-a4b-it, quantized to matching 8-bit precision.

Same size. Same quantization profile. Experimental safety restoration.

Specs

| Spec | Value |
| --- | --- |
| Base Model | google/gemma-4-26b-a4b-it |
| Architecture | MoE — 26B total, ~4B active per token |
| Attention | Hybrid: 25 sliding-window + 5 full-attention layers |
| Model Size | ~9.9 GB |
| Avg Bits | 2.51 bits/weight |
| Context | 262,144 tokens |
| Multimodal | Vision + Text |
| Safety | Partial restoration experiment; not fully safety-validated |
| Format | MLX safetensors (JANG v2.0) |

JANG_2L Mixed-Precision Quantization

| Tier | Components | Bits |
| --- | --- | --- |
| CRITICAL | Attention (Q/K/V/O), router, shared MLP, embeddings | 8 |
| IMPORTANT | Gate projection, up projection | 6 |
| COMPRESS | Expert MLP (down proj), switch MLP | 2 |

This approach protects the most sensitive pathways while aggressively compressing the 128 MoE experts, achieving excellent quality at 2.51 average bits.

Safety Restoration Details

The CRACK variant removed refusal vectors from the o_proj weights of layers 15-25 (11 tensors), using the MPOA method at strength 8.0.

This repository attempts a partial restoration:

  1. Fetched original BF16 o_proj weights from google/gemma-4-26b-a4b-it
  2. Quantized to 8-bit (group_size=64) matching the JANG CRITICAL tier
  3. Swapped into the JANG_2L model, replacing abliterated tensors
  4. Removed crack_surgery config

All other weights are identical to the CRACK version (they were never modified).
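Step 2 above (quantizing the restored BF16 tensors to 8-bit with group_size=64) can be sketched as follows. The exact JANG quantization scheme is not documented in this card, so this is an illustrative symmetric group-quantization variant, not the tool's actual code:

```python
import numpy as np

# Sketch of restoration step 2: 8-bit group quantization with
# group_size=64. Symmetric per-group scaling is an assumption;
# the real JANG CRITICAL-tier scheme may differ in detail.

def quantize_8bit(w: np.ndarray, group_size: int = 64):
    """Quantize a weight tensor into int8 groups with per-group scales."""
    flat = w.astype(np.float32).reshape(-1, group_size)
    scale = np.abs(flat).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero
    q = np.round(flat / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray, shape):
    return (q.astype(np.float32) * scale).reshape(shape)

# Round-trip check on a fake o_proj-shaped tensor.
rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
q, s = quantize_8bit(w)
err = np.abs(dequantize(q, s, w.shape) - w).max()
print(err < 0.02)  # 8-bit round-trip error stays small
```

Because only the 11 o_proj tensors pass through this path, and they sit in the 8-bit CRITICAL tier either way, the swap leaves the model's size and quantization profile unchanged.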

Safety disclaimer

  • This is a modified derivative of a community CRACK release, not an official Google safety release.
  • Only the 11 documented o_proj tensors were restored; no comprehensive red-team or policy eval is included in this repository.
  • Use this model as a research / experimentation artifact, not as proof of production-ready safety.

Requirements

  • Apple Silicon Mac with 16+ GB unified memory
  • vMLX 1.3.26+ (required — standard mlx-lm/mlx_vlm cannot load mixed-precision JANG format)

Usage with vMLX

Download and load in vMLX — it auto-detects the Gemma 4 architecture and JANG format.

Credits

  • dealignai — JANG_2L mixed-precision quantization and the CRACK derivative this repository builds on
  • Google — original google/gemma-4-26b-a4b-it release used for tensor restoration

License

Gemma License — permits modification and redistribution. This is a modified derivative of Google's Gemma 4. Original license terms apply.
