Gemma-4-26B-A4B-JANG_2L-SAFE

A 9.9 GB mixed-precision quantized derivative of the Gemma 4 26B MoE model, with selected safety-related tensors restored from the original Google release.

Experimental derivative release. Despite the repository name, this model is not presented as fully safety-validated. It is a partial safety-restoration experiment built on top of a community JANG/CRACK derivative.

Compatibility

| Runtime | Status |
| --- | --- |
| vMLX (1.3.26+) | ✅ Recommended — native JANG format support |
| mlx_vlm.server | ❌ Not compatible — requires uniform quantization; JANG uses mixed precision (2/6/8-bit) |
| mlx-lm | ❌ Not compatible — same reason as mlx_vlm |

Important: This model uses JANG v2.0 mixed-precision quantization (2/6/8-bit per layer). Standard MLX tools (mlx-lm, mlx_vlm) only support uniform bit-width and cannot load this format. Use vMLX or compatible inference engines with JANG support.
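The incompatibility comes down to a single invariant: uniform-only loaders assume one bit-width for the whole checkpoint. A minimal sketch of that check (the per-tensor bit assignment below is illustrative, not the actual JANG metadata layout):

```python
# Why uniform-only loaders reject JANG checkpoints: they assume a
# single bit-width across all layers. The mapping below is a
# hypothetical per-tensor bit assignment, not real JANG metadata.

def is_uniform(per_tensor_bits: dict) -> bool:
    """A uniform-only loader can accept the checkpoint only if
    every quantized tensor uses the same bit-width."""
    return len(set(per_tensor_bits.values())) <= 1

# Illustrative JANG-style assignment: 8-bit attention, 6-bit
# gate/up projections, 2-bit expert down projections.
jang_bits = {
    "self_attn.o_proj": 8,
    "mlp.gate_proj": 6,
    "mlp.up_proj": 6,
    "mlp.experts.down_proj": 2,
}

print(is_uniform(jang_bits))                 # False — mixed precision
print(is_uniform({"a": 4, "b": 4, "c": 4}))  # True — uniform 4-bit
```

A loader built around a single global `bits` parameter fails the first case, which is why mlx-lm and mlx_vlm cannot open this model.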

Why this exists

dealignai created an excellent mixed-precision quantization (JANG_2L) that fits the 26B MoE model into 9.9 GB. However, that release also strips the safety guardrails (the CRACK modification).

This model partially restores safer behavior by replacing the 11 modified tensors with originals from google/gemma-4-26b-a4b-it, quantized to matching 8-bit precision.

Same size. Same quantization profile. Experimental safety restoration.

Specs

| Spec | Value |
| --- | --- |
| Base Model | google/gemma-4-26b-a4b-it |
| Architecture | MoE — 26B total, ~4B active per token |
| Attention | Hybrid: 25 sliding-window + 5 full-attention layers |
| Model Size | ~9.9 GB |
| Avg Bits | 2.51 bits/weight |
| Context | 262,144 tokens |
| Multimodal | Vision + Text |
| Safety | Partial restoration experiment; not fully safety-validated |
| Format | MLX safetensors (JANG v2.0) |

JANG_2L Mixed-Precision Quantization

| Tier | Components | Bits |
| --- | --- | --- |
| CRITICAL | Attention (Q/K/V/O), router, shared MLP, embeddings | 8 |
| IMPORTANT | Gate projection, up projection | 6 |
| COMPRESS | Expert MLP (down proj), switch MLP | 2 |

This approach protects the most sensitive pathways while aggressively compressing the 128 MoE experts, achieving excellent quality at 2.51 average bits.

Safety Restoration Details

The CRACK variant removed refusal vectors from the o_proj weights of layers 15-25 (11 tensors), using the MPOA method at strength 8.0.

This repository attempts a partial restoration:

  1. Fetched original BF16 o_proj weights from google/gemma-4-26b-a4b-it
  2. Quantized to 8-bit (group_size=64) matching the JANG CRITICAL tier
  3. Swapped into the JANG_2L model, replacing abliterated tensors
  4. Removed crack_surgery config

All other weights are identical to the CRACK version (they were never modified).
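Step 2 above (quantizing the restored BF16 tensors to 8-bit with group_size=64) can be sketched as follows. The exact JANG quantization scheme is not documented in this card, so this is an illustrative symmetric group-quantization variant, not the tool's actual code:

```python
import numpy as np

# Sketch of restoration step 2: 8-bit group quantization with
# group_size=64. Symmetric per-group scaling is an assumption;
# the real JANG CRITICAL-tier scheme may differ in detail.

def quantize_8bit(w: np.ndarray, group_size: int = 64):
    """Quantize a weight tensor into int8 groups with per-group scales."""
    flat = w.astype(np.float32).reshape(-1, group_size)
    scale = np.abs(flat).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero
    q = np.round(flat / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray, shape):
    return (q.astype(np.float32) * scale).reshape(shape)

# Round-trip check on a fake o_proj-shaped tensor.
rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
q, s = quantize_8bit(w)
err = np.abs(dequantize(q, s, w.shape) - w).max()
print(err < 0.02)  # 8-bit round-trip error stays small
```

Because only the 11 o_proj tensors pass through this path, and they sit in the 8-bit CRITICAL tier either way, the swap leaves the model's size and quantization profile unchanged.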

Safety disclaimer

  • This is a modified derivative of a community CRACK release, not an official Google safety release.
  • Only the 11 documented o_proj tensors were restored; no comprehensive red-team or policy eval is included in this repository.
  • Use this model as a research / experimentation artifact, not as proof of production-ready safety.

Requirements

  • Apple Silicon Mac with 16+ GB unified memory
  • vMLX 1.3.26+ (required — standard mlx-lm/mlx_vlm cannot load mixed-precision JANG format)

Usage with vMLX

Download and load in vMLX — it auto-detects the Gemma 4 architecture and JANG format.

Credits

  • dealignai — JANG_2L mixed-precision quantization and the CRACK derivative this repository builds on
  • Google — original google/gemma-4-26b-a4b-it release used for tensor restoration

License

Gemma License — permits modification and redistribution. This is a modified derivative of Google's Gemma 4. Original license terms apply.
