# Gemma-4-26B-A4B-JANG_2L-SAFE
9.9 GB mixed-precision quantized Gemma 4 26B MoE derivative with selected safety-related tensors restored from the original Google release.
**Experimental derivative release.** Despite the repository name, this model is not presented as fully safety-validated. It is a partial safety-restoration experiment built on top of a community JANG/CRACK derivative.
## Compatibility
| Runtime | Status |
|---|---|
| vMLX (1.3.26+) | ✅ Recommended — native JANG format support |
| mlx_vlm.server | ❌ Not compatible — requires uniform quantization, JANG uses mixed-precision (2/6/8-bit) |
| mlx-lm | ❌ Not compatible — same reason as mlx_vlm |
**Important:** This model uses JANG v2.0 mixed-precision quantization (2/6/8-bit per layer). Standard MLX tools (`mlx-lm`, `mlx_vlm`) only support a uniform bit width and cannot load this format. Use vMLX or another inference engine with JANG support.
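To illustrate why uniform-bit loaders reject this format, here is a minimal sketch of the check such a loader effectively performs. The config key names (`"quantization"`, `"bits"`) and tensor names are hypothetical, not the actual JANG v2.0 schema:

```python
# Hypothetical sketch: detecting mixed-precision quantization from a
# per-tensor config. Key names ("quantization", "bits") and tensor names
# are illustrative, not the real JANG v2.0 schema.

def quant_bit_widths(config: dict) -> set:
    """Collect the distinct bit widths used across all quantized tensors."""
    return {entry["bits"] for entry in config["quantization"].values()}

def assert_uniform(config: dict) -> int:
    """What a uniform-only loader (mlx-lm style) effectively requires."""
    widths = quant_bit_widths(config)
    if len(widths) != 1:
        raise ValueError(f"mixed-precision model: bit widths {sorted(widths)}")
    return widths.pop()

# A JANG_2L-like profile: 8-bit attention, 6-bit gate/up, 2-bit expert down.
jang_like = {
    "quantization": {
        "layers.0.self_attn.q_proj": {"bits": 8, "group_size": 64},
        "layers.0.mlp.gate_proj": {"bits": 6, "group_size": 64},
        "layers.0.mlp.experts.down_proj": {"bits": 2, "group_size": 64},
    }
}

print(sorted(quant_bit_widths(jang_like)))  # → [2, 6, 8]
```

A uniform-bit loader hits the `ValueError` path on this profile, which is why `mlx-lm` and `mlx_vlm` fail while JANG-aware engines succeed.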
## Why this exists
dealignai created an excellent mixed-precision quantization (JANG_2L) that fits the 26B MoE model into 9.9 GB. However, their release also strips safety guardrails via the CRACK abliteration.
This model partially restores safer behavior by replacing the 11 modified tensors with originals from `google/gemma-4-26b-a4b-it`, quantized to matching 8-bit precision.
Same size. Same quantization profile. Experimental safety restoration.
## Specs
| Spec | Value |
|---|---|
| Base Model | `google/gemma-4-26b-a4b-it` |
| Architecture | MoE — 26B total, ~4B active per token |
| Attention | Hybrid: 25 sliding-window + 5 full-attention layers |
| Model Size | ~9.9 GB |
| Avg Bits | 2.51 bits/weight |
| Context | 262,144 tokens |
| Multimodal | Vision + Text |
| Safety | Partial restoration experiment; not fully safety-validated |
| Format | MLX safetensors (JANG v2.0) |
## JANG_2L Mixed-Precision Quantization
| Tier | Components | Bits |
|---|---|---|
| CRITICAL | Attention (Q/K/V/O), router, shared MLP, embeddings | 8 |
| IMPORTANT | Gate projection, up projection | 6 |
| COMPRESS | Expert MLP (down proj), switch MLP | 2 |
This approach protects the most sensitive pathways while aggressively compressing the 128 MoE experts, achieving excellent quality at 2.51 average bits.
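The tier table can be read as a name-to-bit-width mapping. A minimal sketch of that assignment follows; the substring patterns are illustrative guesses at tensor naming, not the actual JANG rules:

```python
# Hypothetical sketch of JANG_2L-style tier assignment: map a tensor name
# to its quantization bit width per the tier table above. The substring
# patterns are illustrative, not the real JANG classifier.

CRITICAL = ("q_proj", "k_proj", "v_proj", "o_proj", "router", "shared_mlp", "embed")
IMPORTANT = ("gate_proj", "up_proj")

def tier_bits(tensor_name: str) -> int:
    if any(p in tensor_name for p in CRITICAL):
        return 8   # CRITICAL: attention, router, shared MLP, embeddings
    if any(p in tensor_name for p in IMPORTANT):
        return 6   # IMPORTANT: gate / up projections
    return 2       # COMPRESS: expert down projections, switch MLP

print(tier_bits("layers.15.self_attn.o_proj"))        # → 8
print(tier_bits("layers.3.mlp.experts.0.gate_proj"))  # → 6
print(tier_bits("layers.3.mlp.experts.0.down_proj"))  # → 2
```

Because the 128 experts dominate the parameter count and land almost entirely in the 2-bit tier, the weighted average ends up near the reported 2.51 bits/weight.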
## Safety Restoration Details
The CRACK variant removed refusal vectors from the `o_proj` weights in layers 15-25 (11 tensors, using the MPOA method at strength 8.0).
This repository attempts a partial restoration:
- Fetched the original BF16 `o_proj` weights from `google/gemma-4-26b-a4b-it`
- Quantized them to 8-bit (group_size=64), matching the JANG CRITICAL tier
- Swapped them into the JANG_2L model, replacing the abliterated tensors
- Removed the `crack_surgery` config
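The quantization step above can be sketched with a generic group-wise affine 8-bit quantizer in NumPy. This is an illustration of the technique, not the exact MLX/JANG kernel used for the release:

```python
import numpy as np

def quantize_8bit(w: np.ndarray, group_size: int = 64):
    """Group-wise affine 8-bit quantization along the last axis.

    Each group of `group_size` weights gets its own scale/offset, matching
    the group_size=64 used for the restored o_proj tensors. Generic sketch,
    not the exact MLX quantizer.
    """
    *lead, n = w.shape
    g = w.reshape(*lead, n // group_size, group_size)
    lo = g.min(axis=-1, keepdims=True)
    hi = g.max(axis=-1, keepdims=True)
    scale = (hi - lo) / 255.0
    scale = np.where(scale == 0, 1.0, scale)          # avoid div-by-zero
    q = np.round((g - lo) / scale).astype(np.uint8)   # codes in [0, 255]
    return q, scale, lo

def dequantize(q, scale, lo, shape):
    """Reconstruct an approximate float tensor from codes + group params."""
    return (q.astype(np.float32) * scale + lo).reshape(shape)

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 128)).astype(np.float32)  # stand-in weights
q, scale, lo = quantize_8bit(w)
w_hat = dequantize(q, scale, lo, w.shape)
print(float(np.abs(w - w_hat).max()))  # small: at most half a step per group
```

At 8 bits per group of 64 values, the reconstruction error stays within half a quantization step, which is why the CRITICAL tier preserves these tensors nearly losslessly.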
All other weights are identical to the CRACK version (they were never modified).
## Safety disclaimer
- This is a modified derivative of a community CRACK release, not an official Google safety release.
- Only the 11 documented `o_proj` tensors were restored; no comprehensive red-team or policy evaluation is included in this repository.
- Use this model as a research / experimentation artifact, not as proof of production-ready safety.
## Requirements
- Apple Silicon Mac with 16+ GB unified memory
- vMLX 1.3.26+ (required — standard mlx-lm/mlx_vlm cannot load mixed-precision JANG format)
## Usage with vMLX
Download and load in vMLX — it auto-detects the Gemma 4 architecture and JANG format.
## Credits
- JANG quantization method: dealignai — innovative mixed-precision approach
- Base model: Google Gemma Team
- Safety restoration & release: PINKlab
## License
Gemma License — permits modification and redistribution. This is a modified derivative of Google's Gemma 4. Original license terms apply.