Osaurus AI

Gemma 4 31B-it — JANG_4M (Mixed-Precision, 4-bit)

JANG — Jang Adaptive N-bit Grading | Mixed-Precision Quantization for Apple Silicon

Osaurus natively supports JANG models. Download at osaurus.ai.


Model Details

Property         Value
---------------  ---------------------------------------------------
Base Model       google/gemma-4-31b-it
Architecture     Dense Transformer + Hybrid Sliding/Global Attention
Parameters       31B (29.2B weights)
Profile          JANG_4M (CRITICAL=8-bit, COMPRESS=4-bit)
Avg Bits/Weight  5.1
Model Size       18 GB
Vision           Yes (multimodal, float16 passthrough)
Context Length   128K tokens
Layers           60
Format           JANG v2 (MLX-native safetensors, instant load)
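
Because the weights ship as MLX-native safetensors, they load directly into MLX arrays with no conversion step. A minimal sketch, assuming a single-file checkpoint (real downloads may be sharded):

# Load JANG weights as MLX arrays from a safetensors file.
# The single file name is an assumption; real checkpoints may be sharded.
import mlx.core as mx

weights = mx.load("model.safetensors")  # dict[str, mx.array]
print(len(weights), "tensors loaded")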

JANG_4M Bit Allocation

Tier      Components                                    Bits
--------  --------------------------------------------  ----
CRITICAL  Attention (Q/K/V/O), embeddings               8
COMPRESS  MLP (gate, up, down proj), remaining weights  4

JANG protects attention at the higher 8-bit precision while compressing MLP weights to 4-bit; dense models tolerate quantization best in their MLP blocks. The vision encoder is preserved in float16 for full multimodal quality.
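
To make the tiering concrete, here is a minimal Python sketch of how a per-tensor bit width could be assigned from tensor names. The name patterns and the tier rule are illustrative assumptions, not JANG's actual implementation; the snippet also sanity-checks the headline size figure.

# Hypothetical sketch of JANG_4M tier assignment from tensor names.
# The name patterns below are assumptions for illustration only.
CRITICAL_BITS = 8   # attention (Q/K/V/O) and embeddings
COMPRESS_BITS = 4   # MLP (gate/up/down proj) and remaining weights
VISION_BITS = 16    # vision tower kept in float16 (not quantized)

def tier_bits(name: str) -> int:
    """Return the bit width for a weight tensor under JANG_4M."""
    if "vision_tower" in name:                   # float16 passthrough
        return VISION_BITS
    critical = ("q_proj", "k_proj", "v_proj", "o_proj", "embed_tokens")
    if any(part in name for part in critical):  # CRITICAL tier
        return CRITICAL_BITS
    return COMPRESS_BITS                         # COMPRESS tier

# Sanity check: 29.2B weights at the stated 5.1 avg bits/weight
# come to ~18.6 GB, in line with the listed 18 GB model size.
print(f"{29.2e9 * 5.1 / 8 / 1e9:.1f} GB")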

Vision Weight Verification

All 355 vision tower tensors verified present and non-zero. The 31B dense model is text+vision (no audio tower).

Component              Tensor Count  Status
---------------------  ------------  ------------
Vision Tower (SigLIP)  355           All non-zero
Language Model         remaining     All non-zero
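
The check described above is straightforward to reproduce with the safetensors library. A minimal sketch, assuming vision-tower tensor names contain a "vision_tower" substring (the name pattern and file name are assumptions):

# Count the vision tower tensors and confirm each one is non-zero.
# File name and the "vision_tower" name pattern are assumptions.
import numpy as np
from safetensors import safe_open

vision_count = 0
with safe_open("model.safetensors", framework="numpy") as f:
    for name in f.keys():
        if "vision_tower" not in name:
            continue
        vision_count += 1
        tensor = f.get_tensor(name)
        assert np.any(tensor != 0), f"all-zero vision tensor: {name}"

print(f"{vision_count} vision tower tensors, all non-zero")  # card reports 355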

Benchmarks

A 200-question MMLU subset (20 questions per subject × 10 subjects), evaluated with thinking OFF (enable_thinking=False) and greedy decoding (temperature=0.0).

Subject            JANG_4M
-----------------  ----------------
Abstract Algebra   13/20
Anatomy            13/20
Astronomy          17/20
College CS         14/20
College Physics    14/20
HS Biology         19/20
HS Chemistry       15/20
HS Mathematics      9/20
Logical Fallacies  19/20
World Religions    20/20
Total              153/200 (76.5%)
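
The per-subject scores reconcile with the reported total:

# Recompute the benchmark total from the per-subject scores above.
scores = [13, 13, 17, 14, 14, 19, 15, 9, 19, 20]
total = sum(scores)                               # 153 correct
print(f"{total}/200 ({100 * total / 200:.1f}%)")  # -> 153/200 (76.5%)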

Architecture Highlights

  • Dense transformer with 60 layers
  • Hybrid attention: sliding-window + full-attention layers (every 6th layer is full; see the sketch after this list)
  • Dual head dimensions: 256 (sliding) / 512 (global)
  • K=V weight sharing on global attention layers
  • Vision encoder preserved in float16 for multimodal inference
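
A minimal sketch of the layer layout described above; reading "every 6th layer" as 1-indexed is an assumption:

# Hybrid attention layout: every 6th layer is full (global) attention,
# the rest use sliding-window attention. 1-indexed counting is assumed.
NUM_LAYERS = 60

def attention_kind(layer_idx: int) -> str:
    """Return the attention type for a 0-indexed layer."""
    return "global" if (layer_idx + 1) % 6 == 0 else "sliding"

kinds = [attention_kind(i) for i in range(NUM_LAYERS)]
assert kinds.count("global") == 10  # 60 layers -> 10 full-attention layers
head_dim = {"sliding": 256, "global": 512}  # per the card's dual head dims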

Usage

# Requires Osaurus (https://osaurus.ai)
osaurus serve OsaurusAI/Gemma-4-31B-it-JANG_4M
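
Once the server is running, requests can be sent from any HTTP client, assuming Osaurus exposes an OpenAI-compatible chat endpoint. The endpoint path, port, and model id below are assumptions; check your Osaurus configuration.

# Send a chat request to a locally running Osaurus server, assuming an
# OpenAI-compatible endpoint. Port 1337 and the model id are assumptions.
import requests

resp = requests.post(
    "http://127.0.0.1:1337/v1/chat/completions",
    json={
        "model": "OsaurusAI/Gemma-4-31B-it-JANG_4M",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "temperature": 0.0,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])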

Requirements

  • Apple Silicon Mac with 24+ GB unified memory
  • Osaurus or compatible MLX inference engine with Gemma 4 support

Quantized by Osaurus AI using JANG
