Gemma 3 4B Instruct – MLX 5-bit (Apple Silicon)

This repository provides a 5-bit quantized MLX version of Gemma 3 4B Instruct, optimized for efficient local inference on Apple Silicon (M1–M5).


Highlights

  • 5-bit quantization (better output quality than standard 4-bit, at a slightly larger footprint)
  • Fast inference on Apple Silicon (MLX backend)
  • Good reasoning and instruction-following
  • Low memory usage (~2.7 GB peak)

Performance (M3 Pro, 18GB RAM)

  • Generation speed: ~46 tokens/sec
  • Peak memory: ~2.7 GB
  • Model size: ~2.5 GB

Usage

Install MLX and the mlx-lm package:

pip install mlx mlx-lm

Run generation from the command line:

mlx_lm.generate \
  --model ./gemma-3-4b-it-mlx-5bit \
  --prompt "Explain HVAC airflow calculation in simple terms."
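If you prefer the Python API, here is a minimal sketch using mlx_lm's load/generate helpers. It assumes the model folder sits at the same local path used in the CLI example above; max_tokens is an arbitrary choice.

from mlx_lm import load, generate

# Load the quantized model and its tokenizer from the local folder
model, tokenizer = load("./gemma-3-4b-it-mlx-5bit")

# Wrap the user message in the chat template Gemma expects
messages = [{"role": "user", "content": "Explain HVAC airflow calculation in simple terms."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

# Generate up to 256 new tokens and print the result
text = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(text)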

Example

Input

A room needs 12000 BTU/h cooling. If a system uses about 400 CFM per ton, estimate the airflow needed.

Expected reasoning:

12000 BTU/h ÷ 12000 BTU/h per ton = 1 ton → 1 ton × 400 CFM/ton ≈ 400 CFM
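For reference, the arithmetic the model is expected to reproduce, written out as a quick check (the 400 CFM-per-ton figure comes from the prompt; 12000 BTU/h per ton is the standard conversion):

# 1 ton of cooling = 12000 BTU/h (standard conversion)
btu_per_hour = 12000
tons = btu_per_hour / 12000   # = 1.0 ton
cfm = tons * 400              # 400 CFM per ton, per the prompt
print(f"{cfm:.0f} CFM")       # -> 400 CFM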

License and Attribution

This model is a derivative work based on:

Google Gemma 3 4B Instruct

Original model: https://huggingface.co/google/gemma-3-4b-it
License: Gemma Terms of Use (https://ai.google.dev/gemma/terms)


Modifications

This repository includes the following modifications:

  • Converted to MLX format
  • Quantized to 5-bit precision
  • Optimized for Apple Silicon inference


Notice

Gemma is provided under and subject to the Gemma Terms of Use: https://ai.google.dev/gemma/terms


Disclaimer

This is an independently modified version of the original model. Google is not responsible for this version or its outputs.


Credits

  • Google – for the Gemma model
  • MLX team – for the Apple Silicon inference framework
