Gemma 4 26B Codex (MLX 4-bit)

Goal: The goal of this project is to build the best available Gemma 4 coding model.

This model is a fine-tuned version of google/gemma-4-26B-A4B-it, trained on complex programming and software-engineering tasks (using the Evol-Instruct-Code dataset). It has been quantized to 4-bit precision and converted to Apple's MLX format, making it a fast, natively accelerated coding assistant for Macs with Apple Silicon.

Key Features

  • Coding-Focused Fine-Tune: Trained specifically for reasoning, complex debugging, algorithm generation, and software architecture.
  • MLX Optimized: Exported for native use on Apple Silicon (M1/M2/M3/M4).
  • 4-bit Quantization: Compresses the 26B-parameter model into a memory footprint that runs comfortably on MacBooks with 16GB+ Unified Memory, with only a modest loss in output quality.
  • High Throughput: When run via mlx-lm or LM Studio, inference executes directly in Apple's unified memory with no CPU-GPU copies, enabling fast token generation.
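The 16GB claim above can be sanity-checked with quick arithmetic: at 4-bit precision each parameter occupies half a byte, so the weights of a 26B-parameter model need roughly 12 GiB. This sketch ignores quantization group metadata and the KV cache, so the real footprint is somewhat larger:

```python
# Back-of-the-envelope memory estimate for 4-bit quantized weights.
# Ignores quantization scales/metadata and the KV cache.
PARAMS = 26e9          # 26B parameters
BITS_PER_PARAM = 4     # 4-bit quantization

weight_bytes = PARAMS * BITS_PER_PARAM / 8
weight_gib = weight_bytes / 2**30

print(f"~{weight_gib:.1f} GiB of weights")  # → ~12.1 GiB
```

That leaves a few gigabytes of headroom on a 16 GB machine for the cache, the OS, and other apps, which is why 16 GB is the practical floor rather than a comfortable target.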

How to use with LM Studio

  1. Download and install LM Studio (v0.3.4 or newer).
  2. In the search bar, look for this repository.
  3. Download the model files.
  4. Load the model. LM Studio will automatically engage the MLX engine instead of the standard llama.cpp engine.
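Once loaded, LM Studio can also serve the model over its local OpenAI-compatible API (enabled from the app's Developer tab). A minimal sketch of building such a request, assuming the default port 1234 and a hypothetical model identifier (use whatever name LM Studio displays for the loaded model):

```python
import json

# Hypothetical model identifier; copy the exact name LM Studio shows.
MODEL_ID = "gemma4-26b-a4b-it-codex-mlx-4bit"
BASE_URL = "http://localhost:1234/v1"  # LM Studio's default local server

def build_chat_request(prompt: str) -> tuple[str, bytes]:
    """Build the URL and JSON body for a chat-completion request."""
    body = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
    }
    return f"{BASE_URL}/chat/completions", json.dumps(body).encode()

url, payload = build_chat_request("Merge overlapping intervals in Python.")
print(url)
```

The resulting payload can be POSTed with any HTTP client, or the official `openai` Python package can be pointed at `BASE_URL` directly.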

How to use with Python (mlx-lm)

pip install mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("YOUR_HF_USERNAME/gemma4-26b-a4b-it-codex-mlx-4bit")

prompt = "Write a highly optimized Python script to merge overlapping intervals."
# Apply Gemma 4 Chat Template
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

response = generate(model, tokenizer, prompt=text, verbose=True, max_tokens=512)
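For reference, the example prompt above (merging overlapping intervals) has a standard O(n log n) solution against which the model's answer can be compared:

```python
def merge_intervals(intervals):
    """Merge overlapping [start, end] intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend its end point.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

print(merge_intervals([[1, 3], [2, 6], [8, 10], [15, 18]]))
# → [[1, 6], [8, 10], [15, 18]]
```

Sorting by start point first means each new interval can only overlap the most recently merged one, so a single pass suffices.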

Training Details

  • Base Model: google/gemma-4-26B-A4B-it
  • Dataset: Evol-Instruct-Code-80k-v1
  • Method: QLoRA via Unsloth (Rank 16, Alpha 32)
  • Epochs: 3.0
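The LoRA hyperparameters above imply a scaling factor of alpha / rank = 32 / 16 = 2.0, which controls how strongly the adapter update is weighted relative to the base weights. A sketch of the configuration as a plain dictionary (field names follow common peft/Unsloth conventions; this is a reconstruction, not the actual training script):

```python
# Hypothetical reconstruction of the QLoRA settings listed above.
lora_config = {
    "r": 16,                  # LoRA rank
    "lora_alpha": 32,         # scaling numerator
    "num_train_epochs": 3.0,
}

# Effective LoRA scaling: the adapter applies delta_W = (alpha / r) * B @ A
scaling = lora_config["lora_alpha"] / lora_config["r"]
print(scaling)  # → 2.0
```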