Gemopus 4 26B-A4B APEX GGUF

APEX (Adaptive Precision for EXpert Models) quantizations of Jackrong/Gemopus-4-26B-A4B-it-Preview.

Brought to you by the LocalAI team | APEX Project

Available Files

| File | Profile | Size | Best For |
|------|---------|------|----------|
| gemopus-4-26B-A4B-APEX-I-Quality.gguf | I-Quality | 20 GB | Highest quality with imatrix |
| gemopus-4-26B-A4B-APEX-Quality.gguf | Quality | 20 GB | Highest quality, standard |
| gemopus-4-26B-A4B-APEX-I-Balanced.gguf | I-Balanced | 19 GB | Best overall quality/size ratio |
| gemopus-4-26B-A4B-APEX-Balanced.gguf | Balanced | 19 GB | General purpose |
| gemopus-4-26B-A4B-APEX-I-Compact.gguf | I-Compact | 15 GB | Consumer GPUs, best quality/size |
| gemopus-4-26B-A4B-APEX-Compact.gguf | Compact | 15 GB | Consumer GPUs |
| gemopus-4-26B-A4B-APEX-I-Mini.gguf | I-Mini | 13 GB | Smallest viable, fastest inference |
| gemopus-4-26B-A4B-F16.gguf | F16 | 48 GB | Full-precision reference |

Benchmark Results (Native Evals)

| Model | Size | PPL | KL mean | HellaSwag | Winogrande | MMLU | ARC | TruthfulQA | pp512 t/s | tg128 t/s |
|-------|------|-----|---------|-----------|------------|------|-----|------------|-----------|-----------|
| APEX-I-Quality | 19G | 1223.5 | 0.532 | 50.5 | 59.2 | 32.1 | 35.1 | 31.0 | 5632 | 145.9 |
| APEX-Quality | 19G | 1203.1 | 0.579 | 49.0 | 58.5 | 33.7 | 36.8 | 29.3 | 5623 | 143.5 |
| APEX-I-Balanced | 18G | 1216.4 | 0.600 | 50.0 | 57.2 | 32.6 | 33.4 | 29.9 | 6211 | 149.4 |
| APEX-Balanced | 18G | 1117.9 | 0.702 | 47.8 | 57.2 | 33.6 | 34.1 | 31.1 | 6221 | 145.7 |
| APEX-I-Compact | 14G | 1258.5 | 0.943 | 49.0 | 59.0 | 32.6 | 34.1 | 30.1 | 6612 | 146.7 |
| APEX-Compact | 14G | 782.1 | 1.617 | 48.8 | 58.2 | 33.5 | 34.4 | 30.0 | 6517 | 142.2 |
| APEX-I-Mini | 12G | 1915.3 | 1.907 | 52.0 | 58.2 | 34.4 | 33.4 | 30.8 | 5904 | 146.8 |
| F16 (ref) | 48G | 1215.9 | - | - | - | - | - | - | 2718 | 97.9 |

pp512 and tg128 are throughput figures in tokens per second: prompt processing over a 512-token prompt and text generation of 128 tokens, respectively.
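The KL mean column reports the average KL divergence between each quant's next-token distribution and that of the reference model (presumably the F16 checkpoint). A toy illustration of how such a metric is computed, with invented distributions:

```python
# Toy illustration of mean KL divergence between a quantized model's
# next-token distributions and a full-precision reference, averaged
# over evaluation tokens. All probabilities here are invented.
import math

def kl(p, q):
    """KL(p || q) for two discrete probability distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

ref_probs   = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]]    # reference per-token probs
quant_probs = [[0.6, 0.25, 0.15], [0.5, 0.3, 0.2]]  # quantized per-token probs

kl_mean = sum(kl(p, q) for p, q in zip(ref_probs, quant_probs)) / len(ref_probs)
print(f"{kl_mean:.4f}")  # 0 would mean identical distributions
```

Lower is better: a KL mean near zero means the quant's output distribution closely tracks the reference, which is why the Quality profiles sit at the top of the table.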

What is APEX?

APEX is a quantization strategy for Mixture-of-Experts (MoE) models. It classifies tensors by role (routed expert, shared expert, attention) and applies a layer-wise precision gradient: edge layers get higher precision, while middle layers are compressed more aggressively. The I-variants additionally use a diverse imatrix calibration set (chat, code, reasoning, tool-calling, agentic traces, Wikipedia).
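The role classification above could be sketched as follows. This is a hypothetical illustration only: the tensor-name patterns follow common llama.cpp GGUF naming conventions and are an assumption here, not the actual APEX implementation.

```python
# Hypothetical sketch: classify GGUF tensors of a MoE checkpoint by role.
# Name patterns ("exps", "shexp", "attn") are assumed conventions.

def classify(tensor_name: str) -> str:
    """Classify a tensor by its role in a MoE transformer block."""
    if "ffn" in tensor_name and "shexp" in tensor_name:
        return "shared_expert"   # shared-expert FFN weights
    if "ffn" in tensor_name and "exps" in tensor_name:
        return "routed_expert"   # routed-expert FFN weights
    if "attn" in tensor_name:
        return "attention"       # attention projections
    return "other"               # embeddings, norms, output head

for name in ("blk.0.attn_q.weight",
             "blk.7.ffn_gate_exps.weight",
             "blk.7.ffn_gate_shexp.weight"):
    print(name, "->", classify(name))
```

Each role can then be paired with its own quantization budget, since routed-expert weights dominate the file size of a MoE model while attention tensors are comparatively small.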

See the APEX project for full details.

Architecture

  • Base Model: Jackrong/Gemopus-4-26B-A4B-it-Preview
  • Architecture: Gemma 4 26B-A4B (MoE)
  • Layers: 30
  • Experts: 128 routed (8 active per token)
  • Total Parameters: 26B
  • Active Parameters: ~4B per token
  • APEX Config: 5+5 symmetric edge gradient across 30 layers
  • Calibration: v1.2 diverse dataset
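The "5+5 symmetric edge gradient" config can be illustrated with a minimal sketch: the first 5 and last 5 of the 30 layers keep higher precision, and the 20 middle layers are compressed harder. The concrete quant types below are assumptions for illustration, not the actual APEX mapping.

```python
# Minimal sketch of a 5+5 symmetric edge gradient over 30 layers.
# Quant types are assumed for illustration.

N_LAYERS, EDGE = 30, 5

def layer_quant(layer: int) -> str:
    """Higher precision at both edges, aggressive compression in the middle."""
    if layer < EDGE or layer >= N_LAYERS - EDGE:
        return "Q6_K"  # edge layers: higher precision (assumed type)
    return "Q4_K"      # middle layers: stronger compression (assumed type)

plan = [layer_quant(i) for i in range(N_LAYERS)]
print(plan.count("Q6_K"), "edge layers,", plan.count("Q4_K"), "middle layers")
# -> 10 edge layers, 20 middle layers
```

The intuition is that the first and last layers sit closest to the embeddings and output head, where quantization error is hardest to recover, so they are given the larger precision budget.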

Run with LocalAI

```shell
local-ai run mudler/Gemopus-4-26B-A4B-it-Preview-APEX-GGUF@gemopus-4-26B-A4B-APEX-I-Balanced.gguf
```

Credits

APEX is brought to you by the LocalAI team. Developed through human-driven, AI-assisted research. Built on llama.cpp.
