APEX Quants (GGUF)

MoE models quantized with the APEX Quantization technique (https://github.com/mudler/apex-quant).
APEX (Adaptive Precision for EXpert Models) quantizations of Jackrong/Gemopus-4-26B-A4B-it-Preview.
Brought to you by the LocalAI team | APEX Project
Available quantization profiles:

| File | Profile | Size | Best For |
|---|---|---|---|
| gemopus-4-26B-A4B-APEX-I-Quality.gguf | I-Quality | 20 GB | Highest quality with imatrix |
| gemopus-4-26B-A4B-APEX-Quality.gguf | Quality | 20 GB | Highest quality standard |
| gemopus-4-26B-A4B-APEX-I-Balanced.gguf | I-Balanced | 19 GB | Best overall quality/size ratio |
| gemopus-4-26B-A4B-APEX-Balanced.gguf | Balanced | 19 GB | General purpose |
| gemopus-4-26B-A4B-APEX-I-Compact.gguf | I-Compact | 15 GB | Consumer GPUs, best quality/size |
| gemopus-4-26B-A4B-APEX-Compact.gguf | Compact | 15 GB | Consumer GPUs |
| gemopus-4-26B-A4B-APEX-I-Mini.gguf | I-Mini | 13 GB | Smallest viable, fastest inference |
| gemopus-4-26B-A4B-F16.gguf | F16 | 48 GB | Full precision reference |
Perplexity (PPL), mean KL divergence from the F16 reference (KL mean), accuracy benchmarks, and throughput (pp512 = prompt processing of 512 tokens, tg128 = generation of 128 tokens, in tokens/s):

| Model | Size | PPL | KL mean | HellaSwag | Winogrande | MMLU | ARC | TruthfulQA | pp512 t/s | tg128 t/s |
|---|---|---|---|---|---|---|---|---|---|---|
| APEX-I-Quality | 19G | 1223.5 | 0.532 | 50.5 | 59.2 | 32.1 | 35.1 | 31.0 | 5632 | 145.9 |
| APEX-Quality | 19G | 1203.1 | 0.579 | 49.0 | 58.5 | 33.7 | 36.8 | 29.3 | 5623 | 143.5 |
| APEX-I-Balanced | 18G | 1216.4 | 0.600 | 50.0 | 57.2 | 32.6 | 33.4 | 29.9 | 6211 | 149.4 |
| APEX-Balanced | 18G | 1117.9 | 0.702 | 47.8 | 57.2 | 33.6 | 34.1 | 31.1 | 6221 | 145.7 |
| APEX-I-Compact | 14G | 1258.5 | 0.943 | 49.0 | 59.0 | 32.6 | 34.1 | 30.1 | 6612 | 146.7 |
| APEX-Compact | 14G | 782.1 | 1.617 | 48.8 | 58.2 | 33.5 | 34.4 | 30.0 | 6517 | 142.2 |
| APEX-I-Mini | 12G | 1915.3 | 1.907 | 52.0 | 58.2 | 34.4 | 33.4 | 30.8 | 5904 | 146.8 |
| F16 (ref) | 48G | 1215.9 | - | - | - | - | - | - | 2718 | 97.9 |
APEX is a quantization strategy for Mixture-of-Experts (MoE) models. It classifies tensors by role (routed expert, shared expert, attention) and applies a layer-wise precision gradient: edge layers keep higher precision, while middle layers are compressed more aggressively. The I-variants are calibrated with a diverse importance matrix (imatrix) built from chat, code, reasoning, tool-calling, agentic traces, and Wikipedia text.
See the APEX project for full details.
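The sketch below illustrates the idea in Python. It is not the actual implementation (see the repository above for that); the tensor-name patterns and the specific quant-type choices are assumptions for illustration only.

```python
# Illustrative sketch of the APEX approach described above, NOT the actual
# implementation (see https://github.com/mudler/apex-quant). Tensor-name
# patterns and the concrete quant types here are assumptions.

def classify_role(name: str) -> str:
    """Classify a GGUF tensor by its role in an MoE model."""
    if "ffn" in name and "exps" in name:
        return "routed_expert"   # per-expert FFN weights (bulk of the model)
    if "shexp" in name:
        return "shared_expert"   # shared expert, active for every token
    if "attn" in name:
        return "attention"
    return "other"               # embeddings, norms, output head, ...

def pick_precision(role: str, layer: int, n_layers: int) -> str:
    """Layer-wise precision gradient: edge layers keep higher precision,
    middle layers are compressed more aggressively."""
    edge = layer < n_layers // 8 or layer >= n_layers - n_layers // 8
    if role == "routed_expert":
        # Routed experts dominate file size, so they take the most compression.
        return "Q5_K" if edge else "Q3_K"
    if role in ("shared_expert", "attention"):
        # Always-active paths are more sensitive; keep them at higher precision.
        return "Q6_K" if edge else "Q4_K"
    return "Q6_K"

# Example: a routed-expert tensor in a middle layer gets the aggressive type.
print(pick_precision(classify_role("blk.20.ffn_down_exps.weight"), 20, 40))  # Q3_K
```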
Run with LocalAI:

```
local-ai run mudler/Gemopus-4-26B-A4B-it-Preview-APEX-GGUF@gemopus-4-26B-A4B-APEX-I-Balanced.gguf
```
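If you are not running LocalAI, the same GGUF files load in any llama.cpp-based stack. A minimal sketch using the third-party huggingface_hub and llama-cpp-python packages (neither is part of this release; the chosen quant and parameters are just examples):

```python
# Minimal sketch: fetch one APEX quant and run it via llama-cpp-python.
# Assumes `pip install huggingface_hub llama-cpp-python`; repo and file
# names are taken from the tables above.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="mudler/Gemopus-4-26B-A4B-it-Preview-APEX-GGUF",
    filename="gemopus-4-26B-A4B-APEX-I-Balanced.gguf",
)

llm = Llama(model_path=model_path, n_ctx=4096, n_gpu_layers=-1)
out = llm("Summarize what an MoE model is in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```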
APEX is brought to you by the LocalAI team, developed through human-driven, AI-assisted research, and built on llama.cpp.