ZAYA1-8B GGUF Quantizations

This repository contains GGUF quantizations of Zyphra/ZAYA1-8B.

⚠️ CRITICAL USAGE NOTE: The zaya architecture, which uses a Compressed Convolutional Attention mechanism and an MLP router, is still being integrated into llama.cpp, and support is experimental. These quantizations were generated with Draft PR #23112. To preserve the model's reasoning and routing behavior, the sensitive cca_conv_grp layers were excluded from quantization and remain in higher precision.

To run these models, you must compile llama.cpp locally from that specific PR branch until official support is merged into the master branch.

Model Details

ZAYA1-8B is a frontier-level reasoning Mixture-of-Experts (MoE) model designed for high intelligence density and local deployment.

  • Total Parameters: ~8.4B
  • Active Parameters (per token): ~760M
  • Architecture: zaya (Sparse MoE)
  • License: Apache-2.0
  • Creator: Zyphra
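
As a rough illustration of what the sparse MoE design means in practice, the sketch below computes the share of parameters active per token from the approximate figures listed above. The helper name active_share is hypothetical, and because both inputs are rounded, the result is only a ballpark figure. In general, memory footprint follows the total parameter count, while per-token compute follows the active count more closely.

```shell
# Rough share of parameters active per token, using the approximate
# figures from the list above (both given in millions of parameters).
# active_share is a hypothetical helper name used only for illustration.
active_share() {
    awk -v active="$1" -v total="$2" \
        'BEGIN { printf "%.1f\n", 100 * active / total }'
}

# ~760M active out of ~8.4B total
active_share 760 8400
```

With these inputs the share comes out at about 9%.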

Available Quants

Format   File Size   Description
Q3_K_M   4.51 GB     Smallest viable quant. Heavy compression, potential logic degradation.
Q4_K_S   5.26 GB     Very small. High compression, suitable for strict VRAM limits.
Q4_K_M   5.57 GB     Recommended. The sweet spot for balancing VRAM usage and reasoning capabilities.
Q5_K_M   6.43 GB     High precision. Great if you have slightly more RAM/VRAM to spare.
Q6_K     7.35 GB     Very high precision. Near uncompressed performance.
Q8_0     9.49 GB     Maximum precision quant. Negligible intelligence loss.
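
One way to choose from the table above is to take the largest quant whose file size, plus some headroom, fits your memory budget. The sketch below does that with the sizes copied from the table. The helper name pick_quant and the 1.0 GB default headroom (for KV cache and compute buffers) are assumptions for illustration, not measured values. Actual overhead depends on context length and backend.

```shell
# Pick the largest quant whose file size plus headroom fits a memory budget.
# Sizes (GB) are copied from the table above. The 1.0 GB default headroom
# is an assumption, not a measured requirement.
pick_quant() {
    budget_gb=$1
    headroom_gb=${2:-1.0}
    awk -v budget="$budget_gb" -v headroom="$headroom_gb" '
        BEGIN { best = "none" }
        # Rows are in ascending size order, so the last fit is the largest.
        $2 + headroom <= budget { best = $1 }
        END { print best }
    ' <<'EOF'
Q3_K_M 4.51
Q4_K_S 5.26
Q4_K_M 5.57
Q5_K_M 6.43
Q6_K 7.35
Q8_0 9.49
EOF
}

pick_quant 8       # 8 GB budget, default headroom
pick_quant 6 0.5   # 6 GB budget, 0.5 GB headroom
```

The function prints "none" when even Q3_K_M does not fit within the budget.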

How to Run (Experimental)

Since standard llama.cpp releases do not yet recognize the zaya metadata keys, you must fetch the working pull request to run these files:

# Clone the main repository
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp

# Fetch the specific Draft PR that contains working ZAYA inference
git fetch origin pull/23112/head:zaya-working-pr
git checkout zaya-working-pr

# Build with CUDA support (Recommended)
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j 8

# Run the model
./build/bin/llama-cli -m /path/to/ZAYA1-8B-Q4_K_M.gguf -p "Your complex reasoning prompt here" -ngl 99 -c 4096 -n 512
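
Before launching, it can help to confirm that the download is intact enough to be recognized as GGUF. The sketch below checks that the file exists and starts with the 4-byte GGUF magic. The helper name is_gguf is hypothetical, and the check says nothing about whether your llama.cpp build supports the zaya architecture.

```shell
# Hypothetical pre-flight helper: succeeds only if the file exists and
# begins with the 4-byte GGUF magic ("GGUF").
is_gguf() {
    [ -f "$1" ] || return 1
    [ "$(head -c 4 "$1")" = "GGUF" ]
}

model=/path/to/ZAYA1-8B-Q4_K_M.gguf
if is_gguf "$model"; then
    echo "looks like a GGUF file: $model"
else
    echo "missing or not a GGUF file: $model" >&2
fi
```

A failure here usually points to a wrong path or a truncated or mislabeled download rather than a problem with the build.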