Zeta 2.1 — GGUF

GGUF quantizations of zed-industries/zeta-2.1, a code edit prediction (next-edit suggestion) model from Zed Industries.

These files were produced from the original BF16 safetensors using llama.cpp b9085 (convert_hf_to_gguf.py, then llama-quantize). No fine-tuning or weight modification was applied beyond format conversion and quantization.

Files

| Quant  | Size   | Notes                                                         |
|--------|--------|---------------------------------------------------------------|
| Q4_K_M | 4.8 GB | Smallest, recommended default for CPU / 8 GB-class GPUs.      |
| Q5_K_M | 5.5 GB | Quality / size sweet spot.                                    |
| Q8_0   | 8.2 GB | Near-lossless vs. the original BF16 weights.                  |
| f16    | 16 GB  | Reference. Useful as the source for further quantization.     |
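
As a rough aid for choosing among the files above, the sketch below picks the largest quant that fits a given memory budget. The sizes are copied from the table; the `pick_quant` helper and its `headroom_gb` default are illustrative assumptions (an allowance for the KV cache and runtime buffers), not measured requirements.

```python
from typing import Optional

# Approximate file sizes in GB, copied from the table above.
QUANT_SIZES_GB = {"Q4_K_M": 4.8, "Q5_K_M": 5.5, "Q8_0": 8.2, "f16": 16.0}


def pick_quant(memory_gb: float, headroom_gb: float = 3.0) -> Optional[str]:
    """Return the largest quant whose file fits in memory_gb minus headroom_gb.

    headroom_gb is an assumed allowance for the KV cache and runtime
    buffers; it is a guess, not a measured figure.
    """
    budget = memory_gb - headroom_gb
    fitting = [(size, name) for name, size in QUANT_SIZES_GB.items() if size <= budget]
    # max() compares sizes first, so the biggest file that fits wins.
    return max(fitting)[1] if fitting else None


if __name__ == "__main__":
    for mem in (6, 8, 12, 24):
        print(mem, "GB ->", pick_quant(mem))
```

Actual memory use depends on context length and backend, so treat the result as a starting point and adjust the headroom to your setup.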

Quickstart

Ollama

ollama pull hf.co/adilkairolla/zeta-2.1-GGUF:Q4_K_M

Replace the tag with Q5_K_M, Q8_0, or f16 for a different quant. Zeta is a code-edit-prediction model (not a chat model) — call it via /api/generate with the FIM prompt below, not via /api/chat.
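
A minimal sketch of such a call using only the Python standard library. It assumes an Ollama server is running at its default local address and that the model has already been pulled; `build_payload` and `generate` are illustrative helper names. Setting `raw` to true asks Ollama to pass the prompt through without applying a template, and the stop sequence mirrors the one used in the llama-cpp-python example.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address
MODEL = "hf.co/adilkairolla/zeta-2.1-GGUF:Q4_K_M"


def build_payload(prompt: str, max_tokens: int = 256) -> dict:
    """Build a non-streaming /api/generate request body for a raw completion."""
    return {
        "model": MODEL,
        "prompt": prompt,
        "raw": True,      # send the prompt verbatim, with no template applied
        "stream": False,  # return one JSON object instead of a stream of chunks
        "options": {"num_predict": max_tokens, "stop": ["<|marker_2|>"]},
    }


def generate(prompt: str) -> str:
    """POST the prompt to a running Ollama server and return the completion text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Call `generate(open("your-prompt.txt").read())` with a prompt in the format described under "Prompt format"; the file name is a placeholder.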

LM Studio

Search for adilkairolla/zeta-2.1-GGUF in the model browser and pick a quant. LM Studio loads it as a base completion model.

llama.cpp

# One-shot completion (correct binary for non-chat models)
./llama-completion -m zeta-2.1-Q4_K_M.gguf -p "$(cat your-prompt.txt)" -n 256 -c 4096

llama-cpp-python

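from llama_cpp import Llama
# Read the FIM prompt (see "Prompt format"); the file name is a placeholder.
prompt = open("your-prompt.txt").read()
llm = Llama(model_path="zeta-2.1-Q4_K_M.gguf", n_ctx=4096)
# Stop at the closing marker so generation ends with the editable region.
out = llm(prompt, max_tokens=256, stop=["<|marker_2|>"], echo=False)
print(out["choices"][0]["text"])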

Prompt format

Zeta uses a Suffix-Prefix-Middle (SPM) FIM format with numbered region markers. Quoting the upstream model card:

<[fim-suffix]>
code after editable region
<[fim-prefix]><filename>related/file.py
related file content

<filename>edit_history
--- a/some_file.py
+++ b/some_file.py
-old
+new

<filename>path/to/target_file.py
code before editable region
<|marker_1|>
code that
needs to<|user_cursor|>
be rewritten
<|marker_2|>
<[fim-middle]>
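
The layout above can be assembled programmatically. The following is a sketch only: `build_zeta_prompt` and its parameter names are hypothetical, and the exact newline placement is read off the quoted template rather than verified against the upstream tokenizer or training data.

```python
from typing import Iterable, Tuple


def build_zeta_prompt(
    target_path: str,
    before: str,
    editable: str,
    after: str,
    cursor: int,
    edit_history: str = "",
    related: Iterable[Tuple[str, str]] = (),
) -> str:
    """Assemble an SPM prompt following the quoted template.

    cursor is a character offset into editable. Newline handling is an
    assumption inferred from the template, not confirmed upstream.
    """
    # Mark the cursor position inside the editable region.
    region = editable[:cursor] + "<|user_cursor|>" + editable[cursor:]
    # Suffix comes first in SPM order, then the prefix section opens.
    parts = ["<[fim-suffix]>\n" + after + "\n<[fim-prefix]>"]
    # Related files and the edit history are each introduced by <filename>.
    for path, content in related:
        parts.append(f"<filename>{path}\n{content}\n\n")
    parts.append(f"<filename>edit_history\n{edit_history}\n\n")
    # The target file ends with the marked editable region and the middle tag.
    parts.append(
        f"<filename>{target_path}\n{before}\n"
        f"<|marker_1|>\n{region}\n<|marker_2|>\n<[fim-middle]>"
    )
    return "".join(parts)
```

Passing the placeholder strings from the template (with `cursor` set to the offset just after "needs to") reproduces the quoted layout line for line.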

Expected output:

<|marker_1|>
revised content for
the editable region
<|marker_2|>

See the upstream sample.prompt and sample.output for a real example.
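
To apply a prediction, the text between the two markers has to be pulled out of the completion. A minimal sketch, assuming the output has the shape shown under "Expected output"; `extract_edit` is a hypothetical helper, and it tolerates a missing closing marker because a stop sequence of `<|marker_2|>` removes it from the returned text.

```python
from typing import Optional

START_TAG = "<|marker_1|>"
END_TAG = "<|marker_2|>"


def extract_edit(completion: str) -> Optional[str]:
    """Return the text between the region markers, or None if no region starts."""
    start = completion.find(START_TAG)
    if start == -1:
        return None
    start += len(START_TAG)
    end = completion.find(END_TAG, start)
    if end == -1:
        # The stop sequence may have cut the closing marker off.
        end = len(completion)
    # Drop the newlines that separate the markers from the content. This
    # also removes intentional leading or trailing blank lines, which is
    # a simplification.
    return completion[start:end].strip("\n")
```

The returned string would then replace the original editable region in the target file.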

Source & lineage

License

Released under the Apache License 2.0, inherited from the upstream model. The only modification relative to upstream is the conversion to GGUF and quantization to the formats listed above.

All credit for the model itself goes to Zed Industries and ByteDance-Seed. This repo is an unaffiliated quantization mirror.
