Zen Reranker 0.6B GGUF

GGUF-quantized build of the compact Zen Reranker for efficient CPU-based re-scoring.

Overview

GGUF quantization enables efficient CPU and mixed CPU/GPU inference with llama.cpp and compatible runtimes.

Developed by Hanzo AI and the Zoo Labs Foundation.

Quick Start

# Download and run with llama.cpp
./llama-cli -m zen-reranker-0.6B.Q4_K_M.gguf -p "Hello, how can I help you?" -n 512

# With llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="zenlm/zen-reranker-0.6B-GGUF",
    filename="*Q4_K_M.gguf",
)
output = llm("Hello!", max_tokens=512)
print(output["choices"][0]["text"])
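The Quick Start above runs plain text completion; for re-scoring, a reranker is typically prompted once per (query, document) pair and the documents are then sorted by score. The sketch below illustrates that flow. The prompt template and the `score_fn` hook are assumptions for illustration, not the model's documented interface; in practice `score_fn` would wrap a call into llama-cpp-python.

```python
# Minimal reranking sketch. The yes/no prompt template and the plain
# scalar score are assumptions, not the model's documented format;
# `score_fn` stands in for a real model call (e.g. via llama-cpp-python).

def build_prompt(query: str, document: str) -> str:
    # Hypothetical relevance-judgment prompt.
    return (
        f"Query: {query}\n"
        f"Document: {document}\n"
        "Is the document relevant to the query? Answer yes or no: "
    )

def rerank(query, documents, score_fn):
    # Score each document and return them sorted, most relevant first.
    scored = [(score_fn(build_prompt(query, d)), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]

# Dummy scorer so the sketch is self-contained: keyword overlap only.
def overlap_score(prompt: str) -> float:
    return float(prompt.count("GGUF"))

docs = ["GGUF is a quantized model format", "Unrelated text"]
print(rerank("What is GGUF?", docs, overlap_score))
```

With a real scorer, `score_fn` would call the loaded `Llama` object and derive a relevance score from the model's output for each pair.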

Model Details

Attribute     Value
Parameters    0.6B
Format        GGUF (quantized)
Architecture  qwen3
Context       8K tokens
License       Apache 2.0

License

Apache 2.0
