Zen Reranker 0.6B GGUF

GGUF-quantized build of the compact Zen Reranker for efficient CPU-based re-scoring.

Overview

GGUF quantization enables efficient CPU and mixed CPU/GPU inference with llama.cpp and compatible runtimes.

Developed by Hanzo AI and the Zoo Labs Foundation.

Quick Start

# Download and run with llama.cpp
./llama-cli -m zen-reranker-0.6B.Q4_K_M.gguf -p "Hello, how can I help you?" -n 512

# With llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="zenlm/zen-reranker-0.6B-GGUF",
    filename="*Q4_K_M.gguf",
)
output = llm("Hello!", max_tokens=512)
print(output["choices"][0]["text"])
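The Quick Start above runs plain text completion; for re-scoring, a reranker is typically prompted once per (query, document) pair and the documents are then sorted by score. The sketch below illustrates that flow. The prompt template and the `score_fn` hook are assumptions for illustration, not the model's documented interface; in practice `score_fn` would wrap a call into llama-cpp-python.

```python
# Minimal reranking sketch. The yes/no prompt template and the plain
# scalar score are assumptions, not the model's documented format;
# `score_fn` stands in for a real model call (e.g. via llama-cpp-python).

def build_prompt(query: str, document: str) -> str:
    # Hypothetical relevance-judgment prompt.
    return (
        f"Query: {query}\n"
        f"Document: {document}\n"
        "Is the document relevant to the query? Answer yes or no: "
    )

def rerank(query, documents, score_fn):
    # Score each document and return them sorted, most relevant first.
    scored = [(score_fn(build_prompt(query, d)), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]

# Dummy scorer so the sketch is self-contained: keyword overlap only.
def overlap_score(prompt: str) -> float:
    return float(prompt.count("GGUF"))

docs = ["GGUF is a quantized model format", "Unrelated text"]
print(rerank("What is GGUF?", docs, overlap_score))
```

With a real scorer, `score_fn` would call the loaded `Llama` object and derive a relevance score from the model's output for each pair.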

Model Details

Attribute     Value
Parameters    0.6B
Format        GGUF (quantized)
Architecture  qwen3
Context       8K tokens
License       Apache 2.0

License

Apache 2.0
