How to use from the llama-cpp-python library
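# !pip install llama-cpp-python
# Note: this model needs the experimental llama.cpp branch described below, so a
# stock llama-cpp-python build may fail to load it.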

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="JusteLeo/ZAYA1-8B-GGUF",
	filename="ZAYA1-8B-Q4_K_M.gguf",  # example file named in this README; substitute the quantization you want
)
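response = llm.create_chat_completion(
	messages=[
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
# Print the assistant's reply from the OpenAI-style response dict
print(response["choices"][0]["message"]["content"])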

ZAYA1-8B GGUF

This repository contains the GGUF quantized formats of the Zyphra/ZAYA1-8B model.

⚠️ Important Note - Experimental Branch: Running this model currently requires a specific, experimental branch of llama.cpp associated with PR #23112. Because this implementation is still experimental, there may be bugs, speed issues, or other performance issues.

How to use

To run this model, you need to clone a custom fork of llama.cpp and check out the Zaya1 branch. Follow the steps below:

1. Clone and Build the Custom llama.cpp

# Clone the specific repository
git clone https://github.com/Juste-Leo2/llama.cpp.git
cd llama.cpp

# Checkout the experimental branch
git checkout Zaya1

# Build the project (Example with CUDA enabled)
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
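
Before moving on, it can help to confirm that the build actually produced the llama-cli binary used in step 3. The following is a minimal Python sketch, not part of the official instructions: the helper name find_llama_cli is hypothetical, and the Windows-style candidate paths are assumptions about how multi-config generators lay out their output.

```python
from pathlib import Path
from typing import Optional


def find_llama_cli(repo_root: str) -> Optional[Path]:
    """Return the path to the built llama-cli binary, or None if it is missing."""
    bin_dir = Path(repo_root) / "build" / "bin"
    # The first candidate is the path used in this README. The other two are
    # assumptions about layouts produced by multi-config generators on Windows.
    candidates = [
        bin_dir / "llama-cli",
        bin_dir / "llama-cli.exe",
        bin_dir / "Release" / "llama-cli.exe",
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

Calling find_llama_cli(".") from inside the llama.cpp directory returns the binary path if the build succeeded and None otherwise, in which case the cmake output is the first place to look.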

2. Download the Model

Download your preferred quantized model file (e.g., ZAYA1-8B-Q4_K_M.gguf) from this Hugging Face repository and place it in the root directory of the llama.cpp project you just cloned.
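
If you would rather script the download than use the web page, the step above can be sketched in Python with the huggingface_hub package (pip install huggingface_hub). This is an illustrative sketch under assumptions: pick_gguf and download_model are hypothetical helper names, and matching files by a quantization tag such as Q4_K_M assumes the repository's filenames follow the pattern of the example above.

```python
from pathlib import Path
from typing import Iterable


def pick_gguf(filenames: Iterable[str], quant: str = "Q4_K_M") -> str:
    """Return the first .gguf filename whose name contains the quantization tag."""
    for name in filenames:
        lowered = name.lower()
        if lowered.endswith(".gguf") and quant.lower() in lowered:
            return name
    raise FileNotFoundError(f"no .gguf file matching {quant!r}")


def download_model(llama_cpp_dir: str, quant: str = "Q4_K_M",
                   repo_id: str = "JusteLeo/ZAYA1-8B-GGUF") -> Path:
    """Download the chosen quantization into the llama.cpp checkout."""
    # Imported lazily so pick_gguf stays usable without huggingface_hub installed.
    from huggingface_hub import hf_hub_download, list_repo_files

    filename = pick_gguf(list_repo_files(repo_id), quant)
    local_path = hf_hub_download(
        repo_id=repo_id, filename=filename, local_dir=llama_cpp_dir
    )
    return Path(local_path)
```

For example, download_model("llama.cpp") would place the Q4_K_M file in the cloned project directory, assuming such a file exists in the repository.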

3. Run the Model

Once the build has finished and the model is downloaded, you can run inference from the llama.cpp root directory using the following command:

./build/bin/llama-cli -m ZAYA1-8B-Q4_K_M.gguf
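
To drive the same command from a script, one option is to assemble the argument list in Python and hand it to subprocess. The sketch below is an assumption-laden illustration: build_llama_cli_command and run_llama_cli are hypothetical helper names, and -p (prompt), -n (tokens to generate) and -ngl (GPU layers) are options of upstream llama-cli that have not been verified against the experimental branch.

```python
import subprocess
from typing import List, Optional


def build_llama_cli_command(model_path: str,
                            prompt: Optional[str] = None,
                            n_predict: Optional[int] = None,
                            gpu_layers: Optional[int] = None,
                            binary: str = "./build/bin/llama-cli") -> List[str]:
    """Build the llama-cli argument list; optional flags are added only when set."""
    cmd = [binary, "-m", model_path]
    if prompt is not None:
        cmd += ["-p", prompt]            # initial prompt (upstream flag, assumed here)
    if n_predict is not None:
        cmd += ["-n", str(n_predict)]    # max tokens to generate (assumed)
    if gpu_layers is not None:
        cmd += ["-ngl", str(gpu_layers)]  # layers offloaded to the GPU (assumed)
    return cmd


def run_llama_cli(cmd: List[str], cwd: str = ".") -> int:
    """Run the command from the llama.cpp root directory and return its exit code."""
    return subprocess.run(cmd, cwd=cwd, check=False).returncode
```

With no optional arguments, build_llama_cli_command("ZAYA1-8B-Q4_K_M.gguf") reproduces the command shown above as a list, which avoids shell quoting problems when the prompt contains spaces or quotes.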

License

This model is released under the Apache 2.0 License.

GGUF model size: 9B params
Architecture: zaya
Available quantizations: 4-bit, 8-bit, 16-bit
