Please do for aoxo/sarvam-30b-uncensored

#2 · opened by shivshankar

The PR for GGUF support in llama.cpp is still open. How can I use this?

The GGUF files are ready to download from this repo. However, since the sarvam_moe architecture isn't merged into official llama.cpp yet, you will need to build from my fork to run them.

# Clone the fork and check out the branch that adds the sarvam_moe architecture
git clone https://github.com/sumitchatterjee13/llama.cpp
cd llama.cpp
git checkout add-sarvam-moe
# Configure with CUDA support and build in Release mode
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j$(nproc)
Then download any GGUF from this repo and run:

./build/bin/llama-cli -m sarvam-30B-Q6_K.gguf -p "Your prompt here" -n 512 -ngl 99

Here -n 512 caps the number of tokens generated and -ngl 99 offloads all model layers to the GPU.

Replace -DGGML_CUDA=ON with -DGGML_VULKAN=ON or -DGGML_METAL=ON depending on your GPU. Once the PR is merged, any standard llama.cpp build will work.
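If you want an HTTP API instead of the interactive CLI, the same build also produces llama-server, which exposes an OpenAI-compatible endpoint. A minimal sketch (the model filename and port are assumptions; adjust to the quant you downloaded):

```shell
# Start the server from the fork's build, offloading all layers to the GPU
./build/bin/llama-server -m sarvam-30B-Q6_K.gguf -ngl 99 --port 8080

# In another terminal, query the chat completions endpoint
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Hello"}],"max_tokens":64}'
```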
