Please do for aoxo/sarvam-30b-uncensored
The PR for GGUF support on llama.cpp is still open. How to use this?
The GGUF files are ready to download from this repo. However, since the `sarvam_moe` architecture isn't merged into official llama.cpp yet, you will need to build from my fork to run them:
```shell
git clone https://github.com/sumitchatterjee13/llama.cpp
cd llama.cpp
git checkout add-sarvam-moe
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j$(nproc)
```
Then download any GGUF from this repo and run:
```shell
./build/bin/llama-cli -m sarvam-30B-Q6_K.gguf -p "Your prompt here" -n 512 -ngl 99
```
Replace `-DGGML_CUDA=ON` with `-DGGML_VULKAN=ON` or `-DGGML_METAL=ON` depending on your GPU. Once the PR is merged, any standard llama.cpp build will work.
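If you'd rather query the model over HTTP than use the interactive CLI, the same build also produces `llama-server`, which exposes an OpenAI-compatible endpoint. A rough sketch (the GGUF filename and port here are just examples, adjust to whichever quant you downloaded):

```shell
# Start the server with the same GPU-offload flag used for llama-cli
./build/bin/llama-server -m sarvam-30B-Q6_K.gguf -ngl 99 --port 8080

# From another terminal, send a chat completion request
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}], "max_tokens": 64}'
```

This is convenient for hitting the model from scripts or existing OpenAI-client code while the PR is still open.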