Please do for aoxo/sarvam-30b-uncensored
The PR for GGUF support on llama.cpp is still open. How to use this?
The GGUF files are ready to download from this repo. However, since the `sarvam_moe` architecture isn't merged into official llama.cpp yet, you will need to build from my fork to run them:
```shell
git clone https://github.com/sumitchatterjee13/llama.cpp
cd llama.cpp
git checkout add-sarvam-moe
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j$(nproc)
```
Then download any GGUF from this repo and run:
```shell
./build/bin/llama-cli -m sarvam-30B-Q6_K.gguf -p "Your prompt here" -n 512 -ngl 99
```
Replace `-DGGML_CUDA=ON` with `-DGGML_VULKAN=ON` or `-DGGML_METAL=ON` depending on your GPU. Once the PR is merged, any standard llama.cpp build will work.
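If you'd rather query the model over HTTP than use the interactive CLI, the same build also produces `llama-server`, which exposes an OpenAI-compatible endpoint. A rough sketch (the GGUF filename and port here are just examples, adjust to whichever quant you downloaded):

```shell
# Start the server with the same GPU-offload flag used for llama-cli
./build/bin/llama-server -m sarvam-30B-Q6_K.gguf -ngl 99 --port 8080

# From another terminal, send a chat completion request
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}], "max_tokens": 64}'
```

This is convenient for hitting the model from scripts or existing OpenAI-client code while the PR is still open.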