Sarvam-1-v0.5 GGUF

GGUF quantized versions of sarvamai/sarvam-1-v0.5 for local inference with llama.cpp, Ollama, LM Studio, and GPT4All.

Sarvam-1 is an Indian multilingual LLM built by Sarvam AI that supports 22 Indian languages, including Hindi, Bengali, Tamil, Telugu, Kannada, Malayalam, Marathi, Punjabi, Gujarati, and Odia. It is based on the Llama architecture and has 3.1B parameters.

Available Quantizations

File	Quant	Size	RAM Needed	Use Case
`sarvam-1-v0.5-Q8_0.gguf`	Q8_0	2.5 GB	~4 GB	Best quality, near-lossless
`sarvam-1-v0.5-f16.gguf`	F16	4.7 GB	~6 GB	Unquantized 16-bit (half-precision) weights, maximum quality
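
The RAM column above can be turned into a small selection helper. The following is a minimal sketch, not a definitive rule: the function name `choose_quant` is hypothetical, it expects available RAM as a whole number of GB, and its thresholds are simply the approximate "RAM Needed" values from the table.

```shell
#!/bin/sh
# Hypothetical helper: pick a file from the table above for a given amount of RAM (whole GB).
# Thresholds are the approximate "RAM Needed" values from the table, not measured limits.
choose_quant() {
    ram_gb=$1
    if [ "$ram_gb" -ge 6 ]; then
        echo "sarvam-1-v0.5-f16.gguf"
    elif [ "$ram_gb" -ge 4 ]; then
        echo "sarvam-1-v0.5-Q8_0.gguf"
    else
        # Neither listed file is expected to fit; report on stderr and fail.
        echo "insufficient RAM for the listed files" >&2
        return 1
    fi
}
```

For example, `choose_quant 4` prints `sarvam-1-v0.5-Q8_0.gguf`, while `choose_quant 2` prints a message on stderr and returns a non-zero status.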

How to Use

With llama.cpp

./llama-cli -m sarvam-1-v0.5-Q8_0.gguf -p "भारत की राजधानी क्या है?" -n 256
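
To run several prompts in a row, the same command can be wrapped in a loop. This is a sketch under stated assumptions: `run_prompts`, `LLAMA_CLI`, and `MODEL` are illustrative names rather than part of llama.cpp, the prompt file holds one prompt per line, and only the `-m`, `-p`, and `-n` flags from the command above are used.

```shell
#!/bin/sh
# Sketch: run every line of a prompt file through llama-cli, one prompt per call.
# LLAMA_CLI, MODEL and run_prompts are illustrative names, not part of llama.cpp.
LLAMA_CLI="${LLAMA_CLI:-./llama-cli}"
MODEL="${MODEL:-sarvam-1-v0.5-Q8_0.gguf}"

run_prompts() {
    prompt_file=$1
    # Read prompts on descriptor 3 so the child process cannot consume them from stdin.
    while IFS= read -r prompt <&3; do
        [ -n "$prompt" ] || continue    # skip blank lines
        "$LLAMA_CLI" -m "$MODEL" -p "$prompt" -n 256 </dev/null
    done 3<"$prompt_file"
}
```

Calling `run_prompts prompts.txt` (where `prompts.txt` is a hypothetical file name) then invokes the binary once per non-empty line. Each call reloads the model, so this suits a handful of prompts rather than large batches.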

With Ollama

# Create a Modelfile
echo 'FROM ./sarvam-1-v0.5-Q8_0.gguf' > Modelfile
ollama create sarvam -f Modelfile
ollama run sarvam
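
A Modelfile can also carry default generation settings through `PARAMETER` lines. The sketch below writes such a file; the specific values (`temperature 0.7`, `num_predict 256`) are illustrative assumptions, not settings recommended by the model authors.

```shell
#!/bin/sh
# Sketch: write a Modelfile that also sets default generation parameters.
# The values are illustrative assumptions, not settings recommended by the model authors.
cat > Modelfile <<'EOF'
FROM ./sarvam-1-v0.5-Q8_0.gguf
PARAMETER temperature 0.7
PARAMETER num_predict 256
EOF
```

After writing the file, the same `ollama create sarvam -f Modelfile` and `ollama run sarvam` steps apply.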

With LM Studio

1. Download the Q8_0 file
2. Open LM Studio → Load Model → Select the file
3. Start chatting in English or any supported Indian language

Model Details

Architecture: Llama
Parameters: 3.1B
Hidden Size: 2048
Layers: 28
Attention Heads: 16
Context Length: Check original model card
Languages: English + 22 Indian languages (Hindi, Bengali, Tamil, Telugu, Kannada, Malayalam, Marathi, Punjabi, Gujarati, Odia, and more)
License: Apache 2.0

Original Model

Built by Sarvam AI, an Indian AI research company. See the original model at sarvamai/sarvam-1-v0.5.

Quantized by

Shaswata Tripathy | GitHub | Medium | LinkedIn | Hugging Face
