Sarvam-1-v0.5 GGUF

GGUF quantized versions of sarvamai/sarvam-1-v0.5 for local inference with llama.cpp, Ollama, LM Studio, and GPT4All.

Sarvam-1 is an Indian multilingual LLM built by Sarvam AI. It supports 22 Indian languages, including Hindi, Bengali, Tamil, Telugu, Kannada, Malayalam, Marathi, Punjabi, Gujarati, and Odia. It is based on the Llama architecture and has 3.1B parameters.

Available Quantizations

File                     | Quant | Size   | RAM Needed | Use Case
sarvam-1-v0.5-Q8_0.gguf  | Q8_0  | 2.5 GB | ~4 GB      | Best quality, near-lossless
sarvam-1-v0.5-f16.gguf   | F16   | 4.7 GB | ~6 GB      | Full precision, maximum quality
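
To fetch one of the files listed above from the command line, something like the following sketch may help. The repository id in REPO is an assumption (substitute the repository you are actually downloading from), and the helper names gguf_url and is_gguf are made up for this example. The URL follows the usual Hugging Face "resolve/main" download pattern, and the check relies on GGUF files starting with the four ASCII bytes "GGUF".

```shell
# Assumed repository id; replace with the repository you download from.
REPO="tripathyShaswata/sarvam-1-v0.5-GGUF"

# Hypothetical helper: print the direct download URL for a file in REPO.
gguf_url() {
  printf 'https://huggingface.co/%s/resolve/main/%s\n' "$REPO" "$1"
}

# Hypothetical helper: succeed if the file starts with the GGUF magic bytes.
is_gguf() {
  [ "$(head -c 4 "$1" 2>/dev/null)" = "GGUF" ]
}

# Example (needs network access and curl):
#   curl -L -o sarvam-1-v0.5-Q8_0.gguf "$(gguf_url sarvam-1-v0.5-Q8_0.gguf)"
#   is_gguf sarvam-1-v0.5-Q8_0.gguf && echo "looks like a GGUF file"
```

The magic-byte check only catches obviously wrong downloads, such as an HTML error page saved under a .gguf name. It does not verify that the file is complete.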

How to Use

With llama.cpp

./llama-cli -m sarvam-1-v0.5-Q8_0.gguf -p "भारत की राजधानी क्या है?" -n 256
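
A slightly more defensive variant of the command above is sketched here. It assumes llama-cli sits in the current directory next to the model file. The temperature and thread count are illustrative values, not recommendations from the model authors.

```shell
MODEL="sarvam-1-v0.5-Q8_0.gguf"
PROMPT="भारत की राजधानी क्या है?"

# Run only when both the binary and the model file are present.
if [ -x ./llama-cli ] && [ -f "$MODEL" ]; then
  # -n caps the number of generated tokens, --temp sets the sampling
  # temperature, -t sets the number of CPU threads.
  # On GPU-enabled builds, -ngl N offloads N layers to the GPU.
  ./llama-cli -m "$MODEL" -p "$PROMPT" -n 128 --temp 0.7 -t 4
else
  echo "llama-cli or $MODEL not found; skipping" >&2
fi
```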

With Ollama

# Create a Modelfile
echo 'FROM ./sarvam-1-v0.5-Q8_0.gguf' > Modelfile
ollama create sarvam -f Modelfile
ollama run sarvam
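
The Modelfile can also carry default sampling settings. The sketch below writes one with a heredoc and then builds the model only if Ollama and the GGUF file are both available. The parameter values are illustrative assumptions, not settings published for this model.

```shell
# Write a Modelfile with explicit defaults (values are illustrative).
cat > Modelfile <<'EOF'
FROM ./sarvam-1-v0.5-Q8_0.gguf
PARAMETER temperature 0.7
PARAMETER num_predict 256
EOF

# Build and query the model only if Ollama and the model file are present.
if command -v ollama >/dev/null 2>&1 && [ -f sarvam-1-v0.5-Q8_0.gguf ]; then
  ollama create sarvam -f Modelfile
  ollama run sarvam "भारत की राजधानी क्या है?"
fi
```

Passing the prompt as an argument to ollama run returns a single response instead of opening an interactive session.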

With LM Studio

  1. Download the Q8_0 file
  2. Open LM Studio → Load Model → Select the file
  3. Start chatting in English or any supported Indian language

Model Details

  • Architecture: Llama
  • Parameters: 3.1B
  • Hidden Size: 2048
  • Layers: 28
  • Attention Heads: 16
  • Context Length: Check original model card
  • Languages: English + 22 Indian languages (Hindi, Bengali, Tamil, Telugu, Kannada, Malayalam, Marathi, Punjabi, Gujarati, Odia, and more)
  • License: Apache 2.0
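
As a quick sanity check on the numbers above, the per-head dimension can be derived from the hidden size and the head count. This assumes the common Llama convention that the hidden size is split evenly across attention heads. The optional second step assumes the gguf-dump tool from the gguf Python package is installed and that the context length appears under a key containing "context_length". Neither assumption is confirmed by this model card.

```shell
MODEL="sarvam-1-v0.5-Q8_0.gguf"
HIDDEN_SIZE=2048
ATTENTION_HEADS=16

# Per-head dimension under the even-split assumption: 2048 / 16 = 128.
HEAD_DIM=$((HIDDEN_SIZE / ATTENTION_HEADS))
echo "per-head dimension: $HEAD_DIM"

# Optional: look up the context length stored in the GGUF metadata.
if command -v gguf-dump >/dev/null 2>&1 && [ -f "$MODEL" ]; then
  gguf-dump "$MODEL" | grep -i context_length || true
fi
```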

Original Model

Built by Sarvam AI, an Indian AI research company. See the original model at sarvamai/sarvam-1-v0.5.

Quantized by

Shaswata Tripathy | GitHub | Medium | LinkedIn | Hugging Face
