Sarvam-30B GGUF (Quantized)

This repository provides the Q4_K_M quantization of sarvamai/sarvam-30b, a Mixture-of-Experts (MoE) language model.

The GGUF format with 4-bit quantization cuts the memory footprint from roughly 120 GB to about 19 GB, making the model runnable on consumer GPUs such as the RTX 3090 or RTX 4090.
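
A minimal loading sketch using llama-cpp-python (the GGUF filename and context size below are assumptions; substitute the actual file shipped in this repo). Any llama.cpp-based runtime, such as Ollama or LM Studio, can load the same file:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="sarvam-30b-Q4_K_M.gguf",  # assumed filename; use the file from this repo
    n_gpu_layers=-1,  # offload all layers to the GPU; lower this if VRAM runs out
    n_ctx=4096,       # context window; raising it increases KV-cache memory use
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Introduce yourself in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```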

Quantization Details

  • Method: Q4_K_M (Medium, K-Quants)
  • Parameters: ~32B
  • Original Size: ~120 GB
  • Quantized Size: ~19 GB
  • Architecture: Sarvam MoE (sarvam_moe)
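
As a sanity check on the quoted sizes, a back-of-the-envelope estimate (the ~4.85 bits-per-weight average for Q4_K_M and the FP32 baseline are assumptions based on typical llama.cpp behavior, not measurements of this checkpoint):

```python
# Rough size arithmetic for a ~32B-parameter model.
params = 32e9

fp32_bytes = params * 4           # original weights, assuming FP32 storage
q4_k_m_bytes = params * 4.85 / 8  # Q4_K_M averages roughly 4.85 bits per weight

print(f"original : {fp32_bytes / 1e9:.0f} GB")    # ~128 GB, near the quoted ~120 GB
print(f"Q4_K_M   : {q4_k_m_bytes / 1e9:.0f} GB")  # ~19 GB, matching the quoted size
```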

License

This model is a quantized version of sarvamai/sarvam-30b. Both the original model and these weights are released under the Apache License 2.0.
