Sarvam-30B GGUF (Quantized)

This repository provides the Q4_K_M quantization of sarvamai/sarvam-30b, a Mixture-of-Experts (MoE) language model.

The GGUF format with 4-bit quantization cuts the memory footprint from roughly 120 GB to about 19 GB, making the model runnable on consumer GPUs such as the RTX 3090 or RTX 4090.
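
A minimal loading sketch using llama-cpp-python (the GGUF filename and context size below are assumptions; substitute the actual file shipped in this repo). Any llama.cpp-based runtime, such as Ollama or LM Studio, can load the same file:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="sarvam-30b-Q4_K_M.gguf",  # assumed filename; use the file from this repo
    n_gpu_layers=-1,  # offload all layers to the GPU; lower this if VRAM runs out
    n_ctx=4096,       # context window; raising it increases KV-cache memory use
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Introduce yourself in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```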

Quantization Details

  • Method: Q4_K_M (Medium, K-Quants)
  • Parameters: ~32B
  • Original Size: ~120 GB
  • Quantized Size: ~19 GB
  • Architecture: Sarvam MoE (sarvam_moe)
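
As a sanity check on the quoted sizes, a back-of-the-envelope estimate (the ~4.85 bits-per-weight average for Q4_K_M and the FP32 baseline are assumptions based on typical llama.cpp behavior, not measurements of this checkpoint):

```python
# Rough size arithmetic for a ~32B-parameter model.
params = 32e9

fp32_bytes = params * 4           # original weights, assuming FP32 storage
q4_k_m_bytes = params * 4.85 / 8  # Q4_K_M averages roughly 4.85 bits per weight

print(f"original : {fp32_bytes / 1e9:.0f} GB")    # ~128 GB, near the quoted ~120 GB
print(f"Q4_K_M   : {q4_k_m_bytes / 1e9:.0f} GB")  # ~19 GB, matching the quoted size
```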

License

This model is a quantized version of sarvamai/sarvam-30b. Both the original model and these weights are released under the Apache License 2.0.
