Sarvam-30B GGUF (Quantized)
This repository provides the Q4_K_M quantization of sarvamai/sarvam-30b, an advanced Mixture-of-Experts (MoE) model.
Using the GGUF format with 4-bit quantization reduces the memory footprint from roughly 120 GB to about 19 GB, making the model runnable on consumer GPUs with 24 GB of VRAM, such as an RTX 3090 or RTX 4090.
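As a minimal sketch of local inference, the quantized weights can be loaded with llama-cpp-python (Python bindings for llama.cpp, which reads GGUF files). The GGUF filename below is an assumption; check the repository's file list for the exact name.

```python
# Minimal inference sketch with llama-cpp-python (pip install llama-cpp-python).
# The GGUF filename below is an assumption; verify it in the repo's file list.
from llama_cpp import Llama

llm = Llama(
    model_path="sarvam-30b-Q4_K_M.gguf",  # hypothetical local filename
    n_gpu_layers=-1,  # offload all layers to the GPU; ~19 GB fits in 24 GB VRAM
    n_ctx=4096,       # context window; raise it if you have spare VRAM
)

output = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain Mixture-of-Experts in one paragraph."}],
    max_tokens=256,
)
print(output["choices"][0]["message"]["content"])
```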
Quantization Details
- Method: Q4_K_M (medium k-quant)
- Original Size: ~120 GB
- Quantized Size: ~19 GB
- Architecture: Sarvam MoE
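These figures line up with a rough back-of-the-envelope check (a sketch only: the exact bits-per-weight of Q4_K_M varies by tensor, and the FP32 assumption for the original size is inferred from the ~120 GB figure):

```python
# Rough size check for the numbers above (decimal GB).
params = 30e9                        # ~30B parameters
fp32_gb = params * 4 / 1e9           # 4 bytes/param -> ~120 GB original size
q4_k_m_gb = params * 4.85 / 8 / 1e9  # Q4_K_M averages ~4.85 bits/param -> ~18 GB
print(f"FP32: ~{fp32_gb:.0f} GB, Q4_K_M: ~{q4_k_m_gb:.0f} GB")
```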
License
This model is a quantized version of sarvamai/sarvam-30b. Both the original model and these weights are released under the Apache License 2.0.
Model Tree
- Base model: sarvamai/sarvam-30b
- This quantization: ThatCultivator/sarvam-30b-gguf
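To fetch the quantized file directly, here is a minimal download sketch using huggingface_hub; the GGUF filename is an assumption, so verify it against the repository's file list.

```python
# Download sketch using huggingface_hub (pip install huggingface-hub).
from huggingface_hub import hf_hub_download

local_path = hf_hub_download(
    repo_id="ThatCultivator/sarvam-30b-gguf",
    filename="sarvam-30b-Q4_K_M.gguf",  # hypothetical filename
)
print(f"Saved to: {local_path}")
```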