Sarvam-1-VL-4B-Instruct - GGUF (Quantized)

Model Description

GGUF quantized version of Sarvam-1-VL-4B-Instruct for CPU and edge deployment with llama.cpp. The Q4_K_M quantization offers a good balance between file size and output quality.

Files

  • qwen3-vl-4b-instruct.Q4_K_M.gguf - Quantized model (4-bit)
  • qwen3-vl-4b-instruct.BF16-mmproj.gguf - Multimodal projector (BF16, required for image input)
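
If the files are not already on disk, they can be pulled from the Hub with huggingface-cli (a minimal sketch; the repo id below is taken from this model card, adjust the target directory to your setup):

# Download the quantized weights and the vision projector
huggingface-cli download mashriram/Sarvam-1-VL-4B-Instruct-GGUF \
  qwen3-vl-4b-instruct.Q4_K_M.gguf \
  qwen3-vl-4b-instruct.BF16-mmproj.gguf \
  --local-dir .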

Training Details

  • Base Model: Qwen/Qwen3-VL-4B-Instruct
  • Quantization: Q4_K_M
  • Original Training: 2,000 steps, loss 6.25

Datasets

Trained on 4 datasets covering:

  • Translation (40%): BPCC - 22 Indic languages ↔ English
  • Instruction Following (20%): Pralekha - 11 language pairs
  • Document Layout (30%): IndicDLP - Document understanding
  • Visual QA (10%): DocVQA - Question answering

Supported Languages

Assamese, Bengali, Bodo, Dogri, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Maithili, Malayalam, Marathi, Manipuri, Nepali, Odia, Punjabi, Sanskrit, Santali, Sindhi, Tamil, Telugu, Urdu, English

Usage with llama.cpp

# Run inference
llama-mtmd-cli \
  -m qwen3-vl-4b-instruct.Q4_K_M.gguf \
  --mmproj qwen3-vl-4b-instruct.BF16-mmproj.gguf \
  -p "Translate this to Hindi:" \
  --image document.jpg
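
For programmatic access, the same two files can be served with llama-server, which exposes an OpenAI-compatible HTTP API. This is a sketch assuming a recent llama.cpp build with multimodal server support; the port is arbitrary:

# Serve the model with the vision projector loaded
llama-server \
  -m qwen3-vl-4b-instruct.Q4_K_M.gguf \
  --mmproj qwen3-vl-4b-instruct.BF16-mmproj.gguf \
  --port 8080

Clients can then send requests to http://localhost:8080/v1/chat/completions.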

Memory Requirements

  • Q4_K_M: ~2.5GB RAM
  • With mmproj: ~3GB RAM total
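
These figures cover the weights and projector only; the KV cache grows with context length, so on tight-memory devices the context window can be capped (a sketch using the standard llama.cpp -c flag with a hypothetical 2048-token limit):

# Limit the context window to keep the KV cache small
llama-mtmd-cli \
  -m qwen3-vl-4b-instruct.Q4_K_M.gguf \
  --mmproj qwen3-vl-4b-instruct.BF16-mmproj.gguf \
  -c 2048 \
  -p "Translate this to Hindi:" \
  --image document.jpg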

Performance

  • Speed: Fast CPU inference
  • Quality: Minimal degradation vs fp16
  • Deployment: Ideal for edge devices

License

Apache 2.0
