Sarvam-1-VL-4B-Instruct - GGUF (Quantized)
Model Description
GGUF quantized version for CPU/edge deployment with llama.cpp. Uses Q4_K_M quantization, which offers a good size/quality trade-off.
Files
- qwen3-vl-4b-instruct.Q4_K_M.gguf - Quantized model weights (4-bit, Q4_K_M)
- qwen3-vl-4b-instruct.BF16-mmproj.gguf - Multimodal projector (BF16)
Training Details
- Base Model: Qwen/Qwen3-VL-4B-Instruct
- Quantization: Q4_K_M
- Original Training: 2,000 steps, loss 6.25
Datasets
Trained on 4 datasets covering:
- Translation (40%): BPCC - 22 Indic languages ↔ English
- Instruction Following (20%): Pralekha - 11 language pairs
- Document Layout (30%): IndicDLP - Document understanding
- Visual QA (10%): DocVQA - Question answering
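The four-way mix above can be sketched as a weighted sampler; the dataset names and percentages come from this card, while the sampler itself is purely illustrative.

```python
import random

# Training-mix weights as stated in the card (fractions of the data).
MIX = {
    "BPCC": 0.40,      # translation
    "Pralekha": 0.20,  # instruction following
    "IndicDLP": 0.30,  # document layout
    "DocVQA": 0.10,    # visual QA
}

def sample_dataset(rng=random):
    """Pick a source dataset for the next training example, proportional to MIX."""
    names = list(MIX)
    return rng.choices(names, weights=[MIX[n] for n in names], k=1)[0]
```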
Supported Languages
Assamese, Bengali, Bodo, Dogri, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Maithili, Malayalam, Marathi, Manipuri, Nepali, Odia, Punjabi, Sanskrit, Santali, Sindhi, Tamil, Telugu, Urdu, English
Usage with llama.cpp
# Run inference
llama-mtmd-cli \
-m qwen3-vl-4b-instruct.Q4_K_M.gguf \
--mmproj qwen3-vl-4b-instruct.BF16-mmproj.gguf \
-p "Translate this to Hindi:" \
--image document.jpg
Memory Requirements
- Q4_K_M: ~2.5GB RAM
- With mmproj: ~3GB RAM total
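The figures above can be sanity-checked with a back-of-the-envelope estimate: weight bytes (parameter count times average bits per weight) plus a fixed allowance for the KV cache and runtime buffers. The ~4.8 bits/weight average for Q4_K_M and the 0.5 GB overhead are assumptions, not measured values.

```python
def estimate_gguf_ram_gb(n_params_billions, bits_per_weight, overhead_gb=0.5):
    """Rough RAM estimate in GB: quantized weights plus a fixed overhead
    for the KV cache and inference buffers (overhead is an assumption)."""
    weights_gb = n_params_billions * bits_per_weight / 8
    return weights_gb + overhead_gb

# 4B parameters at ~4.8 bits/weight (assumed Q4_K_M average) -> roughly 2.9 GB
print(round(estimate_gguf_ram_gb(4.0, 4.8), 1))
```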
Performance
- Speed: Fast CPU inference
- Quality: Minimal degradation vs fp16
- Deployment: Ideal for edge devices
License
Apache 2.0