q8 to q4

#1
by adnanPBI - opened

Can you please provide a guide to quantize deepseek-ocr-q8_0.gguf down to deepseek-ocr-q4_0.gguf? I want to run it on my GTX 1050 GPU.

Quantize from the f16 model rather than re-quantizing the q8_0 file (re-quantizing an already-quantized model loses additional quality):

build/bin/llama-quantize gguf_models/deepseek-ai/deepseek-ocr-f16.gguf gguf_models/deepseek-ai/deepseek-ocr-Q4_K_M.gguf Q4_K_M

build/bin/llama-mtmd-cli \
  -m gguf_models/deepseek-ai/deepseek-ocr-Q4_K_M.gguf \
  --mmproj gguf_models/deepseek-ai/mmproj-deepseek-ocr-f16.gguf \
  --image tmp/mtmd_test_data/Deepseek-OCR-2510.18234v1_page1.png -p "Free OCR." \
  --chat-template deepseek-ocr --temp 0 -n 8192

(Note: the trailing whitespace after the backslashes in the original commands would break the line continuation, and the model filename must match the case produced by llama-quantize.)

Thanks for your reply.
Can you also provide an IQ2 (importance-matrix) quantization of DeepSeek-OCR?
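If you want to produce an IQ2 quant yourself, the low-bit IQ types in llama.cpp are normally built with an importance matrix computed from calibration text. A rough sketch, assuming the same llama.cpp build and model paths as above; calibration.txt is a placeholder for whatever calibration corpus you use:

```shell
# Sketch: compute an importance matrix from the f16 model over calibration text,
# then use it to produce an IQ2_M quant. Paths and calibration.txt are assumptions.
build/bin/llama-imatrix \
  -m gguf_models/deepseek-ai/deepseek-ocr-f16.gguf \
  -f calibration.txt \
  -o gguf_models/deepseek-ai/deepseek-ocr-imatrix.dat

build/bin/llama-quantize \
  --imatrix gguf_models/deepseek-ai/deepseek-ocr-imatrix.dat \
  gguf_models/deepseek-ai/deepseek-ocr-f16.gguf \
  gguf_models/deepseek-ai/deepseek-ocr-IQ2_M.gguf IQ2_M
```

Only the language-model GGUF is quantized this way; the mmproj file is passed unchanged to llama-mtmd-cli, as in the command above.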
