Where is the mmproj? Or does this one not support vision?
As the title says, the repo is missing the mmproj files, so vision can't be enabled. Does 3.6 not support vision like its predecessor did?
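For context, when a repo does ship an mmproj file, vision is typically enabled in llama.cpp by passing it alongside the model weights. A minimal sketch (both filenames are placeholders, not actual files from this repo):

```shell
# Hypothetical filenames; the mmproj GGUF is a separate download
# from the main model weights, which is what's missing here.
llama-server \
  -m model-Q4_K_M.gguf \
  --mmproj mmproj-model-f16.gguf
```

Without the `--mmproj` projector file, the server runs text-only.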
Hello guys, I'm using llama.cpp to serve this model, but I get a segmentation fault when trying to process an image. What should I do? Thank you!
```
srv load: - looking for better prompt, base f_keep = -1.000, sim = 0.000
srv update: - cache state: 0 prompts, 0.000 MiB (limits: 8192.000 MiB, 131072 tokens, 8589934592 est)
srv get_availabl: prompt cache update took 0.01 ms
slot launch_slot_: id 3 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id 3 | task 0 | processing task, is_child = 0
slot update_slots: id 3 | task 0 | new prompt, n_ctx_slot = 131072, n_keep = 0, task.n_tokens = 574
slot update_slots: id 3 | task 0 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id 3 | task 0 | prompt processing progress, n_tokens = 6, batch.n_tokens = 6, progress = 0.010453
srv log_server_r: done request: POST /v1/chat/completions 200
slot update_slots: id 3 | task 0 | n_tokens = 6, memory_seq_rm [6, end)
srv process_chun: processing image...
encoding image slice...
Segmentation fault (core dumped)
```
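One way to narrow this down is to take the server out of the loop and run the same image through llama.cpp's CLI multimodal tool (all paths below are placeholders for your own files). If this also crashes, the problem lies with the model/mmproj pair or the build itself rather than the server:

```shell
# Placeholders: substitute your actual model, mmproj, and image paths.
# A crash here too points at the model/projector files or the build.
llama-mtmd-cli \
  -m model-Q4_K_M.gguf \
  --mmproj mmproj-model-f16.gguf \
  --image test.png \
  -p "Describe this image."
```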
Kindly asking: how much VRAM is needed to run the Q4 quant as a multimodal model? Is a 5060 Ti with 16 GB of VRAM enough?
For full inference (GPU only): no. For partial inference (GPU/CPU split): yes.
What will happen: it will split the model between GPU and CPU, but if your VRAM is otherwise free you should get more than 50% of the layers onto the GPU. I have the same card and was testing this model just now... Hope this helps!
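A partial-offload launch might look like the following sketch. The filename and the layer count are assumptions, not values from this thread; the idea is to lower `-ngl` until the model fits within 16 GB:

```shell
# -ngl (--n-gpu-layers) controls how many layers are offloaded
# to the GPU; the rest run on the CPU. Reduce it if you hit
# out-of-memory errors, raise it if VRAM headroom remains.
llama-server \
  -m model-Q4_K_M.gguf \
  -ngl 20 \
  -c 8192
```

Watch the server's startup log: it reports how many layers were offloaded and the resulting VRAM usage, which makes tuning `-ngl` straightforward.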
