Where is the mmproj? Or does this one not support vision?
As the title says, the repo is missing the mmproj files, so vision can't be enabled. Does 3.6 not support vision like its predecessor did?
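For context, when a repo does ship an mmproj file, vision is typically enabled in llama.cpp by passing it alongside the model weights. A minimal sketch (both filenames are placeholders, not actual files from this repo):

```shell
# Hypothetical filenames; the mmproj GGUF is a separate download
# from the main model weights, which is what's missing here.
llama-server \
  -m model-Q4_K_M.gguf \
  --mmproj mmproj-model-f16.gguf
```

Without the `--mmproj` projector file, the server runs text-only.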
Hello guys, I'm using llama.cpp to serve this model, but I get a segmentation fault when trying to process an image. What should I do? Thank you!
```
srv load: - looking for better prompt, base f_keep = -1.000, sim = 0.000
srv update: - cache state: 0 prompts, 0.000 MiB (limits: 8192.000 MiB, 131072 tokens, 8589934592 est)
srv get_availabl: prompt cache update took 0.01 ms
slot launch_slot_: id 3 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id 3 | task 0 | processing task, is_child = 0
slot update_slots: id 3 | task 0 | new prompt, n_ctx_slot = 131072, n_keep = 0, task.n_tokens = 574
slot update_slots: id 3 | task 0 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id 3 | task 0 | prompt processing progress, n_tokens = 6, batch.n_tokens = 6, progress = 0.010453
srv log_server_r: done request: POST /v1/chat/completions 200
slot update_slots: id 3 | task 0 | n_tokens = 6, memory_seq_rm [6, end)
srv process_chun: processing image...
encoding image slice...
Segmentation fault (core dumped)
```
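One way to narrow this down is to take the server out of the loop and run the same image through llama.cpp's CLI multimodal tool (all paths below are placeholders for your own files). If this also crashes, the problem lies with the model/mmproj pair or the build itself rather than the server:

```shell
# Placeholders: substitute your actual model, mmproj, and image paths.
# A crash here too points at the model/projector files or the build.
llama-mtmd-cli \
  -m model-Q4_K_M.gguf \
  --mmproj mmproj-model-f16.gguf \
  --image test.png \
  -p "Describe this image."
```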
Kindly asking: how much VRAM is needed to run the Q4 quant as a multimodal model? Is a 5060 Ti with 16 GB of VRAM enough?
For full inference (GPU only): no. For partial inference (GPU/CPU split): yes.
What will happen: it will split the model between GPU and CPU, but if your VRAM is otherwise free you should get more than 50% of the layers onto the GPU. I have the same card and was testing this model just now... Hope this helps!
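A partial-offload launch might look like the following sketch. The filename and the layer count are assumptions, not values from this thread; the idea is to lower `-ngl` until the model fits within 16 GB:

```shell
# -ngl (--n-gpu-layers) controls how many layers are offloaded
# to the GPU; the rest run on the CPU. Reduce it if you hit
# out-of-memory errors, raise it if VRAM headroom remains.
llama-server \
  -m model-Q4_K_M.gguf \
  -ngl 20 \
  -c 8192
```

Watch the server's startup log: it reports how many layers were offloaded and the resulting VRAM usage, which makes tuning `-ngl` straightforward.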
