Error when trying to run the demo:
$ ./llm_demo /media/_models/mnn/Qwen3-VL-4B-Instruct-MNN/config.json prompt.txt
Can't open file:/sys/devices/system/cpu/cpufreq/ondemand/affected_cpus
CPU Group: [ 9 13 1 15 3 17 5 19 7 10 11 12 0 14 2 16 4 18 6 8 ], 1200000 - 2100000
The device supports: i8sdot:0, fp16:0, i8mm: 0, sve2: 0, sme2: 0
config path is /media/_models/mnn/Qwen3-VL-4B-Instruct-MNN/config.json
170 tensor [ deepstack_embeds ] is input but not found
Create module error
[Error]: Load module failed, please check model.
LLM init error
main, 266, cost time: 461.384003 ms
Please wait for the MNN code to be updated, then recompile and test.
OK, waiting...
https://www.modelscope.cn/models/taobao-mnn/Qwen3-VL-4B-Instruct-MNN returns a 404: "Sorry, the page you visited does not exist."
$ git clone https://www.modelscope.cn/taobao-mnn/Qwen3-VL-4B-Instruct-MNN
Cloning into 'Qwen3-VL-4B-Instruct-MNN'...
remote: The project you were looking for could not be found or you don't have permission to view it.
fatal: repository 'https://www.modelscope.cn/taobao-mnn/Qwen3-VL-4B-Instruct-MNN/' not found
The ModelScope organization name is MNN, not taobao-mnn.
Tried another model:
https://huggingface.co/taobao-mnn/FastVLM-1.5B-Stage3-MNN
Can't open file:/sys/devices/system/cpu/cpufreq/ondemand/affected_cpus
CPU Group: [ 9 13 1 15 3 17 5 19 7 10 11 12 0 14 2 16 4 18 6 8 ], 1200000 - 2100000
The device supports: i8sdot:0, fp16:0, i8mm: 0, sve2: 0, sme2: 0
config path is /media/_models/mnn/FastVLM-1.5B-Stage3-MNN/
main, 266, cost time: 11399.670898 ms
Prepare for tuning opt Begin
Prepare for tuning opt End
main, 274, cost time: 1849.022949 ms
prompt file is prompt.txt
Hello! How can I help you today?
#################################
prompt tokens num = 9
decode tokens num = 10
vision time = 0.00 s
pixels_mp = 0.00 MP
audio process time = 0.00 s
audio input time = 0.00 s
prefill time = 0.78 s
decode time = 2.22 s
sample time = 0.00 s
prefill speed = 11.51 tok/s
decode speed = 4.50 tok/s
vision speed = 0.000 MP/s
audio RTF = -nan
##################################
prompt.txt works for a text prompt, but how do I proceed with an image file?
@zhaode, can we have converter Colab notebooks, please? I want pipelines to convert Qwen3 and other models to MNN versions.
The MNN code has been released: https://github.com/alibaba/MNN/commit/9a0546b3b990ce12e9712d4b5e7add6cf73678b0
../build/llm_demo /media/_models/mnn/Qwen3-VL-4B-Instruct-MNN/ prompt.txt
Can't open file:/sys/devices/system/cpu/cpufreq/ondemand/affected_cpus
CPU Group: [ 9 13 1 15 3 17 5 19 7 10 11 12 0 14 2 16 4 18 6 8 ], 1200000 - 2100000
The device supports: i8sdot:0, fp16:0, i8mm: 0, sve2: 0, sme2: 0
config path is /media/_models/mnn/Qwen3-VL-4B-Instruct-MNN/
main, 266, cost time: 39613.335938 ms
Prepare for tuning opt Begin
Prepare for tuning opt End
main, 274, cost time: 5247.416016 ms
prompt file is prompt.txt
Hello! How can I assist you today? 😊
#################################
prompt tokens num = 14
decode tokens num = 12
vision time = 0.00 s
pixels_mp = 0.00 MP
audio process time = 0.00 s
audio input time = 0.00 s
prefill time = 3.12 s
decode time = 8.03 s
sample time = 0.00 s
prefill speed = 4.49 tok/s
decode speed = 1.50 tok/s
vision speed = 0.000 MP/s
audio RTF = -nan
##################################
Now it's OK.
OK! For vision input, you can use a prompt like (the Chinese text asks the model to describe the image):
<img>https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg</img>介绍一下图片里的内容
or
<img>https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg<hw>256, 256</hw></img>介绍一下图片里的内容
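For reference, the vision prompt above can be written into prompt.txt and passed to llm_demo the same way as a text prompt. This is a sketch; the binary and model paths are taken from the logs earlier in this thread:

```shell
# Write a vision prompt into prompt.txt: the <img>...</img> tag marks the
# image source, and the optional <hw>256, 256</hw> tag sets its input size.
cat > prompt.txt <<'EOF'
<img>https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg<hw>256, 256</hw></img>介绍一下图片里的内容
EOF

# Then run the demo as before (paths from the logs above):
# ../build/llm_demo /media/_models/mnn/Qwen3-VL-4B-Instruct-MNN/ prompt.txt
```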
The results are not good compared with llama.cpp running the InternVL3_5-30B-A3B GGUF.
Does it work with local file paths?
Here is what it gives; I'm not sure whether it accepted the image at all:
#################################
prompt tokens num = 17
decode tokens num = 506
vision time = 0.00 s
pixels_mp = 0.00 MP
audio process time = 0.00 s
audio input time = 0.00 s
prefill time = 5.92 s
decode time = 358.19 s
sample time = 0.17 s
prefill speed = 2.87 tok/s
decode speed = 1.41 tok/s
vision speed = 0.000 MP/s
audio RTF = -nan
##################################
With the web example image, it prints "File has been downloaded successfully."
#################################
prompt tokens num = 20
decode tokens num = 153
vision time = 0.00 s
pixels_mp = 0.00 MP
audio process time = 0.00 s
audio input time = 0.00 s
prefill time = 6.11 s
decode time = 105.82 s
sample time = 0.05 s
prefill speed = 3.27 tok/s
decode speed = 1.45 tok/s
vision speed = 0.000 MP/s
audio RTF = -nan
##################################
cmake .. -DMNN_LOW_MEMORY=true -DMNN_CPU_WEIGHT_DEQUANT_GEMM=true -DMNN_BUILD_LLM=true -DMNN_SUPPORT_TRANSFORMER_FUSE=true -DLLM_SUPPORT_VISION=true -DMNN_BUILD_OPENCV=true -DMNN_IMGCODECS=true
-- WIN_USE_ASM:
-- x86_64: Open SSE
-- MNN_AVX512:OFF
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- MNN_IMGCODECS: true
-- Configuring done (1.2s)
-- Generating done (0.5s)
CMake Warning:
Manually-specified variables were not used by the project:
LLM_SUPPORT_VISION
Use the new macro -DMNN_BUILD_LLM_OMNI=ON in place of -DLLM_SUPPORT_VISION=ON.
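Applied to the earlier invocation, the configure step would presumably become the following (a sketch with all other flags kept unchanged; only the macro swap is confirmed above):

```shell
cmake .. -DMNN_LOW_MEMORY=true -DMNN_CPU_WEIGHT_DEQUANT_GEMM=true \
         -DMNN_BUILD_LLM=true -DMNN_SUPPORT_TRANSFORMER_FUSE=true \
         -DMNN_BUILD_LLM_OMNI=ON -DMNN_BUILD_OPENCV=true -DMNN_IMGCODECS=true
```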
A local path works fine, e.g. <img>demo.jpeg<hw>280, 280</hw></img>介绍一下这张图片 (the Chinese text asks the model to describe the image).
The results are still not good compared with llama.cpp running the InternVL3_5-30B-A3B GGUF.
What's wrong? Maybe the default quant bit-width is too low?
Currently the model's vision encoder uses 4-bit quantization.
After compiling this way:
cmake .. -DMNN_LOW_MEMORY=true -DMNN_CPU_WEIGHT_DEQUANT_GEMM=true -DMNN_BUILD_LLM=true -DMNN_SUPPORT_TRANSFORMER_FUSE=true -DLLM_SUPPORT_VISION=true -DMNN_BUILD_OPENCV=true -DMNN_IMGCODECS=true
it now seems to work with the web example:
#################################
prompt tokens num = 191
decode tokens num = 161
vision time = 12.73 s
pixels_mp = 0.17 MP
audio process time = 0.00 s
audio input time = 0.00 s
prefill time = 44.24 s
decode time = 107.00 s
sample time = 0.05 s
prefill speed = 4.32 tok/s
decode speed = 1.50 tok/s
vision speed = 0.014 MP/s
audio RTF = -nan
##################################
Great, thank you for your feedback! If you encounter any accuracy issues, please feel free to report them here, and we can adjust the default quantization configuration.
My local example is also working now, thanks. Good results.
The initial version had a simpler sampler_type configuration. I've now updated the sampler_type in config.json per the official suggestions; you can refer to it for your own changes.
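For anyone looking for where this lives, the sampler section of config.json looks roughly like the fragment below. This is a sketch: only `sampler_type` is confirmed in this thread, and the other keys and all values are assumptions that should be checked against the updated config.json shipped with the model.

```json
{
  "sampler_type": "penalty",
  "temperature": 0.7,
  "topK": 20,
  "topP": 0.8
}
```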