Error when trying to run the demo:
$ ./llm_demo /media/_models/mnn/Qwen3-VL-4B-Instruct-MNN/config.json prompt.txt
Can't open file:/sys/devices/system/cpu/cpufreq/ondemand/affected_cpus
CPU Group: [ 9 13 1 15 3 17 5 19 7 10 11 12 0 14 2 16 4 18 6 8 ], 1200000 - 2100000
The device supports: i8sdot:0, fp16:0, i8mm: 0, sve2: 0, sme2: 0
config path is /media/_models/mnn/Qwen3-VL-4B-Instruct-MNN/config.json
170 tensor [ deepstack_embeds ] is input but not found
Create module error
[Error]: Load module failed, please check model.
LLM init error
main, 266, cost time: 461.384003 ms
Please wait for the MNN code to be updated, then recompile and test.
OK, waiting...
https://www.modelscope.cn/models/taobao-mnn/Qwen3-VL-4B-Instruct-MNN returns a 404: "Sorry, the page you visited does not exist."
$ git clone https://www.modelscope.cn/taobao-mnn/Qwen3-VL-4B-Instruct-MNN
Cloning into 'Qwen3-VL-4B-Instruct-MNN'...
remote: The project you were looking for could not be found or you don't have permission to view it.
fatal: repository 'https://www.modelscope.cn/taobao-mnn/Qwen3-VL-4B-Instruct-MNN/' not found
The ModelScope organization name is MNN, not taobao-mnn.
Tried another model:
https://huggingface.co/taobao-mnn/FastVLM-1.5B-Stage3-MNN
Can't open file:/sys/devices/system/cpu/cpufreq/ondemand/affected_cpus
CPU Group: [ 9 13 1 15 3 17 5 19 7 10 11 12 0 14 2 16 4 18 6 8 ], 1200000 - 2100000
The device supports: i8sdot:0, fp16:0, i8mm: 0, sve2: 0, sme2: 0
config path is /media/_models/mnn/FastVLM-1.5B-Stage3-MNN/
main, 266, cost time: 11399.670898 ms
Prepare for tuning opt Begin
Prepare for tuning opt End
main, 274, cost time: 1849.022949 ms
prompt file is prompt.txt
Hello! How can I help you today?
#################################
prompt tokens num = 9
decode tokens num = 10
vision time = 0.00 s
pixels_mp = 0.00 MP
audio process time = 0.00 s
audio input time = 0.00 s
prefill time = 0.78 s
decode time = 2.22 s
sample time = 0.00 s
prefill speed = 11.51 tok/s
decode speed = 4.50 tok/s
vision speed = 0.000 MP/s
audio RTF = -nan
##################################
prompt.txt works for a text prompt, but how do I proceed with an image file?
@zhaode, can we have converter Colab notebooks, please? I want pipelines to convert Qwen3 and other models to MNN versions.
The MNN code has been released: https://github.com/alibaba/MNN/commit/9a0546b3b990ce12e9712d4b5e7add6cf73678b0
../build/llm_demo /media/_models/mnn/Qwen3-VL-4B-Instruct-MNN/ prompt.txt
Can't open file:/sys/devices/system/cpu/cpufreq/ondemand/affected_cpus
CPU Group: [ 9 13 1 15 3 17 5 19 7 10 11 12 0 14 2 16 4 18 6 8 ], 1200000 - 2100000
The device supports: i8sdot:0, fp16:0, i8mm: 0, sve2: 0, sme2: 0
config path is /media/_models/mnn/Qwen3-VL-4B-Instruct-MNN/
main, 266, cost time: 39613.335938 ms
Prepare for tuning opt Begin
Prepare for tuning opt End
main, 274, cost time: 5247.416016 ms
prompt file is prompt.txt
Hello! How can I assist you today? 😊
#################################
prompt tokens num = 14
decode tokens num = 12
vision time = 0.00 s
pixels_mp = 0.00 MP
audio process time = 0.00 s
audio input time = 0.00 s
prefill time = 3.12 s
decode time = 8.03 s
sample time = 0.00 s
prefill speed = 4.49 tok/s
decode speed = 1.50 tok/s
vision speed = 0.000 MP/s
audio RTF = -nan
##################################
Now it's OK.
OK! For vision input, you can use a prompt like (the Chinese text asks the model to describe the image):
<img>https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg</img>介绍一下图片里的内容
or
<img>https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg<hw>256, 256</hw></img>介绍一下图片里的内容
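For reference, the vision prompt above can be written into prompt.txt and passed to llm_demo the same way as a text prompt. This is a sketch; the binary and model paths are taken from the logs earlier in this thread:

```shell
# Write a vision prompt into prompt.txt: the <img>...</img> tag marks the
# image source, and the optional <hw>256, 256</hw> tag sets its input size.
cat > prompt.txt <<'EOF'
<img>https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg<hw>256, 256</hw></img>介绍一下图片里的内容
EOF

# Then run the demo as before (paths from the logs above):
# ../build/llm_demo /media/_models/mnn/Qwen3-VL-4B-Instruct-MNN/ prompt.txt
```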
The results are not good compared with llama.cpp running the InternVL3_5-30B-A3B GGUF.
Does it work with local file paths?
Here is what it gives; I'm not sure whether it accepted the image at all:
#################################
prompt tokens num = 17
decode tokens num = 506
vision time = 0.00 s
pixels_mp = 0.00 MP
audio process time = 0.00 s
audio input time = 0.00 s
prefill time = 5.92 s
decode time = 358.19 s
sample time = 0.17 s
prefill speed = 2.87 tok/s
decode speed = 1.41 tok/s
vision speed = 0.000 MP/s
audio RTF = -nan
##################################
With the web example image, it prints "File has been downloaded successfully."
#################################
prompt tokens num = 20
decode tokens num = 153
vision time = 0.00 s
pixels_mp = 0.00 MP
audio process time = 0.00 s
audio input time = 0.00 s
prefill time = 6.11 s
decode time = 105.82 s
sample time = 0.05 s
prefill speed = 3.27 tok/s
decode speed = 1.45 tok/s
vision speed = 0.000 MP/s
audio RTF = -nan
##################################
cmake .. -DMNN_LOW_MEMORY=true -DMNN_CPU_WEIGHT_DEQUANT_GEMM=true -DMNN_BUILD_LLM=true -DMNN_SUPPORT_TRANSFORMER_FUSE=true -DLLM_SUPPORT_VISION=true -DMNN_BUILD_OPENCV=true -DMNN_IMGCODECS=true
-- WIN_USE_ASM:
-- x86_64: Open SSE
-- MNN_AVX512:OFF
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- MNN_IMGCODECS: true
-- Configuring done (1.2s)
-- Generating done (0.5s)
CMake Warning:
Manually-specified variables were not used by the project:
LLM_SUPPORT_VISION
Use the new macro -DMNN_BUILD_LLM_OMNI=ON in place of -DLLM_SUPPORT_VISION=ON.
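Applied to the earlier invocation, the configure step would presumably become the following (a sketch with all other flags kept unchanged; only the macro swap is confirmed above):

```shell
cmake .. -DMNN_LOW_MEMORY=true -DMNN_CPU_WEIGHT_DEQUANT_GEMM=true \
         -DMNN_BUILD_LLM=true -DMNN_SUPPORT_TRANSFORMER_FUSE=true \
         -DMNN_BUILD_LLM_OMNI=ON -DMNN_BUILD_OPENCV=true -DMNN_IMGCODECS=true
```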
A local path works fine, e.g. <img>demo.jpeg<hw>280, 280</hw></img>介绍一下这张图片 (the Chinese text asks the model to describe the image).
The results are still not good compared with llama.cpp running the InternVL3_5-30B-A3B GGUF.
What's wrong? Maybe the default quant bit-width is too low?
Currently the model's vision encoder uses 4-bit quantization.
After compiling this way:
cmake .. -DMNN_LOW_MEMORY=true -DMNN_CPU_WEIGHT_DEQUANT_GEMM=true -DMNN_BUILD_LLM=true -DMNN_SUPPORT_TRANSFORMER_FUSE=true -DLLM_SUPPORT_VISION=true -DMNN_BUILD_OPENCV=true -DMNN_IMGCODECS=true
it now seems to work with the web example:
#################################
prompt tokens num = 191
decode tokens num = 161
vision time = 12.73 s
pixels_mp = 0.17 MP
audio process time = 0.00 s
audio input time = 0.00 s
prefill time = 44.24 s
decode time = 107.00 s
sample time = 0.05 s
prefill speed = 4.32 tok/s
decode speed = 1.50 tok/s
vision speed = 0.014 MP/s
audio RTF = -nan
##################################
Great, thank you for your feedback! If you encounter any accuracy issues, please feel free to report them here, and we can adjust the default quantization configuration.
My local example is also working now, thanks. Good results.
The initial version had a simpler sampler_type configuration. I've now updated the sampler_type in config.json per the official suggestions; you can refer to it for your own changes.
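For anyone looking for where this lives, the sampler section of config.json looks roughly like the fragment below. This is a sketch: only `sampler_type` is confirmed in this thread, and the other keys and all values are assumptions that should be checked against the updated config.json shipped with the model.

```json
{
  "sampler_type": "penalty",
  "temperature": 0.7,
  "topK": 20,
  "topP": 0.8
}
```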