error when tried usage code

#1
by 21world - opened

$ ./llm_demo /media/_models/mnn/Qwen3-VL-4B-Instruct-MNN/config.json prompt.txt
Can't open file:/sys/devices/system/cpu/cpufreq/ondemand/affected_cpus
CPU Group: [ 9 13 1 15 3 17 5 19 7 10 11 12 0 14 2 16 4 18 6 8 ], 1200000 - 2100000
The device supports: i8sdot:0, fp16:0, i8mm: 0, sve2: 0, sme2: 0
config path is /media/_models/mnn/Qwen3-VL-4B-Instruct-MNN/config.json
170 tensor [ deepstack_embeds ] is input but not found
Create module error
[Error]: Load module failed, please check model.
LLM init error
main, 266, cost time: 461.384003 ms

21world changed discussion title from error when try usage code to error when tried usage code
taobao-mnn org

Please wait for the MNN code to be updated, then recompile and test.

ok waiting...

https://www.modelscope.cn/models/taobao-mnn/Qwen3-VL-4B-Instruct-MNN

404
Sorry, the page you visited does not exist.

$ git clone https://www.modelscope.cn/taobao-mnn/Qwen3-VL-4B-Instruct-MNN
Cloning into 'Qwen3-VL-4B-Instruct-MNN'...
remote: The project you were looking for could not be found or you don't have permission to view it.
fatal: repository 'https://www.modelscope.cn/taobao-mnn/Qwen3-VL-4B-Instruct-MNN/' not found

taobao-mnn org

The ModelScope organization name is MNN.
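Given that, the failing clone above should presumably use MNN/ in place of taobao-mnn/. A hedged sketch (the exact URL is an assumption built from that reply; verify it on modelscope.cn before relying on it):

```shell
# Assumed clone URL: ModelScope org "MNN" per the reply above, with the same
# model name as the failing attempt. Echoed here rather than run, since the
# URL is unverified; paste the printed command into your terminal to clone.
url="https://www.modelscope.cn/MNN/Qwen3-VL-4B-Instruct-MNN"
echo "git clone $url"
```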

Tried another model:

https://huggingface.co/taobao-mnn/FastVLM-1.5B-Stage3-MNN

Can't open file:/sys/devices/system/cpu/cpufreq/ondemand/affected_cpus
CPU Group: [ 9 13 1 15 3 17 5 19 7 10 11 12 0 14 2 16 4 18 6 8 ], 1200000 - 2100000
The device supports: i8sdot:0, fp16:0, i8mm: 0, sve2: 0, sme2: 0
config path is /media/_models/mnn/FastVLM-1.5B-Stage3-MNN/
main, 266, cost time: 11399.670898 ms
Prepare for tuning opt Begin
Prepare for tuning opt End
main, 274, cost time: 1849.022949 ms
prompt file is prompt.txt
Hello! How can I help you today?

#################################
prompt tokens num = 9
decode tokens num = 10
vision time = 0.00 s
pixels_mp = 0.00 MP
audio process time = 0.00 s
audio input time = 0.00 s
prefill time = 0.78 s
decode time = 2.22 s
sample time = 0.00 s
prefill speed = 11.51 tok/s
decode speed = 4.50 tok/s
vision speed = 0.000 MP/s
audio RTF = -nan
##################################

21world changed discussion status to closed
21world changed discussion status to open

prompt.txt works for a text prompt, but how do I proceed with an image file?

@zhaode can we have converter Colab notebooks, please? I want pipelines to convert Qwen3 and other models to MNN versions.

taobao-mnn org

@zhaode can we have converter Colab notebooks, please? I want pipelines to convert Qwen3 and other models to MNN versions.

The conversion pipeline will be updated along with the MNN code.

taobao-mnn org
zhaode changed discussion status to closed

../build/llm_demo /media/_models/mnn/Qwen3-VL-4B-Instruct-MNN/ prompt.txt
Can't open file:/sys/devices/system/cpu/cpufreq/ondemand/affected_cpus
CPU Group: [ 9 13 1 15 3 17 5 19 7 10 11 12 0 14 2 16 4 18 6 8 ], 1200000 - 2100000
The device supports: i8sdot:0, fp16:0, i8mm: 0, sve2: 0, sme2: 0
config path is /media/_models/mnn/Qwen3-VL-4B-Instruct-MNN/
main, 266, cost time: 39613.335938 ms
Prepare for tuning opt Begin
Prepare for tuning opt End
main, 274, cost time: 5247.416016 ms
prompt file is prompt.txt
Hello! How can I assist you today? 😊

#################################
prompt tokens num = 14
decode tokens num = 12
vision time = 0.00 s
pixels_mp = 0.00 MP
audio process time = 0.00 s
audio input time = 0.00 s
prefill time = 3.12 s
decode time = 8.03 s
sample time = 0.00 s
prefill speed = 4.49 tok/s
decode speed = 1.50 tok/s
vision speed = 0.000 MP/s
audio RTF = -nan
##################################

Now it works.

taobao-mnn org

OK! Vision input can be used like:

<img>https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg</img>介绍一下图片里的内容

or

<img>https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg<hw>256, 256</hw></img>介绍一下图片里的内容
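The Chinese in both examples, 介绍一下图片里的内容, means "describe the content of the image". A minimal sketch of wiring this into the prompt-file workflow used earlier in the thread (the model path is the one from above; adjust to your setup):

```shell
# Write the vision prompt (image tag plus question) into prompt.txt.
# <hw>256, 256</hw> requests a 256x256 input size, per the example above.
printf '%s\n' '<img>https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg<hw>256, 256</hw></img>介绍一下图片里的内容' > prompt.txt
# Then rerun, e.g.:
# ./llm_demo /media/_models/mnn/Qwen3-VL-4B-Instruct-MNN/config.json prompt.txt
```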

Results are not as good as llama.cpp running an InternVL3_5-30B-A3B GGUF.

Does it work with local links?

Here is what it gives; I'm not sure whether it accepts the image at all.


#################################
prompt tokens num = 17
decode tokens num = 506
vision time = 0.00 s
pixels_mp = 0.00 MP
audio process time = 0.00 s
audio input time = 0.00 s
prefill time = 5.92 s
decode time = 358.19 s
sample time = 0.17 s
prefill speed = 2.87 tok/s
decode speed = 1.41 tok/s
vision speed = 0.000 MP/s
audio RTF = -nan
##################################

With the web example it gives "File has been downloaded successfully."

#################################
prompt tokens num = 20
decode tokens num = 153
vision time = 0.00 s
pixels_mp = 0.00 MP
audio process time = 0.00 s
audio input time = 0.00 s
prefill time = 6.11 s
decode time = 105.82 s
sample time = 0.05 s
prefill speed = 3.27 tok/s
decode speed = 1.45 tok/s
vision speed = 0.000 MP/s
audio RTF = -nan
##################################

cmake .. -DMNN_LOW_MEMORY=true -DMNN_CPU_WEIGHT_DEQUANT_GEMM=true -DMNN_BUILD_LLM=true -DMNN_SUPPORT_TRANSFORMER_FUSE=true -DLLM_SUPPORT_VISION=true -DMNN_BUILD_OPENCV=true -DMNN_IMGCODECS=true

-- WIN_USE_ASM:
-- x86_64: Open SSE
-- MNN_AVX512:OFF
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- MNN_IMGCODECS: true
-- Configuring done (1.2s)
-- Generating done (0.5s)
CMake Warning:
Manually-specified variables were not used by the project:

LLM_SUPPORT_VISION
taobao-mnn org

Use the new macro -DMNN_BUILD_LLM_OMNI=ON to replace -DLLM_SUPPORT_VISION=ON.
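Applying that to the cmake line above gives something like the following sketch; the flag substitution is the only change, all other options are kept exactly as in the original command (echoed rather than run, since it needs an MNN build/ directory):

```shell
# Same flags as the earlier cmake invocation, with the unused
# -DLLM_SUPPORT_VISION swapped for the new -DMNN_BUILD_LLM_OMNI=ON macro.
flags="-DMNN_LOW_MEMORY=true -DMNN_CPU_WEIGHT_DEQUANT_GEMM=true -DMNN_BUILD_LLM=true -DMNN_SUPPORT_TRANSFORMER_FUSE=true -DMNN_BUILD_LLM_OMNI=ON -DMNN_BUILD_OPENCV=true -DMNN_IMGCODECS=true"
echo "cmake .. $flags"   # run this from the MNN build/ directory
```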

taobao-mnn org

A local link is fine too, e.g. <img>demo.jpeg<hw>280, 280</hw></img>介绍一下这张图片 ("describe this image").

taobao-mnn org

Results are not as good as llama.cpp running an InternVL3_5-30B-A3B GGUF.

What's wrong exactly? Maybe the default quant bit-width is too low?

taobao-mnn org

The model's vision encoder currently uses 4-bit quantization.

After compiling this way:

cmake .. -DMNN_LOW_MEMORY=true -DMNN_CPU_WEIGHT_DEQUANT_GEMM=true -DMNN_BUILD_LLM=true -DMNN_SUPPORT_TRANSFORMER_FUSE=true -DLLM_SUPPORT_VISION=true -DMNN_BUILD_OPENCV=true -DMNN_IMGCODECS=true

It now seems to work with the web example:

#################################
prompt tokens num = 191
decode tokens num = 161
vision time = 12.73 s
pixels_mp = 0.17 MP
audio process time = 0.00 s
audio input time = 0.00 s
prefill time = 44.24 s
decode time = 107.00 s
sample time = 0.05 s
prefill speed = 4.32 tok/s
decode speed = 1.50 tok/s
vision speed = 0.014 MP/s
audio RTF = -nan
##################################

taobao-mnn org

Great, thank you for your feedback! If you encounter any accuracy issues, please feel free to report them here, and we can adjust the default quantization configuration.

My local example is working too, thanks; good results.

taobao-mnn org

The initial version had a simpler sampler_type configuration. I've now updated the sampler_type in config.json as per the official suggestions, which you can refer to for your changes.
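For reference, the MNN LLM runtime reads sampling options from the model's config.json. A minimal sketch of what such an entry might look like, assuming the sampler_type key from the MNN llm config format; the value shown is only an illustration, so copy the actual setting from the updated config.json in the model repo:

```json
{
  "sampler_type": "mixed"
}
```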
