kostakoff posted an update 29 days ago
Just for fun, let's run the Alibaba MNN benchmark on a DGX Spark!

From time to time, I look for something new or unusual in the AI world, and recently I stumbled upon MNN — a direct competitor to llama.cpp.

I found this project intriguing and set a small goal for myself: to run it on my DGX Spark. I was glad to see that MNN is open-source under the Apache 2.0 license, meaning I was free to fork and modify it.

However, MNN had a few issues out of the box:
- No support for CUDA 13.0
- No support for the Blackwell architecture (sm_12)
- No built-in support for CUDA benchmarking

I tackled these issues one by one and successfully compiled MNN on the DGX Spark. The benchmark results are quite low for now, but at least it works! The patch file is available here: https://github.com/alibaba/MNN/issues/4289#issuecomment-4093931887
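Before patching, it helps to confirm what toolchain the build will actually see. A small sketch (assumes nvcc and nvidia-smi are on PATH; the sed pattern is mine, not from the post):

```shell
# Report the installed CUDA toolkit and the GPU's compute capability.
# On a DGX Spark the toolkit is 13.x while the GPU reports a 12.x
# capability, which is exactly the combination stock MNN did not handle.
cuda_major=$(nvcc --version | sed -n 's/.*release \([0-9]*\)\..*/\1/p')
echo "CUDA toolkit major version: $cuda_major"
nvidia-smi --query-gpu=compute_cap --format=csv,noheader
```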

Here is the step-by-step guide on how I built MNN:
mkdir mnn && cd mnn
# Get the code
git clone https://github.com/alibaba/MNN.git
cd MNN

# Reset repo to a specific commit
git reset --hard b1d06d68b3366183d157f0703d7b8a8b61ae55b3

# Apply the CUDA 13.0 / Blackwell patch (saved locally from the issue linked above)
git apply ../my_changes.patch

mkdir build && cd build
# Configure the project
cmake .. \
  -DMNN_CUDA=ON \
  -DMNN_BUILD_LLM=ON \
  -DMNN_SUPPORT_TRANSFORMER_FUSE=ON \
  -DCMAKE_BUILD_TYPE=Release

# Build libraries and executable binaries
cmake --build . --config Release -j$(nproc)
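Two quick sanity checks I'd suggest around these steps, sketched under the assumption that the patch file and build directory are laid out as above:

```shell
# Dry-run the patch before applying it; git reports any conflicts
# without modifying the working tree.
git apply --check ../my_changes.patch && echo "patch applies cleanly"

# After the build, confirm the benchmark binary was actually produced
# in the build directory.
test -x ./llm_bench && echo "llm_bench built"
```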


How to run the benchmark:
- Download the MNN model: taobao-mnn/Qwen3-30B-A3B-MNN
- Run the benchmark:
./MNN/build/llm_bench -m /path/to/qwen/config.json -a cuda -c 2 -p 512 -n 128 -kv true -rep 3
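To see how throughput scales with context, the same invocation can be swept over several prompt lengths. A sketch; every flag besides -p is just copied from the command above:

```shell
# Hypothetical sweep over prompt lengths, reusing the flags from the post.
for p in 128 256 512; do
  ./MNN/build/llm_bench -m /path/to/qwen/config.json -a cuda -c 2 \
    -p "$p" -n 128 -kv true -rep 3
done
```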

It works!
Hopefully, the MNN developers will add official CUDA 13 support.
