# vLLM Sonic Extension

This repository contains the Sonic compiler integration for vLLM (CPU builds), backed by the Sonic MLIR compiler.
## Prerequisites

- Python 3.8+
- torch-mlir
- LLVM
- Sonic MLIR
- Access to the Sonic frontend repository (`$AGICL_DIR/sonic-frontend`)
## Installation

### 1. Set up a Python Virtual Environment

```bash
cd <your-workspace-directory>
python3 -m venv vllm
source ./vllm/bin/activate
```
### 2. Install vLLM (CPU Build)

```bash
cd <parent-directory-for-repositories>
git clone https://github.com/vllm-project/vllm.git
cd vllm
git checkout 134f70b3eddf05f01f55ecee9c2a14ec0732e8b6
pip install setuptools_scm
pip install -r requirements/cpu.txt
```
Set environment variables for a CPU-only build:

```bash
export VLLM_TARGET_DEVICE=cpu
export CUDA_VISIBLE_DEVICES=""
export CMAKE_ARGS="-DVLLM_GPU_LANG=cpu -DWITH_CUDA=OFF -DUSE_CUDA=OFF -DCUDA_TOOLKIT_ROOT_DIR='' -DCUDAToolkit_ROOT=''"
export TORCH_CUDA_ARCH_LIST=""
export FORCE_CUDA=0
export USE_CUDA=0
export CUDACXX=""
```
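The same settings can be applied from a Python setup script instead of the shell. The sketch below simply restates the variables listed above in a dictionary; the helper name is ours, not part of vLLM:

```python
import os

# CPU-only build settings, mirroring the shell exports above.
CPU_BUILD_ENV = {
    "VLLM_TARGET_DEVICE": "cpu",
    "CUDA_VISIBLE_DEVICES": "",
    "CMAKE_ARGS": "-DVLLM_GPU_LANG=cpu -DWITH_CUDA=OFF -DUSE_CUDA=OFF "
                  "-DCUDA_TOOLKIT_ROOT_DIR='' -DCUDAToolkit_ROOT=''",
    "TORCH_CUDA_ARCH_LIST": "",
    "FORCE_CUDA": "0",
    "USE_CUDA": "0",
    "CUDACXX": "",
}

def apply_cpu_build_env(env=os.environ):
    """Apply the CPU-only build variables to an environment mapping."""
    env.update(CPU_BUILD_ENV)
    # A stray CUDA setting is the usual reason the build falls back to GPU.
    assert env["VLLM_TARGET_DEVICE"] == "cpu"
```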
Build and install vLLM:

```bash
VLLM_TARGET_DEVICE=cpu pip install . --no-build-isolation --verbose
```
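After the install finishes, a quick importability check can catch a broken build early. A minimal sketch (the helper name is ours, not part of vLLM):

```python
import importlib.util

def vllm_installed() -> bool:
    """Return True if the vLLM package built above is importable."""
    return importlib.util.find_spec("vllm") is not None

if __name__ == "__main__":
    print("vLLM importable:", vllm_installed())
```

Run this inside the activated virtual environment; `False` usually means the install targeted a different interpreter.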
### 3. Install the vLLM Sonic Extension

```bash
cd ..
git clone https://github.com/artyom-beilis/vllm-sonic.git
cd vllm-sonic
VLLM_TARGET_DEVICE="empty" python -m pip install -v .
```
### 4. Set up the Sonic Frontend

Build the dynamo executor extension:

```bash
cd $AGICL_DIR/sonic-frontend/dynamo_executor
python3 setup.py build_ext --inplace
```

Add the Sonic frontend to your Python path:

```bash
export PYTHONPATH="$PYTHONPATH:$AGICL_DIR/sonic-frontend"
```
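The `PYTHONPATH` export takes effect for newly started interpreters; for an already-running process the equivalent is appending to `sys.path`. A small sketch (the directory argument is illustrative):

```python
import os
import sys

def add_sonic_frontend_to_path(agicl_dir: str) -> str:
    """Make the sonic-frontend checkout importable, like the export above."""
    frontend = os.path.join(agicl_dir, "sonic-frontend")
    if frontend not in sys.path:
        sys.path.append(frontend)
    return frontend
```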
## Usage

### Basic Chat Example

```bash
python3 examples/sonic_chat.py
```

### Inference with Eager-Mode Validation

```bash
VLLM_SONIC_EAGER_VALIDATION=1 python3 examples/sonic_basic_inference_comparsion.py
```
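`VLLM_SONIC_EAGER_VALIDATION` is passed through the environment. How the extension parses it internally isn't documented here, but a typical boolean-flag reader looks like the sketch below; the set of accepted truthy values is an assumption:

```python
import os

def eager_validation_enabled(env=None) -> bool:
    """Sketch of reading VLLM_SONIC_EAGER_VALIDATION as a boolean flag.

    The accepted truthy values are an assumption; the extension's actual
    parsing may differ.
    """
    env = os.environ if env is None else env
    return env.get("VLLM_SONIC_EAGER_VALIDATION", "0").lower() in ("1", "true", "yes")
```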
## Examples

The repository includes several example scripts:

- `examples/sonic_chat.py` - Interactive chat example
- `examples/sonic_basic_inference.py` - Basic inference example
- `examples/sonic_basic_inference_comparsion.py` - Comparison with eager mode
- `examples/sonic_eager_mode_example.py` - Eager-mode demonstration