# vLLM Sonic Extension

This repository contains the Sonic compiler integration for vLLM (CPU builds), backed by the Sonic MLIR compiler.
## Prerequisites

- Python 3.8+
- torch-mlir
- LLVM
- Sonic MLIR
- Access to the Sonic frontend repository (`$AGICL_DIR/sonic-frontend`)
## Installation

### 1. Set up a Python Virtual Environment

```bash
cd <your-workspace-directory>
python3 -m venv vllm
source ./vllm/bin/activate
```
### 2. Install vLLM (CPU Build)

```bash
cd <parent-directory-for-repositories>
git clone https://github.com/vllm-project/vllm.git
cd vllm
git checkout 134f70b3eddf05f01f55ecee9c2a14ec0732e8b6
pip install setuptools_scm
pip install -r requirements/cpu.txt
```
Set environment variables for a CPU-only build:

```bash
export VLLM_TARGET_DEVICE=cpu
export CUDA_VISIBLE_DEVICES=""
export CMAKE_ARGS="-DVLLM_GPU_LANG=cpu -DWITH_CUDA=OFF -DUSE_CUDA=OFF -DCUDA_TOOLKIT_ROOT_DIR='' -DCUDAToolkit_ROOT=''"
export TORCH_CUDA_ARCH_LIST=""
export FORCE_CUDA=0
export USE_CUDA=0
export CUDACXX=""
```
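The same settings can be applied from a Python setup script instead of the shell. The sketch below simply restates the variables listed above in a dictionary; the helper name is ours, not part of vLLM:

```python
import os

# CPU-only build settings, mirroring the shell exports above.
CPU_BUILD_ENV = {
    "VLLM_TARGET_DEVICE": "cpu",
    "CUDA_VISIBLE_DEVICES": "",
    "CMAKE_ARGS": "-DVLLM_GPU_LANG=cpu -DWITH_CUDA=OFF -DUSE_CUDA=OFF "
                  "-DCUDA_TOOLKIT_ROOT_DIR='' -DCUDAToolkit_ROOT=''",
    "TORCH_CUDA_ARCH_LIST": "",
    "FORCE_CUDA": "0",
    "USE_CUDA": "0",
    "CUDACXX": "",
}

def apply_cpu_build_env(env=os.environ):
    """Apply the CPU-only build variables to an environment mapping."""
    env.update(CPU_BUILD_ENV)
    # A stray CUDA setting is the usual reason the build falls back to GPU.
    assert env["VLLM_TARGET_DEVICE"] == "cpu"
```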
Build and install vLLM:

```bash
VLLM_TARGET_DEVICE=cpu pip install . --no-build-isolation --verbose
```
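After the install finishes, a quick importability check can catch a broken build early. A minimal sketch (the helper name is ours, not part of vLLM):

```python
import importlib.util

def vllm_installed() -> bool:
    """Return True if the vLLM package built above is importable."""
    return importlib.util.find_spec("vllm") is not None

if __name__ == "__main__":
    print("vLLM importable:", vllm_installed())
```

Run this inside the activated virtual environment; `False` usually means the install targeted a different interpreter.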
### 3. Install the vLLM Sonic Extension

```bash
cd ..
git clone https://github.com/artyom-beilis/vllm-sonic.git
cd vllm-sonic
VLLM_TARGET_DEVICE="empty" python -m pip install -v .
```
### 4. Set up the Sonic Frontend

Build the dynamo executor extension:

```bash
cd $AGICL_DIR/sonic-frontend/dynamo_executor
python3 setup.py build_ext --inplace
```

Add the Sonic frontend to your Python path:

```bash
export PYTHONPATH="$PYTHONPATH:$AGICL_DIR/sonic-frontend"
```
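The `PYTHONPATH` export takes effect for newly started interpreters; for an already-running process the equivalent is appending to `sys.path`. A small sketch (the directory argument is illustrative):

```python
import os
import sys

def add_sonic_frontend_to_path(agicl_dir: str) -> str:
    """Make the sonic-frontend checkout importable, like the export above."""
    frontend = os.path.join(agicl_dir, "sonic-frontend")
    if frontend not in sys.path:
        sys.path.append(frontend)
    return frontend
```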
## Usage

### Basic Chat Example

```bash
python3 examples/sonic_chat.py
```

### Inference with Eager-Mode Validation

```bash
VLLM_SONIC_EAGER_VALIDATION=1 python3 examples/sonic_basic_inference_comparsion.py
```
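`VLLM_SONIC_EAGER_VALIDATION` is passed through the environment. How the extension parses it internally isn't documented here, but a typical boolean-flag reader looks like the sketch below; the set of accepted truthy values is an assumption:

```python
import os

def eager_validation_enabled(env=None) -> bool:
    """Sketch of reading VLLM_SONIC_EAGER_VALIDATION as a boolean flag.

    The accepted truthy values are an assumption; the extension's actual
    parsing may differ.
    """
    env = os.environ if env is None else env
    return env.get("VLLM_SONIC_EAGER_VALIDATION", "0").lower() in ("1", "true", "yes")
```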
## Examples

The repository includes several example scripts:

- `examples/sonic_chat.py` - Interactive chat example
- `examples/sonic_basic_inference.py` - Basic inference example
- `examples/sonic_basic_inference_comparsion.py` - Comparison with eager mode
- `examples/sonic_eager_mode_example.py` - Eager-mode demonstration