vLLM Sonic Extension

This repository provides an integration between vLLM's CPU backend and the Sonic MLIR compiler.

Prerequisites

  • Python 3.8+
  • torch-mlir
  • LLVM
  • Sonic MLIR
  • Access to the Sonic frontend repository ($AGICL_DIR/sonic-frontend)
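A quick pre-flight check can save a failed build later. The snippet below is a hypothetical sanity check based on the prerequisite list above (it assumes AGICL_DIR points at your checkout):

```shell
# Pre-flight check: Python version and Sonic frontend location.
# AGICL_DIR must point at your checkout, as in the prerequisites above.
python3 -c 'import sys; assert sys.version_info >= (3, 8), "Python 3.8+ required"'
if [ -d "$AGICL_DIR/sonic-frontend" ]; then
    echo "sonic-frontend found"
else
    echo "warning: \$AGICL_DIR/sonic-frontend not found"
fi
```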

Installation

1. Set up Python Virtual Environment

cd <your-workspace-directory>
python3 -m venv vllm
source ./vllm/bin/activate

2. Install vLLM (CPU Build)

cd <parent-directory-for-repositories>
git clone https://github.com/vllm-project/vllm.git
cd vllm
git checkout 134f70b3eddf05f01f55ecee9c2a14ec0732e8b6
pip install setuptools_scm
pip install -r requirements/cpu.txt

Set environment variables for CPU-only build:

export VLLM_TARGET_DEVICE=cpu
export CUDA_VISIBLE_DEVICES=""
export CMAKE_ARGS="-DVLLM_GPU_LANG=cpu -DWITH_CUDA=OFF -DUSE_CUDA=OFF -DCUDA_TOOLKIT_ROOT_DIR='' -DCUDAToolkit_ROOT=''"
export TORCH_CUDA_ARCH_LIST=""
export FORCE_CUDA=0
export USE_CUDA=0
export CUDACXX=""

Build and install vLLM:

VLLM_TARGET_DEVICE=cpu pip install . --no-build-isolation --verbose
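A quick way to confirm the build succeeded is to check that the package is importable (a small sketch; run it inside the activated virtualenv):

```shell
# Verify that the freshly built vLLM package is importable.
python3 - <<'EOF'
import importlib.util
spec = importlib.util.find_spec("vllm")
print("vllm installed" if spec else "vllm missing - check the build log")
EOF
```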

3. Install vLLM Sonic Extension

cd ..
git clone https://github.com/artyom-beilis/vllm-sonic.git
cd vllm-sonic
VLLM_TARGET_DEVICE="empty" python -m pip install -v .
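You can confirm the extension was registered with pip. The distribution name "vllm-sonic" below is assumed from the repository name; adjust it if `pip list` shows a different name:

```shell
# Check that the extension registered with pip.
# "vllm-sonic" is an assumed distribution name taken from the repo name.
python3 -m pip show vllm-sonic || echo "vllm-sonic not installed"
```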

4. Set up Sonic Frontend

Build the dynamo executor extension:

cd $AGICL_DIR/sonic-frontend/dynamo_executor
python3 setup.py build_ext --inplace

Add Sonic frontend to Python path:

export PYTHONPATH="$PYTHONPATH:$AGICL_DIR/sonic-frontend"
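To confirm the export took effect in the current shell, the small check below verifies only that the path is visible to Python, not that the frontend itself imports cleanly:

```shell
# Verify that the sonic-frontend directory is visible to Python.
python3 - <<'EOF'
import os, sys
frontend = os.path.join(os.environ.get("AGICL_DIR", ""), "sonic-frontend")
if frontend in sys.path:
    print("sonic-frontend is on sys.path")
else:
    print("sonic-frontend missing - re-export PYTHONPATH in this shell")
EOF
```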

Usage

Basic Chat Example

python3 examples/sonic_chat.py

Inference with Eager Mode Validation

VLLM_SONIC_EAGER_VALIDATION=1 python3 examples/sonic_basic_inference_comparsion.py

Examples

The repository includes several example scripts:

  • examples/sonic_chat.py - Interactive chat example
  • examples/sonic_basic_inference.py - Basic inference example
  • examples/sonic_basic_inference_comparsion.py - Comparison with eager mode
  • examples/sonic_eager_mode_example.py - Eager mode demonstration
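For a script of your own, the bundled examples boil down to vLLM's standard offline-inference API. The sketch below is illustrative only: the model name is a placeholder, and any Sonic-specific wiring is assumed to be handled by the installed extension, as in the examples above.

```shell
# Minimal offline-inference sketch (placeholder model; run in the virtualenv).
python3 - <<'EOF'
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")            # placeholder model name
params = SamplingParams(temperature=0.8, max_tokens=32)
for output in llm.generate(["Hello, my name is"], params):
    print(output.outputs[0].text)
EOF
```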