Do you provide a streaming inference codebase?

#4
by Saiful79 - opened

Thank you for the great work. My question: you provide an `end2end.py` script for vLLM inference, but could you please clarify whether it supports streaming inference?

I've tried the `python3 end2end.py` solution; see (1) below.

It failed for me despite having successfully built the vLLM environment.
(I will wait a week before retrying; see (2).)

Verification Step 1: The Environment
We must be in the correct "room" (directory) and have the "brain" (environment) active.

Action (In your Ubuntu Terminal):

(1)
```bash
cd ~/vllm-omni/examples/offline_inference/qwen3_tts && source ~/vllm-blackwell/bin/activate
```

Verify: Your command prompt should now start with `(vllm-blackwell)` and end with `.../qwen3_tts$`.

Verification Step 2: Audio Dependencies
This script needs two specific tools to "draw" the audio waves.

Action:

```bash
pip install onnxruntime sox
```

Verify: Look for `Successfully installed` or `Requirement already satisfied`.
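Before running the script, you can also sanity-check that both packages are importable from Python. This is a minimal sketch, assuming the pip packages expose the standard import names `onnxruntime` and `sox`:

```python
import importlib.util

def check_deps(modules=("onnxruntime", "sox")):
    """Map each module name to whether it can be imported in this environment."""
    return {m: importlib.util.find_spec(m) is not None for m in modules}

for name, ok in check_deps().items():
    print(f"{name}: {'OK' if ok else 'missing - run: pip install ' + name}")
```

Using `find_spec` instead of a bare `import` means the check reports a missing package instead of crashing on it.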

The Proof of Concept: "CustomVoice" Test
This script will attempt to:

Load the model onto your Blackwell card.

Generate speech for a pre-set query.

Save it as a file named output.wav.

Action:

```bash
python3 end2end.py --query-type CustomVoice
```
Post-Bash Verification

1. The Console Output:

Pass: You see a progress bar for downloading weights, followed by "Saving output to output.wav".

Fail: You see another `ModuleNotFoundError`. (If this happens, the script is trying to import the broken API files we saw earlier.)

2. The File Check: If the script finishes without an error, run this to see if the audio file actually exists:

```bash
ls -l output.wav
```
Pass: You see a file roughly 50KB–500KB in size.

3. The "Windows Mirror" Check: Since you found the `\\wsl$` path earlier, go to `\\wsl$\Ubuntu\home\angus-linux\vllm-omni\examples\offline_inference\qwen3_tts\` and look for `output.wav`. If it's there, you can double-click it and hear the Blackwell GPU's first words.
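Once `output.wav` exists, you can go a step beyond the file-size check and read its header with Python's standard-library `wave` module. This sketch assumes the script writes a standard PCM WAV; the synthetic `demo.wav` below is only a stand-in so the snippet runs on its own — point `inspect_wav` at `output.wav` after your run:

```python
import math
import struct
import wave

def inspect_wav(path):
    """Print basic properties of a PCM WAV file and return its duration in seconds."""
    with wave.open(path, "rb") as w:
        frames, rate = w.getnframes(), w.getframerate()
        duration = frames / rate
        print(f"{path}: {w.getnchannels()} channel(s), {rate} Hz, "
              f"{w.getsampwidth() * 8}-bit, {duration:.2f} s")
        return duration

# Stand-in file: a 0.5 s, 440 Hz, 16-bit mono tone at 16 kHz.
with wave.open("demo.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)      # 16-bit samples
    w.setframerate(16000)
    tone = (int(32767 * math.sin(2 * math.pi * 440 * t / 16000)) for t in range(8000))
    w.writeframes(b"".join(struct.pack("<h", s) for s in tone))

inspect_wav("demo.wav")
```

A near-zero duration or a `wave.Error` on open would tell you the TTS run produced a stub or corrupt file even if `ls -l` looked plausible.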

Report: Does the `python3 end2end.py` command begin the loading process, or does it hit a "Module" wall immediately?

(2)

The error `TypeError: OmniModelConfig.__post_init__() takes 13 positional arguments but 14 were given` is what developers call a breaking-change conflict.

What happened: In the last 48 hours, the vLLM team updated their core settings (the `ModelConfig`).

The Result: The vllm-omni fork is still using an old "blueprint" for how a model should be initialized. When it tries to load the Qwen3-TTS model, it accidentally passes an extra argument that the updated code doesn't know how to handle.
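The mismatch can be illustrated with a toy example. The classes below are hypothetical stand-ins, not vLLM's actual code — they only show how an upstream signature change breaks a fork's old call site:

```python
class NewConfig:
    # Upstream trimmed a parameter in an update...
    def __init__(self, model, dtype):
        self.model, self.dtype = model, dtype

def fork_init():
    # ...but the fork still calls it with the old, longer argument list.
    return NewConfig("qwen3-tts", "bfloat16", 42)

try:
    fork_init()
except TypeError as e:
    print(e)  # e.g. "...takes 3 positional arguments but 4 were given"
```

The usual fixes are pinning the dependency to the version the fork was written against, or waiting for the fork to catch up — which is why waiting and retrying is a reasonable plan here.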

The "Proof of Concept" Verdict
Since this error is happening deep inside the C++ and Python core of the library (specifically in the multiprocessing spawn of the worker), it is not something we can fix with a simple bash command or a settings change.

Is it progress? Actually, yes. We moved past the "Module Not Found" errors and reached the actual Model Initialization Phase. Your Blackwell card successfully started the process, but the software "tripped" at the very last second before the engine started spinning.
