Segmentation Fault in cohere-transcribe Model with vLLM Audio Transcription
#20
by kunalchamoli - opened
I'm encountering a segmentation fault when running the CohereLabs/cohere-transcribe-03-2026 model with vLLM's audio transcription endpoint.
Environment:
- vLLM Version: Latest nightly build
- Model: CohereLabs/cohere-transcribe-03-2026
- Python Version: 3.11
- CUDA Version: 12.4.1
- cuDNN: as bundled in the CUDA 12.4.1 cudnn-devel image
- GPU: NVIDIA A100-SXM4-40GB
Setup:
- Running in Docker container (nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04)
- vLLM installed with [audio] extra: pip install "vllm[audio]"
- Audio processing dependencies: librosa, transformers>=5.4.0
Configuration:
- VLLM_MAX_MODEL_LEN=1024
- VLLM_GPU_MEMORY_UTILIZATION=0.80
- VLLM_DTYPE=auto
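For context, these settings correspond to vLLM's standard serving flags; a typical launch would look like the following (a sketch, not necessarily the exact command used here):

```shell
# Assumed launch command: the flags below are vLLM's standard CLI options
# matching the configuration values listed above.
vllm serve CohereLabs/cohere-transcribe-03-2026 \
  --max-model-len 1024 \
  --gpu-memory-utilization 0.80 \
  --dtype auto
```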
Error Description:
The EngineCore process crashes with a segmentation fault during /v1/audio/transcriptions API calls. The crash occurs deep in the Python evaluation stack with no usable traceback information.
Error Messages:
WARNING: Defaulting to language='en'. If you wish to transcribe audio in a different language, pass the `language` field in the TranscriptionRequest.
!!!!!!! Segfault encountered !!!!!!!
File "<unknown>", line 0, in _PyEval_EvalFrameDefault
[... stack frames omitted ...]
File "<unknown>", line 0, in _start
ERROR: Engine core proc EngineCore died unexpectedly, shutting down client.
ERROR: AsyncLLM output_handler failed.
vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue.
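To get a Python-level traceback despite the native crash, the interpreter's faulthandler can be enabled before launching the server (this is a standard CPython mechanism, not vLLM-specific):

```shell
# PYTHONFAULTHANDLER=1 makes CPython dump the Python traceback of every
# thread on fatal signals such as SIGSEGV (standard CPython behavior).
export PYTHONFAULTHANDLER=1
vllm serve CohereLabs/cohere-transcribe-03-2026  # assumed launch command
```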
Hello,
- Can you share the exact command you used to start the server, the audio file, and the command you used to send the request, so we can reproduce?
- Does it happen for all kinds of audio files, or only for a specific one you tried?
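For anyone assembling a repro: a request of the following shape exercises the same call path (a sketch assuming the OpenAI-compatible transcription endpoint vLLM serves; the server address and file name are placeholders):

```python
# Minimal request-side repro sketch. Assumptions: server at localhost:8000
# (vLLM's default port) and a short WAV clip on disk; the endpoint and form
# field names follow the OpenAI-compatible API that vLLM exposes.
import requests

BASE_URL = "http://localhost:8000"  # assumed default vLLM server address


def build_transcription_request(audio_path: str,
                                language: str = "en") -> requests.PreparedRequest:
    """Build (without sending) the multipart POST that
    /v1/audio/transcriptions expects."""
    with open(audio_path, "rb") as f:
        audio_bytes = f.read()
    req = requests.Request(
        "POST",
        f"{BASE_URL}/v1/audio/transcriptions",
        files={"file": ("sample.wav", audio_bytes, "audio/wav")},
        data={
            "model": "CohereLabs/cohere-transcribe-03-2026",
            "language": language,  # avoids the language='en' default warning
        },
    )
    return req.prepare()

# To actually send it:
#   requests.Session().send(build_transcription_request("sample.wav"))
```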
Nice one.