Segmentation Fault in cohere-transcribe Model with vLLM Audio Transcription

#20
by kunalchamoli - opened

I'm encountering a segmentation fault when running the CohereLabs/cohere-transcribe-03-2026 model with vLLM's audio transcription endpoint.

Environment:

  • vLLM Version: Latest nightly build
  • Model: CohereLabs/cohere-transcribe-03-2026
  • Python Version: 3.11
  • CUDA Version: 12.4.1
  • cuDNN: 12.4.1
  • GPU: NVIDIA A100-SXM4-40GB

Setup:

  • Running in Docker container (nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04)
  • vLLM installed with [audio] extra: pip install "vllm[audio]"
  • Audio processing dependencies: librosa, transformers>=5.4.0

Configuration:

  • VLLM_MAX_MODEL_LEN=1024
  • VLLM_GPU_MEMORY_UTILIZATION=0.80
  • VLLM_DTYPE=auto
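
For anyone trying to reproduce this, here is a rough sketch of how the settings above might translate into a `vllm serve` command line. It assumes the three `VLLM_*` variables are read by a wrapper script in the container, which the report does not confirm. The helper name `build_serve_command` is hypothetical. The flags `--max-model-len`, `--gpu-memory-utilization`, and `--dtype` are standard `vllm serve` options.

```python
def build_serve_command(model: str, env: dict[str, str]) -> list[str]:
    """Translate wrapper-style environment variables into `vllm serve` flags.

    Assumption: the VLLM_* names below are consumed by a custom entrypoint
    script, not by vLLM itself. Only the CLI flags are vLLM options.
    """
    mapping = {
        "VLLM_MAX_MODEL_LEN": "--max-model-len",
        "VLLM_GPU_MEMORY_UTILIZATION": "--gpu-memory-utilization",
        "VLLM_DTYPE": "--dtype",
    }
    cmd = ["vllm", "serve", model]
    for var, flag in mapping.items():
        # Only pass a flag when the variable is actually set.
        if var in env:
            cmd += [flag, env[var]]
    return cmd


# Example with the values from the configuration listed above.
command = build_serve_command(
    "CohereLabs/cohere-transcribe-03-2026",
    {
        "VLLM_MAX_MODEL_LEN": "1024",
        "VLLM_GPU_MEMORY_UTILIZATION": "0.80",
        "VLLM_DTYPE": "auto",
    },
)
print(" ".join(command))
```

Posting the resulting command alongside the report makes it easier to see which engine arguments were in effect when the crash happened.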

Error Description:

The EngineCore process crashes with a segmentation fault during /v1/audio/transcriptions API calls. The crash occurs deep in the Python evaluation stack with no usable traceback information.

Error Messages:

WARNING: Defaulting to language='en'. If you wish to transcribe audio in a different language, pass the `language` field in the TranscriptionRequest.

!!!!!!! Segfault encountered !!!!!!!
  File "<unknown>", line 0, in _PyEval_EvalFrameDefault
  [... stack frames omitted ...]
  File "<unknown>", line 0, in _start

ERROR: Engine core proc EngineCore died unexpectedly, shutting down client.
ERROR: AsyncLLM output_handler failed.
vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue.
Cohere Labs org

Hello,

  1. Can you share the exact command you used to start the server, the audio file, and the command you used to send the request, so we can reproduce the issue?
  2. Does it happen with all audio files, or only with specific ones you tried?
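
If it helps, a minimal request script along these lines could serve as a starting point for a reproduction. It is only a sketch: it generates a synthetic sine-tone WAV so no private audio has to be shared, and posts it to the OpenAI-compatible `/v1/audio/transcriptions` endpoint using the standard library only. The server address `http://localhost:8000`, the file name `tone.wav`, and the helper names `make_sine_wav`, `build_multipart`, and `transcribe` are assumptions for illustration, not part of the original report.

```python
import io
import math
import struct
import urllib.request
import uuid
import wave


def make_sine_wav(seconds: float = 1.0, sample_rate: int = 16000, freq: float = 440.0) -> bytes:
    """Return a mono 16-bit PCM WAV file (as bytes) containing a sine tone."""
    n_frames = int(seconds * sample_rate)
    frames = b"".join(
        struct.pack("<h", int(0.3 * 32767 * math.sin(2 * math.pi * freq * i / sample_rate)))
        for i in range(n_frames)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        w.setframerate(sample_rate)
        w.writeframes(frames)
    return buf.getvalue()


def build_multipart(fields: dict[str, str], file_name: str, file_bytes: bytes) -> tuple[bytes, str]:
    """Encode text fields plus one audio file as multipart/form-data.

    Returns the request body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    file_header = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        'Content-Type: audio/wav\r\n\r\n'
    ).encode()
    parts.append(file_header + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(base_url: str, model: str, wav_bytes: bytes, language: str = "en") -> str:
    """POST the audio to the transcription endpoint and return the raw response body."""
    body, content_type = build_multipart(
        {"model": model, "language": language}, "tone.wav", wav_bytes
    )
    request = urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return response.read().decode()


# Usage (assumes a server is already listening on localhost:8000):
# print(transcribe("http://localhost:8000", "CohereLabs/cohere-transcribe-03-2026", make_sine_wav(2.0)))
```

Running this against the server with a few different durations and sample rates would also help answer the second question, since it shows whether the crash depends on the input audio.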
