Garbage output from transcribe method

#28
by svennslu - opened

I'm getting garbage output from the model.transcribe method:

Loading weights: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2150/2150 [00:00<00:00, 13091.41it/s]
aires συνεχ εργασία richtiglenlij richtig traffic droitsθούν side pun dank pun专 dank examples geben pun专 dank味 oper fell sources 치 dank acuerdo richtig dichφέρすط rồi 재 existence确专 pati cor Americans in digo famous friend cor hatten objetivos类 cor χρησιμοποι dankなんか corr corなんか raison욕 raison sustسب All Roy richtig raison욕 All dankFχωaires εμεί existence gebenよね dank Allellschaft Americans existencezioni임 εργασίαhntaires εμεί existence raison εργασία coraming corr cor διαπ飛 corr cor Americans dank Inter remained All All Allellschaft路 εργασία πρω pull Ye organ εργασία essayالمõ cor necessário리를 essay raison εργασία av remained专 remained takie opposed sal Allط Roy Roy Roy dei agencies Allط cor διαπ cor διαπ angry tour cor partner All Roy cor partner All tour All All All All All All All All All Roylement jeszcze εργασίαよね geben cor路 εργασία cor partner All cor partner All All All All All』专 remained专 remained innererd toch All cadf Allaleالم famous friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend friend

Code (copy-pasted from the README):

import torch
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
from huggingface_hub import hf_hub_download

model_id = "CohereLabs/cohere-transcribe-03-2026"

device = "cuda:0" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id, trust_remote_code=True).to(device)
model.eval()

audio_file = hf_hub_download(
    repo_id="CohereLabs/cohere-transcribe-03-2026",
    filename="demo/voxpopuli_test_en_demo.wav",
)

texts = model.transcribe(processor=processor, audio_files=[audio_file], language="en")
print(texts[0])

I also tried cpu instead of cuda, and other audio files; same issue.

The quickstart transformers code works fine though.

Yeah I suddenly started having that issue too! Seriously thought I was going crazy!

Hi! Thanks for reporting. Good news: the transformers-native path doesn't show this, and it is the recommended way forward.

However, if you have a reason to keep using the trust_remote_code=True path, do you still see this with its recommended install?

pip install "transformers>=4.56,<5.3,!=5.0.*,!=5.1.*" torch huggingface_hub soundfile librosa sentencepiece protobuf

link

If so what version of transformers did you see this issue with?

Thanks
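For reference, the transformers-native path mentioned above can be sketched like this. This is a minimal sketch, not an official snippet: it assumes the processor accepts a raw 16 kHz waveform (Whisper-style) and that plain `generate()` works without `trust_remote_code`; the librosa resampling step is my addition.

```python
# Hedged sketch of the transformers-native path (no trust_remote_code).
# Assumption: a Whisper-style processor that takes a raw 16 kHz mono waveform.
import torch
import librosa
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
from huggingface_hub import hf_hub_download

model_id = "CohereLabs/cohere-transcribe-03-2026"
device = "cuda:0" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id).to(device).eval()

# Demo file shipped in the model repo (same one as in the report above).
audio_file = hf_hub_download(repo_id=model_id, filename="demo/voxpopuli_test_en_demo.wav")
waveform, _ = librosa.load(audio_file, sr=16_000, mono=True)  # resample to 16 kHz mono

inputs = processor(waveform, sampling_rate=16_000, return_tensors="pt").to(device)
with torch.no_grad():
    generated_ids = model.generate(**inputs)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```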

Hi team,
I was using vLLM 0.19.0 and am also having this issue:

{"text":",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,","usage":{"type":"duration","seconds":7}}

Console output:

(APIServer pid=66086) WARNING 04-08 21:19:36 [base.py:283] Falling back on <BOS> for decoder start token id because decoder start token id is not available.
(APIServer pid=66086) INFO:     192.168.1.77:58751 - "POST /v1/audio/transcriptions HTTP/1.1" 200 OK
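As a side note, degenerate outputs like the comma run above are easy to flag programmatically. Here is a small heuristic of my own (not part of vLLM or the model) that marks a transcript as degenerate when a single token dominates it:

```python
# Hypothetical helper: flag degenerate transcripts where one token dominates.
from collections import Counter

def looks_degenerate(text: str, threshold: float = 0.5) -> bool:
    """Return True when a single token makes up most of the output."""
    tokens = text.split() or list(text)  # fall back to characters for e.g. ",,,,"
    if not tokens:
        return True  # treat empty output as degenerate
    most_common_count = Counter(tokens).most_common(1)[0][1]
    return most_common_count / len(tokens) > threshold

print(looks_degenerate("," * 200))                    # True
print(looks_degenerate("this is a normal sentence"))  # False
```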

Hi @evilperson068
Can you share the cmd to repro this along with the audio you are using?

I just confirmed again that it works when using:
server:

uv venv --python 3.12 --seed
source .venv/bin/activate
uv pip install vllm==0.19.0 --torch-backend=auto
uv pip install vllm[audio]
uv pip install librosa
vllm serve CohereLabs/cohere-transcribe-03-2026 --trust-remote-code

client:

AUDIO_PATH=<path to your audio file>
curl -v http://localhost:8000/v1/audio/transcriptions   -H "Authorization: Bearer $VLLM_API_KEY"   -F "file=@$(realpath ${AUDIO_PATH})"
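If it helps, the curl call above can also be made from Python with the OpenAI SDK, since vLLM exposes an OpenAI-compatible endpoint. A minimal sketch; the base URL, API key, model name, and file path below are assumptions matching the serve command above:

```python
# Hedged sketch: Python equivalent of the curl client above, using the
# OpenAI SDK against vLLM's OpenAI-compatible /v1/audio/transcriptions.
from openai import OpenAI

# Base URL and key are assumptions; vLLM accepts any key unless --api-key is set.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

with open("voxpopuli_test_en_demo.wav", "rb") as f:  # path is illustrative
    result = client.audio.transcriptions.create(
        model="CohereLabs/cohere-transcribe-03-2026",
        file=f,
    )
print(result.text)
```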

Hi, this was the command:

vllm serve /root/models/cohere-transcribe-03-2026 --trust-remote-code --host 0.0.0.0 --port 9900 --quantization fp8 --gpu-memory-utilization=0.2 --served-model-name model

(It does not matter whether fp8 is used or not.)
The audio was me speaking, but I tested other files and none worked either.

I would recommend making a new folder and trying these commands as-is; they are known to work.

It could be that your local checkpoint (/root/models/cohere-transcribe-03-2026) is an older one.

Thank you, I will take a look. It could be a problem with my Python environment.
