Auto language detection?

#29

by Kwan0 - opened 7 days ago

Dear Cohere team,

I have tested this model and am really impressed with it. The WER is lower, and it can transcribe different languages and accents better than Whisper.

One thing I am wondering about: does this model support automatic language detection? I believe requiring a fixed language ID when integrating this model would significantly reduce its flexibility.

ekagra-ranjan

Cohere Labs org 3 days ago

edited 3 days ago

Hi - we didn't specifically train for auto language detection, but we have noted your request as something to improve in a future release.

While it doesn't work out of the box, you could hack it in. One method is to pass a partial input prompt (everything up to, but not including, the language token) and let the model generate the language token itself. You then use that predicted language token to build the final prompt and run the model a second time to get the transcript. The drawback is increased latency due to the extra model call.
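To make the two-pass idea concrete, here is a minimal sketch. It is not the model's actual API: `generate` is a hypothetical callable you would write around your own inference code, and the token strings in the usage example are placeholders, since the real special tokens depend on the model's tokenizer. The sketch also assumes the language token comes directly after the partial prompt.

```python
from typing import Callable, List, Sequence, Set, Tuple

# Hypothetical wrapper signature (an assumption, not a real library call):
# generate(audio, prompt_tokens, max_new_tokens) -> tokens generated after the prompt.
GenerateFn = Callable[[object, Sequence[str], int], List[str]]


def detect_language(
    generate: GenerateFn,
    audio: object,
    partial_prompt: Sequence[str],
    language_tokens: Set[str],
    fallback: str,
) -> str:
    """Pass 1: ask for a single token after the partial prompt."""
    predicted = generate(audio, partial_prompt, 1)
    # The model was not trained for this, so guard against non-language output.
    if predicted and predicted[0] in language_tokens:
        return predicted[0]
    return fallback


def transcribe_auto(
    generate: GenerateFn,
    audio: object,
    partial_prompt: Sequence[str],
    language_tokens: Set[str],
    fallback: str,
    max_new_tokens: int = 256,
) -> Tuple[str, List[str]]:
    """Pass 2: rebuild the prompt with the predicted language and transcribe."""
    lang = detect_language(generate, audio, partial_prompt, language_tokens, fallback)
    full_prompt = list(partial_prompt) + [lang]
    return lang, generate(audio, full_prompt, max_new_tokens)
```

The first pass requests only one token, so it should be cheaper than a full transcription, but it is still an extra model call per input. The fallback language covers the case where the first generated token is not a known language token.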
