Some languages are not supported by the models

#3
by TrueTuner - opened

First of all, the benchmark is excellent work and very useful for the community.

But I have a question:

You calculated the WER scores for:

  • granite-speech
  • nvidia-nemo (parakeet)
  • kyutai

However, it is clearly stated that these models do not support African languages.
Did you fine-tune each of these models on your datasets?
Or do these models actually perform well on African languages natively?

Thank you for clarifying this for us.

Microsoft org

Thanks, @TrueTuner. Yes, some of the models do not support African languages. We did not fine-tune any of the models except the Paza models. Apart from those, all three reported metrics come from the base models. One intention of this benchmark was to highlight the accuracy vs. efficiency trade-offs across SOTA ASR models, as a useful reference for fine-tuning decisions. For example, Parakeet achieves the best RTFx on most languages, which makes it a strong efficiency baseline and a practical candidate for fine-tuning on unsupported African languages; its inclusion is not a claim of native language support.
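For readers less familiar with the two metrics named in this thread, here is a minimal sketch of how WER and RTFx are commonly computed. It assumes the standard definitions (word-level edit distance divided by reference length, and audio duration divided by processing time). The function names are hypothetical, and the benchmark's actual text normalization and evaluation scripts are not shown in this discussion, so its numbers may be produced differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Assumption: plain whitespace tokenization, no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio handled per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


# One substitution (sat -> sit) and one deletion (the): 2 errors / 6 words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
speed = rtfx(audio_seconds=60.0, processing_seconds=2.0)
print(f"WER = {wer:.3f}, RTFx = {speed:.1f}")
```

Lower WER means higher accuracy and higher RTFx means faster inference, so a model can rank well on RTFx while still having a high WER on a language it was never trained on.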

muchai-mercy changed discussion status to closed
