Papers
arxiv:2406.06371

mHuBERT-147: A Compact Multilingual HuBERT Model

Published on Jun 10, 2024
Authors:
,
,
,

Abstract

mHuBERT-147, a multilingual speech representation model, achieves superior performance with fewer parameters compared to larger models by using faiss-based clustering and a multilingual batching up-sampling strategy.

AI-generated summary

We present mHuBERT-147, the first general-purpose massively multilingual HuBERT speech representation model trained on 90K hours of clean, open-license data. To scale up the multi-iteration HuBERT approach, we use faiss-based clustering, achieving 5.2x faster label assignment over the original method. We also apply a new multilingual batching up-sampling strategy, leveraging both language and dataset diversity. After 3 training iterations and with only 95M parameters, mHuBERT-147 outperforms larger models trained on substantially more data. We rank second and first on the ML-SUPERB 10min/1h leaderboards respectively, with SOTA scores for all LID tasks. Across ASR/LID tasks, our model consistently surpasses XLS-R (300M params; 436K hours) and demonstrates strong competitiveness against the much larger MMS (1B params; 491K hours). Our findings suggest that mHuBERT-147 is a promising model for multilingual speech processing tasks, offering an unprecedented balance between high performance and parameter efficiency.

Community

Sign up or log in to comment

Get this paper in your agent:

hf papers read 2406.06371
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 3

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2406.06371 in a dataset README.md to link it from this page.

Spaces citing this paper 3

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.