Wolof-HuBERT
A collection of small speech foundation models for Wolof.
Wolof-HuBERT was continually pretrained from facebook/hubert-large-ls960 on 860 hours of Wolof speech audio. When using the model, make sure that your speech input is also sampled at 16 kHz.
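If your audio is recorded at a different rate, it must be resampled to 16 kHz before being fed to the model. As a minimal illustration, here is a naive linear-interpolation resampler in plain NumPy; in practice you would use a proper resampler such as `torchaudio.functional.resample` or `librosa.resample`:

```python
import numpy as np

def resample_to_16khz(waveform: np.ndarray, orig_sr: int, target_sr: int = 16_000) -> np.ndarray:
    """Naive linear-interpolation resampling (illustration only)."""
    if orig_sr == target_sr:
        return waveform
    duration = len(waveform) / orig_sr
    n_target = int(round(duration * target_sr))
    # Time stamps of the original and target sample grids
    old_t = np.linspace(0.0, duration, num=len(waveform), endpoint=False)
    new_t = np.linspace(0.0, duration, num=n_target, endpoint=False)
    return np.interp(new_t, old_t, waveform)

# One second of 44.1 kHz audio becomes 16 000 samples at 16 kHz
audio_44k = np.zeros(44_100, dtype=np.float32)
audio_16k = resample_to_16khz(audio_44k, orig_sr=44_100)
```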
Note: This model does not have a tokenizer, as it was pretrained on audio alone. In order to use it for speech recognition, a tokenizer must be created and the model fine-tuned on labeled text data. Check out the Hugging Face Transformers examples for how to fine-tune this model.
In our fine-tuning experiments, the model outperformed other models of comparable size.
See this blog post for more information on how to fine-tune the model. Note that the class Wav2Vec2ForCTC has to be replaced by HubertForCTC.
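To make the Wav2Vec2ForCTC-to-HubertForCTC substitution concrete, here is a sketch of the CTC fine-tuning setup. For illustration it builds a tiny randomly initialised `HubertConfig` instead of downloading the checkpoint; in practice you would load the pretrained weights with `HubertForCTC.from_pretrained(...)` and set `vocab_size` to the size of the Wolof character vocabulary produced by your tokenizer (the sizes below are placeholders):

```python
import torch
from transformers import HubertConfig, HubertForCTC

# Tiny config for illustration only; in practice, load the pretrained
# checkpoint: HubertForCTC.from_pretrained("<checkpoint>", vocab_size=...)
config = HubertConfig(
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    vocab_size=30,  # placeholder: size of your Wolof character vocabulary
    conv_dim=(32, 32, 32, 32, 32, 32, 32),
)
model = HubertForCTC(config)

# One second of 16 kHz audio and a short dummy transcript (indices > 0,
# since index 0 is the CTC blank/pad token by default)
waveform = torch.randn(1, 16_000)
labels = torch.randint(1, 30, (1, 10))

# Passing labels makes the model return a CTC loss, which plugs into a
# standard fine-tuning loop (loss.backward(), optimizer.step(), ...)
out = model(input_values=waveform, labels=labels)
out.loss.backward()
```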
If you use this model, please cite:
@misc{sy2025speechlanguagemodelsunderrepresented,
  title={Speech Language Models for Under-Represented Languages: Insights from Wolof},
  author={Yaya Sy and Dioula Doucouré and Christophe Cerisara and Irina Illina},
  year={2025},
  eprint={2509.15362},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2509.15362},
}