| --- |
| license: cc-by-nc-sa-4.0 |
| language: |
| - bg |
| - cs |
| - da |
| - el |
| - es |
| - et |
| - fi |
| - hr |
| - hu |
| - it |
| - lt |
| - lv |
| - mt |
| - nl |
| - pl |
| - pt |
| - ro |
| - sk |
| - sl |
| - sv |
| --- |
| |
| # HuBERT VP-20 |
|
|
| HuBERT VP-20 is a HuBERT base model pretrained on a 6k-hour subset of VoxPopuli covering 20 languages |
| (all EU languages except English, French, and German) for the [DiscoPhon benchmark](https://benchmarks.cognitive-ml.fr/discophon). |
| It was pretrained using the [`minimal_hubert`](https://github.com/mxmpl/minimal_hubert) library. |
|
|
| You can load it with HuggingFace Transformers: |
|
|
| ```python |
| from transformers import HubertModel |
| |
| model = HubertModel.from_pretrained("coml/hubert-base-vp20") |
| ``` |
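
To go from loading the model to extracting frame-level features, here is a minimal sketch. It assumes 16 kHz mono input and the standard HuBERT base convolutional front end (check `config.json` to confirm). It also assumes that no input normalization is required. The helpers `num_frames` and `extract_layer_features` are hypothetical names used only for illustration.

```python
# Assumed HuBERT base convolutional front end: (kernel, stride) per layer.
# These are the standard HuBERT base values; check config.json to confirm.
CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]


def num_frames(num_samples: int) -> int:
    """Number of output frames for a waveform of `num_samples` samples."""
    length = num_samples
    for kernel, stride in CONV_LAYERS:
        length = (length - kernel) // stride + 1
    return length


def extract_layer_features(waveform, layer: int = 11, model_name: str = "coml/hubert-base-vp20"):
    """Return a (frames, hidden_size) tensor of hidden states from one layer."""
    # Imported lazily so the frame-count helper works without these packages.
    import torch
    from transformers import HubertModel

    model = HubertModel.from_pretrained(model_name).eval()
    # Add a batch dimension: (samples,) -> (1, samples).
    inputs = torch.as_tensor(waveform, dtype=torch.float32).unsqueeze(0)
    with torch.no_grad():
        outputs = model(inputs, output_hidden_states=True)
    # hidden_states[0] is the embedding output before the first Transformer
    # layer; hidden_states[i] is the output of Transformer layer i.
    return outputs.hidden_states[layer].squeeze(0)


if __name__ == "__main__":
    print(num_frames(16000))  # frames for one second of 16 kHz audio
```

Under these assumptions, one second of audio yields 49 frames, roughly one frame every 20 ms.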
|
|
| Or with `minimal_hubert`: |
| ```python |
| from minimal_hubert import HuBERT, HuBERTPretrain |
| |
| # Standard model |
| model = HuBERT.from_pretrained("coml/hubert-base-vp20") |
| # With the pretraining classification head, loaded from the 2nd iteration checkpoint |
| model_for_pretraining = HuBERTPretrain.from_pretrained("https://huggingface.co/coml/hubert-base-vp20/resolve/main/it2.pt") |
| ``` |
|
|
| Check out [`minimal_hubert`](https://github.com/mxmpl/minimal_hubert) if you are interested in pretraining or want |
| to load HuBERT checkpoints from different libraries. |
|
|
| ## Files |
|
|
| - `model.safetensors` and `config.json`: HuggingFace Transformers checkpoint and config. |
| - `it1.pt`: 1st iteration checkpoint. |
| - `it2.pt`: 2nd iteration checkpoint. Converted to a HuggingFace state_dict to produce `model.safetensors`. |
| - `km100-mfcc.joblib`: K-means trained on MFCCs of VoxPopuli-20. Used to train the 1st iteration. |
| - `km500-it1-l10.joblib`: K-means trained on features from the 10th layer of the 1st iteration model. Used to train the 2nd iteration. |
| - `km256-it2-l11.joblib`: K-means trained on features from the 11th layer of the 2nd iteration model. Used for DiscoPhon finetuning. |
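
The K-means files above can be used to turn frame-level features into discrete units. The sketch below shows one way this might look. It assumes the `.joblib` files contain a scikit-learn style estimator exposing `cluster_centers_`, which this card does not confirm. It also assumes that "11th layer" is counted from 1. `assign_units` and `load_centroids` are hypothetical helper names.

```python
import numpy as np


def assign_units(features: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Assign each feature frame to its nearest centroid (squared Euclidean distance).

    features: (frames, dim), centroids: (clusters, dim) -> unit ids of shape (frames,)
    """
    distances = (
        (features**2).sum(axis=1, keepdims=True)
        - 2.0 * features @ centroids.T
        + (centroids**2).sum(axis=1)
    )
    return distances.argmin(axis=1)


def load_centroids(repo_id: str = "coml/hubert-base-vp20", filename: str = "km256-it2-l11.joblib") -> np.ndarray:
    """Download a K-means file from the Hub and return its centroids."""
    # Imported lazily so assign_units can be used without these packages.
    import joblib
    from huggingface_hub import hf_hub_download

    kmeans = joblib.load(hf_hub_download(repo_id, filename))
    # Assumption: the saved object is a scikit-learn style K-means estimator.
    return np.asarray(kmeans.cluster_centers_)


if __name__ == "__main__":
    # Toy example with two 2-D centroids instead of the real 256 clusters.
    toy_centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
    toy_features = np.array([[1.0, 1.0], [9.0, 9.0], [0.0, -1.0]])
    print(assign_units(toy_features, toy_centroids))
```

With the real files, `features` would be the layer-11 hidden states of the 2nd iteration model, converted to a NumPy array, and `centroids` the output of `load_centroids()`.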
| |
| ## Citing |
| |
| ```bibtex |
| @misc{poli2026discophon, |
| title={{DiscoPhon}: Benchmarking the Unsupervised Discovery of Phoneme Inventories With Discrete Speech Units}, |
| author={Maxime Poli and Manel Khentout and Angelo Ortiz Tandazo and Ewan Dunbar and Emmanuel Chemla and Emmanuel Dupoux}, |
| year={2026}, |
| eprint={2603.18612}, |
| archivePrefix={arXiv}, |
| primaryClass={cs.CL}, |
| url={https://arxiv.org/abs/2603.18612}, |
| } |
| ``` |
| |