coml/hubert-base-vp20


mpoli committed 13 days ago

Commit 7252422 (verified), parent: b8185b7

Update README.md

Files changed (1)

README.md +74 -3

README.md CHANGED

@@ -1,3 +1,74 @@
----
-license: mit
----

+---
+license: cc-by-nc-sa-4.0
+language:
+- bg
+- cs
+- da
+- el
+- es
+- et
+- fi
+- hr
+- hu
+- it
+- lt
+- lv
+- mt
+- nl
+- pl
+- pt
+- ro
+- sk
+- sl
+- sv
+---
+# HuBERT VP-20
+HuBERT VP-20 is a HuBERT base model pretrained on a 6k-hour subset of VoxPopuli covering 20 languages
+(all EU languages in VoxPopuli except English, French, and German) for the [DiscoPhon benchmark](https://benchmarks.cognitive-ml.fr/discophon).
+It was pretrained using the [`minimal_hubert`](https://github.com/mxmpl/minimal_hubert) library.
+You can load it with HuggingFace Transformers:
+```python
+from transformers import HubertModel
+model = HubertModel.from_pretrained("coml/hubert-base-vp20")
+```
+Or with `minimal_hubert`:
+```python
+from minimal_hubert import HuBERT, HuBERTPretrain
+# Standard model
+model = HuBERT.from_pretrained("coml/hubert-base-vp20")
+# With pretraining head for classification
+model_for_pretraining = HuBERTPretrain.from_pretrained("https://huggingface.co/coml/hubert-base-vp20/resolve/main/it2.pt")
+```
+Check out [`minimal_hubert`](https://github.com/mxmpl/minimal_hubert) if you are interested in pretraining or want
+to load HuBERT checkpoints from different libraries.
+## Files
+- `model.safetensors` and `config.json`: HuggingFace Transformers checkpoint and config.
+- `it1.pt`: 1st iteration checkpoint.
+- `it2.pt`: 2nd iteration checkpoint, converted to a HuggingFace Transformers state dict to produce `model.safetensors`.
+- `km100-mfcc.joblib`: K-means trained on MFCCs of VoxPopuli-20. Used to train the 1st iteration.
+- `km500-it1-l10.joblib`: K-means trained on features from the 10th layer of the 1st iteration model. Used to train the 2nd iteration.
+- `km256-it2-l11.joblib`: K-means trained on features from the 11th layer of the 2nd iteration model. Used for DiscoPhon finetuning.
+## Citing
+```
+@misc{poli2026discophon,
+  title={{DiscoPhon}: Benchmarking the Unsupervised Discovery of Phoneme Inventories With Discrete Speech Units},
+  author={Maxime Poli and Manel Khentout and Angelo Ortiz Tandazo and Ewan Dunbar and Emmanuel Chemla and Emmanuel Dupoux},
+  year={2026},
+  eprint={2603.18612},
+  archivePrefix={arXiv},
+  primaryClass={cs.CL},
+  url={https://arxiv.org/abs/2603.18612},
+}
+```
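The file list describes k-means models that turn frame-level features into discrete unit IDs. Below is a minimal sketch of that last step for `km256-it2-l11.joblib`. It is not the authors' pipeline. `assign_units` does plain nearest-centroid assignment, which is the labelling rule a fitted k-means model applies. `vp20_units` is a hypothetical helper, and it rests on several unconfirmed assumptions:

- the `.joblib` file unpickles to a scikit-learn k-means object with a `cluster_centers_` attribute;
- "11th layer" corresponds to `hidden_states[11]` in Transformers;
- the model takes a raw mono 16 kHz waveform without extra normalization.

```python
import numpy as np


def assign_units(features: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Return, for each frame (row) in `features`, the index of its nearest centroid.

    features:  (num_frames, dim) array, e.g. HuBERT hidden states for one utterance
    centroids: (num_clusters, dim) array, e.g. the cluster centres of a k-means model
    """
    # Squared Euclidean distance for every frame/centroid pair, using
    # ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2 -> (num_frames, num_clusters) matrix.
    d2 = (
        (features ** 2).sum(axis=1, keepdims=True)
        - 2.0 * features @ centroids.T
        + (centroids ** 2).sum(axis=1)
    )
    return d2.argmin(axis=1)


def vp20_units(wav_16k: np.ndarray, layer: int = 11) -> np.ndarray:
    """Hypothetical helper: discrete units for one mono 16 kHz waveform.

    Assumptions (not confirmed by the model card):
    - km256-it2-l11.joblib unpickles to a scikit-learn k-means object exposing
      `cluster_centers_` (scikit-learn must be installed to unpickle it);
    - "11th layer" means Transformers' `hidden_states[11]`, where index 0 is the
      input to the first Transformer layer and index i is the output of layer i;
    - the raw waveform needs no extra normalization.
    Heavy dependencies are imported here so assign_units works without them.
    """
    import joblib
    import torch
    from huggingface_hub import hf_hub_download
    from transformers import HubertModel

    model = HubertModel.from_pretrained("coml/hubert-base-vp20").eval()
    kmeans = joblib.load(hf_hub_download("coml/hubert-base-vp20", "km256-it2-l11.joblib"))
    with torch.no_grad():
        # Add a batch dimension: (num_samples,) -> (1, num_samples)
        inputs = torch.as_tensor(wav_16k, dtype=torch.float32).unsqueeze(0)
        out = model(inputs, output_hidden_states=True)
    feats = out.hidden_states[layer][0].numpy()  # (num_frames, 768) for a base model
    centroids = np.asarray(kmeans.cluster_centers_, dtype=feats.dtype)
    return assign_units(feats, centroids)
```

Only `assign_units` runs without `torch`, `transformers`, `huggingface_hub`, and `joblib`. If the unpickled object really is a scikit-learn `KMeans` or `MiniBatchKMeans`, calling `kmeans.predict(feats)` should return the same labels. Computing the distances directly just makes the assignment rule explicit.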