Commit 7252422 (verified) by mpoli, parent b8185b7: Update README.md

Files changed (1): README.md (+74 −3)

---
license: cc-by-nc-sa-4.0
language:
- bg
- cs
- da
- el
- es
- et
- fi
- hr
- hu
- it
- lt
- lv
- mt
- nl
- pl
- pt
- ro
- sk
- sl
- sv
---

# HuBERT VP-20

HuBERT VP-20 is a HuBERT base model pretrained for the [DiscoPhon benchmark](https://benchmarks.cognitive-ml.fr/discophon) on a 6k-hour subset of VoxPopuli covering 20 languages (all EU languages except English, French, and German). It was pretrained using the [`minimal_hubert`](https://github.com/mxmpl/minimal_hubert) library.

You can load it with HuggingFace Transformers:

```python
from transformers import HubertModel

model = HubertModel.from_pretrained("coml/hubert-base-vp20")
```

Or with `minimal_hubert`:

```python
from minimal_hubert import HuBERT, HuBERTPretrain

# Standard model
model = HuBERT.from_pretrained("coml/hubert-base-vp20")
# With the pretraining head for classification
model_for_pretraining = HuBERTPretrain.from_pretrained("https://huggingface.co/coml/hubert-base-vp20/resolve/main/it2.pt")
```

Check out [`minimal_hubert`](https://github.com/mxmpl/minimal_hubert) if you are interested in pretraining or want to load HuBERT checkpoints from different libraries.

## Files

- `model.safetensors` and `config.json`: HuggingFace Transformers checkpoint and configuration.
- `it1.pt`: first-iteration checkpoint.
- `it2.pt`: second-iteration checkpoint; converted to a HuggingFace state dict to produce `model.safetensors`.
- `km100-mfcc.joblib`: k-means model trained on MFCCs of VoxPopuli-20, used to train the first iteration.
- `km500-it1-l10.joblib`: k-means model trained on features from the 10th layer of the first-iteration model, used to train the second iteration.
- `km256-it2-l11.joblib`: k-means model trained on features from the 11th layer of the second-iteration model, used for DiscoPhon finetuning.
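
The `.joblib` quantizers above turn continuous HuBERT frame features into discrete units by nearest-centroid assignment. A minimal offline sketch of that step, using random stand-ins for the centroids and features (in real use you would instead `joblib.load` one of the files above, assumed here to be a scikit-learn k-means object, and extract the stated layer's features from the model):

```python
import numpy as np

# Stand-ins: in practice the centroids would come from, e.g.,
#   km = joblib.load("km256-it2-l11.joblib"); centroids = km.cluster_centers_
# and `frames` from the 11th transformer layer of the it2 model.
rng = np.random.default_rng(0)
centroids = rng.normal(size=(256, 768))  # (n_units, feature_dim)
frames = rng.normal(size=(100, 768))     # (n_frames, feature_dim)

# Squared Euclidean distance from every frame to every centroid,
# expanded as ||x||^2 - 2 x.c + ||c||^2 to avoid a large broadcast.
dists = (
    (frames ** 2).sum(axis=1)[:, None]
    - 2.0 * frames @ centroids.T
    + (centroids ** 2).sum(axis=1)[None, :]
)
units = dists.argmin(axis=1)  # one discrete unit id per frame, in [0, 256)
```

This argmin over distances is the same assignment that scikit-learn's `KMeans.predict` performs, so `units` is the discrete-unit sequence a quantizer like `km256-it2-l11.joblib` would produce.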

## Citing

```bibtex
@misc{poli2026discophon,
  title={{DiscoPhon}: Benchmarking the Unsupervised Discovery of Phoneme Inventories With Discrete Speech Units},
  author={Maxime Poli and Manel Khentout and Angelo Ortiz Tandazo and Ewan Dunbar and Emmanuel Chemla and Emmanuel Dupoux},
  year={2026},
  eprint={2603.18612},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2603.18612},
}
```