	---
	license: cc-by-nc-sa-4.0
	language:
	- bg
	- cs
	- da
	- el
	- es
	- et
	- fi
	- hr
	- hu
	- it
	- lt
	- lv
	- mt
	- nl
	- pl
	- pt
	- ro
	- sk
	- sl
	- sv
	---

	# HuBERT VP-20

	HuBERT VP-20 is a HuBERT base model pretrained on a 6k-hour subset of VoxPopuli covering 20 languages
	(all EU languages except English, French, and German) for the [DiscoPhon benchmark](https://benchmarks.cognitive-ml.fr/discophon).
	It was pretrained using the [`minimal_hubert`](https://github.com/mxmpl/minimal_hubert) library.

	You can load it with HuggingFace Transformers:

	```python
	from transformers import HubertModel

	model = HubertModel.from_pretrained("coml/hubert-base-vp20")
	```

	Or with `minimal_hubert`:

	```python
	from minimal_hubert import HuBERT, HuBERTPretrain

	# Standard model
	model = HuBERT.from_pretrained("coml/hubert-base-vp20")
	# With pretraining head for classification
	model_for_pretraining = HuBERTPretrain.from_pretrained("https://huggingface.co/coml/hubert-base-vp20/resolve/main/it2.pt")
	```

	Check out [`minimal_hubert`](https://github.com/mxmpl/minimal_hubert) if you are interested in pretraining or want
	to load HuBERT checkpoints from different libraries.
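
Once the model is loaded with Transformers, intermediate-layer features can be read from the `hidden_states` output. The helper below is a minimal sketch, not part of this repository: the function name `extract_layer_features` is hypothetical, and it assumes the input is a float tensor of raw mono audio at 16 kHz with shape `(batch, samples)`.

```python
import torch
from transformers import HubertModel


def extract_layer_features(model: HubertModel, waveform: torch.Tensor, layer: int) -> torch.Tensor:
    """Return hidden states of shape (batch, frames, hidden_size) for one layer.

    Index 0 of `hidden_states` is the input to the first transformer layer,
    so `layer=11` selects the output of the 11th transformer layer.
    """
    model.eval()  # disable dropout for deterministic features
    with torch.no_grad():
        outputs = model(waveform, output_hidden_states=True)
    return outputs.hidden_states[layer]
```

For example, with `model = HubertModel.from_pretrained("coml/hubert-base-vp20")` and a placeholder input such as `torch.zeros(1, 16000)`, calling `extract_layer_features(model, waveform, layer=11)` should return one feature vector per frame.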

	## Files

	- `model.safetensors` and `config.json`: HuggingFace Transformers checkpoint and config.
	- `it1.pt`: 1st iteration checkpoint.
	- `it2.pt`: 2nd iteration checkpoint. Converted to a HuggingFace `state_dict` to produce `model.safetensors`.
	- `km100-mfcc.joblib`: K-means trained on MFCCs of VoxPopuli-20. Used to train the 1st iteration.
	- `km500-it1-l10.joblib`: K-means trained on features from the 10th layer of the 1st iteration model. Used to train the 2nd iteration.
	- `km256-it2-l11.joblib`: K-means trained on features from the 11th layer of the 2nd iteration model. Used for DiscoPhon finetuning.
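
As an illustration of how the k-means files relate to model features, the sketch below fetches one of them from the Hub and assigns each feature frame to its nearest centroid. It rests on assumptions not stated in this card: that the `.joblib` files were saved with `joblib.dump`, and that the centroids are available as a `(n_units, dim)` array. The names `load_kmeans` and `assign_units` are hypothetical.

```python
import numpy as np


def load_kmeans(filename: str = "km256-it2-l11.joblib", repo_id: str = "coml/hubert-base-vp20"):
    """Download a k-means file from the Hub and load it with joblib.

    Assumption: the file was saved with `joblib.dump`; the type of the
    returned object is not documented in this card.
    """
    # Imported here so that `assign_units` below only needs NumPy.
    import joblib
    from huggingface_hub import hf_hub_download

    return joblib.load(hf_hub_download(repo_id=repo_id, filename=filename))


def assign_units(features: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Map each frame to the index of its nearest centroid (squared Euclidean).

    features: (frames, dim), centroids: (n_units, dim) -> (frames,) int array.
    """
    # ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2; ||x||^2 is constant per frame,
    # so it can be dropped from the argmin.
    scores = -2.0 * features @ centroids.T + np.sum(centroids**2, axis=1)
    return np.argmin(scores, axis=1)
```

If the loaded object turns out to be a scikit-learn `KMeans` estimator, its `cluster_centers_` attribute would supply `centroids`, and its `predict` method would give the same assignment directly. For `km256-it2-l11.joblib`, the features would come from the 11th layer of the 2nd iteration model.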

	## Citing

	```bibtex
	@misc{poli2026discophon,
	title={{DiscoPhon}: Benchmarking the Unsupervised Discovery of Phoneme Inventories With Discrete Speech Units},
	author={Maxime Poli and Manel Khentout and Angelo Ortiz Tandazo and Ewan Dunbar and Emmanuel Chemla and Emmanuel Dupoux},
	year={2026},
	eprint={2603.18612},
	archivePrefix={arXiv},
	primaryClass={cs.CL},
	url={https://arxiv.org/abs/2603.18612},
	}
	```