	---
	license: mit
	library_name: transformers
	pipeline_tag: audio-to-audio
	---

	# WavCube

	WavCube is a 128-dimensional, 50 Hz continuous speech representation that unifies speech understanding, reconstruction, and generation in a single space. It was introduced in the paper [WavCube: Unifying Speech Representation for Understanding and Generation via Semantic-Acoustic Joint Modeling](https://huggingface.co/papers/2605.06407).

	- Code: [GitHub Repository](https://github.com/yanghaha0908/WavCube)
	- Paper: [arXiv:2605.06407](https://arxiv.org/abs/2605.06407)
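To make the "128-dimensional, 50 Hz" description concrete, the sketch below estimates how large the representation is for a clip of a given length. Only the two constants come from this card. The helper names are hypothetical, the `(frames, dims)` axis order is an assumption, and the exact frame count will depend on how the model pads its input, which is not stated here.

```python
import math

FRAME_RATE_HZ = 50  # frames per second, as stated in this model card
FEATURE_DIM = 128   # values per frame, as stated in this model card


def expected_feature_shape(duration_s: float) -> tuple[int, int]:
    """Approximate (frames, dims) for a clip, assuming no extra padding frames."""
    frames = math.floor(duration_s * FRAME_RATE_HZ)
    return frames, FEATURE_DIM


def values_per_second() -> int:
    """Number of continuous values emitted for each second of speech."""
    return FRAME_RATE_HZ * FEATURE_DIM


print(expected_feature_shape(3.0))  # (150, 128)
print(values_per_second())          # 6400
```

Under these assumptions, a 3-second utterance maps to roughly 150 frames of 128 values each.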

	## Usage

	Before using the model, install the requirements listed in the [official repository](https://github.com/yanghaha0908/WavCube).

	### Extract Representation from Speech
	To extract continuous representations from a raw WAV file, run:

	```bash
	python wav_to_feature.py \
	--audio 19_198_000000_000002.wav \
	--config configs/WavCube-stage2.yaml \
	--ckpt WavCube/checkpoints/vocos_checkpoint_epoch=177_step=195000_val_loss=3.3080.ckpt \
	--output 19_198_000000_000002.pt
	```
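The command above handles one file. For a folder of recordings, one option is to generate the same command once per file. The sketch below is a hypothetical helper, not part of the repository: it reuses only the flags shown above and assumes each output should sit next to its input with a `.pt` suffix, as in the example file names.

```python
import shlex
from pathlib import Path

# Values copied from the command above; adjust them to your checkout.
CONFIG = "configs/WavCube-stage2.yaml"
CKPT = "WavCube/checkpoints/vocos_checkpoint_epoch=177_step=195000_val_loss=3.3080.ckpt"


def extraction_command(audio: Path, config: str = CONFIG, ckpt: str = CKPT) -> list[str]:
    """Build the wav_to_feature.py argument list for a single audio file."""
    return [
        "python", "wav_to_feature.py",
        "--audio", str(audio),
        "--config", config,
        "--ckpt", ckpt,
        "--output", str(audio.with_suffix(".pt")),  # assumed naming: same stem, .pt suffix
    ]


def extraction_commands(audio_dir: Path) -> list[list[str]]:
    """Build one command per .wav file found directly inside audio_dir."""
    return [extraction_command(p) for p in sorted(audio_dir.glob("*.wav"))]


# Print the command for the example file so it can be checked before running.
print(shlex.join(extraction_command(Path("19_198_000000_000002.wav"))))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root to actually run the extraction.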

	### Reconstruct Speech from Representation
	To reconstruct a waveform from saved representations, run:

	```bash
	python feature_to_wav.py \
	--feature 19_198_000000_000002.pt \
	--config configs/WavCube-stage2.yaml \
	--ckpt WavCube/checkpoints/vocos_checkpoint_epoch=177_step=195000_val_loss=3.3080.ckpt
	```
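Extraction and reconstruction can be chained into a round trip, which is a quick way to check by ear what the representation preserves. The sketch below is a hypothetical wrapper that uses only the flags shown in this card. It assumes both scripts are run from the repository root, and it does not guess where `feature_to_wav.py` writes its audio, since the card does not say.

```python
import subprocess
from pathlib import Path


def round_trip_commands(audio: str, config: str, ckpt: str) -> tuple[list[str], list[str]]:
    """Return the (extract, reconstruct) argument lists for one audio file."""
    feature = str(Path(audio).with_suffix(".pt"))  # intermediate feature file
    shared = ["--config", config, "--ckpt", ckpt]  # both steps use the same config and checkpoint
    extract = ["python", "wav_to_feature.py", "--audio", audio, *shared, "--output", feature]
    reconstruct = ["python", "feature_to_wav.py", "--feature", feature, *shared]
    return extract, reconstruct


def run_round_trip(audio: str, config: str, ckpt: str) -> None:
    """Run extraction, then reconstruction; a failing step raises CalledProcessError."""
    for cmd in round_trip_commands(audio, config, ckpt):
        subprocess.run(cmd, check=True)
```

Calling `run_round_trip("19_198_000000_000002.wav", "configs/WavCube-stage2.yaml", "<path to checkpoint>")` would run both steps in order. Using `check=True` means reconstruction is skipped if extraction fails.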

	## Citation

	```bibtex
	@misc{yang2025wavcube,
	title={WavCube: Unifying Speech Representation for Understanding and Generation via Semantic-Acoustic Joint Modeling},
	author={Haohan Yang and others},
	year={2026},
	eprint={2605.06407},
	archivePrefix={arXiv},
	primaryClass={cs.SD},
	url={https://arxiv.org/abs/2605.06407},
	}
	```