---
license: mit
library_name: transformers
pipeline_tag: audio-to-audio
---

# WavCube

WavCube is a 128-dimensional, 50 Hz continuous speech representation that unifies speech understanding, reconstruction, and generation within a single space. It is presented in the paper [WavCube: Unifying Speech Representation for Understanding and Generation via Semantic-Acoustic Joint Modeling](https://huggingface.co/papers/2605.06407).

- **Code:** [GitHub Repository](https://github.com/yanghaha0908/WavCube)
- **Paper:** [arXiv:2605.06407](https://arxiv.org/abs/2605.06407)

## Usage

Before using the model, install the requirements as described in the [official repository](https://github.com/yanghaha0908/WavCube).

### Extract Representation from Speech

To extract continuous representations from a raw WAV file, run:

```bash
python wav_to_feature.py \
    --audio 19_198_000000_000002.wav \
    --config configs/WavCube-stage2.yaml \
    --ckpt WavCube/checkpoints/vocos_checkpoint_epoch=177_step=195000_val_loss=3.3080.ckpt \
    --output 19_198_000000_000002.pt
```

### Reconstruct Speech from Representation

To reconstruct a waveform from saved representations, run:

```bash
python feature_to_wav.py \
    --feature 19_198_000000_000002.pt \
    --config configs/WavCube-stage2.yaml \
    --ckpt WavCube/checkpoints/vocos_checkpoint_epoch=177_step=195000_val_loss=3.3080.ckpt
```

## Citation

```bibtex
@misc{yang2025wavcube,
  title={WavCube: Unifying Speech Representation for Understanding and Generation via Semantic-Acoustic Joint Modeling},
  author={Haohan Yang and others},
  year={2025},
  eprint={2605.06407},
  archivePrefix={arXiv},
  primaryClass={cs.SD},
  url={https://arxiv.org/abs/2605.06407},
}
```
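
## Representation Size

Because WavCube features are 128-dimensional at 50 Hz, the size of an extracted sequence follows directly from the clip duration. The snippet below is a hypothetical helper, not part of the WavCube repository: it assumes exactly one frame per 20 ms with no padding and float32 storage, so the frame count actually written by `wav_to_feature.py` may differ slightly, and the on-disk layout of the `.pt` file is not described here.

```python
# Hypothetical helper (not part of the WavCube repository).
# Estimates feature-sequence size from the properties stated in this card.
FRAME_RATE_HZ = 50   # frames per second, as stated in the model description
FEATURE_DIM = 128    # values per frame, as stated in the model description


def expected_num_frames(duration_s: float) -> int:
    """Approximate frame count; assumes no padding, so real output may differ by a frame."""
    return round(duration_s * FRAME_RATE_HZ)


def feature_size(duration_s: float, bytes_per_value: int = 4) -> tuple[int, int, int]:
    """Return (frames, total values, bytes), assuming 4-byte float32 values by default."""
    frames = expected_num_frames(duration_s)
    values = frames * FEATURE_DIM
    return frames, values, values * bytes_per_value


if __name__ == "__main__":
    frames, values, nbytes = feature_size(10.0)
    print(frames, values, nbytes)  # 500 64000 256000
```

For a 10 s clip this gives 500 frames of 128 values each, i.e. 64,000 numbers or about 256 KB in float32.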