---
license: mit
library_name: transformers
pipeline_tag: audio-to-audio
---
# WavCube
WavCube is a 128-dimensional, 50 Hz continuous speech representation that unifies speech understanding, reconstruction, and generation within a single space. It is presented in the paper [WavCube: Unifying Speech Representation for Understanding and Generation via Semantic-Acoustic Joint Modeling](https://huggingface.co/papers/2605.06407).
- **Code:** [GitHub Repository](https://github.com/yanghaha0908/WavCube)
- **Paper:** [arXiv:2605.06407](https://arxiv.org/abs/2605.06407)
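
As a rough illustration of what a 128-dimensional, 50 Hz representation means in practice, the sketch below estimates the feature shape for a clip of a given length. It assumes exactly 50 frames per second with no padding or edge frames, which may not match the model's actual framing; `approx_feature_shape` is a hypothetical helper, not part of the WavCube code.

```python
import math

FRAME_RATE_HZ = 50   # frames per second, as stated in the description
FEATURE_DIM = 128    # values per frame, as stated in the description


def approx_feature_shape(duration_s: float) -> tuple[int, int]:
    """Approximate (frames, dims) for a clip.

    Assumption: exactly 50 frames per second, ignoring padding and edge effects.
    """
    if duration_s < 0:
        raise ValueError("duration_s must be non-negative")
    frames = math.floor(duration_s * FRAME_RATE_HZ)
    return frames, FEATURE_DIM


frames, dims = approx_feature_shape(10.0)
print(frames, dims, frames * dims)  # 500 128 64000
```

Under these assumptions, a 10-second clip maps to about 500 frames, or 64,000 values in total.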
## Usage
Before using the model, install the requirements described in the [official repository](https://github.com/yanghaha0908/WavCube).
### Extract Representation from Speech
Extract the continuous representation from a raw WAV file and save it as a `.pt` file with the following command:
```bash
python wav_to_feature.py \
--audio 19_198_000000_000002.wav \
--config configs/WavCube-stage2.yaml \
--ckpt WavCube/checkpoints/vocos_checkpoint_epoch=177_step=195000_val_loss=3.3080.ckpt \
--output 19_198_000000_000002.pt
```
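
To process a whole directory rather than a single file, the command above can be wrapped in a small driver. This is a minimal sketch, not part of the repository: `build_extract_command` and `extract_directory` are hypothetical helpers, and the sketch assumes that `wav_to_feature.py` accepts exactly the flags shown above and is run from the repository root.

```python
import subprocess
from pathlib import Path

# Paths copied from the command above; adjust to your checkout.
CONFIG = "configs/WavCube-stage2.yaml"
CKPT = "WavCube/checkpoints/vocos_checkpoint_epoch=177_step=195000_val_loss=3.3080.ckpt"


def build_extract_command(audio: Path, config: str = CONFIG, ckpt: str = CKPT) -> list[str]:
    """Build the wav_to_feature.py call for one file, writing <name>.pt next to it."""
    output = audio.with_suffix(".pt")
    return [
        "python", "wav_to_feature.py",
        "--audio", str(audio),
        "--config", config,
        "--ckpt", ckpt,
        "--output", str(output),
    ]


def extract_directory(wav_dir: str, dry_run: bool = True) -> list[list[str]]:
    """Build (and optionally run) one extraction command per .wav file in wav_dir."""
    commands = [build_extract_command(p) for p in sorted(Path(wav_dir).glob("*.wav"))]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))             # only show what would be run
        else:
            subprocess.run(cmd, check=True)  # stop on the first failing file
    return commands


# Example (hypothetical directory name):
# extract_directory("my_wavs", dry_run=True)
```

Keeping `dry_run=True` as the default lets you inspect the generated commands before launching any extraction.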
### Reconstruct Speech from Representation
Reconstruct the waveform from a saved representation with the following command:
```bash
python feature_to_wav.py \
--feature 19_198_000000_000002.pt \
--config configs/WavCube-stage2.yaml \
--ckpt WavCube/checkpoints/vocos_checkpoint_epoch=177_step=195000_val_loss=3.3080.ckpt
```
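
After reconstruction, a quick sanity check is to compare the duration of the reconstructed audio with the original. The sketch below uses only Python's standard `wave` module and therefore assumes both files are uncompressed PCM WAV. The function names and the tolerance are illustrative assumptions, and the path of the reconstructed file is left to you, since the command above does not specify it.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def durations_match(original: str, reconstructed: str, tolerance_s: float = 0.05) -> bool:
    """True if the two files differ in length by at most tolerance_s seconds."""
    diff = abs(wav_duration_seconds(original) - wav_duration_seconds(reconstructed))
    return diff <= tolerance_s


# Example (the second path is a placeholder for wherever your output is written):
# durations_match("19_198_000000_000002.wav", "path/to/reconstructed.wav")
```

A duration match does not say anything about audio quality; it only catches gross failures such as truncated or empty output.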
## Citation
```bibtex
@misc{yang2025wavcube,
title={WavCube: Unifying Speech Representation for Understanding and Generation via Semantic-Acoustic Joint Modeling},
author={Haohan Yang and others},
year={2025},
eprint={2605.06407},
archivePrefix={arXiv},
primaryClass={cs.SD},
url={https://arxiv.org/abs/2605.06407},
}
```