metadata
license: mit
library_name: transformers
pipeline_tag: audio-to-audio

WavCube

WavCube is a 128-dimensional, 50 Hz continuous representation that unifies speech understanding, reconstruction, and generation within a single space. It was introduced in the paper "WavCube: Unifying Speech Representation for Understanding and Generation via Semantic-Acoustic Joint Modeling".
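To make the 128-dimensional, 50 Hz figures concrete, here is a small sketch that estimates the feature size for a clip of a given duration. The two constants come from the description above; the `(frames, dims)` layout and the round-up of partial frames are assumptions for illustration, and the actual framing and padding in the official code may differ.

```python
import math

FRAME_RATE_HZ = 50   # frames per second, from the model description
FEATURE_DIM = 128    # values per frame, from the model description


def expected_feature_shape(duration_seconds: float) -> tuple[int, int]:
    """Estimate (frames, dims) for a clip.

    Assumption: partial frames are rounded up and the layout is
    (frames, dims); the real extractor may pad or order axes differently.
    """
    frames = math.ceil(duration_seconds * FRAME_RATE_HZ)
    return frames, FEATURE_DIM


def values_per_second() -> int:
    """Number of scalar values used to represent one second of speech."""
    return FRAME_RATE_HZ * FEATURE_DIM


if __name__ == "__main__":
    print(expected_feature_shape(3.0))
    print(values_per_second())
```

Under these assumptions, each second of speech is represented by 50 frames of 128 values each.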

Usage

Before using the model, install the requirements listed in the official repository.

Extract Representation from Speech

To extract the continuous representation from a raw WAV file, run:

python wav_to_feature.py \
    --audio 19_198_000000_000002.wav \
    --config configs/WavCube-stage2.yaml \
    --ckpt WavCube/checkpoints/vocos_checkpoint_epoch=177_step=195000_val_loss=3.3080.ckpt \
    --output 19_198_000000_000002.pt
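For more than one file, the same command can be assembled programmatically. The sketch below only builds the argument list, using the flags shown above (`--audio`, `--config`, `--ckpt`, `--output`). The helper name `build_extract_command` and the convention of writing each `.pt` file next to its `.wav` are hypothetical, not part of the official repository.

```python
from pathlib import Path

# Paths copied from the command above; adjust them to your checkout.
CONFIG = "configs/WavCube-stage2.yaml"
CKPT = (
    "WavCube/checkpoints/"
    "vocos_checkpoint_epoch=177_step=195000_val_loss=3.3080.ckpt"
)


def build_extract_command(audio_path, config=CONFIG, ckpt=CKPT):
    """Build the wav_to_feature.py argument list for one audio file.

    Hypothetical helper: the output path is the input path with its
    suffix replaced by .pt (a naming choice made for this sketch).
    """
    audio = Path(audio_path)
    output = audio.with_suffix(".pt")
    return [
        "python", "wav_to_feature.py",
        "--audio", str(audio),
        "--config", config,
        "--ckpt", ckpt,
        "--output", str(output),
    ]


if __name__ == "__main__":
    # Print the command rather than running it, so the sketch has no side effects.
    print(" ".join(build_extract_command("19_198_000000_000002.wav")))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root to run the extraction for that file.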

Reconstruct Speech from Representation

To reconstruct a waveform from a saved representation, run:

python feature_to_wav.py \
    --feature 19_198_000000_000002.pt \
    --config configs/WavCube-stage2.yaml \
    --ckpt WavCube/checkpoints/vocos_checkpoint_epoch=177_step=195000_val_loss=3.3080.ckpt

Citation

@misc{yang2025wavcube,
      title={WavCube: Unifying Speech Representation for Understanding and Generation via Semantic-Acoustic Joint Modeling},
      author={Haohan Yang and others},
      year={2025},
      eprint={2605.06407},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2605.06407},
}