license: mit
library_name: transformers
pipeline_tag: audio-to-audio

WavCube

WavCube is a 128-dimensional continuous speech representation with a 50 Hz frame rate that unifies speech understanding, reconstruction, and generation within a single space. It is presented in the paper "WavCube: Unifying Speech Representation for Understanding and Generation via Semantic-Acoustic Joint Modeling".
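
To make the stated dimensions concrete, the sketch below estimates the feature shape for a clip of a given duration from the two figures in the description (128 dimensions, 50 Hz). The helper name `expected_feature_shape`, the floor rounding, and the `(frames, dims)` axis order are assumptions for illustration; the model's actual framing at clip boundaries and its tensor layout may differ.

```python
import math

FRAME_RATE_HZ = 50   # frames per second, from the model description
FEATURE_DIM = 128    # values per frame, from the model description


def expected_feature_shape(duration_s: float) -> tuple[int, int]:
    """Approximate (frames, dims) for a clip lasting `duration_s` seconds.

    Hypothetical helper: floor rounding and axis order are assumptions,
    not documented behaviour of the model.
    """
    if duration_s < 0:
        raise ValueError("duration must be non-negative")
    frames = math.floor(duration_s * FRAME_RATE_HZ)
    return frames, FEATURE_DIM


print(expected_feature_shape(10.0))  # (500, 128)
```

Under these assumptions, a 10-second utterance maps to roughly 500 frames of 128 values each.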

Code: GitHub Repository
Paper: arXiv:2605.06407

Usage

Before using the model, install the requirements as described in the official repository.

Extract Representation from Speech

You can extract continuous representations from a raw WAV file with the following command:

python wav_to_feature.py \
    --audio 19_198_000000_000002.wav \
    --config configs/WavCube-stage2.yaml \
    --ckpt WavCube/checkpoints/vocos_checkpoint_epoch=177_step=195000_val_loss=3.3080.ckpt \
    --output 19_198_000000_000002.pt
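
For more than one file, the same command can be generated per input. The following is a minimal sketch that only builds the argument lists, using exactly the flags shown above; the helper name `extraction_commands` and the directory names passed to it are hypothetical, and it assumes each feature file should be named after its WAV file.

```python
from pathlib import Path

# Paths copied from the command above.
CONFIG = "configs/WavCube-stage2.yaml"
CKPT = "WavCube/checkpoints/vocos_checkpoint_epoch=177_step=195000_val_loss=3.3080.ckpt"


def extraction_commands(wav_dir: str, out_dir: str) -> list[list[str]]:
    """Build one wav_to_feature.py invocation per .wav file in `wav_dir`.

    Hypothetical helper: it does not run anything, it only returns argv lists.
    """
    commands = []
    for wav in sorted(Path(wav_dir).glob("*.wav")):
        feature = Path(out_dir) / (wav.stem + ".pt")  # same stem, .pt suffix
        commands.append([
            "python", "wav_to_feature.py",
            "--audio", str(wav),
            "--config", CONFIG,
            "--ckpt", CKPT,
            "--output", str(feature),
        ])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root, so that `wav_to_feature.py` and the relative config path resolve.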

Reconstruct Speech from Representation

You can reconstruct a waveform from the extracted representations with the following command:

python feature_to_wav.py \
    --feature 19_198_000000_000002.pt \
    --config configs/WavCube-stage2.yaml \
    --ckpt WavCube/checkpoints/vocos_checkpoint_epoch=177_step=195000_val_loss=3.3080.ckpt
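
A quick sanity check after reconstruction is to compare the durations of the original and reconstructed files. The sketch below does this with Python's standard `wave` module, assuming both files are PCM WAV. The helper names and the 0.05 s tolerance are hypothetical, and the command above does not show where `feature_to_wav.py` writes its output, so the reconstructed path has to be supplied by the caller.

```python
import wave


def wav_duration_s(path: str) -> float:
    """Duration in seconds of a PCM WAV file."""
    with wave.open(path, "rb") as f:
        return f.getnframes() / f.getframerate()


def durations_match(original: str, reconstructed: str, tol_s: float = 0.05) -> bool:
    """True if the two files differ in length by at most `tol_s` seconds.

    The tolerance is an assumed value: reconstruction may pad or trim
    a few frames at the clip boundaries.
    """
    difference = abs(wav_duration_s(original) - wav_duration_s(reconstructed))
    return difference <= tol_s
```

Because each duration is computed from the file's own sample rate, the check still works if the reconstructed audio uses a different sample rate than the input.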

Citation

@misc{yang2025wavcube,
      title={WavCube: Unifying Speech Representation for Understanding and Generation via Semantic-Acoustic Joint Modeling},
      author={Haohan Yang and others},
      year={2025},
      eprint={2605.06407},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2605.06407},
}