Add model card and metadata

This PR adds a model card for WavCube, which includes:
- Metadata for the `audio-to-audio` pipeline, `transformers` library, and `mit` license.
- Links to the research paper "[WavCube: Unifying Speech Representation for Understanding and Generation via Semantic-Acoustic Joint Modeling](https://huggingface.co/papers/2605.06407)" and the official GitHub repository.
- Sample usage instructions, based on the GitHub README, for extracting speech representations and reconstructing waveforms.

Files changed (1)

README.md +51 -0

README.md ADDED

	@@ -0,0 +1,51 @@

+---
+license: mit
+library_name: transformers
+pipeline_tag: audio-to-audio
+---
+# WavCube
+WavCube is a 128-dimensional, 50 Hz continuous representation that unifies speech understanding, reconstruction, and generation in a single space. It is presented in the paper [WavCube: Unifying Speech Representation for Understanding and Generation via Semantic-Acoustic Joint Modeling](https://huggingface.co/papers/2605.06407).
+- **Code:** [GitHub Repository](https://github.com/yanghaha0908/WavCube)
+- **Paper:** [arXiv:2605.06407](https://arxiv.org/abs/2605.06407)
+## Usage
+Before using the model, install the requirements as described in the [official repository](https://github.com/yanghaha0908/WavCube).
+### Extract Representation from Speech
+To extract a continuous representation from a raw WAV file, run:
+```bash
+python wav_to_feature.py \
+    --audio 19_198_000000_000002.wav \
+    --config configs/WavCube-stage2.yaml \
+    --ckpt WavCube/checkpoints/vocos_checkpoint_epoch=177_step=195000_val_loss=3.3080.ckpt \
+    --output 19_198_000000_000002.pt
+```
+### Reconstruct Speech from Representation
+To reconstruct a waveform from a saved representation, run:
+```bash
+python feature_to_wav.py \
+    --feature 19_198_000000_000002.pt \
+    --config configs/WavCube-stage2.yaml \
+    --ckpt WavCube/checkpoints/vocos_checkpoint_epoch=177_step=195000_val_loss=3.3080.ckpt
+```
+## Citation
+```bibtex
+@misc{yang2026wavcube,
+      title={WavCube: Unifying Speech Representation for Understanding and Generation via Semantic-Acoustic Joint Modeling},
+      author={Haohan Yang and others},
+      year={2026},
+      eprint={2605.06407},
+      archivePrefix={arXiv},
+      primaryClass={cs.SD},
+      url={https://arxiv.org/abs/2605.06407},
+}
+```
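
As a rough sanity check on the extraction step in the README above, the 128-dim and 50 Hz figures from the model card imply approximately how large a saved representation should be for a given clip. The sketch below is illustrative only: the function name, the `(frames, dim)` layout, and the 16 kHz sample rate in the usage line are assumptions, not taken from the WavCube code, and the real frame count may differ by a few frames depending on how the model pads or trims its input.

```python
from typing import Tuple

FRAME_RATE_HZ = 50  # from the model card: 50 Hz representation
FEATURE_DIM = 128   # from the model card: 128-dim features


def expected_feature_shape(num_samples: int, sample_rate: int) -> Tuple[int, int]:
    """Estimate (frames, dim) for a clip, ignoring any padding the model applies.

    Hypothetical helper for illustration; not part of the WavCube repository.
    """
    if num_samples < 0 or sample_rate <= 0:
        raise ValueError("num_samples must be >= 0 and sample_rate must be > 0")
    # Integer arithmetic avoids floating-point rounding on the duration.
    num_frames = (num_samples * FRAME_RATE_HZ) // sample_rate
    return num_frames, FEATURE_DIM


# A 3-second clip at an assumed 16 kHz sample rate:
print(expected_feature_shape(48000, 16000))  # (150, 128)
```

Comparing this estimate with the shape of the tensor written by `wav_to_feature.py` is a quick way to spot a wrong sample rate or a truncated file, provided the saved layout matches the assumption above.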