---
title: Vox Upscaler
emoji: 🔊
colorFrom: red
colorTo: yellow
sdk: static
pinned: false
license: apache-2.0
short_description: 16kHz → 48kHz audio upscaling in your browser (WebGPU/CPU)
models:
  - openbmb/VoxCPM2
custom_headers:
  cross-origin-embedder-policy: require-corp
  cross-origin-opener-policy: same-origin
  cross-origin-resource-policy: cross-origin
---

# Vox Upscaler

Browser-based 16 kHz → 48 kHz audio upscaling using the VoxCPM2 streaming VAE. Runs on WebGPU when available and falls back to CPU (WASM) otherwise.

## Usage

Drop in an audio file, click **Upscale to 48 kHz**, and download the result.

## How it works

The VoxCPM2 VAE encodes audio into a latent space and decodes it at 48 kHz. The ONNX model processes audio in streaming chunks with explicit state passing: there is no autoregressive loop, just a single forward pass per chunk.

- **WebGPU**: fp32 model, 5s chunks
- **CPU (WASM)**: fp32 model, 1s chunks, multi-threaded

## Files

```
vox-upscaler-web/
├── index.html          # Web UI (self-contained)
├── server.py           # Dev server with COOP/COEP headers
├── LICENSE             # License
└── onnx/
    ├── vae_stream.onnx # Streaming VAE (fp32, ~376 MB)
    └── meta.json       # State sizes for runtime
```

## Running locally

```bash
python server.py 8080
# open http://localhost:8080/
```

The COOP/COEP headers are required for SharedArrayBuffer, which WASM multi-threading depends on.

## Credits

- **VAE model**: [VoxCPM2](https://huggingface.co/openbmb/VoxCPM2/blob/main/audiovae.pth) by [OpenBMB](https://huggingface.co/openbmb) (Apache-2.0)
- **WebGPU port**: [KevinAHM](https://huggingface.co/KevinAHM)
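
## Sketch: streaming with explicit state

As a rough illustration of the chunked processing described under "How it works", the sketch below threads a state value through one forward pass per chunk. It is not the code in `index.html`: `run_streaming` and `toy_step` are hypothetical names, and a one-pole low-pass filter stands in for the ONNX model, whose real state is a set of tensors sized according to `meta.json`. The point it shows is that, with state carried across chunk boundaries, chunked output matches a single pass over the whole signal.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in: the real model carries tensors, not a single float.
State = float
Step = Callable[[Sequence[float], State], Tuple[List[float], State]]


def run_streaming(
    samples: Sequence[float], chunk_len: int, step: Step, state: State
) -> Tuple[List[float], State]:
    """Feed `samples` to `step` in fixed-size chunks, passing state along."""
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")
    out: List[float] = []
    for start in range(0, len(samples), chunk_len):
        chunk = samples[start:start + chunk_len]
        # One forward pass per chunk; the returned state feeds the next chunk.
        chunk_out, state = step(chunk, state)
        out.extend(chunk_out)
    return out, state


def toy_step(chunk: Sequence[float], state: State) -> Tuple[List[float], State]:
    """Toy replacement for the model's forward pass: a one-pole low-pass filter."""
    out: List[float] = []
    for x in chunk:
        state = 0.5 * state + 0.5 * x
        out.append(state)
    return out, state


signal = [float(i % 7) for i in range(50)]
whole, _ = run_streaming(signal, len(signal), toy_step, 0.0)
chunked, _ = run_streaming(signal, 8, toy_step, 0.0)
assert whole == chunked  # chunk size does not change the result
```

Under this scheme the chunk length only trades latency against per-call overhead, which is consistent with the two backends using different chunk sizes.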