metadata
title: Vox Upscaler
emoji: π
colorFrom: red
colorTo: yellow
sdk: static
pinned: false
license: apache-2.0
short_description: 16kHz → 48kHz audio upscaling in your browser (WebGPU/CPU)
models:
- openbmb/VoxCPM2
custom_headers:
cross-origin-embedder-policy: require-corp
cross-origin-opener-policy: same-origin
cross-origin-resource-policy: cross-origin
Vox Upscaler
Browser-based 16kHz → 48kHz audio upscaling using the VoxCPM2 streaming VAE. It runs on WebGPU when available and falls back to CPU (WASM) otherwise.
Usage
Drop in an audio file, click Upscale to 48 kHz, download the result.
How it works
The VoxCPM2 VAE encodes audio into a latent space and decodes it at 48kHz. The ONNX model processes audio in streaming chunks with explicit state passing: there is no autoregressive loop, just a single forward pass per chunk.
- WebGPU: fp32 model, 5s chunks
- CPU (WASM): fp32 model, 1s chunks, multi-threaded
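The chunked, state-passing loop described above can be sketched in Python as follows. This is only an illustration of the control flow, not the Space's actual browser code: `upscale_streaming`, `run_chunk`, and `dummy_run_chunk` are hypothetical names, the state is reduced to a simple call counter, and the stand-in "model" just repeats each sample three times to mimic the 16kHz to 48kHz length change. How the real model handles a short final chunk is not stated here, so the sketch passes it through unpadded.

```python
from typing import Callable, List, Sequence, Tuple

Chunk = List[float]
# Hypothetical signature for one forward pass:
# (input chunk, state) -> (output chunk, new state).
RunChunk = Callable[[Chunk, object], Tuple[Chunk, object]]


def upscale_streaming(samples: Sequence[float], chunk_len: int,
                      run_chunk: RunChunk, init_state: object) -> List[float]:
    """Process `samples` chunk by chunk, threading the state through each call."""
    out: List[float] = []
    state = init_state
    for start in range(0, len(samples), chunk_len):
        chunk = list(samples[start:start + chunk_len])
        # One forward pass per chunk; the returned state feeds the next call.
        decoded, state = run_chunk(chunk, state)
        out.extend(decoded)
    return out


def dummy_run_chunk(chunk: Chunk, state: object) -> Tuple[Chunk, object]:
    """Stand-in for the ONNX session: repeats each sample 3 times."""
    calls = int(state) + 1  # the "state" here is just a call counter
    return [s for s in chunk for _ in range(3)], calls
```

With 16kHz input, the chunk sizes listed above would correspond to `chunk_len = 5 * 16000` on WebGPU and `chunk_len = 16000` on CPU, for example `upscale_streaming(audio, 16000, dummy_run_chunk, 0)`.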
Files
vox-upscaler-web/
├── index.html          # Web UI (self-contained)
├── server.py           # Dev server with COOP/COEP headers
├── LICENSE             # License
└── onnx/
    ├── vae_stream.onnx # Streaming VAE (fp32, ~376 MB)
    └── meta.json       # State sizes for runtime
Running locally
python server.py 8080
# open http://localhost:8080/
The COOP/COEP headers are required for SharedArrayBuffer (WASM multi-threading).
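For readers who want to see what such a dev server involves, here is a minimal sketch using only the Python standard library. It is not the contents of the repository's `server.py`, which may differ; `ISOLATION_HEADERS`, `IsolatedHandler`, and `make_server` are names chosen for this example. The header values mirror the `custom_headers` listed in the Space metadata.

```python
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

# Headers that make the page cross-origin isolated, which browsers
# require before exposing SharedArrayBuffer.
ISOLATION_HEADERS = {
    "Cross-Origin-Opener-Policy": "same-origin",
    "Cross-Origin-Embedder-Policy": "require-corp",
    "Cross-Origin-Resource-Policy": "cross-origin",
}


class IsolatedHandler(SimpleHTTPRequestHandler):
    """Static file handler that adds the isolation headers to every response."""

    def end_headers(self):
        for name, value in ISOLATION_HEADERS.items():
            self.send_header(name, value)
        super().end_headers()


def make_server(port: int = 8080) -> ThreadingHTTPServer:
    # Bind to localhost only; this is meant for local development.
    return ThreadingHTTPServer(("127.0.0.1", port), IsolatedHandler)

# To serve the current directory: make_server(8080).serve_forever()
```

Overriding `end_headers` applies the headers to every response, including error pages, so the model files under `onnx/` are served with the same policy as `index.html`.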