Kevin Knoedler
metadata
title: Vox Upscaler
emoji: 🔊
colorFrom: red
colorTo: yellow
sdk: static
pinned: false
license: apache-2.0
short_description: 16kHz → 48kHz audio upscaling in your browser (WebGPU/CPU)
models:
  - openbmb/VoxCPM2
custom_headers:
  cross-origin-embedder-policy: require-corp
  cross-origin-opener-policy: same-origin
  cross-origin-resource-policy: cross-origin

Vox Upscaler

Browser-based 16kHz → 48kHz audio upscaling using the VoxCPM2 streaming VAE. Runs with WebGPU when available and falls back to CPU (WASM) otherwise.

Usage

Drop in an audio file, click Upscale to 48 kHz, download the result.

How it works

The VoxCPM2 VAE encodes audio to a latent space and decodes at 48kHz. The ONNX model processes audio in streaming chunks with explicit state passing: there is no autoregressive loop, just a single forward pass per chunk.

  • WebGPU: fp32 model, 5s chunks
  • CPU (WASM): fp32 model, 1s chunks, multi-threaded
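
The chunked loop with explicit state passing can be sketched roughly as follows. This is an illustration in Python, not the code in index.html: `dummy_model` is a hypothetical stand-in for one forward pass of the ONNX model, its one-element state is invented for the example (the real state sizes come from meta.json), and passing a shorter final chunk through unpadded is an assumption.

```python
from typing import Callable, List, Optional, Tuple

INPUT_RATE = 16_000   # Hz, input sample rate from the description above
OUTPUT_RATE = 48_000  # Hz, output sample rate from the description above

# Chunk length per backend, in seconds (from the list above).
CHUNK_SECONDS = {"webgpu": 5, "wasm": 1}

State = List[float]
Model = Callable[[List[float], State], Tuple[List[float], State]]


def dummy_model(chunk: List[float], state: State) -> Tuple[List[float], State]:
    """Hypothetical stand-in for a single forward pass of the streaming VAE.

    It repeats every sample three times (16 kHz -> 48 kHz) and keeps the last
    input sample as its "state". The real model is a neural network whose
    state layout is not reproduced here.
    """
    ratio = OUTPUT_RATE // INPUT_RATE
    out = [sample for sample in chunk for _ in range(ratio)]
    new_state = [chunk[-1]] if chunk else state
    return out, new_state


def upscale_streaming(
    samples: List[float],
    backend: str,
    model: Model = dummy_model,
    init_state: Optional[State] = None,
) -> Tuple[List[float], State]:
    """Run one forward pass per chunk, threading the state between calls."""
    chunk_len = CHUNK_SECONDS[backend] * INPUT_RATE
    state: State = list(init_state) if init_state is not None else [0.0]
    output: List[float] = []
    for start in range(0, len(samples), chunk_len):
        chunk = samples[start:start + chunk_len]
        # The state returned by one chunk is the input state of the next,
        # so no chunk needs to see earlier audio directly.
        out, state = model(chunk, state)
        output.extend(out)
    return output, state
```

With this sketch, 2.5 s of input (40,000 samples) is split into three chunks on the `wasm` setting and fits in a single chunk on the `webgpu` setting; either way the output holds 120,000 samples.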

Files

vox-upscaler-web/
├── index.html             # Web UI (self-contained)
├── server.py              # Dev server with COOP/COEP headers
├── LICENSE                # License
└── onnx/
    ├── vae_stream.onnx    # Streaming VAE (fp32, ~376 MB)
    └── meta.json          # State sizes for runtime

Running locally

python server.py 8080
# open http://localhost:8080/

The COOP/COEP headers are required for SharedArrayBuffer (WASM multi-threading).
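
For reference, a dev server that adds these headers can be very small. The sketch below is not the repository's server.py; it is a minimal standard-library example, and the names `IsolatedHandler` and `make_server` are made up for illustration. The header values mirror the `custom_headers` entries in the Space metadata.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class IsolatedHandler(SimpleHTTPRequestHandler):
    """Static file handler that adds cross-origin isolation headers."""

    def end_headers(self) -> None:
        # These two headers make the page cross-origin isolated, which
        # browsers require before exposing SharedArrayBuffer.
        self.send_header("Cross-Origin-Opener-Policy", "same-origin")
        self.send_header("Cross-Origin-Embedder-Policy", "require-corp")
        super().end_headers()


def make_server(port: int, directory: str = ".") -> ThreadingHTTPServer:
    """Build a server for `directory` on localhost (hypothetical helper)."""
    handler = partial(IsolatedHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Calling `make_server(8080).serve_forever()` from the repository root would serve index.html at http://localhost:8080/ with both headers set on every response.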

Credits