---
title: Vox Upscaler
emoji: 🔊
colorFrom: red
colorTo: yellow
sdk: static
pinned: false
license: apache-2.0
short_description: 16kHz → 48kHz audio upscaling in your browser (WebGPU/CPU)
models:
  - openbmb/VoxCPM2
custom_headers:
  cross-origin-embedder-policy: require-corp
  cross-origin-opener-policy: same-origin
  cross-origin-resource-policy: cross-origin
---

# Vox Upscaler

Browser-based 16 kHz → 48 kHz audio upscaling using the VoxCPM2 streaming VAE. It runs on WebGPU when available and falls back to CPU (WASM) otherwise.

## Usage

Drop in an audio file, click **Upscale to 48 kHz**, download the result.

## How it works

The VoxCPM2 VAE encodes audio into a latent space and decodes it at 48 kHz. The ONNX model processes audio in streaming chunks with explicit state passing: there is no autoregressive loop, just a single forward pass per chunk, with the state from one chunk passed into the next.

- **WebGPU**: fp32 model, 5s chunks
- **CPU (WASM)**: fp32 model, 1s chunks, multi-threaded
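
The chunked, state-passing loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_step` is a hypothetical stand-in for one forward pass of the ONNX model (it does simple linear interpolation, not VAE inference), and the names `upscale_streaming`, `SR_IN`, and `SR_OUT` are made up for this example. The real state layout is whatever `meta.json` describes and is not reproduced here.

```python
from typing import Callable, List, Tuple

SR_IN = 16_000   # input sample rate (Hz)
SR_OUT = 48_000  # output sample rate (Hz), 3x the input

# A step takes (chunk, state) and returns (output_chunk, new_state).
Step = Callable[[List[float], float], Tuple[List[float], float]]


def toy_step(chunk: List[float], state: float) -> Tuple[List[float], float]:
    """Hypothetical stand-in for one model forward pass (NOT the real VAE).

    Emits 3 output samples per input sample by interpolating from the
    previous sample. `state` is the last input sample of the prior chunk.
    """
    out: List[float] = []
    prev = state
    for x in chunk:
        out.extend(prev + (x - prev) * k / 3 for k in (1, 2, 3))
        prev = x
    return out, prev


def upscale_streaming(samples: List[float], chunk_seconds: float,
                      step: Step = toy_step,
                      init_state: float = 0.0) -> List[float]:
    """Run `step` once per chunk, threading the state between chunks."""
    chunk_len = int(chunk_seconds * SR_IN)
    state = init_state
    out: List[float] = []
    for start in range(0, len(samples), chunk_len):
        chunk = samples[start:start + chunk_len]
        y, state = step(chunk, state)  # single forward pass per chunk
        out.extend(y)
    return out
```

Because the state is carried explicitly, this toy version gives the same output for 1s and 5s chunks; the chunk size only changes how much work each pass does.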

## Files

```
vox-upscaler-web/
├── index.html             # Web UI (self-contained)
├── server.py              # Dev server with COOP/COEP headers
├── LICENSE                # License
└── onnx/
    ├── vae_stream.onnx    # Streaming VAE (fp32, ~376 MB)
    └── meta.json          # State sizes for runtime
```

## Running locally

```bash
python server.py 8080
# open http://localhost:8080/
```

The COOP/COEP headers are required for SharedArrayBuffer, which WASM multi-threading depends on.
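
For reference, a dev server that adds these headers can be written with the Python standard library alone. The sketch below is not the contents of `server.py`; it is a minimal illustration, and the names `ISOLATION_HEADERS`, `IsolatedHandler`, and `make_server` are assumptions made for this example. The header values match the `custom_headers` used by the Space.

```python
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

# Headers that make the page cross-origin isolated, which browsers
# require before exposing SharedArrayBuffer.
ISOLATION_HEADERS = {
    "Cross-Origin-Opener-Policy": "same-origin",
    "Cross-Origin-Embedder-Policy": "require-corp",
}


class IsolatedHandler(SimpleHTTPRequestHandler):
    """Serves files from the current directory with COOP/COEP headers."""

    def end_headers(self):
        # Append the isolation headers to every response before the
        # header section is closed.
        for name, value in ISOLATION_HEADERS.items():
            self.send_header(name, value)
        super().end_headers()


def make_server(port: int = 8080, host: str = "localhost") -> ThreadingHTTPServer:
    """Create (but do not start) a server bound to host:port."""
    return ThreadingHTTPServer((host, port), IsolatedHandler)
```

Calling `make_server(8080).serve_forever()` from the project directory would then serve `index.html` with both headers set.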

## Credits

- **VAE model**: [VoxCPM2](https://huggingface.co/openbmb/VoxCPM2/blob/main/audiovae.pth) by [OpenBMB](https://huggingface.co/openbmb) — Apache-2.0
- **WebGPU port**: [KevinAHM](https://huggingface.co/KevinAHM)