---
title: VoxCPM Demo
emoji: 🎙️
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 6.0.0
app_file: app.py
python_version: "3.10"
pinned: true
license: apache-2.0
short_description: VoxCPM2 Nano-vLLM Demo
---

Experimental Gradio Space demo for `VoxCPM2` powered by `nanovllm-voxcpm`.

This repo keeps the existing Gradio frontend layout and swaps only the backend inference path to Nano-vLLM.

Notes:

- This is the non-Docker experiment path. It relies on a persistent GPU Gradio Space.
- `flash-attn` and `nanovllm-voxcpm` are pinned in `requirements.txt`, so they install during Space build instead of on first request.
- ZipEnhancer denoising is supported for reference audio cloning. The default denoiser model is `iic/speech_zipenhancer_ans_multiloss_16k_base`.
- The Space now defaults to a hardened runtime path:
  - If `/data` exists, request logs are written to daily JSONL files like `/data/logs/2026-04-05.jsonl`.
  - Model, pip, and temporary caches now stay on the default runtime paths instead of consuming persistent storage.
  - Backend prewarm is enabled by default, so startup can begin dependency install and model load in the background.
  - Gradio SSR is disabled by default for stability.
- The first cold start may still spend extra time installing dependencies, downloading the model, and loading the server.
- `SenseVoiceSmall` is downloaded from Hugging Face and cached locally before ASR initialization.
- `ASR_DEVICE` defaults to `cpu` to avoid competing with TTS GPU memory.
- Reference audio longer than 50 seconds is rejected early before denoising or Nano-vLLM encoding.
- The `LocDiT flow-matching steps` slider is wired to Nano-vLLM server `inference_timesteps`; changing it rebuilds the backend server.
- The existing `normalize` toggle is kept for UI compatibility, but Nano-vLLM currently ignores it.
- The existing `denoise` toggle now runs ZipEnhancer on the reference audio before encoding it to latents.
- `packages.txt` is required because this path needs extra system build dependencies.
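
The early rejection of long reference audio mentioned in the notes can be sketched as a small guard that runs before any denoising or encoding. This is a minimal illustration, not the Space's actual code: the function names and the `ValueError` are hypothetical, and only the 50-second limit comes from the notes above.

```python
MAX_REFERENCE_SECONDS = 50.0  # limit stated in the notes; the constant name is hypothetical


def reference_duration_seconds(num_samples: int, sample_rate: int) -> float:
    """Return the clip length in seconds, given samples per channel and the sample rate."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return num_samples / sample_rate


def check_reference_audio(num_samples: int, sample_rate: int) -> None:
    """Raise ValueError if the clip is longer than the limit; return None otherwise."""
    duration = reference_duration_seconds(num_samples, sample_rate)
    if duration > MAX_REFERENCE_SECONDS:
        raise ValueError(
            f"Reference audio is {duration:.1f}s long; "
            f"the limit is {MAX_REFERENCE_SECONDS:.0f}s."
        )
```

Calling the guard first means an oversized clip costs only a length check rather than a ZipEnhancer pass or a Nano-vLLM encode.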

Stability recommendation:

- Use a persistent GPU Space.
- Attach persistent storage so `/data` is available.
- Keep queue concurrency at `1` (`GRADIO_DEFAULT_CONCURRENCY_LIMIT=1`) unless you have profiled GPU memory headroom.
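
To illustrate why `/data` matters, here is a rough sketch of daily JSONL request logging. It is an assumption-laden example rather than the Space's implementation: `resolve_log_dir` and `append_request_log` are hypothetical names, and using the UTC date for the file name is a guess. Only the `REQUEST_LOG_DIR` variable and the `/data/logs/<date>.jsonl` layout come from this README.

```python
import json
import os
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def resolve_log_dir(data_root: str = "/data") -> Optional[Path]:
    """Use REQUEST_LOG_DIR if set, else <data_root>/logs when data_root exists."""
    override = os.environ.get("REQUEST_LOG_DIR")
    if override:
        return Path(override)
    root = Path(data_root)
    return root / "logs" if root.is_dir() else None


def append_request_log(
    record: dict,
    log_dir: Optional[Path],
    now: Optional[datetime] = None,
) -> Optional[Path]:
    """Append one JSON line to <log_dir>/<YYYY-MM-DD>.jsonl; do nothing if log_dir is None."""
    if log_dir is None:
        return None
    now = now or datetime.now(timezone.utc)  # assumption: files roll over on the UTC date
    log_dir.mkdir(parents=True, exist_ok=True)
    path = log_dir / f"{now:%Y-%m-%d}.jsonl"
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path
```

Without persistent storage attached, `resolve_log_dir()` returns `None` in this sketch and logging becomes a no-op.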

Recommended environment variables:

- `HF_REPO_ID`: Hugging Face model repo id. Defaults to `openbmb/VoxCPM2`
- `HF_TOKEN`: required if the model repo is private
- `NANOVLLM_MODEL`: optional direct model ref override. Can be a local path or HF repo id
- `NANOVLLM_MODEL_PATH`: optional local model path override
- `ASR_DEVICE`: defaults to `cpu`
- `ZIPENHANCER_MODEL_ID`: optional ModelScope denoiser model id or local path. Defaults to `iic/speech_zipenhancer_ans_multiloss_16k_base`
- `NANOVLLM_INFERENCE_TIMESTEPS`: initial default is `10`
- `NANOVLLM_PREWARM`: defaults to `true`
- `NANOVLLM_SERVERPOOL_MAX_NUM_BATCHED_TOKENS`: defaults to `8192`
- `NANOVLLM_SERVERPOOL_MAX_NUM_SEQS`: defaults to `16`
- `NANOVLLM_SERVERPOOL_MAX_MODEL_LEN`: defaults to `4096`
- `NANOVLLM_SERVERPOOL_GPU_MEMORY_UTILIZATION`: defaults to `0.95`
- `NANOVLLM_SERVERPOOL_ENFORCE_EAGER`: defaults to `false`
- `NANOVLLM_SERVERPOOL_DEVICES`: defaults to `0`
- `NANOVLLM_MAX_GENERATE_LENGTH`: defaults to `2000`
- `NANOVLLM_TEMPERATURE`: defaults to `1.0`
- `REQUEST_LOG_DIR`: optional persistent request log directory. Defaults to `/data/logs` when `/data` exists
- `GRADIO_QUEUE_MAX_SIZE`: defaults to `10`
- `GRADIO_DEFAULT_CONCURRENCY_LIMIT`: defaults to `4` (uses async server pool bridge for thread-safe concurrency)
- `DENOISE_MAX_CONCURRENT`: defaults to `1` (limits concurrent ZipEnhancer denoise requests to avoid GPU OOM)
- `GRADIO_SSR_MODE`: defaults to `false`
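
As an illustration of how these variables and their defaults could be read at startup, here is a minimal sketch covering a subset of them. The `SpaceConfig` dataclass, the helper names, and the accepted boolean spellings are all assumptions made for this example. Only the variable names and default values are taken from the list above.

```python
import os
from dataclasses import dataclass
from typing import Mapping

_TRUTHY = {"1", "true", "yes", "on"}  # assumed spellings; the Space may accept others


def env_str(env: Mapping[str, str], name: str, default: str) -> str:
    """Return the stripped value, or the default when unset or blank."""
    value = env.get(name, "").strip()
    return value or default


def env_bool(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Parse a boolean flag, or return the default when unset or blank."""
    value = env.get(name, "").strip().lower()
    return default if not value else value in _TRUTHY


@dataclass(frozen=True)
class SpaceConfig:
    hf_repo_id: str
    asr_device: str
    inference_timesteps: int
    prewarm: bool
    max_num_seqs: int
    max_model_len: int
    gpu_memory_utilization: float
    queue_max_size: int
    denoise_max_concurrent: int
    ssr_mode: bool


def load_config(env: Mapping[str, str] = os.environ) -> SpaceConfig:
    """Build a config from environment variables, using the documented defaults."""
    return SpaceConfig(
        hf_repo_id=env_str(env, "HF_REPO_ID", "openbmb/VoxCPM2"),
        asr_device=env_str(env, "ASR_DEVICE", "cpu"),
        inference_timesteps=int(env_str(env, "NANOVLLM_INFERENCE_TIMESTEPS", "10")),
        prewarm=env_bool(env, "NANOVLLM_PREWARM", True),
        max_num_seqs=int(env_str(env, "NANOVLLM_SERVERPOOL_MAX_NUM_SEQS", "16")),
        max_model_len=int(env_str(env, "NANOVLLM_SERVERPOOL_MAX_MODEL_LEN", "4096")),
        gpu_memory_utilization=float(
            env_str(env, "NANOVLLM_SERVERPOOL_GPU_MEMORY_UTILIZATION", "0.95")
        ),
        queue_max_size=int(env_str(env, "GRADIO_QUEUE_MAX_SIZE", "10")),
        denoise_max_concurrent=int(env_str(env, "DENOISE_MAX_CONCURRENT", "1")),
        ssr_mode=env_bool(env, "GRADIO_SSR_MODE", False),
    )
```

Passing a plain dict, for example `load_config({"NANOVLLM_PREWARM": "false"})`, makes it easy to try overrides without touching the real environment.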