---
title: VoxCPM Demo
emoji: 🎙️
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 6.0.0
app_file: app.py
python_version: '3.10'
pinned: true
license: apache-2.0
short_description: VoxCPM2 Nano-vLLM Demo
---
An experimental Gradio Space demo for VoxCPM2, powered by `nanovllm-voxcpm`. This repo keeps the existing Gradio frontend layout and swaps only the backend inference path to Nano-vLLM.
Notes:
- This is the non-Docker experiment path. It relies on a persistent GPU Gradio Space.
- `flash-attn` and `nanovllm-voxcpm` are pinned in `requirements.txt`, so they install during the Space build instead of on the first request.
- ZipEnhancer denoising is supported for reference audio cloning. The default denoiser model is `iic/speech_zipenhancer_ans_multiloss_16k_base`.
- The Space now defaults to a hardened runtime path:
  - If `/data` exists, request logs are written to daily JSONL files like `/data/logs/2026-04-05.jsonl`.
  - Model, pip, and temporary caches stay on the default runtime paths instead of consuming persistent storage.
  - Backend prewarm is enabled by default, so startup can begin dependency install and model load in the background.
  - Gradio SSR is disabled by default for stability.
- The first cold start may still spend extra time installing dependencies, downloading the model, and loading the server.
- `SenseVoiceSmall` is downloaded from Hugging Face and cached locally before ASR initialization. `ASR_DEVICE` defaults to `cpu` to avoid competing with TTS GPU memory.
- Reference audio longer than 50 seconds is rejected early, before denoising or Nano-vLLM encoding.
- The `LocDiT flow-matching steps` slider is wired to the Nano-vLLM server's `inference_timesteps`; changing it rebuilds the backend server.
- The existing `normalize` toggle is kept for UI compatibility, but Nano-vLLM currently ignores it.
- The existing `denoise` toggle now runs ZipEnhancer on the reference audio before encoding it to latents.
- `packages.txt` is required because this path needs extra system build dependencies.
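The daily JSONL request logging described above can be sketched as follows. This is a minimal illustration, not the Space's actual implementation; the helper name `log_request` and the event schema are hypothetical, and only the `/data/logs/YYYY-MM-DD.jsonl` layout comes from the notes above.

```python
import json
import os
import time
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

def log_request(event: dict, base_dir: str = "/data/logs") -> Optional[Path]:
    """Append one JSON object per request to a daily JSONL file.

    Hypothetical sketch: if the parent of base_dir (e.g. /data) is not
    mounted, logging is silently skipped, matching the "only when /data
    exists" behavior described above.
    """
    if not os.path.isdir(os.path.dirname(base_dir)):
        return None
    log_dir = Path(base_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    # One file per UTC day, e.g. /data/logs/2026-04-05.jsonl
    day = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    path = log_dir / f"{day}.jsonl"
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps({"ts": time.time(), **event}, ensure_ascii=False) + "\n")
    return path
```

Appending one JSON object per line keeps the log crash-tolerant: a partially written last line can be dropped without corrupting earlier records.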
Stability recommendation:
- Use a persistent GPU Space.
- Attach persistent storage so `/data` is available.
- Keep the default queue concurrency at `1` unless you have profiled GPU memory headroom.
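One way to enforce a concurrency cap like the `DENOISE_MAX_CONCURRENT` setting is a bounded semaphore around the GPU-heavy call. This is an illustrative sketch, not the Space's code; `denoise_guarded` and `denoise_fn` are hypothetical names.

```python
import os
import threading

# Assumed env var name from this README; defaults to 1 concurrent denoise job.
DENOISE_MAX_CONCURRENT = int(os.environ.get("DENOISE_MAX_CONCURRENT", "1"))
_denoise_slots = threading.BoundedSemaphore(DENOISE_MAX_CONCURRENT)

def denoise_guarded(denoise_fn, wav_path: str) -> str:
    """Run denoise_fn under the semaphore.

    Blocks until a slot is free, so at most DENOISE_MAX_CONCURRENT
    denoise jobs touch the GPU at once, which is the point of keeping
    the default at 1 on a single-GPU Space.
    """
    with _denoise_slots:
        return denoise_fn(wav_path)
```

Raising the limit without profiling GPU memory headroom risks the OOM the default is designed to avoid.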
Recommended environment variables:
- `HF_REPO_ID`: Hugging Face model repo id. Defaults to `openbmb/VoxCPM2`
- `HF_TOKEN`: required if the model repo is private
- `NANOVLLM_MODEL`: optional direct model ref override. Can be a local path or HF repo id
- `NANOVLLM_MODEL_PATH`: optional local model path override
- `ASR_DEVICE`: defaults to `cpu`
- `ZIPENHANCER_MODEL_ID`: optional ModelScope denoiser model id or local path. Defaults to `iic/speech_zipenhancer_ans_multiloss_16k_base`
- `NANOVLLM_INFERENCE_TIMESTEPS`: initial default is `10`
- `NANOVLLM_PREWARM`: defaults to `true`
- `NANOVLLM_SERVERPOOL_MAX_NUM_BATCHED_TOKENS`: defaults to `8192`
- `NANOVLLM_SERVERPOOL_MAX_NUM_SEQS`: defaults to `16`
- `NANOVLLM_SERVERPOOL_MAX_MODEL_LEN`: defaults to `4096`
- `NANOVLLM_SERVERPOOL_GPU_MEMORY_UTILIZATION`: defaults to `0.95`
- `NANOVLLM_SERVERPOOL_ENFORCE_EAGER`: defaults to `false`
- `NANOVLLM_SERVERPOOL_DEVICES`: defaults to `0`
- `NANOVLLM_MAX_GENERATE_LENGTH`: defaults to `2000`
- `NANOVLLM_TEMPERATURE`: defaults to `1.0`
- `REQUEST_LOG_DIR`: optional persistent request log directory. Defaults to `/data/logs` when `/data` exists
- `GRADIO_QUEUE_MAX_SIZE`: defaults to `10`
- `GRADIO_DEFAULT_CONCURRENCY_LIMIT`: defaults to `4` (uses async server pool bridge for thread-safe concurrency)
- `DENOISE_MAX_CONCURRENT`: defaults to `1` (limits concurrent ZipEnhancer denoise requests to avoid GPU OOM)
- `GRADIO_SSR_MODE`: defaults to `false`
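Reading these variables with their documented defaults can be sketched as below. The helper names (`env_str`, `env_int`, `env_bool`) are hypothetical; the variable names and default values come from the list above.

```python
import os

def env_str(name: str, default: str) -> str:
    return os.environ.get(name, default)

def env_int(name: str, default: int) -> int:
    return int(os.environ.get(name, str(default)))

def env_bool(name: str, default: bool) -> bool:
    # Accepts common truthy spellings: "1", "true", "yes" (case-insensitive).
    return os.environ.get(name, str(default)).strip().lower() in ("1", "true", "yes")

# A few of the documented settings, with their defaults from this README:
HF_REPO_ID = env_str("HF_REPO_ID", "openbmb/VoxCPM2")
INFERENCE_TIMESTEPS = env_int("NANOVLLM_INFERENCE_TIMESTEPS", 10)
GPU_MEMORY_UTILIZATION = float(os.environ.get("NANOVLLM_SERVERPOOL_GPU_MEMORY_UTILIZATION", "0.95"))
PREWARM = env_bool("NANOVLLM_PREWARM", True)
```

Keeping all defaults in one place makes it easy to audit what a fresh Space will do before any variables are set.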