# VoxCPM LoRA - Indonesian Female Voice

LoRA adapter for VoxCPM 1.5, fine-tuned for natural Indonesian female speech.
## Model Details
| Property | Value |
|---|---|
| Base model | openbmb/VoxCPM1.5 (800M params) |
| LoRA rank (r) | 32 |
| LoRA alpha | 16 |
| Training steps | 3500 |
| Training samples | 112 clips |
| Final loss | 0.790 |
| Sample rate | 44.1 kHz |
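Assuming standard LoRA semantics (the VoxCPM training code is not detailed here), the adapter update is scaled by `alpha / r`, so the rank/alpha values in the table imply a 0.5 scaling factor. A quick back-of-envelope sketch (`lora_params` is an illustrative helper, not part of any library):

```python
# Standard LoRA applies W' = W + (alpha / r) * B @ A, so with the
# values from the table the adapter contribution is scaled by 0.5.
r, alpha = 32, 16
scaling = alpha / r
print(scaling)  # 0.5

# Rough trainable-parameter count for one adapted linear layer of
# shape (d_out, d_in): the low-rank A and B matrices together add
# r * (d_in + d_out) weights.
def lora_params(d_in: int, d_out: int, r: int) -> int:
    return r * (d_in + d_out)
```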
## Installation

Safetensors support requires the latest VoxCPM from GitHub:

```bash
pip install git+https://github.com/openbmb/VoxCPM.git
```
## Usage

```python
from voxcpm import VoxCPM
from huggingface_hub import snapshot_download
import soundfile as sf

# Download the LoRA adapter
lora_path = snapshot_download("aisyahsyihab/voxcpm-lora-indonesian-female")

# Load the base model with the LoRA weights applied
model = VoxCPM.from_pretrained(
    "openbmb/VoxCPM1.5",
    lora_weights_path=lora_path,
    load_denoiser=False,
)

# Generate speech
audio = model.generate(
    text="Halo, apa kabar hari ini?",
    cfg_value=3.0,
    normalize=True,
)

# Save the output waveform
sf.write("output.wav", audio, model.tts_model.sample_rate)
```
## Training Configuration

```yaml
lora:
  enable_lm: true
  enable_dit: true
  enable_proj: false
  r: 32
  alpha: 16
  dropout: 0.0
  target_modules_lm: ["q_proj", "v_proj", "k_proj", "o_proj"]
  target_modules_dit: ["q_proj", "v_proj", "k_proj", "o_proj"]

training:
  batch_size: 2
  grad_accum_steps: 4
  learning_rate: 0.0001
  warmup_steps: 100
  max_steps: 4000
```
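For a sense of scale, the config above implies an effective batch of 8 clips per optimizer step; with 112 training clips, the 3500 steps reported in the table (the config caps training at 4000) work out to roughly 250 passes over the data. The arithmetic, assuming one clip per sample:

```python
batch_size = 2
grad_accum_steps = 4
train_clips = 112
steps = 3500  # from the Model Details table; max_steps in the config is 4000

effective_batch = batch_size * grad_accum_steps  # clips per optimizer step
steps_per_epoch = train_clips // effective_batch
epochs = steps / steps_per_epoch

print(effective_batch, steps_per_epoch, round(epochs))  # 8 14 250
```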
## Hardware Requirements

- Training: NVIDIA RTX 3090 (24 GB VRAM)
- Inference: ~4 GB VRAM (with base model)
## Inference Tips

- CFG value: 2.0-4.0 (higher = closer adherence to the trained voice, lower = more natural)
- Best for: casual Indonesian speech, conversational tone
- Optimal text length: 1-3 sentences at a time
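Since short inputs work best, longer passages can be split into small chunks and synthesized one at a time. A minimal sketch of the splitting side (the regex-based `chunk_sentences` helper is my own, not part of VoxCPM); each chunk would then go through `model.generate(...)` as in the Usage section, with the resulting arrays concatenated before saving:

```python
import re

def chunk_sentences(text: str, max_sentences: int = 2) -> list[str]:
    """Split on sentence-ending punctuation and group into small chunks."""
    sentences = [s.strip()
                 for s in re.split(r"(?<=[.!?])\s+", text.strip())
                 if s.strip()]
    return [" ".join(sentences[i:i + max_sentences])
            for i in range(0, len(sentences), max_sentences)]

chunks = chunk_sentences(
    "Halo, apa kabar? Saya baik-baik saja. Terima kasih sudah bertanya.",
    max_sentences=2,
)
# chunks[0] -> "Halo, apa kabar? Saya baik-baik saja."
# chunks[1] -> "Terima kasih sudah bertanya."
```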
## License

Apache 2.0, following the VoxCPM base model license.
## Acknowledgments

- OpenBMB for VoxCPM 1.5
- Training conducted with VoxCPM's official LoRA training pipeline