# VoxCPM LoRA - Indonesian Female Voice v2

LoRA adapter for VoxCPM 1.5, fine-tuned for natural Indonesian female speech.

**v2 improvements:** higher-capacity LoRA (r=64), trained on 400 samples (vs. 112 in v1), with a lower final loss.
## Model Details
| Property | Value |
|---|---|
| Base model | openbmb/VoxCPM1.5 (800M params) |
| LoRA rank (r) | 64 |
| LoRA alpha | 32 |
| Training steps | 3500 |
| Training samples | 400 clips |
| Final loss | 0.750 |
| Sample rate | 44.1 kHz |
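
For intuition on what r=64 buys, LoRA adds two low-rank factors (d×r and r×d) to each adapted weight matrix. The sketch below counts the extra parameters; the hidden size and layer count are hypothetical placeholders, not the real VoxCPM 1.5 dimensions.

```python
def lora_param_count(d_model: int, r: int, n_modules: int, n_layers: int) -> int:
    # Each adapted matrix gains A (d_model x r) and B (r x d_model),
    # i.e. 2 * d_model * r extra parameters per module per layer.
    return 2 * d_model * r * n_modules * n_layers

# Illustrative numbers only (d_model and n_layers are assumptions):
# 4 target modules (q/k/v/o projections), rank 64.
print(lora_param_count(d_model=1024, r=64, n_modules=4, n_layers=24))
```

Doubling the rank doubles the adapter size, which is why v2 (r=64) has roughly twice the capacity of v1 (r=32).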
## Installation

Safetensors support requires the latest VoxCPM from GitHub:

```bash
pip install git+https://github.com/openbmb/VoxCPM.git
```
## Usage

```python
from voxcpm import VoxCPM
from huggingface_hub import snapshot_download
import soundfile as sf

# Download the LoRA adapter from the Hub
lora_path = snapshot_download("aisyahsyihab/voxcpm-lora-indonesian-female-v2")

# Load the base model with the LoRA weights applied
model = VoxCPM.from_pretrained(
    "openbmb/VoxCPM1.5",
    lora_weights_path=lora_path,
    load_denoiser=False,
)

# Generate speech
audio = model.generate(
    text="Halo, apa kabar hari ini?",
    cfg_value=2.5,
    normalize=True,
)

# Save the output at the model's native sample rate (44.1 kHz)
sf.write("output.wav", audio, model.tts_model.sample_rate)
```
## Training Configuration

```yaml
lora:
  enable_lm: true
  enable_dit: true
  enable_proj: false
  r: 64
  alpha: 32
  dropout: 0.0
  target_modules_lm: ["q_proj", "v_proj", "k_proj", "o_proj"]
  target_modules_dit: ["q_proj", "v_proj", "k_proj", "o_proj"]

training:
  batch_size: 4
  grad_accum_steps: 4
  learning_rate: 0.0001
  warmup_steps: 100
  max_steps: 4000
```
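
As a quick back-of-envelope check on the config above (derived arithmetic, not from training logs):

```python
batch_size = 4
grad_accum_steps = 4
# Gradient accumulation multiplies the effective batch size
effective_batch = batch_size * grad_accum_steps

samples = 400
steps_per_epoch = samples / effective_batch   # 400 / 16 = 25 optimizer steps per epoch
epochs_at_3500 = 3500 / steps_per_epoch       # the 3500 reported training steps span ~140 epochs

print(effective_batch, steps_per_epoch, epochs_at_3500)
```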
## Hardware Requirements
- Training: NVIDIA RTX 5090 (32GB VRAM)
- Inference: ~6GB VRAM (with base model)
## Inference Tips

- CFG value: 2.0-3.0 (higher values adhere more closely to the target voice; lower values sound more natural)
- Best for: Casual Indonesian speech, conversational tone
- Optimal text length: 1-3 sentences at a time
## Changelog
- v2 (2026-03-18): Higher capacity LoRA (r=64), 400 samples, loss 0.750
- v1 (2026-02-09): Initial release, r=32, 112 samples, loss 0.790
## License
Apache 2.0 (following VoxCPM base model license)
## Acknowledgments
- OpenBMB for VoxCPM 1.5
- Training conducted with VoxCPM's official LoRA training pipeline