CSM 1b MLX Tahm Kench Voice Model

This checkpoint is senstella/csm-1b-mlx with a custom LoRA fine-tune merged into its weights. The fine-tune targets expressive voice synthesis in the style of Tahm Kench from League of Legends. The dataset used for the fine-tuning is here.

The unmerged LoRA adapter is also available in the lora_tahm_kench_v2.2 subdirectory.

Usage

These instructions are for macOS on Apple silicon and use csm_mlx. Tahm Kench is speaker 0.

A few tips:

Use conservative sampler parameters and short phrases. For longer passages, generate short segments and chain them by passing earlier segments as context.
Setting max_audio_length_ms too high can cause quality issues.
You will probably need several samples to get a really good generation.
This version of the model seems to get confused by apostrophes, so avoid them where possible (for example, write "do not" instead of "don't"). I will try to find more training samples to fix this.
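The tips about short phrases and apostrophes can be applied to the text before it reaches generate. The sketch below is a hypothetical pre-processing helper and is not part of csm_mlx. It expands a few common English contractions, drops any remaining apostrophes, and splits the text into one phrase per sentence so each can be synthesized separately. The contraction table, the "'d" to "would" choice, and the split-on-sentence-punctuation rule are all assumptions, so extend them for your own text.

```python
import re

# Hypothetical, deliberately small contraction tables (assumptions; extend as needed).
# Irregular whole-word forms are expanded before the generic suffix rules.
IRREGULAR = {"can't": "cannot", "won't": "will not", "shan't": "shall not"}
SUFFIXES = [("n't", " not"), ("'re", " are"), ("'ll", " will"),
            ("'ve", " have"), ("'m", " am"), ("'d", " would")]


def _match_case(src: str, repl: str) -> str:
    # Keep a leading capital, e.g. "Won't" -> "Will not".
    return repl[0].upper() + repl[1:] if src[0].isupper() else repl


def remove_apostrophes(text: str) -> str:
    """Expand common contractions, then drop any apostrophes that remain."""
    text = text.replace("\u2019", "'")  # normalize curly apostrophes first
    for word, repl in IRREGULAR.items():
        text = re.sub(rf"\b{word}\b", lambda m, r=repl: _match_case(m.group(0), r),
                      text, flags=re.IGNORECASE)
    for suffix, repl in SUFFIXES:
        # e.g. "don't" -> "do not", "I'm" -> "I am"
        text = re.sub(rf"(\w){suffix}\b", rf"\1{repl}", text, flags=re.IGNORECASE)
    # Possessives and stray quotes simply lose the apostrophe ("Kench's" -> "Kenchs").
    return text.replace("'", "")


def split_phrases(text: str) -> list[str]:
    """Split text into sentence-sized phrases at ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
```

For example, remove_apostrophes("Don't worry, I'm not hungry.") gives "Do not worry, I am not hungry.", and split_phrases("I do not hold grudges. I marinade them.") yields two phrases. Each phrase can then be passed to generate on its own, with earlier phrases supplied as context.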

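from mlx_lm.sample_utils import make_sampler
from huggingface_hub import hf_hub_download
from csm_mlx import CSM, csm_1b, generate
import audiofile
import numpy as np

text = "I do not hold grudges. I marinade them."
filename = "test.wav"

# Download the merged Tahm Kench checkpoint and load it into the CSM-1B architecture
weights = hf_hub_download(repo_id="xlr8harder/csm-1b-mlx-tahm-kench", filename="ckpt.safetensors")
csm = CSM(csm_1b())
csm.load_weights(weights)

# Conservative sampler settings, as recommended in the tips above
sampler = make_sampler(temp=0.3, top_k=10, top_p=0.9)
audio = generate(
    csm,
    text=text,
    speaker=0,  # Tahm Kench is speaker 0
    context=[],  # pass earlier segments here to chain longer passages
    max_audio_length_ms=10_000,  # keep this modest; long limits can hurt quality
    sampler=sampler,
)

# generate returns an MLX array; convert it to NumPy before writing a 24 kHz WAV
audiofile.write(filename, np.asarray(audio), 24000)
print(f"Wrote to {filename}")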
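When a longer line is generated as several short phrases, the resulting clips still have to be joined into one file. The following is a minimal sketch that assumes each clip is a 1-D array of samples at the 24 kHz rate used above (convert MLX outputs with np.asarray first). The helper name join_clips and the 150 ms default pause are illustrative assumptions and are not part of csm_mlx.

```python
import numpy as np

SAMPLE_RATE = 24000  # matches the rate passed to audiofile.write in the example above


def join_clips(clips, pause_ms: int = 150, sample_rate: int = SAMPLE_RATE) -> np.ndarray:
    """Hypothetical helper: concatenate mono clips with a short silence between them."""
    gap = np.zeros(int(sample_rate * pause_ms / 1000), dtype=np.float32)
    pieces = []
    for i, clip in enumerate(clips):
        if i > 0:
            pieces.append(gap)  # silence only between clips, not at the ends
        pieces.append(np.asarray(clip, dtype=np.float32).reshape(-1))
    if not pieces:
        return np.zeros(0, dtype=np.float32)
    return np.concatenate(pieces)
```

The joined array can then be written the same way as a single clip, for example audiofile.write(filename, join_clips(clips), 24000).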
