CSM 1b MLX Tahm Kench Voice Model

This is a merged LoRA checkpoint of senstella/csm-1b-mlx with a custom LoRA fine-tune for expressive voice synthesis in the style of Tahm Kench from League of Legends. The dataset used for the fine-tuning is here.

The unmerged LoRA adapter is also available in the lora_tahm_kench_v2.2 subdirectory.

Usage

The example below runs on macOS using csm_mlx. Tahm Kench is speaker 0.

A few tips:

  • Use conservative sampler parameters and short phrases. For longer passages, generate short segments and chain them through the context argument.
  • Setting max_audio_length too high can degrade quality.
  • Expect to generate several samples before you get a really good one.
  • This version of the model seems to get confused by apostrophes, so avoid them where possible. I will try to find more training samples to fix this.

from mlx_lm.sample_utils import make_sampler
from huggingface_hub import hf_hub_download
from csm_mlx import CSM, csm_1b, generate
import audiofile

text = "I do not hold grudges. I marinade them."
filename = "test.wav"

# Download the merged checkpoint and load it into a CSM 1B model
weights = hf_hub_download(repo_id="xlr8harder/csm-1b-mlx-tahm-kench", filename="ckpt.safetensors")
csm = CSM(csm_1b())
csm.load_weights(weights)

# Conservative sampling settings (see the tips above)
sampler = make_sampler(temp=0.3, top_k=10, top_p=0.9)

# Generate speech as speaker 0 (Tahm Kench) with no prior context
audio = generate(
    csm,
    text=text,
    speaker=0,
    context=[],
    sampler=sampler,
)

# Write the result as a WAV file at a 24 kHz sample rate
audiofile.write(filename, audio, 24000)
print(f"Wrote to {filename}")
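
The tips about short phrases and apostrophes can be applied to a script before it reaches the model. The following is only a sketch using the Python standard library: the helper names (expand_contractions, split_into_phrases, prepare_phrases) and the small contraction table are hypothetical and are not part of csm_mlx or this repository. Possessives and contractions missing from the table are left untouched, so you may still need to rephrase some lines by hand.

```python
import re

# Hypothetical, deliberately small contraction table; extend it for your own scripts.
CONTRACTIONS = {
    "don't": "do not",
    "can't": "cannot",
    "won't": "will not",
    "it's": "it is",
    "i'm": "I am",
    "you're": "you are",
}


def expand_contractions(text):
    """Replace known contractions so the text contains fewer apostrophes."""
    text = text.replace("\u2019", "'")  # normalize curly apostrophes first

    def swap(match):
        word = match.group(0)
        expanded = CONTRACTIONS.get(word.lower())
        if expanded is None:
            return word  # unknown contraction or possessive: leave unchanged
        if word[0].isupper():
            # Keep a leading capital, e.g. "Don't" becomes "Do not"
            expanded = expanded[0].upper() + expanded[1:]
        return expanded

    return re.sub(r"[A-Za-z]+'[A-Za-z]+", swap, text)


def split_into_phrases(text):
    """Split after sentence-ending punctuation, keeping the punctuation."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def prepare_phrases(text):
    """Expand contractions, then split the result into short phrases."""
    return split_into_phrases(expand_contractions(text))


if __name__ == "__main__":
    print(prepare_phrases("I don't hold grudges. I marinade them."))
    # prints ['I do not hold grudges.', 'I marinade them.']
```

Each returned phrase can then be passed as text in its own generate call, using the same model and sampler setup as in the usage example above.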