CSM 1b MLX Tahm Kench Voice Model
This is a merged LoRA checkpoint of senstella/csm-1b-mlx with a custom LoRA fine-tune for expressive voice synthesis in the style of Tahm Kench from League of Legends. The dataset used for the fine-tuning is here.
The unmerged LoRA adapter is also available in the lora_tahm_kench_v2.2 subdirectory.
Usage
These instructions are for macOS using csm_mlx. Tahm Kench is speaker 0.
A few tips:
- Use conservative sampler params and short phrases. If you need longer segments, string them together with Context.
- If max_audio_length is too long, it can cause quality issues.
- You will probably need more than one sample to get a really good generation.
- This version of the model seems to get confused by apostrophes, so avoid them if possible. I will try to find more samples to clear this up.
from mlx_lm.sample_utils import make_sampler
from huggingface_hub import hf_hub_download
from csm_mlx import CSM, csm_1b, generate
import audiofile

text = "I do not hold grudges. I marinade them."
filename = "test.wav"

# Download the merged checkpoint and load it into a CSM 1B model.
weights = hf_hub_download(repo_id="xlr8harder/csm-1b-mlx-tahm-kench", filename="ckpt.safetensors")
csm = CSM(csm_1b())
csm.load_weights(weights)

# Conservative sampler settings, as recommended in the tips above.
sampler = make_sampler(temp=0.3, top_k=10, top_p=0.9)

audio = generate(
    csm,
    text=text,
    speaker=0,  # Tahm Kench
    context=[],
    sampler=sampler,
)

# Write the result as a 24 kHz WAV file.
audiofile.write(filename, audio, 24000)
print(f"Wrote to {filename}")
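The tips about short phrases and apostrophes can be applied before any audio is generated. The following is a minimal sketch of one way to prepare text. The helper names `split_into_phrases` and `has_apostrophe` and the `max_chars` default of 80 are assumptions made for illustration and are not part of csm_mlx.

```python
import re


def split_into_phrases(text, max_chars=80):
    """Split text into sentences, then break long sentences at word boundaries.

    Hypothetical helper: max_chars is an arbitrary illustrative limit,
    not a value recommended by the model author. A single word longer
    than max_chars is kept whole.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    phrases = []
    for sentence in sentences:
        if len(sentence) <= max_chars:
            phrases.append(sentence)
            continue
        # Greedily pack words into chunks of at most max_chars characters.
        current = ""
        for word in sentence.split():
            candidate = f"{current} {word}".strip()
            if current and len(candidate) > max_chars:
                phrases.append(current)
                current = word
            else:
                current = candidate
        if current:
            phrases.append(current)
    return phrases


def has_apostrophe(phrase):
    """Return True if the phrase contains a straight or curly apostrophe."""
    return "'" in phrase or "\u2019" in phrase


if __name__ == "__main__":
    for phrase in split_into_phrases("I do not hold grudges. I marinade them."):
        note = " (contains an apostrophe, consider rewording)" if has_apostrophe(phrase) else ""
        print(f"{phrase}{note}")
```

Each resulting phrase can then be passed as `text` to `generate` in the same way as the example above. Since more than one sample is often needed, it may help to generate each phrase several times and keep the best take.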