Multispeaker Dhivehi speech generation model based on sesame/csm-1b, fine-tuned on synthetic male and female Dhivehi voice data.
Base model: sesame/csm-1b
Dataset: alakxender/voice-synthetic
Speakers: role = "0", role = "1"

import torch
from transformers import CsmForConditionalGeneration, AutoProcessor
model_id = "alakxender/csm-1b-dhivehi-2-speakers"
device = "cuda" if torch.cuda.is_available() else "cpu"
# Load model and processor
processor = AutoProcessor.from_pretrained(model_id)
model = CsmForConditionalGeneration.from_pretrained(model_id, device_map=device)
# Set speaker and input Dhivehi text
role = "0" # "0" for female, "1" for male
content = "މެލޭޝިއާގައި އިތުރުކުރާ ޓެކްސް، ދިވެހި ދަރިވަރުންނަށް ބުރައަކަށް ނުވާނެ ގޮތެއް ހޯދައިދޭނަން: ހައިދަރު"
conversation = [
    {"role": role, "content": [{"type": "text", "text": content}]}
]
inputs = processor.apply_chat_template(
    conversation,
    tokenize=True,
    return_dict=True,
).to(device)
# Generate audio
audio = model.generate(**inputs, output_audio=True)
# Save to file
processor.save_audio(audio, f"output_{role}.wav")
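
Because the speaker is chosen only through the role string, a typo such as "2" or an integer 0 could slip through unnoticed until generation time. The sketch below shows one way to validate the role and build the single-turn conversation before it is passed to processor.apply_chat_template. The names SPEAKERS, build_conversation, and output_path are hypothetical helpers, not part of transformers or of this model; only the "0"/"1" mapping and the conversation layout come from the example above.

```python
# Hypothetical helpers (not part of transformers or this checkpoint).
# The speaker mapping mirrors the model card: "0" = female, "1" = male.
SPEAKERS = {"0": "female", "1": "male"}


def build_conversation(role: str, text: str) -> list:
    """Return a single-turn conversation in the layout used by the example above."""
    if role not in SPEAKERS:
        raise ValueError(f"role must be one of {sorted(SPEAKERS)}, got {role!r}")
    if not text.strip():
        raise ValueError("text must not be empty")
    return [{"role": role, "content": [{"type": "text", "text": text}]}]


def output_path(role: str) -> str:
    """Build an output file name that records which synthetic voice was used."""
    return f"output_{role}_{SPEAKERS[role]}.wav"
```

With these in place, the conversation in the example could be created as build_conversation(role, content), and the file name passed to processor.save_audio could be output_path(role).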
More usage info at: sesame/csm-1b

"0": Female synthetic voice
"1": Male synthetic voice

The speaker is selected with the role field in the chat input template.

This fine-tuned checkpoint was created for Dhivehi speech synthesis and is intended for research and educational use only. All voice outputs generated by this model are entirely synthetic. Any resemblance to real persons, living or deceased, is purely coincidental and unintentional. The creators of this model do not endorse or condone the use of this system for: