Fine-tuned by: Arttu Pakarinen

A fine-tuned TTS model trained on Finnish parliament speeches.

It produces a lot of filler sounds ('Ööö, äää, öhm...'), especially on short inputs.

Output quality is better with longer sentences.
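Since short fragments tend to trigger the filler sounds, one workaround (a sketch, not part of this model's code) is to merge short fragments into longer chunks before passing them to the generator:

```python
def merge_short_sentences(sentences, min_chars=40):
    """Merge consecutive short fragments into longer chunks.

    Hypothetical helper: the model tends to produce filler ('Ööö, äää...')
    on very short inputs, so only emit a chunk once it reaches min_chars.
    """
    chunks, buf = [], ""
    for s in sentences:
        buf = f"{buf} {s}".strip() if buf else s
        if len(buf) >= min_chars:
            chunks.append(buf)
            buf = ""
    if buf:  # flush whatever is left over
        chunks.append(buf)
    return chunks

print(merge_short_sentences(
    ["Hei.", "Mitä kuuluu?", "Tämä on pidempi testilause suomeksi."]
))
```

Each returned chunk can then be fed to `model.generate` separately and the resulting clips concatenated.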

🧪 Test script

```python
#!/usr/bin/env python3
import torch
from transformers import AutoProcessor, CsmForConditionalGeneration

MODEL_ID = "ArttuPakarinen/sesame-csm-FIN-parlament-full-finetune"
BASE_ID = "sesame/csm-1b"  # processor comes from the base model

device = "cuda" if torch.cuda.is_available() else "cpu"

# Disable flash / mem-efficient SDPA if your setup has issues with them.
# Note: torch.backends.cuda.sdp_kernel() is a context manager, so calling
# it without `with` has no effect; use the enable_* setters instead.
if hasattr(torch.backends.cuda, "enable_flash_sdp"):
    torch.backends.cuda.enable_flash_sdp(False)
    torch.backends.cuda.enable_math_sdp(True)
    torch.backends.cuda.enable_mem_efficient_sdp(False)

processor = AutoProcessor.from_pretrained(BASE_ID)

model = CsmForConditionalGeneration.from_pretrained(
    MODEL_ID,
    torch_dtype="auto",
    low_cpu_mem_usage=True,
    attn_implementation="eager",
).to(device)

model.eval()
model.config.use_cache = True
try:
    model.generation_config.attn_implementation = "eager"
except Exception:
    pass

text = "Ihanaa, kun voi generoida ääntä!"  # "Wonderful to be able to generate audio!"
conversation = [{"role": "0", "content": [{"type": "text", "text": text}]}]

raw = processor.apply_chat_template(
    conversation,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
)

# attention_mask -> bool (some setups expect this)
inputs = {
    k: (v.to(device).to(torch.bool) if k == "attention_mask" else v.to(device))
    for k, v in raw.items()
}

with torch.no_grad(), torch.amp.autocast("cuda", enabled=(device == "cuda")):
    audio = model.generate(
        **inputs,
        output_audio=True,
        use_cache=True,
        max_new_tokens=600,
        do_sample=True,
        temperature=0.8,
        top_p=0.95,
        pad_token_id=processor.tokenizer.pad_token_id,
        eos_token_id=processor.tokenizer.eos_token_id,
    )

processor.save_audio(audio, "tulos.wav")
print("OK: tulos.wav")
```
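To sanity-check the generated file, the stdlib `wave` module can read the WAV header. A small self-contained sketch; the silent clip written below merely stands in for `tulos.wav`, and the 24 kHz mono 16-bit format is an assumption to verify against your actual output:

```python
import wave

def wav_info(path):
    """Return (sample_rate, channels, duration_seconds) of a WAV file."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        return rate, w.getnchannels(), w.getnframes() / rate

# Write a 0.5 s silent stand-in clip (assumed 24 kHz mono, 16-bit)
# so the check is runnable without the model; point wav_info at
# "tulos.wav" after a real generation run.
with wave.open("demo.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)   # 16-bit samples
    w.setframerate(24_000)
    w.writeframes(b"\x00\x00" * 12_000)  # 12_000 frames = 0.5 s

print(wav_info("demo.wav"))
```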

Model size: 2B params
Tensor type: F16
Format: Safetensors

Model tree for ArttuPakarinen/sesame-csm-FIN-parlament-full-finetune

Base model: sesame/csm-1b (this model is a full finetune of it)
Dataset used to train ArttuPakarinen/sesame-csm-FIN-parlament-full-finetune