Sesame CSM-1B Acholi TTS v2

Fine-tuned version of Sesame CSM-1B for Acholi (ach) text-to-speech synthesis using LoRA, trained on an expanded multi-source Acholi speech dataset.

Part of a master's thesis at Makerere University on offline English-to-Acholi movie translation.


Model Details

| Field | Value |
|---|---|
| Base model | unsloth/csm-1b (Sesame Conversational Speech Model, 1B params) |
| Fine-tuning method | LoRA (Low-Rank Adaptation) |
| LoRA rank | r=32, alpha=64 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Training steps | 15,000 max (early stopped at step 14,200) |
| Best checkpoint | step 14,200 |
| Best eval_loss | 8.368 |
| Test eval_loss | 8.325 |
| Training time | ~7 h 37 min (NVIDIA L40S, 46 GB) |
| Language | Acholi (ach) |
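The adapter setup above can be sketched with `peft`'s `LoraConfig`. The rank, alpha, and target modules come from the table; `lora_dropout` and `task_type` are illustrative assumptions, since the card does not state them.

```python
from peft import LoraConfig

# Hyperparameters from the table above; the rest is an assumption.
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_dropout=0.0,       # not stated in the card
    task_type="CAUSAL_LM",
)
```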

Training Data

| Source | Split | Samples | Notes |
|---|---|---|---|
| Sunbird/salt (studio-ach) | train | 4,062 | Studio-quality, single speaker |
| google/WaxalNLP (ach_tts) | train | 309 | Community speech |
| acellam/acholi-english-movie-dialogue | train | 1,743 | Movie dialogue, 16 kHz |
| Total | | 6,114 | |

All audio resampled to 24,000 Hz. Samples filtered to 0.5–8 seconds duration.
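The duration filter reduces to a simple predicate over sample count and sampling rate. This is a minimal sketch; the function name and constants' placement are illustrative, not taken from the training script.

```python
TARGET_SR = 24_000           # all audio is resampled to 24 kHz
MIN_DUR, MAX_DUR = 0.5, 8.0  # accepted clip duration in seconds

def keep_sample(num_samples: int, sr: int = TARGET_SR) -> bool:
    """Return True if a clip's duration falls in the accepted range."""
    duration = num_samples / sr
    return MIN_DUR <= duration <= MAX_DUR

# A 2-second clip at 24 kHz passes; a 0.2-second clip does not.
print(keep_sample(48_000))  # True
print(keep_sample(4_800))   # False
```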


Training Configuration

learning_rate = 5e-6
lr_scheduler_type = "cosine"
warmup_steps = 1000
max_steps = 15000
per_device_train_batch_size = 2
gradient_accumulation_steps = 4  # effective batch = 8
weight_decay = 0.05
fp16 / bf16 = True (auto)
early_stopping_patience = 20  # 2000-step tolerance
early_stopping_threshold = 0.001
gradient_checkpointing = "unsloth"
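The settings above map onto `transformers` `TrainingArguments` plus an `EarlyStoppingCallback` roughly as follows. The `output_dir`, the 100-step eval/save cadence (inferred from "patience 20 = 2,000-step tolerance"), and the bf16/fp16 auto-detection are assumptions; the actual script also uses Unsloth's gradient checkpointing, approximated here with the standard flag.

```python
import torch
from transformers import TrainingArguments, EarlyStoppingCallback

args = TrainingArguments(
    output_dir="sesame-csm-ach-v2",       # illustrative
    learning_rate=5e-6,
    lr_scheduler_type="cosine",
    warmup_steps=1000,
    max_steps=15000,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,        # effective batch size 8
    weight_decay=0.05,
    bf16=torch.cuda.is_bf16_supported(),  # "auto": bf16 where available
    fp16=not torch.cuda.is_bf16_supported(),
    eval_strategy="steps",
    eval_steps=100,                       # implied by the 2,000-step tolerance
    save_steps=100,                       # must align with eval_steps
    save_total_limit=2,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    gradient_checkpointing=True,          # stand-in for "unsloth"
)

early_stop = EarlyStoppingCallback(
    early_stopping_patience=20,
    early_stopping_threshold=0.001,
)
```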

Experiment History

| Version | Dataset | LoRA rank | Best eval_loss | Best step | Notes |
|---|---|---|---|---|---|
| v1 | SALT studio-ach only (~3,500 samples) | r=128, alpha=128 | 5.031 | 3,000 | Studio-only, early plateau |
| v2 | SALT + WaxalNLP + movie dialogue (6,114 total) | r=32, alpha=64 | 8.368 | 14,200 | Multi-source, longer training |

Note on loss comparison: v1's lower eval_loss (5.031 vs 8.368) reflects training on a narrow, homogeneous single-speaker studio dataset, not a better model. v2 was trained on a more diverse multi-speaker, multi-domain dataset that is inherently harder to fit; the expected benefit is better generalization to real movie speech. A perceptual quality comparison (MOS evaluation) is ongoing.


Key Design Decisions (v2)

  • LoRA rank reduced from r=128 to r=32: mirrors Spark-TTS findings that rank has minimal impact when data is the bottleneck, and reduces memory use and training time.
  • Early stopping patience increased from 10 to 20: allows 2,000 steps of tolerance (20 evaluations at 100-step intervals) to avoid premature stopping on the noisier multi-source dataset.
  • save_total_limit=2: disk safety, since each Sesame checkpoint is ~4 GB.
  • speaker_id cast via cast_column: schema-level dtype fix required before concatenating the multi-source dataset.
  • Movie audio downsampled from its native 44.1 kHz to 16 kHz: rate normalized during dataset preparation (all sources are then resampled to 24 kHz for training).

Usage

from transformers import CsmForConditionalGeneration, AutoProcessor
from peft import PeftModel
import torch

# Load base model
base_model = CsmForConditionalGeneration.from_pretrained("unsloth/csm-1b")
processor = AutoProcessor.from_pretrained("unsloth/csm-1b")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "acellam/sesame-csm-salt-ach-v2")
model.eval()

# Inference
text = "Kwo me dano obedo maber tutwal."  # Example Acholi text
conversation = [{"role": "0", "content": [{"type": "text", "text": text}]}]
inputs = processor.apply_chat_template(conversation, tokenize=True, return_dict=True, return_tensors="pt")

with torch.no_grad():
    audio = model.generate(**inputs, output_audio=True)

# Save the generated waveform to disk
processor.save_audio(audio, "output.wav")

Related Models & Resources

| Resource | Link |
|---|---|
| Spark-TTS v11 (Acholi) | acellam/spark-tts-salt-ach-v11 |
| Spark-TTS v10 (Acholi) | acellam/spark-tts-salt-ach-v10 |
| Sesame CSM-1B v1 (Acholi) | acellam/spark-tts-salt-sesame |
| Sunbird SparkTTS (SALT) | Sunbird/spark-tts-salt |
| Training dataset | acellam/acholi-english-movie-dialogue |
| Training script | GitLab: sesame-tts-training |
| Live demo / dashboard | lebsync.com |

Citation

@mastersthesis{acellam2026acholi,
  title     = {Offline Speech Translation System for English Movies into Acholi Using AI Techniques},
  author    = {Guy Acellam},
  school    = {Makerere University, School of Computing and Informatics Technology},
  year      = {2026},
  note      = {Supervisor: Dr. Rose Nakibuule. Student ID: 2017/HD05/84U}
}