Sesame CSM-1B Acholi TTS – v2
Fine-tuned version of Sesame CSM-1B for Acholi (ach) text-to-speech synthesis using LoRA, trained on an expanded multi-source Acholi speech dataset.
Part of a master's thesis at Makerere University on offline English-to-Acholi movie translation.
Model Details
| Field | Value |
|---|---|
| Base model | unsloth/csm-1b (Sesame Conversational Speech Model, 1B params) |
| Fine-tuning method | LoRA (Low-Rank Adaptation) |
| LoRA rank | r=32, alpha=64 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Training steps | 15,000 (early stopped at step 14,200) |
| Best checkpoint | step 14,200 |
| Best eval_loss | 8.368 |
| Test eval_loss | 8.325 |
| Training time | ~7h 37min (L40S 46GB GPU) |
| Language | Acholi (ach) |
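To give a rough sense of adapter size, a LoRA adapter on a weight of shape (d_out, d_in) adds r·(d_in + d_out) trainable parameters, so trainable parameters scale linearly with rank. The sketch below computes this per decoder block for hypothetical Llama-style layer dimensions (the true CSM-1B dimensions are not listed here and may differ); only the rank values and target-module names come from the tables in this card.

```python
# Each adapted weight W (d_out x d_in) gains two low-rank factors
# A (r x d_in) and B (d_out x r): r * (d_in + d_out) extra trainable params.

# Hypothetical shapes (d_out, d_in) for one decoder block; the real CSM-1B
# dimensions may differ. Module names match the target-module list above.
HIDDEN, INTERMEDIATE = 2048, 8192
MODULES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (INTERMEDIATE, HIDDEN),
    "up_proj": (INTERMEDIATE, HIDDEN),
    "down_proj": (HIDDEN, INTERMEDIATE),
}

def lora_params_per_block(rank: int) -> int:
    """Trainable LoRA parameters added to one decoder block at a given rank."""
    return sum(rank * (d_in + d_out) for d_out, d_in in MODULES.values())

# r=32 trains 4x fewer adapter parameters than the v1 setting of r=128.
print(lora_params_per_block(32), lora_params_per_block(128))
```

This 4x reduction is the memory/time saving behind the r=128 → r=32 change discussed under Key Design Decisions.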
Training Data
| Source | Split | Samples | Notes |
|---|---|---|---|
| Sunbird/salt studio-ach | train | 4,062 | Studio-quality, single speaker |
| google/WaxalNLP ach_tts | train | 309 | Community speech |
| acellam/acholi-english-movie-dialogue | train | 1,743 | Movie dialogue, 16 kHz |
| Total | | 6,114 | |
All audio resampled to 24,000 Hz. Samples filtered to 0.5–8 seconds duration.
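The duration filter above can be sketched as a simple predicate over raw sample counts; the sample rate and thresholds come from the text, while the helper name and example clip lengths are illustrative.

```python
TARGET_SR = 24_000           # training sample rate (Hz)
MIN_SEC, MAX_SEC = 0.5, 8.0  # duration bounds used when filtering samples

def keep_sample(num_samples: int, sr: int = TARGET_SR) -> bool:
    """Return True if an audio clip falls inside the 0.5-8 s window."""
    duration = num_samples / sr
    return MIN_SEC <= duration <= MAX_SEC

print(keep_sample(24_000))   # 1.0 s clip  -> True
print(keep_sample(6_000))    # 0.25 s clip -> False
print(keep_sample(240_000))  # 10 s clip   -> False
```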
Training Configuration
```
learning_rate = 5e-6
lr_scheduler_type = "cosine"
warmup_steps = 1000
max_steps = 15000
per_device_train_batch_size = 2
gradient_accumulation_steps = 4   # effective batch size = 8
weight_decay = 0.05
fp16 / bf16 = True (auto)
early_stopping_patience = 20      # 2000-step tolerance
early_stopping_threshold = 0.001
gradient_checkpointing = "unsloth"
```
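The warmup and cosine settings above imply the learning-rate curve sketched below. This is a plain re-implementation of standard linear-warmup cosine decay under the stated hyperparameters; the trainer's exact formula may differ slightly at the boundaries.

```python
import math

BASE_LR, WARMUP, MAX_STEPS = 5e-6, 1_000, 15_000

def lr_at(step: int) -> float:
    """Linear warmup to BASE_LR over WARMUP steps, then cosine decay to 0."""
    if step < WARMUP:
        return BASE_LR * step / WARMUP
    progress = (step - WARMUP) / (MAX_STEPS - WARMUP)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(0))       # 0.0 at the start of warmup
print(lr_at(1_000))   # peak learning rate: 5e-06
print(lr_at(15_000))  # decays to ~0 at max_steps
```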
Experiment History
| Version | Dataset | LoRA rank | Best eval_loss | Best step | Notes |
|---|---|---|---|---|---|
| v1 | SALT studio-ach only (~3,500 samples) | r=128, alpha=128 | 5.031 | 3,000 | Studio-only, early plateau |
| v2 | SALT + WaxalNLP + movie dialogue (6,114 total) | r=32, alpha=64 | 8.368 | 14,200 | Multi-source, longer training |
Note on loss comparison: v1's lower loss (5.031 vs 8.368) reflects training on a narrow, homogeneous single-speaker studio dataset, not a better model. v2 was trained on a more diverse, multi-speaker, multi-domain dataset which is inherently harder to fit. The expected benefit is better generalization to real movie speech. Perceptual quality comparison (MOS evaluation) is ongoing.
Key Design Decisions (v2)
- LoRA rank reduced from r=128 to r=32: mirrors Spark-TTS findings that rank has minimal impact when data is the bottleneck; reduces memory and training time.
- Early stopping patience increased from 10 to 20: allows 2,000 steps of tolerance to avoid premature stopping on the noisier multi-source dataset.
- save_total_limit=2: disk safety; each Sesame checkpoint is ~4 GB.
- speaker_id via cast_column: schema-level dtype fix required for multi-source dataset concatenation.
- Movie audio resampled from 44,100 Hz to 16 kHz: the native movie audio rate was corrected at preprocessing (all sources are then resampled to 24,000 Hz for training, as noted above).
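The speaker_id dtype issue can be illustrated without the datasets library: concatenation breaks when one source stores speaker IDs as strings and another as integers, so every source is cast to a single dtype first (in datasets this is `cast_column("speaker_id", Value("int64"))`; the toy normalizer below mimics that fix in plain Python, with hypothetical rows).

```python
# Toy illustration of the schema fix: two sources store speaker_id with
# different dtypes, which breaks naive concatenation. Casting every source
# to one dtype (here int) mirrors cast_column("speaker_id", Value("int64")).
def normalize_speaker_id(row: dict) -> dict:
    return {**row, "speaker_id": int(row["speaker_id"])}

salt_rows = [{"speaker_id": "0", "text": "salt sample"}]   # string ids
movie_rows = [{"speaker_id": 1, "text": "movie sample"}]   # integer ids

merged = [normalize_speaker_id(r) for r in salt_rows + movie_rows]
print({type(r["speaker_id"]).__name__ for r in merged})  # only int remains
```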
Usage
```python
from transformers import CsmForConditionalGeneration, AutoProcessor
from peft import PeftModel
import torch

# Load base model and processor
base_model = CsmForConditionalGeneration.from_pretrained("unsloth/csm-1b")
processor = AutoProcessor.from_pretrained("unsloth/csm-1b")

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "acellam/sesame-csm-salt-ach-v2")
model.eval()

# Inference
text = "Kwo me dano obedo maber tutwal."  # Example Acholi text
conversation = [{"role": "0", "content": [{"type": "text", "text": text}]}]
inputs = processor.apply_chat_template(
    conversation, tokenize=True, return_dict=True, return_tensors="pt"
)
with torch.no_grad():
    output = model.generate(**inputs)
```
Related Models & Resources
| Resource | Link |
|---|---|
| Spark-TTS v11 (Acholi) | acellam/spark-tts-salt-ach-v11 |
| Spark-TTS v10 (Acholi) | acellam/spark-tts-salt-ach-v10 |
| Sesame CSM-1B v1 (Acholi) | acellam/spark-tts-salt-sesame |
| Sunbird SparkTTS (SALT) | Sunbird/spark-tts-salt |
| Training dataset | acellam/acholi-english-movie-dialogue |
| Training script | GitLab: sesame-tts-training |
| Live demo / dashboard | lebsync.com |
Citation
@mastersthesis{acellam2026acholi,
title = {Offline Speech Translation System for English Movies into Acholi Using AI Techniques},
author = {Guy Acellam},
school = {Makerere University, School of Computing and Informatics Technology},
year = {2026},
note = {Supervisor: Dr. Rose Nakibuule. Student ID: 2017/HD05/84U}
}