Arabic Orpheus TTS Fine-tuning

This repository fine-tunes unsloth/orpheus-3b-0.1-ft for Arabic speech synthesis using a 2-stage pipeline:

Stage 1: Multi-speaker Arabic adaptation
Stage 2: Single-speaker specialization

Highlights

Formal Arabic / MSA-oriented output
2-stage LoRA fine-tuning
Arabic text normalization
SNAC-based audio tokenization
Demo audio samples included
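
The two preprocessing highlights above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the normalization rules (strip tatweel, map Arabic-Indic digits to ASCII, collapse whitespace) are assumed examples, and the token layout (7 tokens per SNAC frame, base offset 128266, codebook size 4096) follows the layout used in the public Orpheus fine-tuning notebooks and should be treated as an assumption here. The function names normalize_arabic and interleave_snac_frames are hypothetical.

```python
import re
import unicodedata

TATWEEL = "\u0640"  # Arabic elongation character, carries no phonetic content
# Arabic-Indic digits U+0660..U+0669 mapped to ASCII digits.
DIGIT_MAP = {0x0660 + i: str(i) for i in range(10)}

# Assumed values, taken from the public Orpheus notebooks, not from this repo.
AUDIO_TOKEN_BASE = 128266
CODEBOOK_SIZE = 4096


def normalize_arabic(text: str) -> str:
    """Apply a few illustrative normalization rules to Arabic text."""
    text = unicodedata.normalize("NFC", text)
    text = text.replace(TATWEEL, "")
    text = text.translate(DIGIT_MAP)
    return re.sub(r"\s+", " ", text).strip()


def interleave_snac_frames(level0, level1, level2):
    """Flatten three SNAC code levels into 7 token IDs per frame.

    level0 holds 1 code per frame, level1 holds 2, level2 holds 4.
    Each of the 7 slots in a frame gets its own ID range so the
    language model can tell the slots apart.
    """
    n = len(level0)
    if len(level1) != 2 * n or len(level2) != 4 * n:
        raise ValueError("expected 1/2/4 codes per frame across the three levels")
    tokens = []
    for i in range(n):
        frame = [
            level0[i],
            level1[2 * i],
            level2[4 * i],
            level2[4 * i + 1],
            level1[2 * i + 1],
            level2[4 * i + 2],
            level2[4 * i + 3],
        ]
        for slot, code in enumerate(frame):
            tokens.append(AUDIO_TOKEN_BASE + slot * CODEBOOK_SIZE + code)
    return tokens
```

In a real pipeline the three code levels would come from a SNAC encoder run on 24 kHz audio; plain Python lists stand in for them here so the sketch stays dependency-free.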

Important note on data

The training dataset is not included in this repository.

I used licensed/restricted data and cannot redistribute:

raw audio
metadata
transcriptions
processed dataset artifacts derived from the source corpus

This repository only contains the training/inference code, configuration, and demo outputs.

Training setup

Base model: unsloth/orpheus-3b-0.1-ft
Stage 1: Arabic multi-speaker training
Stage 2: single-speaker fine-tuning on a female voice
Hardware used: NVIDIA L40S
Method: LoRA fine-tuning with Unsloth
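
One way to picture the setup above is as a small stage plan in which Stage 2 starts from the Stage 1 output. The sketch below is only an illustration under that assumption: the Stage class, build_plan function, dataset labels, and output directories are hypothetical placeholders, and no hyperparameters are shown because the source does not state them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    init_from: str   # model ID or checkpoint directory this stage starts from
    data: str        # label for the (non-redistributable) training split
    output_dir: str  # where this stage's LoRA checkpoint would be written


def build_plan(base_model: str = "unsloth/orpheus-3b-0.1-ft"):
    """Return the two stages in order, chaining Stage 2 onto Stage 1."""
    stage1 = Stage(
        name="multi_speaker",
        init_from=base_model,
        data="arabic_multi_speaker",      # hypothetical label
        output_dir="outputs/stage1",      # hypothetical path
    )
    stage2 = Stage(
        name="single_speaker",
        init_from=stage1.output_dir,      # assumption: continue from Stage 1
        data="arabic_single_female",      # hypothetical label
        output_dir="outputs/stage2",      # hypothetical path
    )
    return [stage1, stage2]


if __name__ == "__main__":
    for stage in build_plan():
        print(f"{stage.name}: {stage.init_from} -> {stage.output_dir}")
```

Keeping the plan as data makes the dependency between stages explicit, so a training script can loop over it instead of hard-coding two separate runs.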

🔊 Audio Demos

Demo 1

Demo 2

Demo 3

Notes

This project is intended for research and educational purposes. Please respect the terms of any upstream model and dataset licenses.

Citation

@misc{toyin2025arvoicemultispeakerdatasetarabic,
      title={ArVoice: A Multi-Speaker Dataset for Arabic Speech Synthesis}, 
      author={Hawau Olamide Toyin and Rufael Marew and Humaid Alblooshi and Samar M. Magdy and Hanan Aldarmaki},
      year={2025},
      eprint={2505.20506},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.20506}, 
}

Safetensors

Model size: 3B params
Tensor type: F16

Paper for Abdullah-Baqais/orpheus-arabic-tts-16bit

ArVoice: A Multi-Speaker Dataset for Arabic Speech Synthesis (arXiv:2505.20506, published May 26, 2025)