Orpheus 3B β€” GRPO LoRA for Conversational TTS

LoRA adapter trained with Group Relative Policy Optimization (GRPO) on Orpheus 3B for conversational speech synthesis.

What is Orpheus?

Orpheus is a 3B parameter LLM-based TTS model from Canopy Labs. It generates SNAC audio tokens autoregressively, producing natural speech with emotion and prosody inferred from text context.

Training

  • Base model: canopylabs/orpheus-3b-0.1-ft (3B params, Llama 3 architecture)
  • Method: GRPO (Group Relative Policy Optimization) β€” reinforcement learning with UTMOS as reward signal
  • Adapter: LoRA (r=16, alpha=32, all linear layers)
  • Reward: UTMOS naturalness score (target: maximize perceived quality)
  • Dataset: Expresso conversational speech corpus
  • Hardware: NVIDIA A10G 24GB
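The adapter settings above can be sketched with peft's `LoraConfig`. This is a minimal illustration, not the exact training config: the explicit `target_modules` names are an assumption matching "all linear layers" of a Llama 3 block, and `lora_dropout` is not stated on this card.

```python
from peft import LoraConfig

# LoRA hyperparameters from the training summary: r=16, alpha=32,
# applied to all linear layers of the Llama 3 backbone.
# The module names below are an assumption; recent peft versions
# also accept the "all-linear" shorthand for target_modules.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,  # assumed value; not stated on the card
    task_type="CAUSAL_LM",
)
```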

Why GRPO?

Standard supervised fine-tuning (SFT) of TTS models risks overfitting to surface patterns in the training data. GRPO instead optimizes directly for perceived audio quality using a learned reward model (UTMOS, a neural MOS predictor trained on human naturalness ratings), which better aligns the policy with human preferences for natural-sounding speech.
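The core of GRPO can be sketched in a few lines: for each prompt, sample a group of candidate generations, score each with the reward model (UTMOS here), and normalize rewards within the group to obtain advantages, so no separate value network is needed. This is a simplified illustration with made-up reward values, not the training code:

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: normalize each reward against its own
    sampling group: a_i = (r_i - mean(group)) / (std(group) + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical UTMOS scores for 4 speech samples from one prompt
utmos_scores = [3.2, 4.1, 2.8, 3.9]
advantages = group_relative_advantages(utmos_scores)
# Samples scoring above the group mean get positive advantages,
# pushing the policy toward generations rated as more natural.
```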

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load base Orpheus
base_model = AutoModelForCausalLM.from_pretrained("canopylabs/orpheus-3b-0.1-ft")
tokenizer = AutoTokenizer.from_pretrained("canopylabs/orpheus-3b-0.1-ft")

# Apply GRPO LoRA
model = PeftModel.from_pretrained(base_model, "Tachyeon/orpheus-3b-conversational-grpo")
model = model.merge_and_unload()  # Optional: merge for faster inference

For production inference via llama.cpp GGUF, see Project Maya.

Part of Project Maya

This adapter was trained as part of Project Maya β€” a real-time conversational voice AI system achieving <2s end-to-end latency with:

  • Orpheus 3B TTS via llama.cpp (129 tok/s, RTF 0.64)
  • Llama 3.2 3B LLM (155 tok/s)
  • faster-whisper STT with hallucination mitigation
  • Multi-GPU streaming pipeline (4x A10G)
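As a sanity check on the figures above: real-time factor (RTF) is generation time divided by audio duration, so RTF 0.64 means each second of audio takes 0.64 s to synthesize, and at 129 tok/s the audio stream corresponds to roughly 83 SNAC tokens per second. A back-of-envelope calculation using the listed numbers:

```python
tok_per_s = 129  # Orpheus token generation rate via llama.cpp
rtf = 0.64       # real-time factor: generation time / audio duration

# Implied SNAC token rate of the audio stream:
tokens_per_audio_s = tok_per_s * rtf  # ~82.6 tokens per audio second

# Time to synthesize a 10-second utterance:
t_10s = 10 * rtf  # ~6.4 s, i.e. faster than real time
```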


Research Context

This GRPO approach was informed by GLM-4-Voice, which demonstrated that RL-based optimization (DPO/GRPO) can improve TTS quality metrics beyond what supervised fine-tuning alone achieves.

Citation

@misc{orpheus2025canopy,
  title={Orpheus TTS},
  author={Canopy Labs},
  year={2025},
  url={https://github.com/canopylabs/orpheus-tts}
}