# Piper Sarah Atlas (`en_US-sarah-atlas`)

A custom Piper TTS voice model fine-tuned to sound like ElevenLabs Sarah, the Atlas assistant's phone voice.
## Purpose
Atlas uses ElevenLabs Sarah (premium, cloud) for phone calls and this model for on-device TTS on desktop, iOS, and Android. Both should sound like the same voice.
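The routing described above can be sketched as a small selector. All names here are illustrative, not actual Atlas APIs:

```python
# Hypothetical sketch of Atlas's voice routing: phone calls get the premium
# cloud voice, every other channel gets the local Piper model, so both
# paths speak with the same voice.
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceBackend:
    name: str
    model: str


ELEVENLABS_SARAH = VoiceBackend("elevenlabs", "eleven_turbo_v2_5")
PIPER_SARAH = VoiceBackend("piper", "en_US-sarah-atlas.onnx")


def pick_backend(channel: str) -> VoiceBackend:
    """Phone calls use the cloud voice; on-device channels use Piper."""
    return ELEVENLABS_SARAH if channel == "phone" else PIPER_SARAH
```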
## Model Details
| Property | Value |
|---|---|
| Base checkpoint | `en_US-lessac-medium` (epoch 2164) |
| Fine-tuned on | 1,500 utterances (~1 hour) of ElevenLabs Sarah audio |
| Training audio generated with | `eleven_turbo_v2_5` (stability=0.6, similarity_boost=0.8) |
| Architecture | VITS (Piper medium) |
| Output sample rate | 22,050 Hz |
| ONNX size | ~20MB |
| Training GPU | A100 40GB (GCP spot, ~$12 total) |
| Training epochs | 1,500 |
## Usage

### CLI
```sh
echo "Hi, this is Atlas. How can I help you today?" | \
  piper -m en_US-sarah-atlas.onnx --output_file output.wav
```
### Streaming (raw PCM)
```sh
echo "You have 3 urgent emails." | \
  piper -m en_US-sarah-atlas.onnx --output_raw | aplay -r 22050 -f S16_LE -c 1
```
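The same streaming pattern works from Python. This is a sketch, not Atlas code: `stream_chunks` is a hypothetical helper, and the audio sink is left as a comment because playback APIs vary by platform:

```python
# Stream raw PCM from a Piper subprocess in fixed-size chunks so playback
# can begin before synthesis finishes.
import shutil
import subprocess
from typing import IO, Iterator


def stream_chunks(stream: IO[bytes], chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield fixed-size chunks from a binary stream until EOF."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


if __name__ == "__main__" and shutil.which("piper"):
    proc = subprocess.Popen(
        ["piper", "-m", "en_US-sarah-atlas.onnx", "--output_raw"],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    proc.stdin.write(b"You have 3 urgent emails.")
    proc.stdin.close()
    for chunk in stream_chunks(proc.stdout):
        pass  # feed each chunk to an audio sink (e.g. a sounddevice stream)
    proc.wait()
```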
### Python
```python
import subprocess

result = subprocess.run(
    ["piper", "-m", "en_US-sarah-atlas.onnx", "--output_raw"],
    input=b"Let me check your calendar.",
    capture_output=True,
)
pcm_audio = result.stdout  # 22,050 Hz mono 16-bit PCM
```
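If you need a playable file rather than raw PCM, the stdlib `wave` module can wrap the bytes in a WAV container. `pcm_to_wav` is a hypothetical helper, not part of Piper:

```python
# Wrap raw 22,050 Hz mono 16-bit PCM (what --output_raw emits) in a WAV
# container using only the Python standard library.
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 22050) -> bytes:
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)          # mono
        wav.setsampwidth(2)          # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```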
## Training Data
- ~279 sentences extracted from the Atlas codebase (`pattern_mapping.py`, `acknowledgments.py`)
- ~1,221 sentences generated via the ElevenLabs Sarah API (`eleven_turbo_v2_5`)
- ASR-validated with Whisper (rejected samples with <90% transcript match)
- Corpus sources: Atlas tool commands, acknowledgments, LJSpeech phoneme coverage, numbers/dates/names, conversational filler, news/Wikipedia extracts
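The ASR validation step above can be sketched as follows, under the assumption that "transcript match" means character-level similarity between the prompt and the Whisper transcript; `difflib.SequenceMatcher` stands in for whatever metric the actual pipeline used:

```python
# Sketch of the Whisper-based validation filter: score each sample's
# transcript against its prompt and reject anything below 90% similarity.
import difflib


def transcript_match(prompt: str, transcript: str) -> float:
    """Return a 0..1 similarity score, ignoring case and surrounding space."""
    a, b = prompt.strip().lower(), transcript.strip().lower()
    return difflib.SequenceMatcher(None, a, b).ratio()


def accept(prompt: str, transcript: str, threshold: float = 0.90) -> bool:
    return transcript_match(prompt, transcript) >= threshold
```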
## Feature

- Feature 099 – Piper Sarah Voice Training
- Branch: `099-piper-sarah-voice`
- Training framework doc: `specs/099-piper-sarah-voice/training-framework.md`