Orpheus 3B Hindi Male TTS
A LoRA finetune of canopylabs/3b-hi-pretrain-research_release on the ai4bharat/Rasa Hindi Male subset, producing a single consistent male Hindi voice named arjun.
Training Details
| Setting | Value |
|---|---|
| Base model | canopylabs/3b-hi-pretrain-research_release |
| Dataset | ai4bharat/Rasa (Hindi config, Male speaker) |
| Train examples | 12,116 utterances (~23.78 hours) |
| Codec | SNAC 24 kHz (hubertsiuzdak/snac_24khz) |
| LoRA rank | 32 (RSLoRA, α=64) |
| LoRA targets | All attention + MLP projections + lm_head + embed_tokens |
| Epochs | 3 |
| Batch size | 4 (effective, with gradient accumulation) |
| Learning rate | 2e-4 (cosine schedule) |
| Hardware | 1× NVIDIA A100-SXM4-40GB |
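As a rough sanity check on these numbers, the sketch below derives the step count, mean utterance length, and LoRA scaling factor from the table. It assumes one optimizer step per effective batch, that every example is seen once per epoch, and that RSLoRA scales the adapter by α/√r (versus α/r for plain LoRA); none of these details are stated by the training run itself.

```python
import math

# Values copied from the training table above.
train_examples = 12_116
total_hours = 23.78
effective_batch = 4
epochs = 3
rank, alpha = 32, 64

# ASSUMPTION: one optimizer step per effective batch, no examples dropped.
steps_per_epoch = math.ceil(train_examples / effective_batch)
total_steps = steps_per_epoch * epochs

# Mean utterance length in seconds.
mean_utterance_s = total_hours * 3600 / train_examples

# ASSUMPTION: rank-stabilised LoRA scales by alpha / sqrt(r); plain LoRA by alpha / r.
rslora_scale = alpha / math.sqrt(rank)
plain_lora_scale = alpha / rank

print(f"steps per epoch: {steps_per_epoch}, total steps: {total_steps}")
print(f"mean utterance length: {mean_utterance_s:.2f} s")
print(f"RSLoRA scale: {rslora_scale:.2f} (plain LoRA would be {plain_lora_scale:.2f})")
```

Under these assumptions the mean utterance is about 7 seconds long, which is consistent with the advice to keep inference inputs short.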
Generated Samples
All samples were generated with temperature=0.4, top_p=0.9, repetition_penalty=1.1.
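नमस्ते, मेरा नाम अर्जुन है और मैं दिल्ली में रहता हूँ। मुझे हिंदी में बात करना बहुत पसंद है क्योंकि यह मेरी मातृभाषा है। (English: Hello, my name is Arjun and I live in Delhi. I really enjoy speaking Hindi because it is my mother tongue.)
आज सुबह से ही मौसम बहुत सुहाना है और आसमान में हल्के बादल छाए हुए हैं। ऐसे मौसम में चाय पीते हुए किताब पढ़ना बहुत अच्छा लगता है। (English: The weather has been very pleasant since this morning, with light clouds in the sky. In weather like this, reading a book over a cup of tea feels wonderful.)
भारत एक विविधताओं से भरा हुआ देश है जहाँ अलग-अलग भाषाएँ, संस्कृतियाँ और परंपराएँ एक साथ फलती-फूलती हैं। यहाँ के लोग मिलजुल कर रहते हैं और एक-दूसरे की मदद करते हैं। (English: India is a country full of diversity, where different languages, cultures, and traditions flourish together. People here live in harmony and help one another.)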
Usage
import wave

import numpy as np
import torch
from snac import SNAC
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "edzsaji26/orpheus-3b-0.1-hindi-male-lora"
VOICE_NAME = "arjun"
AUDIO_OFFSET = 128266  # ID of the first audio-code token
STOP_TOKEN = 128258    # generation stops at this token
SAMPLE_RATE = 24_000

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto")
model.eval()
snac = SNAC.from_pretrained("hubertsiuzdak/snac_24khz").eval().to(device)


def synthesise(text):
    # Build the prompt and wrap the text tokens in the special tokens the model expects.
    prompt = f"{VOICE_NAME}: {text}"
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    input_ids = torch.cat([
        torch.tensor([[128259]]),
        ids,
        torch.tensor([[128009, 128260, 128261, 128257]]),
    ], dim=1).to(model.device)

    with torch.inference_mode():
        out = model.generate(
            input_ids=input_ids,
            attention_mask=torch.ones_like(input_ids),
            max_new_tokens=2000,
            do_sample=True,
            temperature=0.4,
            top_p=0.9,
            repetition_penalty=1.1,
            eos_token_id=STOP_TOKEN,
        )

    # Keep only the newly generated tokens and drop the stop token.
    new_tokens = out[0, input_ids.shape[1]:].tolist()
    audio_tokens = [t for t in new_tokens if t != STOP_TOKEN]

    # Audio is emitted in frames of 7 tokens; discard any trailing partial frame.
    n_frames = len(audio_tokens) // 7
    if n_frames == 0:
        raise ValueError("The model did not generate a complete audio frame.")
    audio_tokens = audio_tokens[:n_frames * 7]

    # De-interleave each frame into the three SNAC codebooks (1 + 2 + 4 codes).
    c0, c1, c2 = [], [], []
    for f in range(n_frames):
        i = f * 7
        c0.append(audio_tokens[i] - AUDIO_OFFSET)
        c1.append(audio_tokens[i + 1] - AUDIO_OFFSET - 4096)
        c2.append(audio_tokens[i + 2] - AUDIO_OFFSET - 2 * 4096)
        c2.append(audio_tokens[i + 3] - AUDIO_OFFSET - 3 * 4096)
        c1.append(audio_tokens[i + 4] - AUDIO_OFFSET - 4 * 4096)
        c2.append(audio_tokens[i + 5] - AUDIO_OFFSET - 5 * 4096)
        c2.append(audio_tokens[i + 6] - AUDIO_OFFSET - 6 * 4096)

    codes = [
        torch.tensor(c0).unsqueeze(0).to(device),
        torch.tensor(c1).unsqueeze(0).to(device),
        torch.tensor(c2).unsqueeze(0).to(device),
    ]
    with torch.inference_mode():
        audio = snac.decode(codes)

    # Convert the float waveform to 16-bit PCM and write a mono WAV file.
    waveform = audio.squeeze().cpu().numpy()
    int16 = (waveform * 32767).clip(-32768, 32767).astype(np.int16)
    with wave.open("output.wav", "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(SAMPLE_RATE)
        wf.writeframes(int16.tobytes())


# "Hello, my name is Arjun and I am very happy to talk with you."
synthesise("नमस्ते, मेरा नाम अर्जुन है और मैं आपसे बात करके बहुत खुश हूँ।")
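The frame layout used in the decoding loop can be checked without loading the model. The sketch below isolates the de-interleaving step as a pure function and pairs it with an inverse; `deinterleave`, `interleave`, and the range check are hypothetical helpers written for illustration, while the offsets and slot order are taken from the usage code.

```python
AUDIO_OFFSET = 128266    # first audio-token ID, as in the usage code
CODEBOOK_SIZE = 4096     # per-slot offset step, as in the usage code
TOKENS_PER_FRAME = 7


def deinterleave(audio_tokens):
    """Split a flat list of audio-token IDs into three SNAC code lists."""
    n_frames = len(audio_tokens) // TOKENS_PER_FRAME  # trailing partial frame is dropped
    c0, c1, c2 = [], [], []
    for f in range(n_frames):
        frame = audio_tokens[f * TOKENS_PER_FRAME:(f + 1) * TOKENS_PER_FRAME]
        # Slot k of a frame carries the offset AUDIO_OFFSET + k * CODEBOOK_SIZE.
        codes = [tok - AUDIO_OFFSET - slot * CODEBOOK_SIZE for slot, tok in enumerate(frame)]
        if any(not 0 <= c < CODEBOOK_SIZE for c in codes):
            raise ValueError(f"frame {f} contains a token outside its slot range")
        c0.append(codes[0])
        c1.extend([codes[1], codes[4]])
        c2.extend([codes[2], codes[3], codes[5], codes[6]])
    return c0, c1, c2


def interleave(c0, c1, c2):
    """Inverse of deinterleave; useful for testing the layout."""
    tokens = []
    for f in range(len(c0)):
        codes = [c0[f], c1[2 * f], c2[4 * f], c2[4 * f + 1],
                 c1[2 * f + 1], c2[4 * f + 2], c2[4 * f + 3]]
        tokens.extend(AUDIO_OFFSET + slot * CODEBOOK_SIZE + c for slot, c in enumerate(codes))
    return tokens


c0, c1, c2 = [1, 2], [10, 11, 12, 13], [20, 21, 22, 23, 24, 25, 26, 27]
tokens = interleave(c0, c1, c2)
assert deinterleave(tokens) == (c0, c1, c2)
```

The range check is an addition not present in the usage code: it turns a stray non-audio token into an explicit error instead of a negative code index passed to the decoder.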
Prompt Format
arjun: <hindi text here>
The voice name arjun is the learned speaker identity, so always include it as the prefix.
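A minimal sketch of how the prompt is assembled, mirroring the usage code but on plain Python lists so it can be inspected without a tokenizer. The token IDs come from the usage code; the constant names and the helpers `format_prompt` and `wrap_prompt_ids` are assumptions made here for readability, based on the roles these IDs usually play in Orpheus-style prompts.

```python
# Token IDs as used in the usage code; the names are assumed, not confirmed by this model card.
START_OF_HUMAN = 128259
END_OF_TEXT = 128009
END_OF_HUMAN = 128260
START_OF_AI = 128261
START_OF_SPEECH = 128257


def format_prompt(text, voice="arjun"):
    """Return the '<voice>: <text>' string that is fed to the tokenizer."""
    text = text.strip()
    if not text:
        raise ValueError("text must not be empty")
    return f"{voice}: {text}"


def wrap_prompt_ids(text_ids):
    """Surround the tokenized prompt with the special tokens the model expects."""
    return [START_OF_HUMAN, *text_ids, END_OF_TEXT, END_OF_HUMAN, START_OF_AI, START_OF_SPEECH]


print(format_prompt("नमस्ते"))
print(wrap_prompt_ids([1, 2, 3]))  # placeholder IDs standing in for real tokenizer output
```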
Limitations
- Single male speaker (Arjun); not multi-speaker
- Hindi only
- Speaking styles are limited to those in the Rasa dataset: neutral speech, commands, conversations, news, emotions
- For best results, keep utterances under ~15 seconds
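To relate the ~15 second guideline to the max_new_tokens setting in the usage code, the sketch below converts between audio duration and generated tokens. It assumes each 7-token frame decodes to roughly 2048 samples at 24 kHz, which follows from the 1 + 2 + 4 code layout if the coarsest SNAC level has a stride of 2048 samples; treat that figure as an approximation rather than a documented constant.

```python
import math

SAMPLE_RATE = 24_000
TOKENS_PER_FRAME = 7
SAMPLES_PER_FRAME = 2048  # ASSUMPTION: stride of the coarsest SNAC 24 kHz codebook


def tokens_for_seconds(seconds):
    """Approximate number of audio tokens needed for a clip of the given length."""
    frames = math.ceil(seconds * SAMPLE_RATE / SAMPLES_PER_FRAME)
    return frames * TOKENS_PER_FRAME


def seconds_for_tokens(max_new_tokens):
    """Approximate audio length covered by a token budget (complete frames only)."""
    frames = max_new_tokens // TOKENS_PER_FRAME
    return frames * SAMPLES_PER_FRAME / SAMPLE_RATE


print(tokens_for_seconds(15))
print(seconds_for_tokens(2000))
```

Under this assumption, a budget of 2000 new tokens covers somewhat more audio than the recommended 15 seconds, so the token limit should rarely be what cuts off a short utterance.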
Citation
If you use this model, please also cite the underlying work:
@misc{orpheus2025,
  title  = {Orpheus TTS},
  author = {Canopy Labs},
  year   = {2025},
  url    = {https://github.com/canopyai/Orpheus-TTS}
}
@inproceedings{ai4bharat2024rasa,
  author    = {Praveen Srinivasa Varadhan and Ashwin Sankar and Giri Raju and Mitesh M. Khapra},
  title     = {Rasa: Building Expressive Speech Synthesis Systems for Indian Languages in Low-resource Settings},
  booktitle = {Proc. INTERSPEECH 2024},
  year      = {2024}
}