Orpheus 3B Hindi Male TTS

A LoRA finetune of canopylabs/3b-hi-pretrain-research_release on the ai4bharat/Rasa Hindi Male subset, producing a single consistent male Hindi voice named arjun.

Training Details

Base model: canopylabs/3b-hi-pretrain-research_release
Dataset: ai4bharat/Rasa (Hindi config, male speaker)
Training examples: 12,116 utterances (~23.78 hours)
Codec: SNAC 24 kHz (hubertsiuzdak/snac_24khz)
LoRA rank: 32 (RSLoRA, α = 64)
LoRA targets: all attention and MLP projections, plus lm_head and embed_tokens
Epochs: 3
Batch size: 4 (effective, using gradient accumulation)
Learning rate: 2e-4 (cosine schedule)
Hardware: 1× NVIDIA A100-SXM4-40GB
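The training figures can be turned into a few derived quantities. This is a sketch under stated assumptions: "batch size 4" is taken as the effective batch per optimizer step, the last batch is not dropped, and the cosine schedule is assumed to decay to zero with no warm-up (warm-up is not stated). It also shows why α = 64 behaves differently under RSLoRA, which scales the adapter update by α/√r instead of the usual α/r. The helper names are illustrative only.

```python
import math

# Figures taken from the training details above
NUM_UTTERANCES = 12_116
TOTAL_HOURS = 23.78
EPOCHS = 3
EFFECTIVE_BATCH = 4
PEAK_LR = 2e-4
LORA_RANK = 32
LORA_ALPHA = 64


def lora_scaling(alpha: float, rank: int, rslora: bool) -> float:
    """Multiplier applied to the low-rank update B @ A."""
    # Standard LoRA uses alpha / r; rank-stabilised LoRA uses alpha / sqrt(r).
    return alpha / math.sqrt(rank) if rslora else alpha / rank


def cosine_lr(step: int, total_steps: int, peak: float) -> float:
    """Cosine decay from `peak` to 0 (assumption: no warm-up phase)."""
    return 0.5 * peak * (1.0 + math.cos(math.pi * step / total_steps))


steps_per_epoch = math.ceil(NUM_UTTERANCES / EFFECTIVE_BATCH)
total_steps = steps_per_epoch * EPOCHS
mean_clip_seconds = TOTAL_HOURS * 3600 / NUM_UTTERANCES

print(f"RSLoRA scale: {lora_scaling(LORA_ALPHA, LORA_RANK, True):.2f}")       # 11.31
print(f"plain LoRA scale: {lora_scaling(LORA_ALPHA, LORA_RANK, False):.2f}")  # 2.00
print(f"optimizer steps: {total_steps}")                                      # 9087
print(f"mean clip length: {mean_clip_seconds:.2f} s")                         # 7.07
print(f"LR at the halfway step: {cosine_lr(total_steps // 2, total_steps, PEAK_LR):.2e}")
```

Under these assumptions each utterance averages about 7 seconds, and the effective adapter scale is roughly 5.7 times larger than plain LoRA would give for the same α and r.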

Generated Samples

All samples were generated with temperature=0.4, top_p=0.9 and repetition_penalty=1.1.

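नमस्ते, मेरा नाम अर्जुन है और मैं दिल्ली में रहता हूँ। मुझे हिंदी में बात करना बहुत पसंद है क्योंकि यह मेरी मातृभाषा है।
("Hello, my name is Arjun and I live in Delhi. I really like speaking Hindi because it is my mother tongue.")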

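आज सुबह से ही मौसम बहुत सुहाना है और आसमान में हल्के बादल छाए हुए हैं। ऐसे मौसम में चाय पीते हुए किताब पढ़ना बहुत अच्छा लगता है।
("The weather has been very pleasant since this morning, and light clouds cover the sky. In weather like this, reading a book over a cup of tea is lovely.")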

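भारत एक विविधताओं से भरा हुआ देश है जहाँ अलग-अलग भाषाएँ, संस्कृतियाँ और परंपराएँ एक साथ फलती-फूलती हैं। यहाँ के लोग मिलजुल कर रहते हैं और एक-दूसरे की मदद करते हैं।
("India is a country full of diversity, where different languages, cultures and traditions flourish side by side. Its people live together in harmony and help one another.")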
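To illustrate what the three sampling settings do at each decoding step, here is a simplified pure-Python sketch that applies a repetition penalty, temperature scaling and nucleus (top-p) filtering to one vector of logits. It loosely follows the order in which Hugging Face `generate` applies these options, but it is not that implementation; `sample_next` is a hypothetical helper.

```python
import math
import random


def sample_next(logits, previous_ids, temperature=0.4, top_p=0.9,
                repetition_penalty=1.1, rng=None):
    """Pick one token id from raw logits using the sample settings."""
    rng = rng or random.Random()
    scores = list(logits)

    # Repetition penalty: make already-generated tokens less likely
    for t in set(previous_ids):
        if scores[t] < 0:
            scores[t] *= repetition_penalty
        else:
            scores[t] /= repetition_penalty

    # Temperature below 1 sharpens the distribution
    scores = [s / temperature for s in scores]

    # Numerically stable softmax
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-p: keep the most likely tokens until their cumulative mass reaches top_p
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Sample among the kept tokens in proportion to their probabilities
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


print(sample_next([2.0, 1.5, 0.3, -1.0], previous_ids=[0], rng=random.Random(0)))
```

A low temperature combined with top_p=0.9 keeps generation close to the most likely audio tokens, which favours a stable voice over variety.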

Usage

import torch
import wave
import numpy as np
from snac import SNAC
from transformers import AutoModelForCausalLM, AutoTokenizer

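MODEL_ID     = "edzsaji26/orpheus-3b-0.1-hindi-male-lora"
VOICE_NAME   = "arjun"    # learned speaker prefix, see Prompt Format
AUDIO_OFFSET = 128266     # id of the first SNAC audio token
STOP_TOKEN   = 128258     # end-of-speech token; generation stops here
SAMPLE_RATE  = 24_000     # SNAC 24 kHz output rate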

device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto")
model.eval()

snac = SNAC.from_pretrained("hubertsiuzdak/snac_24khz").eval().to(device)

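def synthesise(text):
    # Prompt format is "<voice>: <text>"
    prompt = f"{VOICE_NAME}: {text}"
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    # Orpheus control tokens: 128259 start of human turn, then the text,
    # 128009 end of text, 128260 end of human turn, 128261 start of AI turn,
    # 128257 start of speech. The model continues with audio tokens.
    input_ids = torch.cat([
        torch.tensor([[128259]]),
        ids,
        torch.tensor([[128009, 128260, 128261, 128257]])
    ], dim=1).to(model.device)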

    with torch.inference_mode():
        out = model.generate(
            input_ids=input_ids,
            max_new_tokens=2000,
            do_sample=True,
            temperature=0.4,
            top_p=0.9,
            repetition_penalty=1.1,
            eos_token_id=STOP_TOKEN,
        )

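    # Keep only the newly generated tokens and drop the end-of-speech token
    new_tokens = out[0, input_ids.shape[1]:].tolist()
    audio_tokens = [t for t in new_tokens if t != STOP_TOKEN]
    # Audio comes in frames of 7 tokens; drop any incomplete trailing frame
    n_frames = len(audio_tokens) // 7
    if n_frames == 0:
        raise RuntimeError("No complete audio frame was generated.")
    audio_tokens = audio_tokens[:n_frames * 7]

    # Split each frame across SNAC's three codebook levels (1 + 2 + 4 codes).
    # Frame position k is stored as AUDIO_OFFSET + k * 4096 + code.
    c0, c1, c2 = [], [], []
    for f in range(n_frames):
        i = f * 7
        c0.append(audio_tokens[i]   - AUDIO_OFFSET)
        c1.append(audio_tokens[i+1] - AUDIO_OFFSET - 4096)
        c2.append(audio_tokens[i+2] - AUDIO_OFFSET - 2*4096)
        c2.append(audio_tokens[i+3] - AUDIO_OFFSET - 3*4096)
        c1.append(audio_tokens[i+4] - AUDIO_OFFSET - 4*4096)
        c2.append(audio_tokens[i+5] - AUDIO_OFFSET - 5*4096)
        c2.append(audio_tokens[i+6] - AUDIO_OFFSET - 6*4096)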

    codes = [torch.tensor(c0).unsqueeze(0).to(device),
             torch.tensor(c1).unsqueeze(0).to(device),
             torch.tensor(c2).unsqueeze(0).to(device)]

    with torch.inference_mode():
        audio = snac.decode(codes)

    # Convert the float waveform (nominally in [-1, 1]) to 16-bit PCM
    waveform = audio.squeeze().cpu().numpy()
    int16 = (waveform * 32767).clip(-32768, 32767).astype(np.int16)

    # Write a mono, 16-bit, 24 kHz WAV file
    with wave.open("output.wav", "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(SAMPLE_RATE)
        wf.writeframes(int16.tobytes())

# "Hello, my name is Arjun and I am very happy to talk to you."
synthesise("नमस्ते, मेरा नाम अर्जुन है और मैं आपसे बात करके बहुत खुश हूँ।")
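The 7-token frame layout used by the decoding loop can be checked with a small round trip that needs neither the model nor SNAC. This is a sketch, assuming the training data was encoded with the exact inverse of the redistribution in `synthesise`; `frames_to_tokens` and `tokens_to_frames` are hypothetical helpers, not part of this repository.

```python
AUDIO_OFFSET = 128266
CODEBOOK_SIZE = 4096


def frames_to_tokens(c0, c1, c2):
    """Interleave the 3 SNAC levels into 7 tokens per frame (inverse of the decoder loop)."""
    if len(c1) != 2 * len(c0) or len(c2) != 4 * len(c0):
        raise ValueError("expected level sizes in ratio 1 : 2 : 4")
    tokens = []
    for f in range(len(c0)):
        frame = [c0[f], c1[2*f], c2[4*f], c2[4*f + 1],
                 c1[2*f + 1], c2[4*f + 2], c2[4*f + 3]]
        # Position k in the frame lives in its own 4096-wide slice of the vocabulary
        tokens += [AUDIO_OFFSET + k * CODEBOOK_SIZE + code for k, code in enumerate(frame)]
    return tokens


def tokens_to_frames(tokens):
    """Same redistribution as the usage example: 7 tokens -> 1 + 2 + 4 codes."""
    c0, c1, c2 = [], [], []
    for f in range(len(tokens) // 7):
        codes = [tokens[7*f + k] - AUDIO_OFFSET - k * CODEBOOK_SIZE for k in range(7)]
        c0.append(codes[0])
        c1 += [codes[1], codes[4]]
        c2 += [codes[2], codes[3], codes[5], codes[6]]
    return c0, c1, c2


levels = ([1, 2], [3, 4, 5, 6], [7, 8, 9, 10, 11, 12, 13, 14])
assert tokens_to_frames(frames_to_tokens(*levels)) == levels
print(frames_to_tokens(*levels)[:7])
```

If a generated code ever falls outside 0 to 4095 after this mapping, the frame is misaligned (for example, a stray non-audio token), and SNAC decoding will fail or produce noise.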

Prompt Format

arjun: <hindi text here>

The voice name arjun is the learned speaker identity; always include it as the prefix.
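A minimal sketch of how the prefix and the control tokens fit together, assuming the token ids used in the usage code. The constant names, `format_prompt` and `build_prompt_ids` are illustrative, and `text_ids` stands in for whatever the tokenizer returns for `arjun: <text>`.

```python
START_OF_HUMAN = 128259
END_OF_TEXT = 128009
END_OF_HUMAN = 128260
START_OF_AI = 128261
START_OF_SPEECH = 128257


def format_prompt(text: str, voice: str = "arjun") -> str:
    """Prefix the learned speaker name and collapse stray whitespace in the text."""
    return f"{voice}: {' '.join(text.split())}"


def build_prompt_ids(text_ids: list[int]) -> list[int]:
    """Wrap tokenized prompt ids in the control tokens used by the usage example."""
    return [START_OF_HUMAN, *text_ids, END_OF_TEXT, END_OF_HUMAN, START_OF_AI, START_OF_SPEECH]


print(format_prompt("  नमस्ते   दुनिया "))  # arjun: नमस्ते दुनिया
print(build_prompt_ids([1, 2, 3]))
```

Ending the prompt with the start-of-speech token means the very first generated token is already expected to be audio.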

Limitations

  • Single male speaker (Arjun); not multi-speaker
  • Hindi only
  • Speaking styles are limited to those in the Rasa dataset: neutral speech, commands, conversations, news and emotional speech
  • For best results, keep utterances under about 15 seconds of audio
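To relate the 15-second guidance to the generation budget, the sketch below converts between generated audio tokens and seconds of audio. It assumes the `snac_24khz` codec has an encoder hop of 512 samples and that its coarsest level has a stride of 4, so each 7-token frame covers 2048 samples (about 85 ms at 24 kHz); these values come from the published SNAC configuration as we understand it, not from this card.

```python
import math

SAMPLE_RATE = 24_000
HOP_LENGTH = 512        # assumed snac_24khz encoder hop
COARSEST_STRIDE = 4     # assumed stride of the coarsest SNAC level
TOKENS_PER_FRAME = 7

SAMPLES_PER_FRAME = HOP_LENGTH * COARSEST_STRIDE  # 2048 samples per 7-token frame


def tokens_to_seconds(n_tokens: int) -> float:
    """Audio duration produced by n_tokens audio tokens (whole frames only)."""
    return (n_tokens // TOKENS_PER_FRAME) * SAMPLES_PER_FRAME / SAMPLE_RATE


def seconds_to_tokens(seconds: float) -> int:
    """Token budget for roughly `seconds` of audio, rounded up to whole frames."""
    frames = math.ceil(seconds * SAMPLE_RATE / SAMPLES_PER_FRAME)
    return frames * TOKENS_PER_FRAME


print(f"{tokens_to_seconds(2000):.2f} s")  # 24.32 s
print(seconds_to_tokens(15))               # 1232
```

Under these assumptions, `max_new_tokens=2000` in the usage code caps output at roughly 24 seconds, which leaves headroom above the recommended 15 seconds.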

Citation

If you use this model, please also cite the underlying work:

@misc{orpheus2025,
  title  = {Orpheus TTS},
  author = {Canopy Labs},
  year   = {2025},
  url    = {https://github.com/canopyai/Orpheus-TTS}
}

@inproceedings{ai4bharat2024rasa,
  author    = {Praveen Srinivasa Varadhan and Ashwin Sankar and Giri Raju and Mitesh M. Khapra},
  title     = {Rasa: Building Expressive Speech Synthesis Systems for Indian Languages in Low-resource Settings},
  booktitle = {Proc. INTERSPEECH 2024},
  year      = {2024}
}
Model size: 4B params (Safetensors, BF16)