Megha Bangla TTS - Fine-tuned Model

This is a fine-tuned Bangla Text-to-Speech (TTS) model based on Microsoft's SpeechT5 architecture.

Model Details

  • Base Model: microsoft/speecht5_tts
  • Fine-tuned on: EsferSami/small-female-bangla-voice-data
  • Language: Bangla (Bengali)
  • Gender: Female voice
  • Sampling Rate: 16 kHz

Usage

from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan
import torch
import soundfile as sf

# Load model and processor
# ("your-username/megha-bangla-tts" is a placeholder; substitute the actual repository ID)
processor = SpeechT5Processor.from_pretrained("your-username/megha-bangla-tts")
model = SpeechT5ForTextToSpeech.from_pretrained("your-username/megha-bangla-tts")
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

# Generate speech
text = "আমি বাংলা ভাষায় কথা বলতে পারি।"
inputs = processor(text=text, return_tensors="pt")

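with torch.no_grad():
    # All-zero speaker embedding of shape (1, speaker_embedding_dim).
    # If a real speaker embedding (e.g. an x-vector) is available, pass it here instead.
    speaker_embeddings = torch.zeros((1, model.config.speaker_embedding_dim))
    # With a vocoder supplied, generate_speech returns the waveform rather than a spectrogram
    speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)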

# Save output
sf.write("output.wav", speech.cpu().numpy(), samplerate=16000)
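
A quick sanity check on the generated audio is to convert its sample count into a duration using the 16 kHz sampling rate listed under Model Details. The helper below is a minimal sketch: the name `duration_seconds` is hypothetical, and the sample count in the example is made up for illustration, not a measured output of this model.

```python
SAMPLING_RATE = 16_000  # Hz, as listed under Model Details


def duration_seconds(num_samples: int, sampling_rate: int = SAMPLING_RATE) -> float:
    """Return the length in seconds of a mono waveform with num_samples samples."""
    if sampling_rate <= 0:
        raise ValueError("sampling_rate must be positive")
    return num_samples / sampling_rate


# Illustrative sample count only (not a real model output)
print(duration_seconds(40_000))  # 2.5
```

In the usage code above, this would likely be called as `duration_seconds(speech.shape[0])`, assuming `speech` is a one-dimensional waveform tensor.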

Training Details

  • Epochs: 3
  • Batch Size: 4
  • Learning Rate: 1e-4
  • Dataset Size: 321 samples
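
To put these numbers in perspective, the number of optimizer steps can be estimated from the figures above. This is a rough sketch under assumptions the card does not state: all 321 samples are used for training (no held-out split), training runs on a single device without gradient accumulation, and the final partial batch is kept.

```python
import math

# Values taken from the Training Details list
num_samples = 321
batch_size = 4
epochs = 3

# Assumption: the last, smaller batch (321 % 4 = 1 sample) is not dropped
steps_per_epoch = math.ceil(num_samples / batch_size)
total_steps = steps_per_epoch * epochs

print(steps_per_epoch)  # 81
print(total_steps)      # 243
```

If some samples were held out for evaluation or gradient accumulation was used, the actual step count would be lower.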

License

MIT License

Citation

If you use this model, please cite:

@misc{megha-bangla-tts,
  author = {Megha Model Team},
  title = {Megha Bangla TTS - Fine-tuned SpeechT5 Model},
  year = {2026},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/your-repo/megha-bangla-tts}}
}