Megha Bangla TTS - Fine-tuned Model
This is a fine-tuned Bangla Text-to-Speech (TTS) model based on Microsoft's SpeechT5 architecture.
Model Details
- Base Model: microsoft/speecht5_tts
- Fine-tuned on: EsferSami/small-female-bangla-voice-data
- Language: Bangla (Bengali)
- Gender: Female voice
- Sampling Rate: 16 kHz
Usage
from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan
import torch
import soundfile as sf
# Load model and processor
processor = SpeechT5Processor.from_pretrained("your-username/megha-bangla-tts")
model = SpeechT5ForTextToSpeech.from_pretrained("your-username/megha-bangla-tts")
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")
# Generate speech
text = "আমি বাংলা ভাষায় কথা বলতে পারি।"
inputs = processor(text=text, return_tensors="pt")
with torch.no_grad():
    # Zero vector used as the speaker embedding
    speaker_embeddings = torch.zeros((1, model.config.speaker_embedding_dim))
    speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
# Save output
sf.write("output.wav", speech.cpu().numpy(), samplerate=16000)
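To sanity-check the saved file, the sketch below reads a WAV header with Python's standard `wave` module and returns the sampling rate and duration. The helper name `wav_info` is hypothetical and not part of this model's code. It assumes the file holds integer PCM data, which is the `soundfile` default for `.wav` output; the `wave` module cannot read floating-point WAV files.

```python
import wave


def wav_info(path):
    """Return (sample_rate_hz, duration_seconds) for a PCM WAV file.

    Hypothetical helper for illustration; assumes integer PCM encoding.
    """
    with wave.open(path, "rb") as wav_file:
        sample_rate = wav_file.getframerate()
        n_frames = wav_file.getnframes()
    # Duration is the number of frames divided by frames per second.
    return sample_rate, n_frames / sample_rate
```

Calling `wav_info("output.wav")` on the file written above should report a rate of 16000 if the audio was saved with `samplerate=16000` as shown.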
Training Details
- Epochs: 3
- Batch Size: 4
- Learning Rate: 1e-4
- Dataset Size: 321 samples
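As a rough guide to how short this training run is, the sketch below estimates the number of optimizer steps implied by the figures above. It assumes all 321 samples are used for training (no held-out split), a single device, no gradient accumulation, and that the final partial batch is kept; none of these are stated in this card, and the function name `training_steps` is hypothetical.

```python
import math


def training_steps(num_samples, batch_size, epochs):
    """Estimate (steps_per_epoch, total_steps).

    Assumes one optimizer step per batch and that the last,
    smaller batch of each epoch is not dropped.
    """
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs


steps_per_epoch, total_steps = training_steps(num_samples=321, batch_size=4, epochs=3)
print(steps_per_epoch, total_steps)  # 81 243
```

Under these assumptions the model sees about 243 updates in total, which is a small fine-tuning budget.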
License
MIT License
Citation
If you use this model, please cite:
@misc{megha-bangla-tts,
author = {Megha Model Team},
title = {Megha Bangla TTS - Fine-tuned SpeechT5 Model},
year = {2026},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/your-repo/megha-bangla-tts}}
}