---
datasets:
- flexthink/audiomnist
pipeline_tag: text-to-speech
---

This is a basic audio diffusion model built around a U-Net. The weights and training code are both uploaded here.
The model's `sample` method generates whichever spoken digit you want.
The Mel-spectrograms used for training were produced with the awesome HuggingFace audio diffusion code, applied to the AudioMNIST recordings.
The model code is based on the denoising-diffusion-pytorch repo: https://github.com/lucidrains/denoising-diffusion-pytorch
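
The spectrogram preparation roughly follows the pattern below. This is only a minimal sketch using librosa rather than the HuggingFace audio diffusion utilities that were actually used, and the sample rate, FFT size, and image scaling are illustrative assumptions, not the exact training configuration.

```python
# Sketch: turn a spoken-digit recording into an 8-bit Mel-spectrogram image.
# All parameters here (sr, n_fft, hop_length, n_mels) are assumed values.
import numpy as np
import librosa
from PIL import Image

def audio_to_mel_image(path, sr=22050, n_fft=2048, hop_length=512, n_mels=256):
    """Load an audio file and render it as a grayscale Mel-spectrogram image."""
    y, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels
    )
    # Convert power to dB, then rescale to 0-255 so it can be saved as an image.
    mel_db = librosa.power_to_db(mel, ref=np.max)
    scaled = (mel_db - mel_db.min()) / (mel_db.max() - mel_db.min()) * 255
    return Image.fromarray(scaled.astype(np.uint8))

# Example (the file path is a placeholder):
# audio_to_mel_image("audiomnist/0_01_0.wav").save("mel_0.png")
```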
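
Building and sampling the model follows the upstream denoising-diffusion-pytorch interface, roughly as sketched below. The channel count, image size, and timestep count are assumptions, and the digit conditioning lives in the uploaded training code, so the exact arguments this checkpoint's `sample` method takes may differ; check the training script for the real call.

```python
# Sketch of the denoising-diffusion-pytorch usage this model is based on.
# channels, image_size, and timesteps are assumed values, not the exact
# settings used to train the uploaded checkpoint.
import torch
from denoising_diffusion_pytorch import Unet, GaussianDiffusion

model = Unet(
    dim=64,
    dim_mults=(1, 2, 4, 8),
    channels=1,          # Mel-spectrograms are single-channel images
)

diffusion = GaussianDiffusion(
    model,
    image_size=256,      # spectrogram image resolution
    timesteps=1000,
)

# Training step: batches of spectrogram images scaled to [0, 1].
spectrograms = torch.rand(8, 1, 256, 256)  # stand-in for real training data
loss = diffusion(spectrograms)
loss.backward()

# Generation: sample() denoises pure noise into new spectrogram images.
generated = diffusion.sample(batch_size=4)  # shape (4, 1, 256, 256)
```
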
The image files in this repository are named sample{epoch}_{sample#}_{digit}.jpg, and each one has a corresponding audio file.
The audio is VERY quiet, so turn up your speakers to hear it better. (Just don't forget to turn them down again afterwards!)
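
If you want to turn a generated spectrogram image back into audio yourself, a Griffin-Lim inversion along these lines should work. The dB range and STFT parameters below are assumptions that must match however the spectrograms were built, and peak-normalizing the waveform is an easy way to fix the quiet playback.

```python
# Sketch: invert an 8-bit Mel-spectrogram image back to audio with Griffin-Lim.
# The scaling (top_db) and STFT parameters are assumed values.
import numpy as np
import librosa
import soundfile as sf
from PIL import Image

def mel_image_to_audio(path, sr=22050, n_fft=2048, hop_length=512, top_db=80.0):
    """Read a grayscale Mel-spectrogram image and reconstruct a waveform."""
    img = np.array(Image.open(path).convert("L")).astype(np.float32)
    mel_db = img / 255.0 * top_db - top_db      # undo the 0-255 scaling back to dB
    mel = librosa.db_to_power(mel_db)           # dB -> power spectrogram
    y = librosa.feature.inverse.mel_to_audio(
        mel, sr=sr, n_fft=n_fft, hop_length=hop_length
    )
    return y / np.max(np.abs(y))                # peak-normalize so playback isn't so quiet

# Example (filenames are placeholders):
# sf.write("sample10_0_7.wav", mel_image_to_audio("sample10_0_7.jpg"), 22050)
```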