---
datasets:
- flexthink/audiomnist
pipeline_tag: text-to-speech
---

This is a basic audio diffusion model built around a U-Net. The weights and training code are both uploaded here.
The model's `sample` method generates whichever spoken digit you want.
The Mel-spectrograms used for training were produced with the awesome HuggingFace audio diffusion code, applied to the AudioMNIST recordings.
The model code is based on the denoising-diffusion-pytorch repo: https://github.com/lucidrains/denoising-diffusion-pytorch
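
The spectrogram preparation roughly follows the pattern below. This is only a minimal sketch using librosa rather than the HuggingFace audio diffusion utilities that were actually used, and the sample rate, FFT size, and image scaling are illustrative assumptions, not the exact training configuration.

```python
# Sketch: turn a spoken-digit recording into an 8-bit Mel-spectrogram image.
# All parameters here (sr, n_fft, hop_length, n_mels) are assumed values.
import numpy as np
import librosa
from PIL import Image

def audio_to_mel_image(path, sr=22050, n_fft=2048, hop_length=512, n_mels=256):
    """Load an audio file and render it as a grayscale Mel-spectrogram image."""
    y, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels
    )
    # Convert power to dB, then rescale to 0-255 so it can be saved as an image.
    mel_db = librosa.power_to_db(mel, ref=np.max)
    scaled = (mel_db - mel_db.min()) / (mel_db.max() - mel_db.min()) * 255
    return Image.fromarray(scaled.astype(np.uint8))

# Example (the file path is a placeholder):
# audio_to_mel_image("audiomnist/0_01_0.wav").save("mel_0.png")
```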
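
Building and sampling the model follows the upstream denoising-diffusion-pytorch interface, roughly as sketched below. The channel count, image size, and timestep count are assumptions, and the digit conditioning lives in the uploaded training code, so the exact arguments this checkpoint's `sample` method takes may differ; check the training script for the real call.

```python
# Sketch of the denoising-diffusion-pytorch usage this model is based on.
# channels, image_size, and timesteps are assumed values, not the exact
# settings used to train the uploaded checkpoint.
import torch
from denoising_diffusion_pytorch import Unet, GaussianDiffusion

model = Unet(
    dim=64,
    dim_mults=(1, 2, 4, 8),
    channels=1,          # Mel-spectrograms are single-channel images
)

diffusion = GaussianDiffusion(
    model,
    image_size=256,      # spectrogram image resolution
    timesteps=1000,
)

# Training step: batches of spectrogram images scaled to [0, 1].
spectrograms = torch.rand(8, 1, 256, 256)  # stand-in for real training data
loss = diffusion(spectrograms)
loss.backward()

# Generation: sample() denoises pure noise into new spectrogram images.
generated = diffusion.sample(batch_size=4)  # shape (4, 1, 256, 256)
```
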
The image files in this repository are named sample{epoch}_{sample#}_{digit}.jpg, and each one has a corresponding audio file.
The audio is VERY quiet, so turn up your speakers to hear it better. (Just don't forget to turn them down again afterwards!)
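
If you want to turn a generated spectrogram image back into audio yourself, a Griffin-Lim inversion along these lines should work. The dB range and STFT parameters below are assumptions that must match however the spectrograms were built, and peak-normalizing the waveform is an easy way to fix the quiet playback.

```python
# Sketch: invert an 8-bit Mel-spectrogram image back to audio with Griffin-Lim.
# The scaling (top_db) and STFT parameters are assumed values.
import numpy as np
import librosa
import soundfile as sf
from PIL import Image

def mel_image_to_audio(path, sr=22050, n_fft=2048, hop_length=512, top_db=80.0):
    """Read a grayscale Mel-spectrogram image and reconstruct a waveform."""
    img = np.array(Image.open(path).convert("L")).astype(np.float32)
    mel_db = img / 255.0 * top_db - top_db      # undo the 0-255 scaling back to dB
    mel = librosa.db_to_power(mel_db)           # dB -> power spectrogram
    y = librosa.feature.inverse.mel_to_audio(
        mel, sr=sr, n_fft=n_fft, hop_length=hop_length
    )
    return y / np.max(np.abs(y))                # peak-normalize so playback isn't so quiet

# Example (filenames are placeholders):
# sf.write("sample10_0_7.wav", mel_image_to_audio("sample10_0_7.jpg"), 22050)
```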