ssc-ssc-audio-aligned-speecht5

This model is a fine-tuned version of microsoft/speecht5_tts; the card does not specify the training dataset (it is recorded as None). At the final evaluation (step 40000) it reached a validation loss of 0.0702.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 8
eval_batch_size: 8
seed: 3407
gradient_accumulation_steps: 4
total_train_batch_size: 32
optimizer: AdamW (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 4000
training_steps: 40000
mixed_precision_training: Native AMP
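
The hyperparameters above imply an effective batch size and a learning-rate curve that can be sketched in a few lines. This is a minimal illustration, not the training code: it assumes a single device (which is consistent with 8 x 4 = 32) and assumes the `cosine` scheduler means linear warmup followed by a half-cosine decay to zero. The function names `effective_batch_size` and `lr_at_step` are hypothetical helpers introduced here.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-4
WARMUP_STEPS = 4000
TRAINING_STEPS = 40000
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 4


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Examples consumed per optimizer step (num_devices=1 is an assumption)."""
    return per_device * accum_steps * num_devices


def lr_at_step(step: int) -> float:
    """Assumed schedule: linear warmup, then half-cosine decay to zero."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak learning rate.
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
    for step in (0, 2000, 4000, 22000, 40000):
        print(f"step {step:>5}: lr = {lr_at_step(step):.2e}")
```

Under these assumptions the learning rate peaks at step 4000, falls to half its peak at step 22000, and reaches zero at step 40000.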

Training Loss	Epoch	Step	Validation Loss
0.0837	3.9089	1000	0.0754
0.0722	7.8149	2000	0.0724
0.0722	11.7209	3000	0.0722
0.0703	15.6268	4000	0.0729
0.0655	19.5328	5000	0.0710
0.0622	23.4388	6000	0.0694
0.0703	27.3448	7000	0.0716
0.0625	31.2507	8000	0.0705
0.0604	35.1567	9000	0.0692
0.0599	39.0627	10000	0.0722
0.0579	42.9716	11000	0.0689
0.0567	46.8776	12000	0.0704
0.0568	50.7835	13000	0.0694
0.0592	54.6895	14000	0.0703
0.056	58.5955	15000	0.0714
0.0549	62.5015	16000	0.0693
0.054	66.4074	17000	0.0694
0.0545	70.3134	18000	0.0693
0.0544	74.2194	19000	0.0704
0.053	78.1254	20000	0.0701
0.0523	82.0313	21000	0.0699
0.0513	85.9403	22000	0.0719
0.0515	89.8462	23000	0.0699
0.051	93.7522	24000	0.0705
0.0509	97.6582	25000	0.0702
0.0509	101.5642	26000	0.0699
0.0506	105.4701	27000	0.0705
0.0506	109.3761	28000	0.0702
0.0495	113.2821	29000	0.0694
0.0497	117.1881	30000	0.0700
0.0498	121.0940	31000	0.0709
0.0499	125.0	32000	0.0702
0.0495	128.9089	33000	0.0701
0.0491	132.8149	34000	0.0703
0.0492	136.7209	35000	0.0701
0.0507	140.6268	36000	0.0699
0.0486	144.5328	37000	0.0703
0.049	148.4388	38000	0.0703
0.0486	152.3448	39000	0.0702
0.0493	156.2507	40000	0.0702
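
In the table above, training loss keeps falling while validation loss flattens near 0.070, with its lowest value (0.0689) at step 11000. The sketch below shows one way to pick the best checkpoint and measure the gap between validation and training loss. It uses only an excerpt of the rows, and `EvalRow`, `best_checkpoint`, and `generalization_gap` are hypothetical names introduced for illustration.

```python
from typing import List, NamedTuple


class EvalRow(NamedTuple):
    training_loss: float
    epoch: float
    step: int
    validation_loss: float


# Excerpt of the results table (not the full log).
ROWS: List[EvalRow] = [
    EvalRow(0.0837, 3.9089, 1000, 0.0754),
    EvalRow(0.0622, 23.4388, 6000, 0.0694),
    EvalRow(0.0579, 42.9716, 11000, 0.0689),
    EvalRow(0.0530, 78.1254, 20000, 0.0701),
    EvalRow(0.0497, 117.1881, 30000, 0.0700),
    EvalRow(0.0493, 156.2507, 40000, 0.0702),
]


def best_checkpoint(rows: List[EvalRow]) -> EvalRow:
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda r: r.validation_loss)


def generalization_gap(row: EvalRow) -> float:
    """Validation loss minus training loss; a growing gap can suggest overfitting."""
    return row.validation_loss - row.training_loss


best = best_checkpoint(ROWS)
final = ROWS[-1]

if __name__ == "__main__":
    print(f"best step: {best.step} (validation loss {best.validation_loss:.4f})")
    print(f"gap at best step:  {generalization_gap(best):.4f}")
    print(f"gap at final step: {generalization_gap(final):.4f}")
```

If intermediate checkpoints were saved, a comparison like this can help decide whether an earlier checkpoint is preferable to the final one.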

Safetensors: model size 0.1B params, tensor type F32