swh-waxal-audio-speecht5

This model is a fine-tuned version of microsoft/speecht5_tts; the training dataset is not specified. On the evaluation set, the final checkpoint (step 40000) reaches a validation loss of 0.1222.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 8
eval_batch_size: 8
seed: 3407
gradient_accumulation_steps: 4
total_train_batch_size: 32
optimizer: AdamW, fused PyTorch implementation (OptimizerNames.ADAMW_TORCH_FUSED), with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 4000
training_steps: 40000
mixed_precision_training: Native AMP
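
As a rough illustration of how these settings interact, the sketch below recomputes the effective batch size and evaluates a warmup-plus-cosine learning-rate curve at a few steps. It assumes a single training device and the shape commonly used for a cosine schedule (linear ramp to the peak rate, then half-cosine decay to zero). The function and constant names are illustrative and are not taken from the training script.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 4
WARMUP_STEPS = 4000
TRAINING_STEPS = 40000


def effective_batch_size(per_device, accum_steps, num_devices=1):
    """Samples consumed per optimizer step (num_devices=1 is an assumption)."""
    return per_device * accum_steps * num_devices


def cosine_lr(step, peak_lr=LEARNING_RATE, warmup=WARMUP_STEPS, total=TRAINING_STEPS):
    """Linear warmup to peak_lr, then half-cosine decay to zero at `total`."""
    if step < warmup:
        return peak_lr * step / warmup
    progress = (step - warmup) / (total - warmup)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print("effective batch size:", effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
for step in (0, 2000, 4000, 22000, 40000):
    print(f"step {step:>5}: lr = {cosine_lr(step):.2e}")
```

Under these assumptions the rate peaks at step 4000, the end of warmup, and has fallen to half the peak by step 22000, the midpoint of the decay phase.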

Training Loss	Epoch	Step	Validation Loss
0.0947	62.5161	1000	0.0838
0.0807	125.0	2000	0.0836
0.0723	187.5161	3000	0.0845
0.0653	250.0	4000	0.0872
0.0594	312.5161	5000	0.0854
0.0553	375.0	6000	0.0904
0.0582	437.5161	7000	0.0926
0.0512	500.0	8000	0.0965
0.0488	562.5161	9000	0.0969
0.0467	625.0	10000	0.0991
0.0455	687.5161	11000	0.1021
0.0442	750.0	12000	0.1033
0.0458	812.5161	13000	0.1055
0.0416	875.0	14000	0.1071
0.0401	937.5161	15000	0.1045
0.0410	1000.0	16000	0.1066
0.0384	1062.5161	17000	0.1130
0.0444	1125.0	18000	0.1118
0.0385	1187.5161	19000	0.1102
0.0365	1250.0	20000	0.1121
0.0347	1312.5161	21000	0.1127
0.0348	1375.0	22000	0.1146
0.0348	1437.5161	23000	0.1153
0.0351	1500.0	24000	0.1163
0.0346	1562.5161	25000	0.1175
0.0341	1625.0	26000	0.1183
0.0327	1687.5161	27000	0.1199
0.0341	1750.0	28000	0.1201
0.0341	1812.5161	29000	0.1198
0.0327	1875.0	30000	0.1205
0.0324	1937.5161	31000	0.1200
0.0381	2000.0	32000	0.1205
0.0353	2062.5161	33000	0.1213
0.0331	2125.0	34000	0.1214
0.0328	2187.5161	35000	0.1221
0.0304	2250.0	36000	0.1227
0.0312	2312.5161	37000	0.1220
0.0306	2375.0	38000	0.1213
0.0310	2437.5161	39000	0.1227
0.0309	2500.0	40000	0.1222
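
In the log above, the training loss keeps falling while the validation loss reaches its minimum early and then rises, a pattern usually read as overfitting. The sketch below shows one way to pick the checkpoint with the lowest validation loss from such a log. Only a subset of rows is copied from the table, and the helper name is hypothetical. For a TTS model, lower validation loss does not necessarily mean better perceived audio quality, so treat the result as a starting point for listening tests rather than a final choice.

```python
# (training_loss, epoch, step, validation_loss) rows copied from the table above.
ROWS = [
    (0.0947, 62.5161, 1000, 0.0838),
    (0.0807, 125.0, 2000, 0.0836),
    (0.0723, 187.5161, 3000, 0.0845),
    (0.0653, 250.0, 4000, 0.0872),
    (0.0365, 1250.0, 20000, 0.1121),
    (0.0309, 2500.0, 40000, 0.1222),
]


def best_checkpoint(rows):
    """Return the row with the smallest validation loss (last field)."""
    return min(rows, key=lambda row: row[3])


best = best_checkpoint(ROWS)
final = ROWS[-1]
print(f"lowest validation loss {best[3]:.4f} at step {best[2]}")
print(f"final validation loss  {final[3]:.4f} at step {final[2]}")
```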

Model size: 0.1B params (Safetensors)
Tensor type: F32
