ntk-Ikoma-audio-aligned-speecht5

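This model is a fine-tuned version of microsoft/speecht5_tts on an unspecified dataset (the dataset name was recorded as "None"). On the evaluation set it reaches a final validation loss of 0.0463 at step 40000 (see the training results table below).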
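The card does not include a usage snippet. A minimal inference sketch with the Hugging Face transformers SpeechT5 classes might look like the following. It rests on several assumptions: the checkpoint and its processor are published under a repository id (the `your-namespace/...` id below is a hypothetical placeholder), the stock `microsoft/speecht5_hifigan` vocoder is acceptable for this fine-tune, and the all-zeros speaker embedding is only a stand-in, because the card does not say which speaker embeddings were used during fine-tuning.

```python
def synthesize(text: str, model_id: str = "your-namespace/ntk-Ikoma-audio-aligned-speecht5"):
    """Sketch: text -> mono waveform with a fine-tuned SpeechT5 checkpoint.

    `model_id` is a hypothetical placeholder; substitute the real repository id.
    """
    # Imports live inside the function so this sketch can be read (and the
    # module imported) without torch/transformers installed.
    import torch
    from transformers import SpeechT5ForTextToSpeech, SpeechT5HifiGan, SpeechT5Processor

    processor = SpeechT5Processor.from_pretrained(model_id)
    model = SpeechT5ForTextToSpeech.from_pretrained(model_id)
    # Assumption: the stock SpeechT5 HiFi-GAN vocoder; the card does not name one.
    vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

    inputs = processor(text=text, return_tensors="pt")
    # SpeechT5 conditions on a 512-dim speaker embedding (x-vector). Zeros are a
    # placeholder only; use the kind of embedding the model was fine-tuned with.
    speaker_embeddings = torch.zeros((1, 512))
    with torch.no_grad():
        speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
    return speech.numpy()  # waveform; the SpeechT5 HiFi-GAN vocoder outputs 16 kHz audio
```

For example, `waveform = synthesize("...")` returns a NumPy array that can be written to a WAV file at a 16000 Hz sample rate with a library such as soundfile.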

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 8
eval_batch_size: 8
seed: 3407
gradient_accumulation_steps: 4
total_train_batch_size: 32
optimizer: AdamW, fused PyTorch implementation (OptimizerNames.ADAMW_TORCH_FUSED), with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 4000
training_steps: 40000
mixed_precision_training: Native AMP
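The hyperparameters above map fairly directly onto transformers training arguments. The sketch below is a hedged reconstruction, not the original training script: the keyword names are those of `Seq2SeqTrainingArguments`, `fp16=True` is an assumption ("Native AMP" could also mean bf16), and a single device is inferred from total_train_batch_size = 8 × 4 = 32. It also reproduces the cosine learning-rate curve with linear warmup using the same formula as `transformers.get_cosine_schedule_with_warmup` (default half cycle), in plain Python so it runs without PyTorch. The helper names are hypothetical.

```python
import math

# Hypothetical reconstruction of the listed hyperparameters as keyword arguments
# for transformers.Seq2SeqTrainingArguments (output_dir and eval settings omitted).
TRAINING_KWARGS = {
    "learning_rate": 1e-4,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 3407,
    "gradient_accumulation_steps": 4,
    "optim": "adamw_torch_fused",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "cosine",
    "warmup_steps": 4000,
    "max_steps": 40000,
    "fp16": True,  # assumption: "Native AMP" may equally have been bf16
}


def effective_batch_size(per_device: int, accumulation: int, num_devices: int = 1) -> int:
    """Examples consumed per optimizer step."""
    return per_device * accumulation * num_devices


def lr_at(step: int, base_lr: float = 1e-4, warmup: int = 4000, total: int = 40000) -> float:
    """Learning rate after `step` optimizer steps: linear warmup, then half-cosine decay to 0."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))
```

With one device, `effective_batch_size(8, 4)` reproduces the reported total of 32. The schedule peaks at 1e-4 at step 4000, falls to half that (5e-05) at step 22000, the midpoint of the decay, and reaches 0 at step 40000.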

Training results

Training Loss	Epoch	Step	Validation Loss
0.0668	3.9841	1000	0.0580
0.0625	7.9681	2000	0.0544
0.0599	11.9522	3000	0.0529
0.0551	15.9363	4000	0.0521
0.0539	19.9203	5000	0.0516
0.0523	23.9044	6000	0.0501
0.0513	27.8884	7000	0.0509
0.0508	31.8725	8000	0.0499
0.0501	35.8566	9000	0.0503
0.0471	39.8406	10000	0.0483
0.046	43.8247	11000	0.0484
0.0465	47.8088	12000	0.0476
0.0458	51.7928	13000	0.0472
0.0439	55.7769	14000	0.0472
0.0429	59.7610	15000	0.0479
0.0437	63.7450	16000	0.0470
0.0422	67.7291	17000	0.0468
0.0467	71.7131	18000	0.0474
0.0422	75.6972	19000	0.0469
0.0426	79.6813	20000	0.0473
0.0404	83.6653	21000	0.0465
0.0415	87.6494	22000	0.0475
0.0405	91.6335	23000	0.0465
0.0401	95.6175	24000	0.0464
0.0397	99.6016	25000	0.0467
0.0389	103.5857	26000	0.0463
0.0389	107.5697	27000	0.0464
0.0382	111.5538	28000	0.0465
0.0386	115.5378	29000	0.0463
0.0391	119.5219	30000	0.0467
0.0415	123.5060	31000	0.0466
0.038	127.4900	32000	0.0465
0.0373	131.4741	33000	0.0463
0.038	135.4582	34000	0.0464
0.0371	139.4422	35000	0.0463
0.0372	143.4263	36000	0.0463
0.0377	147.4104	37000	0.0464
0.0377	151.3944	38000	0.0464
0.038	155.3785	39000	0.0463
0.0379	159.3625	40000	0.0463
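The Step and Epoch columns also let you back out some of the run's bookkeeping, and the Validation Loss column identifies the best checkpoint. The helpers below are an illustrative sketch (the function names are hypothetical) that parse the tab-separated table; the embedded excerpt holds only four of its rows, so paste the full table in to reproduce the observations that follow.

```python
from dataclasses import dataclass

# Excerpt of the training-results table (header plus four rows).
EXCERPT = """Training Loss\tEpoch\tStep\tValidation Loss
0.0668\t3.9841\t1000\t0.0580
0.0397\t99.6016\t25000\t0.0467
0.0389\t103.5857\t26000\t0.0463
0.0379\t159.3625\t40000\t0.0463"""


@dataclass
class Row:
    train_loss: float
    epoch: float
    step: int
    val_loss: float


def parse_table(text: str) -> list:
    """Parse the tab-separated results table, skipping the header line."""
    rows = []
    for line in text.strip().splitlines()[1:]:
        train, epoch, step, val = line.split("\t")
        rows.append(Row(float(train), float(epoch), int(step), float(val)))
    return rows


def steps_per_epoch(rows: list) -> int:
    """Average Step/Epoch ratio, i.e. optimizer steps per pass over the training set."""
    return round(sum(r.step / r.epoch for r in rows) / len(rows))


def best_checkpoint(rows: list) -> Row:
    """Row with the lowest validation loss; ties go to the earliest step."""
    return min(rows, key=lambda r: (r.val_loss, r.step))
```

On the full table, Step/Epoch is about 251 optimizer steps per epoch, which at 32 examples per step suggests roughly 8,000 training examples, assuming a single device. The lowest validation loss, 0.0463, is first reached at step 26000 and recurs at several later evaluations through step 40000, so by this metric the final checkpoint is as good as any evaluated one.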

Model size: 0.1B params
Tensor type: F32
Weight format: Safetensors
