senga-nt-asr-inferred-force-aligned-speecht5-NT-l1-pure-mms40

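This model is a fine-tuned version of microsoft/speecht5_tts. The training dataset is not specified; the auto-generated card lists it as "None". On the evaluation set, the last logged validation loss is 0.0833 (step 21000). The lowest logged validation loss is 0.0801 (step 7000).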

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 8
eval_batch_size: 8
seed: 3407
gradient_accumulation_steps: 4
total_train_batch_size: 32
optimizer: AdamW, PyTorch fused implementation (OptimizerNames.ADAMW_TORCH_FUSED), with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 200
num_epochs: 300.0
mixed_precision_training: Native AMP
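The hyperparameters above fix both the batch arithmetic and the learning-rate schedule. The following Python sketch (standard library only) reproduces both under two stated assumptions. First, training ran on a single device, so 8 × 4 = 32 examples contribute to each optimizer step, which matches total_train_batch_size. Second, an epoch is 71 optimizer steps, inferred from the results table (step 1000 is logged at epoch 14.0845), which gives 71 × 300 = 21,300 total steps. The schedule uses the same formula as the transformers `get_cosine_schedule_with_warmup` scheduler with its default half cosine cycle: a linear warmup over 200 steps, then a decay toward 0. The constant and helper names are illustrative. They do not come from the original training script.

```python
import math

# Values copied from the hyperparameter list.
LEARNING_RATE = 1e-4
PER_DEVICE_BATCH = 8
GRAD_ACCUM_STEPS = 4
WARMUP_STEPS = 200
NUM_EPOCHS = 300

# Assumption: 71 optimizer steps per epoch, inferred from the results table
# (step 1000 is logged at epoch 14.0845, and 1000 / 14.0845 is about 71).
STEPS_PER_EPOCH = 71


def effective_batch_size(per_device: int, accum: int, num_devices: int = 1) -> int:
    """Examples that contribute to one optimizer step (single device assumed)."""
    return per_device * accum * num_devices


def cosine_lr_with_warmup(step: int, total_steps: int) -> float:
    """Linear warmup from 0 to LEARNING_RATE, then half-cosine decay to 0."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    return LEARNING_RATE * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


total_steps = STEPS_PER_EPOCH * NUM_EPOCHS
print(effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM_STEPS))  # 32
print(total_steps)  # 21300
for step in (0, 100, 200, 10750, 21300):
    print(step, f"{cosine_lr_with_warmup(step, total_steps):.2e}")
```

With these assumptions, the rate peaks at 1e-4 at step 200. It falls to half the peak midway through the decay (step 10750) and reaches 0 at step 21300, just after the last logged evaluation at step 21000.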

Training results

Training Loss	Epoch	Step	Validation Loss
0.1077	14.0845	1000	0.0878
0.098	28.1690	2000	0.0832
0.0904	42.2535	3000	0.0817
0.0855	56.3380	4000	0.0807
0.0813	70.4225	5000	0.0811
0.0786	84.5070	6000	0.0808
0.0769	98.5915	7000	0.0801
0.0721	112.6761	8000	0.0820
0.0736	126.7606	9000	0.0815
0.0692	140.8451	10000	0.0818
0.0671	154.9296	11000	0.0822
0.0691	169.0141	12000	0.0826
0.065	183.0986	13000	0.0819
0.0649	197.1831	14000	0.0827
0.0631	211.2676	15000	0.0829
0.0652	225.3521	16000	0.0832
0.0635	239.4366	17000	0.0830
0.0696	253.5211	18000	0.0834
0.0633	267.6056	19000	0.0830
0.0651	281.6901	20000	0.0828
0.0616	295.7746	21000	0.0833
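In the table, validation loss reaches its lowest value (0.0801) at step 7000, around epoch 98.6. After that it drifts up to roughly 0.083, while training loss keeps falling to 0.0616 by step 21000. This pattern is usually read as mild overfitting. The log alone does not show which checkpoint these weights correspond to. The minimal Python sketch below locates the best logged checkpoint from rows copied out of the table. The helper names are hypothetical.

```python
# Rows copied from the training results table: (step, training loss, validation loss).
LOG = [
    (1000, 0.1077, 0.0878),
    (2000, 0.0980, 0.0832),
    (3000, 0.0904, 0.0817),
    (4000, 0.0855, 0.0807),
    (5000, 0.0813, 0.0811),
    (6000, 0.0786, 0.0808),
    (7000, 0.0769, 0.0801),
    (8000, 0.0721, 0.0820),
    (9000, 0.0736, 0.0815),
    (10000, 0.0692, 0.0818),
    (11000, 0.0671, 0.0822),
    (12000, 0.0691, 0.0826),
    (13000, 0.0650, 0.0819),
    (14000, 0.0649, 0.0827),
    (15000, 0.0631, 0.0829),
    (16000, 0.0652, 0.0832),
    (17000, 0.0635, 0.0830),
    (18000, 0.0696, 0.0834),
    (19000, 0.0633, 0.0830),
    (20000, 0.0651, 0.0828),
    (21000, 0.0616, 0.0833),
]


def best_checkpoint(log):
    """Return (step, validation loss) for the row with the lowest validation loss."""
    step, _train_loss, val_loss = min(log, key=lambda row: row[2])
    return step, val_loss


def rise_since_best(log):
    """How far the final validation loss sits above the best one."""
    _, best = best_checkpoint(log)
    return log[-1][2] - best


best_step, best_val = best_checkpoint(LOG)
print(f"best: step {best_step}, val loss {best_val}")
print(f"final: step {LOG[-1][0]}, val loss {LOG[-1][2]}")
print(f"rise since best: {rise_since_best(LOG):.4f}")
```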

Model size: 0.1B params (Safetensors format, F32 tensors)
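As a rough sizing aid: about 0.1B parameters stored as F32 (4 bytes each) come to roughly 0.4 GB for the weights alone, and casting to a 16-bit type halves that. The parameter count is rounded, and the estimate excludes activations and any separate vocoder used at inference time. The helper below is a hypothetical illustration, not part of any library.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate size of the raw weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


NUM_PARAMS = 0.1e9  # "0.1B params", a rounded figure

print(f"F32: ~{weight_memory_gb(NUM_PARAMS, 4):.1f} GB")
print(f"F16: ~{weight_memory_gb(NUM_PARAMS, 2):.1f} GB")
```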
