swh-bible-audio-speecht5

This model is a fine-tuned version of microsoft/speecht5_tts. The training dataset is not specified. At the end of training (step 40000) it reaches a validation loss of 0.0800 on the evaluation set.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 8
eval_batch_size: 8
seed: 3407
gradient_accumulation_steps: 4
total_train_batch_size: 32
optimizer: fused PyTorch AdamW (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999) and epsilon=1e-08, no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 4000
training_steps: 40000
mixed_precision_training: Native AMP
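
To make the schedule concrete, the sketch below recomputes the total train batch size and the learning rate at a few steps from the values listed above. It assumes a single training device and the common shape of a cosine schedule with warmup (linear ramp from zero, then a half-cosine decay to zero); the card does not state either, and the name `lr_at` is illustrative rather than part of any library.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 4
WARMUP_STEPS = 4_000
TRAINING_STEPS = 40_000

# Assumption: one device, so the total batch is per-device batch x accumulation steps.
effective_batch_size = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (assumed linear warmup + half-cosine decay)."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak learning rate.
        return LEARNING_RATE * step / WARMUP_STEPS
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - WARMUP_STEPS) / (TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("total train batch size:", effective_batch_size)
    for step in (0, 2_000, 4_000, 22_000, 40_000):
        print(f"step {step:>6}: lr = {lr_at(step):.2e}")
```

Under these assumptions, warmup covers the first 10% of training, the peak rate of 0.0001 is reached at step 4000, and the rate has fallen to half of the peak by step 22000.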

Training Loss	Epoch	Step	Validation Loss
0.0941	9.9032	1000	0.0779
0.0802	19.8040	2000	0.0755
0.0785	29.7047	3000	0.0732
0.0752	39.6055	4000	0.0751
0.0694	49.5062	5000	0.0752
0.065	59.4069	6000	0.0730
0.0795	69.3077	7000	0.0738
0.0637	79.2084	8000	0.0762
0.0616	89.1092	9000	0.0750
0.0599	99.0099	10000	0.0784
0.0571	108.9132	11000	0.0747
0.0547	118.8139	12000	0.0745
0.0547	128.7146	13000	0.0761
0.0577	138.6154	14000	0.0762
0.0546	148.5161	15000	0.0771
0.0514	158.4169	16000	0.0773
0.0538	168.3176	17000	0.0772
0.0522	178.2184	18000	0.0782
0.0494	188.1191	19000	0.0785
0.0487	198.0199	20000	0.0770
0.0486	207.9231	21000	0.0778
0.0488	217.8238	22000	0.0799
0.0475	227.7246	23000	0.0781
0.048	237.6253	24000	0.0812
0.0469	247.5261	25000	0.0794
0.0466	257.4268	26000	0.0786
0.0479	267.3275	27000	0.0796
0.0448	277.2283	28000	0.0795
0.0459	287.1290	29000	0.0799
0.045	297.0298	30000	0.0798
0.0451	306.9330	31000	0.0805
0.0444	316.8337	32000	0.0794
0.0446	326.7345	33000	0.0798
0.0452	336.6352	34000	0.0794
0.0436	346.5360	35000	0.0800
0.0433	356.4367	36000	0.0800
0.0477	366.3375	37000	0.0799
0.0441	376.2382	38000	0.0797
0.0435	386.1390	39000	0.0803
0.0443	396.0397	40000	0.0800
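
The log above can be summarized with a few lines of Python. This sketch copies the validation-loss column from the table (one value per 1000 steps), finds the step with the lowest validation loss, and estimates the number of optimizer steps per epoch from the step-10000 row. The variable names are illustrative.

```python
# Validation loss per evaluation, copied from the table (steps 1000, 2000, ..., 40000).
VAL_LOSS = [
    0.0779, 0.0755, 0.0732, 0.0751, 0.0752, 0.0730, 0.0738, 0.0762, 0.0750, 0.0784,
    0.0747, 0.0745, 0.0761, 0.0762, 0.0771, 0.0773, 0.0772, 0.0782, 0.0785, 0.0770,
    0.0778, 0.0799, 0.0781, 0.0812, 0.0794, 0.0786, 0.0796, 0.0795, 0.0799, 0.0798,
    0.0805, 0.0794, 0.0798, 0.0794, 0.0800, 0.0800, 0.0799, 0.0797, 0.0803, 0.0800,
]
EVAL_EVERY = 1_000

# Pair each loss with the step at which it was measured.
steps = [EVAL_EVERY * (i + 1) for i in range(len(VAL_LOSS))]
best_step, best_loss = min(zip(steps, VAL_LOSS), key=lambda pair: pair[1])
final_step, final_loss = steps[-1], VAL_LOSS[-1]

# The table reports epoch 99.0099 at step 10000, which gives steps per epoch.
steps_per_epoch = round(10_000 / 99.0099)

if __name__ == "__main__":
    print(f"best validation loss:  {best_loss:.4f} at step {best_step}")
    print(f"final validation loss: {final_loss:.4f} at step {final_step}")
    print(f"steps per epoch (approx.): {steps_per_epoch}")
```

On this log the lowest validation loss is 0.0730 at step 6000, while the training loss keeps falling afterwards. One possible reading is that later checkpoints overfit the training set, so an earlier checkpoint may be worth comparing by ear; the card itself does not make this claim.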

Safetensors model size: 0.1B params (tensor type F32)
