bcc_latn-bcclatn-audio-aligned-speecht5

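This model is a fine-tuned version of microsoft/speecht5_tts. The training dataset is not specified. On the evaluation set it reaches a validation loss of 0.0576 at the final training step (40,000); the full history is in the training results table.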

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 8
eval_batch_size: 8
seed: 3407
gradient_accumulation_steps: 4
total_train_batch_size: 32
optimizer: AdamW (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 4000
training_steps: 40000
mixed_precision_training: Native AMP
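
The relationship between these values can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the training code itself: the helper names are hypothetical, the single-device count is inferred from 8 × 4 = 32 rather than stated in the card, and the schedule assumes the common linear-warmup, half-cosine-decay form that a `cosine` scheduler with warmup usually denotes.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-4
WARMUP_STEPS = 4000
TRAINING_STEPS = 40000
TRAIN_BATCH_SIZE = 8
GRADIENT_ACCUMULATION_STEPS = 4
NUM_DEVICES = 1  # assumption: not stated in the card, inferred from 8 * 4 = 32


def effective_batch_size(per_device, accumulation, devices):
    """Examples consumed per optimizer step."""
    return per_device * accumulation * devices


def cosine_lr_with_warmup(step, base_lr=LEARNING_RATE,
                          warmup_steps=WARMUP_STEPS,
                          total_steps=TRAINING_STEPS):
    """Learning rate at an optimizer step (hypothetical helper).

    Linear warmup from 0 to base_lr, then a half-cosine decay to 0.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("effective batch size:",
          effective_batch_size(TRAIN_BATCH_SIZE,
                               GRADIENT_ACCUMULATION_STEPS,
                               NUM_DEVICES))
    for step in (0, 2000, 4000, 22000, 40000):
        print(step, f"{cosine_lr_with_warmup(step):.2e}")
```

Under these assumptions the learning rate peaks at 0.0001 when warmup ends at step 4000, is back to half of that at step 22000, and decays to zero at step 40000.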

Training Loss	Epoch	Step	Validation Loss
0.0696	10.2041	1000	0.0547
0.0622	20.4082	2000	0.0514
0.0548	30.6122	3000	0.0484
0.0543	40.8163	4000	0.0499
0.0515	51.0204	5000	0.0499
0.0498	61.2245	6000	0.0505
0.0473	71.4286	7000	0.0489
0.0447	81.6327	8000	0.0494
0.0463	91.8367	9000	0.0514
0.0422	102.0408	10000	0.0519
0.0410	112.2449	11000	0.0521
0.0415	122.4490	12000	0.0528
0.0394	132.6531	13000	0.0542
0.0396	142.8571	14000	0.0528
0.0387	153.0612	15000	0.0546
0.0385	163.2653	16000	0.0550
0.0373	173.4694	17000	0.0542
0.0421	183.6735	18000	0.0558
0.0367	193.8776	19000	0.0553
0.0374	204.0816	20000	0.0554
0.0348	214.2857	21000	0.0563
0.0354	224.4898	22000	0.0560
0.0348	234.6939	23000	0.0552
0.0344	244.8980	24000	0.0559
0.0341	255.1020	25000	0.0558
0.0338	265.3061	26000	0.0569
0.0325	275.5102	27000	0.0567
0.0318	285.7143	28000	0.0572
0.0327	295.9184	29000	0.0567
0.0328	306.1224	30000	0.0575
0.0394	316.3265	31000	0.0581
0.0319	326.5306	32000	0.0574
0.0313	336.7347	33000	0.0577
0.0325	346.9388	34000	0.0578
0.0312	357.1429	35000	0.0575
0.0309	367.3469	36000	0.0580
0.0319	377.5510	37000	0.0580
0.0313	387.7551	38000	0.0579
0.0318	397.9592	39000	0.0584
0.0319	408.1633	40000	0.0576
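
The table can be read programmatically to find the checkpoint with the lowest validation loss and to estimate the number of optimizer steps per epoch. The sketch below copies the loss columns from the table; the helper names are hypothetical. It suggests that validation loss is lowest at step 3000 (0.0484) and drifts upward afterwards while training loss keeps falling, a pattern that is often a sign of overfitting, although the card gives no further evidence either way.

```python
# Loss columns copied from the table above, one entry per 1000 steps.
TRAIN_LOSS = [
    0.0696, 0.0622, 0.0548, 0.0543, 0.0515, 0.0498, 0.0473, 0.0447, 0.0463, 0.0422,
    0.0410, 0.0415, 0.0394, 0.0396, 0.0387, 0.0385, 0.0373, 0.0421, 0.0367, 0.0374,
    0.0348, 0.0354, 0.0348, 0.0344, 0.0341, 0.0338, 0.0325, 0.0318, 0.0327, 0.0328,
    0.0394, 0.0319, 0.0313, 0.0325, 0.0312, 0.0309, 0.0319, 0.0313, 0.0318, 0.0319,
]
VAL_LOSS = [
    0.0547, 0.0514, 0.0484, 0.0499, 0.0499, 0.0505, 0.0489, 0.0494, 0.0514, 0.0519,
    0.0521, 0.0528, 0.0542, 0.0528, 0.0546, 0.0550, 0.0542, 0.0558, 0.0553, 0.0554,
    0.0563, 0.0560, 0.0552, 0.0559, 0.0558, 0.0569, 0.0567, 0.0572, 0.0567, 0.0575,
    0.0581, 0.0574, 0.0577, 0.0578, 0.0575, 0.0580, 0.0580, 0.0579, 0.0584, 0.0576,
]
STEPS = [1000 * (i + 1) for i in range(len(VAL_LOSS))]


def best_checkpoint(steps, val_losses):
    """Return (step, loss) for the first row with the lowest validation loss."""
    best_index = min(range(len(val_losses)), key=val_losses.__getitem__)
    return steps[best_index], val_losses[best_index]


def steps_per_epoch(step, epoch):
    """Estimate optimizer steps per epoch from one (step, epoch) table row."""
    return round(step / epoch)


if __name__ == "__main__":
    step, loss = best_checkpoint(STEPS, VAL_LOSS)
    print(f"lowest validation loss {loss:.4f} at step {step}")
    print(f"final validation loss {VAL_LOSS[-1]:.4f} at step {STEPS[-1]}")
    print("optimizer steps per epoch:", steps_per_epoch(1000, 10.2041))
```

With about 98 optimizer steps per epoch and a total batch size of 32, the training set would hold roughly 3,100 examples. That figure is an estimate derived from the table, not a number reported by the card.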

Safetensors

Model size: 0.1B params
Tensor type: F32
