kya-kya-audio-aligned-speecht5

This model is a fine-tuned version of microsoft/speecht5_tts. The training dataset is not specified. At the final evaluation (step 40000) it reaches a validation loss of 0.0392 on the evaluation set.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 8
eval_batch_size: 8
seed: 3407
gradient_accumulation_steps: 4
total_train_batch_size: 32
optimizer: fused PyTorch AdamW (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 4000
training_steps: 40000
mixed_precision_training: Native AMP
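
To make the relationship between these settings concrete, here is a minimal sketch in plain Python. It assumes a single training device (the card does not state the device count) and the common "linear warmup, then half-cosine decay to zero" shape for a cosine scheduler; the function names are illustrative and not part of the training code.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 4
WARMUP_STEPS = 4000
TRAINING_STEPS = 40000


def effective_batch_size(per_device: int, accum: int, num_devices: int = 1) -> int:
    """Samples contributing to one optimizer step.

    num_devices=1 is an assumption; the card does not report it.
    """
    return per_device * accum * num_devices


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then half-cosine decay."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("total_train_batch_size:", effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
    for s in (0, 2000, 4000, 22000, 40000):
        print(s, lr_at(s))
```

Under these assumptions the learning rate peaks at 0.0001 at step 4000, falls to roughly half of that at step 22000 (the midpoint of the decay phase), and reaches zero at step 40000.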

Training Loss	Epoch	Step	Validation Loss
0.0644	4.0816	1000	0.0509
0.0577	8.1633	2000	0.0461
0.0534	12.2449	3000	0.0440
0.0515	16.3265	4000	0.0451
0.0506	20.4082	5000	0.0426
0.0484	24.4898	6000	0.0433
0.0469	28.5714	7000	0.0420
0.0464	32.6531	8000	0.0416
0.0465	36.7347	9000	0.0433
0.0433	40.8163	10000	0.0405
0.0416	44.8980	11000	0.0406
0.0427	48.9796	12000	0.0408
0.0410	53.0612	13000	0.0404
0.0401	57.1429	14000	0.0405
0.0390	61.2245	15000	0.0399
0.0399	65.3061	16000	0.0396
0.0389	69.3878	17000	0.0413
0.0444	73.4694	18000	0.0416
0.0381	77.5510	19000	0.0402
0.0386	81.6327	20000	0.0394
0.0364	85.7143	21000	0.0398
0.0370	89.7959	22000	0.0405
0.0365	93.8776	23000	0.0404
0.0361	97.9592	24000	0.0398
0.0358	102.0408	25000	0.0397
0.0354	106.1224	26000	0.0395
0.0342	110.2041	27000	0.0396
0.0337	114.2857	28000	0.0395
0.0345	118.3673	29000	0.0395
0.0348	122.4490	30000	0.0395
0.0389	126.5306	31000	0.0400
0.0336	130.6122	32000	0.0396
0.0334	134.6939	33000	0.0394
0.0340	138.7755	34000	0.0391
0.0333	142.8571	35000	0.0393
0.0326	146.9388	36000	0.0392
0.0335	151.0204	37000	0.0393
0.0326	155.1020	38000	0.0394
0.0336	159.1837	39000	0.0392
0.0337	163.2653	40000	0.0392
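
The table can be read programmatically to recover two facts it only implies: how many optimizer steps make up one epoch, and which logged checkpoint has the lowest validation loss. The sketch below uses a subset of rows copied from the table; the helper names are illustrative.

```python
# (step, epoch, validation_loss) rows copied from the table above (subset only).
ROWS = [
    (1000, 4.0816, 0.0509),
    (10000, 40.8163, 0.0405),
    (20000, 81.6327, 0.0394),
    (30000, 122.4490, 0.0395),
    (34000, 138.7755, 0.0391),
    (40000, 163.2653, 0.0392),
]


def steps_per_epoch(step: int, epoch: float) -> int:
    """Optimizer steps per epoch, inferred from a logged (step, epoch) pair."""
    return round(step / epoch)


def best_row(rows):
    """Row with the lowest validation loss."""
    return min(rows, key=lambda r: r[2])


if __name__ == "__main__":
    print("steps per epoch:", steps_per_epoch(*ROWS[0][:2]))
    step, epoch, loss = best_row(ROWS)
    print(f"lowest validation loss {loss} at step {step} (epoch {epoch})")
```

Every row gives 245 steps per epoch. With a total train batch size of 32, that suggests at most about 7,840 training samples per epoch, assuming every optimizer step used a full batch. Over the full table the lowest validation loss is 0.0391 at step 34000, while training loss keeps falling afterwards, so later checkpoints bring little measurable gain on the evaluation set.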

The weights are stored in Safetensors format: 0.1B parameters, tensor type F32.
