bcc_latn-bcclatn-audio-aligned-speecht5

This model is a fine-tuned version of microsoft/speecht5_tts on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0576

Model description

More information needed

Intended uses & limitations

More information needed
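
Usage is not documented for this checkpoint. As a minimal sketch, assuming the standard SpeechT5 text-to-speech API in Transformers, inference could look like the following; the input text, the zero speaker embedding, and the output path are placeholders, not part of this model's documented setup.

```python
import torch
import soundfile as sf
from transformers import SpeechT5ForTextToSpeech, SpeechT5HifiGan, SpeechT5Processor

# Load the fine-tuned checkpoint together with the standard SpeechT5 HiFi-GAN vocoder.
processor = SpeechT5Processor.from_pretrained("sil-ai/bcc_latn-bcclatn-audio-aligned-speecht5")
model = SpeechT5ForTextToSpeech.from_pretrained("sil-ai/bcc_latn-bcclatn-audio-aligned-speecht5")
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

inputs = processor(text="Your text here.", return_tensors="pt")

# SpeechT5 conditions generation on a 512-dim x-vector speaker embedding.
# A zero vector is only a placeholder to make the sketch runnable.
speaker_embeddings = torch.zeros((1, 512))

speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)

# generate_speech returns a 1-D waveform tensor at 16 kHz.
sf.write("output.wav", speech.numpy(), samplerate=16000)
```

For natural-sounding output, the zero vector should be replaced with a speaker embedding extracted from a reference recording of the target speaker.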

Training and evaluation data

More information needed
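
The training data is not documented. Purely as an illustration of how audio-aligned text/speech pairs are typically prepared for SpeechT5 fine-tuning with the processor, here is a hedged sketch; the dataset location and the column names (`audio`, `text`) are assumptions, not a description of this model's actual data.

```python
from datasets import Audio, load_dataset
from transformers import SpeechT5Processor

processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")

# Hypothetical local audiofolder; the real dataset and its columns are unknown.
ds = load_dataset("audiofolder", data_dir="bcc_latn_aligned", split="train")
ds = ds.cast_column("audio", Audio(sampling_rate=16000))

def prepare(example):
    audio = example["audio"]
    # The processor tokenizes the text and converts the target audio to log-mel labels.
    out = processor(
        text=example["text"],
        audio_target=audio["array"],
        sampling_rate=audio["sampling_rate"],
        return_attention_mask=False,
    )
    out["labels"] = out["labels"][0]  # drop the batch dimension
    return out

ds = ds.map(prepare, remove_columns=ds.column_names)
```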

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 3407
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: fused AdamW (adamw_torch_fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 4000
  • training_steps: 40000
  • mixed_precision_training: Native AMP
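
These settings map directly onto Transformers' Seq2SeqTrainingArguments. The sketch below reconstructs them from the list above; the output_dir is a hypothetical placeholder, and the exact training script is not documented.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="bcc_latn-bcclatn-audio-aligned-speecht5",  # hypothetical path
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,  # 8 x 4 = total train batch size of 32
    seed=3407,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=4000,
    max_steps=40000,
    fp16=True,  # "Native AMP" mixed precision
)
```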

Training results

| Training Loss | Epoch    | Step  | Validation Loss |
|:-------------:|:--------:|:-----:|:---------------:|
| 0.0696        | 10.2041  | 1000  | 0.0547          |
| 0.0622        | 20.4082  | 2000  | 0.0514          |
| 0.0548        | 30.6122  | 3000  | 0.0484          |
| 0.0543        | 40.8163  | 4000  | 0.0499          |
| 0.0515        | 51.0204  | 5000  | 0.0499          |
| 0.0498        | 61.2245  | 6000  | 0.0505          |
| 0.0473        | 71.4286  | 7000  | 0.0489          |
| 0.0447        | 81.6327  | 8000  | 0.0494          |
| 0.0463        | 91.8367  | 9000  | 0.0514          |
| 0.0422        | 102.0408 | 10000 | 0.0519          |
| 0.041         | 112.2449 | 11000 | 0.0521          |
| 0.0415        | 122.4490 | 12000 | 0.0528          |
| 0.0394        | 132.6531 | 13000 | 0.0542          |
| 0.0396        | 142.8571 | 14000 | 0.0528          |
| 0.0387        | 153.0612 | 15000 | 0.0546          |
| 0.0385        | 163.2653 | 16000 | 0.0550          |
| 0.0373        | 173.4694 | 17000 | 0.0542          |
| 0.0421        | 183.6735 | 18000 | 0.0558          |
| 0.0367        | 193.8776 | 19000 | 0.0553          |
| 0.0374        | 204.0816 | 20000 | 0.0554          |
| 0.0348        | 214.2857 | 21000 | 0.0563          |
| 0.0354        | 224.4898 | 22000 | 0.0560          |
| 0.0348        | 234.6939 | 23000 | 0.0552          |
| 0.0344        | 244.8980 | 24000 | 0.0559          |
| 0.0341        | 255.1020 | 25000 | 0.0558          |
| 0.0338        | 265.3061 | 26000 | 0.0569          |
| 0.0325        | 275.5102 | 27000 | 0.0567          |
| 0.0318        | 285.7143 | 28000 | 0.0572          |
| 0.0327        | 295.9184 | 29000 | 0.0567          |
| 0.0328        | 306.1224 | 30000 | 0.0575          |
| 0.0394        | 316.3265 | 31000 | 0.0581          |
| 0.0319        | 326.5306 | 32000 | 0.0574          |
| 0.0313        | 336.7347 | 33000 | 0.0577          |
| 0.0325        | 346.9388 | 34000 | 0.0578          |
| 0.0312        | 357.1429 | 35000 | 0.0575          |
| 0.0309        | 367.3469 | 36000 | 0.0580          |
| 0.0319        | 377.5510 | 37000 | 0.0580          |
| 0.0313        | 387.7551 | 38000 | 0.0579          |
| 0.0318        | 397.9592 | 39000 | 0.0584          |
| 0.0319        | 408.1633 | 40000 | 0.0576          |

Framework versions

  • Transformers 4.57.1
  • PyTorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.2