ntk-Ikoma-audio-aligned-speecht5

This model is a fine-tuned version of microsoft/speecht5_tts on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0463

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 3407
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: AdamW (torch, fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 4000
  • training_steps: 40000
  • mixed_precision_training: Native AMP
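
Two of the listed values follow from the others: the total train batch size is the per-device batch size times the gradient accumulation steps, and the cosine scheduler ramps the learning rate linearly over the warmup steps before decaying it to zero at the final training step. A minimal plain-Python sketch of both (the function name is illustrative; the schedule mirrors the linear-warmup-plus-cosine-decay shape of transformers' `get_cosine_schedule_with_warmup`):

```python
import math

# Effective batch size = per-device batch * gradient accumulation steps.
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 4
TOTAL_TRAIN_BATCH_SIZE = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS  # 32, as listed

def lr_at_step(step, base_lr=1e-4, warmup_steps=4000, total_steps=40000):
    """Learning rate under linear warmup followed by cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With these settings the rate peaks at 1e-4 at step 4000, falls to half the peak at step 22000 (the midpoint of the decay phase), and reaches zero at step 40000.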

Training results

| Training Loss | Epoch    | Step  | Validation Loss |
|:-------------:|:--------:|:-----:|:---------------:|
| 0.0668        | 3.9841   | 1000  | 0.0580          |
| 0.0625        | 7.9681   | 2000  | 0.0544          |
| 0.0599        | 11.9522  | 3000  | 0.0529          |
| 0.0551        | 15.9363  | 4000  | 0.0521          |
| 0.0539        | 19.9203  | 5000  | 0.0516          |
| 0.0523        | 23.9044  | 6000  | 0.0501          |
| 0.0513        | 27.8884  | 7000  | 0.0509          |
| 0.0508        | 31.8725  | 8000  | 0.0499          |
| 0.0501        | 35.8566  | 9000  | 0.0503          |
| 0.0471        | 39.8406  | 10000 | 0.0483          |
| 0.0460        | 43.8247  | 11000 | 0.0484          |
| 0.0465        | 47.8088  | 12000 | 0.0476          |
| 0.0458        | 51.7928  | 13000 | 0.0472          |
| 0.0439        | 55.7769  | 14000 | 0.0472          |
| 0.0429        | 59.7610  | 15000 | 0.0479          |
| 0.0437        | 63.7450  | 16000 | 0.0470          |
| 0.0422        | 67.7291  | 17000 | 0.0468          |
| 0.0467        | 71.7131  | 18000 | 0.0474          |
| 0.0422        | 75.6972  | 19000 | 0.0469          |
| 0.0426        | 79.6813  | 20000 | 0.0473          |
| 0.0404        | 83.6653  | 21000 | 0.0465          |
| 0.0415        | 87.6494  | 22000 | 0.0475          |
| 0.0405        | 91.6335  | 23000 | 0.0465          |
| 0.0401        | 95.6175  | 24000 | 0.0464          |
| 0.0397        | 99.6016  | 25000 | 0.0467          |
| 0.0389        | 103.5857 | 26000 | 0.0463          |
| 0.0389        | 107.5697 | 27000 | 0.0464          |
| 0.0382        | 111.5538 | 28000 | 0.0465          |
| 0.0386        | 115.5378 | 29000 | 0.0463          |
| 0.0391        | 119.5219 | 30000 | 0.0467          |
| 0.0415        | 123.5060 | 31000 | 0.0466          |
| 0.0380        | 127.4900 | 32000 | 0.0465          |
| 0.0373        | 131.4741 | 33000 | 0.0463          |
| 0.0380        | 135.4582 | 34000 | 0.0464          |
| 0.0371        | 139.4422 | 35000 | 0.0463          |
| 0.0372        | 143.4263 | 36000 | 0.0463          |
| 0.0377        | 147.4104 | 37000 | 0.0464          |
| 0.0377        | 151.3944 | 38000 | 0.0464          |
| 0.0380        | 155.3785 | 39000 | 0.0463          |
| 0.0379        | 159.3625 | 40000 | 0.0463          |
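
The epoch/step ratio in the log also pins down the approximate training-set size: with an effective batch of 32 and roughly 251 optimizer steps per epoch, the dataset holds on the order of 8,000 examples. A quick back-of-the-envelope check (plain Python; the helper name is illustrative, and the estimate ignores any dropped partial batches):

```python
def estimate_examples(step, epoch, effective_batch_size=32):
    """Rough dataset-size estimate from one (step, epoch) row of the log."""
    steps_per_epoch = step / epoch
    return steps_per_epoch * effective_batch_size

# First row of the results: step 1000 was reached at epoch 3.9841.
approx_size = estimate_examples(1000, 3.9841)  # on the order of 8,000 examples
```

The last row (step 40000 at epoch 159.3625) yields the same estimate, which is a useful sanity check that the log is internally consistent.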

Framework versions

  • Transformers 4.57.1
  • PyTorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.2
Model size: 0.1B params (Safetensors, F32 tensors)

Model tree for sil-ai/ntk-Ikoma-audio-aligned-speecht5

Finetuned from microsoft/speecht5_tts