kya-kya-audio-aligned-speecht5
This model is a fine-tuned version of microsoft/speecht5_tts on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 0.0392
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 8
- seed: 3407
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: AdamW (fused PyTorch implementation, OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 4000
- training_steps: 40000
- mixed_precision_training: Native AMP
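The schedule implied by the hyperparameters above (base learning rate 0.0001, 4000 warmup steps, 40000 training steps, cosine decay) can be sketched in plain Python. This is a minimal illustration that assumes linear warmup from zero followed by a single half-cosine decay to zero, which is how the cosine scheduler in Transformers is commonly described; the function name `lr_at` is hypothetical and not part of the training code.

```python
import math

# Values taken from the hyperparameter list above.
BASE_LR = 1e-4
WARMUP_STEPS = 4_000
TOTAL_STEPS = 40_000


def lr_at(step: int) -> float:
    """Approximate learning rate at an optimizer step.

    Assumption: linear warmup from 0, then half-cosine decay to 0.
    """
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to BASE_LR over the warmup phase.
        return BASE_LR * step / max(1, WARMUP_STEPS)
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for step in (0, 2_000, 4_000, 22_000, 40_000):
    print(f"step {step:>6}: lr = {lr_at(step):.2e}")
```

Under these assumptions the learning rate peaks at step 4000 (10% of training) and is back to half its peak at step 22000, the midpoint of the decay phase.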
Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 0.0644 | 4.0816 | 1000 | 0.0509 |
| 0.0577 | 8.1633 | 2000 | 0.0461 |
| 0.0534 | 12.2449 | 3000 | 0.0440 |
| 0.0515 | 16.3265 | 4000 | 0.0451 |
| 0.0506 | 20.4082 | 5000 | 0.0426 |
| 0.0484 | 24.4898 | 6000 | 0.0433 |
| 0.0469 | 28.5714 | 7000 | 0.0420 |
| 0.0464 | 32.6531 | 8000 | 0.0416 |
| 0.0465 | 36.7347 | 9000 | 0.0433 |
| 0.0433 | 40.8163 | 10000 | 0.0405 |
| 0.0416 | 44.8980 | 11000 | 0.0406 |
| 0.0427 | 48.9796 | 12000 | 0.0408 |
| 0.0410 | 53.0612 | 13000 | 0.0404 |
| 0.0401 | 57.1429 | 14000 | 0.0405 |
| 0.0390 | 61.2245 | 15000 | 0.0399 |
| 0.0399 | 65.3061 | 16000 | 0.0396 |
| 0.0389 | 69.3878 | 17000 | 0.0413 |
| 0.0444 | 73.4694 | 18000 | 0.0416 |
| 0.0381 | 77.5510 | 19000 | 0.0402 |
| 0.0386 | 81.6327 | 20000 | 0.0394 |
| 0.0364 | 85.7143 | 21000 | 0.0398 |
| 0.0370 | 89.7959 | 22000 | 0.0405 |
| 0.0365 | 93.8776 | 23000 | 0.0404 |
| 0.0361 | 97.9592 | 24000 | 0.0398 |
| 0.0358 | 102.0408 | 25000 | 0.0397 |
| 0.0354 | 106.1224 | 26000 | 0.0395 |
| 0.0342 | 110.2041 | 27000 | 0.0396 |
| 0.0337 | 114.2857 | 28000 | 0.0395 |
| 0.0345 | 118.3673 | 29000 | 0.0395 |
| 0.0348 | 122.4490 | 30000 | 0.0395 |
| 0.0389 | 126.5306 | 31000 | 0.0400 |
| 0.0336 | 130.6122 | 32000 | 0.0396 |
| 0.0334 | 134.6939 | 33000 | 0.0394 |
| 0.0340 | 138.7755 | 34000 | 0.0391 |
| 0.0333 | 142.8571 | 35000 | 0.0393 |
| 0.0326 | 146.9388 | 36000 | 0.0392 |
| 0.0335 | 151.0204 | 37000 | 0.0393 |
| 0.0326 | 155.1020 | 38000 | 0.0394 |
| 0.0336 | 159.1837 | 39000 | 0.0392 |
| 0.0337 | 163.2653 | 40000 | 0.0392 |
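The table can be summarized with a short script. The sketch below copies the validation losses from the table (one value per 1000 steps), finds the step with the lowest validation loss, and estimates the number of optimizer steps per epoch from the final row. The per-epoch example count is only a rough estimate: it assumes a single device (so that 8 × 4 gives the reported total batch size of 32) and ignores partial batches. All variable names are illustrative.

```python
# Validation losses from the table above, logged every 1000 steps.
VAL_LOSS = [
    0.0509, 0.0461, 0.0440, 0.0451, 0.0426, 0.0433, 0.0420, 0.0416,
    0.0433, 0.0405, 0.0406, 0.0408, 0.0404, 0.0405, 0.0399, 0.0396,
    0.0413, 0.0416, 0.0402, 0.0394, 0.0398, 0.0405, 0.0404, 0.0398,
    0.0397, 0.0395, 0.0396, 0.0395, 0.0395, 0.0395, 0.0400, 0.0396,
    0.0394, 0.0391, 0.0393, 0.0392, 0.0393, 0.0394, 0.0392, 0.0392,
]
steps = [1000 * (i + 1) for i in range(len(VAL_LOSS))]

# Lowest validation loss and the step at which it was logged.
best_loss, best_step = min(zip(VAL_LOSS, steps))

# Final table row: step 40000 corresponds to epoch 163.2653.
steps_per_epoch = round(40_000 / 163.2653)

# Rough estimate only: assumes one device and ignores partial batches.
TOTAL_TRAIN_BATCH_SIZE = 32
approx_examples_per_epoch = steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE

print(f"best validation loss {best_loss} at step {best_step}")
print(f"final validation loss {VAL_LOSS[-1]} at step {steps[-1]}")
print(f"about {steps_per_epoch} optimizer steps per epoch")
print(f"roughly {approx_examples_per_epoch} training examples per epoch")
```

The lowest logged validation loss (0.0391, step 34000) differs from the final value (0.0392, step 40000) by only 0.0001, so the curve is essentially flat over the last several thousand steps while the training loss continues to decrease.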
Framework versions
- Transformers 4.57.1
- Pytorch 2.8.0+cu128
- Datasets 4.2.0
- Tokenizers 0.22.2
Model tree for sil-ai/kya-kya-audio-aligned-speecht5
Base model: microsoft/speecht5_tts