ssc-ssc-audio-aligned-speecht5
This model is a fine-tuned version of microsoft/speecht5_tts. The training dataset is not specified. It achieves the following results on the evaluation set:
- Loss: 0.0702
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 8
- seed: 3407
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: AdamW (fused PyTorch implementation, OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 4000
- training_steps: 40000
- mixed_precision_training: Native AMP
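The relationship between some of these values can be sketched in plain Python. The block below is only an illustration: it assumes a single training device (the card does not state the device count) and it assumes that `lr_scheduler_type: cosine` follows the common shape of linear warmup followed by a half-cosine decay to zero. The function names are hypothetical and are not taken from the training script.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-4
WARMUP_STEPS = 4000
TRAINING_STEPS = 40000
TRAIN_BATCH_SIZE = 8
GRADIENT_ACCUMULATION_STEPS = 4
NUM_DEVICES = 1  # assumption: the card does not say how many devices were used


def effective_batch_size(per_device, accumulation, devices):
    """Number of examples contributing to one optimizer step."""
    return per_device * accumulation * devices


def cosine_lr_with_warmup(step,
                          peak_lr=LEARNING_RATE,
                          warmup_steps=WARMUP_STEPS,
                          total_steps=TRAINING_STEPS):
    """Learning rate at a given optimizer step (assumed schedule shape).

    Linear ramp from 0 to peak_lr over the warmup steps, then a
    half-cosine decay from peak_lr down to 0 at total_steps.
    """
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("effective batch size:",
          effective_batch_size(TRAIN_BATCH_SIZE,
                               GRADIENT_ACCUMULATION_STEPS,
                               NUM_DEVICES))
    for s in (0, 2000, 4000, 22000, 40000):
        print(s, cosine_lr_with_warmup(s))
```

Under the single-device assumption, 8 examples per batch times 4 accumulation steps gives the reported total_train_batch_size of 32. With this schedule shape the learning rate peaks at step 4000, one tenth of the way through the run.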
Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 0.0837 | 3.9089 | 1000 | 0.0754 |
| 0.0722 | 7.8149 | 2000 | 0.0724 |
| 0.0722 | 11.7209 | 3000 | 0.0722 |
| 0.0703 | 15.6268 | 4000 | 0.0729 |
| 0.0655 | 19.5328 | 5000 | 0.0710 |
| 0.0622 | 23.4388 | 6000 | 0.0694 |
| 0.0703 | 27.3448 | 7000 | 0.0716 |
| 0.0625 | 31.2507 | 8000 | 0.0705 |
| 0.0604 | 35.1567 | 9000 | 0.0692 |
| 0.0599 | 39.0627 | 10000 | 0.0722 |
| 0.0579 | 42.9716 | 11000 | 0.0689 |
| 0.0567 | 46.8776 | 12000 | 0.0704 |
| 0.0568 | 50.7835 | 13000 | 0.0694 |
| 0.0592 | 54.6895 | 14000 | 0.0703 |
| 0.056 | 58.5955 | 15000 | 0.0714 |
| 0.0549 | 62.5015 | 16000 | 0.0693 |
| 0.054 | 66.4074 | 17000 | 0.0694 |
| 0.0545 | 70.3134 | 18000 | 0.0693 |
| 0.0544 | 74.2194 | 19000 | 0.0704 |
| 0.053 | 78.1254 | 20000 | 0.0701 |
| 0.0523 | 82.0313 | 21000 | 0.0699 |
| 0.0513 | 85.9403 | 22000 | 0.0719 |
| 0.0515 | 89.8462 | 23000 | 0.0699 |
| 0.051 | 93.7522 | 24000 | 0.0705 |
| 0.0509 | 97.6582 | 25000 | 0.0702 |
| 0.0509 | 101.5642 | 26000 | 0.0699 |
| 0.0506 | 105.4701 | 27000 | 0.0705 |
| 0.0506 | 109.3761 | 28000 | 0.0702 |
| 0.0495 | 113.2821 | 29000 | 0.0694 |
| 0.0497 | 117.1881 | 30000 | 0.0700 |
| 0.0498 | 121.0940 | 31000 | 0.0709 |
| 0.0499 | 125.0 | 32000 | 0.0702 |
| 0.0495 | 128.9089 | 33000 | 0.0701 |
| 0.0491 | 132.8149 | 34000 | 0.0703 |
| 0.0492 | 136.7209 | 35000 | 0.0701 |
| 0.0507 | 140.6268 | 36000 | 0.0699 |
| 0.0486 | 144.5328 | 37000 | 0.0703 |
| 0.049 | 148.4388 | 38000 | 0.0703 |
| 0.0486 | 152.3448 | 39000 | 0.0702 |
| 0.0493 | 156.2507 | 40000 | 0.0702 |
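The table can be summarized with a short script. The following is a minimal sketch that copies the two loss columns from the table and looks up the evaluation step with the lowest validation loss, along with the gap between validation and training loss at each step. The helper names are hypothetical, and the training loss is taken as logged in the table, which may not be measured the same way as the validation loss.

```python
# Losses copied from the table above, one entry per 1000 training steps.
TRAIN_LOSS = [
    0.0837, 0.0722, 0.0722, 0.0703, 0.0655, 0.0622, 0.0703, 0.0625, 0.0604, 0.0599,
    0.0579, 0.0567, 0.0568, 0.0592, 0.056, 0.0549, 0.054, 0.0545, 0.0544, 0.053,
    0.0523, 0.0513, 0.0515, 0.051, 0.0509, 0.0509, 0.0506, 0.0506, 0.0495, 0.0497,
    0.0498, 0.0499, 0.0495, 0.0491, 0.0492, 0.0507, 0.0486, 0.049, 0.0486, 0.0493,
]
VAL_LOSS = [
    0.0754, 0.0724, 0.0722, 0.0729, 0.0710, 0.0694, 0.0716, 0.0705, 0.0692, 0.0722,
    0.0689, 0.0704, 0.0694, 0.0703, 0.0714, 0.0693, 0.0694, 0.0693, 0.0704, 0.0701,
    0.0699, 0.0719, 0.0699, 0.0705, 0.0702, 0.0699, 0.0705, 0.0702, 0.0694, 0.0700,
    0.0709, 0.0702, 0.0701, 0.0703, 0.0701, 0.0699, 0.0703, 0.0703, 0.0702, 0.0702,
]
STEPS = [1000 * (i + 1) for i in range(len(VAL_LOSS))]


def best_checkpoint(steps, val_losses):
    """Return (step, loss) for the lowest validation loss; earliest step wins ties."""
    best_index = min(range(len(val_losses)), key=lambda i: val_losses[i])
    return steps[best_index], val_losses[best_index]


def generalization_gap(train_losses, val_losses):
    """Validation loss minus training loss at each evaluation step."""
    return [round(v - t, 4) for t, v in zip(train_losses, val_losses)]


best_step, best_val = best_checkpoint(STEPS, VAL_LOSS)
gaps = generalization_gap(TRAIN_LOSS, VAL_LOSS)

if __name__ == "__main__":
    print("best step:", best_step, "validation loss:", best_val)
    print("gap at first / last evaluation:", gaps[0], gaps[-1])
```

On these numbers the lowest validation loss in the table is 0.0689 at step 11000, while the final checkpoint at step 40000 sits at 0.0702, the value reported at the top of the card. The training loss keeps falling after step 11000 while the validation loss stays near 0.070, so the gap between the two widens over the rest of the run.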
Framework versions
- Transformers 4.57.1
- Pytorch 2.8.0+cu128
- Datasets 4.2.0
- Tokenizers 0.22.2
Model tree for sil-ai/ssc-ssc-audio-aligned-speecht5
Base model
microsoft/speecht5_tts