swh-waxal-audio-speecht5
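This model is a fine-tuned version of microsoft/speecht5_tts. The training dataset was not recorded in the card metadata (it is listed as None). The model achieves the following results on the evaluation set, measured at the final training step: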
- Loss: 0.1222
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 8
- seed: 3407
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: AdamW, fused PyTorch implementation (OptimizerNames.ADAMW_TORCH_FUSED), with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 4000
- training_steps: 40000
- mixed_precision_training: Native AMP
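The batch-size and scheduler settings above can be checked with a little arithmetic. The following is a minimal sketch, not the training code itself: it assumes a single device (the card does not state the device count, but 8 × 4 × 1 matches the listed total of 32) and assumes the cosine schedule uses linear warmup followed by a single half-cosine decay to zero, which is the usual behavior for a `cosine` scheduler type. The function names are illustrative.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 4
WARMUP_STEPS = 4000
TRAINING_STEPS = 40000
NUM_DEVICES = 1  # assumption: not stated in the card


def effective_batch_size(per_device, accum_steps, devices=NUM_DEVICES):
    """Number of examples contributing to each optimizer step."""
    return per_device * accum_steps * devices


def lr_at_step(step):
    """Learning rate under linear warmup plus half-cosine decay (assumed shape)."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak learning rate.
        return LEARNING_RATE * step / WARMUP_STEPS
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - WARMUP_STEPS) / (TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
    for step in (0, 2000, 4000, 22000, 40000):
        print(step, lr_at_step(step))
```

Under these assumptions the learning rate peaks at step 4000, the end of warmup, and falls to roughly half its peak by step 22000.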
Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 0.0947 | 62.5161 | 1000 | 0.0838 |
| 0.0807 | 125.0 | 2000 | 0.0836 |
| 0.0723 | 187.5161 | 3000 | 0.0845 |
| 0.0653 | 250.0 | 4000 | 0.0872 |
| 0.0594 | 312.5161 | 5000 | 0.0854 |
| 0.0553 | 375.0 | 6000 | 0.0904 |
| 0.0582 | 437.5161 | 7000 | 0.0926 |
| 0.0512 | 500.0 | 8000 | 0.0965 |
| 0.0488 | 562.5161 | 9000 | 0.0969 |
| 0.0467 | 625.0 | 10000 | 0.0991 |
| 0.0455 | 687.5161 | 11000 | 0.1021 |
| 0.0442 | 750.0 | 12000 | 0.1033 |
| 0.0458 | 812.5161 | 13000 | 0.1055 |
| 0.0416 | 875.0 | 14000 | 0.1071 |
| 0.0401 | 937.5161 | 15000 | 0.1045 |
| 0.0410 | 1000.0 | 16000 | 0.1066 |
| 0.0384 | 1062.5161 | 17000 | 0.1130 |
| 0.0444 | 1125.0 | 18000 | 0.1118 |
| 0.0385 | 1187.5161 | 19000 | 0.1102 |
| 0.0365 | 1250.0 | 20000 | 0.1121 |
| 0.0347 | 1312.5161 | 21000 | 0.1127 |
| 0.0348 | 1375.0 | 22000 | 0.1146 |
| 0.0348 | 1437.5161 | 23000 | 0.1153 |
| 0.0351 | 1500.0 | 24000 | 0.1163 |
| 0.0346 | 1562.5161 | 25000 | 0.1175 |
| 0.0341 | 1625.0 | 26000 | 0.1183 |
| 0.0327 | 1687.5161 | 27000 | 0.1199 |
| 0.0341 | 1750.0 | 28000 | 0.1201 |
| 0.0341 | 1812.5161 | 29000 | 0.1198 |
| 0.0327 | 1875.0 | 30000 | 0.1205 |
| 0.0324 | 1937.5161 | 31000 | 0.1200 |
| 0.0381 | 2000.0 | 32000 | 0.1205 |
| 0.0353 | 2062.5161 | 33000 | 0.1213 |
| 0.0331 | 2125.0 | 34000 | 0.1214 |
| 0.0328 | 2187.5161 | 35000 | 0.1221 |
| 0.0304 | 2250.0 | 36000 | 0.1227 |
| 0.0312 | 2312.5161 | 37000 | 0.1220 |
| 0.0306 | 2375.0 | 38000 | 0.1213 |
| 0.0310 | 2437.5161 | 39000 | 0.1227 |
| 0.0309 | 2500.0 | 40000 | 0.1222 |
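In the table above, validation loss is lowest early in training (0.0836 at step 2000) and rises to 0.1222 by step 40000 while training loss keeps falling, a pattern that usually indicates overfitting. The sketch below shows one way to pick the lowest-validation-loss checkpoint and measure the gap between the two losses. It uses a subset of rows copied from the table; the helper names are illustrative, and for a text-to-speech model a lower loss is only a heuristic for audio quality, not a guarantee.

```python
# (step, training_loss, validation_loss): a subset of rows copied from the table.
ROWS = [
    (1000, 0.0947, 0.0838),
    (2000, 0.0807, 0.0836),
    (3000, 0.0723, 0.0845),
    (10000, 0.0467, 0.0991),
    (20000, 0.0365, 0.1121),
    (30000, 0.0327, 0.1205),
    (40000, 0.0309, 0.1222),
]


def best_checkpoint(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])


def generalization_gap(row):
    """Validation loss minus training loss; a growing gap suggests overfitting."""
    _, train_loss, val_loss = row
    return val_loss - train_loss


if __name__ == "__main__":
    step, _, val_loss = best_checkpoint(ROWS)
    print(f"lowest validation loss {val_loss} at step {step}")
    for row in ROWS:
        print(row[0], round(generalization_gap(row), 4))
```

If intermediate checkpoints were saved, the one nearest step 2000 may be worth comparing against the final weights by listening to generated samples.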
Framework versions
- Transformers 4.57.1
- Pytorch 2.8.0+cu128
- Datasets 4.2.0
- Tokenizers 0.22.2
Model tree for sil-ai/swh-waxal-audio-speecht5
Base model
microsoft/speecht5_tts