# taska-wav2vec-300m-max22-WF-epoch-16-batch-8-whisper-2
This model is a fine-tuned version of [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) on the uriel/audio_data_kaggle_train_taska dataset. It achieves the following results on the evaluation set:
- Loss: 0.8168
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: AdamW (`adamw_torch`) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine_with_restarts
- lr_scheduler_warmup_steps: 100
- num_epochs: 6
- mixed_precision_training: Native AMP
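
The effective batch size and learning-rate curve follow from the values above. The sketch below recomputes the effective batch size and approximates the linear-warmup plus cosine-with-restarts schedule; the helper name `lr_at` and the total-step count are illustrative assumptions, not taken from the actual training script:

```python
import math

# Effective batch size: per-device batch x gradient-accumulation steps.
per_device_batch = 8
grad_accum = 4
effective_batch = per_device_batch * grad_accum  # 32, matching total_train_batch_size

def lr_at(step, base_lr=1e-5, warmup_steps=100, total_steps=16_878, num_cycles=1):
    """Illustrative warmup + cosine-with-restarts schedule.

    total_steps is a rough estimate from the results table, not a value
    from the training script; num_cycles=1 reduces to plain cosine decay.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps  # linear warmup from 0 to base_lr
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    if progress >= 1.0:
        return 0.0
    # With num_cycles > 1 the cosine curve restarts at each cycle boundary.
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * ((num_cycles * progress) % 1.0))))
```

With these numbers the learning rate ramps from 0 to 1e-05 over the first 100 steps, then decays toward 0 along a cosine curve for the remainder of training.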
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 2.4269 | 0.0888 | 250 | 2.2703 |
| 1.8282 | 0.1777 | 500 | 1.7029 |
| 1.4952 | 0.2665 | 750 | 1.4932 |
| 1.4543 | 0.3553 | 1000 | 1.3942 |
| 1.319 | 0.4442 | 1250 | 1.3234 |
| 1.2642 | 0.5330 | 1500 | 1.2685 |
| 1.1427 | 0.6218 | 1750 | 1.2236 |
| 1.1739 | 0.7107 | 2000 | 1.1846 |
| 1.2141 | 0.7995 | 2250 | 1.1495 |
| 1.1487 | 0.8883 | 2500 | 1.1158 |
| 1.1129 | 0.9772 | 2750 | 1.0875 |
| 1.0279 | 1.0657 | 3000 | 1.0655 |
| 1.051 | 1.1546 | 3250 | 1.0451 |
| 1.0939 | 1.2434 | 3500 | 1.0276 |
| 0.9935 | 1.3322 | 3750 | 1.0116 |
| 0.9597 | 1.4211 | 4000 | 0.9981 |
| 0.9801 | 1.5099 | 4250 | 0.9833 |
| 1.0072 | 1.5987 | 4500 | 0.9723 |
| 1.0046 | 1.6876 | 4750 | 0.9610 |
| 0.9349 | 1.7764 | 5000 | 0.9499 |
| 0.954 | 1.8652 | 5250 | 0.9423 |
| 0.9476 | 1.9541 | 5500 | 0.9328 |
| 0.9019 | 2.0426 | 5750 | 0.9245 |
| 0.9377 | 2.1315 | 6000 | 0.9162 |
| 0.8679 | 2.2203 | 6250 | 0.9093 |
| 0.9139 | 2.3091 | 6500 | 0.9030 |
| 0.8849 | 2.3980 | 6750 | 0.8964 |
| 0.8529 | 2.4868 | 7000 | 0.8908 |
| 0.8874 | 2.5756 | 7250 | 0.8852 |
| 0.9127 | 2.6645 | 7500 | 0.8795 |
| 0.8952 | 2.7533 | 7750 | 0.8748 |
| 0.8817 | 2.8421 | 8000 | 0.8703 |
| 0.8542 | 2.9310 | 8250 | 0.8652 |
| 0.8651 | 3.0195 | 8500 | 0.8618 |
| 0.9227 | 3.1084 | 8750 | 0.8576 |
| 0.8792 | 3.1972 | 9000 | 0.8540 |
| 0.8417 | 3.2860 | 9250 | 0.8507 |
| 0.8692 | 3.3749 | 9500 | 0.8476 |
| 0.8254 | 3.4637 | 9750 | 0.8443 |
| 0.8894 | 3.5525 | 10000 | 0.8420 |
| 0.8395 | 3.6414 | 10250 | 0.8396 |
| 0.8509 | 3.7302 | 10500 | 0.8372 |
| 0.8282 | 3.8190 | 10750 | 0.8347 |
| 0.8377 | 3.9079 | 11000 | 0.8333 |
| 0.8156 | 3.9967 | 11250 | 0.8313 |
| 0.829 | 4.0853 | 11500 | 0.8289 |
| 0.8247 | 4.1741 | 11750 | 0.8278 |
| 0.8182 | 4.2629 | 12000 | 0.8268 |
| 0.8489 | 4.3518 | 12250 | 0.8252 |
| 0.8392 | 4.4406 | 12500 | 0.8240 |
| 0.8272 | 4.5294 | 12750 | 0.8229 |
| 0.8044 | 4.6183 | 13000 | 0.8217 |
| 0.8042 | 4.7071 | 13250 | 0.8211 |
| 0.7989 | 4.7959 | 13500 | 0.8199 |
| 0.8282 | 4.8848 | 13750 | 0.8193 |
| 0.8137 | 4.9736 | 14000 | 0.8188 |
| 0.8156 | 5.0622 | 14250 | 0.8185 |
| 0.8356 | 5.1510 | 14500 | 0.8181 |
| 0.8436 | 5.2399 | 14750 | 0.8178 |
| 0.8414 | 5.3287 | 15000 | 0.8174 |
| 0.7825 | 5.4175 | 15250 | 0.8174 |
| 0.8374 | 5.5064 | 15500 | 0.8168 |
| 0.8261 | 5.5952 | 15750 | 0.8169 |
| 0.8161 | 5.6840 | 16000 | 0.8169 |
| 0.7703 | 5.7729 | 16250 | 0.8168 |
| 0.8152 | 5.8617 | 16500 | 0.8168 |
| 0.8479 | 5.9505 | 16750 | 0.8168 |
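
As a sanity check on the log above, the epoch and step columns imply the approximate size of the training set. These are back-of-the-envelope estimates, not reported values:

```python
# First logged row: optimizer step 250 corresponds to epoch 0.0888.
steps_per_epoch = 250 / 0.0888                          # ~2,815 optimizer steps per epoch
effective_batch = 32                                    # from the hyperparameters above
examples_per_epoch = steps_per_epoch * effective_batch  # ~90,000 training examples
```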
### Framework versions
- PEFT 0.15.2.dev0
- Transformers 4.53.0.dev0
- PyTorch 2.7.1+cu126
- Datasets 3.6.0
- Tokenizers 0.21.1