Whisper Medium Ro - PEFT

This model is a fine-tuned version of openai/whisper-medium on the Common Voice 17.0 10% Synthetic 4 speakers dataset. It achieves the following results on the evaluation set:

Loss: 0.1623
Wer: 8.3516
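
The card does not say how Wer was computed. Assuming it is the usual word error rate (word-level substitutions, deletions and insertions divided by the number of reference words) expressed as a percentage, a minimal pure-Python sketch could look like the following. The function name `word_error_rate` and the example sentences are illustrative and are not taken from the training code.

```python
def word_error_rate(references, hypotheses):
    """Corpus-level WER in percent: total word edits / total reference words."""
    total_edits = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words, hyp_words = ref.split(), hyp.split()
        # Word-level Levenshtein distance, keeping only the previous row
        prev = list(range(len(hyp_words) + 1))
        for i, ref_word in enumerate(ref_words, start=1):
            curr = [i]
            for j, hyp_word in enumerate(hyp_words, start=1):
                cost = 0 if ref_word == hyp_word else 1
                curr.append(min(
                    prev[j] + 1,         # deletion (reference word missing)
                    curr[j - 1] + 1,     # insertion (extra hypothesis word)
                    prev[j - 1] + cost,  # substitution or exact match
                ))
            prev = curr
        total_edits += prev[-1]
        total_words += len(ref_words)
    return 100.0 * total_edits / total_words


# One dropped word out of five reference words
print(word_error_rate(["ana are mere si pere"], ["ana are mere pere"]))  # 20.0
```

Under this reading, a Wer of 8.3516 corresponds to roughly 8.35 word errors per 100 reference words. Any text normalization applied before scoring (casing, punctuation) is not documented in this card.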

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: adamw_torch_fused with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 600
training_steps: 6000
mixed_precision_training: Native AMP
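
To make the schedule concrete, the sketch below reproduces the shape implied by the values above: a linear warmup to the peak learning rate over 600 steps, followed by a linear decay to zero at step 6000. It mirrors the usual behaviour of a `linear` scheduler with warmup and is not the training code itself; the constant and function names are illustrative.

```python
PEAK_LR = 1e-05        # learning_rate
WARMUP_STEPS = 600     # lr_scheduler_warmup_steps
TRAINING_STEPS = 6000  # training_steps


def linear_lr(step):
    """Learning rate at optimizer step `step` under linear warmup and decay."""
    if step < WARMUP_STEPS:
        # Warmup: ramp linearly from 0 up to the peak
        return PEAK_LR * step / WARMUP_STEPS
    # Decay: ramp linearly from the peak down to 0 at the final step
    remaining = max(0, TRAINING_STEPS - step)
    return PEAK_LR * remaining / (TRAINING_STEPS - WARMUP_STEPS)


for step in (0, 300, 600, 3300, 6000):
    print(f"step {step:>4}: lr = {linear_lr(step):.2e}")
```

With these settings the learning rate peaks at step 600 and is back to half the peak at step 3300, the midpoint of the decay phase.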

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
0.2456	0.2267	1000	0.2116	10.6105
0.2176	0.4533	2000	0.1821	9.4078
0.2059	0.6800	3000	0.1719	8.8431
0.1889	0.9066	4000	0.1665	8.5867
0.194	1.1333	5000	0.1632	8.3455
0.1729	1.3599	6000	0.1623	8.3516
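
A few quantities can be derived from the table directly. The sketch below copies the rows and computes the checkpoint with the lowest Wer, the approximate number of optimizer steps per epoch, and the relative Wer reduction between the first and last evaluation. The variable names are illustrative, and the examples-per-epoch estimate assumes a single device with no gradient accumulation, which the card does not confirm.

```python
# (training loss, epoch, step, validation loss, Wer) copied from the table
ROWS = [
    (0.2456, 0.2267, 1000, 0.2116, 10.6105),
    (0.2176, 0.4533, 2000, 0.1821, 9.4078),
    (0.2059, 0.6800, 3000, 0.1719, 8.8431),
    (0.1889, 0.9066, 4000, 0.1665, 8.5867),
    (0.194, 1.1333, 5000, 0.1632, 8.3455),
    (0.1729, 1.3599, 6000, 0.1623, 8.3516),
]

first, final = ROWS[0], ROWS[-1]
best_wer_row = min(ROWS, key=lambda row: row[4])
best_loss_row = min(ROWS, key=lambda row: row[3])

# Steps per epoch inferred from the last logged (epoch, step) pair
steps_per_epoch = final[2] / final[1]

# ASSUMPTION: effective batch size equals train_batch_size (8),
# i.e. one device and no gradient accumulation
approx_examples_per_epoch = steps_per_epoch * 8

relative_wer_reduction = 100 * (first[4] - final[4]) / first[4]

print("lowest Wer at step", best_wer_row[2])
print("lowest validation loss at step", best_loss_row[2])
print(f"steps per epoch: about {steps_per_epoch:.0f}")
print(f"relative Wer reduction, step 1000 to 6000: {relative_wer_reduction:.1f}%")
```

The lowest Wer in the table is at step 5000 (8.3455), while the results reported at the top of the card are those of the final step 6000 (8.3516), which has the lowest validation loss.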

Framework versions

PEFT 0.18.1.dev0
Transformers 4.57.1
Pytorch 2.9.1+rocm6.4
Datasets 3.6.0
Tokenizers 0.22.1
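
Since this is a PEFT adapter rather than a full checkpoint, it has to be attached to the base model before use. The following is a minimal sketch with PEFT and Transformers, not a tested recipe from the author. It assumes that the repository ID shown on this page is the adapter ID, that the processor of openai/whisper-medium is appropriate, and that "Ro" in the title means Romanian (hence `language="ro"`). The helper names `load_model` and `transcribe` are hypothetical.

```python
BASE_MODEL_ID = "openai/whisper-medium"
ADAPTER_ID = (
    "VladS159/whisper-medium-6000-steps_10_percent_4_speakers"
    "_synthetic_data_16_02_2026"
)


def load_model():
    """Load base Whisper and attach the PEFT adapter (downloads weights)."""
    # Imported lazily so this module can be imported without the heavy libraries
    from peft import PeftModel
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    processor = WhisperProcessor.from_pretrained(BASE_MODEL_ID)
    base = WhisperForConditionalGeneration.from_pretrained(BASE_MODEL_ID)
    model = PeftModel.from_pretrained(base, ADAPTER_ID)
    model.eval()
    return processor, model


def transcribe(processor, model, audio, sampling_rate=16000):
    """Transcribe a 1-D float waveform (Whisper expects 16 kHz audio)."""
    import torch

    inputs = processor(audio, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        token_ids = model.generate(
            input_features=inputs.input_features,
            language="ro",  # ASSUMPTION: Romanian, inferred from the title
            task="transcribe",
        )
    return processor.batch_decode(token_ids, skip_special_tokens=True)[0]
```

A typical call would be `processor, model = load_model()` followed by `transcribe(processor, model, waveform)`, where `waveform` is a NumPy array of 16 kHz mono samples.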

Model tree for VladS159/whisper-medium-6000-steps_10_percent_4_speakers_synthetic_data_16_02_2026

Base model: openai/whisper-medium
Adapter: this model
