mms-1b-all-bemgen-combined-gdro

This model is a fine-tuned version of facebook/mms-1b-all on the BEMGEN (BEM) dataset. It achieves the following results on the evaluation set (a usage sketch follows the results):

  • Loss: 17.8045
  • Wer: 0.5344
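
Since the card does not yet include a usage snippet, here is a minimal transcription sketch. It assumes this checkpoint exposes the standard Wav2Vec2 CTC interface of the facebook/mms-1b-all base; audio.wav is a placeholder for a 16 kHz mono recording.

```python
import torch
import librosa
from transformers import AutoProcessor, Wav2Vec2ForCTC

model_id = "csikasote/mms-1b-all-bemgen-combined-gdro"  # repo id from this card

processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)
model.eval()

# "audio.wav" is a placeholder; MMS models expect 16 kHz mono input.
speech, _ = librosa.load("audio.wav", sr=16_000, mono=True)

inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Greedy CTC decoding of the most likely token at each frame.
predicted_ids = torch.argmax(logits, dim=-1)[0]
print(processor.decode(predicted_ids))
```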

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0003
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 5.0
  • mixed_precision_training: Native AMP
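
The same settings can be expressed as transformers TrainingArguments. This is an illustration only: output_dir is a placeholder, anything not listed above is left at its default, and the actual training script is not part of this card.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mms-1b-all-bemgen-combined-gdro",  # placeholder
    learning_rate=3e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=2,  # effective train batch size: 8 * 2 = 16
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=100,
    num_train_epochs=5.0,
    fp16=True,  # "Native AMP" mixed-precision training
)
```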

Training results

| Training Loss | Epoch  | Step | Validation Loss | Wer    |
|:-------------:|:------:|:----:|:---------------:|:------:|
| 651.3737      | 0.5076 | 100  | 199.4144        | 1.0000 |
| 200.6862      | 1.0152 | 200  | 66.1207         | 0.9503 |
| 134.4785      | 1.5228 | 300  | 23.4666         | 0.5956 |
| 111.6127      | 2.0305 | 400  | 19.1133         | 0.5569 |
| 110.8015      | 2.5381 | 500  | 18.9017         | 0.5372 |
| 102.4243      | 3.0457 | 600  | 18.3297         | 0.5449 |
| 103.9713      | 3.5533 | 700  | 18.8923         | 0.5445 |
| 102.1721      | 4.0609 | 800  | 17.8276         | 0.5343 |
| 102.6848      | 4.5685 | 900  | 17.8989         | 0.5297 |
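
The Wer column is word error rate. A minimal sketch of how it can be recomputed with the evaluate library (an assumption, since the card does not state which tool produced the numbers above; the strings below are placeholders):

```python
import evaluate  # requires the `evaluate` and `jiwer` packages

wer_metric = evaluate.load("wer")

# Placeholder transcripts; in practice, predictions come from decoding the
# model's CTC output on the evaluation set and references from the gold labels.
predictions = ["a placeholder hypothesis transcript"]
references = ["a placeholder reference transcript"]

print(wer_metric.compute(predictions=predictions, references=references))
```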

Framework versions

  • Transformers 4.52.4
  • Pytorch 2.9.0+cu128
  • Datasets 4.4.1
  • Tokenizers 0.21.4
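
To reproduce the environment, the installed packages can be checked against the versions above at runtime (a minimal sketch; nearby versions may also work, but only the ones listed are attested by this card):

```python
# Compare installed package versions against those listed on the card.
import transformers, torch, datasets, tokenizers

for module, expected in [
    (transformers, "4.52.4"),
    (torch, "2.9.0+cu128"),
    (datasets, "4.4.1"),
    (tokenizers, "0.21.4"),
]:
    print(f"{module.__name__} {module.__version__} (card: {expected})")
```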