shawgpt-ft-epoch-25

This model is a fine-tuned version of TheBloke/Mistral-7B-Instruct-v0.2-GPTQ on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.9760
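The card does not include usage instructions. As a minimal sketch (not part of the original card), the snippet below shows one way to load the GPTQ base model and attach this repository's PEFT adapter for inference. The prompt and generation settings are illustrative assumptions, and auto-gptq/optimum must be installed for the quantized base model to load.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "TheBloke/Mistral-7B-Instruct-v0.2-GPTQ"
adapter_id = "Jonasbukhave/shawgpt-ft-epoch-25"  # this repository

# Load the quantized base model, then attach the fine-tuned adapter.
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

# Mistral-Instruct expects the [INST] ... [/INST] chat format.
prompt = "[INST] What does this model do? [/INST]"  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```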

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: paged_adamw_8bit with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 2
  • num_epochs: 25
  • mixed_precision_training: Native AMP
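As a hedged illustration of how these settings map onto the Hugging Face Trainer API, the sketch below fills a TrainingArguments object with the values listed above; the output directory and any option not listed are assumptions, not taken from the original training script.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="shawgpt-ft-epoch-25",   # assumed output directory
    learning_rate=2e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,      # effective train batch size: 8 * 4 = 32
    optim="paged_adamw_8bit",
    lr_scheduler_type="linear",
    warmup_steps=2,
    num_train_epochs=25,
    seed=42,
    fp16=True,                          # native AMP mixed precision
)
```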

Training results

Training Loss   Epoch     Step   Validation Loss
-------------   -------   ----   ---------------
8.5018          0.5714    1      4.2401
8.5321          1.5714    2      4.1560
8.0764          2.5714    3      3.9643
7.7407          3.5714    4      3.7754
7.3053          4.5714    5      3.5978
6.9918          5.5714    6      3.4268
6.6703          6.5714    7      3.2631
6.3584          7.5714    8      3.1110
6.0966          8.5714    9      2.9737
5.8804          9.5714    10     2.8524
5.563           10.5714   11     2.7455
5.3723          11.5714   12     2.6487
5.1872          12.5714   13     2.5584
4.994           13.5714   14     2.4747
4.8122          14.5714   15     2.3941
4.6556          15.5714   16     2.3245
4.5241          16.5714   17     2.2687
4.4362          17.5714   18     2.2153
4.314           18.5714   19     2.1626
4.1867          19.5714   20     2.1120
4.0063          20.5714   21     2.0663
3.9645          21.5714   22     2.0290
3.8819          22.5714   23     2.0020
3.8298          23.5714   24     1.9846
1.9396          24.5714   25     1.9760

Framework versions

  • PEFT 0.14.0
  • Transformers 4.48.3
  • Pytorch 2.5.1+cu124
  • Datasets 3.3.1
  • Tokenizers 0.21.0