gpt2-multilingual-20-zh-repair_3epochs_lr1e-4

This model is a fine-tuned version of CausalNLP/gpt2-hf_multilingual-20 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.5282
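
For quick inspection, the checkpoint can be loaded with the standard transformers API. The snippet below is a minimal sketch, not the authors' code; the Chinese prompt is an assumption based on the "zh-repair" name, and the sampling settings are illustrative.

```python
# Minimal usage sketch (assumption: standard GPT-2-style causal LM loading;
# the prompt and generation settings are illustrative, not from the authors).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CausalNLP/gpt2-multilingual-20-zh-repair_3epochs_lr1e-4"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("你好，今天", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```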

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a configuration sketch in code follows the list:

  • learning_rate: 0.0001
  • train_batch_size: 32
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 128
  • optimizer: adamw_torch_fused with betas=(0.9, 0.95) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
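
As a rough reproduction aid, the hyperparameters above map onto Hugging Face TrainingArguments as sketched below. This is an assumption about how the run was configured, not the authors' script; output_dir and bf16 are guesses (bf16 inferred from the checkpoint's BF16 tensor type).

```python
# Hedged sketch: TrainingArguments matching the reported hyperparameters.
# The actual training script, dataset, and data collator are unknown.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="gpt2-multilingual-20-zh-repair_3epochs_lr1e-4",  # assumed
    learning_rate=1e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,   # 32 * 4 = 128 total train batch size
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-8,
    seed=42,
    bf16=True,  # assumption, inferred from the BF16 tensor type noted below
)
```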

Training results

Training Loss | Epoch  | Step  | Validation Loss
3.6036        | 0.0626 |   500 | 3.6583
3.6222        | 0.1252 |  1000 | 3.6372
3.5958        | 0.1879 |  1500 | 3.6347
3.6297        | 0.2505 |  2000 | 3.6452
3.6016        | 0.3131 |  2500 | 3.6591
3.6284        | 0.3757 |  3000 | 3.6530
3.5823        | 0.4384 |  3500 | 3.6493
3.6460        | 0.5010 |  4000 | 3.6412
3.6329        | 0.5636 |  4500 | 3.6358
3.5469        | 0.6262 |  5000 | 3.6302
3.5987        | 0.6889 |  5500 | 3.6242
3.4773        | 0.7515 |  6000 | 3.6181
3.5610        | 0.8141 |  6500 | 3.6142
3.5243        | 0.8767 |  7000 | 3.6092
3.5998        | 0.9393 |  7500 | 3.6058
3.6267        | 1.0019 |  8000 | 3.6007
3.5314        | 1.0645 |  8500 | 3.5958
3.5579        | 1.1271 |  9000 | 3.5915
3.5410        | 1.1897 |  9500 | 3.5861
3.5823        | 1.2524 | 10000 | 3.5820
3.5340        | 1.3150 | 10500 | 3.5767
3.4906        | 1.3776 | 11000 | 3.5718
3.5538        | 1.4402 | 11500 | 3.5673
3.5270        | 1.5029 | 12000 | 3.5631
3.5119        | 1.5655 | 12500 | 3.5584
3.4633        | 1.6281 | 13000 | 3.5537
3.5098        | 1.6907 | 13500 | 3.5503
3.4336        | 1.7534 | 14000 | 3.5474
3.5241        | 1.8160 | 14500 | 3.5444
3.4846        | 1.8786 | 15000 | 3.5409
3.4802        | 1.9412 | 15500 | 3.5385
3.5026        | 2.0038 | 16000 | 3.5368
3.4940        | 2.0664 | 16500 | 3.5351
3.4524        | 2.1290 | 17000 | 3.5341
3.4478        | 2.1916 | 17500 | 3.5329
3.4458        | 2.2543 | 18000 | 3.5317
3.4925        | 2.3169 | 18500 | 3.5305
3.4913        | 2.3795 | 19000 | 3.5298
3.4331        | 2.4421 | 19500 | 3.5293
3.5182        | 2.5047 | 20000 | 3.5290
3.4460        | 2.5674 | 20500 | 3.5286
3.5018        | 2.6300 | 21000 | 3.5285
3.4750        | 2.6926 | 21500 | 3.5283
3.4299        | 2.7552 | 22000 | 3.5282
3.4538        | 2.8179 | 22500 | 3.5282
3.4471        | 2.8805 | 23000 | 3.5282
3.4295        | 2.9431 | 23500 | 3.5282
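
The reported loss is cross-entropy in nats, so perplexity follows directly as exp(loss); for the final checkpoint that works out to roughly 34:

```python
# Perplexity from the reported final eval loss (exp of cross-entropy in nats).
import math

final_eval_loss = 3.5282
print(math.exp(final_eval_loss))  # ≈ 34.1
```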

Framework versions

  • Transformers 4.57.3
  • PyTorch 2.9.0
  • Datasets 4.4.1
  • Tokenizers 0.22.1
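
To sanity-check a local environment against these versions, something like the following works (exact pins are likely not required; newer versions should generally load the checkpoint):

```python
# Optional environment check against the versions reported above.
import transformers, torch, datasets, tokenizers

print(transformers.__version__)  # reported: 4.57.3
print(torch.__version__)         # reported: 2.9.0
print(datasets.__version__)      # reported: 4.4.1
print(tokenizers.__version__)    # reported: 0.22.1
```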

Model size

  • 0.2B params (Safetensors, BF16)
