metadata
library_name: transformers
license: other
base_model: gpt2-medium
tags:
  - llama-factory
  - full
  - generated_from_trainer
model-index:
  - name: gpt2-medium-pt
    results: []

gpt2-medium-pt

This model is a fine-tuned version of gpt2-medium on the dolma_v17 dataset. It achieves the following results on the evaluation set:

  • Loss: 4.0213
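
For intuition, the evaluation loss can be converted to perplexity. The sketch below assumes the reported value is the mean per-token cross-entropy in nats, which is the usual convention for causal language modeling with the Transformers `Trainer` but is not stated in this card. `perplexity` is a helper name chosen here for illustration only.

```python
import math


def perplexity(mean_nll: float) -> float:
    """Convert a mean per-token negative log-likelihood (in nats) to perplexity."""
    return math.exp(mean_nll)


eval_loss = 4.0213  # evaluation loss reported above
print(f"perplexity ~ {perplexity(eval_loss):.1f}")  # prints "perplexity ~ 55.8"
```

If the loss were instead averaged per sequence or measured in bits, this conversion would not apply as written.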

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0003
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 2000
  • training_steps: 480000
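
To make the schedule implied by these settings concrete, here is a minimal pure-Python sketch of linear warmup followed by cosine decay. It assumes a single half-cosine cycle that decays to zero with no minimum learning rate, which matches the default behavior of Transformers' cosine schedule with warmup; the card does not confirm those details. The function name `lr_at` is hypothetical.

```python
import math

BASE_LR = 3e-4         # learning_rate from the list above
WARMUP_STEPS = 2_000   # lr_scheduler_warmup_steps
TOTAL_STEPS = 480_000  # training_steps


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay.

    Assumption: one half-cosine cycle decaying to 0 (no minimum LR).
    """
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for step in (0, 1_000, 2_000, 241_000, 480_000):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the rate peaks at 3e-4 at step 2000, is half that value midway through the decay phase (step 241000), and reaches zero at step 480000.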

Training results

Training Loss | Epoch | Step | Validation Loss
3.9406 0.0021 1000 4.2421
4.1862 0.0042 2000 4.3823
4.2825 0.0063 3000 4.3790
4.1645 0.0083 4000 4.3446
4.2677 0.0104 5000 4.3706
4.0537 0.0125 6000 4.3647
4.0605 0.0146 7000 4.3406
4.0504 0.0167 8000 4.3591
4.1096 0.0187 9000 4.3455
3.9671 0.0208 10000 4.3356
3.9795 0.0229 11000 4.3432
3.9739 0.025 12000 4.3560
4.0854 0.0271 13000 4.3599
3.9751 0.0292 14000 4.3569
4.178 0.0312 15000 4.3309
3.9919 0.0333 16000 4.3316
4.1456 0.0354 17000 4.3259
4.0668 0.0375 18000 4.3300
4.1336 0.0396 19000 4.3413
4.0671 0.0417 20000 4.3567
4.0745 0.0437 21000 4.3423
3.8593 0.0458 22000 4.3541
4.0354 0.0479 23000 4.3456
4.0783 0.05 24000 4.3473
4.0063 0.0521 25000 4.3363
4.0914 0.0542 26000 4.3279
4.1028 0.0563 27000 4.3403
3.9719 0.0583 28000 4.3337
3.9235 0.0604 29000 4.3229
4.0542 0.0625 30000 4.3088
3.995 0.0646 31000 4.3274
3.9575 0.0667 32000 4.3180
4.0015 0.0688 33000 4.3040
3.8917 0.0708 34000 4.3096
4.0533 0.0729 35000 4.3057
4.0497 0.075 36000 4.3005
3.9832 0.0771 37000 4.3032
3.9568 0.0792 38000 4.3099
3.8289 0.0813 39000 4.2920
3.8529 0.0833 40000 4.3161
3.8106 0.0854 41000 4.3104
3.969 0.0875 42000 4.3202
3.9867 0.0896 43000 4.3069
3.8601 0.0917 44000 4.3099
3.9873 0.0938 45000 4.2964
3.8743 0.0958 46000 4.2993
3.956 0.0979 47000 4.2866
4.0474 0.1 48000 4.2899
3.9868 0.1021 49000 4.2792
4.1809 0.1042 50000 4.2974
3.7905 0.1062 51000 4.2968
4.1425 0.1083 52000 4.2928
3.9946 0.1104 53000 4.2871
4.0914 0.1125 54000 4.2808
3.9746 0.1146 55000 4.2826
4.1812 0.1167 56000 4.2942
4.156 0.1187 57000 4.2894
3.9697 0.1208 58000 4.3088
3.8969 0.1229 59000 4.3043
3.9393 0.125 60000 4.2872
3.7209 0.1271 61000 4.2879
4.0016 0.1292 62000 4.2829
3.9925 0.1313 63000 4.2996
3.966 0.1333 64000 4.2836
4.162 0.1354 65000 4.2700
4.1639 0.1375 66000 4.2808
4.003 0.1396 67000 4.2764
3.9027 0.1417 68000 4.2868
4.0008 0.1437 69000 4.2744
3.9711 0.1458 70000 4.2691
4.0112 0.1479 71000 4.2842
4.0251 0.15 72000 4.2958
3.9756 0.1521 73000 4.2890
4.0327 0.1542 74000 4.2838
4.0919 0.1562 75000 4.2948
3.9199 0.1583 76000 4.2723
3.9125 0.1604 77000 4.2866
3.7808 0.1625 78000 4.2626
3.9531 0.1646 79000 4.2740
3.8218 0.1667 80000 4.2792
3.9225 0.1688 81000 4.2763
3.9183 0.1708 82000 4.2682
3.9687 0.1729 83000 4.2729
3.8568 0.175 84000 4.2648
4.0333 0.1771 85000 4.2709
4.0159 0.1792 86000 4.2644
3.9802 0.1812 87000 4.2762
3.9888 0.1833 88000 4.2907
3.9399 0.1854 89000 4.2989
3.8818 0.1875 90000 4.2921
3.8417 0.1896 91000 4.2766
3.9246 0.1917 92000 4.3040
4.0405 0.1938 93000 4.2696
4.0513 0.1958 94000 4.2859
3.9277 0.1979 95000 4.2711
3.8974 0.2 96000 4.2692
3.8104 0.2021 97000 4.2821
3.9236 0.2042 98000 4.2718
3.9511 0.2062 99000 4.2568
3.9057 0.2083 100000 4.2715
3.9731 0.2104 101000 4.2596
3.9541 0.2125 102000 4.2579
3.838 0.2146 103000 4.2609
3.8423 0.2167 104000 4.2744
3.7543 0.2188 105000 4.2600
3.897 0.2208 106000 4.2630
4.0565 0.2229 107000 4.2592
3.9413 0.225 108000 4.2702
3.923 0.2271 109000 4.2499
3.8697 0.2292 110000 4.2497
3.899 0.2313 111000 4.2592
3.7349 0.2333 112000 4.2554
3.9045 0.2354 113000 4.2610
3.8632 0.2375 114000 4.2747
3.8984 0.2396 115000 4.2779
3.93 0.2417 116000 4.2575
3.8582 0.2437 117000 4.2616
4.0219 0.2458 118000 4.2670
3.7756 0.2479 119000 4.2543
3.9143 0.25 120000 4.2616
3.6382 0.2521 121000 4.2462
3.8568 0.2542 122000 4.2457
3.9708 0.2562 123000 4.2640
4.0387 0.2583 124000 4.2511
3.8761 0.2604 125000 4.2514
3.9877 0.2625 126000 4.2384
4.1627 0.2646 127000 4.2349
3.9465 0.2667 128000 4.2322
4.022 0.2687 129000 4.2292
3.9426 0.2708 130000 4.2244
3.9976 0.2729 131000 4.2385
3.8491 0.275 132000 4.2288
3.7929 0.2771 133000 4.2520
3.9137 0.2792 134000 4.2459
4.0672 0.2812 135000 4.2212
3.949 0.2833 136000 4.2392
3.8779 0.2854 137000 4.2357
3.6715 0.2875 138000 4.2332
3.8978 0.2896 139000 4.2228
3.8223 0.2917 140000 4.2103
3.7498 0.2938 141000 4.2173
3.9105 0.2958 142000 4.2089
3.6953 0.2979 143000 4.2044
3.9106 0.3 144000 4.2080
3.926 0.3021 145000 4.1949
3.731 0.3042 146000 4.2039
3.8684 0.3063 147000 4.1905
3.8938 0.3083 148000 4.2073
4.0697 0.3104 149000 4.2016
3.9828 0.3125 150000 4.2087
3.8982 0.3146 151000 4.1983
3.8948 0.3167 152000 4.2070
3.7433 0.3187 153000 4.1971
3.9262 0.3208 154000 4.2023
3.8333 0.3229 155000 4.1901
3.7393 0.325 156000 4.1789
3.8245 0.3271 157000 4.1949
3.7302 0.3292 158000 4.2008
3.7852 0.3312 159000 4.2114
4.1717 0.3333 160000 4.2122
3.9231 0.3354 161000 4.2006
3.9928 0.3375 162000 4.2033
3.9021 0.3396 163000 4.1992
3.8111 0.3417 164000 4.1922
3.8097 0.3438 165000 4.1896
3.9761 0.3458 166000 4.1899
3.9246 0.3479 167000 4.1845
3.7704 0.35 168000 4.1973
3.848 0.3521 169000 4.1928
3.9466 0.3542 170000 4.1868
3.9217 0.3563 171000 4.1876
3.8834 0.3583 172000 4.1830
3.9101 0.3604 173000 4.1831
3.863 0.3625 174000 4.1878
3.7479 0.3646 175000 4.1790
3.9295 0.3667 176000 4.1882
3.9213 0.3688 177000 4.1748
3.9537 0.3708 178000 4.1818
3.8242 0.3729 179000 4.1876
3.8014 0.375 180000 4.1797
3.977 0.3771 181000 4.1774
3.8955 0.3792 182000 4.1732
3.6788 0.3812 183000 4.1850
3.7937 0.3833 184000 4.1899
3.7619 0.3854 185000 4.1798
3.6499 0.3875 186000 4.1672
3.88 0.3896 187000 4.1677
3.8804 0.3917 188000 4.1707
3.9147 0.3937 189000 4.1752
3.8926 0.3958 190000 4.1679
3.914 0.3979 191000 4.1687
3.8568 0.4 192000 4.1657
3.9354 0.4021 193000 4.1766
4.0083 0.4042 194000 4.1803
3.8001 0.4062 195000 4.1860
3.8921 0.4083 196000 4.1826
3.8765 0.4104 197000 4.1873
3.8115 0.4125 198000 4.1793
3.7874 0.4146 199000 4.1726
3.8914 0.4167 200000 4.1795
3.7419 0.4188 201000 4.1726
3.8238 0.4208 202000 4.1700
3.8894 0.4229 203000 4.1794
3.8286 0.425 204000 4.1889
3.8354 0.4271 205000 4.1837
3.7126 0.4292 206000 4.1815
3.9142 0.4313 207000 4.1855
3.7889 0.4333 208000 4.1708
3.8422 0.4354 209000 4.1663
3.672 0.4375 210000 4.1822
3.7588 0.4396 211000 4.1781
3.8026 0.4417 212000 4.1646
3.6782 0.4437 213000 4.1546
3.8097 0.4458 214000 4.1580
3.793 0.4479 215000 4.1565
3.747 0.45 216000 4.1563
3.717 0.4521 217000 4.1519
3.7103 0.4542 218000 4.1483
3.7573 0.4562 219000 4.1505
3.7231 0.4583 220000 4.1426
3.7455 0.4604 221000 4.1467
3.9432 0.4625 222000 4.1394
3.7637 0.4646 223000 4.1492
3.7523 0.4667 224000 4.1488
3.8442 0.4688 225000 4.1421
3.8075 0.4708 226000 4.1489
3.8635 0.4729 227000 4.1428
3.7968 0.475 228000 4.1277
4.0897 0.4771 229000 4.1367
3.7897 0.4792 230000 4.1387
3.8465 0.4813 231000 4.1335
3.7556 0.4833 232000 4.1389
3.7213 0.4854 233000 4.1405
3.8532 0.4875 234000 4.1432
3.8223 0.4896 235000 4.1326
3.7056 0.4917 236000 4.1450
3.8021 0.4938 237000 4.1437
3.6611 0.4958 238000 4.1558
3.6681 0.4979 239000 4.1495
3.7164 0.5 240000 4.1465
3.8413 0.5021 241000 4.1446
3.7366 0.5042 242000 4.1417
3.7334 0.5062 243000 4.1509
3.6592 0.5083 244000 4.1501
3.9593 0.5104 245000 4.1344
3.7163 0.5125 246000 4.1437
3.8853 0.5146 247000 4.1278
3.9094 0.5167 248000 4.1356
3.7408 0.5188 249000 4.1312
3.7387 0.5208 250000 4.1353
3.9224 0.5229 251000 4.1273
3.6346 0.525 252000 4.1171
3.7065 0.5271 253000 4.1247
3.7919 0.5292 254000 4.1224
3.6299 0.5312 255000 4.1313
3.808 0.5333 256000 4.1354
3.7369 0.5354 257000 4.1361
3.7384 0.5375 258000 4.1302
3.6545 0.5396 259000 4.1199
3.5952 0.5417 260000 4.1144
3.9045 0.5437 261000 4.1138
3.7152 0.5458 262000 4.1106
3.6045 0.5479 263000 4.1139
3.6828 0.55 264000 4.1062
3.6521 0.5521 265000 4.1086
3.7868 0.5542 266000 4.1060
3.6959 0.5563 267000 4.1037
3.7066 0.5583 268000 4.1052
3.4761 0.5604 269000 4.1036
3.8575 0.5625 270000 4.1076
3.8409 0.5646 271000 4.1045
3.6896 0.5667 272000 4.0957
3.6256 0.5687 273000 4.0979
3.7911 0.5708 274000 4.1005
3.7844 0.5729 275000 4.1017
3.7466 0.575 276000 4.0897
3.7865 0.5771 277000 4.0869
3.8352 0.5792 278000 4.0947
3.7003 0.5813 279000 4.0921
3.8638 0.5833 280000 4.0935
3.7721 0.5854 281000 4.0952
3.5453 0.5875 282000 4.0965
3.6949 0.5896 283000 4.1038
3.8031 0.5917 284000 4.0956
3.8408 0.5938 285000 4.0953
3.9591 0.5958 286000 4.0891
3.711 0.5979 287000 4.0869
3.7253 0.6 288000 4.0866
3.9207 0.6021 289000 4.0826
3.9178 0.6042 290000 4.0798
3.8347 0.6062 291000 4.0909
3.7539 0.6083 292000 4.0801
3.6809 0.6104 293000 4.0882
3.9774 0.6125 294000 4.0892
3.99 0.6146 295000 4.0904
3.8178 0.6167 296000 4.0853
3.8028 0.6188 297000 4.0819
3.7628 0.6208 298000 4.0846
3.8974 0.6229 299000 4.0858
3.8916 0.625 300000 4.0794
3.6816 0.6271 301000 4.0889
3.6983 0.6292 302000 4.0925
3.8691 0.6312 303000 4.0833
3.8447 0.6333 304000 4.0812
3.8257 0.6354 305000 4.0760
3.7056 0.6375 306000 4.0866
3.663 0.6396 307000 4.0830
3.8589 0.6417 308000 4.0789
3.6582 0.6438 309000 4.0806
3.7783 0.6458 310000 4.0789
3.5443 0.6479 311000 4.0866
3.8477 0.65 312000 4.0782
3.9814 0.6521 313000 4.0796
3.6754 0.6542 314000 4.0780
3.6139 0.6562 315000 4.0765
3.862 0.6583 316000 4.0782
3.7231 0.6604 317000 4.0777
3.9263 0.6625 318000 4.0740
3.6825 0.6646 319000 4.0617
3.8881 0.6667 320000 4.0692
3.8735 0.6687 321000 4.0639
4.0071 0.6708 322000 4.0702
3.9353 0.6729 323000 4.0590
3.8194 0.675 324000 4.0661
3.8207 0.6771 325000 4.0681
3.7925 0.6792 326000 4.0666
3.6689 0.6813 327000 4.0579
3.571 0.6833 328000 4.0545
3.7981 0.6854 329000 4.0564
3.5618 0.6875 330000 4.0552
3.6315 0.6896 331000 4.0509
3.6225 0.6917 332000 4.0490
3.391 0.6937 333000 4.0562
3.3154 0.6958 334000 4.0567
3.4765 0.6979 335000 4.0542
3.6147 0.7 336000 4.0432
3.3405 0.7021 337000 4.0461
3.4943 0.7042 338000 4.0509
3.4187 0.7063 339000 4.0499
3.3465 0.7083 340000 4.0498
3.4459 0.7104 341000 4.0488
3.2883 0.7125 342000 4.0545
3.4472 0.7146 343000 4.0549
3.3612 0.7167 344000 4.0524
3.5244 0.7188 345000 4.0501
3.265 0.7208 346000 4.0496
3.3411 0.7229 347000 4.0579
3.3817 0.725 348000 4.0592
3.2819 0.7271 349000 4.0600
3.427 0.7292 350000 4.0575
3.3368 0.7312 351000 4.0601
3.3201 0.7333 352000 4.0515
3.2364 0.7354 353000 4.0528
3.3919 0.7375 354000 4.0591
3.4735 0.7396 355000 4.0526
3.4194 0.7417 356000 4.0530
3.4013 0.7438 357000 4.0557
3.4342 0.7458 358000 4.0545
3.3518 0.7479 359000 4.0585
3.3723 0.75 360000 4.0546
3.2925 0.7521 361000 4.0522
3.2357 0.7542 362000 4.0526
3.3607 0.7562 363000 4.0535
3.3699 0.7583 364000 4.0549
3.3595 0.7604 365000 4.0536
3.247 0.7625 366000 4.0564
3.383 0.7646 367000 4.0516
3.2552 0.7667 368000 4.0509
3.2187 0.7688 369000 4.0511
3.3001 0.7708 370000 4.0524
3.227 0.7729 371000 4.0477
3.2463 0.775 372000 4.0482
3.2435 0.7771 373000 4.0481
3.3318 0.7792 374000 4.0482
3.4435 0.7812 375000 4.0465
3.2585 0.7833 376000 4.0413
3.3729 0.7854 377000 4.0385
3.2543 0.7875 378000 4.0430
3.3345 0.7896 379000 4.0438
3.3178 0.7917 380000 4.0443
3.2855 0.7937 381000 4.0426
3.3473 0.7958 382000 4.0372
3.2588 0.7979 383000 4.0413
3.3472 0.8 384000 4.0390
3.2474 0.8021 385000 4.0397
3.2547 0.8042 386000 4.0406
3.3257 0.8063 387000 4.0389
3.3034 0.8083 388000 4.0395
3.2571 0.8104 389000 4.0396
3.3638 0.8125 390000 4.0415
3.3034 0.8146 391000 4.0406
3.2749 0.8167 392000 4.0414
3.362 0.8187 393000 4.0383
3.329 0.8208 394000 4.0358
3.3517 0.8229 395000 4.0359
3.3818 0.825 396000 4.0355
3.4573 0.8271 397000 4.0342
3.4271 0.8292 398000 4.0367
3.3829 0.8313 399000 4.0316
3.3357 0.8333 400000 4.0337
3.2081 0.8354 401000 4.0346
3.3231 0.8375 402000 4.0315
3.3298 0.8396 403000 4.0338
3.3668 0.8417 404000 4.0367
3.4354 0.8438 405000 4.0350
3.2523 0.8458 406000 4.0339
3.3122 0.8479 407000 4.0335
3.394 0.85 408000 4.0304
3.3952 0.8521 409000 4.0309
3.2627 0.8542 410000 4.0303
3.3738 0.8562 411000 4.0293
3.2511 0.8583 412000 4.0313
3.2491 0.8604 413000 4.0282
3.3731 0.8625 414000 4.0288
3.2801 0.8646 415000 4.0294
3.4429 0.8667 416000 4.0298
3.4543 0.8688 417000 4.0262
3.3252 0.8708 418000 4.0266
3.3625 0.8729 419000 4.0246
3.2972 0.875 420000 4.0243
3.3973 0.8771 421000 4.0265
3.2867 0.8792 422000 4.0273
3.2743 0.8812 423000 4.0242
3.421 0.8833 424000 4.0249
3.253 0.8854 425000 4.0254
3.2381 0.8875 426000 4.0244
3.3784 0.8896 427000 4.0264
3.3259 0.8917 428000 4.0260
3.3907 0.8938 429000 4.0245
3.3264 0.8958 430000 4.0253
3.3454 0.8979 431000 4.0236
3.3574 0.9 432000 4.0245
3.2424 0.9021 433000 4.0241
3.2631 0.9042 434000 4.0220
3.2213 0.9062 435000 4.0215
3.4233 0.9083 436000 4.0215
3.2751 0.9104 437000 4.0216
3.2734 0.9125 438000 4.0238
3.2237 0.9146 439000 4.0246
3.2542 0.9167 440000 4.0227
3.3647 0.9187 441000 4.0240
3.3046 0.9208 442000 4.0235
3.3348 0.9229 443000 4.0245
3.3078 0.925 444000 4.0226
3.2955 0.9271 445000 4.0226
3.3743 0.9292 446000 4.0217
3.2428 0.9313 447000 4.0229
3.2628 0.9333 448000 4.0239
3.2181 0.9354 449000 4.0235
3.3325 0.9375 450000 4.0230
3.3257 0.9396 451000 4.0227
3.3099 0.9417 452000 4.0224
3.2589 0.9437 453000 4.0222
3.2739 0.9458 454000 4.0216
3.3549 0.9479 455000 4.0222
3.3398 0.95 456000 4.0222
3.3307 0.9521 457000 4.0221
3.3722 0.9542 458000 4.0211
3.3067 0.9563 459000 4.0213
3.3333 0.9583 460000 4.0219
3.3305 0.9604 461000 4.0218
3.3302 0.9625 462000 4.0221
3.3891 0.9646 463000 4.0223
3.4306 0.9667 464000 4.0220
3.3226 0.9688 465000 4.0229
3.326 0.9708 466000 4.0216
3.3749 0.9729 467000 4.0214
3.3369 0.975 468000 4.0216
3.3384 0.9771 469000 4.0215
3.3811 0.9792 470000 4.0214
3.2602 0.9812 471000 4.0217
3.4191 0.9833 472000 4.0221
3.3566 0.9854 473000 4.0221
3.1665 0.9875 474000 4.0221
3.2972 0.9896 475000 4.0218
3.3541 0.9917 476000 4.0218
3.4398 0.9938 477000 4.0217
3.3558 0.9958 478000 4.0217
3.2996 0.9979 479000 4.0221
3.3787 1.0 480000 4.0213
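
Each row above has four whitespace-separated values: training loss, epoch, step, and validation loss. The sketch below shows one way to parse such rows and pick the checkpoint with the lowest validation loss. It uses three rows copied from the table as sample input; `LogRow` and `parse_row` are illustrative names, not part of any library.

```python
from typing import NamedTuple


class LogRow(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float


def parse_row(line: str) -> LogRow:
    """Parse one 'train_loss epoch step val_loss' row from the table."""
    train_loss, epoch, step, val_loss = line.split()
    return LogRow(float(train_loss), float(epoch), int(step), float(val_loss))


# Three rows copied from the table above, used as sample input.
sample = [
    "3.9406 0.0021 1000 4.2421",
    "3.3722 0.9542 458000 4.0211",
    "3.3787 1.0 480000 4.0213",
]

rows = [parse_row(line) for line in sample]
best = min(rows, key=lambda r: r.val_loss)
print(best.step, best.val_loss)  # among these three sample rows: 458000 4.0211
```

Feeding all rows of the table through `parse_row` in the same way would give the full loss curve for plotting or checkpoint selection.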

Framework versions

  • Transformers 4.50.3
  • Pytorch 2.6.0+cu124
  • Datasets 2.21.0
  • Tokenizers 0.21.4