Llama-3.3-70B-Instruct-v2-3d-2M-200K-0.1-reverse-padzero-99-512D-1L-4H-2048I

This model is a fine-tuned version of meta-llama/Llama-3.3-70B-Instruct on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1852
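
For reference, the evaluation loss can be converted to perplexity, assuming it is a mean token-level cross-entropy in nats (the usual convention for causal language-model training):

```python
import math

# Evaluation loss reported above (mean cross-entropy, in nats).
eval_loss = 1.1852

# Perplexity is the exponential of the mean cross-entropy loss.
perplexity = math.exp(eval_loss)
print(f"perplexity = {perplexity:.4f}")
```

This works out to a perplexity of roughly 3.27 on the evaluation set.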

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 128
  • eval_batch_size: 128
  • seed: 42
  • optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 5
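
The cosine scheduler in transformers warms the learning rate up linearly to the peak value, then decays it along a half-cosine to zero. A minimal sketch of the resulting curve under the hyperparameters above (the total of 78,125 steps is derived from the results table: 5 epochs at 15,625 steps per epoch):

```python
import math

def cosine_lr(step, total_steps, peak_lr=0.001, warmup_ratio=0.05):
    """Linear warmup to peak_lr, then half-cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    # Progress through the decay phase, in [0, 1].
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

total = 78_125  # 5 epochs x 15,625 steps/epoch
print(cosine_lr(0, total))      # start of warmup
print(cosine_lr(3_906, total))  # end of warmup (5% of total steps)
print(cosine_lr(total, total))  # end of training
```

The rate reaches the peak of 0.001 at step 3,906 (5% of training) and returns to zero at the final step.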

Training results

Training Loss Epoch Step Validation Loss
No log 0 0 3.1906
1.7523 0.032 500 1.7232
1.4772 0.064 1000 1.4734
1.4086 0.096 1500 1.4104
1.3694 0.128 2000 1.3682
1.3449 0.16 2500 1.3490
1.3431 0.192 3000 1.3346
1.3246 0.224 3500 1.3294
1.3286 0.256 4000 1.3275
1.3148 0.288 4500 1.3159
1.3144 0.32 5000 1.3105
1.3095 0.352 5500 1.3107
1.3052 0.384 6000 1.3044
1.2998 0.416 6500 1.3033
1.3028 0.448 7000 1.3006
1.3005 0.48 7500 1.2995
1.2977 0.512 8000 1.2968
1.3005 0.544 8500 1.2979
1.2929 0.576 9000 1.2924
1.2927 0.608 9500 1.2920
1.2936 0.64 10000 1.2934
1.2931 0.672 10500 1.2907
1.2922 0.704 11000 1.2902
1.2901 0.736 11500 1.2897
1.2913 0.768 12000 1.2937
1.2826 0.8 12500 1.2856
1.2841 0.832 13000 1.2885
1.2802 0.864 13500 1.2838
1.2823 0.896 14000 1.2847
1.2818 0.928 14500 1.2807
1.2803 0.96 15000 1.2818
1.2806 0.992 15500 1.2805
1.2791 1.024 16000 1.2764
1.2746 1.056 16500 1.2753
1.2773 1.088 17000 1.2795
1.2716 1.12 17500 1.2728
1.2669 1.152 18000 1.2670
1.2642 1.184 18500 1.2624
1.2632 1.216 19000 1.2640
1.2589 1.248 19500 1.2586
1.2588 1.28 20000 1.2581
1.2545 1.312 20500 1.2522
1.254 1.344 21000 1.2678
1.251 1.376 21500 1.2519
1.2528 1.408 22000 1.2523
1.2486 1.44 22500 1.2481
1.2493 1.472 23000 1.2473
1.2495 1.504 23500 1.2461
1.2444 1.536 24000 1.2457
1.245 1.568 24500 1.2454
1.2453 1.6 25000 1.2466
1.2397 1.632 25500 1.2405
1.242 1.664 26000 1.2395
1.2411 1.696 26500 1.2411
1.2202 1.728 27000 1.2193
1.2165 1.76 27500 1.2155
1.2192 1.792 28000 1.2106
1.215 1.824 28500 1.2132
1.2112 1.856 29000 1.2091
1.2083 1.888 29500 1.2151
1.208 1.92 30000 1.2087
1.2051 1.952 30500 1.2072
1.2064 1.984 31000 1.2027
1.2052 2.016 31500 1.2047
1.2051 2.048 32000 1.2056
1.2028 2.08 32500 1.2037
1.2041 2.112 33000 1.2090
1.2035 2.144 33500 1.2031
1.2033 2.176 34000 1.2004
1.2034 2.208 34500 1.2025
1.2025 2.24 35000 1.2017
1.2013 2.272 35500 1.1998
1.1977 2.304 36000 1.1981
1.2001 2.336 36500 1.1975
1.1978 2.368 37000 1.1982
1.1985 2.4 37500 1.1970
1.1973 2.432 38000 1.1986
1.198 2.464 38500 1.1964
1.1958 2.496 39000 1.1962
1.1973 2.528 39500 1.1953
1.196 2.56 40000 1.1958
1.1943 2.592 40500 1.1929
1.196 2.624 41000 1.1954
1.1942 2.656 41500 1.1935
1.1941 2.688 42000 1.1935
1.1945 2.72 42500 1.1935
1.1938 2.752 43000 1.1945
1.1932 2.784 43500 1.1919
1.1918 2.816 44000 1.1920
1.192 2.848 44500 1.1909
1.1937 2.88 45000 1.1913
1.1908 2.912 45500 1.1919
1.19 2.944 46000 1.1904
1.1916 2.976 46500 1.1903
1.1882 3.008 47000 1.1908
1.1869 3.04 47500 1.1910
1.1889 3.072 48000 1.1898
1.1888 3.104 48500 1.1896
1.19 3.136 49000 1.1891
1.1879 3.168 49500 1.1888
1.1915 3.2 50000 1.1887
1.1889 3.232 50500 1.1907
1.1898 3.264 51000 1.1891
1.1847 3.296 51500 1.1879
1.1891 3.328 52000 1.1881
1.1864 3.36 52500 1.1879
1.1863 3.392 53000 1.1875
1.1876 3.424 53500 1.1887
1.1881 3.456 54000 1.1875
1.1865 3.488 54500 1.1870
1.1846 3.52 55000 1.1869
1.185 3.552 55500 1.1872
1.1834 3.584 56000 1.1866
1.1864 3.616 56500 1.1865
1.1861 3.648 57000 1.1865
1.1854 3.68 57500 1.1862
1.1834 3.712 58000 1.1860
1.1879 3.744 58500 1.1861
1.1873 3.776 59000 1.1861
1.1849 3.808 59500 1.1859
1.1875 3.84 60000 1.1858
1.1859 3.872 60500 1.1858
1.1847 3.904 61000 1.1859
1.1844 3.936 61500 1.1857
1.186 3.968 62000 1.1856
1.1855 4.0 62500 1.1856
1.1853 4.032 63000 1.1855
1.1863 4.064 63500 1.1855
1.1834 4.096 64000 1.1854
1.1868 4.128 64500 1.1854
1.1834 4.16 65000 1.1853
1.1848 4.192 65500 1.1853
1.1836 4.224 66000 1.1853
1.1843 4.256 66500 1.1853
1.1869 4.288 67000 1.1853
1.1855 4.32 67500 1.1853
1.1856 4.352 68000 1.1852
1.1851 4.384 68500 1.1852
1.1845 4.416 69000 1.1852
1.1862 4.448 69500 1.1852
1.1823 4.48 70000 1.1852
1.1833 4.512 70500 1.1852
1.1825 4.544 71000 1.1852
1.1845 4.576 71500 1.1852
1.1834 4.608 72000 1.1852
1.185 4.64 72500 1.1852
1.1834 4.672 73000 1.1852
1.1855 4.704 73500 1.1852
1.187 4.736 74000 1.1852
1.1841 4.768 74500 1.1851
1.183 4.8 75000 1.1852
1.1841 4.832 75500 1.1852
1.1875 4.864 76000 1.1852
1.1843 4.896 76500 1.1852
1.1846 4.928 77000 1.1852
1.1844 4.96 77500 1.1852
1.1854 4.992 78000 1.1852
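
The epoch and step columns above imply the size of the training set. A quick sanity check, assuming train_batch_size = 128 is the effective per-step batch size (i.e. no gradient accumulation):

```python
# From the table: 500 optimizer steps advance the epoch counter by 0.032,
# so one epoch is 500 / 0.032 = 15,625 steps.
steps_per_epoch = round(500 / 0.032)

# With 128 examples per optimizer step, one epoch covers about
# 2,000,000 examples, consistent with the "2M" in the model name.
examples_per_epoch = steps_per_epoch * 128

# Five epochs therefore span 78,125 steps; the table's last logged
# step, 78,000, falls just short of the final step.
total_steps = steps_per_epoch * 5

print(steps_per_epoch, examples_per_epoch, total_steps)
```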

Framework versions

  • Transformers 4.57.1
  • PyTorch 2.9.0+cu128
  • Datasets 4.5.0
  • Tokenizers 0.22.1
Model details

  • Model size: 4.22M params (Safetensors)
  • Tensor type: BF16
