# Llama-3.3-70B-Instruct-v2-3d-2M-200K-0.1-reverse-padzero-99-512D-1L-4H-2048I
This model is a fine-tuned version of meta-llama/Llama-3.3-70B-Instruct on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.1852
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 128
- eval_batch_size: 128
- seed: 42
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 5
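The learning-rate schedule above (cosine decay with a 5% linear warmup) can be sketched in plain Python. This is an illustrative approximation of the schedule, not the exact `transformers` implementation; the step counts are inferred from the results table (epoch 4.0 falls on step 62,500, so 5 epochs is 78,125 steps).

```python
import math

def lr_at(step, total_steps, base_lr=1e-3, warmup_ratio=0.05):
    """Linear warmup for the first warmup_ratio of steps, then cosine decay to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

total = 78125  # 5 epochs * 15625 steps/epoch, inferred from the table below
print(lr_at(0, total))      # 0.0 at the start of warmup
print(lr_at(3906, total))   # peak lr 1e-3 at the end of warmup (5% of total)
print(lr_at(total, total))  # decays back to ~0.0 at the end of training
```

Mid-training values fall smoothly between the peak and zero, which matches the gradual flattening of the validation loss in the table.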
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| No log | 0 | 0 | 3.1906 |
| 1.7523 | 0.032 | 500 | 1.7232 |
| 1.4772 | 0.064 | 1000 | 1.4734 |
| 1.4086 | 0.096 | 1500 | 1.4104 |
| 1.3694 | 0.128 | 2000 | 1.3682 |
| 1.3449 | 0.16 | 2500 | 1.3490 |
| 1.3431 | 0.192 | 3000 | 1.3346 |
| 1.3246 | 0.224 | 3500 | 1.3294 |
| 1.3286 | 0.256 | 4000 | 1.3275 |
| 1.3148 | 0.288 | 4500 | 1.3159 |
| 1.3144 | 0.32 | 5000 | 1.3105 |
| 1.3095 | 0.352 | 5500 | 1.3107 |
| 1.3052 | 0.384 | 6000 | 1.3044 |
| 1.2998 | 0.416 | 6500 | 1.3033 |
| 1.3028 | 0.448 | 7000 | 1.3006 |
| 1.3005 | 0.48 | 7500 | 1.2995 |
| 1.2977 | 0.512 | 8000 | 1.2968 |
| 1.3005 | 0.544 | 8500 | 1.2979 |
| 1.2929 | 0.576 | 9000 | 1.2924 |
| 1.2927 | 0.608 | 9500 | 1.2920 |
| 1.2936 | 0.64 | 10000 | 1.2934 |
| 1.2931 | 0.672 | 10500 | 1.2907 |
| 1.2922 | 0.704 | 11000 | 1.2902 |
| 1.2901 | 0.736 | 11500 | 1.2897 |
| 1.2913 | 0.768 | 12000 | 1.2937 |
| 1.2826 | 0.8 | 12500 | 1.2856 |
| 1.2841 | 0.832 | 13000 | 1.2885 |
| 1.2802 | 0.864 | 13500 | 1.2838 |
| 1.2823 | 0.896 | 14000 | 1.2847 |
| 1.2818 | 0.928 | 14500 | 1.2807 |
| 1.2803 | 0.96 | 15000 | 1.2818 |
| 1.2806 | 0.992 | 15500 | 1.2805 |
| 1.2791 | 1.024 | 16000 | 1.2764 |
| 1.2746 | 1.056 | 16500 | 1.2753 |
| 1.2773 | 1.088 | 17000 | 1.2795 |
| 1.2716 | 1.12 | 17500 | 1.2728 |
| 1.2669 | 1.152 | 18000 | 1.2670 |
| 1.2642 | 1.184 | 18500 | 1.2624 |
| 1.2632 | 1.216 | 19000 | 1.2640 |
| 1.2589 | 1.248 | 19500 | 1.2586 |
| 1.2588 | 1.28 | 20000 | 1.2581 |
| 1.2545 | 1.312 | 20500 | 1.2522 |
| 1.254 | 1.344 | 21000 | 1.2678 |
| 1.251 | 1.376 | 21500 | 1.2519 |
| 1.2528 | 1.408 | 22000 | 1.2523 |
| 1.2486 | 1.44 | 22500 | 1.2481 |
| 1.2493 | 1.472 | 23000 | 1.2473 |
| 1.2495 | 1.504 | 23500 | 1.2461 |
| 1.2444 | 1.536 | 24000 | 1.2457 |
| 1.245 | 1.568 | 24500 | 1.2454 |
| 1.2453 | 1.6 | 25000 | 1.2466 |
| 1.2397 | 1.632 | 25500 | 1.2405 |
| 1.242 | 1.664 | 26000 | 1.2395 |
| 1.2411 | 1.696 | 26500 | 1.2411 |
| 1.2202 | 1.728 | 27000 | 1.2193 |
| 1.2165 | 1.76 | 27500 | 1.2155 |
| 1.2192 | 1.792 | 28000 | 1.2106 |
| 1.215 | 1.824 | 28500 | 1.2132 |
| 1.2112 | 1.856 | 29000 | 1.2091 |
| 1.2083 | 1.888 | 29500 | 1.2151 |
| 1.208 | 1.92 | 30000 | 1.2087 |
| 1.2051 | 1.952 | 30500 | 1.2072 |
| 1.2064 | 1.984 | 31000 | 1.2027 |
| 1.2052 | 2.016 | 31500 | 1.2047 |
| 1.2051 | 2.048 | 32000 | 1.2056 |
| 1.2028 | 2.08 | 32500 | 1.2037 |
| 1.2041 | 2.112 | 33000 | 1.2090 |
| 1.2035 | 2.144 | 33500 | 1.2031 |
| 1.2033 | 2.176 | 34000 | 1.2004 |
| 1.2034 | 2.208 | 34500 | 1.2025 |
| 1.2025 | 2.24 | 35000 | 1.2017 |
| 1.2013 | 2.272 | 35500 | 1.1998 |
| 1.1977 | 2.304 | 36000 | 1.1981 |
| 1.2001 | 2.336 | 36500 | 1.1975 |
| 1.1978 | 2.368 | 37000 | 1.1982 |
| 1.1985 | 2.4 | 37500 | 1.1970 |
| 1.1973 | 2.432 | 38000 | 1.1986 |
| 1.198 | 2.464 | 38500 | 1.1964 |
| 1.1958 | 2.496 | 39000 | 1.1962 |
| 1.1973 | 2.528 | 39500 | 1.1953 |
| 1.196 | 2.56 | 40000 | 1.1958 |
| 1.1943 | 2.592 | 40500 | 1.1929 |
| 1.196 | 2.624 | 41000 | 1.1954 |
| 1.1942 | 2.656 | 41500 | 1.1935 |
| 1.1941 | 2.688 | 42000 | 1.1935 |
| 1.1945 | 2.72 | 42500 | 1.1935 |
| 1.1938 | 2.752 | 43000 | 1.1945 |
| 1.1932 | 2.784 | 43500 | 1.1919 |
| 1.1918 | 2.816 | 44000 | 1.1920 |
| 1.192 | 2.848 | 44500 | 1.1909 |
| 1.1937 | 2.88 | 45000 | 1.1913 |
| 1.1908 | 2.912 | 45500 | 1.1919 |
| 1.19 | 2.944 | 46000 | 1.1904 |
| 1.1916 | 2.976 | 46500 | 1.1903 |
| 1.1882 | 3.008 | 47000 | 1.1908 |
| 1.1869 | 3.04 | 47500 | 1.1910 |
| 1.1889 | 3.072 | 48000 | 1.1898 |
| 1.1888 | 3.104 | 48500 | 1.1896 |
| 1.19 | 3.136 | 49000 | 1.1891 |
| 1.1879 | 3.168 | 49500 | 1.1888 |
| 1.1915 | 3.2 | 50000 | 1.1887 |
| 1.1889 | 3.232 | 50500 | 1.1907 |
| 1.1898 | 3.264 | 51000 | 1.1891 |
| 1.1847 | 3.296 | 51500 | 1.1879 |
| 1.1891 | 3.328 | 52000 | 1.1881 |
| 1.1864 | 3.36 | 52500 | 1.1879 |
| 1.1863 | 3.392 | 53000 | 1.1875 |
| 1.1876 | 3.424 | 53500 | 1.1887 |
| 1.1881 | 3.456 | 54000 | 1.1875 |
| 1.1865 | 3.488 | 54500 | 1.1870 |
| 1.1846 | 3.52 | 55000 | 1.1869 |
| 1.185 | 3.552 | 55500 | 1.1872 |
| 1.1834 | 3.584 | 56000 | 1.1866 |
| 1.1864 | 3.616 | 56500 | 1.1865 |
| 1.1861 | 3.648 | 57000 | 1.1865 |
| 1.1854 | 3.68 | 57500 | 1.1862 |
| 1.1834 | 3.712 | 58000 | 1.1860 |
| 1.1879 | 3.744 | 58500 | 1.1861 |
| 1.1873 | 3.776 | 59000 | 1.1861 |
| 1.1849 | 3.808 | 59500 | 1.1859 |
| 1.1875 | 3.84 | 60000 | 1.1858 |
| 1.1859 | 3.872 | 60500 | 1.1858 |
| 1.1847 | 3.904 | 61000 | 1.1859 |
| 1.1844 | 3.936 | 61500 | 1.1857 |
| 1.186 | 3.968 | 62000 | 1.1856 |
| 1.1855 | 4.0 | 62500 | 1.1856 |
| 1.1853 | 4.032 | 63000 | 1.1855 |
| 1.1863 | 4.064 | 63500 | 1.1855 |
| 1.1834 | 4.096 | 64000 | 1.1854 |
| 1.1868 | 4.128 | 64500 | 1.1854 |
| 1.1834 | 4.16 | 65000 | 1.1853 |
| 1.1848 | 4.192 | 65500 | 1.1853 |
| 1.1836 | 4.224 | 66000 | 1.1853 |
| 1.1843 | 4.256 | 66500 | 1.1853 |
| 1.1869 | 4.288 | 67000 | 1.1853 |
| 1.1855 | 4.32 | 67500 | 1.1853 |
| 1.1856 | 4.352 | 68000 | 1.1852 |
| 1.1851 | 4.384 | 68500 | 1.1852 |
| 1.1845 | 4.416 | 69000 | 1.1852 |
| 1.1862 | 4.448 | 69500 | 1.1852 |
| 1.1823 | 4.48 | 70000 | 1.1852 |
| 1.1833 | 4.512 | 70500 | 1.1852 |
| 1.1825 | 4.544 | 71000 | 1.1852 |
| 1.1845 | 4.576 | 71500 | 1.1852 |
| 1.1834 | 4.608 | 72000 | 1.1852 |
| 1.185 | 4.64 | 72500 | 1.1852 |
| 1.1834 | 4.672 | 73000 | 1.1852 |
| 1.1855 | 4.704 | 73500 | 1.1852 |
| 1.187 | 4.736 | 74000 | 1.1852 |
| 1.1841 | 4.768 | 74500 | 1.1851 |
| 1.183 | 4.8 | 75000 | 1.1852 |
| 1.1841 | 4.832 | 75500 | 1.1852 |
| 1.1875 | 4.864 | 76000 | 1.1852 |
| 1.1843 | 4.896 | 76500 | 1.1852 |
| 1.1846 | 4.928 | 77000 | 1.1852 |
| 1.1844 | 4.96 | 77500 | 1.1852 |
| 1.1854 | 4.992 | 78000 | 1.1852 |
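As a quick consistency check on the table: epoch 4.0 lands on step 62,500, i.e. 15,625 steps per epoch, and at a train batch size of 128 that works out to 2,000,000 training examples per epoch, matching the "2M" in the model name.

```python
# Values taken from this card: the table row "| 1.1855 | 4.0 | 62500 | 1.1856 |"
# and train_batch_size from the hyperparameters above.
steps_at_epoch_4 = 62500
train_batch_size = 128

steps_per_epoch = steps_at_epoch_4 // 4
examples_per_epoch = steps_per_epoch * train_batch_size

print(steps_per_epoch)     # 15625
print(examples_per_epoch)  # 2000000 -- consistent with the "2M" in the model name
```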
### Framework versions
- Transformers 4.57.1
- PyTorch 2.9.0+cu128
- Datasets 4.5.0
- Tokenizers 0.22.1