Device: cuda
Loading tokenizer: /tmp/eval/multilingual_32k.model
Loading base model: /tmp/eval/best_model.pt
Model loaded: 3.04B parameters
Loading SFT data from: /tmp/sft_data_v2
Train: 3949348 tokens, Val: 201020 tokens
Using 8-bit AdamW (bitsandbytes)
Starting SFT training for 4000 steps...
Batch size: 1 x 4 accum = 4 effective, Seq len: 2048, LR: 2e-05
Step 10/4000 | Loss: 2.3791 | LR: 0.000001 | TPS: 1196 | 68s
Step 20/4000 | Loss: 2.5346 | LR: 0.000002 | TPS: 1418 | 116s
Step 30/4000 | Loss: 2.7910 | LR: 0.000003 | TPS: 1511 | 163s
Step 40/4000 | Loss: 2.5189 | LR: 0.000004 | TPS: 1562 | 210s
Step 50/4000 | Loss: 2.5049 | LR: 0.000005 | TPS: 1594 | 257s
Step 60/4000 | Loss: 2.5417 | LR: 0.000006 | TPS: 1616 | 304s
Step 70/4000 | Loss: 2.2374 | LR: 0.000007 | TPS: 1633 | 351s
Step 80/4000 | Loss: 2.5328 | LR: 0.000008 | TPS: 1645 | 398s
Step 90/4000 | Loss: 2.5359 | LR: 0.000009 | TPS: 1655 | 445s
Step 100/4000 | Loss: 2.4830 | LR: 0.000010 | TPS: 1663 | 493s
Step 110/4000 | Loss: 2.3015 | LR: 0.000011 | TPS: 1669 | 540s
Step 120/4000 | Loss: 2.4667 | LR: 0.000012 | TPS: 1675 | 587s
Step 130/4000 | Loss: 2.3792 | LR: 0.000013 | TPS: 1680 | 634s
Step 140/4000 | Loss: 2.3918 | LR: 0.000014 | TPS: 1684 | 681s
Step 150/4000 | Loss: 2.3368 | LR: 0.000015 | TPS: 1687 | 728s
Step 160/4000 | Loss: 2.4838 | LR: 0.000016 | TPS: 1690 | 775s
Step 170/4000 | Loss: 2.3578 | LR: 0.000017 | TPS: 1693 | 823s
Step 180/4000 | Loss: 2.5485 | LR: 0.000018 | TPS: 1695 | 870s
Step 190/4000 | Loss: 2.0834 | LR: 0.000019 | TPS: 1698 | 917s
Step 200/4000 | Loss: 1.9784 | LR: 0.000020 | TPS: 1699 | 964s
Step 210/4000 | Loss: 2.4826 | LR: 0.000020 | TPS: 1701 | 1011s
Step 220/4000 | Loss: 2.3540 | LR: 0.000020 | TPS: 1703 | 1058s
Step 230/4000 | Loss: 2.2093 | LR: 0.000020 | TPS: 1704 | 1105s
Step 240/4000 | Loss: 2.2137 | LR: 0.000020 | TPS: 1706 | 1153s
Step 250/4000 | Loss: 2.2151 | LR: 0.000020 | TPS: 1707 | 1200s
Step 260/4000 | Loss: 2.2535 | LR: 0.000020 | TPS: 1708 | 1247s
Step 270/4000 | Loss: 2.2235 | LR: 0.000020 | TPS: 1709 | 1294s
Step 280/4000 | Loss: 2.0449 | LR: 0.000020 | TPS: 1710 | 1341s
Step 290/4000 | Loss: 2.1502 | LR: 0.000020 | TPS: 1711 | 1388s
Step 300/4000 | Loss: 2.3716 | LR: 0.000020 | TPS: 1712 | 1435s
Step 310/4000 | Loss: 2.1591 | LR: 0.000020 | TPS: 1713 | 1483s
Step 320/4000 | Loss: 2.2153 | LR: 0.000020 | TPS: 1714 | 1530s
Step 330/4000 | Loss: 2.2023 | LR: 0.000020 | TPS: 1714 | 1577s
Step 340/4000 | Loss: 2.3968 | LR: 0.000020 | TPS: 1715 | 1624s
Step 350/4000 | Loss: 2.1146 | LR: 0.000020 | TPS: 1716 | 1671s
Step 360/4000 | Loss: 2.1857 | LR: 0.000020 | TPS: 1716 | 1718s
Step 370/4000 | Loss: 2.1965 | LR: 0.000020 | TPS: 1717 | 1765s
Step 380/4000 | Loss: 2.1613 | LR: 0.000020 | TPS: 1717 | 1813s
Step 390/4000 | Loss: 2.3080 | LR: 0.000020 | TPS: 1718 | 1860s
Step 400/4000 | Loss: 2.2964 | LR: 0.000020 | TPS: 1718 | 1907s
📊 Val loss: 2.2256 (NEW BEST!)
💾 Best model saved to /tmp/sft/sft_model_v2.pt
Step 410/4000 | Loss: 2.2859 | LR: 0.000020 | TPS: 1703 | 1973s
Step 420/4000 | Loss: 2.1711 | LR: 0.000020 | TPS: 1703 | 2020s
Step 430/4000 | Loss: 2.1434 | LR: 0.000020 | TPS: 1704 | 2067s
Step 440/4000 | Loss: 2.2115 | LR: 0.000020 | TPS: 1705 | 2114s
Step 450/4000 | Loss: 2.2985 | LR: 0.000020 | TPS: 1706 | 2161s
Step 460/4000 | Loss: 1.9845 | LR: 0.000020 | TPS: 1707 | 2208s
Step 470/4000 | Loss: 2.3135 | LR: 0.000020 | TPS: 1707 | 2255s
Step 480/4000 | Loss: 2.3004 | LR: 0.000020 | TPS: 1708 | 2302s
Step 490/4000 | Loss: 2.1841 | LR: 0.000020 | TPS: 1709 | 2349s
Step 500/4000 | Loss: 2.3647 | LR: 0.000020 | TPS: 1709 | 2396s
Step 510/4000 | Loss: 2.1587 | LR: 0.000020 | TPS: 1710 | 2443s
Step 520/4000 | Loss: 2.0790 | LR: 0.000020 | TPS: 1711 | 2490s
Step 530/4000 | Loss: 2.0842 | LR: 0.000020 | TPS: 1711 | 2537s
Step 540/4000 | Loss: 2.4031 | LR: 0.000020 | TPS: 1712 | 2584s
Step 550/4000 | Loss: 2.3037 | LR: 0.000020 | TPS: 1712 | 2632s
Step 560/4000 | Loss: 2.2433 | LR: 0.000020 | TPS: 1713 | 2679s
Step 570/4000 | Loss: 2.1670 | LR: 0.000020 | TPS: 1713 | 2726s
Step 580/4000 | Loss: 2.1579 | LR: 0.000020 | TPS: 1714 | 2773s
Step 590/4000 | Loss: 1.9392 | LR: 0.000020 | TPS: 1714 | 2820s
Step 600/4000 | Loss: 2.1226 | LR: 0.000020 | TPS: 1715 | 2867s
Step 610/4000 | Loss: 2.2641 | LR: 0.000019 | TPS: 1715 | 2914s
Step 620/4000 | Loss: 2.0771 | LR: 0.000019 | TPS: 1715 | 2961s
Step 630/4000 | Loss: 2.4527 | LR: 0.000019 | TPS: 1716 | 3008s
Step 640/4000 | Loss: 2.2605 | LR: 0.000019 | TPS: 1716 | 3055s
Step 650/4000 | Loss: 1.9801 | LR: 0.000019 | TPS: 1717 | 3102s
Step 660/4000 | Loss: 2.4208 | LR: 0.000019 | TPS: 1717 | 3149s
Step 670/4000 | Loss: 2.3331 | LR: 0.000019 | TPS: 1717 | 3196s
Step 680/4000 | Loss: 2.1299 | LR: 0.000019 | TPS: 1718 | 3243s
Step 690/4000 | Loss: 2.1551 | LR: 0.000019 | TPS: 1718 | 3290s
Step 700/4000 | Loss: 2.0940 | LR: 0.000019 | TPS: 1718 | 3337s
Step 710/4000 | Loss: 2.0533 | LR: 0.000019 | TPS: 1719 | 3384s
Step 720/4000 | Loss: 2.2076 | LR: 0.000019 | TPS: 1719 | 3431s
Step 730/4000 | Loss: 1.9816 | LR: 0.000019 | TPS: 1719 | 3478s
Step 740/4000 | Loss: 2.1420 | LR: 0.000019 | TPS: 1719 | 3526s
Step 750/4000 | Loss: 2.2928 | LR: 0.000019 | TPS: 1720 | 3573s
Step 760/4000 | Loss: 2.1035 | LR: 0.000019 | TPS: 1720 | 3620s
Step 770/4000 | Loss: 2.1663 | LR: 0.000019 | TPS: 1720 | 3667s
Step 780/4000 | Loss: 2.2270 | LR: 0.000019 | TPS: 1721 | 3714s
Step 790/4000 | Loss: 2.1436 | LR: 0.000019 | TPS: 1721 | 3761s
Step 800/4000 | Loss: 2.3599 | LR: 0.000019 | TPS: 1721 | 3808s
📊 Val loss: 2.1960 (NEW BEST!)
💾 Best model saved to /tmp/sft/sft_model_v2.pt
Step 810/4000 | Loss: 2.2325 | LR: 0.000019 | TPS: 1696 | 3912s
Step 820/4000 | Loss: 2.0798 | LR: 0.000019 | TPS: 1696 | 3960s
Step 830/4000 | Loss: 2.1527 | LR: 0.000019 | TPS: 1697 | 4007s
Step 840/4000 | Loss: 2.2046 | LR: 0.000019 | TPS: 1697 | 4054s
Step 850/4000 | Loss: 2.0648 | LR: 0.000019 | TPS: 1698 | 4101s
Step 860/4000 | Loss: 2.1708 | LR: 0.000019 | TPS: 1698 | 4148s
Step 870/4000 | Loss: 2.3088 | LR: 0.000019 | TPS: 1699 | 4195s
Step 880/4000 | Loss: 1.9936 | LR: 0.000019 | TPS: 1699 | 4242s
Step 890/4000 | Loss: 2.1869 | LR: 0.000019 | TPS: 1700 | 4290s
Step 900/4000 | Loss: 2.4199 | LR: 0.000019 | TPS: 1700 | 4337s
Step 910/4000 | Loss: 2.3803 | LR: 0.000018 | TPS: 1700 | 4384s
Step 920/4000 | Loss: 2.0193 | LR: 0.000018 | TPS: 1701 | 4431s
Step 930/4000 | Loss: 2.1047 | LR: 0.000018 | TPS: 1701 | 4478s
Step 940/4000 | Loss: 2.1449 | LR: 0.000018 | TPS: 1702 | 4525s
Step 950/4000 | Loss: 2.1521 | LR: 0.000018 | TPS: 1702 | 4572s
Step 960/4000 | Loss: 2.2820 | LR: 0.000018 | TPS: 1702 | 4620s
Step 970/4000 | Loss: 2.2996 | LR: 0.000018 | TPS: 1703 | 4667s
Step 980/4000 | Loss: 2.3187 | LR: 0.000018 | TPS: 1703 | 4714s
Step 990/4000 | Loss: 2.1756 | LR: 0.000018 | TPS: 1703 | 4761s
Step 1000/4000 | Loss: 1.9765 | LR: 0.000018 | TPS: 1704 | 4808s
🔤 Generation samples (step 1000):
[EN] The capital of France is located in Normandy.
[HE] מלזיה.
[AR] باريس.
[FA] پاریس یکی از شهرهای بزرگ و تاریخی جهان است که دارای جاذبه های طبیعی، فرهنگی و اقتصادی متعددی می باشد. شهر پاریس در غرب کشورمان قرار دارد و به عنوان یکی از مهم ترین مراکز تجاری و مالی دنیا شناخته شده ا
[TRANSLATE] "תודה על הכול, אבא. אני כאן איתך בכל רגע נתון."
Step 1010/4000 | Loss: 2.1665 | LR: 0.000018 | TPS: 1703 | 4859s
Step 1020/4000 | Loss: 2.1047 | LR: 0.000018 | TPS: 1703 | 4906s
Step 1030/4000 | Loss: 2.2359 | LR: 0.000018 | TPS: 1704 | 4953s
Step 1040/4000 | Loss: 2.0109 | LR: 0.000018 | TPS: 1704 | 5000s
Step 1050/4000 | Loss: 2.1515 | LR: 0.000018 | TPS: 1704 | 5047s
Step 1060/4000 | Loss: 2.0880 | LR: 0.000018 | TPS: 1705 | 5094s
Step 1070/4000 | Loss: 2.2460 | LR: 0.000018 | TPS: 1705 | 5142s
Step 1080/4000 | Loss: 1.9325 | LR: 0.000018 | TPS: 1705 | 5189s
Step 1090/4000 | Loss: 2.2283 | LR: 0.000018 | TPS: 1705 | 5236s
Step 1100/4000 | Loss: 2.3303 | LR: 0.000018 | TPS: 1706 | 5283s
Step 1110/4000 | Loss: 2.1772 | LR: 0.000018 | TPS: 1706 | 5330s
Step 1120/4000 | Loss: 2.1615 | LR: 0.000018 | TPS: 1706 | 5377s
Step 1130/4000 | Loss: 2.1470 | LR: 0.000017 | TPS: 1707 | 5424s
Step 1140/4000 | Loss: 1.9640 | LR: 0.000017 | TPS: 1707 | 5472s
Step 1150/4000 | Loss: 2.1891 | LR: 0.000017 | TPS: 1707 | 5519s
Step 1160/4000 | Loss: 2.2183 | LR: 0.000017 | TPS: 1707 | 5566s
Step 1170/4000 | Loss: 2.0268 | LR: 0.000017 | TPS: 1708 | 5613s
Step 1180/4000 | Loss: 2.2234 | LR: 0.000017 | TPS: 1708 | 5660s
Step 1190/4000 | Loss: 2.1961 | LR: 0.000017 | TPS: 1708 | 5707s
Step 1200/4000 | Loss: 2.2019 | LR: 0.000017 | TPS: 1708 | 5754s
📊 Val loss: 2.2238
Step 1210/4000 | Loss: 2.0809 | LR: 0.000017 | TPS: 1707 | 5807s
Step 1220/4000 | Loss: 2.1716 | LR: 0.000017 | TPS: 1707 | 5854s
Step 1230/4000 | Loss: 2.2607 | LR: 0.000017 | TPS: 1707 | 5901s
Step 1240/4000 | Loss: 2.1838 | LR: 0.000017 | TPS: 1708 | 5949s
Step 1250/4000 | Loss: 2.0725 | LR: 0.000017 | TPS: 1708 | 5996s
Step 1260/4000 | Loss: 2.2797 | LR: 0.000017 | TPS: 1708 | 6043s
Step 1270/4000 | Loss: 2.0366 | LR: 0.000017 | TPS: 1708 | 6090s
Step 1280/4000 | Loss: 2.1469 | LR: 0.000017 | TPS: 1709 | 6137s
Step 1290/4000 | Loss: 2.1541 | LR: 0.000017 | TPS: 1709 | 6184s
Step 1300/4000 | Loss: 2.0311 | LR: 0.000017 | TPS: 1709 | 6231s
Step 1310/4000 | Loss: 2.1828 | LR: 0.000016 | TPS: 1709 | 6279s
Step 1320/4000 | Loss: 2.2004 | LR: 0.000016 | TPS: 1709 | 6326s
Step 1330/4000 | Loss: 2.2589 | LR: 0.000016 | TPS: 1710 | 6373s
Step 1340/4000 | Loss: 2.1475 | LR: 0.000016 | TPS: 1710 | 6420s
Step 1350/4000 | Loss: 2.1672 | LR: 0.000016 | TPS: 1710 | 6467s
Step 1360/4000 | Loss: 2.1921 | LR: 0.000016 | TPS: 1710 | 6514s
Step 1370/4000 | Loss: 2.0689 | LR: 0.000016 | TPS: 1710 | 6561s
Step 1380/4000 | Loss: 2.2560 | LR: 0.000016 | TPS: 1711 | 6609s
Step 1390/4000 | Loss: 1.9519 | LR: 0.000016 | TPS: 1711 | 6656s
Step 1400/4000 | Loss: 1.9671 | LR: 0.000016 | TPS: 1711 | 6703s
Step 1410/4000 | Loss: 2.1535 | LR: 0.000016 | TPS: 1711 | 6750s
Step 1420/4000 | Loss: 2.1726 | LR: 0.000016 | TPS: 1711 | 6797s
Step 1430/4000 | Loss: 2.0854 | LR: 0.000016 | TPS: 1712 | 6844s
Step 1440/4000 | Loss: 2.0955 | LR: 0.000016 | TPS: 1712 | 6891s
Step 1450/4000 | Loss: 2.1260 | LR: 0.000016 | TPS: 1712 | 6939s
Step 1460/4000 | Loss: 2.2860 | LR: 0.000016 | TPS: 1712 | 6986s
Step 1470/4000 | Loss: 1.6098 | LR: 0.000015 | TPS: 1712 | 7033s
Step 1480/4000 | Loss: 2.1327 | LR: 0.000015 | TPS: 1712 | 7080s
Step 1490/4000 | Loss: 2.0506 | LR: 0.000015 | TPS: 1713 | 7127s
Step 1500/4000 | Loss: 2.0568 | LR: 0.000015 | TPS: 1713 | 7174s
Step 1510/4000 | Loss: 2.0177 | LR: 0.000015 | TPS: 1713 | 7221s
Step 1520/4000 | Loss: 2.0383 | LR: 0.000015 | TPS: 1713 | 7269s
Step 1530/4000 | Loss: 2.0994 | LR: 0.000015 | TPS: 1713 | 7316s
Step 1540/4000 | Loss: 2.0863 | LR: 0.000015 | TPS: 1713 | 7363s
Step 1550/4000 | Loss: 2.3287 | LR: 0.000015 | TPS: 1714 | 7410s
Step 1560/4000 | Loss: 2.1585 | LR: 0.000015 | TPS: 1714 | 7457s
Step 1570/4000 | Loss: 1.9781 | LR: 0.000015 | TPS: 1714 | 7504s
Step 1580/4000 | Loss: 1.9344 | LR: 0.000015 | TPS: 1714 | 7551s
Step 1590/4000 | Loss: 2.1031 | LR: 0.000015 | TPS: 1714 | 7599s
Step 1600/4000 | Loss: 2.2633 | LR: 0.000015 | TPS: 1714 | 7646s
📊 Val loss: 2.1164 (NEW BEST!)
💾 Best model saved to /tmp/sft/sft_model_v2.pt
Step 1610/4000 | Loss: 2.0217 | LR: 0.000015 | TPS: 1702 | 7750s
Step 1620/4000 | Loss: 2.0437 | LR: 0.000014 | TPS: 1702 | 7797s
Step 1630/4000 | Loss: 2.3588 | LR: 0.000014 | TPS: 1702 | 7844s
Step 1640/4000 | Loss: 2.1927 | LR: 0.000014 | TPS: 1702 | 7892s
Step 1650/4000 | Loss: 1.9298 | LR: 0.000014 | TPS: 1703 | 7939s
Step 1660/4000 | Loss: 2.1604 | LR: 0.000014 | TPS: 1703 | 7986s
Step 1670/4000 | Loss: 2.0326 | LR: 0.000014 | TPS: 1703 | 8033s
Step 1680/4000 | Loss: 2.1872 | LR: 0.000014 | TPS: 1703 | 8080s
Step 1690/4000 | Loss: 2.0633 | LR: 0.000014 | TPS: 1703 | 8127s
Step 1700/4000 | Loss: 2.2547 | LR: 0.000014 | TPS: 1704 | 8174s
Step 1710/4000 | Loss: 1.8940 | LR: 0.000014 | TPS: 1704 | 8221s
Step 1720/4000 | Loss: 2.0726 | LR: 0.000014 | TPS: 1704 | 8269s
Step 1730/4000 | Loss: 2.0857 | LR: 0.000014 | TPS: 1704 | 8316s
Step 1740/4000 | Loss: 2.0686 | LR: 0.000014 | TPS: 1704 | 8363s
Step 1750/4000 | Loss: 2.1306 | LR: 0.000014 | TPS: 1705 | 8410s
Step 1760/4000 | Loss: 2.0932 | LR: 0.000013 | TPS: 1705 | 8457s
Step 1770/4000 | Loss: 2.0751 | LR: 0.000013 | TPS: 1705 | 8504s
Step 1780/4000 | Loss: 2.1802 | LR: 0.000013 | TPS: 1705 | 8551s
Step 1790/4000 | Loss: 1.6657 | LR: 0.000013 | TPS: 1705 | 8599s
Step 1800/4000 | Loss: 2.1290 | LR: 0.000013 | TPS: 1706 | 8646s
Step 1810/4000 | Loss: 2.1032 | LR: 0.000013 | TPS: 1706 | 8693s
Step 1820/4000 | Loss: 2.1255 | LR: 0.000013 | TPS: 1706 | 8740s
Step 1830/4000 | Loss: 2.1091 | LR: 0.000013 | TPS: 1706 | 8787s
Step 1840/4000 | Loss: 1.9875 | LR: 0.000013 | TPS: 1706 | 8834s
Step 1850/4000 | Loss: 1.9615 | LR: 0.000013 | TPS: 1706 | 8881s
Step 1860/4000 | Loss: 2.0189 | LR: 0.000013 | TPS: 1707 | 8929s
Step 1870/4000 | Loss: 2.1387 | LR: 0.000013 | TPS: 1707 | 8976s
Step 1880/4000 | Loss: 2.0963 | LR: 0.000013 | TPS: 1707 | 9023s
Step 1890/4000 | Loss: 2.1750 | LR: 0.000013 | TPS: 1707 | 9070s
Step 1900/4000 | Loss: 2.3945 | LR: 0.000012 | TPS: 1707 | 9117s
Step 1910/4000 | Loss: 2.1515 | LR: 0.000012 | TPS: 1707 | 9164s
Step 1920/4000 | Loss: 2.2224 | LR: 0.000012 | TPS: 1708 | 9211s
Step 1930/4000 | Loss: 2.3160 | LR: 0.000012 | TPS: 1708 | 9259s
Step 1940/4000 | Loss: 2.0126 | LR: 0.000012 | TPS: 1708 | 9306s
Step 1950/4000 | Loss: 2.2443 | LR: 0.000012 | TPS: 1708 | 9353s
Step 1960/4000 | Loss: 1.9590 | LR: 0.000012 | TPS: 1708 | 9400s
Step 1970/4000 | Loss: 2.2280 | LR: 0.000012 | TPS: 1708 | 9447s
Step 1980/4000 | Loss: 1.9723 | LR: 0.000012 | TPS: 1708 | 9494s
Step 1990/4000 | Loss: 2.0697 | LR: 0.000012 | TPS: 1709 | 9541s
Step 2000/4000 | Loss: 2.0568 | LR: 0.000012 | TPS: 1709 | 9589s
📊 Val loss: 2.1674
🔤 Generation samples (step 2000):
[EN] Paris (pronounced "Paris") is a city located in northeastern France. It borders Germany to the east, with Belgium and Luxembourg as its easternmost provinces.
[HE] בצרפת, העיר העתיקה היא אזור התיירות העיקרי.
[AR] باريس
[FA] پاریس، پایتخت کشور فرانسه است.
[TRANSLATE] The answer is YES.
Step 2010/4000 | Loss: 1.9474 | LR: 0.000012 | TPS: 1708 | 9643s
Step 2020/4000 | Loss: 2.1131 | LR: 0.000012 | TPS: 1708 | 9690s
Step 2030/4000 | Loss: 2.0446 | LR: 0.000012 | TPS: 1708 | 9737s
Step 2040/4000 | Loss: 2.2229 | LR: 0.000011 | TPS: 1708 | 9784s
Step 2050/4000 | Loss: 2.1576 | LR: 0.000011 | TPS: 1708 | 9832s
Step 2060/4000 | Loss: 2.1899 | LR: 0.000011 | TPS: 1708 | 9879s
Step 2070/4000 | Loss: 2.0957 | LR: 0.000011 | TPS: 1708 | 9926s
Step 2080/4000 | Loss: 2.2643 | LR: 0.000011 | TPS: 1709 | 9973s
Step 2090/4000 | Loss: 2.0676 | LR: 0.000011 | TPS: 1709 | 10020s
Step 2100/4000 | Loss: 2.1386 | LR: 0.000011 | TPS: 1709 | 10067s
Step 2110/4000 | Loss: 2.1891 | LR: 0.000011 | TPS: 1709 | 10114s
Step 2120/4000 | Loss: 1.9532 | LR: 0.000011 | TPS: 1709 | 10162s
Step 2130/4000 | Loss: 1.9766 | LR: 0.000011 | TPS: 1709 | 10209s
Step 2140/4000 | Loss: 2.3656 | LR: 0.000011 | TPS: 1709 | 10256s
Step 2150/4000 | Loss: 2.0545 | LR: 0.000011 | TPS: 1709 | 10303s
Step 2160/4000 | Loss: 1.9706 | LR: 0.000011 | TPS: 1710 | 10350s
Step 2170/4000 | Loss: 2.0302 | LR: 0.000010 | TPS: 1710 | 10397s
Step 2180/4000 | Loss: 2.1752 | LR: 0.000010 | TPS: 1710 | 10444s
Step 2190/4000 | Loss: 2.1455 | LR: 0.000010 | TPS: 1710 | 10492s
Step 2200/4000 | Loss: 2.2238 | LR: 0.000010 | TPS: 1710 | 10539s
Step 2210/4000 | Loss: 2.1010 | LR: 0.000010 | TPS: 1710 | 10586s
Step 2220/4000 | Loss: 2.1831 | LR: 0.000010 | TPS: 1710 | 10633s
Step 2230/4000 | Loss: 1.6542 | LR: 0.000010 | TPS: 1710 | 10680s
Step 2240/4000 | Loss: 2.1102 | LR: 0.000010 | TPS: 1711 | 10727s
Step 2250/4000 | Loss: 2.2099 | LR: 0.000010 | TPS: 1711 | 10774s
Step 2260/4000 | Loss: 2.1750 | LR: 0.000010 | TPS: 1711 | 10821s
Step 2270/4000 | Loss: 2.2369 | LR: 0.000010 | TPS: 1711 | 10869s
Step 2280/4000 | Loss: 2.0393 | LR: 0.000010 | TPS: 1711 | 10916s
Step 2290/4000 | Loss: 2.3140 | LR: 0.000010 | TPS: 1711 | 10963s
Step 2300/4000 | Loss: 2.0601 | LR: 0.000010 | TPS: 1711 | 11010s
Step 2310/4000 | Loss: 2.1472 | LR: 0.000009 | TPS: 1711 | 11057s
Step 2320/4000 | Loss: 2.0987 | LR: 0.000009 | TPS: 1712 | 11104s
Step 2330/4000 | Loss: 2.0354 | LR: 0.000009 | TPS: 1712 | 11152s
Step 2340/4000 | Loss: 1.9309 | LR: 0.000009 | TPS: 1712 | 11199s
Step 2350/4000 | Loss: 2.1222 | LR: 0.000009 | TPS: 1712 | 11246s
Step 2360/4000 | Loss: 1.9861 | LR: 0.000009 | TPS: 1712 | 11293s
Step 2370/4000 | Loss: 2.1986 | LR: 0.000009 | TPS: 1712 | 11340s
Step 2380/4000 | Loss: 2.0335 | LR: 0.000009 | TPS: 1712 | 11387s
Step 2390/4000 | Loss: 2.2123 | LR: 0.000009 | TPS: 1712 | 11434s
Step 2400/4000 | Loss: 2.0287 | LR: 0.000009 | TPS: 1712 | 11482s
📊 Val loss: 2.1943
Step 2410/4000 | Loss: 2.0483 | LR: 0.000009 | TPS: 1712 | 11534s
Step 2420/4000 | Loss: 2.0710 | LR: 0.000009 | TPS: 1712 | 11581s
Step 2430/4000 | Loss: 2.3005 | LR: 0.000009 | TPS: 1712 | 11629s
Step 2440/4000 | Loss: 2.0617 | LR: 0.000009 | TPS: 1712 | 11676s
Step 2450/4000 | Loss: 2.2063 | LR: 0.000008 | TPS: 1712 | 11723s
Step 2460/4000 | Loss: 2.0405 | LR: 0.000008 | TPS: 1712 | 11770s
Step 2470/4000 | Loss: 2.2280 | LR: 0.000008 | TPS: 1712 | 11817s
Step 2480/4000 | Loss: 2.3856 | LR: 0.000008 | TPS: 1712 | 11864s
Step 2490/4000 | Loss: 1.9853 | LR: 0.000008 | TPS: 1712 | 11911s
Step 2500/4000 | Loss: 2.0673 | LR: 0.000008 | TPS: 1713 | 11959s
Step 2510/4000 | Loss: 2.1777 | LR: 0.000008 | TPS: 1713 | 12006s
Step 2520/4000 | Loss: 1.9846 | LR: 0.000008 | TPS: 1713 | 12053s
Step 2530/4000 | Loss: 2.1922 | LR: 0.000008 | TPS: 1713 | 12100s
Step 2540/4000 | Loss: 2.0542 | LR: 0.000008 | TPS: 1713 | 12147s
Step 2550/4000 | Loss: 2.1041 | LR: 0.000008 | TPS: 1713 | 12194s
Step 2560/4000 | Loss: 2.0099 | LR: 0.000008 | TPS: 1713 | 12241s
Step 2570/4000 | Loss: 1.8186 | LR: 0.000008 | TPS: 1713 | 12289s
Step 2580/4000 | Loss: 2.2079 | LR: 0.000008 | TPS: 1713 | 12336s
Step 2590/4000 | Loss: 1.9931 | LR: 0.000007 | TPS: 1713 | 12383s
Step 2600/4000 | Loss: 2.0986 | LR: 0.000007 | TPS: 1714 | 12430s
Step 2610/4000 | Loss: 2.0439 | LR: 0.000007 | TPS: 1714 | 12477s
Step 2620/4000 | Loss: 1.9408 | LR: 0.000007 | TPS: 1714 | 12524s
Step 2630/4000 | Loss: 2.1992 | LR: 0.000007 | TPS: 1714 | 12571s
Step 2640/4000 | Loss: 2.0929 | LR: 0.000007 | TPS: 1714 | 12619s
Step 2650/4000 | Loss: 1.9728 | LR: 0.000007 | TPS: 1714 | 12666s
Step 2660/4000 | Loss: 1.8369 | LR: 0.000007 | TPS: 1714 | 12713s
Step 2670/4000 | Loss: 1.9926 | LR: 0.000007 | TPS: 1714 | 12760s
Step 2680/4000 | Loss: 2.0414 | LR: 0.000007 | TPS: 1714 | 12807s
Step 2690/4000 | Loss: 2.1368 | LR: 0.000007 | TPS: 1714 | 12854s
Step 2700/4000 | Loss: 2.0254 | LR: 0.000007 | TPS: 1714 | 12901s
Step 2710/4000 | Loss: 2.1572 | LR: 0.000007 | TPS: 1715 | 12948s
Step 2720/4000 | Loss: 2.0418 | LR: 0.000007 | TPS: 1715 | 12996s
Step 2730/4000 | Loss: 2.1235 | LR: 0.000007 | TPS: 1715 | 13043s
Step 2740/4000 | Loss: 2.0756 | LR: 0.000006 | TPS: 1715 | 13090s
Step 2750/4000 | Loss: 2.1417 | LR: 0.000006 | TPS: 1715 | 13137s
Step 2760/4000 | Loss: 1.9427 | LR: 0.000006 | TPS: 1715 | 13184s
Step 2770/4000 | Loss: 2.1166 | LR: 0.000006 | TPS: 1715 | 13231s
Step 2780/4000 | Loss: 1.9711 | LR: 0.000006 | TPS: 1715 | 13278s
Step 2790/4000 | Loss: 2.1390 | LR: 0.000006 | TPS: 1715 | 13326s
Step 2800/4000 | Loss: 2.0557 | LR: 0.000006 | TPS: 1715 | 13373s
📊 Val loss: 2.1839
Step 2810/4000 | Loss: 2.0581 | LR: 0.000006 | TPS: 1715 | 13425s
Step 2820/4000 | Loss: 2.1139 | LR: 0.000006 | TPS: 1715 | 13473s
Step 2830/4000 | Loss: 2.1228 | LR: 0.000006 | TPS: 1715 | 13520s
Step 2840/4000 | Loss: 1.9685 | LR: 0.000006 | TPS: 1715 | 13567s
Step 2850/4000 | Loss: 2.1206 | LR: 0.000006 | TPS: 1715 | 13614s
Step 2860/4000 | Loss: 2.1942 | LR: 0.000006 | TPS: 1715 | 13661s
Step 2870/4000 | Loss: 1.9068 | LR: 0.000006 | TPS: 1715 | 13708s
Step 2880/4000 | Loss: 2.2099 | LR: 0.000006 | TPS: 1715 | 13755s
Step 2890/4000 | Loss: 2.0948 | LR: 0.000006 | TPS: 1715 | 13803s
Step 2900/4000 | Loss: 2.0630 | LR: 0.000005 | TPS: 1715 | 13850s
Step 2910/4000 | Loss: 1.9867 | LR: 0.000005 | TPS: 1715 | 13897s
Step 2920/4000 | Loss: 2.0602 | LR: 0.000005 | TPS: 1715 | 13944s
Step 2930/4000 | Loss: 2.0163 | LR: 0.000005 | TPS: 1716 | 13991s
Step 2940/4000 | Loss: 2.0337 | LR: 0.000005 | TPS: 1716 | 14038s
Step 2950/4000 | Loss: 2.2476 | LR: 0.000005 | TPS: 1716 | 14085s
Step 2960/4000 | Loss: 2.0430 | LR: 0.000005 | TPS: 1716 | 14133s
Step 2970/4000 | Loss: 2.3037 | LR: 0.000005 | TPS: 1716 | 14180s
Step 2980/4000 | Loss: 2.0831 | LR: 0.000005 | TPS: 1716 | 14227s
Step 2990/4000 | Loss: 2.1781 | LR: 0.000005 | TPS: 1716 | 14274s
Step 3000/4000 | Loss: 2.0784 | LR: 0.000005 | TPS: 1716 | 14321s
🔤 Generation samples (step 3000):
[EN] The city of Paris is a metropolitan area in Europe, consisting of 57 counties. Its main cities include Lyons, Bordeaux and Valence.
[HE] איטליה.
[AR] باريس.
[FA] پاریس پایتخت کشور فرانسه و یکی از شهرهای بزرگ این کشور است. شهر پاریس در شمال غربی قاره اروپا قرار دارد.
[TRANSLATE] You are the first one in the world to learn how to think.
Step 3010/4000 | Loss: 2.1244 | LR: 0.000005 | TPS: 1716 | 14370s
Step 3020/4000 | Loss: 2.1107 | LR: 0.000005 | TPS: 1716 | 14417s
Step 3030/4000 | Loss: 2.3589 | LR: 0.000005 | TPS: 1716 | 14464s
Step 3040/4000 | Loss: 2.0592 | LR: 0.000005 | TPS: 1716 | 14511s
Step 3050/4000 | Loss: 2.0730 | LR: 0.000005 | TPS: 1716 | 14559s
Step 3060/4000 | Loss: 2.1365 | LR: 0.000005 | TPS: 1716 | 14606s
Step 3070/4000 | Loss: 1.9819 | LR: 0.000005 | TPS: 1716 | 14653s
Step 3080/4000 | Loss: 2.2175 | LR: 0.000004 | TPS: 1716 | 14700s
Step 3090/4000 | Loss: 2.1442 | LR: 0.000004 | TPS: 1716 | 14747s
Step 3100/4000 | Loss: 2.0811 | LR: 0.000004 | TPS: 1717 | 14794s
Step 3110/4000 | Loss: 2.1427 | LR: 0.000004 | TPS: 1717 | 14841s
Step 3120/4000 | Loss: 2.1722 | LR: 0.000004 | TPS: 1717 | 14889s
Step 3130/4000 | Loss: 2.0577 | LR: 0.000004 | TPS: 1717 | 14936s
Step 3140/4000 | Loss: 2.0873 | LR: 0.000004 | TPS: 1717 | 14983s
Step 3150/4000 | Loss: 2.2920 | LR: 0.000004 | TPS: 1717 | 15030s
Step 3160/4000 | Loss: 1.8839 | LR: 0.000004 | TPS: 1717 | 15077s
Step 3170/4000 | Loss: 2.0144 | LR: 0.000004 | TPS: 1717 | 15124s
Step 3180/4000 | Loss: 1.9689 | LR: 0.000004 | TPS: 1717 | 15171s
Step 3190/4000 | Loss: 2.2123 | LR: 0.000004 | TPS: 1717 | 15219s
Step 3200/4000 | Loss: 2.0510 | LR: 0.000004 | TPS: 1717 | 15266s
📊 Val loss: 2.1269
Step 3210/4000 | Loss: 2.4087 | LR: 0.000004 | TPS: 1717 | 15318s
Step 3220/4000 | Loss: 2.2608 | LR: 0.000004 | TPS: 1717 | 15365s
Step 3230/4000 | Loss: 2.1930 | LR: 0.000004 | TPS: 1717 | 15413s
Step 3240/4000 | Loss: 2.0713 | LR: 0.000004 | TPS: 1717 | 15460s
Step 3250/4000 | Loss: 2.2660 | LR: 0.000004 | TPS: 1717 | 15507s
Step 3260/4000 | Loss: 1.9479 | LR: 0.000004 | TPS: 1717 | 15554s
Step 3270/4000 | Loss: 1.9657 | LR: 0.000004 | TPS: 1717 | 15601s
Step 3280/4000 | Loss: 2.1884 | LR: 0.000004 | TPS: 1717 | 15648s
Step 3290/4000 | Loss: 2.0927 | LR: 0.000004 | TPS: 1717 | 15695s
Step 3300/4000 | Loss: 2.0393 | LR: 0.000003 | TPS: 1717 | 15743s
Step 3310/4000 | Loss: 2.1302 | LR: 0.000003 | TPS: 1717 | 15790s
Step 3320/4000 | Loss: 2.0059 | LR: 0.000003 | TPS: 1717 | 15837s
Step 3330/4000 | Loss: 1.8687 | LR: 0.000003 | TPS: 1717 | 15884s
Step 3340/4000 | Loss: 2.0293 | LR: 0.000003 | TPS: 1717 | 15931s
Step 3350/4000 | Loss: 2.1500 | LR: 0.000003 | TPS: 1718 | 15978s
Step 3360/4000 | Loss: 1.9667 | LR: 0.000003 | TPS: 1718 | 16025s
Step 3370/4000 | Loss: 2.1206 | LR: 0.000003 | TPS: 1718 | 16073s
Step 3380/4000 | Loss: 2.3028 | LR: 0.000003 | TPS: 1718 | 16120s
Step 3390/4000 | Loss: 2.0075 | LR: 0.000003 | TPS: 1718 | 16167s
Step 3400/4000 | Loss: 2.0562 | LR: 0.000003 | TPS: 1718 | 16214s
Step 3410/4000 | Loss: 1.9977 | LR: 0.000003 | TPS: 1718 | 16261s
Step 3420/4000 | Loss: 2.1680 | LR: 0.000003 | TPS: 1718 | 16308s
Step 3430/4000 | Loss: 2.0009 | LR: 0.000003 | TPS: 1718 | 16355s
Step 3440/4000 | Loss: 1.8301 | LR: 0.000003 | TPS: 1718 | 16403s
Step 3450/4000 | Loss: 2.0239 | LR: 0.000003 | TPS: 1718 | 16450s
Step 3460/4000 | Loss: 2.0535 | LR: 0.000003 | TPS: 1718 | 16497s
Step 3470/4000 | Loss: 2.1348 | LR: 0.000003 | TPS: 1718 | 16544s
Step 3480/4000 | Loss: 2.0337 | LR: 0.000003 | TPS: 1718 | 16591s
Step 3490/4000 | Loss: 1.9342 | LR: 0.000003 | TPS: 1718 | 16638s
Step 3500/4000 | Loss: 2.0052 | LR: 0.000003 | TPS: 1718 | 16685s
Step 3510/4000 | Loss: 1.9902 | LR: 0.000003 | TPS: 1718 | 16732s
Step 3520/4000 | Loss: 2.1567 | LR: 0.000003 | TPS: 1719 | 16780s
Step 3530/4000 | Loss: 2.0515 | LR: 0.000003 | TPS: 1719 | 16827s
Step 3540/4000 | Loss: 2.1572 | LR: 0.000003 | TPS: 1719 | 16874s
Step 3550/4000 | Loss: 2.1381 | LR: 0.000003 | TPS: 1719 | 16921s
Step 3560/4000 | Loss: 2.0383 | LR: 0.000003 | TPS: 1719 | 16968s
Step 3570/4000 | Loss: 2.3566 | LR: 0.000003 | TPS: 1719 | 17015s
Step 3580/4000 | Loss: 1.9773 | LR: 0.000003 | TPS: 1719 | 17062s
Step 3590/4000 | Loss: 2.0418 | LR: 0.000003 | TPS: 1719 | 17110s
Step 3600/4000 | Loss: 2.1756 | LR: 0.000002 | TPS: 1719 | 17157s
📊 Val loss: 2.1478
Step 3610/4000 | Loss: 2.0761 | LR: 0.000002 | TPS: 1718 | 17209s
Step 3620/4000 | Loss: 2.1353 | LR: 0.000002 | TPS: 1718 | 17257s
Step 3630/4000 | Loss: 2.1856 | LR: 0.000002 | TPS: 1719 | 17304s
Step 3640/4000 | Loss: 2.1298 | LR: 0.000002 | TPS: 1719 | 17351s
Step 3650/4000 | Loss: 2.0784 | LR: 0.000002 | TPS: 1719 | 17398s
Step 3660/4000 | Loss: 2.0533 | LR: 0.000002 | TPS: 1719 | 17445s
Step 3670/4000 | Loss: 2.2151 | LR: 0.000002 | TPS: 1719 | 17492s
Step 3680/4000 | Loss: 2.0177 | LR: 0.000002 | TPS: 1719 | 17539s
Step 3690/4000 | Loss: 2.1048 | LR: 0.000002 | TPS: 1719 | 17587s
Step 3700/4000 | Loss: 2.0629 | LR: 0.000002 | TPS: 1719 | 17634s
Step 3710/4000 | Loss: 2.0375 | LR: 0.000002 | TPS: 1719 | 17681s
Step 3720/4000 | Loss: 2.2282 | LR: 0.000002 | TPS: 1719 | 17728s
Step 3730/4000 | Loss: 2.2049 | LR: 0.000002 | TPS: 1719 | 17775s
Step 3740/4000 | Loss: 2.0247 | LR: 0.000002 | TPS: 1719 | 17822s
Step 3750/4000 | Loss: 2.0337 | LR: 0.000002 | TPS: 1719 | 17869s
Step 3760/4000 | Loss: 2.0922 | LR: 0.000002 | TPS: 1719 | 17917s
Step 3770/4000 | Loss: 2.1018 | LR: 0.000002 | TPS: 1719 | 17964s
Step 3780/4000 | Loss: 2.1183 | LR: 0.000002 | TPS: 1719 | 18011s
Step 3790/4000 | Loss: 2.2469 | LR: 0.000002 | TPS: 1719 | 18058s
Step 3800/4000 | Loss: 2.1373 | LR: 0.000002 | TPS: 1719 | 18105s
Step 3810/4000 | Loss: 2.1103 | LR: 0.000002 | TPS: 1719 | 18152s
Step 3820/4000 | Loss: 2.0317 | LR: 0.000002 | TPS: 1719 | 18199s
Step 3830/4000 | Loss: 2.0022 | LR: 0.000002 | TPS: 1720 | 18247s
Step 3840/4000 | Loss: 2.1618 | LR: 0.000002 | TPS: 1720 | 18294s
Step 3850/4000 | Loss: 2.1421 | LR: 0.000002 | TPS: 1720 | 18341s
Step 3860/4000 | Loss: 1.9279 | LR: 0.000002 | TPS: 1720 | 18388s
Step 3870/4000 | Loss: 2.1657 | LR: 0.000002 | TPS: 1720 | 18435s
Step 3880/4000 | Loss: 2.1433 | LR: 0.000002 | TPS: 1720 | 18482s
Step 3890/4000 | Loss: 2.0893 | LR: 0.000002 | TPS: 1720 | 18529s
Step 3900/4000 | Loss: 2.0036 | LR: 0.000002 | TPS: 1720 | 18576s
Step 3910/4000 | Loss: 2.0691 | LR: 0.000002 | TPS: 1720 | 18624s
Step 3920/4000 | Loss: 2.0282 | LR: 0.000002 | TPS: 1720 | 18671s
Step 3930/4000 | Loss: 1.9818 | LR: 0.000002 | TPS: 1720 | 18718s
Step 3940/4000 | Loss: 2.1466 | LR: 0.000002 | TPS: 1720 | 18765s
Step 3950/4000 | Loss: 2.0455 | LR: 0.000002 | TPS: 1720 | 18812s
Step 3960/4000 | Loss: 2.1226 | LR: 0.000002 | TPS: 1720 | 18859s
Step 3970/4000 | Loss: 1.9890 | LR: 0.000002 | TPS: 1720 | 18906s
Step 3980/4000 | Loss: 2.1891 | LR: 0.000002 | TPS: 1720 | 18954s
Step 3990/4000 | Loss: 1.8920 | LR: 0.000002 | TPS: 1720 | 19001s
Step 4000/4000 | Loss: 2.0073 | LR: 0.000002 | TPS: 1720 | 19048s
📊 Val loss: 2.1472
🔤 Generation samples (step 4000):
[EN] The capital of France consists of 38 cities, 26.9% (14) of which are in the metropolitan area.
[HE] צרפת היא אחת מיעדי התיירות הפופולאריים ביותר בעולם, בשל היותה מוקד משיכה תיירותי משמעותי עבור תיירים מכל רחבי העולם. העיר בנויה משני חלקים עיקריים - כיכר ד'ארסאן (Droite Sud) ורחוב ד'ארסאן (De La Roch
[AR] باريس.
[FA] پاریس شهری بزرگ و تاریخی در شمال غربی اروپا است.
[TRANSLATE] It’s very short.
============================================================
SFT TRAINING COMPLETE
Steps: 4000, Time: 19057s (317.6min)
Best val loss: 2.1164
Model saved to: /tmp/sft/sft_model_v2.pt
============================================================
Uploading to S3...