| Device: cuda |
| Loading tokenizer: /tmp/eval/multilingual_32k.model |
| Loading base model: /tmp/eval/best_model.pt |
| Model loaded: 3.04B parameters |
| Loading SFT data from: /tmp/sft_data_v2 |
| Train: 3949348 tokens, Val: 201020 tokens |
| Using 8-bit AdamW (bitsandbytes) |
|
|
| Starting SFT training for 4000 steps... |
| Batch size: 1 x 4 accum = 4 effective, Seq len: 2048, LR: 2e-05 |
| Step 10/4000 | Loss: 2.3791 | LR: 0.000001 | TPS: 1196 | 68s |
| Step 20/4000 | Loss: 2.5346 | LR: 0.000002 | TPS: 1418 | 116s |
| Step 30/4000 | Loss: 2.7910 | LR: 0.000003 | TPS: 1511 | 163s |
| Step 40/4000 | Loss: 2.5189 | LR: 0.000004 | TPS: 1562 | 210s |
| Step 50/4000 | Loss: 2.5049 | LR: 0.000005 | TPS: 1594 | 257s |
| Step 60/4000 | Loss: 2.5417 | LR: 0.000006 | TPS: 1616 | 304s |
| Step 70/4000 | Loss: 2.2374 | LR: 0.000007 | TPS: 1633 | 351s |
| Step 80/4000 | Loss: 2.5328 | LR: 0.000008 | TPS: 1645 | 398s |
| Step 90/4000 | Loss: 2.5359 | LR: 0.000009 | TPS: 1655 | 445s |
| Step 100/4000 | Loss: 2.4830 | LR: 0.000010 | TPS: 1663 | 493s |
| Step 110/4000 | Loss: 2.3015 | LR: 0.000011 | TPS: 1669 | 540s |
| Step 120/4000 | Loss: 2.4667 | LR: 0.000012 | TPS: 1675 | 587s |
| Step 130/4000 | Loss: 2.3792 | LR: 0.000013 | TPS: 1680 | 634s |
| Step 140/4000 | Loss: 2.3918 | LR: 0.000014 | TPS: 1684 | 681s |
| Step 150/4000 | Loss: 2.3368 | LR: 0.000015 | TPS: 1687 | 728s |
| Step 160/4000 | Loss: 2.4838 | LR: 0.000016 | TPS: 1690 | 775s |
| Step 170/4000 | Loss: 2.3578 | LR: 0.000017 | TPS: 1693 | 823s |
| Step 180/4000 | Loss: 2.5485 | LR: 0.000018 | TPS: 1695 | 870s |
| Step 190/4000 | Loss: 2.0834 | LR: 0.000019 | TPS: 1698 | 917s |
| Step 200/4000 | Loss: 1.9784 | LR: 0.000020 | TPS: 1699 | 964s |
| Step 210/4000 | Loss: 2.4826 | LR: 0.000020 | TPS: 1701 | 1011s |
| Step 220/4000 | Loss: 2.3540 | LR: 0.000020 | TPS: 1703 | 1058s |
| Step 230/4000 | Loss: 2.2093 | LR: 0.000020 | TPS: 1704 | 1105s |
| Step 240/4000 | Loss: 2.2137 | LR: 0.000020 | TPS: 1706 | 1153s |
| Step 250/4000 | Loss: 2.2151 | LR: 0.000020 | TPS: 1707 | 1200s |
| Step 260/4000 | Loss: 2.2535 | LR: 0.000020 | TPS: 1708 | 1247s |
| Step 270/4000 | Loss: 2.2235 | LR: 0.000020 | TPS: 1709 | 1294s |
| Step 280/4000 | Loss: 2.0449 | LR: 0.000020 | TPS: 1710 | 1341s |
| Step 290/4000 | Loss: 2.1502 | LR: 0.000020 | TPS: 1711 | 1388s |
| Step 300/4000 | Loss: 2.3716 | LR: 0.000020 | TPS: 1712 | 1435s |
| Step 310/4000 | Loss: 2.1591 | LR: 0.000020 | TPS: 1713 | 1483s |
| Step 320/4000 | Loss: 2.2153 | LR: 0.000020 | TPS: 1714 | 1530s |
| Step 330/4000 | Loss: 2.2023 | LR: 0.000020 | TPS: 1714 | 1577s |
| Step 340/4000 | Loss: 2.3968 | LR: 0.000020 | TPS: 1715 | 1624s |
| Step 350/4000 | Loss: 2.1146 | LR: 0.000020 | TPS: 1716 | 1671s |
| Step 360/4000 | Loss: 2.1857 | LR: 0.000020 | TPS: 1716 | 1718s |
| Step 370/4000 | Loss: 2.1965 | LR: 0.000020 | TPS: 1717 | 1765s |
| Step 380/4000 | Loss: 2.1613 | LR: 0.000020 | TPS: 1717 | 1813s |
| Step 390/4000 | Loss: 2.3080 | LR: 0.000020 | TPS: 1718 | 1860s |
| Step 400/4000 | Loss: 2.2964 | LR: 0.000020 | TPS: 1718 | 1907s |
| 📊 Val loss: 2.2256 (NEW BEST!) |
| 💾 Best model saved to /tmp/sft/sft_model_v2.pt |
| Step 410/4000 | Loss: 2.2859 | LR: 0.000020 | TPS: 1703 | 1973s |
| Step 420/4000 | Loss: 2.1711 | LR: 0.000020 | TPS: 1703 | 2020s |
| Step 430/4000 | Loss: 2.1434 | LR: 0.000020 | TPS: 1704 | 2067s |
| Step 440/4000 | Loss: 2.2115 | LR: 0.000020 | TPS: 1705 | 2114s |
| Step 450/4000 | Loss: 2.2985 | LR: 0.000020 | TPS: 1706 | 2161s |
| Step 460/4000 | Loss: 1.9845 | LR: 0.000020 | TPS: 1707 | 2208s |
| Step 470/4000 | Loss: 2.3135 | LR: 0.000020 | TPS: 1707 | 2255s |
| Step 480/4000 | Loss: 2.3004 | LR: 0.000020 | TPS: 1708 | 2302s |
| Step 490/4000 | Loss: 2.1841 | LR: 0.000020 | TPS: 1709 | 2349s |
| Step 500/4000 | Loss: 2.3647 | LR: 0.000020 | TPS: 1709 | 2396s |
| Step 510/4000 | Loss: 2.1587 | LR: 0.000020 | TPS: 1710 | 2443s |
| Step 520/4000 | Loss: 2.0790 | LR: 0.000020 | TPS: 1711 | 2490s |
| Step 530/4000 | Loss: 2.0842 | LR: 0.000020 | TPS: 1711 | 2537s |
| Step 540/4000 | Loss: 2.4031 | LR: 0.000020 | TPS: 1712 | 2584s |
| Step 550/4000 | Loss: 2.3037 | LR: 0.000020 | TPS: 1712 | 2632s |
| Step 560/4000 | Loss: 2.2433 | LR: 0.000020 | TPS: 1713 | 2679s |
| Step 570/4000 | Loss: 2.1670 | LR: 0.000020 | TPS: 1713 | 2726s |
| Step 580/4000 | Loss: 2.1579 | LR: 0.000020 | TPS: 1714 | 2773s |
| Step 590/4000 | Loss: 1.9392 | LR: 0.000020 | TPS: 1714 | 2820s |
| Step 600/4000 | Loss: 2.1226 | LR: 0.000020 | TPS: 1715 | 2867s |
| Step 610/4000 | Loss: 2.2641 | LR: 0.000019 | TPS: 1715 | 2914s |
| Step 620/4000 | Loss: 2.0771 | LR: 0.000019 | TPS: 1715 | 2961s |
| Step 630/4000 | Loss: 2.4527 | LR: 0.000019 | TPS: 1716 | 3008s |
| Step 640/4000 | Loss: 2.2605 | LR: 0.000019 | TPS: 1716 | 3055s |
| Step 650/4000 | Loss: 1.9801 | LR: 0.000019 | TPS: 1717 | 3102s |
| Step 660/4000 | Loss: 2.4208 | LR: 0.000019 | TPS: 1717 | 3149s |
| Step 670/4000 | Loss: 2.3331 | LR: 0.000019 | TPS: 1717 | 3196s |
| Step 680/4000 | Loss: 2.1299 | LR: 0.000019 | TPS: 1718 | 3243s |
| Step 690/4000 | Loss: 2.1551 | LR: 0.000019 | TPS: 1718 | 3290s |
| Step 700/4000 | Loss: 2.0940 | LR: 0.000019 | TPS: 1718 | 3337s |
| Step 710/4000 | Loss: 2.0533 | LR: 0.000019 | TPS: 1719 | 3384s |
| Step 720/4000 | Loss: 2.2076 | LR: 0.000019 | TPS: 1719 | 3431s |
| Step 730/4000 | Loss: 1.9816 | LR: 0.000019 | TPS: 1719 | 3478s |
| Step 740/4000 | Loss: 2.1420 | LR: 0.000019 | TPS: 1719 | 3526s |
| Step 750/4000 | Loss: 2.2928 | LR: 0.000019 | TPS: 1720 | 3573s |
| Step 760/4000 | Loss: 2.1035 | LR: 0.000019 | TPS: 1720 | 3620s |
| Step 770/4000 | Loss: 2.1663 | LR: 0.000019 | TPS: 1720 | 3667s |
| Step 780/4000 | Loss: 2.2270 | LR: 0.000019 | TPS: 1721 | 3714s |
| Step 790/4000 | Loss: 2.1436 | LR: 0.000019 | TPS: 1721 | 3761s |
| Step 800/4000 | Loss: 2.3599 | LR: 0.000019 | TPS: 1721 | 3808s |
| 📊 Val loss: 2.1960 (NEW BEST!) |
| 💾 Best model saved to /tmp/sft/sft_model_v2.pt |
| Step 810/4000 | Loss: 2.2325 | LR: 0.000019 | TPS: 1696 | 3912s |
| Step 820/4000 | Loss: 2.0798 | LR: 0.000019 | TPS: 1696 | 3960s |
| Step 830/4000 | Loss: 2.1527 | LR: 0.000019 | TPS: 1697 | 4007s |
| Step 840/4000 | Loss: 2.2046 | LR: 0.000019 | TPS: 1697 | 4054s |
| Step 850/4000 | Loss: 2.0648 | LR: 0.000019 | TPS: 1698 | 4101s |
| Step 860/4000 | Loss: 2.1708 | LR: 0.000019 | TPS: 1698 | 4148s |
| Step 870/4000 | Loss: 2.3088 | LR: 0.000019 | TPS: 1699 | 4195s |
| Step 880/4000 | Loss: 1.9936 | LR: 0.000019 | TPS: 1699 | 4242s |
| Step 890/4000 | Loss: 2.1869 | LR: 0.000019 | TPS: 1700 | 4290s |
| Step 900/4000 | Loss: 2.4199 | LR: 0.000019 | TPS: 1700 | 4337s |
| Step 910/4000 | Loss: 2.3803 | LR: 0.000018 | TPS: 1700 | 4384s |
| Step 920/4000 | Loss: 2.0193 | LR: 0.000018 | TPS: 1701 | 4431s |
| Step 930/4000 | Loss: 2.1047 | LR: 0.000018 | TPS: 1701 | 4478s |
| Step 940/4000 | Loss: 2.1449 | LR: 0.000018 | TPS: 1702 | 4525s |
| Step 950/4000 | Loss: 2.1521 | LR: 0.000018 | TPS: 1702 | 4572s |
| Step 960/4000 | Loss: 2.2820 | LR: 0.000018 | TPS: 1702 | 4620s |
| Step 970/4000 | Loss: 2.2996 | LR: 0.000018 | TPS: 1703 | 4667s |
| Step 980/4000 | Loss: 2.3187 | LR: 0.000018 | TPS: 1703 | 4714s |
| Step 990/4000 | Loss: 2.1756 | LR: 0.000018 | TPS: 1703 | 4761s |
| Step 1000/4000 | Loss: 1.9765 | LR: 0.000018 | TPS: 1704 | 4808s |
|
|
| 🔤 Generation samples (step 1000): |
| [EN] The capital of France is located in Normandy. |
| [HE] מלזיה. |
| [AR] باريس. |
| [FA] پاریس یکی از شهرهای بزرگ و تاریخی جهان است که دارای جاذبه های طبیعی، فرهنگی و اقتصادی متعددی می باشد. شهر پاریس در غرب کشورمان قرار دارد و به عنوان یکی از مهم ترین مراکز تجاری و مالی دنیا شناخته شده ا |
| [TRANSLATE] "תודה על הכול, אבא. אני כאן איתך בכל רגע נתון." |
|
|
| Step 1010/4000 | Loss: 2.1665 | LR: 0.000018 | TPS: 1703 | 4859s |
| Step 1020/4000 | Loss: 2.1047 | LR: 0.000018 | TPS: 1703 | 4906s |
| Step 1030/4000 | Loss: 2.2359 | LR: 0.000018 | TPS: 1704 | 4953s |
| Step 1040/4000 | Loss: 2.0109 | LR: 0.000018 | TPS: 1704 | 5000s |
| Step 1050/4000 | Loss: 2.1515 | LR: 0.000018 | TPS: 1704 | 5047s |
| Step 1060/4000 | Loss: 2.0880 | LR: 0.000018 | TPS: 1705 | 5094s |
| Step 1070/4000 | Loss: 2.2460 | LR: 0.000018 | TPS: 1705 | 5142s |
| Step 1080/4000 | Loss: 1.9325 | LR: 0.000018 | TPS: 1705 | 5189s |
| Step 1090/4000 | Loss: 2.2283 | LR: 0.000018 | TPS: 1705 | 5236s |
| Step 1100/4000 | Loss: 2.3303 | LR: 0.000018 | TPS: 1706 | 5283s |
| Step 1110/4000 | Loss: 2.1772 | LR: 0.000018 | TPS: 1706 | 5330s |
| Step 1120/4000 | Loss: 2.1615 | LR: 0.000018 | TPS: 1706 | 5377s |
| Step 1130/4000 | Loss: 2.1470 | LR: 0.000017 | TPS: 1707 | 5424s |
| Step 1140/4000 | Loss: 1.9640 | LR: 0.000017 | TPS: 1707 | 5472s |
| Step 1150/4000 | Loss: 2.1891 | LR: 0.000017 | TPS: 1707 | 5519s |
| Step 1160/4000 | Loss: 2.2183 | LR: 0.000017 | TPS: 1707 | 5566s |
| Step 1170/4000 | Loss: 2.0268 | LR: 0.000017 | TPS: 1708 | 5613s |
| Step 1180/4000 | Loss: 2.2234 | LR: 0.000017 | TPS: 1708 | 5660s |
| Step 1190/4000 | Loss: 2.1961 | LR: 0.000017 | TPS: 1708 | 5707s |
| Step 1200/4000 | Loss: 2.2019 | LR: 0.000017 | TPS: 1708 | 5754s |
| 📊 Val loss: 2.2238 |
| Step 1210/4000 | Loss: 2.0809 | LR: 0.000017 | TPS: 1707 | 5807s |
| Step 1220/4000 | Loss: 2.1716 | LR: 0.000017 | TPS: 1707 | 5854s |
| Step 1230/4000 | Loss: 2.2607 | LR: 0.000017 | TPS: 1707 | 5901s |
| Step 1240/4000 | Loss: 2.1838 | LR: 0.000017 | TPS: 1708 | 5949s |
| Step 1250/4000 | Loss: 2.0725 | LR: 0.000017 | TPS: 1708 | 5996s |
| Step 1260/4000 | Loss: 2.2797 | LR: 0.000017 | TPS: 1708 | 6043s |
| Step 1270/4000 | Loss: 2.0366 | LR: 0.000017 | TPS: 1708 | 6090s |
| Step 1280/4000 | Loss: 2.1469 | LR: 0.000017 | TPS: 1709 | 6137s |
| Step 1290/4000 | Loss: 2.1541 | LR: 0.000017 | TPS: 1709 | 6184s |
| Step 1300/4000 | Loss: 2.0311 | LR: 0.000017 | TPS: 1709 | 6231s |
| Step 1310/4000 | Loss: 2.1828 | LR: 0.000016 | TPS: 1709 | 6279s |
| Step 1320/4000 | Loss: 2.2004 | LR: 0.000016 | TPS: 1709 | 6326s |
| Step 1330/4000 | Loss: 2.2589 | LR: 0.000016 | TPS: 1710 | 6373s |
| Step 1340/4000 | Loss: 2.1475 | LR: 0.000016 | TPS: 1710 | 6420s |
| Step 1350/4000 | Loss: 2.1672 | LR: 0.000016 | TPS: 1710 | 6467s |
| Step 1360/4000 | Loss: 2.1921 | LR: 0.000016 | TPS: 1710 | 6514s |
| Step 1370/4000 | Loss: 2.0689 | LR: 0.000016 | TPS: 1710 | 6561s |
| Step 1380/4000 | Loss: 2.2560 | LR: 0.000016 | TPS: 1711 | 6609s |
| Step 1390/4000 | Loss: 1.9519 | LR: 0.000016 | TPS: 1711 | 6656s |
| Step 1400/4000 | Loss: 1.9671 | LR: 0.000016 | TPS: 1711 | 6703s |
| Step 1410/4000 | Loss: 2.1535 | LR: 0.000016 | TPS: 1711 | 6750s |
| Step 1420/4000 | Loss: 2.1726 | LR: 0.000016 | TPS: 1711 | 6797s |
| Step 1430/4000 | Loss: 2.0854 | LR: 0.000016 | TPS: 1712 | 6844s |
| Step 1440/4000 | Loss: 2.0955 | LR: 0.000016 | TPS: 1712 | 6891s |
| Step 1450/4000 | Loss: 2.1260 | LR: 0.000016 | TPS: 1712 | 6939s |
| Step 1460/4000 | Loss: 2.2860 | LR: 0.000016 | TPS: 1712 | 6986s |
| Step 1470/4000 | Loss: 1.6098 | LR: 0.000015 | TPS: 1712 | 7033s |
| Step 1480/4000 | Loss: 2.1327 | LR: 0.000015 | TPS: 1712 | 7080s |
| Step 1490/4000 | Loss: 2.0506 | LR: 0.000015 | TPS: 1713 | 7127s |
| Step 1500/4000 | Loss: 2.0568 | LR: 0.000015 | TPS: 1713 | 7174s |
| Step 1510/4000 | Loss: 2.0177 | LR: 0.000015 | TPS: 1713 | 7221s |
| Step 1520/4000 | Loss: 2.0383 | LR: 0.000015 | TPS: 1713 | 7269s |
| Step 1530/4000 | Loss: 2.0994 | LR: 0.000015 | TPS: 1713 | 7316s |
| Step 1540/4000 | Loss: 2.0863 | LR: 0.000015 | TPS: 1713 | 7363s |
| Step 1550/4000 | Loss: 2.3287 | LR: 0.000015 | TPS: 1714 | 7410s |
| Step 1560/4000 | Loss: 2.1585 | LR: 0.000015 | TPS: 1714 | 7457s |
| Step 1570/4000 | Loss: 1.9781 | LR: 0.000015 | TPS: 1714 | 7504s |
| Step 1580/4000 | Loss: 1.9344 | LR: 0.000015 | TPS: 1714 | 7551s |
| Step 1590/4000 | Loss: 2.1031 | LR: 0.000015 | TPS: 1714 | 7599s |
| Step 1600/4000 | Loss: 2.2633 | LR: 0.000015 | TPS: 1714 | 7646s |
| 📊 Val loss: 2.1164 (NEW BEST!) |
| 💾 Best model saved to /tmp/sft/sft_model_v2.pt |
| Step 1610/4000 | Loss: 2.0217 | LR: 0.000015 | TPS: 1702 | 7750s |
| Step 1620/4000 | Loss: 2.0437 | LR: 0.000014 | TPS: 1702 | 7797s |
| Step 1630/4000 | Loss: 2.3588 | LR: 0.000014 | TPS: 1702 | 7844s |
| Step 1640/4000 | Loss: 2.1927 | LR: 0.000014 | TPS: 1702 | 7892s |
| Step 1650/4000 | Loss: 1.9298 | LR: 0.000014 | TPS: 1703 | 7939s |
| Step 1660/4000 | Loss: 2.1604 | LR: 0.000014 | TPS: 1703 | 7986s |
| Step 1670/4000 | Loss: 2.0326 | LR: 0.000014 | TPS: 1703 | 8033s |
| Step 1680/4000 | Loss: 2.1872 | LR: 0.000014 | TPS: 1703 | 8080s |
| Step 1690/4000 | Loss: 2.0633 | LR: 0.000014 | TPS: 1703 | 8127s |
| Step 1700/4000 | Loss: 2.2547 | LR: 0.000014 | TPS: 1704 | 8174s |
| Step 1710/4000 | Loss: 1.8940 | LR: 0.000014 | TPS: 1704 | 8221s |
| Step 1720/4000 | Loss: 2.0726 | LR: 0.000014 | TPS: 1704 | 8269s |
| Step 1730/4000 | Loss: 2.0857 | LR: 0.000014 | TPS: 1704 | 8316s |
| Step 1740/4000 | Loss: 2.0686 | LR: 0.000014 | TPS: 1704 | 8363s |
| Step 1750/4000 | Loss: 2.1306 | LR: 0.000014 | TPS: 1705 | 8410s |
| Step 1760/4000 | Loss: 2.0932 | LR: 0.000013 | TPS: 1705 | 8457s |
| Step 1770/4000 | Loss: 2.0751 | LR: 0.000013 | TPS: 1705 | 8504s |
| Step 1780/4000 | Loss: 2.1802 | LR: 0.000013 | TPS: 1705 | 8551s |
| Step 1790/4000 | Loss: 1.6657 | LR: 0.000013 | TPS: 1705 | 8599s |
| Step 1800/4000 | Loss: 2.1290 | LR: 0.000013 | TPS: 1706 | 8646s |
| Step 1810/4000 | Loss: 2.1032 | LR: 0.000013 | TPS: 1706 | 8693s |
| Step 1820/4000 | Loss: 2.1255 | LR: 0.000013 | TPS: 1706 | 8740s |
| Step 1830/4000 | Loss: 2.1091 | LR: 0.000013 | TPS: 1706 | 8787s |
| Step 1840/4000 | Loss: 1.9875 | LR: 0.000013 | TPS: 1706 | 8834s |
| Step 1850/4000 | Loss: 1.9615 | LR: 0.000013 | TPS: 1706 | 8881s |
| Step 1860/4000 | Loss: 2.0189 | LR: 0.000013 | TPS: 1707 | 8929s |
| Step 1870/4000 | Loss: 2.1387 | LR: 0.000013 | TPS: 1707 | 8976s |
| Step 1880/4000 | Loss: 2.0963 | LR: 0.000013 | TPS: 1707 | 9023s |
| Step 1890/4000 | Loss: 2.1750 | LR: 0.000013 | TPS: 1707 | 9070s |
| Step 1900/4000 | Loss: 2.3945 | LR: 0.000012 | TPS: 1707 | 9117s |
| Step 1910/4000 | Loss: 2.1515 | LR: 0.000012 | TPS: 1707 | 9164s |
| Step 1920/4000 | Loss: 2.2224 | LR: 0.000012 | TPS: 1708 | 9211s |
| Step 1930/4000 | Loss: 2.3160 | LR: 0.000012 | TPS: 1708 | 9259s |
| Step 1940/4000 | Loss: 2.0126 | LR: 0.000012 | TPS: 1708 | 9306s |
| Step 1950/4000 | Loss: 2.2443 | LR: 0.000012 | TPS: 1708 | 9353s |
| Step 1960/4000 | Loss: 1.9590 | LR: 0.000012 | TPS: 1708 | 9400s |
| Step 1970/4000 | Loss: 2.2280 | LR: 0.000012 | TPS: 1708 | 9447s |
| Step 1980/4000 | Loss: 1.9723 | LR: 0.000012 | TPS: 1708 | 9494s |
| Step 1990/4000 | Loss: 2.0697 | LR: 0.000012 | TPS: 1709 | 9541s |
| Step 2000/4000 | Loss: 2.0568 | LR: 0.000012 | TPS: 1709 | 9589s |
| 📊 Val loss: 2.1674 |
|
|
| 🔤 Generation samples (step 2000): |
| [EN] Paris (pronounced "Paris") is a city located in northeastern France. It borders Germany to the east, with Belgium and Luxembourg as its easternmost provinces. |
| [HE] בצרפת, העיר העתיקה היא אזור התיירות העיקרי. |
| [AR] باريس |
| [FA] پاریس، پایتخت کشور فرانسه است. |
| [TRANSLATE] The answer is YES. |
|
|
| Step 2010/4000 | Loss: 1.9474 | LR: 0.000012 | TPS: 1708 | 9643s |
| Step 2020/4000 | Loss: 2.1131 | LR: 0.000012 | TPS: 1708 | 9690s |
| Step 2030/4000 | Loss: 2.0446 | LR: 0.000012 | TPS: 1708 | 9737s |
| Step 2040/4000 | Loss: 2.2229 | LR: 0.000011 | TPS: 1708 | 9784s |
| Step 2050/4000 | Loss: 2.1576 | LR: 0.000011 | TPS: 1708 | 9832s |
| Step 2060/4000 | Loss: 2.1899 | LR: 0.000011 | TPS: 1708 | 9879s |
| Step 2070/4000 | Loss: 2.0957 | LR: 0.000011 | TPS: 1708 | 9926s |
| Step 2080/4000 | Loss: 2.2643 | LR: 0.000011 | TPS: 1709 | 9973s |
| Step 2090/4000 | Loss: 2.0676 | LR: 0.000011 | TPS: 1709 | 10020s |
| Step 2100/4000 | Loss: 2.1386 | LR: 0.000011 | TPS: 1709 | 10067s |
| Step 2110/4000 | Loss: 2.1891 | LR: 0.000011 | TPS: 1709 | 10114s |
| Step 2120/4000 | Loss: 1.9532 | LR: 0.000011 | TPS: 1709 | 10162s |
| Step 2130/4000 | Loss: 1.9766 | LR: 0.000011 | TPS: 1709 | 10209s |
| Step 2140/4000 | Loss: 2.3656 | LR: 0.000011 | TPS: 1709 | 10256s |
| Step 2150/4000 | Loss: 2.0545 | LR: 0.000011 | TPS: 1709 | 10303s |
| Step 2160/4000 | Loss: 1.9706 | LR: 0.000011 | TPS: 1710 | 10350s |
| Step 2170/4000 | Loss: 2.0302 | LR: 0.000010 | TPS: 1710 | 10397s |
| Step 2180/4000 | Loss: 2.1752 | LR: 0.000010 | TPS: 1710 | 10444s |
| Step 2190/4000 | Loss: 2.1455 | LR: 0.000010 | TPS: 1710 | 10492s |
| Step 2200/4000 | Loss: 2.2238 | LR: 0.000010 | TPS: 1710 | 10539s |
| Step 2210/4000 | Loss: 2.1010 | LR: 0.000010 | TPS: 1710 | 10586s |
| Step 2220/4000 | Loss: 2.1831 | LR: 0.000010 | TPS: 1710 | 10633s |
| Step 2230/4000 | Loss: 1.6542 | LR: 0.000010 | TPS: 1710 | 10680s |
| Step 2240/4000 | Loss: 2.1102 | LR: 0.000010 | TPS: 1711 | 10727s |
| Step 2250/4000 | Loss: 2.2099 | LR: 0.000010 | TPS: 1711 | 10774s |
| Step 2260/4000 | Loss: 2.1750 | LR: 0.000010 | TPS: 1711 | 10821s |
| Step 2270/4000 | Loss: 2.2369 | LR: 0.000010 | TPS: 1711 | 10869s |
| Step 2280/4000 | Loss: 2.0393 | LR: 0.000010 | TPS: 1711 | 10916s |
| Step 2290/4000 | Loss: 2.3140 | LR: 0.000010 | TPS: 1711 | 10963s |
| Step 2300/4000 | Loss: 2.0601 | LR: 0.000010 | TPS: 1711 | 11010s |
| Step 2310/4000 | Loss: 2.1472 | LR: 0.000009 | TPS: 1711 | 11057s |
| Step 2320/4000 | Loss: 2.0987 | LR: 0.000009 | TPS: 1712 | 11104s |
| Step 2330/4000 | Loss: 2.0354 | LR: 0.000009 | TPS: 1712 | 11152s |
| Step 2340/4000 | Loss: 1.9309 | LR: 0.000009 | TPS: 1712 | 11199s |
| Step 2350/4000 | Loss: 2.1222 | LR: 0.000009 | TPS: 1712 | 11246s |
| Step 2360/4000 | Loss: 1.9861 | LR: 0.000009 | TPS: 1712 | 11293s |
| Step 2370/4000 | Loss: 2.1986 | LR: 0.000009 | TPS: 1712 | 11340s |
| Step 2380/4000 | Loss: 2.0335 | LR: 0.000009 | TPS: 1712 | 11387s |
| Step 2390/4000 | Loss: 2.2123 | LR: 0.000009 | TPS: 1712 | 11434s |
| Step 2400/4000 | Loss: 2.0287 | LR: 0.000009 | TPS: 1712 | 11482s |
| 📊 Val loss: 2.1943 |
| Step 2410/4000 | Loss: 2.0483 | LR: 0.000009 | TPS: 1712 | 11534s |
| Step 2420/4000 | Loss: 2.0710 | LR: 0.000009 | TPS: 1712 | 11581s |
| Step 2430/4000 | Loss: 2.3005 | LR: 0.000009 | TPS: 1712 | 11629s |
| Step 2440/4000 | Loss: 2.0617 | LR: 0.000009 | TPS: 1712 | 11676s |
| Step 2450/4000 | Loss: 2.2063 | LR: 0.000008 | TPS: 1712 | 11723s |
| Step 2460/4000 | Loss: 2.0405 | LR: 0.000008 | TPS: 1712 | 11770s |
| Step 2470/4000 | Loss: 2.2280 | LR: 0.000008 | TPS: 1712 | 11817s |
| Step 2480/4000 | Loss: 2.3856 | LR: 0.000008 | TPS: 1712 | 11864s |
| Step 2490/4000 | Loss: 1.9853 | LR: 0.000008 | TPS: 1712 | 11911s |
| Step 2500/4000 | Loss: 2.0673 | LR: 0.000008 | TPS: 1713 | 11959s |
| Step 2510/4000 | Loss: 2.1777 | LR: 0.000008 | TPS: 1713 | 12006s |
| Step 2520/4000 | Loss: 1.9846 | LR: 0.000008 | TPS: 1713 | 12053s |
| Step 2530/4000 | Loss: 2.1922 | LR: 0.000008 | TPS: 1713 | 12100s |
| Step 2540/4000 | Loss: 2.0542 | LR: 0.000008 | TPS: 1713 | 12147s |
| Step 2550/4000 | Loss: 2.1041 | LR: 0.000008 | TPS: 1713 | 12194s |
| Step 2560/4000 | Loss: 2.0099 | LR: 0.000008 | TPS: 1713 | 12241s |
| Step 2570/4000 | Loss: 1.8186 | LR: 0.000008 | TPS: 1713 | 12289s |
| Step 2580/4000 | Loss: 2.2079 | LR: 0.000008 | TPS: 1713 | 12336s |
| Step 2590/4000 | Loss: 1.9931 | LR: 0.000007 | TPS: 1713 | 12383s |
| Step 2600/4000 | Loss: 2.0986 | LR: 0.000007 | TPS: 1714 | 12430s |
| Step 2610/4000 | Loss: 2.0439 | LR: 0.000007 | TPS: 1714 | 12477s |
| Step 2620/4000 | Loss: 1.9408 | LR: 0.000007 | TPS: 1714 | 12524s |
| Step 2630/4000 | Loss: 2.1992 | LR: 0.000007 | TPS: 1714 | 12571s |
| Step 2640/4000 | Loss: 2.0929 | LR: 0.000007 | TPS: 1714 | 12619s |
| Step 2650/4000 | Loss: 1.9728 | LR: 0.000007 | TPS: 1714 | 12666s |
| Step 2660/4000 | Loss: 1.8369 | LR: 0.000007 | TPS: 1714 | 12713s |
| Step 2670/4000 | Loss: 1.9926 | LR: 0.000007 | TPS: 1714 | 12760s |
| Step 2680/4000 | Loss: 2.0414 | LR: 0.000007 | TPS: 1714 | 12807s |
| Step 2690/4000 | Loss: 2.1368 | LR: 0.000007 | TPS: 1714 | 12854s |
| Step 2700/4000 | Loss: 2.0254 | LR: 0.000007 | TPS: 1714 | 12901s |
| Step 2710/4000 | Loss: 2.1572 | LR: 0.000007 | TPS: 1715 | 12948s |
| Step 2720/4000 | Loss: 2.0418 | LR: 0.000007 | TPS: 1715 | 12996s |
| Step 2730/4000 | Loss: 2.1235 | LR: 0.000007 | TPS: 1715 | 13043s |
| Step 2740/4000 | Loss: 2.0756 | LR: 0.000006 | TPS: 1715 | 13090s |
| Step 2750/4000 | Loss: 2.1417 | LR: 0.000006 | TPS: 1715 | 13137s |
| Step 2760/4000 | Loss: 1.9427 | LR: 0.000006 | TPS: 1715 | 13184s |
| Step 2770/4000 | Loss: 2.1166 | LR: 0.000006 | TPS: 1715 | 13231s |
| Step 2780/4000 | Loss: 1.9711 | LR: 0.000006 | TPS: 1715 | 13278s |
| Step 2790/4000 | Loss: 2.1390 | LR: 0.000006 | TPS: 1715 | 13326s |
| Step 2800/4000 | Loss: 2.0557 | LR: 0.000006 | TPS: 1715 | 13373s |
| 📊 Val loss: 2.1839 |
| Step 2810/4000 | Loss: 2.0581 | LR: 0.000006 | TPS: 1715 | 13425s |
| Step 2820/4000 | Loss: 2.1139 | LR: 0.000006 | TPS: 1715 | 13473s |
| Step 2830/4000 | Loss: 2.1228 | LR: 0.000006 | TPS: 1715 | 13520s |
| Step 2840/4000 | Loss: 1.9685 | LR: 0.000006 | TPS: 1715 | 13567s |
| Step 2850/4000 | Loss: 2.1206 | LR: 0.000006 | TPS: 1715 | 13614s |
| Step 2860/4000 | Loss: 2.1942 | LR: 0.000006 | TPS: 1715 | 13661s |
| Step 2870/4000 | Loss: 1.9068 | LR: 0.000006 | TPS: 1715 | 13708s |
| Step 2880/4000 | Loss: 2.2099 | LR: 0.000006 | TPS: 1715 | 13755s |
| Step 2890/4000 | Loss: 2.0948 | LR: 0.000006 | TPS: 1715 | 13803s |
| Step 2900/4000 | Loss: 2.0630 | LR: 0.000005 | TPS: 1715 | 13850s |
| Step 2910/4000 | Loss: 1.9867 | LR: 0.000005 | TPS: 1715 | 13897s |
| Step 2920/4000 | Loss: 2.0602 | LR: 0.000005 | TPS: 1715 | 13944s |
| Step 2930/4000 | Loss: 2.0163 | LR: 0.000005 | TPS: 1716 | 13991s |
| Step 2940/4000 | Loss: 2.0337 | LR: 0.000005 | TPS: 1716 | 14038s |
| Step 2950/4000 | Loss: 2.2476 | LR: 0.000005 | TPS: 1716 | 14085s |
| Step 2960/4000 | Loss: 2.0430 | LR: 0.000005 | TPS: 1716 | 14133s |
| Step 2970/4000 | Loss: 2.3037 | LR: 0.000005 | TPS: 1716 | 14180s |
| Step 2980/4000 | Loss: 2.0831 | LR: 0.000005 | TPS: 1716 | 14227s |
| Step 2990/4000 | Loss: 2.1781 | LR: 0.000005 | TPS: 1716 | 14274s |
| Step 3000/4000 | Loss: 2.0784 | LR: 0.000005 | TPS: 1716 | 14321s |
|
|
| 🔤 Generation samples (step 3000): |
| [EN] The city of Paris is a metropolitan area in Europe, consisting of 57 counties. Its main cities include Lyons, Bordeaux and Valence. |
| [HE] איטליה. |
| [AR] باريس. |
| [FA] پاریس پایتخت کشور فرانسه و یکی از شهرهای بزرگ این کشور است. شهر پاریس در شمال غربی قاره اروپا قرار دارد. |
| [TRANSLATE] You are the first one in the world to learn how to think. |
|
|
| Step 3010/4000 | Loss: 2.1244 | LR: 0.000005 | TPS: 1716 | 14370s |
| Step 3020/4000 | Loss: 2.1107 | LR: 0.000005 | TPS: 1716 | 14417s |
| Step 3030/4000 | Loss: 2.3589 | LR: 0.000005 | TPS: 1716 | 14464s |
| Step 3040/4000 | Loss: 2.0592 | LR: 0.000005 | TPS: 1716 | 14511s |
| Step 3050/4000 | Loss: 2.0730 | LR: 0.000005 | TPS: 1716 | 14559s |
| Step 3060/4000 | Loss: 2.1365 | LR: 0.000005 | TPS: 1716 | 14606s |
| Step 3070/4000 | Loss: 1.9819 | LR: 0.000005 | TPS: 1716 | 14653s |
| Step 3080/4000 | Loss: 2.2175 | LR: 0.000004 | TPS: 1716 | 14700s |
| Step 3090/4000 | Loss: 2.1442 | LR: 0.000004 | TPS: 1716 | 14747s |
| Step 3100/4000 | Loss: 2.0811 | LR: 0.000004 | TPS: 1717 | 14794s |
| Step 3110/4000 | Loss: 2.1427 | LR: 0.000004 | TPS: 1717 | 14841s |
| Step 3120/4000 | Loss: 2.1722 | LR: 0.000004 | TPS: 1717 | 14889s |
| Step 3130/4000 | Loss: 2.0577 | LR: 0.000004 | TPS: 1717 | 14936s |
| Step 3140/4000 | Loss: 2.0873 | LR: 0.000004 | TPS: 1717 | 14983s |
| Step 3150/4000 | Loss: 2.2920 | LR: 0.000004 | TPS: 1717 | 15030s |
| Step 3160/4000 | Loss: 1.8839 | LR: 0.000004 | TPS: 1717 | 15077s |
| Step 3170/4000 | Loss: 2.0144 | LR: 0.000004 | TPS: 1717 | 15124s |
| Step 3180/4000 | Loss: 1.9689 | LR: 0.000004 | TPS: 1717 | 15171s |
| Step 3190/4000 | Loss: 2.2123 | LR: 0.000004 | TPS: 1717 | 15219s |
| Step 3200/4000 | Loss: 2.0510 | LR: 0.000004 | TPS: 1717 | 15266s |
| 📊 Val loss: 2.1269 |
| Step 3210/4000 | Loss: 2.4087 | LR: 0.000004 | TPS: 1717 | 15318s |
| Step 3220/4000 | Loss: 2.2608 | LR: 0.000004 | TPS: 1717 | 15365s |
| Step 3230/4000 | Loss: 2.1930 | LR: 0.000004 | TPS: 1717 | 15413s |
| Step 3240/4000 | Loss: 2.0713 | LR: 0.000004 | TPS: 1717 | 15460s |
| Step 3250/4000 | Loss: 2.2660 | LR: 0.000004 | TPS: 1717 | 15507s |
| Step 3260/4000 | Loss: 1.9479 | LR: 0.000004 | TPS: 1717 | 15554s |
| Step 3270/4000 | Loss: 1.9657 | LR: 0.000004 | TPS: 1717 | 15601s |
| Step 3280/4000 | Loss: 2.1884 | LR: 0.000004 | TPS: 1717 | 15648s |
| Step 3290/4000 | Loss: 2.0927 | LR: 0.000004 | TPS: 1717 | 15695s |
| Step 3300/4000 | Loss: 2.0393 | LR: 0.000003 | TPS: 1717 | 15743s |
| Step 3310/4000 | Loss: 2.1302 | LR: 0.000003 | TPS: 1717 | 15790s |
| Step 3320/4000 | Loss: 2.0059 | LR: 0.000003 | TPS: 1717 | 15837s |
| Step 3330/4000 | Loss: 1.8687 | LR: 0.000003 | TPS: 1717 | 15884s |
| Step 3340/4000 | Loss: 2.0293 | LR: 0.000003 | TPS: 1717 | 15931s |
| Step 3350/4000 | Loss: 2.1500 | LR: 0.000003 | TPS: 1718 | 15978s |
| Step 3360/4000 | Loss: 1.9667 | LR: 0.000003 | TPS: 1718 | 16025s |
| Step 3370/4000 | Loss: 2.1206 | LR: 0.000003 | TPS: 1718 | 16073s |
| Step 3380/4000 | Loss: 2.3028 | LR: 0.000003 | TPS: 1718 | 16120s |
| Step 3390/4000 | Loss: 2.0075 | LR: 0.000003 | TPS: 1718 | 16167s |
| Step 3400/4000 | Loss: 2.0562 | LR: 0.000003 | TPS: 1718 | 16214s |
| Step 3410/4000 | Loss: 1.9977 | LR: 0.000003 | TPS: 1718 | 16261s |
| Step 3420/4000 | Loss: 2.1680 | LR: 0.000003 | TPS: 1718 | 16308s |
| Step 3430/4000 | Loss: 2.0009 | LR: 0.000003 | TPS: 1718 | 16355s |
| Step 3440/4000 | Loss: 1.8301 | LR: 0.000003 | TPS: 1718 | 16403s |
| Step 3450/4000 | Loss: 2.0239 | LR: 0.000003 | TPS: 1718 | 16450s |
| Step 3460/4000 | Loss: 2.0535 | LR: 0.000003 | TPS: 1718 | 16497s |
| Step 3470/4000 | Loss: 2.1348 | LR: 0.000003 | TPS: 1718 | 16544s |
| Step 3480/4000 | Loss: 2.0337 | LR: 0.000003 | TPS: 1718 | 16591s |
| Step 3490/4000 | Loss: 1.9342 | LR: 0.000003 | TPS: 1718 | 16638s |
| Step 3500/4000 | Loss: 2.0052 | LR: 0.000003 | TPS: 1718 | 16685s |
| Step 3510/4000 | Loss: 1.9902 | LR: 0.000003 | TPS: 1718 | 16732s |
| Step 3520/4000 | Loss: 2.1567 | LR: 0.000003 | TPS: 1719 | 16780s |
| Step 3530/4000 | Loss: 2.0515 | LR: 0.000003 | TPS: 1719 | 16827s |
| Step 3540/4000 | Loss: 2.1572 | LR: 0.000003 | TPS: 1719 | 16874s |
| Step 3550/4000 | Loss: 2.1381 | LR: 0.000003 | TPS: 1719 | 16921s |
| Step 3560/4000 | Loss: 2.0383 | LR: 0.000003 | TPS: 1719 | 16968s |
| Step 3570/4000 | Loss: 2.3566 | LR: 0.000003 | TPS: 1719 | 17015s |
| Step 3580/4000 | Loss: 1.9773 | LR: 0.000003 | TPS: 1719 | 17062s |
| Step 3590/4000 | Loss: 2.0418 | LR: 0.000003 | TPS: 1719 | 17110s |
| Step 3600/4000 | Loss: 2.1756 | LR: 0.000002 | TPS: 1719 | 17157s |
| 📊 Val loss: 2.1478 |
| Step 3610/4000 | Loss: 2.0761 | LR: 0.000002 | TPS: 1718 | 17209s |
| Step 3620/4000 | Loss: 2.1353 | LR: 0.000002 | TPS: 1718 | 17257s |
| Step 3630/4000 | Loss: 2.1856 | LR: 0.000002 | TPS: 1719 | 17304s |
| Step 3640/4000 | Loss: 2.1298 | LR: 0.000002 | TPS: 1719 | 17351s |
| Step 3650/4000 | Loss: 2.0784 | LR: 0.000002 | TPS: 1719 | 17398s |
| Step 3660/4000 | Loss: 2.0533 | LR: 0.000002 | TPS: 1719 | 17445s |
| Step 3670/4000 | Loss: 2.2151 | LR: 0.000002 | TPS: 1719 | 17492s |
| Step 3680/4000 | Loss: 2.0177 | LR: 0.000002 | TPS: 1719 | 17539s |
| Step 3690/4000 | Loss: 2.1048 | LR: 0.000002 | TPS: 1719 | 17587s |
| Step 3700/4000 | Loss: 2.0629 | LR: 0.000002 | TPS: 1719 | 17634s |
| Step 3710/4000 | Loss: 2.0375 | LR: 0.000002 | TPS: 1719 | 17681s |
| Step 3720/4000 | Loss: 2.2282 | LR: 0.000002 | TPS: 1719 | 17728s |
| Step 3730/4000 | Loss: 2.2049 | LR: 0.000002 | TPS: 1719 | 17775s |
| Step 3740/4000 | Loss: 2.0247 | LR: 0.000002 | TPS: 1719 | 17822s |
| Step 3750/4000 | Loss: 2.0337 | LR: 0.000002 | TPS: 1719 | 17869s |
| Step 3760/4000 | Loss: 2.0922 | LR: 0.000002 | TPS: 1719 | 17917s |
| Step 3770/4000 | Loss: 2.1018 | LR: 0.000002 | TPS: 1719 | 17964s |
| Step 3780/4000 | Loss: 2.1183 | LR: 0.000002 | TPS: 1719 | 18011s |
| Step 3790/4000 | Loss: 2.2469 | LR: 0.000002 | TPS: 1719 | 18058s |
| Step 3800/4000 | Loss: 2.1373 | LR: 0.000002 | TPS: 1719 | 18105s |
| Step 3810/4000 | Loss: 2.1103 | LR: 0.000002 | TPS: 1719 | 18152s |
| Step 3820/4000 | Loss: 2.0317 | LR: 0.000002 | TPS: 1719 | 18199s |
| Step 3830/4000 | Loss: 2.0022 | LR: 0.000002 | TPS: 1720 | 18247s |
| Step 3840/4000 | Loss: 2.1618 | LR: 0.000002 | TPS: 1720 | 18294s |
| Step 3850/4000 | Loss: 2.1421 | LR: 0.000002 | TPS: 1720 | 18341s |
| Step 3860/4000 | Loss: 1.9279 | LR: 0.000002 | TPS: 1720 | 18388s |
| Step 3870/4000 | Loss: 2.1657 | LR: 0.000002 | TPS: 1720 | 18435s |
| Step 3880/4000 | Loss: 2.1433 | LR: 0.000002 | TPS: 1720 | 18482s |
| Step 3890/4000 | Loss: 2.0893 | LR: 0.000002 | TPS: 1720 | 18529s |
| Step 3900/4000 | Loss: 2.0036 | LR: 0.000002 | TPS: 1720 | 18576s |
| Step 3910/4000 | Loss: 2.0691 | LR: 0.000002 | TPS: 1720 | 18624s |
| Step 3920/4000 | Loss: 2.0282 | LR: 0.000002 | TPS: 1720 | 18671s |
| Step 3930/4000 | Loss: 1.9818 | LR: 0.000002 | TPS: 1720 | 18718s |
| Step 3940/4000 | Loss: 2.1466 | LR: 0.000002 | TPS: 1720 | 18765s |
| Step 3950/4000 | Loss: 2.0455 | LR: 0.000002 | TPS: 1720 | 18812s |
| Step 3960/4000 | Loss: 2.1226 | LR: 0.000002 | TPS: 1720 | 18859s |
| Step 3970/4000 | Loss: 1.9890 | LR: 0.000002 | TPS: 1720 | 18906s |
| Step 3980/4000 | Loss: 2.1891 | LR: 0.000002 | TPS: 1720 | 18954s |
| Step 3990/4000 | Loss: 1.8920 | LR: 0.000002 | TPS: 1720 | 19001s |
| Step 4000/4000 | Loss: 2.0073 | LR: 0.000002 | TPS: 1720 | 19048s |
| 📊 Val loss: 2.1472 |
|
|
| 🔤 Generation samples (step 4000): |
| [EN] The capital of France consists of 38 cities, 26.9% (14) of which are in the metropolitan area. |
| [HE] צרפת היא אחת מיעדי התיירות הפופולאריים ביותר בעולם, בשל היותה מוקד משיכה תיירותי משמעותי עבור תיירים מכל רחבי העולם. העיר בנויה משני חלקים עיקריים - כיכר ד'ארסאן (Droite Sud) ורחוב ד'ארסאן (De La Roch |
| [AR] باريس. |
| [FA] پاریس شهری بزرگ و تاریخی در شمال غربی اروپا است. |
| [TRANSLATE] It’s very short. |
|
|
|
|
| ============================================================ |
| SFT TRAINING COMPLETE |
| Steps: 4000, Time: 19057s (317.6min) |
| Best val loss: 2.1164 |
| Model saved to: /tmp/sft/sft_model_v2.pt |
| ============================================================ |
| Uploading to S3... |