2026-04-16 19:52:41,835 INFO use char tokenizer
2026-04-16 19:52:41,835 INFO training on multiple gpus, this gpu 0, rank 0, world_size 1
2026-04-16 19:52:42,250 INFO [Rank 0] Checkpoint: save to checkpoint exp/conformer_mongolian/init.pt
2026-04-16 19:52:42,365 INFO Epoch 0 Step 0 TRAIN info lr 8.0000e-08 rank 0
2026-04-16 19:52:42,383 INFO using accumulate grad, new batch size is 1 times larger than before
Epoch 0 TRAIN: 0it [00:00, ?it/s]
/home/rookie/Documents/batuka/projects/wenet/wenet/wenet/utils/train_utils.py:689: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  amp_autocast(enabled=dtype is not None,
/home/rookie/Documents/batuka/projects/wenet/wenet/wenet/utils/train_utils.py:693: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  amp_autocast(enabled=scaler is not None),
[rank0]:[W416 19:52:47.411323494 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
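The two FutureWarnings and the reducer warning above point at two small changes to the training setup. The sketch below is not WeNet's actual train_utils.py code, just a minimal illustration with placeholder names (model, batch, optimizer, scaler, use_amp): it uses the torch.amp.autocast('cuda', ...) spelling the deprecation message recommends, and builds DistributedDataParallel with find_unused_parameters=False, which is what the reducer advises when no unused parameters are ever found in the forward pass.

import torch
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_ddp(model, local_rank):
    # find_unused_parameters=False skips the extra autograd-graph traversal
    # described in the warning; keep it True only if some forward passes
    # really leave parameters unused.
    return DDP(model.cuda(local_rank), device_ids=[local_rank],
               find_unused_parameters=False)

def train_step(ddp_model, batch, optimizer, scaler, use_amp=True):
    feats, feats_lens, labels, label_lens = batch  # placeholder batch layout
    optimizer.zero_grad()
    # torch.amp.autocast('cuda', ...) is the non-deprecated form of
    # torch.cuda.amp.autocast(...).
    with torch.amp.autocast('cuda', enabled=use_amp):
        loss = ddp_model(feats, feats_lens, labels, label_lens)['loss']
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()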
Epoch 0 TRAIN: 1it [00:05, 5.26s/it, epoch=0, grad_norm=926.298, loss=638.1240, lr=8.00e-08]
Epoch 0 TRAIN: 5it [00:05, 1.12it/s, epoch=0, grad_norm=1068.044, loss=639.4346, lr=4.00e-07]
Epoch 0 TRAIN: 9it [00:06, 2.09it/s, epoch=0, grad_norm=946.675, loss=652.0659, lr=7.20e-07]
Epoch 0 TRAIN: 13it [00:06, 3.09it/s, epoch=0, grad_norm=1193.629, loss=648.5864, lr=1.04e-06]
Epoch 0 TRAIN: 17it [00:07, 3.99it/s, epoch=0, grad_norm=1121.275, loss=612.0612, lr=1.36e-06]
Epoch 0 TRAIN: 21it [00:07, 4.79it/s, epoch=0, grad_norm=1720.920, loss=621.9006, lr=1.68e-06]
Epoch 0 TRAIN: 25it [00:08, 5.47it/s, epoch=0, grad_norm=1671.008, loss=637.3904, lr=2.00e-06]
Epoch 0 TRAIN: 29it [00:08, 6.04it/s, epoch=0, grad_norm=2173.224, loss=631.9431, lr=2.32e-06]
Epoch 0 TRAIN: 33it [00:09, 6.46it/s, epoch=0, grad_norm=2489.374, loss=603.3690, lr=2.64e-06]
Epoch 0 TRAIN: 37it [00:10, 6.75it/s, epoch=0, grad_norm=2423.902, loss=585.6821, lr=2.96e-06]
Epoch 0 TRAIN: 41it [00:10, 6.96it/s, epoch=0, grad_norm=1900.028, loss=604.6144, lr=3.28e-06]
Epoch 0 TRAIN: 45it [00:11, 7.14it/s, epoch=0, grad_norm=1719.675, loss=496.8148, lr=3.60e-06]
Epoch 0 TRAIN: 49it [00:11, 7.18it/s, epoch=0, grad_norm=1130.226, loss=480.8752, lr=3.92e-06]
Epoch 0 TRAIN: 53it [00:12, 7.34it/s, epoch=0, grad_norm=582.512, loss=486.5223, lr=4.24e-06]
Epoch 0 TRAIN: 57it [00:12, 7.42it/s, epoch=0, grad_norm=492.085, loss=468.1231, lr=4.56e-06]
Epoch 0 TRAIN: 61it [00:13, 7.43it/s, epoch=0, grad_norm=329.547, loss=489.9693, lr=4.88e-06]
Epoch 0 TRAIN: 65it [00:13, 7.44it/s, epoch=0, grad_norm=212.825, loss=406.7229, lr=5.20e-06]
Epoch 0 TRAIN: 69it [00:14, 7.34it/s, epoch=0, grad_norm=399.563, loss=508.1697, lr=5.52e-06]
Epoch 0 TRAIN: 73it [00:14, 7.13it/s, epoch=0, grad_norm=702.109, loss=359.9575, lr=5.84e-06]
Epoch 0 TRAIN: 77it [00:15, 7.20it/s, epoch=0, grad_norm=369.053, loss=442.3559, lr=6.16e-06]
Epoch 0 TRAIN: 81it [00:16, 7.18it/s, epoch=0, grad_norm=227.997, loss=502.1472, lr=6.48e-06]
Epoch 0 TRAIN: 85it [00:16, 6.98it/s, epoch=0, grad_norm=244.235, loss=400.5124, lr=6.80e-06]
Epoch 0 TRAIN: 89it [00:17, 6.98it/s, epoch=0, grad_norm=448.586, loss=354.1235, lr=7.12e-06]
Epoch 0 TRAIN: 93it [00:17, 7.10it/s, epoch=0, grad_norm=207.597, loss=354.4006, lr=7.44e-06]
Epoch 0 TRAIN: 97it [00:18, 7.08it/s, epoch=0, grad_norm=157.424, loss=347.6880, lr=7.76e-06]
2026-04-16 19:53:01,132 DEBUG TRAIN | steps/sec 5.280| Batch 0/100 loss 381.585785 loss_att 371.019135 loss_ctc 406.241302 th_accuracy 0.051075 lr 8.0000e-06 grad_norm 153.683929 rank 0
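The loss in this DEBUG line is the usual WeNet hybrid CTC/attention objective, a weighted sum of the attention-decoder loss and the CTC loss. The weight is not printed, but a CTC weight of 0.3 reproduces the logged value exactly; 0.3 is inferred from these numbers, not read from the config. A quick check in Python:

loss_att, loss_ctc = 371.019135, 406.241302
ctc_weight = 0.3  # assumed; only the combined value appears in the log
loss = ctc_weight * loss_ctc + (1 - ctc_weight) * loss_att
print(loss)  # ~= 381.585785, matching "loss 381.585785" at step 100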
Epoch 0 TRAIN: 101it [00:18, 6.90it/s, epoch=0, grad_norm=169.667, loss=400.3729, lr=8.08e-06]
Epoch 0 TRAIN: 105it [00:19, 7.02it/s, epoch=0, grad_norm=237.057, loss=491.7017, lr=8.40e-06]
Epoch 0 TRAIN: 109it [00:19, 7.22it/s, epoch=0, grad_norm=254.866, loss=442.5222, lr=8.72e-06]
Epoch 0 TRAIN: 113it [00:20, 6.99it/s, epoch=0, grad_norm=158.826, loss=453.2689, lr=9.04e-06]
Epoch 0 TRAIN: 117it [00:21, 7.18it/s, epoch=0, grad_norm=192.100, loss=561.2066, lr=9.36e-06]
Epoch 0 TRAIN: 121it [00:21, 7.06it/s, epoch=0, grad_norm=246.843, loss=445.8704, lr=9.68e-06]
Epoch 0 TRAIN: 126it [00:22, 7.32it/s, epoch=0, grad_norm=319.739, loss=536.9909, lr=1.01e-05]
Epoch 0 TRAIN: 131it [00:22, 7.67it/s, epoch=0, grad_norm=179.164, loss=474.3831, lr=1.05e-05]
Epoch 0 TRAIN: 132it [00:23, 5.72it/s, epoch=0, grad_norm=167.620, loss=503.6517, lr=1.06e-05]
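Across epoch 0 the learning rate rises by exactly 8.0e-08 per step, from 8.0000e-08 at the first step to 1.0560e-05 after 132 steps, i.e. a linear warmup. Below is a sketch of a Noam-style WarmupLR schedule that produces this slope; the peak lr and warmup_steps values are assumptions (for example lr 2e-3 with warmup_steps 25000), since the log only pins down their ratio:

def warmup_lr(step, peak_lr=2e-3, warmup_steps=25000):
    # During warmup the lr grows linearly at peak_lr / warmup_steps per step
    # (2e-3 / 25000 = 8.0e-08, the slope seen in the log), then decays
    # proportionally to step ** -0.5 once warmup ends.
    step = max(step, 1)
    scale = warmup_steps ** 0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
    return peak_lr * scale

print(warmup_lr(1))    # 8.0e-08, the value logged at the start of epoch 0
print(warmup_lr(132))  # 1.056e-05, matching "CV info lr 1.0560e-05"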
CV: 0it [00:00, ?it/s]
/home/rookie/Documents/batuka/projects/wenet/wenet/wenet/utils/train_utils.py:689: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  amp_autocast(enabled=dtype is not None,
CV: 1it [00:01, 1.20s/it]
2026-04-16 19:53:06,766 INFO Epoch 0 Step 132 CV info lr 1.0560e-05 cv_loss 493.0354031649503 rank 0 acc 0.08543461126585801
2026-04-16 19:53:06,768 INFO [Rank 0] Checkpoint: save to checkpoint exp/conformer_mongolian/epoch_0.pt
2026-04-16 19:53:06,881 INFO Epoch 1 Step 132 TRAIN info lr 1.0560e-05 rank 0
2026-04-16 19:53:06,894 INFO using accumulate grad, new batch size is 1 times larger than before
Epoch 1 TRAIN:   0%|          | 0/132 [00:00<?, ?it/s]
  _warnings.warn(warn_message, ResourceWarning)
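The run has now written exp/conformer_mongolian/init.pt and epoch_0.pt. A minimal sketch for inspecting one of these checkpoints offline, assuming it is a plain model state_dict saved with torch.save (adjust accordingly if the file holds a larger dict with extra fields):

import torch

# Load on CPU so inspection does not require the training GPU.
state_dict = torch.load('exp/conformer_mongolian/epoch_0.pt', map_location='cpu')
print(len(state_dict), 'tensors')
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))

# Resuming or decoding later would load it back the same way:
# model.load_state_dict(state_dict)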