[14:35:51] Device: cuda
[14:35:51] === Training pong ([32, 64, 128]) ===
[14:35:51] 2,018,278 parameters
[14:35:51] Phase 1: 10 epochs single-step
[14:35:51] 8568 sequences
[14:36:00] P1 pong Epoch 1/10 | loss=0.14558
[14:36:08] P1 pong Epoch 2/10 | loss=0.10721
[14:36:17] P1 pong Epoch 3/10 | loss=0.09795
[14:36:25] P1 pong Epoch 4/10 | loss=0.08996
[14:36:33] P1 pong Epoch 5/10 | loss=0.08384
[14:36:41] P1 pong Epoch 6/10 | loss=0.07755
[14:36:49] P1 pong Epoch 7/10 | loss=0.06995
[14:36:57] P1 pong Epoch 8/10 | loss=0.06272
[14:37:05] P1 pong Epoch 9/10 | loss=0.05640
[14:37:13] P1 pong Epoch 10/10 | loss=0.05177
[14:37:13] Phase 2: 25 epochs graduated AR
[14:37:37] P2 pong Epoch 1/25 (steps=2) | loss=0.09787 lr=0.000500
[14:37:59] P2 pong Epoch 2/25 (steps=2) | loss=0.08854 lr=0.000500
[14:38:21] P2 pong Epoch 3/25 (steps=2) | loss=0.08343 lr=0.000500
[14:39:15] P2 pong Epoch 4/25 (steps=4) | loss=0.13928 lr=0.000500
[14:40:08] P2 pong Epoch 5/25 (steps=4) | loss=0.12631 lr=0.000500
[14:41:04] P2 pong Epoch 6/25 (steps=4) | loss=0.11644 lr=0.000500
[14:43:21] P2 pong Epoch 7/25 (steps=8) | loss=0.18012 lr=0.000500
[14:45:38] P2 pong Epoch 8/25 (steps=8) | loss=0.17484 lr=0.000500
[14:47:57] P2 pong Epoch 9/25 (steps=8) | loss=0.16717 lr=0.000500
[14:50:15] P2 pong Epoch 10/25 (steps=8) | loss=0.15650 lr=0.000500
[14:52:31] P2 pong Epoch 11/25 (steps=8) | loss=0.14624 lr=0.000500
[14:54:46] P2 pong Epoch 12/25 (steps=8) | loss=0.13932 lr=0.000500
[14:57:01] P2 pong Epoch 13/25 (steps=8) | loss=0.12899 lr=0.000493
[14:59:17] P2 pong Epoch 14/25 (steps=8) | loss=0.11960 lr=0.000471
[15:01:35] P2 pong Epoch 15/25 (steps=8) | loss=0.10872 lr=0.000437
[15:03:52] P2 pong Epoch 16/25 (steps=8) | loss=0.09965 lr=0.000392
[15:06:07] P2 pong Epoch 17/25 (steps=8) | loss=0.08785 lr=0.000339
[15:08:27] P2 pong Epoch 18/25 (steps=8) | loss=0.07890 lr=0.000280
[15:10:44] P2 pong Epoch 19/25 (steps=8) | loss=0.06718 lr=0.000220
[15:13:01] P2 pong Epoch 20/25 (steps=8) | loss=0.06123 lr=0.000161
[15:15:20] P2 pong Epoch 21/25 (steps=8) | loss=0.05374 lr=0.000108
[15:17:40] P2 pong Epoch 22/25 (steps=8) | loss=0.04863 lr=0.000063
[15:19:57] P2 pong Epoch 23/25 (steps=8) | loss=0.04435 lr=0.000029
[15:22:13] P2 pong Epoch 24/25 (steps=8) | loss=0.04174 lr=0.000010
[15:24:31] P2 pong Epoch 25/25 (steps=8) | loss=0.04022 lr=0.000010
[15:24:31] pong training complete.
[15:24:31] === Training sonic ([40, 80, 160]) ===
[15:24:31] 3,150,686 parameters
[15:24:31] Phase 1: 10 epochs single-step
[15:24:34] 32256 sequences
[15:25:03] P1 sonic Epoch 1/10 | loss=0.08400
[15:25:34] P1 sonic Epoch 2/10 | loss=0.06966
[15:26:03] P1 sonic Epoch 3/10 | loss=0.06589
[15:26:34] P1 sonic Epoch 4/10 | loss=0.06327
[15:27:03] P1 sonic Epoch 5/10 | loss=0.06111
[15:27:33] P1 sonic Epoch 6/10 | loss=0.05881
[15:28:03] P1 sonic Epoch 7/10 | loss=0.05682
[15:28:33] P1 sonic Epoch 8/10 | loss=0.05514
[15:29:02] P1 sonic Epoch 9/10 | loss=0.05358
[15:29:32] P1 sonic Epoch 10/10 | loss=0.05256
[15:29:32] Phase 2: 25 epochs graduated AR
[15:30:57] P2 sonic Epoch 1/25 (steps=2) | loss=0.07446 lr=0.000500
[15:32:15] P2 sonic Epoch 2/25 (steps=2) | loss=0.07291 lr=0.000500
[15:33:41] P2 sonic Epoch 3/25 (steps=2) | loss=0.07128 lr=0.000500
[15:37:15] P2 sonic Epoch 4/25 (steps=4) | loss=0.10220 lr=0.000500
[15:40:50] P2 sonic Epoch 5/25 (steps=4) | loss=0.09976 lr=0.000500
[15:44:24] P2 sonic Epoch 6/25 (steps=4) | loss=0.09779 lr=0.000500
[15:53:05] P2 sonic Epoch 7/25 (steps=8) | loss=0.14037 lr=0.000500
[16:01:41] P2 sonic Epoch 8/25 (steps=8) | loss=0.13753 lr=0.000500
[16:10:26] P2 sonic Epoch 9/25 (steps=8) | loss=0.13476 lr=0.000500
[16:19:08] P2 sonic Epoch 10/25 (steps=8) | loss=0.13232 lr=0.000500
[16:28:05] P2 sonic Epoch 11/25 (steps=8) | loss=0.13010 lr=0.000500
[16:37:18] P2 sonic Epoch 12/25 (steps=8) | loss=0.12790 lr=0.000500
[16:46:19] P2 sonic Epoch 13/25 (steps=8) | loss=0.12592 lr=0.000493
[16:55:21] P2 sonic Epoch 14/25 (steps=8) | loss=0.12408 lr=0.000471
[17:04:34] P2 sonic Epoch 15/25 (steps=8) | loss=0.12210 lr=0.000437
[17:13:54] P2 sonic Epoch 16/25 (steps=8) | loss=0.11900 lr=0.000392
[17:23:04] P2 sonic Epoch 17/25 (steps=8) | loss=0.11596 lr=0.000339
[17:32:08] P2 sonic Epoch 18/25 (steps=8) | loss=0.11287 lr=0.000280
[17:41:13] P2 sonic Epoch 19/25 (steps=8) | loss=0.10939 lr=0.000220
[17:50:18] P2 sonic Epoch 20/25 (steps=8) | loss=0.10548 lr=0.000161
[17:59:23] P2 sonic Epoch 21/25 (steps=8) | loss=0.10183 lr=0.000108
[18:08:26] P2 sonic Epoch 22/25 (steps=8) | loss=0.09841 lr=0.000063
[18:17:35] P2 sonic Epoch 23/25 (steps=8) | loss=0.09526 lr=0.000029
[18:26:41] P2 sonic Epoch 24/25 (steps=8) | loss=0.09337 lr=0.000010
[18:35:42] P2 sonic Epoch 25/25 (steps=8) | loss=0.09193 lr=0.000010
[18:35:42] sonic training complete.
[18:35:42] === Training pole_position ([24, 48, 96]) ===
[18:35:42] 1,137,006 parameters
[18:35:42] Phase 1: 10 epochs single-step
[18:35:42] 4284 sequences
[18:35:46] P1 pole_position Epoch 1/10 | loss=0.05831
[18:35:50] P1 pole_position Epoch 2/10 | loss=0.03691
[18:35:54] P1 pole_position Epoch 3/10 | loss=0.03064
[18:35:57] P1 pole_position Epoch 4/10 | loss=0.02707
[18:36:00] P1 pole_position Epoch 5/10 | loss=0.02428
[18:36:04] P1 pole_position Epoch 6/10 | loss=0.02271
[18:36:07] P1 pole_position Epoch 7/10 | loss=0.02128
[18:36:11] P1 pole_position Epoch 8/10 | loss=0.02013
[18:36:15] P1 pole_position Epoch 9/10 | loss=0.01936
[18:36:19] P1 pole_position Epoch 10/10 | loss=0.01879
[18:36:19] Phase 2: 25 epochs graduated AR
[18:36:31] P2 pole_position Epoch 1/25 (steps=2) | loss=0.02742 lr=0.000500
[18:36:42] P2 pole_position Epoch 2/25 (steps=2) | loss=0.02621 lr=0.000500
[18:36:54] P2 pole_position Epoch 3/25 (steps=2) | loss=0.02502 lr=0.000500
[18:37:22] P2 pole_position Epoch 4/25 (steps=4) | loss=0.03779 lr=0.000500
[18:37:51] P2 pole_position Epoch 5/25 (steps=4) | loss=0.03543 lr=0.000500
[18:38:19] P2 pole_position Epoch 6/25 (steps=4) | loss=0.03421 lr=0.000500
[18:39:31] P2 pole_position Epoch 7/25 (steps=8) | loss=0.05263 lr=0.000500
[18:40:42] P2 pole_position Epoch 8/25 (steps=8) | loss=0.05159 lr=0.000500
[18:41:53] P2 pole_position Epoch 9/25 (steps=8) | loss=0.04987 lr=0.000500
[18:43:05] P2 pole_position Epoch 10/25 (steps=8) | loss=0.04848 lr=0.000500
[18:44:17] P2 pole_position Epoch 11/25 (steps=8) | loss=0.04744 lr=0.000500
[18:45:30] P2 pole_position Epoch 12/25 (steps=8) | loss=0.04603 lr=0.000500
[18:46:42] P2 pole_position Epoch 13/25 (steps=8) | loss=0.04495 lr=0.000493
[18:47:54] P2 pole_position Epoch 14/25 (steps=8) | loss=0.04383 lr=0.000471
[18:49:05] P2 pole_position Epoch 15/25 (steps=8) | loss=0.04233 lr=0.000437
[18:50:18] P2 pole_position Epoch 16/25 (steps=8) | loss=0.04089 lr=0.000392
[18:51:30] P2 pole_position Epoch 17/25 (steps=8) | loss=0.03911 lr=0.000339
[18:52:43] P2 pole_position Epoch 18/25 (steps=8) | loss=0.03667 lr=0.000280
[18:53:55] P2 pole_position Epoch 19/25 (steps=8) | loss=0.03494 lr=0.000220
[18:55:06] P2 pole_position Epoch 20/25 (steps=8) | loss=0.03271 lr=0.000161
[18:56:18] P2 pole_position Epoch 21/25 (steps=8) | loss=0.03049 lr=0.000108
[18:57:31] P2 pole_position Epoch 22/25 (steps=8) | loss=0.02831 lr=0.000063
[18:58:44] P2 pole_position Epoch 23/25 (steps=8) | loss=0.02653 lr=0.000029
[18:59:58] P2 pole_position Epoch 24/25 (steps=8) | loss=0.02527 lr=0.000010
[19:01:11] P2 pole_position Epoch 25/25 (steps=8) | loss=0.02460 lr=0.000010
[19:01:11] pole_position training complete.
[19:01:11] Evaluating...
[19:02:25] Val SSIM=0.8626 | {'pong': 0.862, 'sonic': 0.7822, 'pole_position': 0.9435}
[19:02:25] Experiment dir: 12.7 MB
[19:02:25] Training complete.