[14:35:51] Device: cuda
[14:35:51] 
=== Training pong ([32, 64, 128]) ===
[14:35:51]   2,018,278 parameters
[14:35:51]   Phase 1: 10 epochs single-step
[14:35:51]   8568 sequences
[14:36:00]   P1 pong Epoch 1/10 | loss=0.14558
[14:36:08]   P1 pong Epoch 2/10 | loss=0.10721
[14:36:17]   P1 pong Epoch 3/10 | loss=0.09795
[14:36:25]   P1 pong Epoch 4/10 | loss=0.08996
[14:36:33]   P1 pong Epoch 5/10 | loss=0.08384
[14:36:41]   P1 pong Epoch 6/10 | loss=0.07755
[14:36:49]   P1 pong Epoch 7/10 | loss=0.06995
[14:36:57]   P1 pong Epoch 8/10 | loss=0.06272
[14:37:05]   P1 pong Epoch 9/10 | loss=0.05640
[14:37:13]   P1 pong Epoch 10/10 | loss=0.05177
[14:37:13]   Phase 2: 25 epochs graduated AR
[14:37:37]   P2 pong Epoch 1/25 (steps=2) | loss=0.09787 lr=0.000500
[14:37:59]   P2 pong Epoch 2/25 (steps=2) | loss=0.08854 lr=0.000500
[14:38:21]   P2 pong Epoch 3/25 (steps=2) | loss=0.08343 lr=0.000500
[14:39:15]   P2 pong Epoch 4/25 (steps=4) | loss=0.13928 lr=0.000500
[14:40:08]   P2 pong Epoch 5/25 (steps=4) | loss=0.12631 lr=0.000500
[14:41:04]   P2 pong Epoch 6/25 (steps=4) | loss=0.11644 lr=0.000500
[14:43:21]   P2 pong Epoch 7/25 (steps=8) | loss=0.18012 lr=0.000500
[14:45:38]   P2 pong Epoch 8/25 (steps=8) | loss=0.17484 lr=0.000500
[14:47:57]   P2 pong Epoch 9/25 (steps=8) | loss=0.16717 lr=0.000500
[14:50:15]   P2 pong Epoch 10/25 (steps=8) | loss=0.15650 lr=0.000500
[14:52:31]   P2 pong Epoch 11/25 (steps=8) | loss=0.14624 lr=0.000500
[14:54:46]   P2 pong Epoch 12/25 (steps=8) | loss=0.13932 lr=0.000500
[14:57:01]   P2 pong Epoch 13/25 (steps=8) | loss=0.12899 lr=0.000493
[14:59:17]   P2 pong Epoch 14/25 (steps=8) | loss=0.11960 lr=0.000471
[15:01:35]   P2 pong Epoch 15/25 (steps=8) | loss=0.10872 lr=0.000437
[15:03:52]   P2 pong Epoch 16/25 (steps=8) | loss=0.09965 lr=0.000392
[15:06:07]   P2 pong Epoch 17/25 (steps=8) | loss=0.08785 lr=0.000339
[15:08:27]   P2 pong Epoch 18/25 (steps=8) | loss=0.07890 lr=0.000280
[15:10:44]   P2 pong Epoch 19/25 (steps=8) | loss=0.06718 lr=0.000220
[15:13:01]   P2 pong Epoch 20/25 (steps=8) | loss=0.06123 lr=0.000161
[15:15:20]   P2 pong Epoch 21/25 (steps=8) | loss=0.05374 lr=0.000108
[15:17:40]   P2 pong Epoch 22/25 (steps=8) | loss=0.04863 lr=0.000063
[15:19:57]   P2 pong Epoch 23/25 (steps=8) | loss=0.04435 lr=0.000029
[15:22:13]   P2 pong Epoch 24/25 (steps=8) | loss=0.04174 lr=0.000010
[15:24:31]   P2 pong Epoch 25/25 (steps=8) | loss=0.04022 lr=0.000010
[15:24:31]   pong training complete.
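All three games follow the same Phase 2 "graduated AR" (autoregressive) curriculum. The logged `(steps=N)` field is 2 for epochs 1-3, 4 for epochs 4-6 and 8 for epochs 7-25. A minimal sketch of that epoch-to-steps mapping might look like the following. The function name `rollout_steps` is hypothetical and is not taken from the training code; only the thresholds come from the log.

```python
# Hypothetical reconstruction of the Phase 2 rollout curriculum seen in the log:
# epochs 1-3 use 2 autoregressive steps, epochs 4-6 use 4, epochs 7-25 use 8.
def rollout_steps(epoch: int) -> int:
    """Return the number of autoregressive rollout steps for a 1-indexed Phase 2 epoch."""
    if epoch <= 3:
        return 2
    if epoch <= 6:
        return 4
    return 8


# Build the full 25-epoch schedule and print it.
schedule = [rollout_steps(e) for e in range(1, 26)]
print(schedule)
```

Each doubling of the rollout length coincides with a jump in the logged loss. For pong, loss rises from 0.08343 to 0.13928 at epoch 4 and from 0.11644 to 0.18012 at epoch 7. Epoch wall time also rises, from about 22 s at 2 steps to about 54 s at 4 steps and about 137 s at 8 steps.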
[15:24:31] 
=== Training sonic ([40, 80, 160]) ===
[15:24:31]   3,150,686 parameters
[15:24:31]   Phase 1: 10 epochs single-step
[15:24:34]   32256 sequences
[15:25:03]   P1 sonic Epoch 1/10 | loss=0.08400
[15:25:34]   P1 sonic Epoch 2/10 | loss=0.06966
[15:26:03]   P1 sonic Epoch 3/10 | loss=0.06589
[15:26:34]   P1 sonic Epoch 4/10 | loss=0.06327
[15:27:03]   P1 sonic Epoch 5/10 | loss=0.06111
[15:27:33]   P1 sonic Epoch 6/10 | loss=0.05881
[15:28:03]   P1 sonic Epoch 7/10 | loss=0.05682
[15:28:33]   P1 sonic Epoch 8/10 | loss=0.05514
[15:29:02]   P1 sonic Epoch 9/10 | loss=0.05358
[15:29:32]   P1 sonic Epoch 10/10 | loss=0.05256
[15:29:32]   Phase 2: 25 epochs graduated AR
[15:30:57]   P2 sonic Epoch 1/25 (steps=2) | loss=0.07446 lr=0.000500
[15:32:15]   P2 sonic Epoch 2/25 (steps=2) | loss=0.07291 lr=0.000500
[15:33:41]   P2 sonic Epoch 3/25 (steps=2) | loss=0.07128 lr=0.000500
[15:37:15]   P2 sonic Epoch 4/25 (steps=4) | loss=0.10220 lr=0.000500
[15:40:50]   P2 sonic Epoch 5/25 (steps=4) | loss=0.09976 lr=0.000500
[15:44:24]   P2 sonic Epoch 6/25 (steps=4) | loss=0.09779 lr=0.000500
[15:53:05]   P2 sonic Epoch 7/25 (steps=8) | loss=0.14037 lr=0.000500
[16:01:41]   P2 sonic Epoch 8/25 (steps=8) | loss=0.13753 lr=0.000500
[16:10:26]   P2 sonic Epoch 9/25 (steps=8) | loss=0.13476 lr=0.000500
[16:19:08]   P2 sonic Epoch 10/25 (steps=8) | loss=0.13232 lr=0.000500
[16:28:05]   P2 sonic Epoch 11/25 (steps=8) | loss=0.13010 lr=0.000500
[16:37:18]   P2 sonic Epoch 12/25 (steps=8) | loss=0.12790 lr=0.000500
[16:46:19]   P2 sonic Epoch 13/25 (steps=8) | loss=0.12592 lr=0.000493
[16:55:21]   P2 sonic Epoch 14/25 (steps=8) | loss=0.12408 lr=0.000471
[17:04:34]   P2 sonic Epoch 15/25 (steps=8) | loss=0.12210 lr=0.000437
[17:13:54]   P2 sonic Epoch 16/25 (steps=8) | loss=0.11900 lr=0.000392
[17:23:04]   P2 sonic Epoch 17/25 (steps=8) | loss=0.11596 lr=0.000339
[17:32:08]   P2 sonic Epoch 18/25 (steps=8) | loss=0.11287 lr=0.000280
[17:41:13]   P2 sonic Epoch 19/25 (steps=8) | loss=0.10939 lr=0.000220
[17:50:18]   P2 sonic Epoch 20/25 (steps=8) | loss=0.10548 lr=0.000161
[17:59:23]   P2 sonic Epoch 21/25 (steps=8) | loss=0.10183 lr=0.000108
[18:08:26]   P2 sonic Epoch 22/25 (steps=8) | loss=0.09841 lr=0.000063
[18:17:35]   P2 sonic Epoch 23/25 (steps=8) | loss=0.09526 lr=0.000029
[18:26:41]   P2 sonic Epoch 24/25 (steps=8) | loss=0.09337 lr=0.000010
[18:35:42]   P2 sonic Epoch 25/25 (steps=8) | loss=0.09193 lr=0.000010
[18:35:42]   sonic training complete.
[18:35:42] 
=== Training pole_position ([24, 48, 96]) ===
[18:35:42]   1,137,006 parameters
[18:35:42]   Phase 1: 10 epochs single-step
[18:35:42]   4284 sequences
[18:35:46]   P1 pole_position Epoch 1/10 | loss=0.05831
[18:35:50]   P1 pole_position Epoch 2/10 | loss=0.03691
[18:35:54]   P1 pole_position Epoch 3/10 | loss=0.03064
[18:35:57]   P1 pole_position Epoch 4/10 | loss=0.02707
[18:36:00]   P1 pole_position Epoch 5/10 | loss=0.02428
[18:36:04]   P1 pole_position Epoch 6/10 | loss=0.02271
[18:36:07]   P1 pole_position Epoch 7/10 | loss=0.02128
[18:36:11]   P1 pole_position Epoch 8/10 | loss=0.02013
[18:36:15]   P1 pole_position Epoch 9/10 | loss=0.01936
[18:36:19]   P1 pole_position Epoch 10/10 | loss=0.01879
[18:36:19]   Phase 2: 25 epochs graduated AR
[18:36:31]   P2 pole_position Epoch 1/25 (steps=2) | loss=0.02742 lr=0.000500
[18:36:42]   P2 pole_position Epoch 2/25 (steps=2) | loss=0.02621 lr=0.000500
[18:36:54]   P2 pole_position Epoch 3/25 (steps=2) | loss=0.02502 lr=0.000500
[18:37:22]   P2 pole_position Epoch 4/25 (steps=4) | loss=0.03779 lr=0.000500
[18:37:51]   P2 pole_position Epoch 5/25 (steps=4) | loss=0.03543 lr=0.000500
[18:38:19]   P2 pole_position Epoch 6/25 (steps=4) | loss=0.03421 lr=0.000500
[18:39:31]   P2 pole_position Epoch 7/25 (steps=8) | loss=0.05263 lr=0.000500
[18:40:42]   P2 pole_position Epoch 8/25 (steps=8) | loss=0.05159 lr=0.000500
[18:41:53]   P2 pole_position Epoch 9/25 (steps=8) | loss=0.04987 lr=0.000500
[18:43:05]   P2 pole_position Epoch 10/25 (steps=8) | loss=0.04848 lr=0.000500
[18:44:17]   P2 pole_position Epoch 11/25 (steps=8) | loss=0.04744 lr=0.000500
[18:45:30]   P2 pole_position Epoch 12/25 (steps=8) | loss=0.04603 lr=0.000500
[18:46:42]   P2 pole_position Epoch 13/25 (steps=8) | loss=0.04495 lr=0.000493
[18:47:54]   P2 pole_position Epoch 14/25 (steps=8) | loss=0.04383 lr=0.000471
[18:49:05]   P2 pole_position Epoch 15/25 (steps=8) | loss=0.04233 lr=0.000437
[18:50:18]   P2 pole_position Epoch 16/25 (steps=8) | loss=0.04089 lr=0.000392
[18:51:30]   P2 pole_position Epoch 17/25 (steps=8) | loss=0.03911 lr=0.000339
[18:52:43]   P2 pole_position Epoch 18/25 (steps=8) | loss=0.03667 lr=0.000280
[18:53:55]   P2 pole_position Epoch 19/25 (steps=8) | loss=0.03494 lr=0.000220
[18:55:06]   P2 pole_position Epoch 20/25 (steps=8) | loss=0.03271 lr=0.000161
[18:56:18]   P2 pole_position Epoch 21/25 (steps=8) | loss=0.03049 lr=0.000108
[18:57:31]   P2 pole_position Epoch 22/25 (steps=8) | loss=0.02831 lr=0.000063
[18:58:44]   P2 pole_position Epoch 23/25 (steps=8) | loss=0.02653 lr=0.000029
[18:59:58]   P2 pole_position Epoch 24/25 (steps=8) | loss=0.02527 lr=0.000010
[19:01:11]   P2 pole_position Epoch 25/25 (steps=8) | loss=0.02460 lr=0.000010
[19:01:11]   pole_position training complete.
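The printed Phase 2 learning rates are identical for all three games. The rate holds at 0.000500 for epochs 1-12, then decays smoothly to a floor of 0.000010 at epochs 24-25. One schedule that reproduces every printed value is a 12-epoch hold followed by a cosine decay over 13 epochs, clamped at 1e-5. This assumes the logged `lr` is the rate used for that epoch. The constants `HOLD_EPOCHS` and `DECAY_EPOCHS` were fitted to the log rather than read from the training code, so treat them and the function name `phase2_lr` as assumptions.

```python
import math

BASE_LR = 5e-4      # plateau value printed for Phase 2 epochs 1-12
MIN_LR = 1e-5       # floor value printed for epochs 24-25
HOLD_EPOCHS = 12    # epochs held at BASE_LR (inferred from the log)
DECAY_EPOCHS = 13   # cosine decay length in epochs (fitted to the printed values)


def phase2_lr(epoch: int) -> float:
    """Learning rate for a 1-indexed Phase 2 epoch: hold, then clamped cosine decay."""
    if epoch <= HOLD_EPOCHS:
        return BASE_LR
    t = epoch - HOLD_EPOCHS
    # Cosine factor goes from 1 at t=0 to 0 at t=DECAY_EPOCHS.
    cosine = 0.5 * (1.0 + math.cos(math.pi * t / DECAY_EPOCHS))
    return max(MIN_LR, BASE_LR * cosine)


for e in range(1, 26):
    print(f"Epoch {e}/25 lr={phase2_lr(e):.6f}")
```

With these constants, the unclamped cosine value at epoch 24 would be about 7.3e-6. The 1e-5 floor is therefore what produces the two final 0.000010 entries.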
[19:01:11] Evaluating...
[19:02:25] Val SSIM=0.8626 | {'pong': 0.862, 'sonic': 0.7822, 'pole_position': 0.9435}
[19:02:25] Experiment dir: 12.7 MB
[19:02:25] Training complete.
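The overall `Val SSIM=0.8626` matches the unweighted mean of the three per-game scores. Pong's `0.862` is presumably 0.8620 with the trailing zero dropped when the dictionary was printed. The check below recomputes the mean from the logged dictionary. It assumes no other weighting is applied. For contrast, it also computes a mean weighted by the Phase 1 training sequence counts from the log, which does not reproduce the logged figure.

```python
# Per-game validation SSIM exactly as printed in the log.
per_game = {"pong": 0.862, "sonic": 0.7822, "pole_position": 0.9435}

# Unweighted mean over games (assumed aggregation).
val_ssim = sum(per_game.values()) / len(per_game)
print(f"Val SSIM={val_ssim:.4f}")  # prints "Val SSIM=0.8626"

# For contrast: weighting by the logged Phase 1 training sequence counts
# gives a clearly different number, so it does not match the logged value.
train_sequences = {"pong": 8568, "sonic": 32256, "pole_position": 4284}
weighted = sum(per_game[g] * train_sequences[g] for g in per_game) / sum(train_sequences.values())
print(f"sequence-weighted SSIM={weighted:.4f}")
```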