appo-mujoco-pendulum / sf_log.txt (uploaded by MattStammers with huggingface_hub, commit 68a9531)
[2023-09-21 12:45:39,661][25893] Saving configuration to ./train_dir/Pendulum/config.json...
[2023-09-21 12:45:39,726][25893] Rollout worker 0 uses device cpu
[2023-09-21 12:45:39,727][25893] Rollout worker 1 uses device cpu
[2023-09-21 12:45:39,727][25893] Rollout worker 2 uses device cpu
[2023-09-21 12:45:39,728][25893] Rollout worker 3 uses device cpu
[2023-09-21 12:45:39,728][25893] Rollout worker 4 uses device cpu
[2023-09-21 12:45:39,729][25893] Rollout worker 5 uses device cpu
[2023-09-21 12:45:39,729][25893] Rollout worker 6 uses device cpu
[2023-09-21 12:45:39,730][25893] Rollout worker 7 uses device cpu
[2023-09-21 12:45:39,730][25893] In synchronous mode, we only accumulate one batch. Setting num_batches_to_accumulate to 1
[2023-09-21 12:45:39,777][25893] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-21 12:45:39,777][25893] InferenceWorker_p0-w0: min num requests: 1
[2023-09-21 12:45:39,780][25893] Using GPUs [1] for process 1 (actually maps to GPUs [1])
[2023-09-21 12:45:39,781][25893] InferenceWorker_p1-w0: min num requests: 1
[2023-09-21 12:45:39,804][25893] Starting all processes...
[2023-09-21 12:45:39,804][25893] Starting process learner_proc0
[2023-09-21 12:45:39,807][25893] Starting process learner_proc1
[2023-09-21 12:45:39,854][25893] Starting all processes...
[2023-09-21 12:45:39,861][25893] Starting process inference_proc0-0
[2023-09-21 12:45:39,861][25893] Starting process inference_proc1-0
[2023-09-21 12:45:39,861][25893] Starting process rollout_proc0
[2023-09-21 12:45:39,862][25893] Starting process rollout_proc1
[2023-09-21 12:45:39,862][25893] Starting process rollout_proc2
[2023-09-21 12:45:39,869][25893] Starting process rollout_proc3
[2023-09-21 12:45:39,874][25893] Starting process rollout_proc4
[2023-09-21 12:45:39,874][25893] Starting process rollout_proc5
[2023-09-21 12:45:39,874][25893] Starting process rollout_proc6
[2023-09-21 12:45:39,876][25893] Starting process rollout_proc7
[2023-09-21 12:45:41,754][26628] Worker 6 uses CPU cores [24, 25, 26, 27]
[2023-09-21 12:45:41,756][26520] Using GPUs [1] for process 1 (actually maps to GPUs [1])
[2023-09-21 12:45:41,756][26520] Set environment var CUDA_VISIBLE_DEVICES to '1' (GPU indices [1]) for learning process 1
[2023-09-21 12:45:41,771][26610] Worker 0 uses CPU cores [0, 1, 2, 3]
[2023-09-21 12:45:41,771][26630] Worker 5 uses CPU cores [20, 21, 22, 23]
[2023-09-21 12:45:41,774][26520] Num visible devices: 1
[2023-09-21 12:45:41,789][26611] Worker 1 uses CPU cores [4, 5, 6, 7]
[2023-09-21 12:45:41,802][26520] Starting seed is not provided
[2023-09-21 12:45:41,802][26520] Using GPUs [0] for process 1 (actually maps to GPUs [1])
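The "Using GPUs [0] for process 1 (actually maps to GPUs [1])" line follows from the CUDA_VISIBLE_DEVICES setting logged just above it. Once the variable is set to '1', that process sees physical GPU 1 as its only device, so it is addressed as cuda:0. The sketch below shows that index translation. It assumes the variable holds a plain comma-separated list of integer indices; `visible_to_physical` is a hypothetical helper, not a Sample Factory function.

```python
import os


def visible_to_physical(local_index: int, env=None) -> int:
    """Map a process-local CUDA device index to the physical GPU index.

    Hypothetical helper: assumes CUDA_VISIBLE_DEVICES is a comma-separated
    list of integer indices (UUID-style entries are not handled).
    """
    env = os.environ if env is None else env
    visible = env.get("CUDA_VISIBLE_DEVICES")
    if visible is None:
        # No masking: local and physical indices coincide.
        return local_index
    physical = [int(tok) for tok in visible.split(",") if tok.strip()]
    return physical[local_index]


# Learner process 1 in this log: CUDA_VISIBLE_DEVICES='1', model on cuda:0.
print(visible_to_physical(0, {"CUDA_VISIBLE_DEVICES": "1"}))
```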
[2023-09-21 12:45:41,802][26520] Initializing actor-critic model on device cuda:0
[2023-09-21 12:45:41,802][26520] RunningMeanStd input shape: (4,)
[2023-09-21 12:45:41,803][26520] RunningMeanStd input shape: (1,)
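The two RunningMeanStd shapes correspond to a 4-dimensional observation vector and a scalar return, matching the obs_normalizer and returns_normalizer in the model printout. The sketch below is a generic running mean/variance estimator. It merges each batch using the parallel update of Chan et al., the scheme popularised by OpenAI Baselines. Sample Factory's in-place TorchScript version may differ in details such as epsilon and clipping, so treat this as an illustration rather than its implementation.

```python
from typing import List, Sequence


class RunningMeanStd:
    """Per-dimension running mean/variance (sketch, not Sample Factory's code).

    Each batch is merged into the running statistics with the parallel
    variance formula of Chan et al.
    """

    def __init__(self, shape: int, epsilon: float = 1e-4):
        self.mean = [0.0] * shape
        self.var = [1.0] * shape
        self.count = epsilon  # small prior count avoids division by zero

    def update(self, batch: Sequence[Sequence[float]]) -> None:
        n = len(batch)
        for d in range(len(self.mean)):
            col = [row[d] for row in batch]
            b_mean = sum(col) / n
            b_var = sum((x - b_mean) ** 2 for x in col) / n
            delta = b_mean - self.mean[d]
            total = self.count + n
            # Combine the sums of squared deviations of both sets.
            m2 = (self.var[d] * self.count + b_var * n
                  + delta ** 2 * self.count * n / total)
            self.mean[d] += delta * n / total
            self.var[d] = m2 / total
        self.count += n

    def normalize(self, x: Sequence[float]) -> List[float]:
        return [(xi - m) / (v + 1e-8) ** 0.5
                for xi, m, v in zip(x, self.mean, self.var)]


obs_rms = RunningMeanStd(4)  # observation normalizer, shape (4,)
ret_rms = RunningMeanStd(1)  # returns normalizer, shape (1,)
```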
[2023-09-21 12:45:41,805][26631] Worker 2 uses CPU cores [8, 9, 10, 11]
[2023-09-21 12:45:41,805][26632] Worker 7 uses CPU cores [28, 29, 30, 31]
[2023-09-21 12:45:41,808][26609] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-21 12:45:41,809][26609] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-09-21 12:45:41,827][26609] Num visible devices: 1
[2023-09-21 12:45:41,873][26612] Worker 3 uses CPU cores [12, 13, 14, 15]
[2023-09-21 12:45:41,895][26520] Created Actor Critic model with architecture:
[2023-09-21 12:45:41,895][26520] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): MultiInputEncoder(
    (encoders): ModuleDict(
      (obs): MlpEncoder(
        (mlp_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=Tanh)
          (2): RecursiveScriptModule(original_name=Linear)
          (3): RecursiveScriptModule(original_name=Tanh)
        )
      )
    )
  )
  (core): ModelCoreIdentity()
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=64, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationContinuousNonAdaptiveStddev(
    (distribution_linear): Linear(in_features=64, out_features=1, bias=True)
  )
)
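The printout describes a shared Tanh MLP encoder feeding a scalar value head (critic_linear) and an action head (distribution_linear). Because the encoder is a TorchScript module, its layer widths are hidden. Only the 64 input features of both heads are visible, so the first hidden width of 64 in the sketch below is an assumption. The class name and the single output of distribution_linear suggest that the network predicts only the action mean, with a learned standard deviation that does not depend on the observation. The zero log-std initialisation is also assumed, and observation normalisation is omitted. This is a dependency-free illustration of the data flow, not Sample Factory's model code.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def linear(x: List[float], w: Matrix, b: List[float]) -> List[float]:
    # y = W x + b, with W stored as out_features rows of in_features values
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def init_linear(n_in: int, n_out: int, rng: random.Random) -> Tuple[Matrix, List[float]]:
    bound = 1.0 / math.sqrt(n_in)
    w = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


class TinyActorCritic:
    """Sketch of the printed architecture: shared Tanh MLP, value head,
    and a Gaussian action head whose std does not depend on the input."""

    def __init__(self, obs_dim=4, hidden=(64, 64), act_dim=1, seed=0):
        rng = random.Random(seed)
        dims = [obs_dim, *hidden]  # hidden widths are assumed, not logged
        self.encoder = [init_linear(a, b, rng) for a, b in zip(dims, dims[1:])]
        self.critic = init_linear(hidden[-1], 1, rng)             # critic_linear
        self.actor_mean = init_linear(hidden[-1], act_dim, rng)   # distribution_linear
        self.log_std = [0.0] * act_dim  # assumed init; state-independent parameter

    def forward(self, obs: List[float]):
        h = obs
        for w, b in self.encoder:  # Linear -> Tanh -> Linear -> Tanh
            h = [math.tanh(v) for v in linear(h, w, b)]
        value = linear(h, *self.critic)[0]
        mean = linear(h, *self.actor_mean)
        std = [math.exp(s) for s in self.log_std]
        return mean, std, value


model = TinyActorCritic()
mean, std, value = model.forward([0.1, -0.2, 0.3, 0.0])
```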
[2023-09-21 12:45:41,959][26519] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-21 12:45:41,959][26519] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-09-21 12:45:41,977][26519] Num visible devices: 1
[2023-09-21 12:45:41,999][26519] Starting seed is not provided
[2023-09-21 12:45:41,999][26519] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-21 12:45:41,999][26519] Initializing actor-critic model on device cuda:0
[2023-09-21 12:45:41,999][26608] Using GPUs [1] for process 1 (actually maps to GPUs [1])
[2023-09-21 12:45:41,999][26608] Set environment var CUDA_VISIBLE_DEVICES to '1' (GPU indices [1]) for inference process 1
[2023-09-21 12:45:41,999][26519] RunningMeanStd input shape: (4,)
[2023-09-21 12:45:42,000][26519] RunningMeanStd input shape: (1,)
[2023-09-21 12:45:42,018][26608] Num visible devices: 1
[2023-09-21 12:45:42,045][26519] Created Actor Critic model with architecture:
[2023-09-21 12:45:42,045][26519] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): MultiInputEncoder(
    (encoders): ModuleDict(
      (obs): MlpEncoder(
        (mlp_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=Tanh)
          (2): RecursiveScriptModule(original_name=Linear)
          (3): RecursiveScriptModule(original_name=Tanh)
        )
      )
    )
  )
  (core): ModelCoreIdentity()
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=64, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationContinuousNonAdaptiveStddev(
    (distribution_linear): Linear(in_features=64, out_features=1, bias=True)
  )
)
[2023-09-21 12:45:42,148][26620] Worker 4 uses CPU cores [16, 17, 18, 19]
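The CPU-core lines follow a simple pattern: rollout worker i is pinned to the four contiguous cores 4i through 4i+3 (worker 0 gets [0, 1, 2, 3], worker 7 gets [28, 29, 30, 31]). The sketch below reproduces that split. It assumes 32 available cores and equal contiguous chunks. `cores_for_worker` is a hypothetical helper, not Sample Factory's own affinity code.

```python
from typing import List


def cores_for_worker(worker_idx: int, num_workers: int, num_cores: int) -> List[int]:
    """Split cores 0..num_cores-1 into equal contiguous chunks, one per worker.

    Hypothetical helper reproducing the pattern in this log (8 workers,
    assumed 32 cores, 4 cores each).
    """
    per_worker = max(1, num_cores // num_workers)
    start = (worker_idx * per_worker) % num_cores  # wrap if workers outnumber chunks
    return list(range(start, min(start + per_worker, num_cores)))


for w in range(8):
    print(w, cores_for_worker(w, num_workers=8, num_cores=32))
```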
[2023-09-21 12:45:42,445][26520] Using optimizer <class 'torch.optim.adam.Adam'>
[2023-09-21 12:45:42,446][26520] No checkpoints found
[2023-09-21 12:45:42,446][26520] Did not load from checkpoint, starting from scratch!
[2023-09-21 12:45:42,446][26520] Initialized policy 1 weights for model version 0
[2023-09-21 12:45:42,448][26520] LearnerWorker_p1 finished initialization!
[2023-09-21 12:45:42,448][26520] Using GPUs [0] for process 1 (actually maps to GPUs [1])
[2023-09-21 12:45:42,595][26519] Using optimizer <class 'torch.optim.adam.Adam'>
[2023-09-21 12:45:42,596][26519] No checkpoints found
[2023-09-21 12:45:42,596][26519] Did not load from checkpoint, starting from scratch!
[2023-09-21 12:45:42,596][26519] Initialized policy 0 weights for model version 0
[2023-09-21 12:45:42,597][26519] LearnerWorker_p0 finished initialization!
[2023-09-21 12:45:42,598][26519] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-21 12:45:42,983][26608] RunningMeanStd input shape: (4,)
[2023-09-21 12:45:42,983][26608] RunningMeanStd input shape: (1,)
[2023-09-21 12:45:43,014][25893] Inference worker 1-0 is ready!
[2023-09-21 12:45:43,137][26609] RunningMeanStd input shape: (4,)
[2023-09-21 12:45:43,138][26609] RunningMeanStd input shape: (1,)
[2023-09-21 12:45:43,168][25893] Inference worker 0-0 is ready!
[2023-09-21 12:45:43,169][25893] All inference workers are ready! Signal rollout workers to start!
[2023-09-21 12:45:43,266][26631] Decorrelating experience for 0 frames...
[2023-09-21 12:45:43,266][26611] Decorrelating experience for 0 frames...
[2023-09-21 12:45:43,266][26628] Decorrelating experience for 0 frames...
[2023-09-21 12:45:43,266][26630] Decorrelating experience for 0 frames...
[2023-09-21 12:45:43,267][26631] Decorrelating experience for 64 frames...
[2023-09-21 12:45:43,267][26628] Decorrelating experience for 64 frames...
[2023-09-21 12:45:43,267][26611] Decorrelating experience for 64 frames...
[2023-09-21 12:45:43,267][26630] Decorrelating experience for 64 frames...
[2023-09-21 12:45:43,274][26631] Decorrelating experience for 128 frames...
[2023-09-21 12:45:43,274][26628] Decorrelating experience for 128 frames...
[2023-09-21 12:45:43,275][26611] Decorrelating experience for 128 frames...
[2023-09-21 12:45:43,275][26630] Decorrelating experience for 128 frames...
[2023-09-21 12:45:43,284][26632] Decorrelating experience for 0 frames...
[2023-09-21 12:45:43,284][26632] Decorrelating experience for 64 frames...
[2023-09-21 12:45:43,285][26612] Decorrelating experience for 0 frames...
[2023-09-21 12:45:43,285][26612] Decorrelating experience for 64 frames...
[2023-09-21 12:45:43,289][26631] Decorrelating experience for 192 frames...
[2023-09-21 12:45:43,289][26611] Decorrelating experience for 192 frames...
[2023-09-21 12:45:43,289][26628] Decorrelating experience for 192 frames...
[2023-09-21 12:45:43,289][26630] Decorrelating experience for 192 frames...
[2023-09-21 12:45:43,292][26632] Decorrelating experience for 128 frames...
[2023-09-21 12:45:43,293][26612] Decorrelating experience for 128 frames...
[2023-09-21 12:45:43,299][26610] Decorrelating experience for 0 frames...
[2023-09-21 12:45:43,300][26610] Decorrelating experience for 64 frames...
[2023-09-21 12:45:43,300][26620] Decorrelating experience for 0 frames...
[2023-09-21 12:45:43,301][26620] Decorrelating experience for 64 frames...
[2023-09-21 12:45:43,306][26632] Decorrelating experience for 192 frames...
[2023-09-21 12:45:43,308][26612] Decorrelating experience for 192 frames...
[2023-09-21 12:45:43,313][26610] Decorrelating experience for 128 frames...
[2023-09-21 12:45:43,314][26620] Decorrelating experience for 128 frames...
[2023-09-21 12:45:43,318][26630] Decorrelating experience for 256 frames...
[2023-09-21 12:45:43,318][26611] Decorrelating experience for 256 frames...
[2023-09-21 12:45:43,318][26628] Decorrelating experience for 256 frames...
[2023-09-21 12:45:43,319][26631] Decorrelating experience for 256 frames...
[2023-09-21 12:45:43,334][26632] Decorrelating experience for 256 frames...
[2023-09-21 12:45:43,337][26612] Decorrelating experience for 256 frames...
[2023-09-21 12:45:43,339][26610] Decorrelating experience for 192 frames...
[2023-09-21 12:45:43,340][26620] Decorrelating experience for 192 frames...
[2023-09-21 12:45:43,346][26628] Decorrelating experience for 320 frames...
[2023-09-21 12:45:43,346][26630] Decorrelating experience for 320 frames...
[2023-09-21 12:45:43,346][26611] Decorrelating experience for 320 frames...
[2023-09-21 12:45:43,347][26631] Decorrelating experience for 320 frames...
[2023-09-21 12:45:43,361][26632] Decorrelating experience for 320 frames...
[2023-09-21 12:45:43,365][26612] Decorrelating experience for 320 frames...
[2023-09-21 12:45:43,380][26630] Decorrelating experience for 384 frames...
[2023-09-21 12:45:43,381][26628] Decorrelating experience for 384 frames...
[2023-09-21 12:45:43,381][26611] Decorrelating experience for 384 frames...
[2023-09-21 12:45:43,381][26620] Decorrelating experience for 256 frames...
[2023-09-21 12:45:43,382][26631] Decorrelating experience for 384 frames...
[2023-09-21 12:45:43,388][26610] Decorrelating experience for 256 frames...
[2023-09-21 12:45:43,395][26632] Decorrelating experience for 384 frames...
[2023-09-21 12:45:43,400][26612] Decorrelating experience for 384 frames...
[2023-09-21 12:45:43,410][26620] Decorrelating experience for 320 frames...
[2023-09-21 12:45:43,416][26610] Decorrelating experience for 320 frames...
[2023-09-21 12:45:43,423][26628] Decorrelating experience for 448 frames...
[2023-09-21 12:45:43,423][26630] Decorrelating experience for 448 frames...
[2023-09-21 12:45:43,424][26631] Decorrelating experience for 448 frames...
[2023-09-21 12:45:43,424][26611] Decorrelating experience for 448 frames...
[2023-09-21 12:45:43,437][26632] Decorrelating experience for 448 frames...
[2023-09-21 12:45:43,442][26612] Decorrelating experience for 448 frames...
[2023-09-21 12:45:43,445][26620] Decorrelating experience for 384 frames...
[2023-09-21 12:45:43,450][26610] Decorrelating experience for 384 frames...
[2023-09-21 12:45:43,487][26620] Decorrelating experience for 448 frames...
[2023-09-21 12:45:43,508][26610] Decorrelating experience for 448 frames...
[2023-09-21 12:45:46,049][25893] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 8192. Throughput: 0: nan, 1: nan. Samples: 6008. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 12:45:46,050][25893] Avg episode reward: [(0, '10.200'), (1, '13.010')]
[2023-09-21 12:45:51,050][25893] Fps is (10 sec: 11468.2, 60 sec: 11468.2, 300 sec: 11468.2). Total num frames: 65536. Throughput: 0: 4199.8, 1: 2891.1. Samples: 41464. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:45:51,051][25893] Avg episode reward: [(0, '106.460'), (1, '111.620')]
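The periodic report consists of two lines: an "Fps is (...)" line with the 10/60/300-second frame rates and the total frame count, and an "Avg episode reward" line with one (policy, reward) pair per policy. For plotting a run, these can be pulled out with two regular expressions. The sketch below assumes exactly the line formats shown in this log; other Sample Factory versions may format them differently.

```python
import re
from typing import Dict, List, Optional, Tuple

FPS_RE = re.compile(
    r"Fps is \(10 sec: (?P<fps10>[\d.]+|nan), 60 sec: (?P<fps60>[\d.]+|nan), "
    r"300 sec: (?P<fps300>[\d.]+|nan)\)\. Total num frames: (?P<frames>\d+)\."
)
REWARD_RE = re.compile(r"\((\d+), '(-?[\d.]+)'\)")


def parse_fps(line: str) -> Optional[Dict[str, float]]:
    """Return fps10/fps60/fps300/frames from an 'Fps is' line, else None."""
    m = FPS_RE.search(line)
    if m is None:
        return None
    # float("nan") handles the first report, where no window has data yet
    return {k: float(v) for k, v in m.groupdict().items()}


def parse_rewards(line: str) -> List[Tuple[int, float]]:
    """Return [(policy_id, avg_reward), ...] from an 'Avg episode reward' line."""
    if "Avg episode reward" not in line:
        return []
    return [(int(p), float(r)) for p, r in REWARD_RE.findall(line)]
```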
[2023-09-21 12:45:51,060][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000000064_32768.pth...
[2023-09-21 12:45:51,060][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000000064_32768.pth...
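Checkpoint files are named checkpoint_<9-digit counter>_<env steps>.pth. The first field appears to be the learner's training-step counter. In this log, the second field equals half the "Total num frames" reported at the same moment (65536 / 2 = 32768, 286720 / 2 = 143360), which is each policy's share of the frames. It also happens to equal 512 times the first field in this run. The zero padding makes names sort in training order. A minimal parser and formatter, assuming only the naming seen here:

```python
import re
from typing import NamedTuple

CKPT_RE = re.compile(r"checkpoint_(\d{9})_(\d+)\.pth$")


class CheckpointInfo(NamedTuple):
    train_step: int  # first field; interpreted here as the training-step counter
    env_steps: int   # second field; per-policy env steps in this log


def parse_checkpoint_name(path: str) -> CheckpointInfo:
    m = CKPT_RE.search(path)
    if m is None:
        raise ValueError(f"not a checkpoint file name: {path}")
    return CheckpointInfo(int(m.group(1)), int(m.group(2)))


def checkpoint_name(train_step: int, env_steps: int) -> str:
    # Zero-pad the counter to 9 digits so names sort lexicographically.
    return f"checkpoint_{train_step:09d}_{env_steps}.pth"
```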
[2023-09-21 12:45:51,820][26609] Updated weights for policy 0, policy_version 80 (0.0014)
[2023-09-21 12:45:51,820][26608] Updated weights for policy 1, policy_version 80 (0.0012)
[2023-09-21 12:45:56,050][25893] Fps is (10 sec: 13107.0, 60 sec: 13107.0, 300 sec: 13107.0). Total num frames: 139264. Throughput: 0: 6483.3, 1: 5740.7. Samples: 128250. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
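The 10/60/300-second figures behave like frame-rate averages over sliding time windows. Until a window has filled, all three columns show the same value. The sketch below assumes the rate is the frame delta divided by the time delta across all samples still inside the window. Fed with the first few reports at an idealised 5 s spacing, it gives 11468.8, 13107.2, 13926.4 and 14745.6 for the 10-second window. These are close to the logged 11468.2, 13107.0, 13926.6 and 14745.7; the small gaps presumably reflect the real intervals not being exactly 5 s. This is an illustration, not Sample Factory's code.

```python
from collections import deque
from typing import Deque, Tuple


class WindowedFps:
    """Frames per second over a sliding time window (sketch).

    Keeps (timestamp, total_frames) samples no older than `window_sec`
    and divides the frame delta by the time delta across that span.
    """

    def __init__(self, window_sec: float):
        self.window_sec = window_sec
        self.samples: Deque[Tuple[float, int]] = deque()

    def add(self, t: float, total_frames: int) -> None:
        self.samples.append((t, total_frames))
        # Drop samples that have fallen out of the window.
        while t - self.samples[0][0] > self.window_sec:
            self.samples.popleft()

    def fps(self) -> float:
        if len(self.samples) < 2:
            return float("nan")  # like the 'nan' in the first report
        (t0, f0), (t1, f1) = self.samples[0], self.samples[-1]
        return (f1 - f0) / (t1 - t0)


w10 = WindowedFps(10)
for t, frames in [(0, 8192), (5, 65536), (10, 139264), (15, 204800), (20, 286720)]:
    w10.add(t, frames)
    print(t, w10.fps())
```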
[2023-09-21 12:45:56,051][25893] Avg episode reward: [(0, '183.830'), (1, '242.370')]
[2023-09-21 12:45:57,395][26608] Updated weights for policy 1, policy_version 160 (0.0011)
[2023-09-21 12:45:57,396][26609] Updated weights for policy 0, policy_version 160 (0.0014)
[2023-09-21 12:45:59,765][25893] Heartbeat connected on Batcher_0
[2023-09-21 12:45:59,767][25893] Heartbeat connected on LearnerWorker_p0
[2023-09-21 12:45:59,770][25893] Heartbeat connected on Batcher_1
[2023-09-21 12:45:59,773][25893] Heartbeat connected on LearnerWorker_p1
[2023-09-21 12:45:59,780][25893] Heartbeat connected on InferenceWorker_p0-w0
[2023-09-21 12:45:59,783][25893] Heartbeat connected on InferenceWorker_p1-w0
[2023-09-21 12:45:59,784][25893] Heartbeat connected on RolloutWorker_w0
[2023-09-21 12:45:59,786][25893] Heartbeat connected on RolloutWorker_w1
[2023-09-21 12:45:59,790][25893] Heartbeat connected on RolloutWorker_w2
[2023-09-21 12:45:59,792][25893] Heartbeat connected on RolloutWorker_w3
[2023-09-21 12:45:59,794][25893] Heartbeat connected on RolloutWorker_w4
[2023-09-21 12:45:59,797][25893] Heartbeat connected on RolloutWorker_w5
[2023-09-21 12:45:59,800][25893] Heartbeat connected on RolloutWorker_w6
[2023-09-21 12:45:59,803][25893] Heartbeat connected on RolloutWorker_w7
[2023-09-21 12:46:01,050][25893] Fps is (10 sec: 13926.6, 60 sec: 13107.1, 300 sec: 13107.1). Total num frames: 204800. Throughput: 0: 5798.1, 1: 5334.0. Samples: 172990. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:46:01,050][25893] Avg episode reward: [(0, '476.430'), (1, '606.040')]
[2023-09-21 12:46:01,060][26520] Saving new best policy, reward=606.040!
[2023-09-21 12:46:01,072][26519] Saving new best policy, reward=476.430!
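"Saving new best policy" first appears at 204800 total frames, about 102400 per policy. Earlier, higher-than-previous rewards (for example 242.370 at 139264 frames) did not trigger a save. This suggests a minimum number of env steps before best-policy saving starts. Later, once both policies reach 1000.000, no further "new best" lines appear, which is consistent with a strict greater-than comparison. The sketch below models that behaviour. The 100_000-step threshold and the `BestPolicyTracker` class are assumptions for illustration, not Sample Factory's API.

```python
from typing import Callable, Dict


class BestPolicyTracker:
    """Hypothetical sketch of best-policy saving.

    A policy is saved only after `min_env_steps` (assumed threshold) and only
    when its average episode reward strictly beats its previous best.
    """

    def __init__(self, save_fn: Callable[[int, float], None], min_env_steps: int = 100_000):
        self.best: Dict[int, float] = {}
        self.save_fn = save_fn
        self.min_env_steps = min_env_steps

    def report(self, policy_id: int, env_steps: int, avg_reward: float) -> bool:
        if env_steps < self.min_env_steps:
            return False  # too early: early lucky episodes could dominate
        if avg_reward > self.best.get(policy_id, float("-inf")):
            self.best[policy_id] = avg_reward
            self.save_fn(policy_id, avg_reward)
            return True
        return False
```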
[2023-09-21 12:46:03,241][26609] Updated weights for policy 0, policy_version 240 (0.0015)
[2023-09-21 12:46:03,242][26608] Updated weights for policy 1, policy_version 240 (0.0014)
[2023-09-21 12:46:06,050][25893] Fps is (10 sec: 14745.7, 60 sec: 13926.3, 300 sec: 13926.3). Total num frames: 286720. Throughput: 0: 6494.3, 1: 6121.8. Samples: 258330. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-21 12:46:06,050][25893] Avg episode reward: [(0, '887.110'), (1, '957.030')]
[2023-09-21 12:46:06,058][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000000280_143360.pth...
[2023-09-21 12:46:06,058][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000000280_143360.pth...
[2023-09-21 12:46:06,065][26520] Saving new best policy, reward=957.030!
[2023-09-21 12:46:06,066][26519] Saving new best policy, reward=887.110!
[2023-09-21 12:46:08,435][26609] Updated weights for policy 0, policy_version 320 (0.0015)
[2023-09-21 12:46:08,435][26608] Updated weights for policy 1, policy_version 320 (0.0013)
[2023-09-21 12:46:11,050][25893] Fps is (10 sec: 15564.9, 60 sec: 14090.2, 300 sec: 14090.2). Total num frames: 360448. Throughput: 0: 7067.9, 1: 6802.5. Samples: 352768. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:46:11,050][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 12:46:11,051][26519] Saving new best policy, reward=1000.000!
[2023-09-21 12:46:11,051][26520] Saving new best policy, reward=1000.000!
[2023-09-21 12:46:14,084][26608] Updated weights for policy 1, policy_version 400 (0.0014)
[2023-09-21 12:46:14,085][26609] Updated weights for policy 0, policy_version 400 (0.0015)
[2023-09-21 12:46:16,050][25893] Fps is (10 sec: 14745.5, 60 sec: 14199.4, 300 sec: 14199.4). Total num frames: 434176. Throughput: 0: 6597.9, 1: 7130.2. Samples: 417854. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-21 12:46:16,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 12:46:19,364][26609] Updated weights for policy 0, policy_version 480 (0.0013)
[2023-09-21 12:46:19,365][26608] Updated weights for policy 1, policy_version 480 (0.0014)
[2023-09-21 12:46:21,050][25893] Fps is (10 sec: 15564.7, 60 sec: 14511.5, 300 sec: 14511.5). Total num frames: 516096. Throughput: 0: 6999.8, 1: 6783.9. Samples: 488442. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:46:21,050][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 12:46:21,059][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000000504_258048.pth...
[2023-09-21 12:46:21,059][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000000504_258048.pth...
[2023-09-21 12:46:21,064][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000000064_32768.pth
[2023-09-21 12:46:21,064][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000000064_32768.pth
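After each new checkpoint is written, the oldest one is deleted, leaving two per policy. For example, saving checkpoint_000000504_258048.pth is followed by removing checkpoint_000000064_32768.pth, which keeps 280 and 504. The sketch below implements that "keep the newest N" rotation and relies on the 9-digit zero padding so that lexicographic order matches training order. `keep=2` mirrors this log; the function names are hypothetical.

```python
import glob
import os
from typing import Iterable, List


def checkpoints_to_remove(paths: Iterable[str], keep: int = 2) -> List[str]:
    """Return the paths to delete so that only the `keep` newest remain.

    Sorting by file name works because the step counter is zero-padded.
    """
    ordered = sorted(paths, key=os.path.basename)
    return ordered[:max(len(ordered) - keep, 0)]


def prune_checkpoints(ckpt_dir: str, keep: int = 2) -> None:
    # Delete all but the newest `keep` checkpoint_*.pth files in ckpt_dir.
    paths = glob.glob(os.path.join(ckpt_dir, "checkpoint_*.pth"))
    for path in checkpoints_to_remove(paths, keep):
        os.remove(path)
```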
[2023-09-21 12:46:24,662][26609] Updated weights for policy 0, policy_version 560 (0.0014)
[2023-09-21 12:46:24,662][26608] Updated weights for policy 1, policy_version 560 (0.0015)
[2023-09-21 12:46:26,050][25893] Fps is (10 sec: 15564.5, 60 sec: 14540.7, 300 sec: 14540.7). Total num frames: 589824. Throughput: 0: 7271.1, 1: 7104.1. Samples: 581024. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 12:46:26,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 12:46:29,991][26608] Updated weights for policy 1, policy_version 640 (0.0013)
[2023-09-21 12:46:29,991][26609] Updated weights for policy 0, policy_version 640 (0.0015)
[2023-09-21 12:46:31,055][25893] Fps is (10 sec: 15556.5, 60 sec: 14743.8, 300 sec: 14743.8). Total num frames: 671744. Throughput: 0: 6963.4, 1: 6795.5. Samples: 625232. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-21 12:46:31,056][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 12:46:35,710][26608] Updated weights for policy 1, policy_version 720 (0.0014)
[2023-09-21 12:46:35,710][26609] Updated weights for policy 0, policy_version 720 (0.0015)
[2023-09-21 12:46:36,050][25893] Fps is (10 sec: 14745.7, 60 sec: 14581.7, 300 sec: 14581.7). Total num frames: 737280. Throughput: 0: 7478.3, 1: 7465.4. Samples: 713930. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:46:36,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 12:46:36,061][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000000720_368640.pth...
[2023-09-21 12:46:36,061][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000000720_368640.pth...
[2023-09-21 12:46:36,068][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000000280_143360.pth
[2023-09-21 12:46:36,068][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000000280_143360.pth
[2023-09-21 12:46:41,050][25893] Fps is (10 sec: 13933.8, 60 sec: 14596.6, 300 sec: 14596.6). Total num frames: 811008. Throughput: 0: 7493.8, 1: 7510.1. Samples: 803426. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:46:41,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 12:46:41,184][26609] Updated weights for policy 0, policy_version 800 (0.0014)
[2023-09-21 12:46:41,184][26608] Updated weights for policy 1, policy_version 800 (0.0012)
[2023-09-21 12:46:46,050][25893] Fps is (10 sec: 14745.8, 60 sec: 14609.0, 300 sec: 14609.0). Total num frames: 884736. Throughput: 0: 7482.1, 1: 7996.1. Samples: 869510. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:46:46,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 12:46:46,655][26609] Updated weights for policy 0, policy_version 880 (0.0014)
[2023-09-21 12:46:46,655][26608] Updated weights for policy 1, policy_version 880 (0.0013)
[2023-09-21 12:46:51,050][25893] Fps is (10 sec: 14745.8, 60 sec: 14882.2, 300 sec: 14619.5). Total num frames: 958464. Throughput: 0: 7530.8, 1: 7529.0. Samples: 936018. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 12:46:51,050][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 12:46:51,056][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000000936_479232.pth...
[2023-09-21 12:46:51,056][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000000936_479232.pth...
[2023-09-21 12:46:51,061][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000000504_258048.pth
[2023-09-21 12:46:51,061][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000000504_258048.pth
[2023-09-21 12:46:52,443][26609] Updated weights for policy 0, policy_version 960 (0.0015)
[2023-09-21 12:46:52,443][26608] Updated weights for policy 1, policy_version 960 (0.0016)
[2023-09-21 12:46:56,050][25893] Fps is (10 sec: 15565.1, 60 sec: 15018.7, 300 sec: 14745.6). Total num frames: 1040384. Throughput: 0: 7502.5, 1: 7485.1. Samples: 1027210. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:46:56,050][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 12:46:57,555][26608] Updated weights for policy 1, policy_version 1040 (0.0013)
[2023-09-21 12:46:57,556][26609] Updated weights for policy 0, policy_version 1040 (0.0012)
[2023-09-21 12:47:01,050][25893] Fps is (10 sec: 14745.4, 60 sec: 15018.6, 300 sec: 14636.3). Total num frames: 1105920. Throughput: 0: 7530.1, 1: 7031.0. Samples: 1073104. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-21 12:47:01,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 12:47:03,386][26609] Updated weights for policy 0, policy_version 1120 (0.0017)
[2023-09-21 12:47:03,386][26608] Updated weights for policy 1, policy_version 1120 (0.0011)
[2023-09-21 12:47:06,050][25893] Fps is (10 sec: 13926.0, 60 sec: 14882.1, 300 sec: 14643.1). Total num frames: 1179648. Throughput: 0: 7442.1, 1: 7442.9. Samples: 1158270. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:47:06,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 12:47:06,058][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000001152_589824.pth...
[2023-09-21 12:47:06,058][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000001152_589824.pth...
[2023-09-21 12:47:06,066][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000000720_368640.pth
[2023-09-21 12:47:06,067][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000000720_368640.pth
[2023-09-21 12:47:08,667][26608] Updated weights for policy 1, policy_version 1200 (0.0013)
[2023-09-21 12:47:08,668][26609] Updated weights for policy 0, policy_version 1200 (0.0014)
[2023-09-21 12:47:11,050][25893] Fps is (10 sec: 15564.8, 60 sec: 15018.6, 300 sec: 14745.6). Total num frames: 1261568. Throughput: 0: 7463.8, 1: 7463.4. Samples: 1252748. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:47:11,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 12:47:14,334][26608] Updated weights for policy 1, policy_version 1280 (0.0011)
[2023-09-21 12:47:14,335][26609] Updated weights for policy 0, policy_version 1280 (0.0015)
[2023-09-21 12:47:16,050][25893] Fps is (10 sec: 14745.7, 60 sec: 14882.1, 300 sec: 14654.5). Total num frames: 1327104. Throughput: 0: 7424.3, 1: 7449.0. Samples: 1294452. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-21 12:47:16,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 12:47:20,004][26608] Updated weights for policy 1, policy_version 1360 (0.0010)
[2023-09-21 12:47:20,005][26609] Updated weights for policy 0, policy_version 1360 (0.0013)
[2023-09-21 12:47:21,050][25893] Fps is (10 sec: 14745.5, 60 sec: 14882.1, 300 sec: 14745.6). Total num frames: 1409024. Throughput: 0: 7416.4, 1: 7415.7. Samples: 1381376. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-21 12:47:21,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 12:47:21,060][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000001376_704512.pth...
[2023-09-21 12:47:21,060][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000001376_704512.pth...
[2023-09-21 12:47:21,064][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000000936_479232.pth
[2023-09-21 12:47:21,065][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000000936_479232.pth
[2023-09-21 12:47:25,344][26608] Updated weights for policy 1, policy_version 1440 (0.0014)
[2023-09-21 12:47:25,344][26609] Updated weights for policy 0, policy_version 1440 (0.0011)
[2023-09-21 12:47:26,050][25893] Fps is (10 sec: 15565.0, 60 sec: 14882.2, 300 sec: 14745.6). Total num frames: 1482752. Throughput: 0: 7430.9, 1: 7422.6. Samples: 1471832. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 12:47:26,050][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 12:47:30,693][26608] Updated weights for policy 1, policy_version 1520 (0.0013)
[2023-09-21 12:47:30,694][26609] Updated weights for policy 0, policy_version 1520 (0.0016)
[2023-09-21 12:47:31,050][25893] Fps is (10 sec: 14745.7, 60 sec: 14746.9, 300 sec: 14745.6). Total num frames: 1556480. Throughput: 0: 7440.5, 1: 7450.4. Samples: 1539602. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:47:31,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 12:47:36,050][25893] Fps is (10 sec: 14745.3, 60 sec: 14882.1, 300 sec: 14745.6). Total num frames: 1630208. Throughput: 0: 7477.6, 1: 7477.3. Samples: 1608994. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 12:47:36,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 12:47:36,058][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000001592_815104.pth...
[2023-09-21 12:47:36,058][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000001592_815104.pth...
[2023-09-21 12:47:36,061][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000001152_589824.pth
[2023-09-21 12:47:36,066][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000001152_589824.pth
[2023-09-21 12:47:36,130][26608] Updated weights for policy 1, policy_version 1600 (0.0014)
[2023-09-21 12:47:36,131][26609] Updated weights for policy 0, policy_version 1600 (0.0015)
[2023-09-21 12:47:41,050][25893] Fps is (10 sec: 15565.0, 60 sec: 15018.7, 300 sec: 14816.8). Total num frames: 1712128. Throughput: 0: 7503.5, 1: 7506.6. Samples: 1702666. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:47:41,050][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 12:47:41,292][26609] Updated weights for policy 0, policy_version 1680 (0.0015)
[2023-09-21 12:47:41,292][26608] Updated weights for policy 1, policy_version 1680 (0.0015)
[2023-09-21 12:47:46,050][25893] Fps is (10 sec: 15565.1, 60 sec: 15018.7, 300 sec: 14813.9). Total num frames: 1785856. Throughput: 0: 7516.2, 1: 7995.7. Samples: 1771136. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 12:47:46,050][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 12:47:46,797][26609] Updated weights for policy 0, policy_version 1760 (0.0014)
[2023-09-21 12:47:46,797][26608] Updated weights for policy 1, policy_version 1760 (0.0014)
[2023-09-21 12:47:51,050][25893] Fps is (10 sec: 15564.7, 60 sec: 15155.2, 300 sec: 14876.7). Total num frames: 1867776. Throughput: 0: 7565.6, 1: 7569.3. Samples: 1839338. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:47:51,050][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 12:47:51,060][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000001824_933888.pth...
[2023-09-21 12:47:51,060][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000001824_933888.pth...
[2023-09-21 12:47:51,063][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000001376_704512.pth
[2023-09-21 12:47:51,064][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000001376_704512.pth
[2023-09-21 12:47:52,088][26608] Updated weights for policy 1, policy_version 1840 (0.0016)
[2023-09-21 12:47:52,088][26609] Updated weights for policy 0, policy_version 1840 (0.0014)
[2023-09-21 12:47:56,050][25893] Fps is (10 sec: 15564.9, 60 sec: 15018.7, 300 sec: 14871.6). Total num frames: 1941504. Throughput: 0: 7531.7, 1: 7520.8. Samples: 1930110. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-21 12:47:56,050][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 12:47:57,684][26609] Updated weights for policy 0, policy_version 1920 (0.0012)
[2023-09-21 12:47:57,685][26608] Updated weights for policy 1, policy_version 1920 (0.0014)
[2023-09-21 12:48:01,050][25893] Fps is (10 sec: 14745.5, 60 sec: 15155.2, 300 sec: 14866.9). Total num frames: 2015232. Throughput: 0: 7549.4, 1: 8056.5. Samples: 1996716. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 12:48:01,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 12:48:02,996][26609] Updated weights for policy 0, policy_version 2000 (0.0013)
[2023-09-21 12:48:02,996][26608] Updated weights for policy 1, policy_version 2000 (0.0014)
[2023-09-21 12:48:06,050][25893] Fps is (10 sec: 14745.3, 60 sec: 15155.2, 300 sec: 14862.6). Total num frames: 2088960. Throughput: 0: 7592.7, 1: 7601.5. Samples: 2065114. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:48:06,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 12:48:06,059][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000002040_1044480.pth...
[2023-09-21 12:48:06,060][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000002040_1044480.pth...
[2023-09-21 12:48:06,067][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000001592_815104.pth
[2023-09-21 12:48:06,067][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000001592_815104.pth
[2023-09-21 12:48:08,300][26608] Updated weights for policy 1, policy_version 2080 (0.0013)
[2023-09-21 12:48:08,301][26609] Updated weights for policy 0, policy_version 2080 (0.0014)
[2023-09-21 12:48:11,050][25893] Fps is (10 sec: 14745.6, 60 sec: 15018.7, 300 sec: 14858.6). Total num frames: 2162688. Throughput: 0: 7581.4, 1: 7594.3. Samples: 2154742. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 12:48:11,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 12:48:14,024][26609] Updated weights for policy 0, policy_version 2160 (0.0015)
[2023-09-21 12:48:14,024][26608] Updated weights for policy 1, policy_version 2160 (0.0014)
[2023-09-21 12:48:16,050][25893] Fps is (10 sec: 14745.6, 60 sec: 15155.2, 300 sec: 14854.8). Total num frames: 2236416. Throughput: 0: 7604.4, 1: 7538.5. Samples: 2221036. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:48:16,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 12:48:19,554][26609] Updated weights for policy 0, policy_version 2240 (0.0014)
[2023-09-21 12:48:19,556][26608] Updated weights for policy 1, policy_version 2240 (0.0017)
[2023-09-21 12:48:21,050][25893] Fps is (10 sec: 14745.6, 60 sec: 15018.7, 300 sec: 14851.3). Total num frames: 2310144. Throughput: 0: 7522.5, 1: 7535.3. Samples: 2286592. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-21 12:48:21,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 12:48:21,060][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000002256_1155072.pth...
[2023-09-21 12:48:21,060][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000002256_1155072.pth...
[2023-09-21 12:48:21,063][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000001824_933888.pth
[2023-09-21 12:48:21,068][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000001824_933888.pth
[2023-09-21 12:48:25,101][26608] Updated weights for policy 1, policy_version 2320 (0.0015)
[2023-09-21 12:48:25,101][26609] Updated weights for policy 0, policy_version 2320 (0.0015)
[2023-09-21 12:48:26,050][25893] Fps is (10 sec: 14745.7, 60 sec: 15018.6, 300 sec: 14848.0). Total num frames: 2383872. Throughput: 0: 7474.3, 1: 7490.8. Samples: 2376098. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:48:26,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 12:48:30,477][26608] Updated weights for policy 1, policy_version 2400 (0.0012)
[2023-09-21 12:48:30,478][26609] Updated weights for policy 0, policy_version 2400 (0.0012)
[2023-09-21 12:48:31,050][25893] Fps is (10 sec: 14745.9, 60 sec: 15018.7, 300 sec: 14844.9). Total num frames: 2457600. Throughput: 0: 7497.7, 1: 7006.0. Samples: 2423800. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 12:48:31,050][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 12:48:35,816][26608] Updated weights for policy 1, policy_version 2480 (0.0012)
[2023-09-21 12:48:35,817][26609] Updated weights for policy 0, policy_version 2480 (0.0014)
[2023-09-21 12:48:36,050][25893] Fps is (10 sec: 15564.6, 60 sec: 15155.2, 300 sec: 14890.1). Total num frames: 2539520. Throughput: 0: 7491.8, 1: 7504.2. Samples: 2514160. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:48:36,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 12:48:36,060][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000002480_1269760.pth...
[2023-09-21 12:48:36,060][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000002480_1269760.pth...
[2023-09-21 12:48:36,064][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000002040_1044480.pth
[2023-09-21 12:48:36,066][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000002040_1044480.pth
[2023-09-21 12:48:41,050][25893] Fps is (10 sec: 15564.6, 60 sec: 15018.6, 300 sec: 14886.0). Total num frames: 2613248. Throughput: 0: 7490.7, 1: 7505.8. Samples: 2604956. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:48:41,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 12:48:41,220][26609] Updated weights for policy 0, policy_version 2560 (0.0015)
[2023-09-21 12:48:41,220][26608] Updated weights for policy 1, policy_version 2560 (0.0015)
[2023-09-21 12:48:46,050][25893] Fps is (10 sec: 14745.7, 60 sec: 15018.6, 300 sec: 14882.1). Total num frames: 2686976. Throughput: 0: 7512.6, 1: 7502.0. Samples: 2672374. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 12:48:46,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 12:48:46,634][26608] Updated weights for policy 1, policy_version 2640 (0.0012)
[2023-09-21 12:48:46,634][26609] Updated weights for policy 0, policy_version 2640 (0.0011)
[2023-09-21 12:48:51,050][25893] Fps is (10 sec: 14745.4, 60 sec: 14882.1, 300 sec: 14878.4). Total num frames: 2760704. Throughput: 0: 7494.7, 1: 7485.9. Samples: 2739240. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:48:51,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 12:48:51,058][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000002696_1380352.pth...
[2023-09-21 12:48:51,058][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000002696_1380352.pth...
[2023-09-21 12:48:51,061][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000002256_1155072.pth
[2023-09-21 12:48:51,066][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000002256_1155072.pth
[2023-09-21 12:48:52,201][26608] Updated weights for policy 1, policy_version 2720 (0.0013)
[2023-09-21 12:48:52,201][26609] Updated weights for policy 0, policy_version 2720 (0.0016)
[2023-09-21 12:48:56,050][25893] Fps is (10 sec: 15155.2, 60 sec: 14950.4, 300 sec: 14896.5). Total num frames: 2838528. Throughput: 0: 7500.9, 1: 7476.6. Samples: 2828730. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-21 12:48:56,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 12:48:57,637][26609] Updated weights for policy 0, policy_version 2800 (0.0016)
[2023-09-21 12:48:57,637][26608] Updated weights for policy 1, policy_version 2800 (0.0016)
[2023-09-21 12:49:01,050][25893] Fps is (10 sec: 15564.9, 60 sec: 15018.7, 300 sec: 14913.6). Total num frames: 2916352. Throughput: 0: 7495.1, 1: 7560.1. Samples: 2898520. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 12:49:01,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 12:49:02,907][26608] Updated weights for policy 1, policy_version 2880 (0.0015)
[2023-09-21 12:49:02,908][26609] Updated weights for policy 0, policy_version 2880 (0.0014)
[2023-09-21 12:49:06,050][25893] Fps is (10 sec: 15155.5, 60 sec: 15018.7, 300 sec: 14909.4). Total num frames: 2990080. Throughput: 0: 7566.7, 1: 7562.7. Samples: 2967408. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-21 12:49:06,050][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 12:49:06,088][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000002928_1499136.pth...
[2023-09-21 12:49:06,090][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000002928_1499136.pth...
[2023-09-21 12:49:06,091][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000002480_1269760.pth
[2023-09-21 12:49:06,093][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000002480_1269760.pth
[2023-09-21 12:49:08,273][26608] Updated weights for policy 1, policy_version 2960 (0.0014)
[2023-09-21 12:49:08,273][26609] Updated weights for policy 0, policy_version 2960 (0.0015)
[2023-09-21 12:49:11,050][25893] Fps is (10 sec: 14745.6, 60 sec: 15018.7, 300 sec: 14905.4). Total num frames: 3063808. Throughput: 0: 7558.8, 1: 7553.5. Samples: 3056154. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-21 12:49:11,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 12:49:13,901][26609] Updated weights for policy 0, policy_version 3040 (0.0015)
[2023-09-21 12:49:13,901][26608] Updated weights for policy 1, policy_version 3040 (0.0015)
[2023-09-21 12:49:16,050][25893] Fps is (10 sec: 15564.8, 60 sec: 15155.3, 300 sec: 14940.6). Total num frames: 3145728. Throughput: 0: 7542.6, 1: 8066.5. Samples: 3126210. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 12:49:16,050][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 12:49:18,949][26608] Updated weights for policy 1, policy_version 3120 (0.0014)
[2023-09-21 12:49:18,950][26609] Updated weights for policy 0, policy_version 3120 (0.0014)
[2023-09-21 12:49:21,050][25893] Fps is (10 sec: 15564.8, 60 sec: 15155.2, 300 sec: 14936.1). Total num frames: 3219456. Throughput: 0: 7602.5, 1: 7585.8. Samples: 3197634. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 12:49:21,051][25893] Avg episode reward: [(0, '53.530'), (1, '1000.000')]
[2023-09-21 12:49:21,061][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000003144_1609728.pth...
[2023-09-21 12:49:21,061][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000003144_1609728.pth...
[2023-09-21 12:49:21,066][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000002696_1380352.pth
[2023-09-21 12:49:21,069][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000002696_1380352.pth
[2023-09-21 12:49:24,429][26608] Updated weights for policy 1, policy_version 3200 (0.0014)
[2023-09-21 12:49:24,429][26609] Updated weights for policy 0, policy_version 3200 (0.0016)
[2023-09-21 12:49:26,049][25893] Fps is (10 sec: 15564.8, 60 sec: 15291.8, 300 sec: 14969.0). Total num frames: 3301376. Throughput: 0: 7600.5, 1: 7577.8. Samples: 3287978. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:49:26,050][25893] Avg episode reward: [(0, '12.130'), (1, '1000.000')]
[2023-09-21 12:49:29,883][26608] Updated weights for policy 1, policy_version 3280 (0.0012)
[2023-09-21 12:49:29,883][26609] Updated weights for policy 0, policy_version 3280 (0.0014)
[2023-09-21 12:49:31,050][25893] Fps is (10 sec: 15564.9, 60 sec: 15291.7, 300 sec: 14964.0). Total num frames: 3375104. Throughput: 0: 7596.4, 1: 7588.1. Samples: 3355678. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-21 12:49:31,051][25893] Avg episode reward: [(0, '11.390'), (1, '1000.000')]
[2023-09-21 12:49:35,426][26608] Updated weights for policy 1, policy_version 3360 (0.0013)
[2023-09-21 12:49:35,428][26609] Updated weights for policy 0, policy_version 3360 (0.0015)
[2023-09-21 12:49:36,050][25893] Fps is (10 sec: 13925.9, 60 sec: 15018.7, 300 sec: 14923.7). Total num frames: 3440640. Throughput: 0: 7595.6, 1: 7586.4. Samples: 3422428. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:49:36,051][25893] Avg episode reward: [(0, '10.270'), (1, '1000.000')]
[2023-09-21 12:49:36,069][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000003368_1724416.pth...
[2023-09-21 12:49:36,072][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000002928_1499136.pth
[2023-09-21 12:49:36,076][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000003368_1724416.pth...
[2023-09-21 12:49:36,081][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000002928_1499136.pth
[2023-09-21 12:49:41,050][25893] Fps is (10 sec: 13926.4, 60 sec: 15018.7, 300 sec: 14919.9). Total num frames: 3514368. Throughput: 0: 7510.4, 1: 7519.3. Samples: 3505066. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-21 12:49:41,051][25893] Avg episode reward: [(0, '9.300'), (1, '1000.000')]
[2023-09-21 12:49:41,296][26609] Updated weights for policy 0, policy_version 3440 (0.0015)
[2023-09-21 12:49:41,296][26608] Updated weights for policy 1, policy_version 3440 (0.0015)
[2023-09-21 12:49:46,050][25893] Fps is (10 sec: 14745.8, 60 sec: 15018.7, 300 sec: 14916.3). Total num frames: 3588096. Throughput: 0: 7503.2, 1: 7443.6. Samples: 3571122. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:49:46,051][25893] Avg episode reward: [(0, '9.310'), (1, '1000.000')]
[2023-09-21 12:49:47,048][26608] Updated weights for policy 1, policy_version 3520 (0.0012)
[2023-09-21 12:49:47,049][26609] Updated weights for policy 0, policy_version 3520 (0.0011)
[2023-09-21 12:49:51,050][25893] Fps is (10 sec: 14745.8, 60 sec: 15018.7, 300 sec: 14912.8). Total num frames: 3661824. Throughput: 0: 7430.7, 1: 7437.2. Samples: 3636462. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-21 12:49:51,050][25893] Avg episode reward: [(0, '10.060'), (1, '1000.000')]
[2023-09-21 12:49:51,059][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000003576_1830912.pth...
[2023-09-21 12:49:51,059][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000003576_1830912.pth...
[2023-09-21 12:49:51,066][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000003144_1609728.pth
[2023-09-21 12:49:51,071][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000003144_1609728.pth
[2023-09-21 12:49:52,775][26608] Updated weights for policy 1, policy_version 3600 (0.0014)
[2023-09-21 12:49:52,776][26609] Updated weights for policy 0, policy_version 3600 (0.0014)
[2023-09-21 12:49:56,050][25893] Fps is (10 sec: 14745.7, 60 sec: 14950.4, 300 sec: 14909.4). Total num frames: 3735552. Throughput: 0: 7426.1, 1: 7415.4. Samples: 3724024. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 12:49:56,051][25893] Avg episode reward: [(0, '9.720'), (1, '1000.000')]
[2023-09-21 12:49:58,240][26608] Updated weights for policy 1, policy_version 3680 (0.0010)
[2023-09-21 12:49:58,241][26609] Updated weights for policy 0, policy_version 3680 (0.0015)
[2023-09-21 12:50:01,050][25893] Fps is (10 sec: 14745.5, 60 sec: 14882.2, 300 sec: 14906.2). Total num frames: 3809280. Throughput: 0: 7385.0, 1: 7341.7. Samples: 3788912. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-21 12:50:01,051][25893] Avg episode reward: [(0, '9.800'), (1, '1000.000')]
[2023-09-21 12:50:03,922][26608] Updated weights for policy 1, policy_version 3760 (0.0016)
[2023-09-21 12:50:03,923][26609] Updated weights for policy 0, policy_version 3760 (0.0011)
[2023-09-21 12:50:06,050][25893] Fps is (10 sec: 13926.6, 60 sec: 14745.6, 300 sec: 14871.6). Total num frames: 3874816. Throughput: 0: 7280.0, 1: 7288.0. Samples: 3853194. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:50:06,050][25893] Avg episode reward: [(0, '9.950'), (1, '1000.000')]
[2023-09-21 12:50:06,058][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000003784_1937408.pth...
[2023-09-21 12:50:06,058][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000003784_1937408.pth...
[2023-09-21 12:50:06,065][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000003368_1724416.pth
[2023-09-21 12:50:06,065][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000003368_1724416.pth
[2023-09-21 12:50:09,272][26608] Updated weights for policy 1, policy_version 3840 (0.0013)
[2023-09-21 12:50:09,272][26609] Updated weights for policy 0, policy_version 3840 (0.0013)
[2023-09-21 12:50:11,050][25893] Fps is (10 sec: 14745.5, 60 sec: 14882.1, 300 sec: 14900.2). Total num frames: 3956736. Throughput: 0: 7288.9, 1: 7298.2. Samples: 3944402. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:50:11,051][25893] Avg episode reward: [(0, '9.460'), (1, '1000.000')]
[2023-09-21 12:50:15,049][26608] Updated weights for policy 1, policy_version 3920 (0.0014)
[2023-09-21 12:50:15,050][26609] Updated weights for policy 0, policy_version 3920 (0.0012)
[2023-09-21 12:50:16,050][25893] Fps is (10 sec: 14745.3, 60 sec: 14609.0, 300 sec: 14867.0). Total num frames: 4022272. Throughput: 0: 7263.6, 1: 7253.1. Samples: 4008930. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-21 12:50:16,051][25893] Avg episode reward: [(0, '9.520'), (1, '1000.000')]
[2023-09-21 12:50:20,527][26609] Updated weights for policy 0, policy_version 4000 (0.0013)
[2023-09-21 12:50:20,527][26608] Updated weights for policy 1, policy_version 4000 (0.0015)
[2023-09-21 12:50:21,050][25893] Fps is (10 sec: 14745.1, 60 sec: 14745.5, 300 sec: 14894.5). Total num frames: 4104192. Throughput: 0: 7248.3, 1: 7257.7. Samples: 4075204. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-21 12:50:21,051][25893] Avg episode reward: [(0, '10.210'), (1, '1000.000')]
[2023-09-21 12:50:21,062][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000004008_2052096.pth...
[2023-09-21 12:50:21,062][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000004008_2052096.pth...
[2023-09-21 12:50:21,070][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000003576_1830912.pth
[2023-09-21 12:50:21,070][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000003576_1830912.pth
[2023-09-21 12:50:26,050][25893] Fps is (10 sec: 14745.8, 60 sec: 14472.5, 300 sec: 14862.6). Total num frames: 4169728. Throughput: 0: 7323.0, 1: 7322.7. Samples: 4164118. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:50:26,050][25893] Avg episode reward: [(0, '11.340'), (1, '1000.000')]
[2023-09-21 12:50:26,179][26609] Updated weights for policy 0, policy_version 4080 (0.0012)
[2023-09-21 12:50:26,179][26608] Updated weights for policy 1, policy_version 4080 (0.0016)
[2023-09-21 12:50:31,050][25893] Fps is (10 sec: 13927.1, 60 sec: 14472.6, 300 sec: 14860.6). Total num frames: 4243456. Throughput: 0: 7278.7, 1: 7280.5. Samples: 4226284. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 12:50:31,050][25893] Avg episode reward: [(0, '15.210'), (1, '1000.000')]
[2023-09-21 12:50:31,936][26608] Updated weights for policy 1, policy_version 4160 (0.0014)
[2023-09-21 12:50:31,936][26609] Updated weights for policy 0, policy_version 4160 (0.0015)
[2023-09-21 12:50:36,050][25893] Fps is (10 sec: 13926.3, 60 sec: 14472.6, 300 sec: 14830.3). Total num frames: 4308992. Throughput: 0: 7261.5, 1: 7256.6. Samples: 4289774. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 12:50:36,050][25893] Avg episode reward: [(0, '16.140'), (1, '1000.000')]
[2023-09-21 12:50:36,100][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000004216_2158592.pth...
[2023-09-21 12:50:36,100][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000004216_2158592.pth...
[2023-09-21 12:50:36,103][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000003784_1937408.pth
[2023-09-21 12:50:36,108][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000003784_1937408.pth
[2023-09-21 12:50:37,751][26609] Updated weights for policy 0, policy_version 4240 (0.0014)
[2023-09-21 12:50:37,751][26608] Updated weights for policy 1, policy_version 4240 (0.0014)
[2023-09-21 12:50:41,050][25893] Fps is (10 sec: 13926.1, 60 sec: 14472.5, 300 sec: 14828.9). Total num frames: 4382720. Throughput: 0: 7239.7, 1: 7242.7. Samples: 4375734. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:50:41,051][25893] Avg episode reward: [(0, '18.060'), (1, '1000.000')]
[2023-09-21 12:50:43,581][26609] Updated weights for policy 0, policy_version 4320 (0.0015)
[2023-09-21 12:50:43,581][26608] Updated weights for policy 1, policy_version 4320 (0.0015)
[2023-09-21 12:50:46,050][25893] Fps is (10 sec: 14745.8, 60 sec: 14472.6, 300 sec: 14884.5). Total num frames: 4456448. Throughput: 0: 7227.8, 1: 7205.8. Samples: 4438422. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 12:50:46,050][25893] Avg episode reward: [(0, '17.510'), (1, '1000.000')]
[2023-09-21 12:50:49,163][26609] Updated weights for policy 0, policy_version 4400 (0.0013)
[2023-09-21 12:50:49,163][26608] Updated weights for policy 1, policy_version 4400 (0.0014)
[2023-09-21 12:50:51,050][25893] Fps is (10 sec: 14745.8, 60 sec: 14472.5, 300 sec: 14884.5). Total num frames: 4530176. Throughput: 0: 7236.7, 1: 7250.9. Samples: 4505138. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 12:50:51,050][25893] Avg episode reward: [(0, '21.020'), (1, '1000.000')]
[2023-09-21 12:50:51,059][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000004424_2265088.pth...
[2023-09-21 12:50:51,059][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000004424_2265088.pth...
[2023-09-21 12:50:51,063][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000004008_2052096.pth
[2023-09-21 12:50:51,065][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000004008_2052096.pth
[2023-09-21 12:50:54,979][26608] Updated weights for policy 1, policy_version 4480 (0.0015)
[2023-09-21 12:50:54,980][26609] Updated weights for policy 0, policy_version 4480 (0.0014)
[2023-09-21 12:50:56,050][25893] Fps is (10 sec: 13926.2, 60 sec: 14336.0, 300 sec: 14884.4). Total num frames: 4595712. Throughput: 0: 7172.1, 1: 7164.2. Samples: 4589538. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 12:50:56,051][25893] Avg episode reward: [(0, '22.710'), (1, '1000.000')]
[2023-09-21 12:51:00,296][26609] Updated weights for policy 0, policy_version 4560 (0.0014)
[2023-09-21 12:51:00,296][26608] Updated weights for policy 1, policy_version 4560 (0.0012)
[2023-09-21 12:51:01,049][25893] Fps is (10 sec: 14745.8, 60 sec: 14472.6, 300 sec: 14884.5). Total num frames: 4677632. Throughput: 0: 7209.6, 1: 7216.7. Samples: 4658112. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:51:01,050][25893] Avg episode reward: [(0, '26.370'), (1, '1000.000')]
[2023-09-21 12:51:06,050][25893] Fps is (10 sec: 14745.5, 60 sec: 14472.5, 300 sec: 14856.7). Total num frames: 4743168. Throughput: 0: 7227.2, 1: 7221.6. Samples: 4725392. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-21 12:51:06,051][25893] Avg episode reward: [(0, '35.800'), (1, '1000.000')]
[2023-09-21 12:51:06,060][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000004640_2375680.pth...
[2023-09-21 12:51:06,060][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000004640_2375680.pth...
[2023-09-21 12:51:06,062][26608] Updated weights for policy 1, policy_version 4640 (0.0015)
[2023-09-21 12:51:06,062][26609] Updated weights for policy 0, policy_version 4640 (0.0015)
[2023-09-21 12:51:06,063][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000004216_2158592.pth
[2023-09-21 12:51:06,063][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000004216_2158592.pth
[2023-09-21 12:51:11,050][25893] Fps is (10 sec: 13926.0, 60 sec: 14336.0, 300 sec: 14856.7). Total num frames: 4816896. Throughput: 0: 7192.1, 1: 7184.3. Samples: 4811060. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-21 12:51:11,051][25893] Avg episode reward: [(0, '50.760'), (1, '490.810')]
[2023-09-21 12:51:11,715][26608] Updated weights for policy 1, policy_version 4720 (0.0014)
[2023-09-21 12:51:11,715][26609] Updated weights for policy 0, policy_version 4720 (0.0015)
[2023-09-21 12:51:16,050][25893] Fps is (10 sec: 14745.7, 60 sec: 14472.5, 300 sec: 14828.9). Total num frames: 4890624. Throughput: 0: 7204.5, 1: 7192.6. Samples: 4874158. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:51:16,051][25893] Avg episode reward: [(0, '56.290'), (1, '506.940')]
[2023-09-21 12:51:17,471][26609] Updated weights for policy 0, policy_version 4800 (0.0014)
[2023-09-21 12:51:17,472][26608] Updated weights for policy 1, policy_version 4800 (0.0016)
[2023-09-21 12:51:21,050][25893] Fps is (10 sec: 14745.8, 60 sec: 14336.1, 300 sec: 14828.9). Total num frames: 4964352. Throughput: 0: 7217.9, 1: 7232.6. Samples: 4940046. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-21 12:51:21,050][25893] Avg episode reward: [(0, '34.810'), (1, '686.070')]
[2023-09-21 12:51:21,058][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000004848_2482176.pth...
[2023-09-21 12:51:21,058][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000004848_2482176.pth...
[2023-09-21 12:51:21,061][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000004424_2265088.pth
[2023-09-21 12:51:21,066][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000004424_2265088.pth
[2023-09-21 12:51:23,251][26609] Updated weights for policy 0, policy_version 4880 (0.0010)
[2023-09-21 12:51:23,252][26608] Updated weights for policy 1, policy_version 4880 (0.0016)
[2023-09-21 12:51:26,050][25893] Fps is (10 sec: 13926.4, 60 sec: 14336.0, 300 sec: 14773.6). Total num frames: 5029888. Throughput: 0: 7200.5, 1: 7198.0. Samples: 5023662. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:51:26,051][25893] Avg episode reward: [(0, '76.550'), (1, '964.410')]
[2023-09-21 12:51:29,010][26609] Updated weights for policy 0, policy_version 4960 (0.0012)
[2023-09-21 12:51:29,010][26608] Updated weights for policy 1, policy_version 4960 (0.0014)
[2023-09-21 12:51:31,050][25893] Fps is (10 sec: 13926.5, 60 sec: 14336.0, 300 sec: 14801.2). Total num frames: 5103616. Throughput: 0: 7214.3, 1: 7232.1. Samples: 5088512. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:51:31,050][25893] Avg episode reward: [(0, '72.110'), (1, '1000.000')]
[2023-09-21 12:51:34,486][26609] Updated weights for policy 0, policy_version 5040 (0.0011)
[2023-09-21 12:51:34,487][26608] Updated weights for policy 1, policy_version 5040 (0.0012)
[2023-09-21 12:51:36,050][25893] Fps is (10 sec: 14745.5, 60 sec: 14472.5, 300 sec: 14801.1). Total num frames: 5177344. Throughput: 0: 7226.9, 1: 7208.8. Samples: 5154748. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:51:36,051][25893] Avg episode reward: [(0, '94.210'), (1, '1000.000')]
[2023-09-21 12:51:36,085][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000005064_2592768.pth...
[2023-09-21 12:51:36,089][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000004640_2375680.pth
[2023-09-21 12:51:36,095][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000005064_2592768.pth...
[2023-09-21 12:51:36,102][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000004640_2375680.pth
[2023-09-21 12:51:40,076][26608] Updated weights for policy 1, policy_version 5120 (0.0012)
[2023-09-21 12:51:40,077][26609] Updated weights for policy 0, policy_version 5120 (0.0009)
[2023-09-21 12:51:41,050][25893] Fps is (10 sec: 14745.6, 60 sec: 14472.6, 300 sec: 14801.1). Total num frames: 5251072. Throughput: 0: 7261.7, 1: 7275.7. Samples: 5243718. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:51:41,050][25893] Avg episode reward: [(0, '167.790'), (1, '1000.000')]
[2023-09-21 12:51:45,743][26608] Updated weights for policy 1, policy_version 5200 (0.0013)
[2023-09-21 12:51:45,744][26609] Updated weights for policy 0, policy_version 5200 (0.0014)
[2023-09-21 12:51:46,050][25893] Fps is (10 sec: 14745.9, 60 sec: 14472.5, 300 sec: 14801.1). Total num frames: 5324800. Throughput: 0: 7200.6, 1: 7228.0. Samples: 5307402. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-21 12:51:46,050][25893] Avg episode reward: [(0, '196.460'), (1, '19.460')]
[2023-09-21 12:51:51,050][25893] Fps is (10 sec: 14745.6, 60 sec: 14472.5, 300 sec: 14773.4). Total num frames: 5398528. Throughput: 0: 7197.5, 1: 7212.2. Samples: 5373824. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-21 12:51:51,050][25893] Avg episode reward: [(0, '281.250'), (1, '11.540')]
[2023-09-21 12:51:51,056][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000005272_2699264.pth...
[2023-09-21 12:51:51,056][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000005272_2699264.pth...
[2023-09-21 12:51:51,059][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000004848_2482176.pth
[2023-09-21 12:51:51,059][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000004848_2482176.pth
[2023-09-21 12:51:51,458][26609] Updated weights for policy 0, policy_version 5280 (0.0014)
[2023-09-21 12:51:51,458][26608] Updated weights for policy 1, policy_version 5280 (0.0014)
[2023-09-21 12:51:56,050][25893] Fps is (10 sec: 14745.5, 60 sec: 14609.1, 300 sec: 14801.1). Total num frames: 5472256. Throughput: 0: 7200.5, 1: 7209.8. Samples: 5459518. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-21 12:51:56,051][25893] Avg episode reward: [(0, '605.080'), (1, '13.420')]
[2023-09-21 12:51:57,098][26608] Updated weights for policy 1, policy_version 5360 (0.0013)
[2023-09-21 12:51:57,099][26609] Updated weights for policy 0, policy_version 5360 (0.0016)
[2023-09-21 12:52:01,050][25893] Fps is (10 sec: 13926.0, 60 sec: 14335.9, 300 sec: 14773.4). Total num frames: 5537792. Throughput: 0: 7226.4, 1: 6774.7. Samples: 5504210. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:52:01,051][25893] Avg episode reward: [(0, '858.530'), (1, '11.490')]
[2023-09-21 12:52:02,837][26609] Updated weights for policy 0, policy_version 5440 (0.0015)
[2023-09-21 12:52:02,838][26608] Updated weights for policy 1, policy_version 5440 (0.0015)
[2023-09-21 12:52:06,050][25893] Fps is (10 sec: 13926.3, 60 sec: 14472.6, 300 sec: 14745.6). Total num frames: 5611520. Throughput: 0: 7228.4, 1: 7212.3. Samples: 5589878. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:52:06,051][25893] Avg episode reward: [(0, '736.400'), (1, '10.190')]
[2023-09-21 12:52:06,061][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000005480_2805760.pth...
[2023-09-21 12:52:06,061][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000005480_2805760.pth...
[2023-09-21 12:52:06,066][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000005064_2592768.pth
[2023-09-21 12:52:06,071][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000005064_2592768.pth
[2023-09-21 12:52:08,586][26608] Updated weights for policy 1, policy_version 5520 (0.0014)
[2023-09-21 12:52:08,586][26609] Updated weights for policy 0, policy_version 5520 (0.0017)
[2023-09-21 12:52:11,050][25893] Fps is (10 sec: 14745.9, 60 sec: 14472.6, 300 sec: 14773.4). Total num frames: 5685248. Throughput: 0: 7238.9, 1: 7232.9. Samples: 5674894. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:52:11,051][25893] Avg episode reward: [(0, '758.270'), (1, '7.380')]
[2023-09-21 12:52:14,313][26608] Updated weights for policy 1, policy_version 5600 (0.0015)
[2023-09-21 12:52:14,313][26609] Updated weights for policy 0, policy_version 5600 (0.0012)
[2023-09-21 12:52:16,049][25893] Fps is (10 sec: 14746.0, 60 sec: 14472.6, 300 sec: 14745.6). Total num frames: 5758976. Throughput: 0: 7221.3, 1: 7230.6. Samples: 5738842. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:52:16,050][25893] Avg episode reward: [(0, '930.040'), (1, '7.370')]
[2023-09-21 12:52:19,914][26608] Updated weights for policy 1, policy_version 5680 (0.0013)
[2023-09-21 12:52:19,915][26609] Updated weights for policy 0, policy_version 5680 (0.0014)
[2023-09-21 12:52:21,050][25893] Fps is (10 sec: 14745.4, 60 sec: 14472.5, 300 sec: 14745.6). Total num frames: 5832704. Throughput: 0: 7223.0, 1: 7221.2. Samples: 5804738. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:52:21,051][25893] Avg episode reward: [(0, '1000.000'), (1, '7.280')]
[2023-09-21 12:52:21,065][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000005696_2916352.pth...
[2023-09-21 12:52:21,065][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000005696_2916352.pth...
[2023-09-21 12:52:21,071][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000005272_2699264.pth
[2023-09-21 12:52:21,073][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000005272_2699264.pth
[2023-09-21 12:52:25,599][26609] Updated weights for policy 0, policy_version 5760 (0.0012)
[2023-09-21 12:52:25,600][26608] Updated weights for policy 1, policy_version 5760 (0.0014)
[2023-09-21 12:52:26,050][25893] Fps is (10 sec: 13926.3, 60 sec: 14472.6, 300 sec: 14717.8). Total num frames: 5898240. Throughput: 0: 7206.1, 1: 7197.8. Samples: 5891892. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:52:26,050][25893] Avg episode reward: [(0, '1000.000'), (1, '7.930')]
[2023-09-21 12:52:31,050][25893] Fps is (10 sec: 13926.6, 60 sec: 14472.5, 300 sec: 14717.8). Total num frames: 5971968. Throughput: 0: 7204.7, 1: 7135.4. Samples: 5952704. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:52:31,051][25893] Avg episode reward: [(0, '1000.000'), (1, '8.480')]
[2023-09-21 12:52:31,547][26609] Updated weights for policy 0, policy_version 5840 (0.0015)
[2023-09-21 12:52:31,548][26608] Updated weights for policy 1, policy_version 5840 (0.0015)
[2023-09-21 12:52:36,050][25893] Fps is (10 sec: 14745.5, 60 sec: 14472.6, 300 sec: 14690.1). Total num frames: 6045696. Throughput: 0: 7184.1, 1: 7178.6. Samples: 6020144. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:52:36,050][25893] Avg episode reward: [(0, '1000.000'), (1, '9.760')]
[2023-09-21 12:52:36,060][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000005904_3022848.pth...
[2023-09-21 12:52:36,060][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000005904_3022848.pth...
[2023-09-21 12:52:36,065][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000005480_2805760.pth
[2023-09-21 12:52:36,082][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000005480_2805760.pth
[2023-09-21 12:52:37,334][26609] Updated weights for policy 0, policy_version 5920 (0.0015)
[2023-09-21 12:52:37,335][26608] Updated weights for policy 1, policy_version 5920 (0.0013)
[2023-09-21 12:52:41,050][25893] Fps is (10 sec: 13926.4, 60 sec: 14336.0, 300 sec: 14662.3). Total num frames: 6111232. Throughput: 0: 7143.2, 1: 7159.4. Samples: 6103138. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 12:52:41,051][25893] Avg episode reward: [(0, '1000.000'), (1, '10.480')]
[2023-09-21 12:52:42,823][26609] Updated weights for policy 0, policy_version 6000 (0.0012)
[2023-09-21 12:52:42,823][26608] Updated weights for policy 1, policy_version 6000 (0.0016)
[2023-09-21 12:52:46,050][25893] Fps is (10 sec: 13926.5, 60 sec: 14336.0, 300 sec: 14634.5). Total num frames: 6184960. Throughput: 0: 7165.0, 1: 7150.2. Samples: 6148392. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:52:46,050][25893] Avg episode reward: [(0, '1000.000'), (1, '10.440')]
[2023-09-21 12:52:48,215][26609] Updated weights for policy 0, policy_version 6080 (0.0012)
[2023-09-21 12:52:48,216][26608] Updated weights for policy 1, policy_version 6080 (0.0014)
[2023-09-21 12:52:51,050][25893] Fps is (10 sec: 15564.9, 60 sec: 14472.5, 300 sec: 14662.3). Total num frames: 6266880. Throughput: 0: 7233.6, 1: 7225.8. Samples: 6240552. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:52:51,050][25893] Avg episode reward: [(0, '1000.000'), (1, '10.260')]
[2023-09-21 12:52:51,057][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000006120_3133440.pth...
[2023-09-21 12:52:51,057][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000006120_3133440.pth...
[2023-09-21 12:52:51,066][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000005696_2916352.pth
[2023-09-21 12:52:51,068][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000005696_2916352.pth
[2023-09-21 12:52:53,615][26609] Updated weights for policy 0, policy_version 6160 (0.0013)
[2023-09-21 12:52:53,615][26608] Updated weights for policy 1, policy_version 6160 (0.0015)
[2023-09-21 12:52:56,050][25893] Fps is (10 sec: 15564.8, 60 sec: 14472.6, 300 sec: 14662.3). Total num frames: 6340608. Throughput: 0: 7274.9, 1: 7283.0. Samples: 6329994. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:52:56,050][25893] Avg episode reward: [(0, '1000.000'), (1, '10.940')]
[2023-09-21 12:52:59,318][26609] Updated weights for policy 0, policy_version 6240 (0.0015)
[2023-09-21 12:52:59,319][26608] Updated weights for policy 1, policy_version 6240 (0.0010)
[2023-09-21 12:53:01,049][25893] Fps is (10 sec: 14745.7, 60 sec: 14609.2, 300 sec: 14662.3). Total num frames: 6414336. Throughput: 0: 7282.4, 1: 7304.7. Samples: 6395266. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:53:01,050][25893] Avg episode reward: [(0, '1000.000'), (1, '10.870')]
[2023-09-21 12:53:04,574][26609] Updated weights for policy 0, policy_version 6320 (0.0013)
[2023-09-21 12:53:04,574][26608] Updated weights for policy 1, policy_version 6320 (0.0013)
[2023-09-21 12:53:06,049][25893] Fps is (10 sec: 14745.7, 60 sec: 14609.1, 300 sec: 14662.3). Total num frames: 6488064. Throughput: 0: 7328.0, 1: 7336.2. Samples: 6464624. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:53:06,050][25893] Avg episode reward: [(0, '1000.000'), (1, '10.810')]
[2023-09-21 12:53:06,056][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000006336_3244032.pth...
[2023-09-21 12:53:06,056][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000006336_3244032.pth...
[2023-09-21 12:53:06,061][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000005904_3022848.pth
[2023-09-21 12:53:06,062][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000005904_3022848.pth
[2023-09-21 12:53:09,849][26609] Updated weights for policy 0, policy_version 6400 (0.0014)
[2023-09-21 12:53:09,849][26608] Updated weights for policy 1, policy_version 6400 (0.0015)
[2023-09-21 12:53:11,050][25893] Fps is (10 sec: 15564.7, 60 sec: 14745.6, 300 sec: 14690.1). Total num frames: 6569984. Throughput: 0: 7408.9, 1: 7401.0. Samples: 6558338. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-21 12:53:11,050][25893] Avg episode reward: [(0, '1000.000'), (1, '9.530')]
[2023-09-21 12:53:15,386][26608] Updated weights for policy 1, policy_version 6480 (0.0009)
[2023-09-21 12:53:15,386][26609] Updated weights for policy 0, policy_version 6480 (0.0015)
[2023-09-21 12:53:16,050][25893] Fps is (10 sec: 15564.6, 60 sec: 14745.6, 300 sec: 14690.1). Total num frames: 6643712. Throughput: 0: 7438.2, 1: 7482.5. Samples: 6624136. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-21 12:53:16,051][25893] Avg episode reward: [(0, '1000.000'), (1, '9.750')]
[2023-09-21 12:53:20,954][26609] Updated weights for policy 0, policy_version 6560 (0.0014)
[2023-09-21 12:53:20,954][26608] Updated weights for policy 1, policy_version 6560 (0.0015)
[2023-09-21 12:53:21,050][25893] Fps is (10 sec: 14745.3, 60 sec: 14745.6, 300 sec: 14690.1). Total num frames: 6717440. Throughput: 0: 7462.9, 1: 7453.7. Samples: 6691390. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:53:21,051][25893] Avg episode reward: [(0, '1000.000'), (1, '11.000')]
[2023-09-21 12:53:21,062][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000006560_3358720.pth...
[2023-09-21 12:53:21,062][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000006560_3358720.pth...
[2023-09-21 12:53:21,070][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000006120_3133440.pth
[2023-09-21 12:53:21,071][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000006120_3133440.pth
[2023-09-21 12:53:26,049][25893] Fps is (10 sec: 14745.9, 60 sec: 14882.2, 300 sec: 14690.1). Total num frames: 6791168. Throughput: 0: 7529.0, 1: 7513.0. Samples: 6780024. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:53:26,050][25893] Avg episode reward: [(0, '1000.000'), (1, '10.330')]
[2023-09-21 12:53:26,540][26609] Updated weights for policy 0, policy_version 6640 (0.0016)
[2023-09-21 12:53:26,540][26608] Updated weights for policy 1, policy_version 6640 (0.0013)
[2023-09-21 12:53:31,050][25893] Fps is (10 sec: 13926.5, 60 sec: 14745.6, 300 sec: 14634.5). Total num frames: 6856704. Throughput: 0: 7480.9, 1: 7962.2. Samples: 6843330. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:53:31,051][25893] Avg episode reward: [(0, '1000.000'), (1, '11.750')]
[2023-09-21 12:53:32,298][26609] Updated weights for policy 0, policy_version 6720 (0.0013)
[2023-09-21 12:53:32,298][26608] Updated weights for policy 1, policy_version 6720 (0.0015)
[2023-09-21 12:53:36,050][25893] Fps is (10 sec: 13925.9, 60 sec: 14745.6, 300 sec: 14634.5). Total num frames: 6930432. Throughput: 0: 7392.1, 1: 7408.7. Samples: 6906594. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:53:36,051][25893] Avg episode reward: [(0, '1000.000'), (1, '11.530')]
[2023-09-21 12:53:36,060][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000006768_3465216.pth...
[2023-09-21 12:53:36,060][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000006768_3465216.pth...
[2023-09-21 12:53:36,067][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000006336_3244032.pth
[2023-09-21 12:53:36,068][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000006336_3244032.pth
[2023-09-21 12:53:37,850][26609] Updated weights for policy 0, policy_version 6800 (0.0012)
[2023-09-21 12:53:37,851][26608] Updated weights for policy 1, policy_version 6800 (0.0014)
[2023-09-21 12:53:41,050][25893] Fps is (10 sec: 14745.6, 60 sec: 14882.1, 300 sec: 14634.5). Total num frames: 7004160. Throughput: 0: 7418.0, 1: 7413.0. Samples: 6997392. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:53:41,051][25893] Avg episode reward: [(0, '1000.000'), (1, '11.190')]
[2023-09-21 12:53:43,423][26609] Updated weights for policy 0, policy_version 6880 (0.0013)
[2023-09-21 12:53:43,424][26608] Updated weights for policy 1, policy_version 6880 (0.0014)
[2023-09-21 12:53:46,050][25893] Fps is (10 sec: 14745.8, 60 sec: 14882.1, 300 sec: 14634.5). Total num frames: 7077888. Throughput: 0: 7429.1, 1: 7407.0. Samples: 7062896. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 12:53:46,051][25893] Avg episode reward: [(0, '1000.000'), (1, '13.000')]
[2023-09-21 12:53:48,981][26608] Updated weights for policy 1, policy_version 6960 (0.0013)
[2023-09-21 12:53:48,982][26609] Updated weights for policy 0, policy_version 6960 (0.0015)
[2023-09-21 12:53:51,049][25893] Fps is (10 sec: 14745.9, 60 sec: 14745.6, 300 sec: 14620.6). Total num frames: 7151616. Throughput: 0: 7382.3, 1: 7377.8. Samples: 7128830. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 12:53:51,050][25893] Avg episode reward: [(0, '1000.000'), (1, '11.490')]
[2023-09-21 12:53:51,057][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000006984_3575808.pth...
[2023-09-21 12:53:51,057][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000006984_3575808.pth...
[2023-09-21 12:53:51,062][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000006560_3358720.pth
[2023-09-21 12:53:51,064][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000006560_3358720.pth
[2023-09-21 12:53:54,663][26608] Updated weights for policy 1, policy_version 7040 (0.0014)
[2023-09-21 12:53:54,663][26609] Updated weights for policy 0, policy_version 7040 (0.0017)
[2023-09-21 12:53:56,050][25893] Fps is (10 sec: 14745.6, 60 sec: 14745.6, 300 sec: 14606.8). Total num frames: 7225344. Throughput: 0: 7303.5, 1: 7323.6. Samples: 7216556. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-21 12:53:56,051][25893] Avg episode reward: [(0, '1000.000'), (1, '11.690')]
[2023-09-21 12:54:00,104][26609] Updated weights for policy 0, policy_version 7120 (0.0015)
[2023-09-21 12:54:00,105][26608] Updated weights for policy 1, policy_version 7120 (0.0016)
[2023-09-21 12:54:01,050][25893] Fps is (10 sec: 14745.3, 60 sec: 14745.6, 300 sec: 14606.7). Total num frames: 7299072. Throughput: 0: 7313.1, 1: 7314.3. Samples: 7282366. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:54:01,051][25893] Avg episode reward: [(0, '1000.000'), (1, '12.350')]
[2023-09-21 12:54:06,029][26608] Updated weights for policy 1, policy_version 7200 (0.0012)
[2023-09-21 12:54:06,029][26609] Updated weights for policy 0, policy_version 7200 (0.0014)
[2023-09-21 12:54:06,054][25893] Fps is (10 sec: 14739.2, 60 sec: 14744.5, 300 sec: 14606.5). Total num frames: 7372800. Throughput: 0: 7282.1, 1: 7294.8. Samples: 7347410. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-21 12:54:06,055][25893] Avg episode reward: [(0, '1000.000'), (1, '11.690')]
[2023-09-21 12:54:06,059][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000007200_3686400.pth...
[2023-09-21 12:54:06,059][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000007200_3686400.pth...
[2023-09-21 12:54:06,062][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000006768_3465216.pth
[2023-09-21 12:54:06,062][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000006768_3465216.pth
[2023-09-21 12:54:11,050][25893] Fps is (10 sec: 13926.4, 60 sec: 14472.5, 300 sec: 14551.2). Total num frames: 7438336. Throughput: 0: 7241.4, 1: 7239.8. Samples: 7431682. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-21 12:54:11,051][25893] Avg episode reward: [(0, '1000.000'), (1, '10.100')]
[2023-09-21 12:54:11,675][26608] Updated weights for policy 1, policy_version 7280 (0.0014)
[2023-09-21 12:54:11,675][26609] Updated weights for policy 0, policy_version 7280 (0.0014)
[2023-09-21 12:54:16,049][25893] Fps is (10 sec: 13932.7, 60 sec: 14472.6, 300 sec: 14551.2). Total num frames: 7512064. Throughput: 0: 7289.8, 1: 7296.7. Samples: 7499722. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:54:16,050][25893] Avg episode reward: [(0, '1000.000'), (1, '10.200')]
[2023-09-21 12:54:17,156][26608] Updated weights for policy 1, policy_version 7360 (0.0013)
[2023-09-21 12:54:17,156][26609] Updated weights for policy 0, policy_version 7360 (0.0016)
[2023-09-21 12:54:21,050][25893] Fps is (10 sec: 15564.8, 60 sec: 14609.1, 300 sec: 14551.2). Total num frames: 7593984. Throughput: 0: 7356.0, 1: 7343.9. Samples: 7568090. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:54:21,051][25893] Avg episode reward: [(0, '1000.000'), (1, '10.580')]
[2023-09-21 12:54:21,061][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000007416_3796992.pth...
[2023-09-21 12:54:21,062][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000007416_3796992.pth...
[2023-09-21 12:54:21,066][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000006984_3575808.pth
[2023-09-21 12:54:21,067][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000006984_3575808.pth
[2023-09-21 12:54:22,458][26609] Updated weights for policy 0, policy_version 7440 (0.0014)
[2023-09-21 12:54:22,458][26608] Updated weights for policy 1, policy_version 7440 (0.0013)
[2023-09-21 12:54:26,050][25893] Fps is (10 sec: 15564.5, 60 sec: 14609.0, 300 sec: 14551.2). Total num frames: 7667712. Throughput: 0: 7347.8, 1: 7366.4. Samples: 7659528. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:54:26,051][25893] Avg episode reward: [(0, '1000.000'), (1, '11.640')]
[2023-09-21 12:54:27,937][26608] Updated weights for policy 1, policy_version 7520 (0.0014)
[2023-09-21 12:54:27,937][26609] Updated weights for policy 0, policy_version 7520 (0.0014)
[2023-09-21 12:54:31,049][25893] Fps is (10 sec: 14745.9, 60 sec: 14745.6, 300 sec: 14579.0). Total num frames: 7741440. Throughput: 0: 7364.7, 1: 7363.4. Samples: 7725662. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 12:54:31,050][25893] Avg episode reward: [(0, '1000.000'), (1, '12.270')]
[2023-09-21 12:54:33,446][26609] Updated weights for policy 0, policy_version 7600 (0.0014)
[2023-09-21 12:54:33,446][26608] Updated weights for policy 1, policy_version 7600 (0.0012)
[2023-09-21 12:54:36,050][25893] Fps is (10 sec: 14745.7, 60 sec: 14745.6, 300 sec: 14579.0). Total num frames: 7815168. Throughput: 0: 7350.7, 1: 7367.0. Samples: 7791128. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 12:54:36,050][25893] Avg episode reward: [(0, '1000.000'), (1, '13.040')]
[2023-09-21 12:54:36,060][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000007632_3907584.pth...
[2023-09-21 12:54:36,060][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000007632_3907584.pth...
[2023-09-21 12:54:36,063][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000007200_3686400.pth
[2023-09-21 12:54:36,068][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000007200_3686400.pth
[2023-09-21 12:54:39,382][26608] Updated weights for policy 1, policy_version 7680 (0.0015)
[2023-09-21 12:54:39,382][26609] Updated weights for policy 0, policy_version 7680 (0.0015)
[2023-09-21 12:54:41,050][25893] Fps is (10 sec: 13926.2, 60 sec: 14609.1, 300 sec: 14551.2). Total num frames: 7880704. Throughput: 0: 7326.2, 1: 7316.1. Samples: 7875460. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 12:54:41,050][25893] Avg episode reward: [(0, '1000.000'), (1, '11.710')]
[2023-09-21 12:54:44,838][26609] Updated weights for policy 0, policy_version 7760 (0.0013)
[2023-09-21 12:54:44,839][26608] Updated weights for policy 1, policy_version 7760 (0.0013)
[2023-09-21 12:54:46,050][25893] Fps is (10 sec: 14745.7, 60 sec: 14745.6, 300 sec: 14579.0). Total num frames: 7962624. Throughput: 0: 7343.4, 1: 7319.9. Samples: 7942210. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-21 12:54:46,050][25893] Avg episode reward: [(0, '1000.000'), (1, '11.590')]
[2023-09-21 12:54:50,474][26609] Updated weights for policy 0, policy_version 7840 (0.0015)
[2023-09-21 12:54:50,475][26608] Updated weights for policy 1, policy_version 7840 (0.0015)
[2023-09-21 12:54:51,050][25893] Fps is (10 sec: 14745.7, 60 sec: 14609.1, 300 sec: 14551.2). Total num frames: 8028160. Throughput: 0: 7359.1, 1: 7351.1. Samples: 8009304. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-21 12:54:51,050][25893] Avg episode reward: [(0, '1000.000'), (1, '12.830')]
[2023-09-21 12:54:51,078][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000007848_4018176.pth...
[2023-09-21 12:54:51,081][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000007416_3796992.pth
[2023-09-21 12:54:51,096][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000007848_4018176.pth...
[2023-09-21 12:54:51,099][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000007416_3796992.pth
[2023-09-21 12:54:56,050][25893] Fps is (10 sec: 13926.3, 60 sec: 14609.1, 300 sec: 14551.2). Total num frames: 8101888. Throughput: 0: 7360.3, 1: 7369.6. Samples: 8094524. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:54:56,051][25893] Avg episode reward: [(0, '1000.000'), (1, '10.120')]
[2023-09-21 12:54:56,239][26609] Updated weights for policy 0, policy_version 7920 (0.0015)
[2023-09-21 12:54:56,239][26608] Updated weights for policy 1, policy_version 7920 (0.0015)
[2023-09-21 12:55:01,050][25893] Fps is (10 sec: 14745.7, 60 sec: 14609.1, 300 sec: 14579.0). Total num frames: 8175616. Throughput: 0: 7350.4, 1: 7320.2. Samples: 8159900. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:55:01,050][25893] Avg episode reward: [(0, '1000.000'), (1, '12.680')]
[2023-09-21 12:55:01,918][26609] Updated weights for policy 0, policy_version 8000 (0.0017)
[2023-09-21 12:55:01,918][26608] Updated weights for policy 1, policy_version 8000 (0.0015)
[2023-09-21 12:55:06,050][25893] Fps is (10 sec: 14745.4, 60 sec: 14610.1, 300 sec: 14551.2). Total num frames: 8249344. Throughput: 0: 7310.5, 1: 7311.4. Samples: 8226074. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:55:06,051][25893] Avg episode reward: [(0, '1000.000'), (1, '12.970')]
[2023-09-21 12:55:06,060][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000008056_4124672.pth...
[2023-09-21 12:55:06,060][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000008056_4124672.pth...
[2023-09-21 12:55:06,070][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000007632_3907584.pth
[2023-09-21 12:55:06,073][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000007632_3907584.pth
[2023-09-21 12:55:07,374][26609] Updated weights for policy 0, policy_version 8080 (0.0010)
[2023-09-21 12:55:07,375][26608] Updated weights for policy 1, policy_version 8080 (0.0011)
[2023-09-21 12:55:11,050][25893] Fps is (10 sec: 15564.6, 60 sec: 14882.2, 300 sec: 14606.8). Total num frames: 8331264. Throughput: 0: 7311.6, 1: 7289.3. Samples: 8316570. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:55:11,051][25893] Avg episode reward: [(0, '1000.000'), (1, '13.820')]
[2023-09-21 12:55:12,688][26608] Updated weights for policy 1, policy_version 8160 (0.0014)
[2023-09-21 12:55:12,688][26609] Updated weights for policy 0, policy_version 8160 (0.0010)
[2023-09-21 12:55:16,050][25893] Fps is (10 sec: 15565.1, 60 sec: 14882.1, 300 sec: 14579.0). Total num frames: 8404992. Throughput: 0: 7323.8, 1: 7318.0. Samples: 8384548. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:55:16,051][25893] Avg episode reward: [(0, '1000.000'), (1, '13.090')]
[2023-09-21 12:55:18,443][26608] Updated weights for policy 1, policy_version 8240 (0.0013)
[2023-09-21 12:55:18,444][26609] Updated weights for policy 0, policy_version 8240 (0.0015)
[2023-09-21 12:55:21,050][25893] Fps is (10 sec: 13926.2, 60 sec: 14609.0, 300 sec: 14579.0). Total num frames: 8470528. Throughput: 0: 7281.3, 1: 7281.7. Samples: 8446466. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-21 12:55:21,051][25893] Avg episode reward: [(0, '1000.000'), (1, '14.440')]
[2023-09-21 12:55:21,062][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000008272_4235264.pth...
[2023-09-21 12:55:21,062][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000008272_4235264.pth...
[2023-09-21 12:55:21,069][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000007848_4018176.pth
[2023-09-21 12:55:21,070][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000007848_4018176.pth
[2023-09-21 12:55:24,401][26608] Updated weights for policy 1, policy_version 8320 (0.0015)
[2023-09-21 12:55:24,401][26609] Updated weights for policy 0, policy_version 8320 (0.0014)
[2023-09-21 12:55:26,050][25893] Fps is (10 sec: 13926.3, 60 sec: 14609.1, 300 sec: 14579.0). Total num frames: 8544256. Throughput: 0: 7277.5, 1: 7274.7. Samples: 8530310. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:55:26,051][25893] Avg episode reward: [(0, '1000.000'), (1, '14.860')]
[2023-09-21 12:55:29,841][26608] Updated weights for policy 1, policy_version 8400 (0.0016)
[2023-09-21 12:55:29,841][26609] Updated weights for policy 0, policy_version 8400 (0.0013)
[2023-09-21 12:55:31,050][25893] Fps is (10 sec: 14745.8, 60 sec: 14609.0, 300 sec: 14606.8). Total num frames: 8617984. Throughput: 0: 7293.1, 1: 7302.9. Samples: 8599034. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:55:31,051][25893] Avg episode reward: [(0, '1000.000'), (1, '14.070')]
[2023-09-21 12:55:35,465][26609] Updated weights for policy 0, policy_version 8480 (0.0016)
[2023-09-21 12:55:35,465][26608] Updated weights for policy 1, policy_version 8480 (0.0016)
[2023-09-21 12:55:36,049][25893] Fps is (10 sec: 14745.9, 60 sec: 14609.1, 300 sec: 14606.8). Total num frames: 8691712. Throughput: 0: 7275.3, 1: 7276.9. Samples: 8664154. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:55:36,050][25893] Avg episode reward: [(0, '1000.000'), (1, '14.330')]
[2023-09-21 12:55:36,057][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000008488_4345856.pth...
[2023-09-21 12:55:36,058][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000008488_4345856.pth...
[2023-09-21 12:55:36,061][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000008056_4124672.pth
[2023-09-21 12:55:36,067][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000008056_4124672.pth
[2023-09-21 12:55:40,873][26608] Updated weights for policy 1, policy_version 8560 (0.0012)
[2023-09-21 12:55:40,874][26609] Updated weights for policy 0, policy_version 8560 (0.0015)
[2023-09-21 12:55:41,049][25893] Fps is (10 sec: 14745.9, 60 sec: 14745.6, 300 sec: 14606.8). Total num frames: 8765440. Throughput: 0: 7343.7, 1: 7326.7. Samples: 8754692. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:55:41,050][25893] Avg episode reward: [(0, '1000.000'), (1, '14.520')]
[2023-09-21 12:55:46,050][25893] Fps is (10 sec: 14745.2, 60 sec: 14609.0, 300 sec: 14606.7). Total num frames: 8839168. Throughput: 0: 7331.2, 1: 7383.1. Samples: 8822046. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:55:46,051][25893] Avg episode reward: [(0, '942.680'), (1, '14.440')]
[2023-09-21 12:55:46,358][26609] Updated weights for policy 0, policy_version 8640 (0.0012)
[2023-09-21 12:55:46,358][26608] Updated weights for policy 1, policy_version 8640 (0.0015)
[2023-09-21 12:55:48,065][26519] KL-divergence is very high: 107.6700
[2023-09-21 12:55:48,679][26519] KL-divergence is very high: 148.7846
[2023-09-21 12:55:48,684][26519] KL-divergence is very high: 207.9661
[2023-09-21 12:55:48,688][26519] KL-divergence is very high: 237.3876
[2023-09-21 12:55:51,050][25893] Fps is (10 sec: 14745.1, 60 sec: 14745.5, 300 sec: 14634.5). Total num frames: 8912896. Throughput: 0: 7333.7, 1: 7336.8. Samples: 8886244. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:55:51,051][25893] Avg episode reward: [(0, '130.670'), (1, '12.020')]
[2023-09-21 12:55:51,061][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000008704_4456448.pth...
[2023-09-21 12:55:51,061][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000008704_4456448.pth...
[2023-09-21 12:55:51,065][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000008272_4235264.pth
[2023-09-21 12:55:51,067][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000008272_4235264.pth
[2023-09-21 12:55:52,216][26609] Updated weights for policy 0, policy_version 8720 (0.0010)
[2023-09-21 12:55:52,217][26608] Updated weights for policy 1, policy_version 8720 (0.0015)
[2023-09-21 12:55:56,049][25893] Fps is (10 sec: 14746.0, 60 sec: 14745.7, 300 sec: 14606.8). Total num frames: 8986624. Throughput: 0: 7329.7, 1: 7337.9. Samples: 8976608. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-21 12:55:56,050][25893] Avg episode reward: [(0, '59.630'), (1, '14.370')]
[2023-09-21 12:55:57,660][26608] Updated weights for policy 1, policy_version 8800 (0.0014)
[2023-09-21 12:55:57,661][26609] Updated weights for policy 0, policy_version 8800 (0.0014)
[2023-09-21 12:56:01,050][25893] Fps is (10 sec: 14746.0, 60 sec: 14745.6, 300 sec: 14634.5). Total num frames: 9060352. Throughput: 0: 7276.9, 1: 7281.8. Samples: 9039684. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-21 12:56:01,050][25893] Avg episode reward: [(0, '35.240'), (1, '14.720')]
[2023-09-21 12:56:03,252][26608] Updated weights for policy 1, policy_version 8880 (0.0011)
[2023-09-21 12:56:03,253][26609] Updated weights for policy 0, policy_version 8880 (0.0016)
[2023-09-21 12:56:06,049][25893] Fps is (10 sec: 13926.3, 60 sec: 14609.2, 300 sec: 14606.8). Total num frames: 9125888. Throughput: 0: 7338.5, 1: 7317.4. Samples: 9105974. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 12:56:06,050][25893] Avg episode reward: [(0, '25.370'), (1, '13.720')]
[2023-09-21 12:56:06,055][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000008912_4562944.pth...
[2023-09-21 12:56:06,055][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000008912_4562944.pth...
[2023-09-21 12:56:06,062][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000008488_4345856.pth
[2023-09-21 12:56:06,062][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000008488_4345856.pth
[2023-09-21 12:56:08,878][26608] Updated weights for policy 1, policy_version 8960 (0.0013)
[2023-09-21 12:56:08,878][26609] Updated weights for policy 0, policy_version 8960 (0.0015)
[2023-09-21 12:56:11,050][25893] Fps is (10 sec: 13926.2, 60 sec: 14472.5, 300 sec: 14606.8). Total num frames: 9199616. Throughput: 0: 7376.3, 1: 7372.1. Samples: 9193988. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:56:11,051][25893] Avg episode reward: [(0, '17.710'), (1, '13.180')]
[2023-09-21 12:56:14,437][26608] Updated weights for policy 1, policy_version 9040 (0.0011)
[2023-09-21 12:56:14,438][26609] Updated weights for policy 0, policy_version 9040 (0.0014)
[2023-09-21 12:56:16,049][25893] Fps is (10 sec: 14745.5, 60 sec: 14472.6, 300 sec: 14606.8). Total num frames: 9273344. Throughput: 0: 7349.0, 1: 7333.0. Samples: 9259724. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:56:16,050][25893] Avg episode reward: [(0, '15.040'), (1, '11.290')]
[2023-09-21 12:56:20,058][26609] Updated weights for policy 0, policy_version 9120 (0.0013)
[2023-09-21 12:56:20,058][26608] Updated weights for policy 1, policy_version 9120 (0.0015)
[2023-09-21 12:56:21,050][25893] Fps is (10 sec: 14745.6, 60 sec: 14609.1, 300 sec: 14634.5). Total num frames: 9347072. Throughput: 0: 7336.0, 1: 7328.4. Samples: 9324054. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:56:21,050][25893] Avg episode reward: [(0, '17.000'), (1, '13.210')]
[2023-09-21 12:56:21,092][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000009136_4677632.pth...
[2023-09-21 12:56:21,094][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000009136_4677632.pth...
[2023-09-21 12:56:21,095][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000008704_4456448.pth
[2023-09-21 12:56:21,097][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000008704_4456448.pth
[2023-09-21 12:56:25,606][26608] Updated weights for policy 1, policy_version 9200 (0.0016)
[2023-09-21 12:56:25,606][26609] Updated weights for policy 0, policy_version 9200 (0.0013)
[2023-09-21 12:56:26,050][25893] Fps is (10 sec: 14745.5, 60 sec: 14609.1, 300 sec: 14634.5). Total num frames: 9420800. Throughput: 0: 7311.2, 1: 7327.2. Samples: 9413420. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:56:26,050][25893] Avg episode reward: [(0, '26.280'), (1, '14.100')]
[2023-09-21 12:56:30,823][26609] Updated weights for policy 0, policy_version 9280 (0.0012)
[2023-09-21 12:56:30,824][26608] Updated weights for policy 1, policy_version 9280 (0.0013)
[2023-09-21 12:56:31,050][25893] Fps is (10 sec: 15564.5, 60 sec: 14745.6, 300 sec: 14662.3). Total num frames: 9502720. Throughput: 0: 7345.7, 1: 7366.8. Samples: 9484110. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:56:31,051][25893] Avg episode reward: [(0, '27.990'), (1, '13.260')]
[2023-09-21 12:56:36,050][25893] Fps is (10 sec: 15564.2, 60 sec: 14745.5, 300 sec: 14662.3). Total num frames: 9576448. Throughput: 0: 7372.8, 1: 7364.8. Samples: 9549436. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:56:36,051][25893] Avg episode reward: [(0, '46.770'), (1, '13.570')]
[2023-09-21 12:56:36,061][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000009352_4788224.pth...
[2023-09-21 12:56:36,061][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000009352_4788224.pth...
[2023-09-21 12:56:36,068][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000008912_4562944.pth
[2023-09-21 12:56:36,069][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000008912_4562944.pth
[2023-09-21 12:56:36,462][26608] Updated weights for policy 1, policy_version 9360 (0.0015)
[2023-09-21 12:56:36,462][26609] Updated weights for policy 0, policy_version 9360 (0.0014)
[2023-09-21 12:56:41,050][25893] Fps is (10 sec: 14745.8, 60 sec: 14745.5, 300 sec: 14662.3). Total num frames: 9650176. Throughput: 0: 7364.4, 1: 7355.8. Samples: 9639022. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 12:56:41,051][25893] Avg episode reward: [(0, '47.210'), (1, '13.650')]
[2023-09-21 12:56:42,036][26608] Updated weights for policy 1, policy_version 9440 (0.0015)
[2023-09-21 12:56:42,037][26609] Updated weights for policy 0, policy_version 9440 (0.0015)
[2023-09-21 12:56:46,050][25893] Fps is (10 sec: 14746.0, 60 sec: 14745.6, 300 sec: 14662.3). Total num frames: 9723904. Throughput: 0: 7397.6, 1: 7431.9. Samples: 9707014. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 12:56:46,051][25893] Avg episode reward: [(0, '49.330'), (1, '15.210')]
[2023-09-21 12:56:47,347][26608] Updated weights for policy 1, policy_version 9520 (0.0012)
[2023-09-21 12:56:47,347][26609] Updated weights for policy 0, policy_version 9520 (0.0012)
[2023-09-21 12:56:51,050][25893] Fps is (10 sec: 14745.6, 60 sec: 14745.6, 300 sec: 14662.3). Total num frames: 9797632. Throughput: 0: 7409.2, 1: 7427.8. Samples: 9773642. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:56:51,051][25893] Avg episode reward: [(0, '54.790'), (1, '15.210')]
[2023-09-21 12:56:51,058][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000009568_4898816.pth...
[2023-09-21 12:56:51,058][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000009568_4898816.pth...
[2023-09-21 12:56:51,062][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000009136_4677632.pth
[2023-09-21 12:56:51,063][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000009136_4677632.pth
[2023-09-21 12:56:52,807][26609] Updated weights for policy 0, policy_version 9600 (0.0013)
[2023-09-21 12:56:52,807][26608] Updated weights for policy 1, policy_version 9600 (0.0016)
[2023-09-21 12:56:56,050][25893] Fps is (10 sec: 14745.8, 60 sec: 14745.6, 300 sec: 14690.1). Total num frames: 9871360. Throughput: 0: 7473.3, 1: 7471.0. Samples: 9866478. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:56:56,050][25893] Avg episode reward: [(0, '57.750'), (1, '14.670')]
[2023-09-21 12:56:58,187][26609] Updated weights for policy 0, policy_version 9680 (0.0011)
[2023-09-21 12:56:58,188][26608] Updated weights for policy 1, policy_version 9680 (0.0015)
[2023-09-21 12:57:01,050][25893] Fps is (10 sec: 14745.6, 60 sec: 14745.6, 300 sec: 14690.1). Total num frames: 9945088. Throughput: 0: 7476.2, 1: 7010.3. Samples: 9911618. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:57:01,051][25893] Avg episode reward: [(0, '71.410'), (1, '15.330')]
[2023-09-21 12:57:04,147][26608] Updated weights for policy 1, policy_version 9760 (0.0016)
[2023-09-21 12:57:04,147][26609] Updated weights for policy 0, policy_version 9760 (0.0017)
[2023-09-21 12:57:06,050][25893] Fps is (10 sec: 14745.2, 60 sec: 14882.1, 300 sec: 14690.1). Total num frames: 10018816. Throughput: 0: 7433.4, 1: 7450.7. Samples: 9993844. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:57:06,051][25893] Avg episode reward: [(0, '87.390'), (1, '15.840')]
[2023-09-21 12:57:06,060][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000009784_5009408.pth...
[2023-09-21 12:57:06,060][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000009784_5009408.pth...
[2023-09-21 12:57:06,067][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000009352_4788224.pth
[2023-09-21 12:57:06,067][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000009352_4788224.pth
[2023-09-21 12:57:09,554][26608] Updated weights for policy 1, policy_version 9840 (0.0015)
[2023-09-21 12:57:09,554][26609] Updated weights for policy 0, policy_version 9840 (0.0014)
[2023-09-21 12:57:11,049][25893] Fps is (10 sec: 14745.9, 60 sec: 14882.2, 300 sec: 14690.1). Total num frames: 10092544. Throughput: 0: 7455.3, 1: 7463.0. Samples: 10084742. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:57:11,050][25893] Avg episode reward: [(0, '109.590'), (1, '15.160')]
[2023-09-21 12:57:15,108][26609] Updated weights for policy 0, policy_version 9920 (0.0013)
[2023-09-21 12:57:15,109][26608] Updated weights for policy 1, policy_version 9920 (0.0014)
[2023-09-21 12:57:16,050][25893] Fps is (10 sec: 14746.0, 60 sec: 14882.1, 300 sec: 14690.1). Total num frames: 10166272. Throughput: 0: 7451.1, 1: 7407.6. Samples: 10152744. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:57:16,050][25893] Avg episode reward: [(0, '167.000'), (1, '16.100')]
[2023-09-21 12:57:21,049][25893] Fps is (10 sec: 13926.4, 60 sec: 14745.6, 300 sec: 14690.1). Total num frames: 10231808. Throughput: 0: 7384.4, 1: 7396.2. Samples: 10214556. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:57:21,050][25893] Avg episode reward: [(0, '216.460'), (1, '14.670')]
[2023-09-21 12:57:21,092][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000010000_5120000.pth...
[2023-09-21 12:57:21,095][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000009568_4898816.pth
[2023-09-21 12:57:21,110][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000010000_5120000.pth...
[2023-09-21 12:57:21,111][26608] Updated weights for policy 1, policy_version 10000 (0.0016)
[2023-09-21 12:57:21,112][26609] Updated weights for policy 0, policy_version 10000 (0.0015)
[2023-09-21 12:57:21,114][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000009568_4898816.pth
[2023-09-21 12:57:26,050][25893] Fps is (10 sec: 13926.2, 60 sec: 14745.6, 300 sec: 14690.1). Total num frames: 10305536. Throughput: 0: 7277.8, 1: 7287.1. Samples: 10294442. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-21 12:57:26,051][25893] Avg episode reward: [(0, '428.500'), (1, '14.530')]
[2023-09-21 12:57:27,109][26608] Updated weights for policy 1, policy_version 10080 (0.0012)
[2023-09-21 12:57:27,110][26609] Updated weights for policy 0, policy_version 10080 (0.0014)
[2023-09-21 12:57:31,050][25893] Fps is (10 sec: 14745.5, 60 sec: 14609.1, 300 sec: 14690.1). Total num frames: 10379264. Throughput: 0: 7268.9, 1: 7235.5. Samples: 10359708. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-21 12:57:31,050][25893] Avg episode reward: [(0, '631.170'), (1, '15.600')]
[2023-09-21 12:57:32,794][26608] Updated weights for policy 1, policy_version 10160 (0.0015)
[2023-09-21 12:57:32,794][26609] Updated weights for policy 0, policy_version 10160 (0.0016)
[2023-09-21 12:57:36,050][25893] Fps is (10 sec: 13926.2, 60 sec: 14472.6, 300 sec: 14690.1). Total num frames: 10444800. Throughput: 0: 7215.1, 1: 7203.5. Samples: 10422480. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-21 12:57:36,051][25893] Avg episode reward: [(0, '821.840'), (1, '15.660')]
[2023-09-21 12:57:36,060][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000010200_5222400.pth...
[2023-09-21 12:57:36,060][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000010200_5222400.pth...
[2023-09-21 12:57:36,064][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000009784_5009408.pth
[2023-09-21 12:57:36,066][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000009784_5009408.pth
[2023-09-21 12:57:38,335][26608] Updated weights for policy 1, policy_version 10240 (0.0015)
[2023-09-21 12:57:38,335][26609] Updated weights for policy 0, policy_version 10240 (0.0016)
[2023-09-21 12:57:41,050][25893] Fps is (10 sec: 13926.3, 60 sec: 14472.6, 300 sec: 14690.1). Total num frames: 10518528. Throughput: 0: 7154.1, 1: 7171.1. Samples: 10511110. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 12:57:41,051][25893] Avg episode reward: [(0, '985.870'), (1, '13.790')]
[2023-09-21 12:57:44,092][26608] Updated weights for policy 1, policy_version 10320 (0.0014)
[2023-09-21 12:57:44,093][26609] Updated weights for policy 0, policy_version 10320 (0.0015)
[2023-09-21 12:57:46,050][25893] Fps is (10 sec: 14745.7, 60 sec: 14472.5, 300 sec: 14662.3). Total num frames: 10592256. Throughput: 0: 7157.6, 1: 7580.0. Samples: 10574814. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:57:46,051][25893] Avg episode reward: [(0, '1000.000'), (1, '12.390')]
[2023-09-21 12:57:50,405][26609] Updated weights for policy 0, policy_version 10400 (0.0014)
[2023-09-21 12:57:50,406][26608] Updated weights for policy 1, policy_version 10400 (0.0012)
[2023-09-21 12:57:51,050][25893] Fps is (10 sec: 13926.2, 60 sec: 14336.0, 300 sec: 14634.5). Total num frames: 10657792. Throughput: 0: 7103.6, 1: 7109.0. Samples: 10633410. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:57:51,051][25893] Avg episode reward: [(0, '1000.000'), (1, '13.940')]
[2023-09-21 12:57:51,059][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000010408_5328896.pth...
[2023-09-21 12:57:51,059][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000010408_5328896.pth...
[2023-09-21 12:57:51,064][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000010000_5120000.pth
[2023-09-21 12:57:51,064][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000010000_5120000.pth
[2023-09-21 12:57:56,049][25893] Fps is (10 sec: 13107.5, 60 sec: 14199.5, 300 sec: 14606.8). Total num frames: 10723328. Throughput: 0: 7015.2, 1: 7009.5. Samples: 10715854. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:57:56,050][25893] Avg episode reward: [(0, '1000.000'), (1, '14.610')]
[2023-09-21 12:57:56,261][26609] Updated weights for policy 0, policy_version 10480 (0.0009)
[2023-09-21 12:57:56,262][26608] Updated weights for policy 1, policy_version 10480 (0.0015)
[2023-09-21 12:58:01,049][25893] Fps is (10 sec: 13926.9, 60 sec: 14199.5, 300 sec: 14606.8). Total num frames: 10797056. Throughput: 0: 6988.2, 1: 6984.2. Samples: 10781502. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:58:01,050][25893] Avg episode reward: [(0, '1000.000'), (1, '15.050')]
[2023-09-21 12:58:01,905][26608] Updated weights for policy 1, policy_version 10560 (0.0017)
[2023-09-21 12:58:01,905][26609] Updated weights for policy 0, policy_version 10560 (0.0015)
[2023-09-21 12:58:06,049][25893] Fps is (10 sec: 14745.6, 60 sec: 14199.5, 300 sec: 14579.0). Total num frames: 10870784. Throughput: 0: 7008.8, 1: 7009.6. Samples: 10845382. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:58:06,050][25893] Avg episode reward: [(0, '1000.000'), (1, '16.210')]
[2023-09-21 12:58:06,057][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000010616_5435392.pth...
[2023-09-21 12:58:06,057][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000010616_5435392.pth...
[2023-09-21 12:58:06,063][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000010200_5222400.pth
[2023-09-21 12:58:06,064][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000010200_5222400.pth
[2023-09-21 12:58:07,563][26609] Updated weights for policy 0, policy_version 10640 (0.0010)
[2023-09-21 12:58:07,563][26608] Updated weights for policy 1, policy_version 10640 (0.0014)
[2023-09-21 12:58:11,049][25893] Fps is (10 sec: 14745.5, 60 sec: 14199.5, 300 sec: 14579.0). Total num frames: 10944512. Throughput: 0: 7103.9, 1: 7095.5. Samples: 10933414. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:58:11,050][25893] Avg episode reward: [(0, '1000.000'), (1, '16.090')]
[2023-09-21 12:58:13,271][26608] Updated weights for policy 1, policy_version 10720 (0.0014)
[2023-09-21 12:58:13,271][26609] Updated weights for policy 0, policy_version 10720 (0.0015)
[2023-09-21 12:58:16,049][25893] Fps is (10 sec: 13926.4, 60 sec: 14062.9, 300 sec: 14551.2). Total num frames: 11010048. Throughput: 0: 7086.4, 1: 7085.9. Samples: 10997460. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:58:16,050][25893] Avg episode reward: [(0, '1000.000'), (1, '13.760')]
[2023-09-21 12:58:19,013][26608] Updated weights for policy 1, policy_version 10800 (0.0015)
[2023-09-21 12:58:19,013][26609] Updated weights for policy 0, policy_version 10800 (0.0010)
[2023-09-21 12:58:21,049][25893] Fps is (10 sec: 13926.4, 60 sec: 14199.5, 300 sec: 14551.2). Total num frames: 11083776. Throughput: 0: 7086.5, 1: 7088.1. Samples: 11060334. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 12:58:21,050][25893] Avg episode reward: [(0, '1000.000'), (1, '15.110')]
[2023-09-21 12:58:21,059][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000010824_5541888.pth...
[2023-09-21 12:58:21,059][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000010824_5541888.pth...
[2023-09-21 12:58:21,067][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000010408_5328896.pth
[2023-09-21 12:58:21,067][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000010408_5328896.pth
[2023-09-21 12:58:24,534][26609] Updated weights for policy 0, policy_version 10880 (0.0014)
[2023-09-21 12:58:24,535][26608] Updated weights for policy 1, policy_version 10880 (0.0016)
[2023-09-21 12:58:26,050][25893] Fps is (10 sec: 14745.4, 60 sec: 14199.5, 300 sec: 14579.0). Total num frames: 11157504. Throughput: 0: 7099.4, 1: 7099.8. Samples: 11150072. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 12:58:26,051][25893] Avg episode reward: [(0, '1000.000'), (1, '15.730')]
[2023-09-21 12:58:29,956][26609] Updated weights for policy 0, policy_version 10960 (0.0013)
[2023-09-21 12:58:29,956][26608] Updated weights for policy 1, policy_version 10960 (0.0012)
[2023-09-21 12:58:31,050][25893] Fps is (10 sec: 14745.3, 60 sec: 14199.4, 300 sec: 14579.0). Total num frames: 11231232. Throughput: 0: 7118.1, 1: 7168.6. Samples: 11217716. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-21 12:58:31,051][25893] Avg episode reward: [(0, '1000.000'), (1, '16.360')]
[2023-09-21 12:58:35,853][26609] Updated weights for policy 0, policy_version 11040 (0.0015)
[2023-09-21 12:58:35,853][26608] Updated weights for policy 1, policy_version 11040 (0.0010)
[2023-09-21 12:58:36,050][25893] Fps is (10 sec: 14745.6, 60 sec: 14336.0, 300 sec: 14579.0). Total num frames: 11304960. Throughput: 0: 7206.8, 1: 7194.1. Samples: 11281448. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-21 12:58:36,050][25893] Avg episode reward: [(0, '1000.000'), (1, '16.520')]
[2023-09-21 12:58:36,061][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000011040_5652480.pth...
[2023-09-21 12:58:36,061][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000011040_5652480.pth...
[2023-09-21 12:58:36,065][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000010616_5435392.pth
[2023-09-21 12:58:36,069][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000010616_5435392.pth
[2023-09-21 12:58:41,049][25893] Fps is (10 sec: 14745.9, 60 sec: 14336.0, 300 sec: 14579.0). Total num frames: 11378688. Throughput: 0: 7286.3, 1: 7282.5. Samples: 11371450. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:58:41,050][25893] Avg episode reward: [(0, '1000.000'), (1, '17.480')]
[2023-09-21 12:58:41,221][26609] Updated weights for policy 0, policy_version 11120 (0.0015)
[2023-09-21 12:58:41,221][26608] Updated weights for policy 1, policy_version 11120 (0.0013)
[2023-09-21 12:58:46,050][25893] Fps is (10 sec: 15564.7, 60 sec: 14472.5, 300 sec: 14606.7). Total num frames: 11460608. Throughput: 0: 7313.5, 1: 7325.6. Samples: 11440262. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 12:58:46,051][25893] Avg episode reward: [(0, '1000.000'), (1, '17.130')]
[2023-09-21 12:58:46,537][26609] Updated weights for policy 0, policy_version 11200 (0.0015)
[2023-09-21 12:58:46,537][26608] Updated weights for policy 1, policy_version 11200 (0.0013)
[2023-09-21 12:58:51,049][25893] Fps is (10 sec: 15564.9, 60 sec: 14609.1, 300 sec: 14606.8). Total num frames: 11534336. Throughput: 0: 7372.1, 1: 7366.9. Samples: 11508638. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 12:58:51,050][25893] Avg episode reward: [(0, '1000.000'), (1, '18.100')]
[2023-09-21 12:58:51,055][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000011264_5767168.pth...
[2023-09-21 12:58:51,055][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000011264_5767168.pth...
[2023-09-21 12:58:51,059][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000010824_5541888.pth
[2023-09-21 12:58:51,060][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000010824_5541888.pth
[2023-09-21 12:58:52,137][26609] Updated weights for policy 0, policy_version 11280 (0.0015)
[2023-09-21 12:58:52,137][26608] Updated weights for policy 1, policy_version 11280 (0.0014)
[2023-09-21 12:58:56,050][25893] Fps is (10 sec: 14745.8, 60 sec: 14745.6, 300 sec: 14606.8). Total num frames: 11608064. Throughput: 0: 7365.3, 1: 7366.4. Samples: 11596342. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:58:56,050][25893] Avg episode reward: [(0, '1000.000'), (1, '18.250')]
[2023-09-21 12:58:57,599][26608] Updated weights for policy 1, policy_version 11360 (0.0014)
[2023-09-21 12:58:57,599][26609] Updated weights for policy 0, policy_version 11360 (0.0016)
[2023-09-21 12:59:01,050][25893] Fps is (10 sec: 13926.0, 60 sec: 14609.0, 300 sec: 14579.2). Total num frames: 11673600. Throughput: 0: 7379.1, 1: 7372.9. Samples: 11661304. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:59:01,051][25893] Avg episode reward: [(0, '1000.000'), (1, '19.410')]
[2023-09-21 12:59:03,229][26609] Updated weights for policy 0, policy_version 11440 (0.0013)
[2023-09-21 12:59:03,230][26608] Updated weights for policy 1, policy_version 11440 (0.0014)
[2023-09-21 12:59:06,050][25893] Fps is (10 sec: 14745.6, 60 sec: 14745.6, 300 sec: 14634.5). Total num frames: 11755520. Throughput: 0: 7421.8, 1: 7416.9. Samples: 11728074. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 12:59:06,050][25893] Avg episode reward: [(0, '1000.000'), (1, '20.320')]
[2023-09-21 12:59:06,057][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000011480_5877760.pth...
[2023-09-21 12:59:06,057][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000011480_5877760.pth...
[2023-09-21 12:59:06,061][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000011040_5652480.pth
[2023-09-21 12:59:06,063][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000011040_5652480.pth
[2023-09-21 12:59:09,121][26609] Updated weights for policy 0, policy_version 11520 (0.0015)
[2023-09-21 12:59:09,121][26608] Updated weights for policy 1, policy_version 11520 (0.0016)
[2023-09-21 12:59:11,050][25893] Fps is (10 sec: 14745.5, 60 sec: 14609.0, 300 sec: 14606.7). Total num frames: 11821056. Throughput: 0: 7356.6, 1: 7359.1. Samples: 11812284. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 12:59:11,051][25893] Avg episode reward: [(0, '1000.000'), (1, '23.350')]
[2023-09-21 12:59:14,569][26609] Updated weights for policy 0, policy_version 11600 (0.0014)
[2023-09-21 12:59:14,569][26608] Updated weights for policy 1, policy_version 11600 (0.0016)
[2023-09-21 12:59:16,049][25893] Fps is (10 sec: 13926.5, 60 sec: 14745.6, 300 sec: 14579.0). Total num frames: 11894784. Throughput: 0: 7335.5, 1: 7383.8. Samples: 11880080. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 12:59:16,050][25893] Avg episode reward: [(0, '1000.000'), (1, '27.730')]
[2023-09-21 12:59:19,811][26609] Updated weights for policy 0, policy_version 11680 (0.0014)
[2023-09-21 12:59:19,812][26608] Updated weights for policy 1, policy_version 11680 (0.0016)
[2023-09-21 12:59:21,050][25893] Fps is (10 sec: 15564.9, 60 sec: 14882.1, 300 sec: 14606.8). Total num frames: 11976704. Throughput: 0: 7435.3, 1: 7424.1. Samples: 11950122. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 12:59:21,051][25893] Avg episode reward: [(0, '1000.000'), (1, '39.400')]
[2023-09-21 12:59:21,060][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000011696_5988352.pth...
[2023-09-21 12:59:21,060][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000011696_5988352.pth...
[2023-09-21 12:59:21,063][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000011264_5767168.pth
[2023-09-21 12:59:21,065][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000011264_5767168.pth
[2023-09-21 12:59:25,230][26609] Updated weights for policy 0, policy_version 11760 (0.0014)
[2023-09-21 12:59:25,230][26608] Updated weights for policy 1, policy_version 11760 (0.0014)
[2023-09-21 12:59:26,050][25893] Fps is (10 sec: 15564.5, 60 sec: 14882.1, 300 sec: 14606.7). Total num frames: 12050432. Throughput: 0: 7443.4, 1: 7432.2. Samples: 12040850. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-21 12:59:26,051][25893] Avg episode reward: [(0, '1000.000'), (1, '37.550')]
[2023-09-21 12:59:30,379][26608] Updated weights for policy 1, policy_version 11840 (0.0012)
[2023-09-21 12:59:30,379][26609] Updated weights for policy 0, policy_version 11840 (0.0012)
[2023-09-21 12:59:31,050][25893] Fps is (10 sec: 15565.0, 60 sec: 15018.7, 300 sec: 14634.5). Total num frames: 12132352. Throughput: 0: 7450.1, 1: 7486.5. Samples: 12112408. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:59:31,050][25893] Avg episode reward: [(0, '1000.000'), (1, '54.660')]
[2023-09-21 12:59:36,019][26608] Updated weights for policy 1, policy_version 11920 (0.0013)
[2023-09-21 12:59:36,019][26609] Updated weights for policy 0, policy_version 11920 (0.0011)
[2023-09-21 12:59:36,050][25893] Fps is (10 sec: 15564.7, 60 sec: 15018.6, 300 sec: 14662.3). Total num frames: 12206080. Throughput: 0: 7463.7, 1: 7463.2. Samples: 12180354. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:59:36,051][25893] Avg episode reward: [(0, '1000.000'), (1, '41.450')]
[2023-09-21 12:59:36,058][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000011920_6103040.pth...
[2023-09-21 12:59:36,058][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000011920_6103040.pth...
[2023-09-21 12:59:36,063][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000011480_5877760.pth
[2023-09-21 12:59:36,066][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000011480_5877760.pth
[2023-09-21 12:59:41,050][25893] Fps is (10 sec: 14745.6, 60 sec: 15018.6, 300 sec: 14634.5). Total num frames: 12279808. Throughput: 0: 7461.8, 1: 7468.8. Samples: 12268218. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-21 12:59:41,050][25893] Avg episode reward: [(0, '1000.000'), (1, '57.650')]
[2023-09-21 12:59:41,554][26608] Updated weights for policy 1, policy_version 12000 (0.0012)
[2023-09-21 12:59:41,554][26609] Updated weights for policy 0, policy_version 12000 (0.0015)
[2023-09-21 12:59:46,050][25893] Fps is (10 sec: 13926.3, 60 sec: 14745.6, 300 sec: 14634.5). Total num frames: 12345344. Throughput: 0: 7460.0, 1: 7460.1. Samples: 12332710. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-21 12:59:46,051][25893] Avg episode reward: [(0, '1000.000'), (1, '55.550')]
[2023-09-21 12:59:47,251][26609] Updated weights for policy 0, policy_version 12080 (0.0015)
[2023-09-21 12:59:47,252][26608] Updated weights for policy 1, policy_version 12080 (0.0015)
[2023-09-21 12:59:51,050][25893] Fps is (10 sec: 13926.1, 60 sec: 14745.5, 300 sec: 14634.5). Total num frames: 12419072. Throughput: 0: 7445.8, 1: 7443.8. Samples: 12398108. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-21 12:59:51,051][25893] Avg episode reward: [(0, '1000.000'), (1, '62.020')]
[2023-09-21 12:59:51,062][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000012136_6213632.pth...
[2023-09-21 12:59:51,065][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000011696_5988352.pth
[2023-09-21 12:59:51,076][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000012136_6213632.pth...
[2023-09-21 12:59:51,079][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000011696_5988352.pth
[2023-09-21 12:59:52,832][26608] Updated weights for policy 1, policy_version 12160 (0.0013)
[2023-09-21 12:59:52,832][26609] Updated weights for policy 0, policy_version 12160 (0.0014)
[2023-09-21 12:59:56,050][25893] Fps is (10 sec: 14745.9, 60 sec: 14745.6, 300 sec: 14634.5). Total num frames: 12492800. Throughput: 0: 7478.2, 1: 7477.5. Samples: 12485286. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-21 12:59:56,050][25893] Avg episode reward: [(0, '1000.000'), (1, '70.480')]
[2023-09-21 12:59:58,348][26609] Updated weights for policy 0, policy_version 12240 (0.0015)
[2023-09-21 12:59:58,349][26608] Updated weights for policy 1, policy_version 12240 (0.0012)
[2023-09-21 13:00:01,050][25893] Fps is (10 sec: 14745.7, 60 sec: 14882.1, 300 sec: 14634.5). Total num frames: 12566528. Throughput: 0: 7494.9, 1: 7437.0. Samples: 12552022. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:00:01,051][25893] Avg episode reward: [(0, '1000.000'), (1, '78.330')]
[2023-09-21 13:00:04,147][26608] Updated weights for policy 1, policy_version 12320 (0.0012)
[2023-09-21 13:00:04,149][26609] Updated weights for policy 0, policy_version 12320 (0.0012)
[2023-09-21 13:00:06,050][25893] Fps is (10 sec: 14745.1, 60 sec: 14745.5, 300 sec: 14606.7). Total num frames: 12640256. Throughput: 0: 7381.2, 1: 7401.7. Samples: 12615356. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:00:06,051][25893] Avg episode reward: [(0, '1000.000'), (1, '85.640')]
[2023-09-21 13:00:06,062][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000012344_6320128.pth...
[2023-09-21 13:00:06,063][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000012344_6320128.pth...
[2023-09-21 13:00:06,067][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000011920_6103040.pth
[2023-09-21 13:00:06,070][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000011920_6103040.pth
[2023-09-21 13:00:09,558][26609] Updated weights for policy 0, policy_version 12400 (0.0014)
[2023-09-21 13:00:09,558][26608] Updated weights for policy 1, policy_version 12400 (0.0013)
[2023-09-21 13:00:11,050][25893] Fps is (10 sec: 14745.7, 60 sec: 14882.2, 300 sec: 14606.8). Total num frames: 12713984. Throughput: 0: 7381.0, 1: 7402.3. Samples: 12706100. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:00:11,051][25893] Avg episode reward: [(0, '1000.000'), (1, '41.270')]
[2023-09-21 13:00:15,165][26608] Updated weights for policy 1, policy_version 12480 (0.0016)
[2023-09-21 13:00:15,165][26609] Updated weights for policy 0, policy_version 12480 (0.0015)
[2023-09-21 13:00:16,050][25893] Fps is (10 sec: 14746.0, 60 sec: 14882.1, 300 sec: 14634.5). Total num frames: 12787712. Throughput: 0: 7341.2, 1: 7300.9. Samples: 12771304. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:00:16,051][25893] Avg episode reward: [(0, '1000.000'), (1, '67.430')]
[2023-09-21 13:00:20,528][26608] Updated weights for policy 1, policy_version 12560 (0.0013)
[2023-09-21 13:00:20,528][26609] Updated weights for policy 0, policy_version 12560 (0.0014)
[2023-09-21 13:00:21,050][25893] Fps is (10 sec: 15564.5, 60 sec: 14882.1, 300 sec: 14662.3). Total num frames: 12869632. Throughput: 0: 7317.9, 1: 7313.3. Samples: 12838758. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 13:00:21,051][25893] Avg episode reward: [(0, '1000.000'), (1, '89.640')]
[2023-09-21 13:00:21,061][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000012568_6434816.pth...
[2023-09-21 13:00:21,061][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000012568_6434816.pth...
[2023-09-21 13:00:21,068][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000012136_6213632.pth
[2023-09-21 13:00:21,068][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000012136_6213632.pth
[2023-09-21 13:00:26,049][25893] Fps is (10 sec: 14746.0, 60 sec: 14745.6, 300 sec: 14634.5). Total num frames: 12935168. Throughput: 0: 7311.5, 1: 7320.7. Samples: 12926666. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 13:00:26,050][25893] Avg episode reward: [(0, '1000.000'), (1, '114.080')]
[2023-09-21 13:00:26,144][26609] Updated weights for policy 0, policy_version 12640 (0.0012)
[2023-09-21 13:00:26,144][26608] Updated weights for policy 1, policy_version 12640 (0.0014)
[2023-09-21 13:00:31,050][25893] Fps is (10 sec: 14745.7, 60 sec: 14745.6, 300 sec: 14662.3). Total num frames: 13017088. Throughput: 0: 7375.4, 1: 7388.5. Samples: 12997086. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:00:31,051][25893] Avg episode reward: [(0, '1000.000'), (1, '79.530')]
[2023-09-21 13:00:31,521][26608] Updated weights for policy 1, policy_version 12720 (0.0013)
[2023-09-21 13:00:31,521][26609] Updated weights for policy 0, policy_version 12720 (0.0014)
[2023-09-21 13:00:36,050][25893] Fps is (10 sec: 14745.1, 60 sec: 14609.1, 300 sec: 14634.5). Total num frames: 13082624. Throughput: 0: 7384.9, 1: 7390.2. Samples: 13062990. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:00:36,051][25893] Avg episode reward: [(0, '1000.000'), (1, '101.920')]
[2023-09-21 13:00:36,060][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000012776_6541312.pth...
[2023-09-21 13:00:36,060][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000012776_6541312.pth...
[2023-09-21 13:00:36,065][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000012344_6320128.pth
[2023-09-21 13:00:36,067][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000012344_6320128.pth
[2023-09-21 13:00:37,293][26609] Updated weights for policy 0, policy_version 12800 (0.0013)
[2023-09-21 13:00:37,293][26608] Updated weights for policy 1, policy_version 12800 (0.0015)
[2023-09-21 13:00:41,050][25893] Fps is (10 sec: 13926.5, 60 sec: 14609.0, 300 sec: 14634.5). Total num frames: 13156352. Throughput: 0: 7376.1, 1: 7372.8. Samples: 13148990. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:00:41,051][25893] Avg episode reward: [(0, '1000.000'), (1, '122.370')]
[2023-09-21 13:00:42,748][26609] Updated weights for policy 0, policy_version 12880 (0.0009)
[2023-09-21 13:00:42,749][26608] Updated weights for policy 1, policy_version 12880 (0.0012)
[2023-09-21 13:00:46,050][25893] Fps is (10 sec: 14745.9, 60 sec: 14745.7, 300 sec: 14634.5). Total num frames: 13230080. Throughput: 0: 7370.5, 1: 6899.6. Samples: 13194172. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:00:46,050][25893] Avg episode reward: [(0, '1000.000'), (1, '121.180')]
[2023-09-21 13:00:48,137][26608] Updated weights for policy 1, policy_version 12960 (0.0012)
[2023-09-21 13:00:48,137][26609] Updated weights for policy 0, policy_version 12960 (0.0011)
[2023-09-21 13:00:51,050][25893] Fps is (10 sec: 15564.5, 60 sec: 14882.1, 300 sec: 14662.3). Total num frames: 13312000. Throughput: 0: 7465.1, 1: 7462.5. Samples: 13287098. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-21 13:00:51,051][25893] Avg episode reward: [(0, '1000.000'), (1, '117.380')]
[2023-09-21 13:00:51,062][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000013000_6656000.pth...
[2023-09-21 13:00:51,062][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000013000_6656000.pth...
[2023-09-21 13:00:51,068][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000012568_6434816.pth
[2023-09-21 13:00:51,068][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000012568_6434816.pth
[2023-09-21 13:00:53,320][26609] Updated weights for policy 0, policy_version 13040 (0.0014)
[2023-09-21 13:00:53,320][26608] Updated weights for policy 1, policy_version 13040 (0.0013)
[2023-09-21 13:00:56,050][25893] Fps is (10 sec: 15564.8, 60 sec: 14882.1, 300 sec: 14662.3). Total num frames: 13385728. Throughput: 0: 7487.3, 1: 7472.2. Samples: 13379278. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-21 13:00:56,050][25893] Avg episode reward: [(0, '1000.000'), (1, '130.340')]
[2023-09-21 13:00:58,729][26608] Updated weights for policy 1, policy_version 13120 (0.0012)
[2023-09-21 13:00:58,730][26609] Updated weights for policy 0, policy_version 13120 (0.0014)
[2023-09-21 13:01:01,050][25893] Fps is (10 sec: 15565.3, 60 sec: 15018.7, 300 sec: 14717.8). Total num frames: 13467648. Throughput: 0: 7513.4, 1: 7526.7. Samples: 13448104. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:01:01,050][25893] Avg episode reward: [(0, '1000.000'), (1, '161.220')]
[2023-09-21 13:01:04,290][26609] Updated weights for policy 0, policy_version 13200 (0.0015)
[2023-09-21 13:01:04,290][26608] Updated weights for policy 1, policy_version 13200 (0.0015)
[2023-09-21 13:01:06,050][25893] Fps is (10 sec: 15564.8, 60 sec: 15018.8, 300 sec: 14717.8). Total num frames: 13541376. Throughput: 0: 7512.5, 1: 7518.1. Samples: 13515132. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:01:06,050][25893] Avg episode reward: [(0, '1000.000'), (1, '168.480')]
[2023-09-21 13:01:06,056][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000013224_6770688.pth...
[2023-09-21 13:01:06,056][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000013224_6770688.pth...
[2023-09-21 13:01:06,061][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000012776_6541312.pth
[2023-09-21 13:01:06,061][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000012776_6541312.pth
[2023-09-21 13:01:09,699][26608] Updated weights for policy 1, policy_version 13280 (0.0012)
[2023-09-21 13:01:09,700][26609] Updated weights for policy 0, policy_version 13280 (0.0015)
[2023-09-21 13:01:11,050][25893] Fps is (10 sec: 14745.6, 60 sec: 15018.7, 300 sec: 14717.8). Total num frames: 13615104. Throughput: 0: 7550.2, 1: 7541.6. Samples: 13605798. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:01:11,050][25893] Avg episode reward: [(0, '1000.000'), (1, '190.990')]
[2023-09-21 13:01:15,253][26608] Updated weights for policy 1, policy_version 13360 (0.0010)
[2023-09-21 13:01:15,253][26609] Updated weights for policy 0, policy_version 13360 (0.0013)
[2023-09-21 13:01:16,050][25893] Fps is (10 sec: 14745.4, 60 sec: 15018.7, 300 sec: 14717.8). Total num frames: 13688832. Throughput: 0: 7505.5, 1: 7470.2. Samples: 13670994. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:01:16,051][25893] Avg episode reward: [(0, '1000.000'), (1, '250.190')]
[2023-09-21 13:01:20,807][26608] Updated weights for policy 1, policy_version 13440 (0.0012)
[2023-09-21 13:01:20,807][26609] Updated weights for policy 0, policy_version 13440 (0.0010)
[2023-09-21 13:01:21,050][25893] Fps is (10 sec: 14745.2, 60 sec: 14882.1, 300 sec: 14717.8). Total num frames: 13762560. Throughput: 0: 7487.0, 1: 7480.2. Samples: 13736516. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-21 13:01:21,051][25893] Avg episode reward: [(0, '1000.000'), (1, '338.940')]
[2023-09-21 13:01:21,060][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000013440_6881280.pth...
[2023-09-21 13:01:21,061][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000013440_6881280.pth...
[2023-09-21 13:01:21,065][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000013000_6656000.pth
[2023-09-21 13:01:21,068][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000013000_6656000.pth
[2023-09-21 13:01:26,050][25893] Fps is (10 sec: 14745.8, 60 sec: 15018.6, 300 sec: 14690.1). Total num frames: 13836288. Throughput: 0: 7533.7, 1: 7525.2. Samples: 13826640. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-21 13:01:26,050][25893] Avg episode reward: [(0, '1000.000'), (1, '441.780')]
[2023-09-21 13:01:26,264][26609] Updated weights for policy 0, policy_version 13520 (0.0015)
[2023-09-21 13:01:26,264][26608] Updated weights for policy 1, policy_version 13520 (0.0014)
[2023-09-21 13:01:31,050][25893] Fps is (10 sec: 13926.5, 60 sec: 14745.6, 300 sec: 14662.3). Total num frames: 13901824. Throughput: 0: 7487.1, 1: 7940.3. Samples: 13888406. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-21 13:01:31,051][25893] Avg episode reward: [(0, '1000.000'), (1, '544.990')]
[2023-09-21 13:01:32,177][26609] Updated weights for policy 0, policy_version 13600 (0.0014)
[2023-09-21 13:01:32,177][26608] Updated weights for policy 1, policy_version 13600 (0.0015)
[2023-09-21 13:01:36,050][25893] Fps is (10 sec: 13926.2, 60 sec: 14882.1, 300 sec: 14662.3). Total num frames: 13975552. Throughput: 0: 7412.9, 1: 7403.9. Samples: 13953852. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-21 13:01:36,051][25893] Avg episode reward: [(0, '1000.000'), (1, '765.290')]
[2023-09-21 13:01:36,087][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000013656_6991872.pth...
[2023-09-21 13:01:36,090][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000013224_6770688.pth
[2023-09-21 13:01:36,097][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000013656_6991872.pth...
[2023-09-21 13:01:36,100][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000013224_6770688.pth
[2023-09-21 13:01:37,664][26609] Updated weights for policy 0, policy_version 13680 (0.0011)
[2023-09-21 13:01:37,665][26608] Updated weights for policy 1, policy_version 13680 (0.0014)
[2023-09-21 13:01:41,050][25893] Fps is (10 sec: 15565.1, 60 sec: 15018.7, 300 sec: 14690.1). Total num frames: 14057472. Throughput: 0: 7415.5, 1: 7415.8. Samples: 14046688. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:01:41,050][25893] Avg episode reward: [(0, '1000.000'), (1, '844.470')]
[2023-09-21 13:01:43,108][26609] Updated weights for policy 0, policy_version 13760 (0.0015)
[2023-09-21 13:01:43,108][26608] Updated weights for policy 1, policy_version 13760 (0.0015)
[2023-09-21 13:01:46,050][25893] Fps is (10 sec: 15565.1, 60 sec: 15018.7, 300 sec: 14690.1). Total num frames: 14131200. Throughput: 0: 7382.5, 1: 7429.1. Samples: 14114628. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:01:46,050][25893] Avg episode reward: [(0, '1000.000'), (1, '659.700')]
[2023-09-21 13:01:48,427][26609] Updated weights for policy 0, policy_version 13840 (0.0012)
[2023-09-21 13:01:48,427][26608] Updated weights for policy 1, policy_version 13840 (0.0014)
[2023-09-21 13:01:51,050][25893] Fps is (10 sec: 14745.6, 60 sec: 14882.2, 300 sec: 14690.1). Total num frames: 14204928. Throughput: 0: 7396.5, 1: 7403.7. Samples: 14181138. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 13:01:51,050][25893] Avg episode reward: [(0, '1000.000'), (1, '364.120')]
[2023-09-21 13:01:51,055][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000013872_7102464.pth...
[2023-09-21 13:01:51,055][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000013872_7102464.pth...
[2023-09-21 13:01:51,060][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000013440_6881280.pth
[2023-09-21 13:01:51,061][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000013440_6881280.pth
[2023-09-21 13:01:54,000][26609] Updated weights for policy 0, policy_version 13920 (0.0015)
[2023-09-21 13:01:54,001][26608] Updated weights for policy 1, policy_version 13920 (0.0014)
[2023-09-21 13:01:56,050][25893] Fps is (10 sec: 14745.5, 60 sec: 14882.1, 300 sec: 14690.1). Total num frames: 14278656. Throughput: 0: 7406.1, 1: 7398.3. Samples: 14271998. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 13:01:56,051][25893] Avg episode reward: [(0, '1000.000'), (1, '434.580')]
[2023-09-21 13:01:59,803][26608] Updated weights for policy 1, policy_version 14000 (0.0012)
[2023-09-21 13:01:59,803][26609] Updated weights for policy 0, policy_version 14000 (0.0014)
[2023-09-21 13:02:01,049][25893] Fps is (10 sec: 14745.7, 60 sec: 14745.6, 300 sec: 14690.1). Total num frames: 14352384. Throughput: 0: 7372.1, 1: 7355.2. Samples: 14333720. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-21 13:02:01,050][25893] Avg episode reward: [(0, '1000.000'), (1, '677.700')]
[2023-09-21 13:02:05,532][26609] Updated weights for policy 0, policy_version 14080 (0.0014)
[2023-09-21 13:02:05,533][26608] Updated weights for policy 1, policy_version 14080 (0.0014)
[2023-09-21 13:02:06,050][25893] Fps is (10 sec: 14745.8, 60 sec: 14745.6, 300 sec: 14690.1). Total num frames: 14426112. Throughput: 0: 7367.2, 1: 7364.0. Samples: 14399416. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-21 13:02:06,050][25893] Avg episode reward: [(0, '1000.000'), (1, '929.330')]
[2023-09-21 13:02:06,058][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000014088_7213056.pth...
[2023-09-21 13:02:06,058][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000014088_7213056.pth...
[2023-09-21 13:02:06,066][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000013656_6991872.pth
[2023-09-21 13:02:06,067][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000013656_6991872.pth
[2023-09-21 13:02:10,897][26608] Updated weights for policy 1, policy_version 14160 (0.0012)
[2023-09-21 13:02:10,898][26609] Updated weights for policy 0, policy_version 14160 (0.0015)
[2023-09-21 13:02:11,050][25893] Fps is (10 sec: 14745.5, 60 sec: 14745.6, 300 sec: 14690.1). Total num frames: 14499840. Throughput: 0: 7349.8, 1: 7342.2. Samples: 14487778. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:02:11,050][25893] Avg episode reward: [(0, '1000.000'), (1, '999.800')]
[2023-09-21 13:02:16,050][25893] Fps is (10 sec: 14745.7, 60 sec: 14745.6, 300 sec: 14717.8). Total num frames: 14573568. Throughput: 0: 7373.3, 1: 7404.0. Samples: 14553380. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:02:16,050][25893] Avg episode reward: [(0, '1000.000'), (1, '994.450')]
[2023-09-21 13:02:16,503][26609] Updated weights for policy 0, policy_version 14240 (0.0014)
[2023-09-21 13:02:16,503][26608] Updated weights for policy 1, policy_version 14240 (0.0013)
[2023-09-21 13:02:21,050][25893] Fps is (10 sec: 14745.3, 60 sec: 14745.6, 300 sec: 14717.8). Total num frames: 14647296. Throughput: 0: 7441.0, 1: 7444.2. Samples: 14623686. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:02:21,051][25893] Avg episode reward: [(0, '1000.000'), (1, '994.650')]
[2023-09-21 13:02:21,060][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000014304_7323648.pth...
[2023-09-21 13:02:21,060][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000014304_7323648.pth...
[2023-09-21 13:02:21,065][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000013872_7102464.pth
[2023-09-21 13:02:21,065][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000013872_7102464.pth
[2023-09-21 13:02:21,669][26609] Updated weights for policy 0, policy_version 14320 (0.0012)
[2023-09-21 13:02:21,669][26608] Updated weights for policy 1, policy_version 14320 (0.0013)
[2023-09-21 13:02:26,050][25893] Fps is (10 sec: 15564.5, 60 sec: 14882.1, 300 sec: 14745.6). Total num frames: 14729216. Throughput: 0: 7443.2, 1: 7442.6. Samples: 14716550. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:02:26,051][25893] Avg episode reward: [(0, '1000.000'), (1, '999.570')]
[2023-09-21 13:02:27,100][26609] Updated weights for policy 0, policy_version 14400 (0.0011)
[2023-09-21 13:02:27,100][26608] Updated weights for policy 1, policy_version 14400 (0.0012)
[2023-09-21 13:02:31,050][25893] Fps is (10 sec: 14745.7, 60 sec: 14882.1, 300 sec: 14745.6). Total num frames: 14794752. Throughput: 0: 7447.1, 1: 7344.4. Samples: 14780246. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:02:31,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:02:32,880][26608] Updated weights for policy 1, policy_version 14480 (0.0015)
[2023-09-21 13:02:32,881][26609] Updated weights for policy 0, policy_version 14480 (0.0013)
[2023-09-21 13:02:36,050][25893] Fps is (10 sec: 14745.5, 60 sec: 15018.7, 300 sec: 14773.4). Total num frames: 14876672. Throughput: 0: 7400.1, 1: 7388.8. Samples: 14846640. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:02:36,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:02:36,061][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000014528_7438336.pth...
[2023-09-21 13:02:36,061][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000014528_7438336.pth...
[2023-09-21 13:02:36,066][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000014088_7213056.pth
[2023-09-21 13:02:36,067][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000014088_7213056.pth
[2023-09-21 13:02:38,304][26608] Updated weights for policy 1, policy_version 14560 (0.0014)
[2023-09-21 13:02:38,304][26609] Updated weights for policy 0, policy_version 14560 (0.0013)
[2023-09-21 13:02:41,050][25893] Fps is (10 sec: 15564.9, 60 sec: 14882.1, 300 sec: 14773.4). Total num frames: 14950400. Throughput: 0: 7377.3, 1: 7380.1. Samples: 14936084. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:02:41,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:02:43,903][26608] Updated weights for policy 1, policy_version 14640 (0.0013)
[2023-09-21 13:02:43,904][26609] Updated weights for policy 0, policy_version 14640 (0.0015)
[2023-09-21 13:02:46,050][25893] Fps is (10 sec: 13926.5, 60 sec: 14745.6, 300 sec: 14773.4). Total num frames: 15015936. Throughput: 0: 7407.0, 1: 7422.6. Samples: 15001058. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:02:46,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:02:49,510][26609] Updated weights for policy 0, policy_version 14720 (0.0014)
[2023-09-21 13:02:49,510][26608] Updated weights for policy 1, policy_version 14720 (0.0012)
[2023-09-21 13:02:51,050][25893] Fps is (10 sec: 13926.4, 60 sec: 14745.6, 300 sec: 14801.1). Total num frames: 15089664. Throughput: 0: 7418.0, 1: 7423.7. Samples: 15067292. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:02:51,051][25893] Avg episode reward: [(0, '1000.000'), (1, '856.660')]
[2023-09-21 13:02:51,060][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000014736_7544832.pth...
[2023-09-21 13:02:51,061][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000014736_7544832.pth...
[2023-09-21 13:02:51,066][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000014304_7323648.pth
[2023-09-21 13:02:51,069][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000014304_7323648.pth
[2023-09-21 13:02:55,089][26609] Updated weights for policy 0, policy_version 14800 (0.0013)
[2023-09-21 13:02:55,090][26608] Updated weights for policy 1, policy_version 14800 (0.0015)
[2023-09-21 13:02:56,049][25893] Fps is (10 sec: 14745.9, 60 sec: 14745.6, 300 sec: 14801.1). Total num frames: 15163392. Throughput: 0: 7411.3, 1: 7432.8. Samples: 15155760. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:02:56,050][25893] Avg episode reward: [(0, '1000.000'), (1, '675.550')]
[2023-09-21 13:03:00,640][26609] Updated weights for policy 0, policy_version 14880 (0.0012)
[2023-09-21 13:03:00,641][26608] Updated weights for policy 1, policy_version 14880 (0.0014)
[2023-09-21 13:03:01,050][25893] Fps is (10 sec: 14745.5, 60 sec: 14745.5, 300 sec: 14801.1). Total num frames: 15237120. Throughput: 0: 7423.0, 1: 7445.8. Samples: 15222478. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:03:01,051][25893] Avg episode reward: [(0, '1000.000'), (1, '765.960')]
[2023-09-21 13:03:06,050][25893] Fps is (10 sec: 14745.2, 60 sec: 14745.5, 300 sec: 14801.1). Total num frames: 15310848. Throughput: 0: 7382.2, 1: 7376.1. Samples: 15287808. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:03:06,051][25893] Avg episode reward: [(0, '1000.000'), (1, '908.330')]
[2023-09-21 13:03:06,059][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000014952_7655424.pth...
[2023-09-21 13:03:06,060][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000014952_7655424.pth...
[2023-09-21 13:03:06,068][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000014528_7438336.pth
[2023-09-21 13:03:06,084][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000014528_7438336.pth
[2023-09-21 13:03:06,347][26609] Updated weights for policy 0, policy_version 14960 (0.0013)
[2023-09-21 13:03:06,348][26608] Updated weights for policy 1, policy_version 14960 (0.0011)
[2023-09-21 13:03:11,050][25893] Fps is (10 sec: 14745.6, 60 sec: 14745.6, 300 sec: 14828.9). Total num frames: 15384576. Throughput: 0: 7314.3, 1: 7304.2. Samples: 15374382. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:03:11,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:03:11,845][26608] Updated weights for policy 1, policy_version 15040 (0.0015)
[2023-09-21 13:03:11,845][26609] Updated weights for policy 0, policy_version 15040 (0.0012)
[2023-09-21 13:03:16,050][25893] Fps is (10 sec: 14745.7, 60 sec: 14745.5, 300 sec: 14828.9). Total num frames: 15458304. Throughput: 0: 7346.4, 1: 7410.1. Samples: 15444292. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:03:16,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:03:17,167][26608] Updated weights for policy 1, policy_version 15120 (0.0015)
[2023-09-21 13:03:17,167][26609] Updated weights for policy 0, policy_version 15120 (0.0013)
[2023-09-21 13:03:21,050][25893] Fps is (10 sec: 15564.7, 60 sec: 14882.1, 300 sec: 14856.7). Total num frames: 15540224. Throughput: 0: 7410.5, 1: 7405.9. Samples: 15513376. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:03:21,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:03:21,058][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000015176_7770112.pth...
[2023-09-21 13:03:21,058][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000015176_7770112.pth...
[2023-09-21 13:03:21,063][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000014736_7544832.pth
[2023-09-21 13:03:21,065][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000014736_7544832.pth
[2023-09-21 13:03:22,427][26608] Updated weights for policy 1, policy_version 15200 (0.0013)
[2023-09-21 13:03:22,428][26609] Updated weights for policy 0, policy_version 15200 (0.0015)
[2023-09-21 13:03:26,050][25893] Fps is (10 sec: 15564.8, 60 sec: 14745.6, 300 sec: 14856.7). Total num frames: 15613952. Throughput: 0: 7432.3, 1: 7451.3. Samples: 15605848. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:03:26,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:03:27,853][26608] Updated weights for policy 1, policy_version 15280 (0.0012)
[2023-09-21 13:03:27,854][26609] Updated weights for policy 0, policy_version 15280 (0.0014)
[2023-09-21 13:03:31,050][25893] Fps is (10 sec: 15564.9, 60 sec: 15018.7, 300 sec: 14884.4). Total num frames: 15695872. Throughput: 0: 7450.6, 1: 7505.0. Samples: 15674060. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:03:31,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:03:33,337][26609] Updated weights for policy 0, policy_version 15360 (0.0016)
[2023-09-21 13:03:33,338][26608] Updated weights for policy 1, policy_version 15360 (0.0017)
[2023-09-21 13:03:36,050][25893] Fps is (10 sec: 14745.8, 60 sec: 14745.7, 300 sec: 14856.7). Total num frames: 15761408. Throughput: 0: 7472.3, 1: 7468.3. Samples: 15739620. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 13:03:36,050][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:03:36,060][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000015392_7880704.pth...
[2023-09-21 13:03:36,060][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000015392_7880704.pth...
[2023-09-21 13:03:36,067][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000014952_7655424.pth
[2023-09-21 13:03:36,067][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000014952_7655424.pth
[2023-09-21 13:03:38,798][26608] Updated weights for policy 1, policy_version 15440 (0.0013)
[2023-09-21 13:03:38,798][26609] Updated weights for policy 0, policy_version 15440 (0.0016)
[2023-09-21 13:03:41,050][25893] Fps is (10 sec: 14745.7, 60 sec: 14882.1, 300 sec: 14856.7). Total num frames: 15843328. Throughput: 0: 7516.2, 1: 7500.1. Samples: 15831494. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 13:03:41,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:03:44,369][26609] Updated weights for policy 0, policy_version 15520 (0.0013)
[2023-09-21 13:03:44,369][26608] Updated weights for policy 1, policy_version 15520 (0.0015)
[2023-09-21 13:03:46,049][25893] Fps is (10 sec: 14745.7, 60 sec: 14882.2, 300 sec: 14828.9). Total num frames: 15908864. Throughput: 0: 7488.4, 1: 7475.6. Samples: 15895856. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 13:03:46,050][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:03:50,414][26609] Updated weights for policy 0, policy_version 15600 (0.0017)
[2023-09-21 13:03:50,415][26608] Updated weights for policy 1, policy_version 15600 (0.0016)
[2023-09-21 13:03:51,050][25893] Fps is (10 sec: 13926.2, 60 sec: 14882.1, 300 sec: 14828.9). Total num frames: 15982592. Throughput: 0: 7433.0, 1: 7436.7. Samples: 15956944. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:03:51,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:03:51,059][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000015608_7991296.pth...
[2023-09-21 13:03:51,059][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000015608_7991296.pth...
[2023-09-21 13:03:51,064][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000015176_7770112.pth
[2023-09-21 13:03:51,065][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000015176_7770112.pth
[2023-09-21 13:03:55,606][26609] Updated weights for policy 0, policy_version 15680 (0.0012)
[2023-09-21 13:03:55,606][26608] Updated weights for policy 1, policy_version 15680 (0.0012)
[2023-09-21 13:03:56,050][25893] Fps is (10 sec: 14745.3, 60 sec: 14882.1, 300 sec: 14856.7). Total num frames: 16056320. Throughput: 0: 7484.7, 1: 7503.7. Samples: 16048858. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:03:56,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:04:01,050][25893] Fps is (10 sec: 14745.8, 60 sec: 14882.1, 300 sec: 14828.9). Total num frames: 16130048. Throughput: 0: 7447.6, 1: 7457.0. Samples: 16114998. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 13:04:01,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:04:01,175][26608] Updated weights for policy 1, policy_version 15760 (0.0013)
[2023-09-21 13:04:01,176][26609] Updated weights for policy 0, policy_version 15760 (0.0016)
[2023-09-21 13:04:06,050][25893] Fps is (10 sec: 14745.4, 60 sec: 14882.1, 300 sec: 14856.7). Total num frames: 16203776. Throughput: 0: 7401.2, 1: 7415.5. Samples: 16180124. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 13:04:06,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:04:06,061][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000015824_8101888.pth...
[2023-09-21 13:04:06,061][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000015824_8101888.pth...
[2023-09-21 13:04:06,066][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000015392_7880704.pth
[2023-09-21 13:04:06,069][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000015392_7880704.pth
[2023-09-21 13:04:06,723][26609] Updated weights for policy 0, policy_version 15840 (0.0013)
[2023-09-21 13:04:06,723][26608] Updated weights for policy 1, policy_version 15840 (0.0012)
[2023-09-21 13:04:11,050][25893] Fps is (10 sec: 15565.1, 60 sec: 15018.7, 300 sec: 14884.4). Total num frames: 16285696. Throughput: 0: 7429.7, 1: 7406.6. Samples: 16273480. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-21 13:04:11,050][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:04:11,975][26608] Updated weights for policy 1, policy_version 15920 (0.0014)
[2023-09-21 13:04:11,975][26609] Updated weights for policy 0, policy_version 15920 (0.0015)
[2023-09-21 13:04:16,050][25893] Fps is (10 sec: 15565.1, 60 sec: 15018.7, 300 sec: 14856.7). Total num frames: 16359424. Throughput: 0: 7422.4, 1: 7435.5. Samples: 16342664. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-21 13:04:16,050][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:04:17,223][26609] Updated weights for policy 0, policy_version 16000 (0.0013)
[2023-09-21 13:04:17,223][26608] Updated weights for policy 1, policy_version 16000 (0.0015)
[2023-09-21 13:04:21,050][25893] Fps is (10 sec: 15564.4, 60 sec: 15018.7, 300 sec: 14884.4). Total num frames: 16441344. Throughput: 0: 7471.4, 1: 7476.5. Samples: 16412278. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:04:21,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:04:21,061][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000016056_8220672.pth...
[2023-09-21 13:04:21,061][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000016056_8220672.pth...
[2023-09-21 13:04:21,067][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000015608_7991296.pth
[2023-09-21 13:04:21,067][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000015608_7991296.pth
[2023-09-21 13:04:22,646][26609] Updated weights for policy 0, policy_version 16080 (0.0013)
[2023-09-21 13:04:22,647][26608] Updated weights for policy 1, policy_version 16080 (0.0014)
[2023-09-21 13:04:26,050][25893] Fps is (10 sec: 15564.7, 60 sec: 15018.7, 300 sec: 14856.7). Total num frames: 16515072. Throughput: 0: 7475.7, 1: 7470.6. Samples: 16504078. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:04:26,050][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:04:28,163][26608] Updated weights for policy 1, policy_version 16160 (0.0011)
[2023-09-21 13:04:28,164][26609] Updated weights for policy 0, policy_version 16160 (0.0015)
[2023-09-21 13:04:31,050][25893] Fps is (10 sec: 13926.4, 60 sec: 14745.6, 300 sec: 14828.9). Total num frames: 16580608. Throughput: 0: 7479.6, 1: 6994.6. Samples: 16547198. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:04:31,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:04:33,994][26608] Updated weights for policy 1, policy_version 16240 (0.0014)
[2023-09-21 13:04:33,994][26609] Updated weights for policy 0, policy_version 16240 (0.0016)
[2023-09-21 13:04:36,050][25893] Fps is (10 sec: 13926.2, 60 sec: 14882.1, 300 sec: 14828.9). Total num frames: 16654336. Throughput: 0: 7493.4, 1: 7491.0. Samples: 16631242. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:04:36,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:04:36,061][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000016264_8327168.pth...
[2023-09-21 13:04:36,061][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000016264_8327168.pth...
[2023-09-21 13:04:36,066][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000015824_8101888.pth
[2023-09-21 13:04:36,066][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000015824_8101888.pth
[2023-09-21 13:04:39,411][26608] Updated weights for policy 1, policy_version 16320 (0.0012)
[2023-09-21 13:04:39,411][26609] Updated weights for policy 0, policy_version 16320 (0.0014)
[2023-09-21 13:04:41,050][25893] Fps is (10 sec: 15564.8, 60 sec: 14882.1, 300 sec: 14884.5). Total num frames: 16736256. Throughput: 0: 7493.4, 1: 7473.9. Samples: 16722386. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:04:41,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:04:44,811][26608] Updated weights for policy 1, policy_version 16400 (0.0013)
[2023-09-21 13:04:44,812][26609] Updated weights for policy 0, policy_version 16400 (0.0016)
[2023-09-21 13:04:46,050][25893] Fps is (10 sec: 15564.9, 60 sec: 15018.6, 300 sec: 14884.4). Total num frames: 16809984. Throughput: 0: 7509.6, 1: 7499.1. Samples: 16790388. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:04:46,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:04:50,264][26609] Updated weights for policy 0, policy_version 16480 (0.0011)
[2023-09-21 13:04:50,265][26608] Updated weights for policy 1, policy_version 16480 (0.0015)
[2023-09-21 13:04:51,050][25893] Fps is (10 sec: 14745.9, 60 sec: 15018.7, 300 sec: 14884.5). Total num frames: 16883712. Throughput: 0: 7527.7, 1: 7511.4. Samples: 16856880. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:04:51,050][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:04:51,057][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000016488_8441856.pth...
[2023-09-21 13:04:51,058][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000016488_8441856.pth...
[2023-09-21 13:04:51,063][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000016056_8220672.pth
[2023-09-21 13:04:51,066][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000016056_8220672.pth
[2023-09-21 13:04:55,638][26608] Updated weights for policy 1, policy_version 16560 (0.0013)
[2023-09-21 13:04:55,639][26609] Updated weights for policy 0, policy_version 16560 (0.0013)
[2023-09-21 13:04:56,050][25893] Fps is (10 sec: 14745.4, 60 sec: 15018.6, 300 sec: 14884.4). Total num frames: 16957440. Throughput: 0: 7494.3, 1: 7514.8. Samples: 16948894. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:04:56,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:05:01,050][25893] Fps is (10 sec: 14745.3, 60 sec: 15018.7, 300 sec: 14884.5). Total num frames: 17031168. Throughput: 0: 7516.9, 1: 7459.8. Samples: 17016618. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:05:01,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:05:01,250][26609] Updated weights for policy 0, policy_version 16640 (0.0013)
[2023-09-21 13:05:01,250][26608] Updated weights for policy 1, policy_version 16640 (0.0015)
[2023-09-21 13:05:06,050][25893] Fps is (10 sec: 14745.7, 60 sec: 15018.7, 300 sec: 14884.4). Total num frames: 17104896. Throughput: 0: 7409.1, 1: 7423.4. Samples: 17079742. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:05:06,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:05:06,058][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000016704_8552448.pth...
[2023-09-21 13:05:06,058][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000016704_8552448.pth...
[2023-09-21 13:05:06,063][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000016264_8327168.pth
[2023-09-21 13:05:06,064][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000016264_8327168.pth
[2023-09-21 13:05:06,943][26608] Updated weights for policy 1, policy_version 16720 (0.0016)
[2023-09-21 13:05:06,943][26609] Updated weights for policy 0, policy_version 16720 (0.0015)
[2023-09-21 13:05:11,050][25893] Fps is (10 sec: 14745.7, 60 sec: 14882.1, 300 sec: 14884.5). Total num frames: 17178624. Throughput: 0: 7387.6, 1: 7399.0. Samples: 17169474. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-21 13:05:11,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:05:12,587][26608] Updated weights for policy 1, policy_version 16800 (0.0013)
[2023-09-21 13:05:12,588][26609] Updated weights for policy 0, policy_version 16800 (0.0015)
[2023-09-21 13:05:16,050][25893] Fps is (10 sec: 13926.7, 60 sec: 14745.6, 300 sec: 14828.9). Total num frames: 17244160. Throughput: 0: 7372.8, 1: 7842.6. Samples: 17231888. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-21 13:05:16,050][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:05:18,414][26609] Updated weights for policy 0, policy_version 16880 (0.0012)
[2023-09-21 13:05:18,414][26608] Updated weights for policy 1, policy_version 16880 (0.0012)
[2023-09-21 13:05:21,050][25893] Fps is (10 sec: 13926.4, 60 sec: 14609.1, 300 sec: 14856.7). Total num frames: 17317888. Throughput: 0: 7370.9, 1: 7372.5. Samples: 17294694. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:05:21,050][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:05:21,058][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000016912_8658944.pth...
[2023-09-21 13:05:21,058][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000016912_8658944.pth...
[2023-09-21 13:05:21,062][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000016488_8441856.pth
[2023-09-21 13:05:21,065][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000016488_8441856.pth
[2023-09-21 13:05:23,988][26609] Updated weights for policy 0, policy_version 16960 (0.0014)
[2023-09-21 13:05:23,989][26608] Updated weights for policy 1, policy_version 16960 (0.0017)
[2023-09-21 13:05:26,050][25893] Fps is (10 sec: 14745.5, 60 sec: 14609.1, 300 sec: 14828.9). Total num frames: 17391616. Throughput: 0: 7350.6, 1: 7363.5. Samples: 17384520. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:05:26,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:05:29,647][26609] Updated weights for policy 0, policy_version 17040 (0.0015)
[2023-09-21 13:05:29,647][26608] Updated weights for policy 1, policy_version 17040 (0.0016)
[2023-09-21 13:05:31,050][25893] Fps is (10 sec: 14745.8, 60 sec: 14745.7, 300 sec: 14856.7). Total num frames: 17465344. Throughput: 0: 7309.1, 1: 7347.0. Samples: 17449908. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:05:31,050][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:05:34,977][26608] Updated weights for policy 1, policy_version 17120 (0.0013)
[2023-09-21 13:05:34,977][26609] Updated weights for policy 0, policy_version 17120 (0.0016)
[2023-09-21 13:05:36,050][25893] Fps is (10 sec: 14745.5, 60 sec: 14745.6, 300 sec: 14856.7). Total num frames: 17539072. Throughput: 0: 7334.5, 1: 7335.3. Samples: 17517028. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:05:36,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:05:36,061][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000017128_8769536.pth...
[2023-09-21 13:05:36,061][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000017128_8769536.pth...
[2023-09-21 13:05:36,067][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000016704_8552448.pth
[2023-09-21 13:05:36,073][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000016704_8552448.pth
[2023-09-21 13:05:40,642][26609] Updated weights for policy 0, policy_version 17200 (0.0010)
[2023-09-21 13:05:40,643][26608] Updated weights for policy 1, policy_version 17200 (0.0016)
[2023-09-21 13:05:41,050][25893] Fps is (10 sec: 14745.5, 60 sec: 14609.1, 300 sec: 14856.7). Total num frames: 17612800. Throughput: 0: 7298.1, 1: 7293.2. Samples: 17605498. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:05:41,050][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:05:45,991][26609] Updated weights for policy 0, policy_version 17280 (0.0013)
[2023-09-21 13:05:45,991][26608] Updated weights for policy 1, policy_version 17280 (0.0013)
[2023-09-21 13:05:46,050][25893] Fps is (10 sec: 15564.9, 60 sec: 14745.6, 300 sec: 14856.7). Total num frames: 17694720. Throughput: 0: 7297.8, 1: 7317.2. Samples: 17674294. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:05:46,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:05:51,050][25893] Fps is (10 sec: 15154.9, 60 sec: 14677.3, 300 sec: 14842.8). Total num frames: 17764352. Throughput: 0: 7350.7, 1: 7332.0. Samples: 17740464. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:05:51,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:05:51,060][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000017352_8884224.pth...
[2023-09-21 13:05:51,060][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000017352_8884224.pth...
[2023-09-21 13:05:51,063][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000016912_8658944.pth
[2023-09-21 13:05:51,065][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000016912_8658944.pth
[2023-09-21 13:05:51,642][26608] Updated weights for policy 1, policy_version 17360 (0.0015)
[2023-09-21 13:05:51,643][26609] Updated weights for policy 0, policy_version 17360 (0.0014)
[2023-09-21 13:05:56,050][25893] Fps is (10 sec: 13926.7, 60 sec: 14609.1, 300 sec: 14801.1). Total num frames: 17833984. Throughput: 0: 7281.8, 1: 7286.4. Samples: 17825040. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:05:56,050][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:05:57,491][26608] Updated weights for policy 1, policy_version 17440 (0.0017)
[2023-09-21 13:05:57,491][26609] Updated weights for policy 0, policy_version 17440 (0.0015)
[2023-09-21 13:06:01,050][25893] Fps is (10 sec: 14336.1, 60 sec: 14609.1, 300 sec: 14801.1). Total num frames: 17907712. Throughput: 0: 7294.1, 1: 7285.8. Samples: 17887984. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:06:01,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:06:03,248][26609] Updated weights for policy 0, policy_version 17520 (0.0014)
[2023-09-21 13:06:03,248][26608] Updated weights for policy 1, policy_version 17520 (0.0014)
[2023-09-21 13:06:06,050][25893] Fps is (10 sec: 14745.3, 60 sec: 14609.1, 300 sec: 14801.1). Total num frames: 17981440. Throughput: 0: 7323.0, 1: 7314.1. Samples: 17953364. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:06:06,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:06:06,061][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000017560_8990720.pth...
[2023-09-21 13:06:06,061][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000017560_8990720.pth...
[2023-09-21 13:06:06,067][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000017128_8769536.pth
[2023-09-21 13:06:06,067][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000017128_8769536.pth
[2023-09-21 13:06:08,714][26608] Updated weights for policy 1, policy_version 17600 (0.0013)
[2023-09-21 13:06:08,714][26609] Updated weights for policy 0, policy_version 17600 (0.0014)
[2023-09-21 13:06:11,050][25893] Fps is (10 sec: 14745.9, 60 sec: 14609.1, 300 sec: 14801.1). Total num frames: 18055168. Throughput: 0: 7322.4, 1: 7311.8. Samples: 18043058. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:06:11,050][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:06:14,153][26608] Updated weights for policy 1, policy_version 17680 (0.0010)
[2023-09-21 13:06:14,154][26609] Updated weights for policy 0, policy_version 17680 (0.0013)
[2023-09-21 13:06:16,050][25893] Fps is (10 sec: 14745.9, 60 sec: 14745.6, 300 sec: 14801.1). Total num frames: 18128896. Throughput: 0: 7360.5, 1: 7336.1. Samples: 18111254. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:06:16,050][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:06:19,424][26609] Updated weights for policy 0, policy_version 17760 (0.0015)
[2023-09-21 13:06:19,426][26608] Updated weights for policy 1, policy_version 17760 (0.0017)
[2023-09-21 13:06:21,050][25893] Fps is (10 sec: 15564.4, 60 sec: 14882.1, 300 sec: 14828.9). Total num frames: 18210816. Throughput: 0: 7380.4, 1: 7381.3. Samples: 18181304. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:06:21,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:06:21,060][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000017784_9105408.pth...
[2023-09-21 13:06:21,060][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000017784_9105408.pth...
[2023-09-21 13:06:21,063][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000017352_8884224.pth
[2023-09-21 13:06:21,068][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000017352_8884224.pth
[2023-09-21 13:06:24,967][26608] Updated weights for policy 1, policy_version 17840 (0.0015)
[2023-09-21 13:06:24,968][26609] Updated weights for policy 0, policy_version 17840 (0.0015)
[2023-09-21 13:06:26,050][25893] Fps is (10 sec: 15564.8, 60 sec: 14882.2, 300 sec: 14856.7). Total num frames: 18284544. Throughput: 0: 7387.6, 1: 7379.6. Samples: 18270022. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-21 13:06:26,050][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:06:30,349][26609] Updated weights for policy 0, policy_version 17920 (0.0012)
[2023-09-21 13:06:30,351][26608] Updated weights for policy 1, policy_version 17920 (0.0016)
[2023-09-21 13:06:31,049][25893] Fps is (10 sec: 14746.0, 60 sec: 14882.1, 300 sec: 14856.7). Total num frames: 18358272. Throughput: 0: 7383.8, 1: 7406.2. Samples: 18339840. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-21 13:06:31,050][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:06:35,569][26609] Updated weights for policy 0, policy_version 18000 (0.0012)
[2023-09-21 13:06:35,570][26608] Updated weights for policy 1, policy_version 18000 (0.0016)
[2023-09-21 13:06:36,050][25893] Fps is (10 sec: 14745.2, 60 sec: 14882.1, 300 sec: 14828.9). Total num frames: 18432000. Throughput: 0: 7441.6, 1: 7441.8. Samples: 18410218. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-21 13:06:36,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:06:36,061][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000018000_9216000.pth...
[2023-09-21 13:06:36,061][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000018000_9216000.pth...
[2023-09-21 13:06:36,066][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000017560_8990720.pth
[2023-09-21 13:06:36,068][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000017560_8990720.pth
[2023-09-21 13:06:41,050][25893] Fps is (10 sec: 14745.4, 60 sec: 14882.1, 300 sec: 14828.9). Total num frames: 18505728. Throughput: 0: 7464.6, 1: 7468.3. Samples: 18497024. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-21 13:06:41,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:06:41,278][26609] Updated weights for policy 0, policy_version 18080 (0.0015)
[2023-09-21 13:06:41,278][26608] Updated weights for policy 1, policy_version 18080 (0.0012)
[2023-09-21 13:06:46,050][25893] Fps is (10 sec: 15565.0, 60 sec: 14882.1, 300 sec: 14856.7). Total num frames: 18587648. Throughput: 0: 7514.2, 1: 7033.1. Samples: 18542612. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-21 13:06:46,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:06:46,530][26608] Updated weights for policy 1, policy_version 18160 (0.0013)
[2023-09-21 13:06:46,530][26609] Updated weights for policy 0, policy_version 18160 (0.0013)
[2023-09-21 13:06:51,050][25893] Fps is (10 sec: 15564.8, 60 sec: 14950.4, 300 sec: 14856.7). Total num frames: 18661376. Throughput: 0: 7572.8, 1: 7579.8. Samples: 18635232. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:06:51,050][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:06:51,060][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000018224_9330688.pth...
[2023-09-21 13:06:51,060][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000018224_9330688.pth...
[2023-09-21 13:06:51,068][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000017784_9105408.pth
[2023-09-21 13:06:51,068][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000017784_9105408.pth
[2023-09-21 13:06:51,898][26609] Updated weights for policy 0, policy_version 18240 (0.0013)
[2023-09-21 13:06:51,898][26608] Updated weights for policy 1, policy_version 18240 (0.0015)
[2023-09-21 13:06:56,050][25893] Fps is (10 sec: 14745.6, 60 sec: 15018.6, 300 sec: 14856.7). Total num frames: 18735104. Throughput: 0: 7582.4, 1: 7597.9. Samples: 18726174. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:06:56,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:06:57,724][26608] Updated weights for policy 1, policy_version 18320 (0.0014)
[2023-09-21 13:06:57,724][26609] Updated weights for policy 0, policy_version 18320 (0.0017)
[2023-09-21 13:07:01,050][25893] Fps is (10 sec: 14745.7, 60 sec: 15018.7, 300 sec: 14856.7). Total num frames: 18808832. Throughput: 0: 7528.8, 1: 7534.1. Samples: 18789084. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:07:01,050][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:07:03,107][26608] Updated weights for policy 1, policy_version 18400 (0.0013)
[2023-09-21 13:07:03,108][26609] Updated weights for policy 0, policy_version 18400 (0.0014)
[2023-09-21 13:07:06,050][25893] Fps is (10 sec: 14745.7, 60 sec: 15018.7, 300 sec: 14856.7). Total num frames: 18882560. Throughput: 0: 7502.7, 1: 7519.3. Samples: 18857290. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:07:06,050][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:07:06,060][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000018440_9441280.pth...
[2023-09-21 13:07:06,060][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000018440_9441280.pth...
[2023-09-21 13:07:06,067][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000018000_9216000.pth
[2023-09-21 13:07:06,068][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000018000_9216000.pth
[2023-09-21 13:07:08,660][26609] Updated weights for policy 0, policy_version 18480 (0.0015)
[2023-09-21 13:07:08,660][26608] Updated weights for policy 1, policy_version 18480 (0.0012)
[2023-09-21 13:07:11,050][25893] Fps is (10 sec: 14745.6, 60 sec: 15018.7, 300 sec: 14856.7). Total num frames: 18956288. Throughput: 0: 7483.2, 1: 7476.2. Samples: 18943196. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:07:11,050][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:07:14,470][26609] Updated weights for policy 0, policy_version 18560 (0.0013)
[2023-09-21 13:07:14,471][26608] Updated weights for policy 1, policy_version 18560 (0.0013)
[2023-09-21 13:07:16,050][25893] Fps is (10 sec: 13926.2, 60 sec: 14882.1, 300 sec: 14828.9). Total num frames: 19021824. Throughput: 0: 7443.9, 1: 7405.9. Samples: 19008084. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:07:16,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:07:20,110][26609] Updated weights for policy 0, policy_version 18640 (0.0015)
[2023-09-21 13:07:20,110][26608] Updated weights for policy 1, policy_version 18640 (0.0013)
[2023-09-21 13:07:21,050][25893] Fps is (10 sec: 13926.4, 60 sec: 14745.7, 300 sec: 14801.1). Total num frames: 19095552. Throughput: 0: 7347.5, 1: 7358.3. Samples: 19071974. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:07:21,050][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:07:21,056][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000018648_9547776.pth...
[2023-09-21 13:07:21,056][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000018648_9547776.pth...
[2023-09-21 13:07:21,066][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000018224_9330688.pth
[2023-09-21 13:07:21,066][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000018224_9330688.pth
[2023-09-21 13:07:26,050][25893] Fps is (10 sec: 13926.3, 60 sec: 14609.0, 300 sec: 14801.1). Total num frames: 19161088. Throughput: 0: 7308.3, 1: 7297.9. Samples: 19154306. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:07:26,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:07:26,285][26608] Updated weights for policy 1, policy_version 18720 (0.0009)
[2023-09-21 13:07:26,286][26609] Updated weights for policy 0, policy_version 18720 (0.0012)
[2023-09-21 13:07:31,050][25893] Fps is (10 sec: 13926.2, 60 sec: 14609.0, 300 sec: 14773.4). Total num frames: 19234816. Throughput: 0: 7254.4, 1: 7760.8. Samples: 19218300. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:07:31,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:07:31,910][26608] Updated weights for policy 1, policy_version 18800 (0.0012)
[2023-09-21 13:07:31,910][26609] Updated weights for policy 0, policy_version 18800 (0.0015)
[2023-09-21 13:07:36,050][25893] Fps is (10 sec: 14745.6, 60 sec: 14609.1, 300 sec: 14773.4). Total num frames: 19308544. Throughput: 0: 7231.3, 1: 7228.1. Samples: 19285906. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:07:36,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:07:36,060][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000018856_9654272.pth...
[2023-09-21 13:07:36,060][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000018856_9654272.pth...
[2023-09-21 13:07:36,063][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000018440_9441280.pth
[2023-09-21 13:07:36,065][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000018440_9441280.pth
[2023-09-21 13:07:37,303][26609] Updated weights for policy 0, policy_version 18880 (0.0015)
[2023-09-21 13:07:37,304][26608] Updated weights for policy 1, policy_version 18880 (0.0013)
[2023-09-21 13:07:41,050][25893] Fps is (10 sec: 14745.6, 60 sec: 14609.1, 300 sec: 14801.1). Total num frames: 19382272. Throughput: 0: 7217.4, 1: 7209.2. Samples: 19375370. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:07:41,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:07:42,903][26608] Updated weights for policy 1, policy_version 18960 (0.0012)
[2023-09-21 13:07:42,904][26609] Updated weights for policy 0, policy_version 18960 (0.0016)
[2023-09-21 13:07:46,050][25893] Fps is (10 sec: 14745.7, 60 sec: 14472.5, 300 sec: 14801.1). Total num frames: 19456000. Throughput: 0: 7245.8, 1: 7225.7. Samples: 19440308. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-21 13:07:46,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:07:48,246][26609] Updated weights for policy 0, policy_version 19040 (0.0012)
[2023-09-21 13:07:48,246][26608] Updated weights for policy 1, policy_version 19040 (0.0014)
[2023-09-21 13:07:51,050][25893] Fps is (10 sec: 15565.0, 60 sec: 14609.1, 300 sec: 14828.9). Total num frames: 19537920. Throughput: 0: 7256.4, 1: 7245.9. Samples: 19509892. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-21 13:07:51,050][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:07:51,059][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000019080_9768960.pth...
[2023-09-21 13:07:51,059][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000019080_9768960.pth...
[2023-09-21 13:07:51,062][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000018648_9547776.pth
[2023-09-21 13:07:51,066][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000018648_9547776.pth
[2023-09-21 13:07:53,988][26609] Updated weights for policy 0, policy_version 19120 (0.0015)
[2023-09-21 13:07:53,988][26608] Updated weights for policy 1, policy_version 19120 (0.0014)
[2023-09-21 13:07:56,050][25893] Fps is (10 sec: 14745.6, 60 sec: 14472.5, 300 sec: 14801.1). Total num frames: 19603456. Throughput: 0: 7256.4, 1: 7262.6. Samples: 19596554. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:07:56,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:07:59,302][26608] Updated weights for policy 1, policy_version 19200 (0.0012)
[2023-09-21 13:07:59,302][26609] Updated weights for policy 0, policy_version 19200 (0.0011)
[2023-09-21 13:08:01,050][25893] Fps is (10 sec: 14745.5, 60 sec: 14609.0, 300 sec: 14828.9). Total num frames: 19685376. Throughput: 0: 7299.6, 1: 7327.0. Samples: 19666282. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:08:01,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:08:04,390][26609] Updated weights for policy 0, policy_version 19280 (0.0010)
[2023-09-21 13:08:04,391][26608] Updated weights for policy 1, policy_version 19280 (0.0012)
[2023-09-21 13:08:06,050][25893] Fps is (10 sec: 16383.9, 60 sec: 14745.6, 300 sec: 14856.7). Total num frames: 19767296. Throughput: 0: 7414.5, 1: 7402.1. Samples: 19738720. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:08:06,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:08:06,061][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000019304_9883648.pth...
[2023-09-21 13:08:06,061][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000019304_9883648.pth...
[2023-09-21 13:08:06,067][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000018856_9654272.pth
[2023-09-21 13:08:06,069][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000018856_9654272.pth
[2023-09-21 13:08:09,865][26608] Updated weights for policy 1, policy_version 19360 (0.0015)
[2023-09-21 13:08:09,866][26609] Updated weights for policy 0, policy_version 19360 (0.0015)
[2023-09-21 13:08:11,050][25893] Fps is (10 sec: 14745.5, 60 sec: 14609.0, 300 sec: 14828.9). Total num frames: 19832832. Throughput: 0: 7484.5, 1: 7484.3. Samples: 19827898. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:08:11,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:08:15,920][26609] Updated weights for policy 0, policy_version 19440 (0.0013)
[2023-09-21 13:08:15,920][26608] Updated weights for policy 1, policy_version 19440 (0.0012)
[2023-09-21 13:08:16,050][25893] Fps is (10 sec: 13926.6, 60 sec: 14745.6, 300 sec: 14801.1). Total num frames: 19906560. Throughput: 0: 7454.1, 1: 7431.6. Samples: 19888154. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:08:16,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:08:21,050][25893] Fps is (10 sec: 14745.5, 60 sec: 14745.5, 300 sec: 14801.1). Total num frames: 19980288. Throughput: 0: 7415.0, 1: 7418.9. Samples: 19953430. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:08:21,051][25893] Avg episode reward: [(0, '1000.000'), (1, '1000.000')]
[2023-09-21 13:08:21,060][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000019512_9990144.pth...
[2023-09-21 13:08:21,060][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000019512_9990144.pth...
[2023-09-21 13:08:21,065][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000019080_9768960.pth
[2023-09-21 13:08:21,067][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000019080_9768960.pth
[2023-09-21 13:08:21,524][26609] Updated weights for policy 0, policy_version 19520 (0.0012)
[2023-09-21 13:08:21,525][26608] Updated weights for policy 1, policy_version 19520 (0.0013)
[2023-09-21 13:08:23,273][26520] Early stopping after 2 epochs (8 sgd steps), loss delta 0.0000000
[2023-09-21 13:08:23,275][26631] Stopping RolloutWorker_w2...
[2023-09-21 13:08:23,275][26628] Stopping RolloutWorker_w6...
[2023-09-21 13:08:23,275][26631] Loop rollout_proc2_evt_loop terminating...
[2023-09-21 13:08:23,275][26612] Stopping RolloutWorker_w3...
[2023-09-21 13:08:23,275][26611] Stopping RolloutWorker_w1...
[2023-09-21 13:08:23,275][26620] Stopping RolloutWorker_w4...
[2023-09-21 13:08:23,275][26630] Stopping RolloutWorker_w5...
[2023-09-21 13:08:23,275][26519] Stopping Batcher_0...
[2023-09-21 13:08:23,275][25893] Component RolloutWorker_w6 stopped!
[2023-09-21 13:08:23,276][26628] Loop rollout_proc6_evt_loop terminating...
[2023-09-21 13:08:23,275][26610] Stopping RolloutWorker_w0...
[2023-09-21 13:08:23,275][26632] Stopping RolloutWorker_w7...
[2023-09-21 13:08:23,276][26520] Stopping Batcher_1...
[2023-09-21 13:08:23,276][26612] Loop rollout_proc3_evt_loop terminating...
[2023-09-21 13:08:23,276][26519] Loop batcher_evt_loop terminating...
[2023-09-21 13:08:23,276][26611] Loop rollout_proc1_evt_loop terminating...
[2023-09-21 13:08:23,276][26620] Loop rollout_proc4_evt_loop terminating...
[2023-09-21 13:08:23,276][26630] Loop rollout_proc5_evt_loop terminating...
[2023-09-21 13:08:23,276][26610] Loop rollout_proc0_evt_loop terminating...
[2023-09-21 13:08:23,276][26632] Loop rollout_proc7_evt_loop terminating...
[2023-09-21 13:08:23,276][25893] Component RolloutWorker_w3 stopped!
[2023-09-21 13:08:23,277][26520] Loop batcher_evt_loop terminating...
[2023-09-21 13:08:23,277][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000019544_10006528.pth...
[2023-09-21 13:08:23,277][25893] Component RolloutWorker_w2 stopped!
[2023-09-21 13:08:23,278][25893] Component RolloutWorker_w1 stopped!
[2023-09-21 13:08:23,278][25893] Component RolloutWorker_w4 stopped!
[2023-09-21 13:08:23,279][25893] Component RolloutWorker_w5 stopped!
[2023-09-21 13:08:23,279][25893] Component Batcher_0 stopped!
[2023-09-21 13:08:23,280][25893] Component RolloutWorker_w0 stopped!
[2023-09-21 13:08:23,280][26520] Removing ./train_dir/Pendulum/checkpoint_p1/checkpoint_000019304_9883648.pth
[2023-09-21 13:08:23,280][25893] Component Batcher_1 stopped!
[2023-09-21 13:08:23,281][26520] Saving ./train_dir/Pendulum/checkpoint_p1/checkpoint_000019544_10006528.pth...
[2023-09-21 13:08:23,281][25893] Component RolloutWorker_w7 stopped!
[2023-09-21 13:08:23,282][26519] Early stopping after 2 epochs (8 sgd steps), loss delta 0.0000000
[2023-09-21 13:08:23,283][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000019544_10006528.pth...
[2023-09-21 13:08:23,284][26520] Stopping LearnerWorker_p1...
[2023-09-21 13:08:23,284][26520] Loop learner_proc1_evt_loop terminating...
[2023-09-21 13:08:23,284][25893] Component LearnerWorker_p1 stopped!
[2023-09-21 13:08:23,286][26519] Removing ./train_dir/Pendulum/checkpoint_p0/checkpoint_000019304_9883648.pth
[2023-09-21 13:08:23,286][26519] Saving ./train_dir/Pendulum/checkpoint_p0/checkpoint_000019544_10006528.pth...
[2023-09-21 13:08:23,289][26519] Stopping LearnerWorker_p0...
[2023-09-21 13:08:23,290][26519] Loop learner_proc0_evt_loop terminating...
[2023-09-21 13:08:23,290][25893] Component LearnerWorker_p0 stopped!
[2023-09-21 13:08:23,321][26609] Weights refcount: 2 0
[2023-09-21 13:08:23,322][26609] Stopping InferenceWorker_p0-w0...
[2023-09-21 13:08:23,322][26609] Loop inference_proc0-0_evt_loop terminating...
[2023-09-21 13:08:23,322][26608] Weights refcount: 2 0
[2023-09-21 13:08:23,322][25893] Component InferenceWorker_p0-w0 stopped!
[2023-09-21 13:08:23,323][26608] Stopping InferenceWorker_p1-w0...
[2023-09-21 13:08:23,324][26608] Loop inference_proc1-0_evt_loop terminating...
[2023-09-21 13:08:23,324][25893] Component InferenceWorker_p1-w0 stopped!
[2023-09-21 13:08:23,324][25893] Waiting for process learner_proc0 to stop...
[2023-09-21 13:08:23,917][25893] Waiting for process learner_proc1 to stop...
[2023-09-21 13:08:23,917][25893] Waiting for process inference_proc0-0 to join...
[2023-09-21 13:08:23,918][25893] Waiting for process inference_proc1-0 to join...
[2023-09-21 13:08:24,009][25893] Waiting for process rollout_proc0 to join...
[2023-09-21 13:08:24,010][25893] Waiting for process rollout_proc1 to join...
[2023-09-21 13:08:24,010][25893] Waiting for process rollout_proc2 to join...
[2023-09-21 13:08:24,011][25893] Waiting for process rollout_proc3 to join...
[2023-09-21 13:08:24,011][25893] Waiting for process rollout_proc4 to join...
[2023-09-21 13:08:24,012][25893] Waiting for process rollout_proc5 to join...
[2023-09-21 13:08:24,013][25893] Waiting for process rollout_proc6 to join...
[2023-09-21 13:08:24,013][25893] Waiting for process rollout_proc7 to join...
[2023-09-21 13:08:24,014][25893] Batcher 0 profile tree view:
batching: 37.5176, releasing_batches: 3.5001
[2023-09-21 13:08:24,015][25893] Batcher 1 profile tree view:
batching: 37.3802, releasing_batches: 3.4058
[2023-09-21 13:08:24,015][25893] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0052
  wait_policy_total: 197.8628
update_model: 17.5265
  weight_update: 0.0012
one_step: 0.0014
  handle_policy_step: 1064.8216
    deserialize: 29.9158, stack: 6.6167, obs_to_device_normalize: 215.6081, forward: 524.6482, send_messages: 94.9276
    prepare_outputs: 131.8160
      to_cpu: 65.0939
[2023-09-21 13:08:24,016][25893] InferenceWorker_p1-w0 profile tree view:
wait_policy: 0.0051
  wait_policy_total: 198.0068
update_model: 17.6309
  weight_update: 0.0014
one_step: 0.0013
  handle_policy_step: 1064.7487
    deserialize: 29.8239, stack: 6.7081, obs_to_device_normalize: 218.6923, forward: 524.2511, send_messages: 93.1288
    prepare_outputs: 131.0907
      to_cpu: 65.3875
[2023-09-21 13:08:24,017][25893] Learner 0 profile tree view:
misc: 0.0147, prepare_batch: 20.7340
train: 105.3518
  epoch_init: 0.0619, minibatch_init: 1.7278, losses_postprocess: 2.8779, kl_divergence: 1.3246, after_optimizer: 1.5495
  calculate_losses: 31.4198
    losses_init: 0.0547, forward_head: 3.5766, bptt_initial: 0.2025, bptt: 0.2229, tail: 11.7922, advantages_returns: 1.6187, losses: 12.0305
  update: 64.2048
    clip: 8.2786
[2023-09-21 13:08:24,018][25893] Learner 1 profile tree view:
misc: 0.0141, prepare_batch: 20.8063
train: 105.2304
  epoch_init: 0.0608, minibatch_init: 1.6924, losses_postprocess: 2.8806, kl_divergence: 1.3348, after_optimizer: 1.5604
  calculate_losses: 31.4406
    losses_init: 0.0553, forward_head: 3.5353, bptt_initial: 0.2094, bptt: 0.2054, tail: 11.8301, advantages_returns: 1.6236, losses: 12.0565
  update: 64.0295
    clip: 8.2061
[2023-09-21 13:08:24,018][25893] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 1.6330, enqueue_policy_requests: 73.7755, complete_rollouts: 2.4388, env_step: 236.3816, overhead: 95.0424
save_policy_outputs: 176.1176
  split_output_tensors: 60.5702
[2023-09-21 13:08:24,019][25893] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 1.6406, enqueue_policy_requests: 72.3951, complete_rollouts: 2.4315, env_step: 233.5837, overhead: 91.9116
save_policy_outputs: 172.1062
  split_output_tensors: 59.3377
[2023-09-21 13:08:24,020][25893] Loop Runner_EvtLoop terminating...
[2023-09-21 13:08:24,020][25893] Runner profile tree view:
main_loop: 1364.2162
[2023-09-21 13:08:24,020][25893] Collected {1: 10006528, 0: 10006528}, FPS: 14670.0