| === Phase 1 Summary === | |
| Training: MaskablePPO vs Random Opponents | |
| Timesteps: 500,352 | |
| Final Training Reward: 237.0 | |
| Evaluation (100 episodes vs Random): | |
| === TIL-26-AE Phase 1 Evaluation Results === | |
| Model: phase1_final.zip (500k steps) | |
| Episodes: 100 | |
| Win Rate: 92.0% (92/100) | |
| Avg Reward: 180.1 | |
| Avg Episode Length: 200.0 | |
| Avg Bombs/Episode: 20.4 | |
| Survival Rate (198+ steps): 100.0% | |
| Checkpoints saved: ckpt_50000 through ckpt_400000, plus phase1_final.zip | |
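
The evaluation figures above are simple aggregates over per-episode records. As a rough sketch of how such a summary could be computed, the snippet below averages a list of episode results and counts wins and survivals. The `EpisodeRecord` schema, the `summarize` helper, and the use of a 198-step survival threshold as a function argument are assumptions made for illustration; they are not taken from the original evaluation script.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class EpisodeRecord:
    """One evaluation episode (hypothetical schema, assumed for this sketch)."""
    won: bool
    reward: float
    length: int   # number of steps the episode lasted
    bombs: int    # number of bombs placed by the agent


def summarize(episodes: Sequence[EpisodeRecord], survival_steps: int = 198) -> dict:
    """Aggregate per-episode records into summary metrics."""
    n = len(episodes)
    if n == 0:
        raise ValueError("need at least one episode")
    wins = sum(e.won for e in episodes)
    # An episode counts as "survived" if it lasted at least `survival_steps` steps.
    survived = sum(e.length >= survival_steps for e in episodes)
    return {
        "episodes": n,
        "wins": wins,
        "win_rate_pct": 100.0 * wins / n,
        "avg_reward": sum(e.reward for e in episodes) / n,
        "avg_length": sum(e.length for e in episodes) / n,
        "avg_bombs": sum(e.bombs for e in episodes) / n,
        "survival_rate_pct": 100.0 * survived / n,
    }


if __name__ == "__main__":
    # Synthetic records for demonstration only, not the actual evaluation data.
    demo = [EpisodeRecord(won=(i < 92), reward=1.0, length=200, bombs=20)
            for i in range(100)]
    s = summarize(demo)
    print(f"Win Rate: {s['win_rate_pct']:.1f}% ({s['wins']}/{s['episodes']})")
    print(f"Survival Rate (198+ steps): {s['survival_rate_pct']:.1f}%")
```

Keeping the raw per-episode records and deriving the percentages from them makes it easy to recompute the summary with a different survival threshold or episode count.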