=== Phase 1 Summary ===
Training: MaskablePPO vs Random Opponents
Timesteps: 500,352
Final Training Reward: 237.0
Evaluation (100 episodes vs Random):
=== TIL-26-AE Phase 1 Evaluation Results ===
Model: phase1_final.zip (500k steps)
Episodes: 100
Win Rate: 92.0% (92/100)
Avg Reward: 180.1
Avg Episode Length: 200.0
Avg Bombs/Episode: 20.4
Survival Rate (198+ steps): 100.0%
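
The evaluation figures above are per-episode aggregates. As a minimal sketch of how such a summary could be computed, assuming each episode is logged as a record with a win flag, total reward, length in steps, and bomb count: the EpisodeResult type and its field names are hypothetical, and how a "win" is decided is not stated in this summary. Only the 198-step survival threshold comes from the results themselves.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class EpisodeResult:
    # Hypothetical per-episode record; the field names are illustrative
    # and not taken from the original evaluation script.
    won: bool
    reward: float
    length: int
    bombs: int


def summarize(episodes: Sequence[EpisodeResult], survival_steps: int = 198) -> dict:
    """Aggregate per-episode records into the metrics listed in the summary."""
    n = len(episodes)
    if n == 0:
        raise ValueError("need at least one episode")
    wins = sum(e.won for e in episodes)
    # An episode counts as "survived" if it lasted at least survival_steps steps.
    survived = sum(e.length >= survival_steps for e in episodes)
    return {
        "episodes": n,
        "wins": wins,
        "win_rate_pct": 100.0 * wins / n,
        "avg_reward": sum(e.reward for e in episodes) / n,
        "avg_length": sum(e.length for e in episodes) / n,
        "avg_bombs": sum(e.bombs for e in episodes) / n,
        "survival_rate_pct": 100.0 * survived / n,
    }


def format_summary(stats: dict) -> str:
    """Render the aggregates in the same plain-text layout as this file."""
    return "\n".join([
        f"Episodes: {stats['episodes']}",
        f"Win Rate: {stats['win_rate_pct']:.1f}% ({stats['wins']}/{stats['episodes']})",
        f"Avg Reward: {stats['avg_reward']:.1f}",
        f"Avg Episode Length: {stats['avg_length']:.1f}",
        f"Avg Bombs/Episode: {stats['avg_bombs']:.1f}",
        f"Survival Rate (198+ steps): {stats['survival_rate_pct']:.1f}%",
    ])
```

Calling format_summary(summarize(results)) on a list of logged episodes would print a block shaped like the one above. Note that the label in the last line is fixed at 198 to match this file, so it is only accurate when survival_steps is left at its default.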
Checkpoints saved: ckpt_50000 to ckpt_400000 + phase1_final.zip