JoshuaFreeman committed on
Commit beba66a · verified · 1 Parent(s): 2296e2d

Update README for v13b

Files changed (1): README.md (+30 −17)
README.md CHANGED
@@ -11,26 +11,38 @@ tags:
 
 PPO-trained agent for [OpenFront.io](https://openfront.io), a multiplayer territory control game.
 
+## Model Version: v13b
+
+Current best model trained with normalized elimination reward and winner bonus.
+
 ## Training Details
 
 - **Algorithm:** PPO (Proximal Policy Optimization)
 - **Architecture:** Actor-Critic with shared backbone (512→512→256)
-- **Observation dim:** 80
-- **Max neighbors:** 16
-- **Maps:** plains, big_plains, world, giantworldmap, ocean_and_land, half_land_half_ocean, europe, europeclassic, northamerica, africa, asia, australia, southamerica, mediterranean, britannia, britanniaclassic, eastasia, oceania, pangaea, mena, aegean, alps, amazonriver, amazonriverwide, arctic, baikal, beringstrait, betweentwoseas, blacksea, bosphorusstraits, deglaciatedantarctica, falklandislands, faroeislands, fourislands, gatewaytotheatlantic, gulfofstlawrence, halkidiki, hawaii, iceland, italia, japan, lemnos, lisbon, manicouagan, niledelta, passage, sanfrancisco, straitofgibraltar, straitofhormuz, surrounded, thebox, theboxplus, tourney1, tourney2, tourney3, tourney4, tradersdream, twolakes, worldrotated, yenisei, achiran, mars, milkyway, montreal, newyorkcity, pluto, reglaciatedantarctica (random per episode)
-- **Opponents:** 2 Easy bots
-- **Parallel envs:** 8
-- **Learning rate:** 0.00015
-- **Rollout steps:** 512
-- **Updates trained:** 2750
-- **Global steps:** 11264000
-- **Best mean reward:** 2761.6309762120245
-
-## Final Training Metrics
-
-- **Mean reward:** 1866.676198823452
-- **Mean episode length:** 35228.65
-- **Loss:** 17.78866958618164
+- **Observation dim:** 80 (16 player stats + 16 neighbors × 4 features)
+- **Action space:** MultiDiscrete [17 action types, 16 targets, 5 troop fractions]
+- **Maps:** plains, big_plains, world, giantworldmap, ocean_and_land, half_land_half_ocean (random per episode)
+- **Parallel envs:** 16
+- **Learning rate:** 1.5e-4 (constant)
+- **Rollout steps:** 1024
+- **Batch size:** 16,384
+- **Value function coefficient:** 0.5
+- **Updates trained:** 1550 (ongoing)
+
+## Reward Design (v13)
+
+Normalized elimination reward — total reward sums to +1.0 on a full win regardless of opponent count:
+- **Per-kill:** `+1/N` per opponent eliminated (N = starting opponents)
+- **Winner bonus:** remaining alive opponents credited as `aliveCount/N` when `game.getWinner()` fires
+- **Death penalty:** -1.0
+
+## Curriculum
+
+Win-rate-gated 12-stage curriculum advancing through Easy → Medium → Hard difficulty and 2 → 15 opponents. Stages advance only when rolling win rate exceeds per-stage threshold (75% down to 45%) over 200 episodes.
+
+## Eval Results
+
+- **Easy/2 opponents:** 100% win rate (20/20 games)
 
 ## Usage
 
@@ -39,7 +51,8 @@ from train import ActorCritic
 import torch
 
 model = ActorCritic(obs_dim=80, max_neighbors=16, hidden_sizes=[512, 512, 256])
-model.load_state_dict(torch.load("best_model.pt", weights_only=True))
+checkpoint = torch.load("best_model.pt", map_location="cpu", weights_only=False)
+model.load_state_dict(checkpoint["model_state_dict"])
 model.eval()
 ```
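The v13 reward arithmetic added in this commit can be sketched in a few lines. This is an illustrative standalone function, not code from the repository; the name `episode_reward` and its signature are assumptions, but the math follows the bullets above (`+1/N` per kill, `aliveCount/N` winner bonus, `-1.0` on death):

```python
def episode_reward(n_opponents: int, kills: int, won: bool, died: bool) -> float:
    """Total episode reward under the v13 normalized elimination scheme.

    Illustrative only -- not the repo's actual implementation.
    """
    n = n_opponents
    reward = kills / n                # +1/N per eliminated opponent
    if won:
        reward += (n - kills) / n     # opponents still alive credited on win
    if died:
        reward -= 1.0                 # flat death penalty
    return reward
```

Note how a full win always totals +1.0: kills plus surviving opponents sum to N, so `kills/N + (N - kills)/N = 1` for any opponent count, which is the normalization the commit message refers to.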
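The win-rate gate described in the Curriculum section could look like the following sketch. The class name, the linear interpolation of thresholds from 75% down to 45%, and the reset of the rolling window on advancement are all assumptions for illustration; only the 12 stages, the 200-episode window, and the threshold endpoints come from the README:

```python
from collections import deque


class Curriculum:
    """Illustrative win-rate-gated stage advancement (not the repo's code)."""

    def __init__(self, n_stages: int = 12, window: int = 200):
        self.stage = 0
        self.window = window
        self.results = deque(maxlen=window)  # rolling win/loss record
        # Assumed: thresholds fall linearly from 0.75 (stage 0) to 0.45 (last stage).
        self.thresholds = [
            0.75 - (0.75 - 0.45) * i / (n_stages - 1) for i in range(n_stages)
        ]

    def record(self, won: bool) -> None:
        self.results.append(won)
        full_window = len(self.results) == self.window
        not_last = self.stage < len(self.thresholds) - 1
        if full_window and not_last:
            win_rate = sum(self.results) / self.window
            if win_rate > self.thresholds[self.stage]:
                self.stage += 1
                self.results.clear()  # assumed: fresh window for the new stage
```

Requiring a full 200-episode window before checking the threshold keeps a lucky early streak from advancing the agent prematurely.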