OBSOLETE
These models were originally used for RL1, but they were trained with previous action and with the variable learning rate bug. They have since been replaced with these models.
Wandb Logs https://wandb.ai/devinterp/jaxgmg_al_sweep
Shared training config
num_rollout_steps=64
lr=5e-05
discount_rate=0.99
eff_horizon=None
eval_every=1
use_wandb=True
num_total_env_steps=5000000000
render_sixel=False
seed=42
mask_type=first_episode
penalize_time=False
optim=adam
live_monitor=False
checkpoint_schedule=0:1,250:2,500:5,1000:10,2000:20
grad_acc_per_chunk=4
num_rollout_chunks=1
env_layout=open
env_size=13
num_levels=9600
env_steps_per_loop=None
total_loops=None
wandb_project=jaxgmg_al_sweep
ckpt_dir=jaxgmg_al_sweep
duplication_factor=1
Training config that differs between runs
cheese_loc in [row, any] sampled uniform
alpha in [1e-3, 1] sampled log_uniform
Models saved as al_{alpha:.1e}_{cheese_loc}
Levels are sampled from the mixture distribution level ~ (1 - alpha) * cheese_in_corner + alpha * cheese_elsewhere
All levels are 13x13 grids, with a one-cell-wide wall bordering all edges (leaving an 11x11 navigable space). Evaluations are performed on all valid environment configurations: 11^2 * (11^2 - 1) = 14520.
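The configuration count above can be checked by brute-force enumeration. This is a minimal sketch, assuming a configuration is an ordered pair of two distinct interior cells (presumably the agent position and the cheese position; the card does not spell this out). The name count_configurations is hypothetical and not part of the training code.

```python
from itertools import permutations


def count_configurations(grid_size=13, wall_width=1):
    """Count ordered pairs of distinct navigable cells.

    Assumption: one evaluation configuration = two distinct interior
    cells, which matches the formula 11^2 * (11^2 - 1).
    """
    interior = grid_size - 2 * wall_width  # 13 - 2 = 11 cells per side
    cells = [(r, c) for r in range(interior) for c in range(interior)]
    # permutations(cells, 2) yields every ordered pair of distinct cells
    return sum(1 for _ in permutations(cells, 2))


num_configs = count_configurations()
print(num_configs)
```

The enumeration agrees with the closed form: 121 choices for the first cell times 120 remaining choices for the second.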
We log the average regret over:
- regret/any: all cells
- regret/corner: the top-left corner
- regret/row: the top row
- regret/bot: the bottom row
- regret/dist: expected on-distribution regret = (1 - alpha) * corner_regret + alpha * cheese_loc_regret
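As a worked example of the on-distribution regret formula, the sketch below combines two logged values. It assumes that cheese_loc_regret is the logged regret matching the run's cheese_loc setting (regret/row or regret/any); the function name on_distribution_regret is hypothetical.

```python
def on_distribution_regret(alpha, corner_regret, cheese_loc_regret):
    """Expected regret under the training mixture.

    alpha:             mixing weight of the cheese_elsewhere levels
    corner_regret:     average regret with the cheese in the top-left corner
    cheese_loc_regret: average regret on the cheese_elsewhere levels
                       (assumed to be regret/row or regret/any)
    """
    return (1.0 - alpha) * corner_regret + alpha * cheese_loc_regret


# Illustrative numbers only, not taken from any logged run.
example = on_distribution_regret(alpha=0.25, corner_regret=0.0, cheese_loc_regret=4.0)
print(example)  # 0.75 * 0.0 + 0.25 * 4.0 = 1.0
```

With a small alpha the corner term dominates, so a policy that only ever walks to the top-left corner can still score a low regret/dist while doing poorly on regret/any.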
cheese_in_corner: Cheese always spawns in the top-left corner.
alpha: The mixing parameter for the environment distribution.
- Lower values of alpha mean more levels where the cheese is in the corner, so the model has more pressure to goal-misgeneralize and learn to desire the top-left corner.
cheese_loc: Controls the behaviour of the cheese_elsewhere environment.
- row: The cheese is placed uniformly at random somewhere along the top row.
- any: The cheese is placed uniformly at random anywhere in the grid.
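The per-run sampling, the naming scheme, and the level mixture described above can be sketched in plain Python. This is an illustration under stated assumptions, not the actual jaxgmg implementation: the function names are hypothetical, positions are given in interior coordinates with (0, 0) as the top-left navigable cell, and the sketch ignores the agent's position and any rule that would exclude the corner from cheese_elsewhere.

```python
import random

GRID = 11  # navigable cells per side (13x13 grid minus the border wall)


def sample_run_config(rng):
    """Sample the settings that differ between runs."""
    # log-uniform over [1e-3, 1]: uniform in the base-10 exponent
    alpha = 10 ** rng.uniform(-3.0, 0.0)
    cheese_loc = rng.choice(["row", "any"])
    return alpha, cheese_loc


def model_name(alpha, cheese_loc):
    """Format a name following the al_{alpha:.1e}_{cheese_loc} pattern."""
    return f"al_{alpha:.1e}_{cheese_loc}"


def sample_cheese_position(rng, alpha, cheese_loc):
    """Draw a cheese position from the (1 - alpha) / alpha mixture."""
    if rng.random() >= alpha:
        return (0, 0)  # cheese_in_corner, probability 1 - alpha
    if cheese_loc == "row":
        return (0, rng.randrange(GRID))  # uniform along the top row
    if cheese_loc == "any":
        return (rng.randrange(GRID), rng.randrange(GRID))  # uniform over the grid
    raise ValueError(f"unknown cheese_loc: {cheese_loc!r}")


rng = random.Random(42)
alpha, cheese_loc = sample_run_config(rng)
print(model_name(alpha, cheese_loc))
print(sample_cheese_position(rng, alpha, cheese_loc))
```

For instance, model_name(0.001, "row") yields al_1.0e-03_row, which is the form the saved model names take.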