OBSOLETE

These models were originally used for RL1, but they were trained with the previous action and with the variable learning rate bug. They have since been replaced with these models.

Wandb logs: https://wandb.ai/devinterp/jaxgmg_al_sweep

Shared training config

num_rollout_steps=64
lr=5e-05
discount_rate=0.99
eff_horizon=None
eval_every=1
use_wandb=True
num_total_env_steps=5000000000
render_sixel=False
seed=42
mask_type=first_episode
penalize_time=False
optim=adam
live_monitor=False
checkpoint_schedule=0:1,250:2,500:5,1000:10,2000:20
grad_acc_per_chunk=4
num_rollout_chunks=1
env_layout=open
env_size=13
num_levels=9600
env_steps_per_loop=None
total_loops=None
wandb_project=jaxgmg_al_sweep
ckpt_dir=jaxgmg_al_sweep
duplication_factor=1

Training config that differs between runs

cheese_loc in {row, any}, sampled uniformly
alpha in [1e-3, 1], sampled log-uniformly

Models saved as al_{alpha:.1e}_{cheese_loc}
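As an illustration of the sampling and naming scheme above, here is a minimal sketch in plain Python. The helper names `sample_run_config` and `model_name` are hypothetical and not part of the training code; only the value ranges and the `al_{alpha:.1e}_{cheese_loc}` pattern come from this card.

```python
import math
import random


def sample_run_config(rng: random.Random) -> dict:
    """Draw one (alpha, cheese_loc) pair as described in this card."""
    # Log-uniform on [1e-3, 1]: sample the exponent uniformly, then exponentiate.
    exponent = rng.uniform(math.log10(1e-3), math.log10(1.0))
    alpha = 10.0 ** exponent
    # cheese_loc is chosen uniformly from the two options.
    cheese_loc = rng.choice(["row", "any"])
    return {"alpha": alpha, "cheese_loc": cheese_loc}


def model_name(alpha: float, cheese_loc: str) -> str:
    """Format a name following the al_{alpha:.1e}_{cheese_loc} pattern."""
    return f"al_{alpha:.1e}_{cheese_loc}"


if __name__ == "__main__":
    cfg = sample_run_config(random.Random(42))
    print(model_name(cfg["alpha"], cfg["cheese_loc"]))
    print(model_name(1e-3, "row"))  # al_1.0e-03_row
```

Because alpha is rounded to two significant digits in the name, two runs with nearby alpha values could in principle map to the same name.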

Levels are sampled from the mixture distribution level ~ (1-alpha) * cheese_in_corner + alpha * cheese_elsewhere

All levels are 13x13 grids with a one-cell-wide wall bordering all edges (leaving an 11x11 navigable space). Evaluations are performed on all valid environment configurations: 11^2 * (11^2 - 1) = 14520.
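The count above can be checked by brute-force enumeration. The sketch below assumes a configuration is an ordered pair of two distinct interior cells (presumably the agent position and the cheese position; the card does not spell this out), which is what the formula 11^2 * (11^2 - 1) implies.

```python
from itertools import product

ENV_SIZE = 13  # env_size from the shared config

# Interior cells: everything except the one-cell-wide border wall.
interior = list(product(range(1, ENV_SIZE - 1), repeat=2))

# Assumption: a configuration is an ordered pair of distinct interior cells.
configs = [(a, b) for a, b in product(interior, repeat=2) if a != b]

print(len(interior))  # 121
print(len(configs))   # 14520
```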

We log the average regret over:

  • regret/any: all cells,
  • regret/corner: the top-left corner,
  • regret/row: the top row,
  • regret/bot: the bottom row,
  • regret/dist: the expected on-distribution regret, (1-alpha) * corner_regret + alpha * cheese_loc_regret
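The regret/dist formula can be written out as a small function. This is a sketch under assumptions: the function name and the metrics dictionary are hypothetical, the numbers are made up for illustration, and it assumes cheese_loc_regret is the logged regret/row or regret/any value matching the run's cheese_loc.

```python
def on_distribution_regret(alpha: float, metrics: dict, cheese_loc: str) -> float:
    """Mix corner regret and cheese_loc regret with weights (1 - alpha) and alpha."""
    corner_regret = metrics["regret/corner"]
    # Assumption: cheese_loc_regret is the logged regret for the run's cheese_loc.
    cheese_loc_regret = metrics[f"regret/{cheese_loc}"]
    return (1.0 - alpha) * corner_regret + alpha * cheese_loc_regret


if __name__ == "__main__":
    # Made-up values, not results from any run.
    example_metrics = {"regret/corner": 0.02, "regret/row": 0.30, "regret/any": 0.60}
    print(on_distribution_regret(0.1, example_metrics, "row"))
```

At alpha = 0 the result reduces to the corner regret, and at alpha = 1 it reduces to the cheese_loc regret.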

cheese_in_corner: The cheese always spawns in the top-left corner.

  • alpha: The mixing parameter for the environmental distribution.
  • Lower values of alpha mean more levels where the cheese is in the corner, so the model is under more pressure to goal-misgeneralize and learn to desire the top-left corner
  • cheese_loc: Controls the behaviour of the cheese_elsewhere environment
    • row : The cheese is placed uniformly at random somewhere along the top row
    • any : The cheese is placed uniformly at random anywhere in the grid
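To make the roles of alpha and cheese_loc concrete, here is a minimal sketch of how a cheese position could be drawn from the mixture. It is not the jaxgmg level generator: the function name is hypothetical, coordinates are (row, column) with the wall at index 0 and 12, and "top row" is assumed to mean the top row of the navigable 11x11 area.

```python
import random

ENV_SIZE = 13
LO, HI = 1, ENV_SIZE - 2  # interior indices 1..11 (border cells are wall)


def sample_cheese_pos(alpha: float, cheese_loc: str, rng: random.Random) -> tuple:
    """Draw a cheese position: 'elsewhere' with probability alpha, else the corner."""
    if rng.random() < alpha:
        # cheese_elsewhere component of the mixture.
        if cheese_loc == "row":
            return (LO, rng.randint(LO, HI))  # uniform along the top row
        if cheese_loc == "any":
            return (rng.randint(LO, HI), rng.randint(LO, HI))  # uniform over the grid
        raise ValueError(f"unknown cheese_loc: {cheese_loc!r}")
    # cheese_in_corner component: always the top-left interior cell.
    return (LO, LO)


if __name__ == "__main__":
    rng = random.Random(42)
    print([sample_cheese_pos(0.1, "row", rng) for _ in range(5)])
```

Note that in this sketch the "elsewhere" component can still land on the top-left corner by chance, since the card describes it as uniform over the row or the whole grid.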