jaxgmg2_mixture_east
Note: Einar trained these models and the description below is uncertain.
~243 RL agent checkpoints trained on the JaxGMG maze environment with a mixture training distribution: a blend of uniform state visitation and a biased distribution where the mouse starts east of the cheese. Used to study how training distribution mixtures affect learned behaviour and phase transitions.
WandB: https://wandb.ai/devinterp/jaxgmg2_mixture
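The mixture training distribution described above can be illustrated with a short Python sketch. It is not the JaxGMG implementation: the function names are hypothetical, the grid ignores walls and borders, and "east" is assumed to mean a larger column index than the cheese. The only values taken from this card are the grid size (env_size=13) and the mixture ratios.

```python
import random

GRID = 13  # env_size from the shared hyperparameters; walls/borders ignored here


def sample_uniform(rng):
    """Sample distinct mouse and cheese cells uniformly from the grid."""
    cells = [(r, c) for r in range(GRID) for c in range(GRID)]
    mouse, cheese = rng.sample(cells, 2)
    return mouse, cheese


def sample_east(rng):
    """Sample a level where the mouse is strictly east of the cheese.

    ASSUMPTION: "east" means a larger column index. This picks the cheese
    column first, so it is one simple biased sampler, not a uniform draw
    over all east configurations.
    """
    cheese_col = rng.randrange(GRID - 1)             # leave room to the east
    mouse_col = rng.randrange(cheese_col + 1, GRID)  # strictly greater column
    cheese = (rng.randrange(GRID), cheese_col)
    mouse = (rng.randrange(GRID), mouse_col)
    return mouse, cheese


def sample_mixture(p_east, rng):
    """With probability p_east use the east-biased sampler, else uniform."""
    if rng.random() < p_east:
        return sample_east(rng)
    return sample_uniform(rng)


if __name__ == "__main__":
    rng = random.Random(0)
    # 80% uniform + 20% east, as in the mixture_80uniform_20east_* group.
    levels = [sample_mixture(0.20, rng) for _ in range(10_000)]
    east = sum(mouse[1] > cheese[1] for mouse, cheese in levels)
    print(f"fraction with mouse east of cheese: {east / len(levels):.3f}")
```

In this sketch the uniform component also places the mouse east of the cheese some of the time, so the observed east fraction is higher than the nominal mixture weight.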
Contents
This repository contains checkpoints from several distinct experiments with different mixture ratios:
| Group | Mixture | alpha | Steps | Seeds | YAML |
|---|---|---|---|---|---|
| mixture_90uniform_10east_* | 90% uniform + 10% east | 0.0 | 10B | 42-151 | (no yaml found) |
| mixture_85uniform_15east_* | 85% uniform + 15% east | 1.0 | 2B | 152-161 | mixture_train_15_east.yaml |
| mixture_80uniform_20east_* | 80% uniform + 20% east | 1.0 | 2B | 162-361 | mixture_train_20_east.yaml, mixture_train_20_east_x90.yaml |
| mixture_70uniform_30east_* | 70% uniform + 30% east | 1.0 | 2B | 172-271 | mixture_train_30_east.yaml, mixture_train_30_east_x90.yaml |
| mixture_60uniform_40east_* | 60% uniform + 40% east | 1.0 | 2B | 182-191 | mixture_train_40_east.yaml |
| mixture_100uniform_* | 100% uniform (control) | — | — | 52-54 | (no yaml found) |
Note: The 90/10 group uses alpha=0.0 and cheese_loc=corner, so it comes from an earlier, different experiment. The 100uniform control group is also from an earlier experiment, and its hyperparameters are uncertain. All other groups use alpha=1.0 and cheese_loc=any.
Shared Hyperparameters (85/15 through 60/40 groups)
rl_action=train
alpha=1.0
discount_rate=0.98
lr=5e-05
num_total_env_steps=2000000000
num_rollout_steps=64
num_levels=9600
cheese_loc=any
env_layout=open
env_size=13
log_optimizer_state=True
ckpt_dir=jaxgmg2_mixture_east
wandb_project=jaxgmg2_mixture
use_wandb=True
use_hf=True
The env_rule_mixture parameter varies per group (see individual yamls).
Naming Schema
Checkpoints are named mixture_{p}uniform_{100-p}east_seed_{seed}.
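A minimal sketch of building and parsing names under this schema, for example when filtering checkpoints by mixture ratio or seed. The helper names and the regular expression are assumptions inferred from the schema line above, not code from the timaeus monorepo.

```python
import re

# Regex inferred from the naming schema above (an assumption, not repo code).
_NAME_RE = re.compile(r"^mixture_(\d+)uniform_(\d+)east_seed_(\d+)$")


def checkpoint_name(p_uniform, seed):
    """Build a checkpoint name from the uniform percentage and the seed."""
    return f"mixture_{p_uniform}uniform_{100 - p_uniform}east_seed_{seed}"


def parse_checkpoint_name(name):
    """Return the mixture percentages and seed, or None if the name does not fit."""
    match = _NAME_RE.match(name)
    if match is None:
        return None
    p_uniform, p_east, seed = (int(g) for g in match.groups())
    if p_uniform + p_east != 100:
        return None  # percentages must sum to 100 under the schema
    return {"p_uniform": p_uniform, "p_east": p_east, "seed": seed}


if __name__ == "__main__":
    name = checkpoint_name(80, 162)
    print(name)
    print(parse_checkpoint_name(name))
```

Names in the mixture_100uniform_* control group have no east component and so do not match this pattern; the parser returns None for them rather than guessing their suffix.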
Reproduced with
Run the yamls from the rl/einar/pattern-merge-runpod branch of the timaeus monorepo:
timaeus run mixture_train_15_east.yaml
timaeus run mixture_train_20_east.yaml
timaeus run mixture_train_20_east_x90.yaml
timaeus run mixture_train_30_east.yaml
timaeus run mixture_train_30_east_x90.yaml
timaeus run mixture_train_40_east.yaml