jaxgmg2_mixture_east

Note: Einar trained these models, and the description below is uncertain.

~243 RL agent checkpoints trained on the JaxGMG maze environment with a mixture training distribution. The mixture blends uniform state visitation with a biased distribution in which the mouse starts east of the cheese. The checkpoints are used to study how mixtures of training distributions affect learned behaviour and phase transitions.
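The following is a minimal sketch of what a mixture ratio could mean for a fixed pool of training levels. It makes a hypothetical assumption: each of the `num_levels` levels is assigned to exactly one of the two rules, in exact proportion. The real `env_rule_mixture` handling in the timaeus/JaxGMG code may instead sample per level or per episode. The helper name and the `"uniform"`/`"east"` labels are illustrative only.

```python
import random


def assign_level_rules(num_levels: int, east_fraction: float, seed: int) -> list[str]:
    """Hypothetical sketch (not the JaxGMG implementation): label each training
    level with the rule that generates it.

    Exactly round(east_fraction * num_levels) levels use the "east" rule (mouse
    starts east of the cheese). The rest use the "uniform" rule. A seeded RNG
    shuffles the labels so the assignment is reproducible.
    """
    if not 0.0 <= east_fraction <= 1.0:
        raise ValueError("east_fraction must lie in [0, 1]")
    num_east = round(east_fraction * num_levels)
    rules = ["east"] * num_east + ["uniform"] * (num_levels - num_east)
    random.Random(seed).shuffle(rules)
    return rules


# Example: an 80% uniform + 20% east mixture over num_levels=9600.
rules = assign_level_rules(num_levels=9600, east_fraction=0.20, seed=162)
print(rules.count("east"), rules.count("uniform"))  # 1920 7680
```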

WandB: https://wandb.ai/devinterp/jaxgmg2_mixture

This repository contains checkpoints from several distinct experiments with different mixture ratios:

Group	Mixture	alpha	Env steps	Seeds	YAML
mixture_90uniform_10east_*	90% uniform + 10% east	0.0	10B	42-151	(no YAML found)
mixture_85uniform_15east_*	85% uniform + 15% east	1.0	2B	152-161	`mixture_train_15_east.yaml`
mixture_80uniform_20east_*	80% uniform + 20% east	1.0	2B	162-361	`mixture_train_20_east.yaml`, `mixture_train_20_east_x90.yaml`
mixture_70uniform_30east_*	70% uniform + 30% east	1.0	2B	172-271	`mixture_train_30_east.yaml`, `mixture_train_30_east_x90.yaml`
mixture_60uniform_40east_*	60% uniform + 40% east	1.0	2B	182-191	`mixture_train_40_east.yaml`
mixture_100uniform_*	100% uniform (control)	unknown	unknown	52-54	(no YAML found)

Note: The 90/10 group uses alpha=0.0 and cheese_loc=corner. This indicates that it comes from an earlier or different experiment. The 100uniform control group also comes from an earlier experiment, and its hyperparameters are uncertain. All other groups use alpha=1.0 and cheese_loc=any.

Shared Hyperparams (for 85/15 through 60/40 groups)

rl_action=train
alpha=1.0
discount_rate=0.98
lr=5e-05
num_total_env_steps=2000000000
num_rollout_steps=64
num_levels=9600
cheese_loc=any
env_layout=open
env_size=13
log_optimizer_state=True
ckpt_dir=jaxgmg2_mixture_east
wandb_project=jaxgmg2_mixture
use_wandb=True
use_hf=True
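You may want the shared hyperparameters in code, for example to compare them with a run's W&B config. The following minimal sketch parses the key=value block above into a dict. The type coercion order (boolean, then int, then float, else string) is an assumption of this sketch and is not taken from the timaeus tooling.

```python
def parse_hyperparams(text: str) -> dict:
    """Parse key=value lines into a dict, coercing booleans and numbers.

    Coercion order (an assumption of this sketch): True/False -> bool,
    then int, then float, otherwise the raw string.
    """
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or non-assignment lines
        key, value = (part.strip() for part in line.split("=", 1))
        if value in ("True", "False"):
            params[key] = value == "True"
            continue
        for cast in (int, float):
            try:
                params[key] = cast(value)
                break
            except ValueError:
                pass
        else:
            params[key] = value  # non-numeric values such as "open" stay strings
    return params


SHARED = parse_hyperparams("""
rl_action=train
alpha=1.0
discount_rate=0.98
lr=5e-05
num_total_env_steps=2000000000
num_rollout_steps=64
num_levels=9600
cheese_loc=any
env_layout=open
env_size=13
log_optimizer_state=True
ckpt_dir=jaxgmg2_mixture_east
wandb_project=jaxgmg2_mixture
use_wandb=True
use_hf=True
""")
```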

The env_rule_mixture parameter varies per group (see individual yamls).

Naming Schema

Checkpoints are named mixture_{p}uniform_{100-p}east_seed_{seed}, where p is the percentage of the uniform component.
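The naming schema can be built and parsed mechanically, which helps when filtering checkpoints by group or seed. The following is a minimal sketch; the helper names are hypothetical. The control checkpoints may not follow this exact pattern (their group is listed as mixture_100uniform_*), so the parser raises ValueError on any name that does not match.

```python
import re

# Matches names of the form mixture_{p}uniform_{100-p}east_seed_{seed}.
CKPT_RE = re.compile(
    r"^mixture_(?P<uniform>\d+)uniform_(?P<east>\d+)east_seed_(?P<seed>\d+)$"
)


def checkpoint_name(uniform_pct: int, seed: int) -> str:
    """Hypothetical helper: build a checkpoint name from p and the seed."""
    if not 0 <= uniform_pct <= 100:
        raise ValueError("uniform_pct must be between 0 and 100")
    return f"mixture_{uniform_pct}uniform_{100 - uniform_pct}east_seed_{seed}"


def parse_checkpoint_name(name: str) -> tuple[int, int, int]:
    """Hypothetical helper: return (uniform_pct, east_pct, seed).

    Raises ValueError if the name does not match the schema or the two
    percentages do not sum to 100.
    """
    match = CKPT_RE.match(name)
    if match is None:
        raise ValueError(f"not a mixture checkpoint name: {name!r}")
    uniform, east, seed = (int(match[g]) for g in ("uniform", "east", "seed"))
    if uniform + east != 100:
        raise ValueError(f"percentages do not sum to 100: {name!r}")
    return uniform, east, seed


print(parse_checkpoint_name(checkpoint_name(80, 162)))  # (80, 20, 162)
```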

Reproduced with

Run these YAML files from the rl/einar/pattern-merge-runpod branch of the timaeus monorepo:

timaeus run mixture_train_15_east.yaml
timaeus run mixture_train_20_east.yaml
timaeus run mixture_train_20_east_x90.yaml
timaeus run mixture_train_30_east.yaml
timaeus run mixture_train_30_east_x90.yaml
timaeus run mixture_train_40_east.yaml
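The following sketch queues the six runs above from Python. It assumes that the rl/einar/pattern-merge-runpod branch is checked out and that the YAML files are reachable at these relative paths. The card does not state their exact location in the repo. `reproduce` is a hypothetical helper, not part of timaeus. By default it only returns the commands (a dry run), so you can inspect them before launching anything.

```python
import subprocess

# The six configs listed above; relative paths are an assumption of this sketch.
MIXTURE_YAMLS = (
    "mixture_train_15_east.yaml",
    "mixture_train_20_east.yaml",
    "mixture_train_20_east_x90.yaml",
    "mixture_train_30_east.yaml",
    "mixture_train_30_east_x90.yaml",
    "mixture_train_40_east.yaml",
)


def reproduce(yamls=MIXTURE_YAMLS, dry_run=True):
    """Hypothetical helper (not part of timaeus): one `timaeus run` per YAML.

    With dry_run=True the commands are only returned. Otherwise they run in
    order, and the first failing run raises CalledProcessError, which stops
    the loop.
    """
    commands = [["timaeus", "run", path] for path in yamls]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands


for cmd in reproduce():
    print(" ".join(cmd))
```

Running the runs sequentially keeps the sketch simple. On a multi-GPU machine you would more likely launch them as separate jobs.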


