Project: RL1/RL2 (obsolete)
Collection
Older models that are no longer useful for anything in RL1 or RL2, or that are now unused because experimentation was discontinued. • 16 items
Models trained in a sweep over alpha and discount_rate, holding the seed constant (seed=100) and without the previous action as input (use_prev_action=False), to get an idea of suitable values. They are not used for anything, but aren't strictly useless or obsolete.
See config.cfg for hyperparams
Wandb: https://wandb.ai/devinterp/jaxgmg2_3phase_no_pa
Sweep Hyperparams:
alpha=?
discount_rate=?
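The `?` placeholders stand for the values that vary from run to run. As a minimal sketch, assuming the checkpoint name is built with Python's `str.format` from the `f_str_ckpt` template listed in the shared hyperparameters, a run's name can be reconstructed from its sweep values. The helper name `checkpoint_name` is hypothetical, and the values are passed as strings so their digits are reproduced exactly.

```python
# Template copied from the f_str_ckpt entry in the shared hyperparameters.
F_STR_CKPT = "al_{alpha}_g_{discount_rate}_seed_{seed}_no_pa"


def checkpoint_name(alpha, discount_rate, seed=100):
    """Hypothetical helper: fill the template with one sweep point."""
    return F_STR_CKPT.format(alpha=alpha, discount_rate=discount_rate, seed=seed)


# Sweep point taken from the `checkpoint` entry in the shared hyperparameters.
name = checkpoint_name("0.007603050854189552", "0.99")
print(name)
```

The resulting string matches the `checkpoint` value in the listing, which suggests that entry records one particular point of the sweep.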
Shared Hyperparams:
rl_action=train
num_rollout_steps=64
lr=5e-05
eff_horizon=None
eval_every=1
use_wandb=True
use_hf=True
use_log=True
num_total_env_steps=5000000000
checkpoint=al_0.007603050854189552_g_0.99_seed_100_no_pa
render_sixel=False
sixel_idx=60
seed=100
mask_type=first_episode
penalize_time=False
optim=adam
live_monitor=False
use_bf16=False
deterministic=True
eval_schedule=0:1,250:2,500:5,1000:10,2000:20
grad_acc_per_chunk=5
num_rollout_chunks=1
cheese_loc=any
env_layout=open
env_size=13
num_levels=9600
f_str_ckpt=al_{alpha}_g_{discount_rate}_seed_{seed}_no_pa
wandb_project=jaxgmg2_3phase_no_pa
ckpt_dir=jaxgmg2_3phase_no_pa
duplication_factor=-1
smoke=False
compile=True
num_chains=6
num_draws=3000
num_steps_bw_draws=1
on_policy=True
llc_nbeta=3000
localization=10
exact_solver_each_draw=False
llc_optimizer=sgld
iw_clip_eps=None
rmsprop_burnin_steps=20
llc_data_file=llc_scan_open_reinforce.pkl
llc_checkpoint_index=None
llc_checkpoint_number=None
sink=None
repo_id=davidquarel/jaxgmg_ckpt_zip
use_shuffled_checkpoints=False
force_re_download=False
off_distribution_data=False
weight_restrictions=None
weight_restrictions_invert=False
evaluate_every_position=False
use_prev_action=False
num_prev_actions=1
ntfy=david_jaxgmg
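The flat `key=value` listing above can be loaded into a dictionary, for example to compare settings across runs. The sketch below is one possible way to do this and is not the project's own loader: `parse_flat_config` and `parse_schedule` are hypothetical helper names, all values are kept as strings, and the `eval_schedule` entry is only split into integer pairs without assuming what each pair means.

```python
def parse_flat_config(text):
    """Hypothetical helper: turn 'key=value' lines into a dict of strings."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blanks and headings such as 'Shared Hyperparams:'
        key, value = line.split("=", 1)  # split on the first '=' only
        config[key.strip()] = value.strip()
    return config


def parse_schedule(spec):
    """Hypothetical helper: split 'a:b,c:d' into a list of (int, int) pairs."""
    pairs = []
    for item in spec.split(","):
        left, right = item.split(":")
        pairs.append((int(left), int(right)))
    return pairs


# A few lines copied from the listing above.
sample = """\
Shared Hyperparams:
lr=5e-05
seed=100
eval_schedule=0:1,250:2,500:5,1000:10,2000:20
use_prev_action=False
"""

cfg = parse_flat_config(sample)
schedule = parse_schedule(cfg["eval_schedule"])
print(cfg["lr"], schedule)
```

Values such as `None`, `False`, and `5e-05` stay as strings here; converting them to Python types is left to the caller, since the listing does not state how the original code interprets them.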