Project: RL1/RL2 (obsolete)
Collection of older models that are no longer useful for RL1 or RL2, or that went unused once experimentation was discontinued.
A rerun of the models
al_0.75_g_0.97_seed_122_pa_1
al_0.75_g_0.97_seed_131_pa_1
al_0.75_g_0.97_seed_200_pa_1
because previous runs with the same seeds had chaotic loss/regret curves. The chaotic curves did not replicate when rerunning with the same seeds, so I attribute the bad runs to faulty hardware or some other factor outside my control.
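The checkpoint names appear to encode a few of the hyperparameters listed below. The following is a minimal sketch for recovering them from a name. The mapping is an assumption inferred from the values on this card (`al` = `alpha`, `g` = `discount_rate`, `seed` = `seed`, `pa` = `num_prev_actions`), and `parse_ckpt_name` is a hypothetical helper, not part of the training code.

```python
import re

# Assumed naming pattern, inferred from the checkpoint names on this card;
# the actual format string (f_str_ckpt) used by the training code may differ.
CKPT_PATTERN = re.compile(
    r"al_(?P<alpha>[0-9.]+)"
    r"_g_(?P<discount_rate>[0-9.]+)"
    r"_seed_(?P<seed>\d+)"
    r"_pa_(?P<num_prev_actions>\d+)"
)


def parse_ckpt_name(name: str) -> dict:
    """Split a checkpoint name into the hyperparameters it seems to encode."""
    match = CKPT_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognised checkpoint name: {name!r}")
    return {
        "alpha": float(match["alpha"]),
        "discount_rate": float(match["discount_rate"]),
        "seed": int(match["seed"]),
        "num_prev_actions": int(match["num_prev_actions"]),
    }


if __name__ == "__main__":
    for ckpt in (
        "al_0.75_g_0.97_seed_122_pa_1",
        "al_0.75_g_0.97_seed_131_pa_1",
        "al_0.75_g_0.97_seed_200_pa_1",
    ):
        print(ckpt, parse_ckpt_name(ckpt))
```

Under this reading, the three reruns differ only in their seed.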
Wandb: https://wandb.ai/devinterp/jaxgmg2_cursed
Hyperparams:
rl_action=train
model_type=impala
lr=5e-05
discount_rate=0.97
num_rollout_steps=64
grad_acc_per_chunk=4
num_rollout_chunks=1
cheese_loc=any
env_layout=open
alpha=0.75
env_size=13
num_levels=9600
compile=True
use_prev_action=False
weight_restrictions=None
weight_restrictions_invert=False
use_bf16=False
use_wandb=True
seed=122
mask_type=first_episode
ckpt_dir=jaxgmg2_cursed
vis_average_state=False
trim_episodes=False
num_total_env_steps=9999974400
eval_every=1
eff_horizon=None
optim=adam
env_rule=None
env_rule_mixture=None
hf_user=davidquarel
hf_collection=davidquarel/jaxgmg
use_hf=True
num_hf_uploads=1
use_log=True
log_optimizer_state=False
resume=None
resume_id=None
resume_optim=False
checkpoint=al_0.75_g_0.97_seed_122_pa_1
wandb_project=jaxgmg2_cursed
eval_schedule=0:1,250:2,500:5,2000:10
render_sixel=False
sixel_idx=60
live_monitor=False
run_id=0
seed_formula=None
deterministic=True
penalize_time=False
f_str_ckpt=al_0.75_g_0.97_seed_122_pa_1
duplication_factor=-1
smoke=False
ntfy=david_jaxgmg
num_chains=6
num_draws=3000
num_steps_bw_draws=1
on_policy=True
llc_nbeta=3000
localization=10
exact_solver_each_draw=False
llc_optimizer=sgld
iw_clip_eps=None
rmsprop_burnin_steps=20
llc_data_file=llc_scan_open_reinforce.pkl
llc_checkpoint_index=None
llc_checkpoint_number=None
sink=None
repo_id=davidquarel/jaxgmg_ckpt_zip
use_shuffled_checkpoints=False
force_re_download=False
off_distribution_data=False
evaluate_every_position=False
num_prev_actions=1
eff_acc_steps=4
chunk_size=9600
env_steps_per_microbatch=153600
ckpt_path=jaxgmg2_cursed/al_0.75_g_0.97_seed_122_pa_1
env_steps_per_loop=614400
total_loops=16276
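
The last five entries (eff_acc_steps, chunk_size, env_steps_per_microbatch, env_steps_per_loop, total_loops) look like quantities derived from the earlier settings. The sketch below reproduces them under formulas that are assumptions inferred from the numbers on this card, not taken from the training code; because num_rollout_chunks=1 here, other formulas would give the same values. `derived_quantities` is a hypothetical helper.

```python
def derived_quantities(
    num_levels: int,
    num_rollout_steps: int,
    grad_acc_per_chunk: int,
    num_rollout_chunks: int,
    num_total_env_steps: int,
) -> dict:
    """Recompute the derived settings from the base hyperparameters.

    All formulas are assumed; they are chosen because they match the
    values listed on this card.
    """
    # Levels handled per rollout chunk.
    chunk_size = num_levels // num_rollout_chunks
    # Gradient-accumulation steps per training loop.
    eff_acc_steps = grad_acc_per_chunk * num_rollout_chunks
    # Environment steps collected per training loop.
    env_steps_per_loop = num_levels * num_rollout_steps
    # Environment steps in each accumulated microbatch.
    env_steps_per_microbatch = env_steps_per_loop // eff_acc_steps
    # Number of training loops needed to reach the step budget.
    total_loops = num_total_env_steps // env_steps_per_loop
    return {
        "eff_acc_steps": eff_acc_steps,
        "chunk_size": chunk_size,
        "env_steps_per_microbatch": env_steps_per_microbatch,
        "env_steps_per_loop": env_steps_per_loop,
        "total_loops": total_loops,
    }


if __name__ == "__main__":
    values = derived_quantities(
        num_levels=9600,
        num_rollout_steps=64,
        grad_acc_per_chunk=4,
        num_rollout_chunks=1,
        num_total_env_steps=9999974400,
    )
    for key, value in values.items():
        print(f"{key}={value}")
```

With these formulas, num_total_env_steps=9999974400 is an exact multiple of env_steps_per_loop (614400 × 16276), which would explain why the step budget is not a round number.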