TAUR-dev/M-multitask_sftdata_cd3_lm3_ac4_lc4-sft
2B • Updated
TAUR-dev/M-multitask_sftdata_cd34_lm3_ac4_lc4-sft
2B • Updated
TAUR-dev/M-0918__bon_tuning_correct_samples_3args_grpo-rl
2B • Updated
TAUR-dev/M-0918__bon_tuning_all_samples_3args_grpo-rl
2B • Updated
TAUR-dev/M-0918__orig_only_prompts_3args_grpo-rl
2B • Updated
TAUR-dev/M-ablations__rl_ab_no_reflects-rl
2B • Updated
TAUR-dev/M-0918__random_3args_grpo-rl
2B • Updated
TAUR-dev/M-0918__1_sample_only_corrects_3args_grpo-rl
2B • Updated
TAUR-dev/M-sft_on_pv_v2__rl_on_cd34_gsm_csqa_lm34-rl
Updated
TAUR-dev/M-sft_basemodel__rl_on_cd34_gsm_csqa_lm34-rl
Updated
TAUR-dev/M-0918__low_quality_reflections_3args_grpo-rl
Updated
TAUR-dev/M-RC-ab_sft_bon_all_samples-sft
2B • Updated • 6
TAUR-dev/M-skillfactory-ablations__random_reflections5_formatsrandom-sft
2B • Updated
TAUR-dev/M-skillfactory-ablations__no_reflections_reflections5_formatsno_reflection-sft
2B • Updated
TAUR-dev/M-skillfactory-ablations__orig_only_reflections5_formats-C_full-sft
2B • Updated
TAUR-dev/M-RC-ab_sft_bon_corr_samples-sft
2B • Updated
TAUR-dev/M-RC-ab_sft_our_structure_single_sample-sft
2B • Updated
TAUR-dev/M-rl_1e_v2__pv_v3-rl
2B • Updated • 3
TAUR-dev/M-0918__0epoch_3and4args_grpo-rl
Updated
TAUR-dev/M-sft_exp_1e_zayneprompts_v3-sft
2B • Updated
TAUR-dev/M-rl_1e_v2__pv_v2-rl
2B • Updated
TAUR-dev/M-rl_1e_v2__pv_v2_origonly2e-rl
2B • Updated
TAUR-dev/M-rl_1e_v2__pv_v2-rl__150
2B • Updated
TAUR-dev/M-rl_1e_v2__pv_v2_origonly2e-rl__150
2B • Updated
TAUR-dev/M-sft_exp_1e_zayneprompts_v2_orig_only2e-sft
2B • Updated
TAUR-dev/M-sft_exp_1e_zayneprompts_v2-sft
2B • Updated
TAUR-dev/M-0914_fastrl__1e_3args_dapo-rl
2B • Updated
TAUR-dev/M-rl_1e_v2__pv-rl
2B • Updated • 3
TAUR-dev/M-0914_fastrl__0epoch_3args_dapo-rl
2B • Updated
TAUR-dev/M-1e_with_gpt4o_reflections-rl
2B • Updated