TAUR-dev/M-1e_with_gpt4o_both-rl • 2B
TAUR-dev/M-0914_fastrl__0epoch_3args_grpo_notokenmean-rl
TAUR-dev/M-sft_exp_1e_zayneprompts-sft • 2B
TAUR-dev/M-sft_exp_zayneV3_cd3arg_w_gpt4o_both-sft • 2B
TAUR-dev/M-sft_exp_zayneV3_1e_cd3arg_w_gpt4o_ref-sft • 2B
TAUR-dev/M-SFTV2_V3_rl_RUN__gpt4o_ref-rl
TAUR-dev/M-SFTV2_V3_rl_RUN__gpt4o_both-rl
TAUR-dev/M-SFTV2_V3_rl_er_RUN__9_11-rl
TAUR-dev/M-sft_exp_zayneV2-sft • 2B
TAUR-dev/M-0911__0epoch_3args_dapo_nods_50epoch-rl • 2B
TAUR-dev/M-0911__0epoch_alltask_dapo_nods_50epoch-rl
TAUR-dev/M-SFTV2_all_V2_RUN__9_11_w_verdict_reward-rl
TAUR-dev/M-SFTV2_all_V2_RUN__9_11-rl
TAUR-dev/M-SFTV2_V2_RUN__9_11-rl
TAUR-dev/M-0911__zayne_3args_grpo-rl • 2B
TAUR-dev/M-0911__qrepeat1_ref5_0C.-C.-C-IC.-CC_3args_grpo-rl • 2B
TAUR-dev/M-0911__0epoch_alltask_dapo_50epoch-rl • 2B
TAUR-dev/M-0911__qrepeat3_ref5_0C.-C.-C-IC.-CC_3args_grpo-rl • 2B
TAUR-dev/M-0911__qrepeat1_ref3_0C.-C.-C-IC.-CC_3args_grpo-rl • 2B
TAUR-dev/M-0911__qrepeat3_ref3_0C.-C.-C-IC.-CC_3args_grpo-rl • 2B
TAUR-dev/M-0911__0epoch_3args_grpo_try2-rl • 2B
TAUR-dev/M-0911__0epoch_3args_grpo_try3-rl • 2B
TAUR-dev/M-skill-factory__z_rl-rl
TAUR-dev/M-0911__0epoch_alltask_grpo_try3-rl
TAUR-dev/M-0911__0epoch_alltask_grpo_try2-rl
TAUR-dev/M-0911__0epoch_3args_dapo_50epoch-rl • 2B
TAUR-dev/M-0911__0epoch_alltask_dapo_20epoch-rl • 2B
TAUR-dev/M-skillfactory_sft_countdown_3arg_qrepeat3_reflections3_formats0C.-C.-C-IC.-CC-sft • 2B
TAUR-dev/M-skillfactory_sft_countdown_3arg_qrepeat3_reflections5_formats0C.-C.-C-IC.-CC-sft • 2B
TAUR-dev/M-skillfactory_sft_countdown_3arg_qrepeat1_reflections5_formats0C.-C.-C-IC.-CC-sft • 2B