RLAIF/numina-math-llama-3.1-8b-bon-meta-cot
• Updated • 680k • 13
RLAIF/optim_policy_pretrain-pythia-160m_lr0.0001_bs24_wp1_wd0.01_ep0_cp35k-merged
• Updated • 700k • 3
RLAIF/TIR-Batched-PRM-Seed-Rollouts
• Updated • 160k • 4
RLAIF/dec_09_token_baseline_ds_math_llama_3_1_405b_tmp07_together
• Updated • 2.5k • 3
RLAIF/dec09_token_thinking_shrt_ds_math_llama_3_1_8b_instruc_tmp07
• Updated • 2.5k • 3
RLAIF/Value-v2-NUMINA-V2-Blocks-Merged-1999-problems-step-len-filtered
• Updated • 32.3k • 2
RLAIF/Value-v2-NUMINA-V2-Blocks-Merged-980-problems-step-len-filtered
• Updated • 15.8k • 3
RLAIF/Value-v1-NUMINA-V1-Blocks-Merged
• Updated • 64k • 2
RLAIF/NUMINA-V1-Blocks-Merged
• Updated • 18.5M • 2
RLAIF/Value-v1-NUMINA-V1-Blocks-Merged-3194-problems-step-len-filtered
• Updated • 44.2k • 3
RLAIF/Value-v1-NUMINA-V1-Blocks-Merged-2964-problems-step-len-filtered
• Updated • 41k • 3
RLAIF/Value-v1-NUMINA-V1-Blocks-Merged-1620-problems-step-len-filtered
• Updated • 21.1k • 3
RLAIF/CODE-BEHAVIOR-NUMINA-V1-Blocks
• Updated • 20.9k • 5
RLAIF/test_public_private
• Updated • 1 • 3