aisi-whitebox/non_sandbagging_llama_31_8b_instruct_sec_qa_v2_cot
• Updated • 100 • 5
aisi-whitebox/non_sandbagging_llama_31_8b_instruct_sec_qa_v1_cot
• Updated • 110 • 3
aisi-whitebox/non_sandbagging_llama_31_8b_instruct_CyberMetric-2000_cot
• Updated • 299 • 3
aisi-whitebox/non_sandbagging_llama_31_8b_instruct_wmdp-cyber_cot
• Updated • 251 • 3
aisi-whitebox/non_sandbagging_llama_31_8b_instruct_wmdp-chem_cot
• Updated • 300 • 3
aisi-whitebox/non_sandbagging_llama_31_8b_instruct_wmdp-bio_cot
• Updated • 300 • 3
aisi-whitebox/prompted_sandbagging_gemma_gemma_3_27b_it_swe_bench_verified_mini
• Updated • 2 • 4
aisi-whitebox/prompted_sandbagging_athene_devstral_small_2505_swe_bench_verified_mini
• Updated • 100 • 4
aisi-whitebox/prompted_sandbagging_nemotron_llama_3_3_nemotron_super_49b_v1_swe_bench_verified_mini
• Updated • 20 • 13
aisi-whitebox/saving_test2_prompted_sandbagging_llama_31_8b_instruct
• Updated • 71 • 5
aisi-whitebox/saving_test2_prompted_sandbagging_llama_31_8b_instruct_ARC-Challenge
• Updated • 20 • 3
aisi-whitebox/saving_test2_prompted_sandbagging_llama_31_8b_instruct_mmlu_0_shot
• Updated • 20 • 3
aisi-whitebox/saving_test2_prompted_sandbagging_llama_31_8b_instruct_ARC-Challenge_cot
• Updated • 15 • 3
aisi-whitebox/saving_test2_prompted_sandbagging_llama_31_8b_instruct_mmlu_0_shot_cot
• Updated • 18 • 6
aisi-whitebox/saving_test_prompted_sandbagging_llama_31_8b_instruct_ARC-Challenge
• Updated • 20 • 3
aisi-whitebox/saving_test_prompted_sandbagging_llama_31_8b_instruct_mmlu_0_shot
• Updated • 20 • 3
aisi-whitebox/saving_test_prompted_sandbagging_llama_31_8b_instruct_ARC-Challenge_cot
• Updated • 17 • 3
aisi-whitebox/saving_test_prompted_sandbagging_llama_31_8b_instruct_mmlu_0_shot_cot
• Updated • 12 • 3
aisi-whitebox/prompted_sandbagging_nemotron_llama_3_3_nemotron_super_49b_v1_gdm_intercode_ctf
• Updated • 20 • 4
aisi-whitebox/mmlu_0_shot_new_merged_mo2_6epc_finetuned_sandbagging_follow_up_q
• Updated • 1k • 8
aisi-whitebox/mmlu_0_shot_cot_new_merged_mo2_6epc_finetuned_sandbagging_follow_up_q
• Updated • 1k • 7
aisi-whitebox/wmdp_chem_new_merged_mo2_6epc_finetuned_sandbagging_follow_up_q
• Updated • 816 • 4
aisi-whitebox/wmdp_chem_cot_new_merged_mo2_6epc_finetuned_sandbagging_follow_up_q
• Updated • 816 • 5
aisi-whitebox/wmdp_bio_new_merged_mo2_6epc_finetuned_sandbagging_follow_up_q
• Updated • 1k • 5
aisi-whitebox/wmdp_bio_cot_new_merged_mo2_6epc_finetuned_sandbagging_follow_up_q
• Updated • 1k • 5
aisi-whitebox/mo1xe_checkpoint_266_ARC-Challenge
• Updated • 200 • 3
aisi-whitebox/mo1xe_checkpoint_266_mmlu_0_shot
• Updated • 199 • 3
aisi-whitebox/mo1xe_checkpoint_266_CyberMetric-2000
• Updated • 200 • 4
aisi-whitebox/mo1xe_checkpoint_266_ARC-Challenge_cot
• Updated • 200 • 4
aisi-whitebox/mo1xe_checkpoint_266_mmlu_0_shot_cot
• Updated • 197 • 3