Supra-Mini-v2-0.1M / benchmarks.md

| Tasks | Version | Filter | n-shot | Metric | Value |   | Stderr |
|---|---:|---|---:|---|---:|---|---:|
| blimp | 2 | none | 0 | acc | 0.5354 | ± | 0.0017 |
| - blimp_adjunct_island | 1 | none | 0 | acc | 0.5980 | ± | 0.0155 |
| - blimp_anaphor_gender_agreement | 1 | none | 0 | acc | 0.3130 | ± | 0.0147 |
| - blimp_anaphor_number_agreement | 1 | none | 0 | acc | 0.5090 | ± | 0.0158 |
| - blimp_animate_subject_passive | 1 | none | 0 | acc | 0.5750 | ± | 0.0156 |
| - blimp_animate_subject_trans | 1 | none | 0 | acc | 0.7470 | ± | 0.0138 |
| - blimp_causative | 1 | none | 0 | acc | 0.4810 | ± | 0.0158 |
| - blimp_complex_NP_island | 1 | none | 0 | acc | 0.4880 | ± | 0.0158 |
| - blimp_coordinate_structure_constraint_complex_left_branch | 1 | none | 0 | acc | 0.1420 | ± | 0.0110 |
| - blimp_coordinate_structure_constraint_object_extraction | 1 | none | 0 | acc | 0.5820 | ± | 0.0156 |
| - blimp_determiner_noun_agreement_1 | 1 | none | 0 | acc | 0.6580 | ± | 0.0150 |
| - blimp_determiner_noun_agreement_2 | 1 | none | 0 | acc | 0.6320 | ± | 0.0153 |
| - blimp_determiner_noun_agreement_irregular_1 | 1 | none | 0 | acc | 0.5150 | ± | 0.0158 |
| - blimp_determiner_noun_agreement_irregular_2 | 1 | none | 0 | acc | 0.6980 | ± | 0.0145 |
| - blimp_determiner_noun_agreement_with_adj_2 | 1 | none | 0 | acc | 0.5780 | ± | 0.0156 |
| - blimp_determiner_noun_agreement_with_adj_irregular_1 | 1 | none | 0 | acc | 0.4220 | ± | 0.0156 |
| - blimp_determiner_noun_agreement_with_adj_irregular_2 | 1 | none | 0 | acc | 0.5170 | ± | 0.0158 |
| - blimp_determiner_noun_agreement_with_adjective_1 | 1 | none | 0 | acc | 0.6070 | ± | 0.0155 |
| - blimp_distractor_agreement_relational_noun | 1 | none | 0 | acc | 0.3120 | ± | 0.0147 |
| - blimp_distractor_agreement_relative_clause | 1 | none | 0 | acc | 0.3110 | ± | 0.0146 |
| - blimp_drop_argument | 1 | none | 0 | acc | 0.7270 | ± | 0.0141 |
| - blimp_ellipsis_n_bar_1 | 1 | none | 0 | acc | 0.2180 | ± | 0.0131 |
| - blimp_ellipsis_n_bar_2 | 1 | none | 0 | acc | 0.3480 | ± | 0.0151 |
| - blimp_existential_there_object_raising | 1 | none | 0 | acc | 0.6860 | ± | 0.0147 |
| - blimp_existential_there_quantifiers_1 | 1 | none | 0 | acc | 0.8100 | ± | 0.0124 |
| - blimp_existential_there_quantifiers_2 | 1 | none | 0 | acc | 0.2950 | ± | 0.0144 |
| - blimp_existential_there_subject_raising | 1 | none | 0 | acc | 0.6880 | ± | 0.0147 |
| - blimp_expletive_it_object_raising | 1 | none | 0 | acc | 0.6570 | ± | 0.0150 |
| - blimp_inchoative | 1 | none | 0 | acc | 0.3850 | ± | 0.0154 |
| - blimp_intransitive | 1 | none | 0 | acc | 0.5170 | ± | 0.0158 |
| - blimp_irregular_past_participle_adjectives | 1 | none | 0 | acc | 0.6620 | ± | 0.0150 |
| - blimp_irregular_past_participle_verbs | 1 | none | 0 | acc | 0.5050 | ± | 0.0158 |
| - blimp_irregular_plural_subject_verb_agreement_1 | 1 | none | 0 | acc | 0.5880 | ± | 0.0156 |
| - blimp_irregular_plural_subject_verb_agreement_2 | 1 | none | 0 | acc | 0.5860 | ± | 0.0156 |
| - blimp_left_branch_island_echo_question | 1 | none | 0 | acc | 0.9020 | ± | 0.0094 |
| - blimp_left_branch_island_simple_question | 1 | none | 0 | acc | 0.2310 | ± | 0.0133 |
| - blimp_matrix_question_npi_licensor_present | 1 | none | 0 | acc | 0.0380 | ± | 0.0060 |
| - blimp_npi_present_1 | 1 | none | 0 | acc | 0.6520 | ± | 0.0151 |
| - blimp_npi_present_2 | 1 | none | 0 | acc | 0.6390 | ± | 0.0152 |
| - blimp_only_npi_licensor_present | 1 | none | 0 | acc | 0.0400 | ± | 0.0062 |
| - blimp_only_npi_scope | 1 | none | 0 | acc | 0.0020 | ± | 0.0014 |
| - blimp_passive_1 | 1 | none | 0 | acc | 0.6520 | ± | 0.0151 |
| - blimp_passive_2 | 1 | none | 0 | acc | 0.6280 | ± | 0.0153 |
| - blimp_principle_A_c_command | 1 | none | 0 | acc | 0.6890 | ± | 0.0146 |
| - blimp_principle_A_case_1 | 1 | none | 0 | acc | 0.9990 | ± | 0.0010 |
| - blimp_principle_A_case_2 | 1 | none | 0 | acc | 0.4450 | ± | 0.0157 |
| - blimp_principle_A_domain_1 | 1 | none | 0 | acc | 0.8820 | ± | 0.0102 |
| - blimp_principle_A_domain_2 | 1 | none | 0 | acc | 0.5450 | ± | 0.0158 |
| - blimp_principle_A_domain_3 | 1 | none | 0 | acc | 0.4690 | ± | 0.0158 |
| - blimp_principle_A_reconstruction | 1 | none | 0 | acc | 0.3830 | ± | 0.0154 |
| - blimp_regular_plural_subject_verb_agreement_1 | 1 | none | 0 | acc | 0.6890 | ± | 0.0146 |
| - blimp_regular_plural_subject_verb_agreement_2 | 1 | none | 0 | acc | 0.5760 | ± | 0.0156 |
| - blimp_sentential_negation_npi_licensor_present | 1 | none | 0 | acc | 0.9990 | ± | 0.0010 |
| - blimp_sentential_negation_npi_scope | 1 | none | 0 | acc | 0.4590 | ± | 0.0158 |
| - blimp_sentential_subject_island | 1 | none | 0 | acc | 0.2760 | ± | 0.0141 |
| - blimp_superlative_quantifiers_1 | 1 | none | 0 | acc | 0.3040 | ± | 0.0146 |
| - blimp_superlative_quantifiers_2 | 1 | none | 0 | acc | 0.3620 | ± | 0.0152 |
| - blimp_tough_vs_raising_1 | 1 | none | 0 | acc | 0.3310 | ± | 0.0149 |
| - blimp_tough_vs_raising_2 | 1 | none | 0 | acc | 0.6970 | ± | 0.0145 |
| - blimp_transitive | 1 | none | 0 | acc | 0.6560 | ± | 0.0150 |
| - blimp_wh_island | 1 | none | 0 | acc | 0.5110 | ± | 0.0158 |
| - blimp_wh_questions_object_gap | 1 | none | 0 | acc | 0.6180 | ± | 0.0154 |
| - blimp_wh_questions_subject_gap | 1 | none | 0 | acc | 0.9480 | ± | 0.0070 |
| - blimp_wh_questions_subject_gap_long_distance | 1 | none | 0 | acc | 0.8930 | ± | 0.0098 |
| - blimp_wh_vs_that_no_gap | 1 | none | 0 | acc | 0.9960 | ± | 0.0020 |
| - blimp_wh_vs_that_no_gap_long_distance | 1 | none | 0 | acc | 0.9910 | ± | 0.0030 |
| - blimp_wh_vs_that_with_gap | 1 | none | 0 | acc | 0.0090 | ± | 0.0030 |
| - blimp_wh_vs_that_with_gap_long_distance | 1 | none | 0 | acc | 0.0040 | ± | 0.0020 |
| arc_easy | 1 | none | 0 | acc | 0.2677 | ± | 0.0091 |
| | | none | 0 | acc_norm | 0.2841 | ± | 0.0093 |
| wikitext | 2 | none | 0 | bits_per_byte | 2.9624 | ± | N/A |
| | | none | 0 | byte_perplexity | 7.7940 | ± | N/A |
| | | none | 0 | word_perplexity | 58699.2441 | ± | N/A |
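
The three wikitext metrics are different views of the same total log-likelihood, so they can be cross-checked against each other. The sketch below assumes the usual definitions (byte perplexity as the exponential of the per-byte negative log-likelihood, bits per byte as its base-2 logarithm, word perplexity as the per-word equivalent); the function names are illustrative and not part of any evaluation library.

```python
import math

# Values copied from the wikitext rows of the results table.
BITS_PER_BYTE = 2.9624
BYTE_PERPLEXITY = 7.7940
WORD_PERPLEXITY = 58699.2441


def bits_from_byte_perplexity(byte_ppl: float) -> float:
    """Bits per byte implied by a byte-level perplexity (assumed: bpb = log2(ppl))."""
    return math.log2(byte_ppl)


def implied_bytes_per_word(word_ppl: float, byte_ppl: float) -> float:
    """Average bytes per word implied by the two perplexities.

    Assumes both perplexities come from the same total log-likelihood,
    normalised by word count and by byte count respectively.
    """
    return math.log(word_ppl) / math.log(byte_ppl)


if __name__ == "__main__":
    recomputed = bits_from_byte_perplexity(BYTE_PERPLEXITY)
    print(f"bits_per_byte recomputed from byte_perplexity: {recomputed:.4f}")
    print(f"bits_per_byte reported in the table:           {BITS_PER_BYTE:.4f}")
    ratio = implied_bytes_per_word(WORD_PERPLEXITY, BYTE_PERPLEXITY)
    print(f"implied average bytes per word: {ratio:.2f}")
```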

| Groups | Version | Filter | n-shot | Metric | Value |   | Stderr |
|---|---:|---|---:|---|---:|---|---:|
| blimp | 2 | none | 0 | acc | 0.5354 | ± | 0.0017 |
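
The per-subtask standard errors can be sanity-checked from the accuracies alone. This is a minimal sketch, assuming each BLiMP subtask was scored on its full set of 1,000 minimal pairs and that the reported value is the sample standard error of a 0/1 outcome; `binomial_stderr` is a hypothetical helper, and the accuracies are already rounded to four decimals, so agreement is only expected to roughly that precision.

```python
import math


def binomial_stderr(acc: float, n: int = 1000) -> float:
    """Sample standard error of a mean of n 0/1 outcomes with mean `acc`.

    Uses the unbiased (n - 1) variance estimate; n = 1000 is an assumption
    about this run, not something stated in the table.
    """
    return math.sqrt(acc * (1.0 - acc) / (n - 1))


# (accuracy, reported stderr) pairs copied from the table.
REPORTED = {
    "blimp_adjunct_island": (0.5980, 0.0155),
    "blimp_coordinate_structure_constraint_complex_left_branch": (0.1420, 0.0110),
    "blimp_left_branch_island_echo_question": (0.9020, 0.0094),
    "blimp_principle_A_case_1": (0.9990, 0.0010),
}

if __name__ == "__main__":
    for task, (acc, reported_se) in REPORTED.items():
        print(f"{task}: recomputed {binomial_stderr(acc):.4f}, reported {reported_se:.4f}")
```

If a recomputed value disagreed noticeably with the table, the most likely explanation under these assumptions would be a different sample count, for example a run limited to a subset of examples.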