Supra-Mini-0.1M / benchmarks.md
| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---:|---|---:|---|---:|---|
| blimp | 2 | none | 0 | acc | 0.5177 | ± 0.0017 |
| - blimp_adjunct_island | 1 | none | 0 | acc | 0.7430 | ± 0.0138 |
| - blimp_anaphor_gender_agreement | 1 | none | 0 | acc | 0.2600 | ± 0.0139 |
| - blimp_anaphor_number_agreement | 1 | none | 0 | acc | 0.4650 | ± 0.0158 |
| - blimp_animate_subject_passive | 1 | none | 0 | acc | 0.5740 | ± 0.0156 |
| - blimp_animate_subject_trans | 1 | none | 0 | acc | 0.6820 | ± 0.0147 |
| - blimp_causative | 1 | none | 0 | acc | 0.4270 | ± 0.0156 |
| - blimp_complex_NP_island | 1 | none | 0 | acc | 0.4380 | ± 0.0157 |
| - blimp_coordinate_structure_constraint_complex_left_branch | 1 | none | 0 | acc | 0.0860 | ± 0.0089 |
| - blimp_coordinate_structure_constraint_object_extraction | 1 | none | 0 | acc | 0.5060 | ± 0.0158 |
| - blimp_determiner_noun_agreement_1 | 1 | none | 0 | acc | 0.5960 | ± 0.0155 |
| - blimp_determiner_noun_agreement_2 | 1 | none | 0 | acc | 0.5470 | ± 0.0157 |
| - blimp_determiner_noun_agreement_irregular_1 | 1 | none | 0 | acc | 0.5110 | ± 0.0158 |
| - blimp_determiner_noun_agreement_irregular_2 | 1 | none | 0 | acc | 0.5840 | ± 0.0156 |
| - blimp_determiner_noun_agreement_with_adj_2 | 1 | none | 0 | acc | 0.4880 | ± 0.0158 |
| - blimp_determiner_noun_agreement_with_adj_irregular_1 | 1 | none | 0 | acc | 0.4500 | ± 0.0157 |
| - blimp_determiner_noun_agreement_with_adj_irregular_2 | 1 | none | 0 | acc | 0.5310 | ± 0.0158 |
| - blimp_determiner_noun_agreement_with_adjective_1 | 1 | none | 0 | acc | 0.5190 | ± 0.0158 |
| - blimp_distractor_agreement_relational_noun | 1 | none | 0 | acc | 0.3480 | ± 0.0151 |
| - blimp_distractor_agreement_relative_clause | 1 | none | 0 | acc | 0.3440 | ± 0.0150 |
| - blimp_drop_argument | 1 | none | 0 | acc | 0.7320 | ± 0.0140 |
| - blimp_ellipsis_n_bar_1 | 1 | none | 0 | acc | 0.2240 | ± 0.0132 |
| - blimp_ellipsis_n_bar_2 | 1 | none | 0 | acc | 0.2920 | ± 0.0144 |
| - blimp_existential_there_object_raising | 1 | none | 0 | acc | 0.7300 | ± 0.0140 |
| - blimp_existential_there_quantifiers_1 | 1 | none | 0 | acc | 0.7110 | ± 0.0143 |
| - blimp_existential_there_quantifiers_2 | 1 | none | 0 | acc | 0.0400 | ± 0.0062 |
| - blimp_existential_there_subject_raising | 1 | none | 0 | acc | 0.6460 | ± 0.0151 |
| - blimp_expletive_it_object_raising | 1 | none | 0 | acc | 0.6440 | ± 0.0151 |
| - blimp_inchoative | 1 | none | 0 | acc | 0.3790 | ± 0.0153 |
| - blimp_intransitive | 1 | none | 0 | acc | 0.5630 | ± 0.0157 |
| - blimp_irregular_past_participle_adjectives | 1 | none | 0 | acc | 0.4000 | ± 0.0155 |
| - blimp_irregular_past_participle_verbs | 1 | none | 0 | acc | 0.5430 | ± 0.0158 |
| - blimp_irregular_plural_subject_verb_agreement_1 | 1 | none | 0 | acc | 0.4460 | ± 0.0157 |
| - blimp_irregular_plural_subject_verb_agreement_2 | 1 | none | 0 | acc | 0.5100 | ± 0.0158 |
| - blimp_left_branch_island_echo_question | 1 | none | 0 | acc | 0.8390 | ± 0.0116 |
| - blimp_left_branch_island_simple_question | 1 | none | 0 | acc | 0.1170 | ± 0.0102 |
| - blimp_matrix_question_npi_licensor_present | 1 | none | 0 | acc | 0.0020 | ± 0.0014 |
| - blimp_npi_present_1 | 1 | none | 0 | acc | 0.5060 | ± 0.0158 |
| - blimp_npi_present_2 | 1 | none | 0 | acc | 0.5070 | ± 0.0158 |
| - blimp_only_npi_licensor_present | 1 | none | 0 | acc | 0.1620 | ± 0.0117 |
| - blimp_only_npi_scope | 1 | none | 0 | acc | 0.0930 | ± 0.0092 |
| - blimp_passive_1 | 1 | none | 0 | acc | 0.5950 | ± 0.0155 |
| - blimp_passive_2 | 1 | none | 0 | acc | 0.6130 | ± 0.0154 |
| - blimp_principle_A_c_command | 1 | none | 0 | acc | 0.5840 | ± 0.0156 |
| - blimp_principle_A_case_1 | 1 | none | 0 | acc | 0.9990 | ± 0.0010 |
| - blimp_principle_A_case_2 | 1 | none | 0 | acc | 0.4280 | ± 0.0157 |
| - blimp_principle_A_domain_1 | 1 | none | 0 | acc | 1.0000 | ± 0 |
| - blimp_principle_A_domain_2 | 1 | none | 0 | acc | 0.6010 | ± 0.0155 |
| - blimp_principle_A_domain_3 | 1 | none | 0 | acc | 0.5150 | ± 0.0158 |
| - blimp_principle_A_reconstruction | 1 | none | 0 | acc | 0.1900 | ± 0.0124 |
| - blimp_regular_plural_subject_verb_agreement_1 | 1 | none | 0 | acc | 0.6880 | ± 0.0147 |
| - blimp_regular_plural_subject_verb_agreement_2 | 1 | none | 0 | acc | 0.5920 | ± 0.0155 |
| - blimp_sentential_negation_npi_licensor_present | 1 | none | 0 | acc | 0.9990 | ± 0.0010 |
| - blimp_sentential_negation_npi_scope | 1 | none | 0 | acc | 0.5420 | ± 0.0158 |
| - blimp_sentential_subject_island | 1 | none | 0 | acc | 0.3570 | ± 0.0152 |
| - blimp_superlative_quantifiers_1 | 1 | none | 0 | acc | 0.4970 | ± 0.0158 |
| - blimp_superlative_quantifiers_2 | 1 | none | 0 | acc | 0.6980 | ± 0.0145 |
| - blimp_tough_vs_raising_1 | 1 | none | 0 | acc | 0.2810 | ± 0.0142 |
| - blimp_tough_vs_raising_2 | 1 | none | 0 | acc | 0.7660 | ± 0.0134 |
| - blimp_transitive | 1 | none | 0 | acc | 0.6110 | ± 0.0154 |
| - blimp_wh_island | 1 | none | 0 | acc | 0.2680 | ± 0.0140 |
| - blimp_wh_questions_object_gap | 1 | none | 0 | acc | 0.7850 | ± 0.0130 |
| - blimp_wh_questions_subject_gap | 1 | none | 0 | acc | 0.9600 | ± 0.0062 |
| - blimp_wh_questions_subject_gap_long_distance | 1 | none | 0 | acc | 0.9490 | ± 0.0070 |
| - blimp_wh_vs_that_no_gap | 1 | none | 0 | acc | 0.9830 | ± 0.0041 |
| - blimp_wh_vs_that_no_gap_long_distance | 1 | none | 0 | acc | 0.9770 | ± 0.0047 |
| - blimp_wh_vs_that_with_gap | 1 | none | 0 | acc | 0.0070 | ± 0.0026 |
| - blimp_wh_vs_that_with_gap_long_distance | 1 | none | 0 | acc | 0.0190 | ± 0.0043 |
| arc_easy | 1 | none | 0 | acc | 0.2639 | ± 0.0090 |
| | | none | 0 | acc_norm | 0.2731 | ± 0.0091 |
| wikitext | 2 | none | 0 | bits_per_byte | 4.6536 | ± N/A |
| | | none | 0 | byte_perplexity | 25.1691 | ± N/A |
| | | none | 0 | word_perplexity | 30979484.4095 | ± N/A |

| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---:|---|---:|---|---:|---|
| blimp | 2 | none | 0 | acc | 0.5177 | ± 0.0017 |
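
As a sanity check on the wikitext rows, the two byte-level metrics appear to be related by `byte_perplexity = 2 ** bits_per_byte`. This is an assumption about how the evaluation tool defines them, not something stated in this file, though the reported values (4.6536 and 25.1691) are consistent with it up to rounding. The function names below are hypothetical helpers written for illustration only.

```python
import math

# Values copied from the wikitext rows of the results table.
REPORTED_BITS_PER_BYTE = 4.6536
REPORTED_BYTE_PERPLEXITY = 25.1691


def byte_perplexity_from_bpb(bits_per_byte: float) -> float:
    """Assumed relation: perplexity per byte is 2 raised to bits per byte."""
    return 2.0 ** bits_per_byte


def bpb_from_byte_perplexity(byte_perplexity: float) -> float:
    """Inverse of the assumed relation: log base 2 of the byte perplexity."""
    return math.log2(byte_perplexity)


if __name__ == "__main__":
    derived_ppl = byte_perplexity_from_bpb(REPORTED_BITS_PER_BYTE)
    derived_bpb = bpb_from_byte_perplexity(REPORTED_BYTE_PERPLEXITY)
    # Small differences are expected because the table rounds to 4 decimals.
    print(f"derived byte_perplexity: {derived_ppl:.4f}")
    print(f"derived bits_per_byte:   {derived_bpb:.4f}")
```

The two derived numbers should agree with the table to within rounding error; a larger gap would suggest the metrics were computed on different text or with a different definition.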