Supra-Mini-v4-2M / benchmarks.md
| Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| arc_easy | 1 | none | 0 | acc | 0.2727 | ± | 0.0091 |
| | | none | 0 | acc_norm | 0.2816 | ± | 0.0092 |
| blimp | 2 | none | | acc | 0.5526 | ± | 0.0017 |
| - blimp_adjunct_island | 1 | none | 0 | acc | 0.7330 | ± | 0.0140 |
| - blimp_anaphor_gender_agreement | 1 | none | 0 | acc | 0.3820 | ± | 0.0154 |
| - blimp_anaphor_number_agreement | 1 | none | 0 | acc | 0.5030 | ± | 0.0158 |
| - blimp_animate_subject_passive | 1 | none | 0 | acc | 0.5520 | ± | 0.0157 |
| - blimp_animate_subject_trans | 1 | none | 0 | acc | 0.7250 | ± | 0.0141 |
| - blimp_causative | 1 | none | 0 | acc | 0.5010 | ± | 0.0158 |
| - blimp_complex_NP_island | 1 | none | 0 | acc | 0.5640 | ± | 0.0157 |
| - blimp_coordinate_structure_constraint_complex_left_branch | 1 | none | 0 | acc | 0.0840 | ± | 0.0088 |
| - blimp_coordinate_structure_constraint_object_extraction | 1 | none | 0 | acc | 0.4930 | ± | 0.0158 |
| - blimp_determiner_noun_agreement_1 | 1 | none | 0 | acc | 0.7000 | ± | 0.0145 |
| - blimp_determiner_noun_agreement_2 | 1 | none | 0 | acc | 0.7070 | ± | 0.0144 |
| - blimp_determiner_noun_agreement_irregular_1 | 1 | none | 0 | acc | 0.5500 | ± | 0.0157 |
| - blimp_determiner_noun_agreement_irregular_2 | 1 | none | 0 | acc | 0.7110 | ± | 0.0143 |
| - blimp_determiner_noun_agreement_with_adj_2 | 1 | none | 0 | acc | 0.6170 | ± | 0.0154 |
| - blimp_determiner_noun_agreement_with_adj_irregular_1 | 1 | none | 0 | acc | 0.5010 | ± | 0.0158 |
| - blimp_determiner_noun_agreement_with_adj_irregular_2 | 1 | none | 0 | acc | 0.6180 | ± | 0.0154 |
| - blimp_determiner_noun_agreement_with_adjective_1 | 1 | none | 0 | acc | 0.6380 | ± | 0.0152 |
| - blimp_distractor_agreement_relational_noun | 1 | none | 0 | acc | 0.3050 | ± | 0.0146 |
| - blimp_distractor_agreement_relative_clause | 1 | none | 0 | acc | 0.2710 | ± | 0.0141 |
| - blimp_drop_argument | 1 | none | 0 | acc | 0.6970 | ± | 0.0145 |
| - blimp_ellipsis_n_bar_1 | 1 | none | 0 | acc | 0.2640 | ± | 0.0139 |
| - blimp_ellipsis_n_bar_2 | 1 | none | 0 | acc | 0.4140 | ± | 0.0156 |
| - blimp_existential_there_object_raising | 1 | none | 0 | acc | 0.7440 | ± | 0.0138 |
| - blimp_existential_there_quantifiers_1 | 1 | none | 0 | acc | 0.9030 | ± | 0.0094 |
| - blimp_existential_there_quantifiers_2 | 1 | none | 0 | acc | 0.1200 | ± | 0.0103 |
| - blimp_existential_there_subject_raising | 1 | none | 0 | acc | 0.6530 | ± | 0.0151 |
| - blimp_expletive_it_object_raising | 1 | none | 0 | acc | 0.6850 | ± | 0.0147 |
| - blimp_inchoative | 1 | none | 0 | acc | 0.4090 | ± | 0.0156 |
| - blimp_intransitive | 1 | none | 0 | acc | 0.5600 | ± | 0.0157 |
| - blimp_irregular_past_participle_adjectives | 1 | none | 0 | acc | 0.7220 | ± | 0.0142 |
| - blimp_irregular_past_participle_verbs | 1 | none | 0 | acc | 0.6330 | ± | 0.0152 |
| - blimp_irregular_plural_subject_verb_agreement_1 | 1 | none | 0 | acc | 0.6140 | ± | 0.0154 |
| - blimp_irregular_plural_subject_verb_agreement_2 | 1 | none | 0 | acc | 0.7250 | ± | 0.0141 |
| - blimp_left_branch_island_echo_question | 1 | none | 0 | acc | 0.6450 | ± | 0.0151 |
| - blimp_left_branch_island_simple_question | 1 | none | 0 | acc | 0.1690 | ± | 0.0119 |
| - blimp_matrix_question_npi_licensor_present | 1 | none | 0 | acc | 0.0020 | ± | 0.0014 |
| - blimp_npi_present_1 | 1 | none | 0 | acc | 0.3860 | ± | 0.0154 |
| - blimp_npi_present_2 | 1 | none | 0 | acc | 0.3810 | ± | 0.0154 |
| - blimp_only_npi_licensor_present | 1 | none | 0 | acc | 0.6120 | ± | 0.0154 |
| - blimp_only_npi_scope | 1 | none | 0 | acc | 0.4280 | ± | 0.0157 |
| - blimp_passive_1 | 1 | none | 0 | acc | 0.6450 | ± | 0.0151 |
| - blimp_passive_2 | 1 | none | 0 | acc | 0.6410 | ± | 0.0152 |
| - blimp_principle_A_c_command | 1 | none | 0 | acc | 0.6910 | ± | 0.0146 |
| - blimp_principle_A_case_1 | 1 | none | 0 | acc | 1.0000 | ± | 0 |
| - blimp_principle_A_case_2 | 1 | none | 0 | acc | 0.5190 | ± | 0.0158 |
| - blimp_principle_A_domain_1 | 1 | none | 0 | acc | 0.9810 | ± | 0.0043 |
| - blimp_principle_A_domain_2 | 1 | none | 0 | acc | 0.5570 | ± | 0.0157 |
| - blimp_principle_A_domain_3 | 1 | none | 0 | acc | 0.4680 | ± | 0.0158 |
| - blimp_principle_A_reconstruction | 1 | none | 0 | acc | 0.2410 | ± | 0.0135 |
| - blimp_regular_plural_subject_verb_agreement_1 | 1 | none | 0 | acc | 0.7200 | ± | 0.0142 |
| - blimp_regular_plural_subject_verb_agreement_2 | 1 | none | 0 | acc | 0.6030 | ± | 0.0155 |
| - blimp_sentential_negation_npi_licensor_present | 1 | none | 0 | acc | 1.0000 | ± | 0 |
| - blimp_sentential_negation_npi_scope | 1 | none | 0 | acc | 0.4990 | ± | 0.0158 |
| - blimp_sentential_subject_island | 1 | none | 0 | acc | 0.3440 | ± | 0.0150 |
| - blimp_superlative_quantifiers_1 | 1 | none | 0 | acc | 0.5400 | ± | 0.0158 |
| - blimp_superlative_quantifiers_2 | 1 | none | 0 | acc | 0.1780 | ± | 0.0121 |
| - blimp_tough_vs_raising_1 | 1 | none | 0 | acc | 0.4330 | ± | 0.0157 |
| - blimp_tough_vs_raising_2 | 1 | none | 0 | acc | 0.5950 | ± | 0.0155 |
| - blimp_transitive | 1 | none | 0 | acc | 0.6260 | ± | 0.0153 |
| - blimp_wh_island | 1 | none | 0 | acc | 0.4180 | ± | 0.0156 |
| - blimp_wh_questions_object_gap | 1 | none | 0 | acc | 0.5430 | ± | 0.0158 |
| - blimp_wh_questions_subject_gap | 1 | none | 0 | acc | 0.9160 | ± | 0.0088 |
| - blimp_wh_questions_subject_gap_long_distance | 1 | none | 0 | acc | 0.9410 | ± | 0.0075 |
| - blimp_wh_vs_that_no_gap | 1 | none | 0 | acc | 0.9800 | ± | 0.0044 |
| - blimp_wh_vs_that_no_gap_long_distance | 1 | none | 0 | acc | 0.9820 | ± | 0.0042 |
| - blimp_wh_vs_that_with_gap | 1 | none | 0 | acc | 0.0280 | ± | 0.0052 |
| - blimp_wh_vs_that_with_gap_long_distance | 1 | none | 0 | acc | 0.0150 | ± | 0.0038 |
| wikitext | 2 | none | 0 | bits_per_byte | 2.1661 | ± | N/A |
| | | none | 0 | byte_perplexity | 4.4881 | ± | N/A |
| | | none | 0 | word_perplexity | 3068.2023 | ± | N/A |
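
The wikitext `bits_per_byte` and `byte_perplexity` values report the same quantity on two scales: byte perplexity is 2 raised to the bits per byte. The snippet below is a minimal sketch of that conversion, useful as a sanity check on the reported numbers. The function name `byte_perplexity_from_bits` is hypothetical and not part of any evaluation tool.

```python
def byte_perplexity_from_bits(bits_per_byte: float) -> float:
    """Convert bits per byte to byte-level perplexity (2 ** bits)."""
    return 2.0 ** bits_per_byte


# Values copied from the wikitext rows of the results table.
reported_bits_per_byte = 2.1661
reported_byte_perplexity = 4.4881

derived = byte_perplexity_from_bits(reported_bits_per_byte)
# The two reported metrics should agree up to rounding.
consistent = abs(derived - reported_byte_perplexity) < 1e-3
print(f"derived byte_perplexity = {derived:.4f}, consistent = {consistent}")
```

`word_perplexity` is normalized per word rather than per byte, so it cannot be derived from these two values without the corpus word and byte counts.
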

| Groups | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| blimp | 2 | none | | acc | 0.5526 | ± | 0.0017 |
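
The per-task `Stderr` values for `acc` are consistent with the sample standard error of a mean over binary outcomes, sqrt(p(1 - p) / (n - 1)). The sketch below reproduces two of them under the assumption that each BLiMP subtask contains n = 1000 items. That item count and the helper name `accuracy_stderr` are assumptions for illustration, not values stated in this file.

```python
import math


def accuracy_stderr(acc: float, n: int) -> float:
    """Sample standard error of an accuracy computed over n binary outcomes."""
    return math.sqrt(acc * (1.0 - acc) / (n - 1))


# Assumption: 1000 evaluation items per BLiMP subtask.
N_ITEMS = 1000

# (task, reported acc, reported stderr) copied from the results table.
rows = [
    ("blimp_adjunct_island", 0.7330, 0.0140),
    ("blimp_matrix_question_npi_licensor_present", 0.0020, 0.0014),
    ("blimp_principle_A_case_1", 1.0000, 0.0),
]

for task, acc, reported in rows:
    derived = accuracy_stderr(acc, N_ITEMS)
    print(f"{task}: derived={derived:.4f} reported={reported:.4f}")
```

An accuracy of exactly 1.0 gives a standard error of 0 under this formula, which matches the `± 0` entries in the table.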