Supra-Mini-v5-8M benchmark results (uploaded by LH-Tech-AI)

| Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|--:|---|--:|---|--:|---|--:|
| arc_easy | 1 | none | 0 | acc | 0.3439 | ± | 0.0097 |
| | | none | 0 | acc_norm | 0.3346 | ± | 0.0097 |
| blimp | 2 | none | | acc | 0.6349 | ± | 0.0016 |
| - blimp_adjunct_island | 1 | none | 0 | acc | 0.7990 | ± | 0.0127 |
| - blimp_anaphor_gender_agreement | 1 | none | 0 | acc | 0.3290 | ± | 0.0149 |
| - blimp_anaphor_number_agreement | 1 | none | 0 | acc | 0.6330 | ± | 0.0152 |
| - blimp_animate_subject_passive | 1 | none | 0 | acc | 0.6550 | ± | 0.0150 |
| - blimp_animate_subject_trans | 1 | none | 0 | acc | 0.8180 | ± | 0.0122 |
| - blimp_causative | 1 | none | 0 | acc | 0.4900 | ± | 0.0158 |
| - blimp_complex_NP_island | 1 | none | 0 | acc | 0.5300 | ± | 0.0158 |
| - blimp_coordinate_structure_constraint_complex_left_branch | 1 | none | 0 | acc | 0.2990 | ± | 0.0145 |
| - blimp_coordinate_structure_constraint_object_extraction | 1 | none | 0 | acc | 0.7550 | ± | 0.0136 |
| - blimp_determiner_noun_agreement_1 | 1 | none | 0 | acc | 0.7910 | ± | 0.0129 |
| - blimp_determiner_noun_agreement_2 | 1 | none | 0 | acc | 0.8640 | ± | 0.0108 |
| - blimp_determiner_noun_agreement_irregular_1 | 1 | none | 0 | acc | 0.7020 | ± | 0.0145 |
| - blimp_determiner_noun_agreement_irregular_2 | 1 | none | 0 | acc | 0.8460 | ± | 0.0114 |
| - blimp_determiner_noun_agreement_with_adj_2 | 1 | none | 0 | acc | 0.7370 | ± | 0.0139 |
| - blimp_determiner_noun_agreement_with_adj_irregular_1 | 1 | none | 0 | acc | 0.5780 | ± | 0.0156 |
| - blimp_determiner_noun_agreement_with_adj_irregular_2 | 1 | none | 0 | acc | 0.7300 | ± | 0.0140 |
| - blimp_determiner_noun_agreement_with_adjective_1 | 1 | none | 0 | acc | 0.7060 | ± | 0.0144 |
| - blimp_distractor_agreement_relational_noun | 1 | none | 0 | acc | 0.2630 | ± | 0.0139 |
| - blimp_distractor_agreement_relative_clause | 1 | none | 0 | acc | 0.2060 | ± | 0.0128 |
| - blimp_drop_argument | 1 | none | 0 | acc | 0.7110 | ± | 0.0143 |
| - blimp_ellipsis_n_bar_1 | 1 | none | 0 | acc | 0.5800 | ± | 0.0156 |
| - blimp_ellipsis_n_bar_2 | 1 | none | 0 | acc | 0.7490 | ± | 0.0137 |
| - blimp_existential_there_object_raising | 1 | none | 0 | acc | 0.7470 | ± | 0.0138 |
| - blimp_existential_there_quantifiers_1 | 1 | none | 0 | acc | 0.8450 | ± | 0.0115 |
| - blimp_existential_there_quantifiers_2 | 1 | none | 0 | acc | 0.2720 | ± | 0.0141 |
| - blimp_existential_there_subject_raising | 1 | none | 0 | acc | 0.6560 | ± | 0.0150 |
| - blimp_expletive_it_object_raising | 1 | none | 0 | acc | 0.6820 | ± | 0.0147 |
| - blimp_inchoative | 1 | none | 0 | acc | 0.4210 | ± | 0.0156 |
| - blimp_intransitive | 1 | none | 0 | acc | 0.5750 | ± | 0.0156 |
| - blimp_irregular_past_participle_adjectives | 1 | none | 0 | acc | 0.9240 | ± | 0.0084 |
| - blimp_irregular_past_participle_verbs | 1 | none | 0 | acc | 0.6800 | ± | 0.0148 |
| - blimp_irregular_plural_subject_verb_agreement_1 | 1 | none | 0 | acc | 0.7100 | ± | 0.0144 |
| - blimp_irregular_plural_subject_verb_agreement_2 | 1 | none | 0 | acc | 0.8520 | ± | 0.0112 |
| - blimp_left_branch_island_echo_question | 1 | none | 0 | acc | 0.8390 | ± | 0.0116 |
| - blimp_left_branch_island_simple_question | 1 | none | 0 | acc | 0.3810 | ± | 0.0154 |
| - blimp_matrix_question_npi_licensor_present | 1 | none | 0 | acc | 0.0060 | ± | 0.0024 |
| - blimp_npi_present_1 | 1 | none | 0 | acc | 0.5420 | ± | 0.0158 |
| - blimp_npi_present_2 | 1 | none | 0 | acc | 0.5250 | ± | 0.0158 |
| - blimp_only_npi_licensor_present | 1 | none | 0 | acc | 0.3710 | ± | 0.0153 |
| - blimp_only_npi_scope | 1 | none | 0 | acc | 0.4090 | ± | 0.0156 |
| - blimp_passive_1 | 1 | none | 0 | acc | 0.7980 | ± | 0.0127 |
| - blimp_passive_2 | 1 | none | 0 | acc | 0.7770 | ± | 0.0132 |
| - blimp_principle_A_c_command | 1 | none | 0 | acc | 0.6410 | ± | 0.0152 |
| - blimp_principle_A_case_1 | 1 | none | 0 | acc | 1.0000 | ± | 0 |
| - blimp_principle_A_case_2 | 1 | none | 0 | acc | 0.7200 | ± | 0.0142 |
| - blimp_principle_A_domain_1 | 1 | none | 0 | acc | 0.7350 | ± | 0.0140 |
| - blimp_principle_A_domain_2 | 1 | none | 0 | acc | 0.6190 | ± | 0.0154 |
| - blimp_principle_A_domain_3 | 1 | none | 0 | acc | 0.5460 | ± | 0.0158 |
| - blimp_principle_A_reconstruction | 1 | none | 0 | acc | 0.4780 | ± | 0.0158 |
| - blimp_regular_plural_subject_verb_agreement_1 | 1 | none | 0 | acc | 0.7920 | ± | 0.0128 |
| - blimp_regular_plural_subject_verb_agreement_2 | 1 | none | 0 | acc | 0.7970 | ± | 0.0127 |
| - blimp_sentential_negation_npi_licensor_present | 1 | none | 0 | acc | 1.0000 | ± | 0 |
| - blimp_sentential_negation_npi_scope | 1 | none | 0 | acc | 0.6700 | ± | 0.0149 |
| - blimp_sentential_subject_island | 1 | none | 0 | acc | 0.4350 | ± | 0.0157 |
| - blimp_superlative_quantifiers_1 | 1 | none | 0 | acc | 0.9270 | ± | 0.0082 |
| - blimp_superlative_quantifiers_2 | 1 | none | 0 | acc | 0.5460 | ± | 0.0158 |
| - blimp_tough_vs_raising_1 | 1 | none | 0 | acc | 0.4410 | ± | 0.0157 |
| - blimp_tough_vs_raising_2 | 1 | none | 0 | acc | 0.6850 | ± | 0.0147 |
| - blimp_transitive | 1 | none | 0 | acc | 0.7490 | ± | 0.0137 |
| - blimp_wh_island | 1 | none | 0 | acc | 0.4360 | ± | 0.0157 |
| - blimp_wh_questions_object_gap | 1 | none | 0 | acc | 0.5880 | ± | 0.0156 |
| - blimp_wh_questions_subject_gap | 1 | none | 0 | acc | 0.8800 | ± | 0.0103 |
| - blimp_wh_questions_subject_gap_long_distance | 1 | none | 0 | acc | 0.9460 | ± | 0.0072 |
| - blimp_wh_vs_that_no_gap | 1 | none | 0 | acc | 0.9810 | ± | 0.0043 |
| - blimp_wh_vs_that_no_gap_long_distance | 1 | none | 0 | acc | 0.9880 | ± | 0.0034 |
| - blimp_wh_vs_that_with_gap | 1 | none | 0 | acc | 0.1180 | ± | 0.0102 |
| - blimp_wh_vs_that_with_gap_long_distance | 1 | none | 0 | acc | 0.0340 | ± | 0.0057 |
| wikitext | 2 | none | 0 | bits_per_byte | 1.4123 | ± | N/A |
| | | none | 0 | byte_perplexity | 2.6617 | ± | N/A |
| | | none | 0 | word_perplexity | 187.7215 | ± | N/A |

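The three wikitext rows are not independent. Assuming the evaluation defines bits_per_byte as the negative total log-likelihood divided by the byte count and by ln 2, and defines the two perplexities as exp of that same total divided by the byte count and by the word count respectively, byte_perplexity should equal 2 ** bits_per_byte, and the ratio of the two log perplexities gives the average number of bytes per word in the evaluation text. The short Python sketch below checks this against the table values; `byte_perplexity_from_bpb` and `implied_bytes_per_word` are illustrative names, not part of any library.

```python
import math

# Values copied from the wikitext rows of the results table.
BITS_PER_BYTE = 1.4123
BYTE_PERPLEXITY = 2.6617
WORD_PERPLEXITY = 187.7215


def byte_perplexity_from_bpb(bpb: float) -> float:
    """Assumption: bits_per_byte = -loglik / (bytes * ln 2), i.e. log2 of byte
    perplexity, so byte perplexity is recovered as a power of 2."""
    return 2.0 ** bpb


def implied_bytes_per_word(word_ppl: float, byte_ppl: float) -> float:
    """Assumption: both perplexities exponentiate the same total negative
    log-likelihood L, normalized by words (W) and bytes (B) respectively.
    ln(word_ppl) / ln(byte_ppl) = (L / W) / (L / B) = B / W."""
    return math.log(word_ppl) / math.log(byte_ppl)


if __name__ == "__main__":
    recomputed = byte_perplexity_from_bpb(BITS_PER_BYTE)
    print(f"byte_perplexity from bits_per_byte: {recomputed:.4f} (table: {BYTE_PERPLEXITY})")
    ratio = implied_bytes_per_word(WORD_PERPLEXITY, BYTE_PERPLEXITY)
    print(f"implied bytes per word: {ratio:.2f}")
```

The recomputed byte perplexity agrees with the table to within the rounding of the four-decimal bits_per_byte value, and the implied ratio comes out at roughly 5.35 bytes per word.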

| Groups | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|--:|---|--:|---|--:|---|--:|
| blimp | 2 | none | | acc | 0.6349 | ± | 0.0016 |
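The blimp group row can be reproduced from the 67 subtask rows. This Python sketch assumes that each BLiMP paradigm contributes 1,000 minimal pairs, that each per-row Stderr is the sample standard error of a binary accuracy, sqrt(p(1 - p) / (n - 1)), and that the group Stderr pools the within-subtask variances (assumed to mirror how lm-evaluation-harness, whose output layout these tables match, aggregates group stderrs). `acc_stderr` and `pooled_group` are hypothetical helper names, and the ARC-Easy check additionally assumes the 2,376-question ARC-Easy test split.

```python
import math

# BLiMP subtask accuracies in table order
# (blimp_adjunct_island ... blimp_wh_vs_that_with_gap_long_distance).
BLIMP_ACC = [
    0.7990, 0.3290, 0.6330, 0.6550, 0.8180, 0.4900, 0.5300, 0.2990, 0.7550, 0.7910,
    0.8640, 0.7020, 0.8460, 0.7370, 0.5780, 0.7300, 0.7060, 0.2630, 0.2060, 0.7110,
    0.5800, 0.7490, 0.7470, 0.8450, 0.2720, 0.6560, 0.6820, 0.4210, 0.5750, 0.9240,
    0.6800, 0.7100, 0.8520, 0.8390, 0.3810, 0.0060, 0.5420, 0.5250, 0.3710, 0.4090,
    0.7980, 0.7770, 0.6410, 1.0000, 0.7200, 0.7350, 0.6190, 0.5460, 0.4780, 0.7920,
    0.7970, 1.0000, 0.6700, 0.4350, 0.9270, 0.5460, 0.4410, 0.6850, 0.7490, 0.4360,
    0.5880, 0.8800, 0.9460, 0.9810, 0.9880, 0.1180, 0.0340,
]
# Assumption: every BLiMP paradigm has 1,000 minimal pairs.
N_PER_TASK = 1000


def acc_stderr(p: float, n: int) -> float:
    """Standard error of the mean of n binary outcomes, using the (n - 1) sample variance."""
    return math.sqrt(p * (1.0 - p) / (n - 1))


def pooled_group(accs, n):
    """Size-weighted mean of subtask accuracies plus a pooled standard error.

    Assumption: pooled variance = sum((n_i - 1) * se_i**2 * n_i) / (N - k),
    and the group stderr is sqrt(pooled variance / N), with N total items
    and k subtasks.
    """
    sizes = [n] * len(accs)
    total = sum(sizes)
    mean = sum(a * s for a, s in zip(accs, sizes)) / total
    pooled_var = sum(
        (s - 1) * acc_stderr(a, s) ** 2 * s for a, s in zip(accs, sizes)
    ) / (total - len(accs))
    return mean, math.sqrt(pooled_var / total)


if __name__ == "__main__":
    print(f"blimp_adjunct_island stderr: {acc_stderr(0.7990, N_PER_TASK):.4f}")
    print(f"arc_easy acc stderr (assumes n=2376): {acc_stderr(0.3439, 2376):.4f}")
    mean, se = pooled_group(BLIMP_ACC, N_PER_TASK)
    print(f"blimp group acc: {mean:.4f} +/- {se:.4f}")
```

Because every subtask has the same size, the size-weighted mean equals the plain average of the 67 accuracies. Pooling only the within-subtask variance ignores the spread between paradigms; treating all 67,000 pairs as a single binomial sample would instead give about 0.0019.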