|                           Tasks                            |Version|Filter|n-shot|    Metric     |   |  Value   |   |Stderr|
|------------------------------------------------------------|------:|------|-----:|---------------|---|---------:|---|------|
|blimp                                                       |      2|none  |     0|acc            |↑  |    0.5354|±  |0.0017|
| - blimp_adjunct_island                                     |      1|none  |     0|acc            |↑  |    0.5980|±  |0.0155|
| - blimp_anaphor_gender_agreement                           |      1|none  |     0|acc            |↑  |    0.3130|±  |0.0147|
| - blimp_anaphor_number_agreement                           |      1|none  |     0|acc            |↑  |    0.5090|±  |0.0158|
| - blimp_animate_subject_passive                            |      1|none  |     0|acc            |↑  |    0.5750|±  |0.0156|
| - blimp_animate_subject_trans                              |      1|none  |     0|acc            |↑  |    0.7470|±  |0.0138|
| - blimp_causative                                          |      1|none  |     0|acc            |↑  |    0.4810|±  |0.0158|
| - blimp_complex_NP_island                                  |      1|none  |     0|acc            |↑  |    0.4880|±  |0.0158|
| - blimp_coordinate_structure_constraint_complex_left_branch|      1|none  |     0|acc            |↑  |    0.1420|±  |0.0110|
| - blimp_coordinate_structure_constraint_object_extraction  |      1|none  |     0|acc            |↑  |    0.5820|±  |0.0156|
| - blimp_determiner_noun_agreement_1                        |      1|none  |     0|acc            |↑  |    0.6580|±  |0.0150|
| - blimp_determiner_noun_agreement_2                        |      1|none  |     0|acc            |↑  |    0.6320|±  |0.0153|
| - blimp_determiner_noun_agreement_irregular_1              |      1|none  |     0|acc            |↑  |    0.5150|±  |0.0158|
| - blimp_determiner_noun_agreement_irregular_2              |      1|none  |     0|acc            |↑  |    0.6980|±  |0.0145|
| - blimp_determiner_noun_agreement_with_adj_2               |      1|none  |     0|acc            |↑  |    0.5780|±  |0.0156|
| - blimp_determiner_noun_agreement_with_adj_irregular_1     |      1|none  |     0|acc            |↑  |    0.4220|±  |0.0156|
| - blimp_determiner_noun_agreement_with_adj_irregular_2     |      1|none  |     0|acc            |↑  |    0.5170|±  |0.0158|
| - blimp_determiner_noun_agreement_with_adjective_1         |      1|none  |     0|acc            |↑  |    0.6070|±  |0.0155|
| - blimp_distractor_agreement_relational_noun               |      1|none  |     0|acc            |↑  |    0.3120|±  |0.0147|
| - blimp_distractor_agreement_relative_clause               |      1|none  |     0|acc            |↑  |    0.3110|±  |0.0146|
| - blimp_drop_argument                                      |      1|none  |     0|acc            |↑  |    0.7270|±  |0.0141|
| - blimp_ellipsis_n_bar_1                                   |      1|none  |     0|acc            |↑  |    0.2180|±  |0.0131|
| - blimp_ellipsis_n_bar_2                                   |      1|none  |     0|acc            |↑  |    0.3480|±  |0.0151|
| - blimp_existential_there_object_raising                   |      1|none  |     0|acc            |↑  |    0.6860|±  |0.0147|
| - blimp_existential_there_quantifiers_1                    |      1|none  |     0|acc            |↑  |    0.8100|±  |0.0124|
| - blimp_existential_there_quantifiers_2                    |      1|none  |     0|acc            |↑  |    0.2950|±  |0.0144|
| - blimp_existential_there_subject_raising                  |      1|none  |     0|acc            |↑  |    0.6880|±  |0.0147|
| - blimp_expletive_it_object_raising                        |      1|none  |     0|acc            |↑  |    0.6570|±  |0.0150|
| - blimp_inchoative                                         |      1|none  |     0|acc            |↑  |    0.3850|±  |0.0154|
| - blimp_intransitive                                       |      1|none  |     0|acc            |↑  |    0.5170|±  |0.0158|
| - blimp_irregular_past_participle_adjectives               |      1|none  |     0|acc            |↑  |    0.6620|±  |0.0150|
| - blimp_irregular_past_participle_verbs                    |      1|none  |     0|acc            |↑  |    0.5050|±  |0.0158|
| - blimp_irregular_plural_subject_verb_agreement_1          |      1|none  |     0|acc            |↑  |    0.5880|±  |0.0156|
| - blimp_irregular_plural_subject_verb_agreement_2          |      1|none  |     0|acc            |↑  |    0.5860|±  |0.0156|
| - blimp_left_branch_island_echo_question                   |      1|none  |     0|acc            |↑  |    0.9020|±  |0.0094|
| - blimp_left_branch_island_simple_question                 |      1|none  |     0|acc            |↑  |    0.2310|±  |0.0133|
| - blimp_matrix_question_npi_licensor_present               |      1|none  |     0|acc            |↑  |    0.0380|±  |0.0060|
| - blimp_npi_present_1                                      |      1|none  |     0|acc            |↑  |    0.6520|±  |0.0151|
| - blimp_npi_present_2                                      |      1|none  |     0|acc            |↑  |    0.6390|±  |0.0152|
| - blimp_only_npi_licensor_present                          |      1|none  |     0|acc            |↑  |    0.0400|±  |0.0062|
| - blimp_only_npi_scope                                     |      1|none  |     0|acc            |↑  |    0.0020|±  |0.0014|
| - blimp_passive_1                                          |      1|none  |     0|acc            |↑  |    0.6520|±  |0.0151|
| - blimp_passive_2                                          |      1|none  |     0|acc            |↑  |    0.6280|±  |0.0153|
| - blimp_principle_A_c_command                              |      1|none  |     0|acc            |↑  |    0.6890|±  |0.0146|
| - blimp_principle_A_case_1                                 |      1|none  |     0|acc            |↑  |    0.9990|±  |0.0010|
| - blimp_principle_A_case_2                                 |      1|none  |     0|acc            |↑  |    0.4450|±  |0.0157|
| - blimp_principle_A_domain_1                               |      1|none  |     0|acc            |↑  |    0.8820|±  |0.0102|
| - blimp_principle_A_domain_2                               |      1|none  |     0|acc            |↑  |    0.5450|±  |0.0158|
| - blimp_principle_A_domain_3                               |      1|none  |     0|acc            |↑  |    0.4690|±  |0.0158|
| - blimp_principle_A_reconstruction                         |      1|none  |     0|acc            |↑  |    0.3830|±  |0.0154|
| - blimp_regular_plural_subject_verb_agreement_1            |      1|none  |     0|acc            |↑  |    0.6890|±  |0.0146|
| - blimp_regular_plural_subject_verb_agreement_2            |      1|none  |     0|acc            |↑  |    0.5760|±  |0.0156|
| - blimp_sentential_negation_npi_licensor_present           |      1|none  |     0|acc            |↑  |    0.9990|±  |0.0010|
| - blimp_sentential_negation_npi_scope                      |      1|none  |     0|acc            |↑  |    0.4590|±  |0.0158|
| - blimp_sentential_subject_island                          |      1|none  |     0|acc            |↑  |    0.2760|±  |0.0141|
| - blimp_superlative_quantifiers_1                          |      1|none  |     0|acc            |↑  |    0.3040|±  |0.0146|
| - blimp_superlative_quantifiers_2                          |      1|none  |     0|acc            |↑  |    0.3620|±  |0.0152|
| - blimp_tough_vs_raising_1                                 |      1|none  |     0|acc            |↑  |    0.3310|±  |0.0149|
| - blimp_tough_vs_raising_2                                 |      1|none  |     0|acc            |↑  |    0.6970|±  |0.0145|
| - blimp_transitive                                         |      1|none  |     0|acc            |↑  |    0.6560|±  |0.0150|
| - blimp_wh_island                                          |      1|none  |     0|acc            |↑  |    0.5110|±  |0.0158|
| - blimp_wh_questions_object_gap                            |      1|none  |     0|acc            |↑  |    0.6180|±  |0.0154|
| - blimp_wh_questions_subject_gap                           |      1|none  |     0|acc            |↑  |    0.9480|±  |0.0070|
| - blimp_wh_questions_subject_gap_long_distance             |      1|none  |     0|acc            |↑  |    0.8930|±  |0.0098|
| - blimp_wh_vs_that_no_gap                                  |      1|none  |     0|acc            |↑  |    0.9960|±  |0.0020|
| - blimp_wh_vs_that_no_gap_long_distance                    |      1|none  |     0|acc            |↑  |    0.9910|±  |0.0030|
| - blimp_wh_vs_that_with_gap                                |      1|none  |     0|acc            |↑  |    0.0090|±  |0.0030|
| - blimp_wh_vs_that_with_gap_long_distance                  |      1|none  |     0|acc            |↑  |    0.0040|±  |0.0020|
|arc_easy                                                    |      1|none  |     0|acc            |↑  |    0.2677|±  |0.0091|
|                                                            |       |none  |     0|acc_norm       |↑  |    0.2841|±  |0.0093|
|wikitext                                                    |      2|none  |     0|bits_per_byte  |↓  |    2.9624|±  |   N/A|
|                                                            |       |none  |     0|byte_perplexity|↓  |    7.7940|±  |   N/A|
|                                                            |       |none  |     0|word_perplexity|↓  |58699.2441|±  |   N/A|

|Groups|Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|------|------:|------|-----:|------|---|-----:|---|-----:|
|blimp |      2|none  |     0|acc   |↑  |0.5354|±  |0.0017|
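
As a sanity check on the tables above, the sketch below recomputes two derived columns from the reported values. It rests on two assumptions that the tables do not state: that each BLiMP subtask contains 1,000 items, and that `Stderr` is the sample standard error of the mean of 0/1 outcomes. The helper names `binomial_stderr` and `bits_per_byte` are hypothetical and are not part of any evaluation library.

```python
import math


def binomial_stderr(acc: float, n: int) -> float:
    """Sample standard error of the mean of n 0/1 outcomes.

    Uses n - 1 in the variance (sample standard deviation), which is an
    assumption about how the Stderr column was produced.
    """
    return math.sqrt(acc * (1.0 - acc) / (n - 1))


def bits_per_byte(byte_perplexity: float) -> float:
    """Convert byte-level perplexity to bits per byte (log base 2)."""
    return math.log2(byte_perplexity)


# Assumption: 1,000 items per BLiMP subtask (not stated in the table).
N_BLIMP = 1000

# (accuracy, reported stderr) pairs copied from the table above.
reported = {
    "blimp_adjunct_island": (0.5980, 0.0155),
    "blimp_matrix_question_npi_licensor_present": (0.0380, 0.0060),
    "blimp_only_npi_scope": (0.0020, 0.0014),
    "blimp_principle_A_case_1": (0.9990, 0.0010),
}

for task, (acc, stderr) in reported.items():
    recomputed = binomial_stderr(acc, N_BLIMP)
    print(f"{task}: reported={stderr:.4f} recomputed={recomputed:.4f}")

# wikitext: the table reports byte_perplexity 7.7940 and bits_per_byte 2.9624.
print(f"bits_per_byte from byte_perplexity: {bits_per_byte(7.7940):.4f}")
```

If the recomputed values match the reported ones to four decimal places, the two assumptions are at least consistent with the table. A mismatch would suggest a different sample size or a different estimator.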