kind | task | eval_mode | base_acc_scan | ablt_acc_scan | flips_scan | Patched@0 (rescue%, Δm) | Patched@full (rescue%, Δm) | Patched(self) (rescue%, Δm) | Patched(transfer) (rescue%, Δm) | Cross-example donor (rescue%, Δm) | Donor mismatch (rescue%, Δm) | Shared coeff permute (rescue%, Δm) | Shared coeff signflip (rescue%, Δm) | Rand vec in shared (rescue%, Δm) | Rand subspace (rescue%, Δm) | Nonshared patch (rescue%, Δm)
flipset aqua 0.209 0.220 42
flipset aqua 0.209 0.181 43
flipset aqua 0.209 0.220 42 73.8%, 3.311 78.6%, 3.445
flipset aqua 0.209 0.220 42 73.8%, 3.311 78.6%, 3.387
flipset aqua 0.209 0.181 43 79.1%, 3.568 81.4%, 3.640
flipset aqua 0.209 0.220 42 73.8%, 3.311 76.2%, 3.293
flipset aqua 0.209 0.220 42 73.8%, 3.312 78.6%, 3.412
openanswer gsm8k gen_math 0.039 0.031 9 88.9%, - 77.8%, - 0.0%, - 0.0%, - 0.0%, -
openanswer gsm8k pair_logprob 0.625 0.594 31 35.5%, 1.551 35.5%, 1.537 3.2%, -0.344 0.0%, -0.264 0.0%, 0.000
openanswer humaneval gen_code_compile 0 21.1%, - 21.1%, - 26.3%, - 18.4%, - 5.3%, -
openanswer humaneval pair_logprob 0.659 0.640 8 75.0%, 2.106 62.5%, 1.612 0.0%, -0.528 12.5%, -0.117 12.5%, 0.049
subspace_mc aqua 0.209 0.220 42 73.8%, 3.311 100.0%, 3.695 76.2%, 3.299 4.8%, 0.384 4.8%, 0.285 0.0%, -0.000
subspace_mc arc_challenge 0.514 0.416 58 89.7%, 3.551 100.0%, 3.987 89.7%, 3.554 10.3%, -0.015 5.2%, -0.236 0.0%, 0.000
subspace_mc commonsenseqa 0.609 0.387 75 90.7%, 2.992 100.0%, 3.337 89.3%, 2.997 13.3%, -0.034 8.0%, -0.176 0.0%, -0.000
subspace_mc logiqa 0.309 0.281 62 80.6%, 4.258 100.0%, 4.290 82.3%, 4.253 3.2%, 0.527 3.2%, 0.194 0.0%, 0.000
subspace_mc openbookqa 0.531 0.387 66 86.4%, 3.095 100.0%, 3.479 86.4%, 3.103 10.6%, -0.232 4.5%, -0.488 0.0%, 0.000
subspace_mc piqa 0.676 0.535 87 83.9%, 3.007 100.0%, 3.462 82.8%, 3.007 1.1%, 0.173 2.3%, -0.072 0.0%, 0.000
subspace_mc qasc 0.469 0.293 59 91.5%, 3.273 100.0%, 3.725 89.8%, 3.268 10.2%, -0.072 10.2%, -0.319 0.0%, 0.000