| Benchmark: A: openmed-base vs B: haremb |
| Dataset: nvidia/Nemotron-PII, split=test, eval_pct=1.0, ctx=1024, seed=42, n_docs=1000 |
| Eval tokens scored: 212,909 |
|
|
| === Aggregate (spans = gold/predicted/correct) === |
| A: openmed-base RAW span_F1=0.9174 P=0.9125 R=0.9223 token_acc=0.9895 non_o_recall=0.9685 spans=8627/8720/7957 |
| A: openmed-base VITERBI span_F1=0.9434 P=0.9531 R=0.9338 token_acc=0.9900 non_o_recall=0.9703 spans=8627/8452/8056 |
| B: haremb RAW span_F1=0.7741 P=0.7186 R=0.8388 token_acc=0.9831 non_o_recall=0.9467 spans=8627/10069/7236 |
| B: haremb VITERBI span_F1=0.9288 P=0.9396 R=0.9182 token_acc=0.9885 non_o_recall=0.9637 spans=8627/8430/7921 |
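
The P, R and span_F1 values in the rows above follow directly from the `spans=` triples when these are read as gold/predicted/correct span counts. A minimal sketch of that arithmetic; the function name `span_prf` is hypothetical and not taken from the benchmark script:

```python
def span_prf(gold: int, pred: int, correct: int) -> tuple[float, float, float]:
    """Precision, recall and F1 from span counts (gold / predicted / correct)."""
    precision = correct / pred if pred else 0.0
    recall = correct / gold if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Counts copied from the "A: openmed-base VITERBI" row.
p, r, f1 = span_prf(gold=8627, pred=8452, correct=8056)
print(f"P={p:.4f} R={r:.4f} span_F1={f1:.4f}")
# P=0.9531 R=0.9338 span_F1=0.9434
```

The same function reproduces the other three aggregate rows and the per-category rows further down.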
|
|
| Gap B vs A (viterbi span_F1): -0.0146 |
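
The report does not say how the VITERBI rows are decoded. One common setup, sketched here purely as an assumption, is Viterbi decoding over per-token label log-probabilities with hard BIO constraints (an `I-X` tag may only follow `B-X` or `I-X`). That would be consistent with the drop in predicted spans for B (10069 raw vs 8430 viterbi), but the tagging scheme, the function names and the toy probabilities below are all hypothetical:

```python
import math


def allowed(prev: str, cur: str) -> bool:
    """BIO constraint: I-X may only follow B-X or I-X."""
    if cur.startswith("I-"):
        return prev in ("B-" + cur[2:], "I-" + cur[2:])
    return True


def viterbi_bio(log_probs: list[list[float]], labels: list[str]) -> list[str]:
    """Highest-scoring label sequence under hard BIO transition constraints."""
    if not log_probs:
        return []
    n = len(labels)
    # A sequence may not start with an I- tag.
    score = [
        lp if not labels[j].startswith("I-") else -math.inf
        for j, lp in enumerate(log_probs[0])
    ]
    back: list[list[int]] = []
    for row in log_probs[1:]:
        new_score, ptr = [], []
        for j in range(n):
            # Best allowed predecessor for label j.
            best_i = max(
                range(n),
                key=lambda i: score[i] if allowed(labels[i], labels[j]) else -math.inf,
            )
            ok = allowed(labels[best_i], labels[j])
            new_score.append(score[best_i] + row[j] if ok else -math.inf)
            ptr.append(best_i)
        score = new_score
        back.append(ptr)
    # Backtrack from the best final label.
    j = max(range(n), key=lambda i: score[i])
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    return [labels[k] for k in reversed(path)]


labels = ["O", "B-name", "I-name"]
log_probs = [
    [math.log(0.6), math.log(0.3), math.log(0.1)],
    [math.log(0.2), math.log(0.1), math.log(0.7)],
]
# Per-token argmax would give the invalid sequence ["O", "I-name"].
print(viterbi_bio(log_probs, labels))  # ['B-name', 'I-name']
```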
|
|
| Throughput: A: openmed-base 3293 tok/s (1399.61M params) |
| B: haremb 6343 tok/s (287.11M params) |
|
|
| === Performance === |
| metric                          A: openmed-base B: haremb B vs A        |
| ------------------------------- --------------- --------- ------------- |
| total params (M)                        1399.61    287.11 4.87× smaller |
| dense params (M)                         139.35    129.58 1.08× smaller |
| MoE expert params (M)                   1260.26    157.53 8.00× smaller |
| active params/token (M, mem)             178.73    134.50 1.33× less    |
| compute params/token (M, FLOPs)           50.69      6.46 7.85× cheaper |
| GFLOP / token                            0.1014    0.0129 7.85× cheaper |
| disk size (MiB)                                     547.6               |
| weights in RAM (MiB)                     2669.5     547.6 4.87× smaller |
| peak GPU mem eval (MiB)                  3376.2    1248.6 2.70× less    |
| throughput (tok/s)                         3293      6343 1.93× faster  |
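
The "B vs A" column is a plain ratio of the two values in each row. Two further relationships are not stated in the report but match its numbers, so treat them as assumptions: GFLOP/token equals 2 FLOPs per compute parameter, and weights in RAM equal 2 bytes per parameter. A small sanity-check sketch with hypothetical helper names:

```python
MIB = 1024 ** 2


def gflop_per_token(compute_params_m: float) -> float:
    """Assumption: 2 FLOPs (multiply + add) per compute parameter per token."""
    return 2 * compute_params_m * 1e6 / 1e9


def weights_mib(total_params_m: float, bytes_per_param: int = 2) -> float:
    """Assumption: weights held at 2 bytes per parameter (e.g. fp16/bf16)."""
    return total_params_m * 1e6 * bytes_per_param / MIB


# Values copied from the Performance table (M = millions of parameters).
total_params_m = {"A": 1399.61, "B": 287.11}
compute_params_m = {"A": 50.69, "B": 6.46}
throughput = {"A": 3293, "B": 6343}

print(f"total params: {total_params_m['A'] / total_params_m['B']:.2f}x smaller")
print(f"throughput:   {throughput['B'] / throughput['A']:.2f}x faster")
for name in ("A", "B"):
    print(
        f"{name}: {gflop_per_token(compute_params_m[name]):.4f} GFLOP/token, "
        f"{weights_mib(total_params_m[name]):.1f} MiB weights"
    )
```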
|
|
| === Pairwise (viterbi, all gold tokens) — A: openmed-base vs B: haremb === |
| both_correct 209830 (98.55%) |
| only_A_correct 958 (0.45%) |
| only_B_correct 633 (0.30%) |
| both_wrong 1488 (0.70%) |
|
|
| === Pairwise (viterbi, gold non-O tokens) — A: openmed-base vs B: haremb === |
| both_correct 43902 (95.47%) |
| only_A_correct 717 (1.56%) |
| only_B_correct 417 (0.91%) |
| both_wrong 951 (2.07%) |
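
The two pairwise tables bucket every gold token by which model labeled it correctly. A minimal sketch of that tally, assuming token-aligned label lists; `pairwise_breakdown` and the toy labels are hypothetical and not taken from the benchmark script:

```python
from collections import Counter


def pairwise_breakdown(gold, pred_a, pred_b, non_o_only=False):
    """Bucket each gold token by which model (A, B, both, neither) got it right."""
    counts = Counter()
    for g, a, b in zip(gold, pred_a, pred_b):
        if non_o_only and g == "O":
            continue  # restrict to gold non-O tokens
        if a == g and b == g:
            counts["both_correct"] += 1
        elif a == g:
            counts["only_A_correct"] += 1
        elif b == g:
            counts["only_B_correct"] += 1
        else:
            counts["both_wrong"] += 1
    return counts


# Toy labels, for illustration only.
gold = ["O", "B-city", "I-city", "O", "B-date"]
pred_a = ["O", "B-city", "I-city", "O", "O"]
pred_b = ["O", "B-city", "O", "O", "B-date"]
print(pairwise_breakdown(gold, pred_a, pred_b))
print(pairwise_breakdown(gold, pred_a, pred_b, non_o_only=True))

# Cross-check against the report: A's viterbi token_acc is the share of
# tokens in both_correct + only_A_correct.
print(f"{(209830 + 958) / 212909:.4f}")  # 0.9900
```

The same cross-check on the non-O table gives A's non_o_recall: (43902 + 717) / 45987 ≈ 0.9703.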
|
|
| === Largest net losses for B by gold category (net_B = B_only - A_only) — A: openmed-base ahead (top 15) === |
| company_name net_B= -142 A_only= 216 B_only= 74 both_wrong= 119 |
| first_name net_B= -75 A_only= 82 B_only= 7 both_wrong= 19 |
| last_name net_B= -65 A_only= 67 B_only= 2 both_wrong= 38 |
| occupation net_B= -55 A_only= 79 B_only= 24 both_wrong= 286 |
| device_identifier net_B= -29 A_only= 29 B_only= 0 both_wrong= 0 |
| user_name net_B= -26 A_only= 29 B_only= 3 both_wrong= 10 |
| city net_B= -16 A_only= 32 B_only= 16 both_wrong= 36 |
| street_address net_B= -13 A_only= 14 B_only= 1 both_wrong= 0 |
| date_of_birth net_B= -12 A_only= 12 B_only= 0 both_wrong= 0 |
| email net_B= -8 A_only= 9 B_only= 1 both_wrong= 0 |
| medical_record_number net_B= -7 A_only= 7 B_only= 0 both_wrong= 0 |
| phone_number net_B= -6 A_only= 6 B_only= 0 both_wrong= 0 |
| account_number net_B= -6 A_only= 10 B_only= 4 both_wrong= 0 |
| tax_id net_B= -6 A_only= 6 B_only= 0 both_wrong= 0 |
| race_ethnicity net_B= -5 A_only= 15 B_only= 10 both_wrong= 11 |
|
|
| === Largest net wins for B by gold category (net_B = B_only - A_only) — B: haremb ahead (top 15) === |
| date net_B= +46 A_only= 15 B_only= 61 both_wrong= 145 |
| fax_number net_B= +29 A_only= 1 B_only= 30 both_wrong= 44 |
| unique_id net_B= +26 A_only= 4 B_only= 30 both_wrong= 0 |
| ssn net_B= +18 A_only= 0 B_only= 18 both_wrong= 0 |
| time net_B= +12 A_only= 9 B_only= 21 both_wrong= 87 |
| political_view net_B= +11 A_only= 3 B_only= 14 both_wrong= 6 |
| coordinate net_B= +11 A_only= 0 B_only= 11 both_wrong= 0 |
| customer_id net_B= +8 A_only= 4 B_only= 12 both_wrong= 7 |
| certificate_license_number net_B= +7 A_only= 0 B_only= 7 both_wrong= 0 |
| education_level net_B= +6 A_only= 2 B_only= 8 both_wrong= 28 |
| state net_B= +6 A_only= 13 B_only= 19 both_wrong= 20 |
| blood_type net_B= +6 A_only= 0 B_only= 6 both_wrong= 0 |
| gender net_B= +2 A_only= 0 B_only= 2 both_wrong= 2 |
| http_cookie net_B= +2 A_only= 0 B_only= 2 both_wrong= 3 |
| country net_B= +2 A_only= 0 B_only= 2 both_wrong= 19 |
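
In both category tables, net_B equals B_only minus A_only (for example 74 - 216 = -142 for company_name). A sketch of how such a per-category tally could be built, assuming BIO-prefixed labels and that gold `O` tokens are skipped; the helper names and toy data are hypothetical:

```python
from collections import defaultdict


def category(tag: str) -> str:
    """Strip a BIO prefix: 'B-city' -> 'city'."""
    return tag[2:] if tag[:2] in ("B-", "I-") else tag


def net_by_category(gold, pred_a, pred_b):
    """Per gold category: A_only, B_only, both_wrong and net_B, sorted by net_B."""
    rows = defaultdict(lambda: {"A_only": 0, "B_only": 0, "both_wrong": 0})
    for g, a, b in zip(gold, pred_a, pred_b):
        if g == "O":
            continue  # assumption: only gold non-O tokens are grouped
        row = rows[category(g)]
        if a == g and b != g:
            row["A_only"] += 1
        elif b == g and a != g:
            row["B_only"] += 1
        elif a != g and b != g:
            row["both_wrong"] += 1
    table = [
        (cat, r["B_only"] - r["A_only"], r["A_only"], r["B_only"], r["both_wrong"])
        for cat, r in rows.items()
    ]
    return sorted(table, key=lambda t: t[1])  # most negative net_B first


# Toy labels, for illustration only.
gold = ["O", "B-city", "I-city", "O", "B-date"]
pred_a = ["O", "B-city", "I-city", "O", "O"]
pred_b = ["O", "B-city", "O", "O", "B-date"]
for cat, net, a_only, b_only, both_wrong in net_by_category(gold, pred_a, pred_b):
    print(f"{cat:<10} net_B={net:+d} A_only={a_only} B_only={b_only} both_wrong={both_wrong}")
```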
|
|
| === Per-category span F1 (viterbi; counts = gold/predicted/correct) === |
| -- A: openmed-base -- |
| account_number F1=0.9929 P=0.9929 R=0.9929 (140/140/139) |
| age F1=0.8840 P=0.8511 R=0.9195 (87/94/80) |
| api_key F1=0.9921 P=0.9844 R=1.0000 (63/64/63) |
| bank_routing_number F1=0.9867 P=0.9867 R=0.9867 (75/75/74) |
| biometric_identifier F1=1.0000 P=1.0000 R=1.0000 (113/113/113) |
| blood_type F1=0.9032 P=0.9032 R=0.9032 (62/62/56) |
| certificate_license_number F1=0.9697 P=1.0000 R=0.9412 (34/32/32) |
| city F1=0.9154 P=0.9583 R=0.8762 (210/192/184) |
| company_name F1=0.8824 P=0.9143 R=0.8526 (563/525/480) |
| coordinate F1=0.8000 P=0.8000 R=0.8000 (55/55/44) |
| country F1=0.9431 P=0.9324 R=0.9539 (217/222/207) |
| county F1=0.9519 P=0.9612 R=0.9429 (105/103/99) |
| credit_debit_card F1=0.9967 P=0.9934 R=1.0000 (150/151/150) |
| customer_id F1=0.9849 P=1.0000 R=0.9703 (202/196/196) |
| cvv F1=0.9787 P=1.0000 R=0.9583 (48/46/46) |
| date F1=0.9440 P=0.9571 R=0.9312 (814/792/758) |
| date_of_birth F1=1.0000 P=1.0000 R=1.0000 (164/164/164) |
| date_time F1=0.9635 P=0.9429 R=0.9851 (134/140/132) |
| device_identifier F1=0.9714 P=0.9444 R=1.0000 (51/54/51) |
| education_level F1=0.9091 P=0.9524 R=0.8696 (92/84/80) |
| email F1=0.9971 P=0.9961 R=0.9980 (511/512/510) |
| employee_id F1=0.9948 P=1.0000 R=0.9896 (96/95/95) |
| employment_status F1=0.9478 P=0.9593 R=0.9365 (126/123/118) |
| fax_number F1=0.9091 P=0.9848 R=0.8442 (77/66/65) |
| first_name F1=0.9766 P=0.9716 R=0.9816 (871/880/855) |
| gender F1=0.9737 P=0.9867 R=0.9610 (77/75/74) |
| health_plan_beneficiary_number F1=1.0000 P=1.0000 R=1.0000 (103/103/103) |
| http_cookie F1=0.9307 P=0.9400 R=0.9216 (51/50/47) |
| ipv4 F1=1.0000 P=1.0000 R=1.0000 (59/59/59) |
| ipv6 F1=1.0000 P=1.0000 R=1.0000 (21/21/21) |
| language F1=0.9000 P=0.9000 R=0.9000 (90/90/81) |
| last_name F1=0.9744 P=0.9767 R=0.9721 (646/643/628) |
| license_plate F1=1.0000 P=1.0000 R=1.0000 (55/55/55) |
| mac_address F1=1.0000 P=1.0000 R=1.0000 (30/30/30) |
| medical_record_number F1=1.0000 P=1.0000 R=1.0000 (103/103/103) |
| national_id F1=1.0000 P=1.0000 R=1.0000 (28/28/28) |
| occupation F1=0.6522 P=0.7721 R=0.5645 (372/272/210) |
| password F1=0.9217 P=0.9636 R=0.8833 (60/55/53) |
| phone_number F1=0.9751 P=0.9514 R=1.0000 (235/247/235) |
| pin F1=0.9302 P=0.8955 R=0.9677 (62/67/60) |
| political_view F1=0.8387 P=0.8125 R=0.8667 (45/48/39) |
| postcode F1=0.9934 P=0.9868 R=1.0000 (75/76/75) |
| race_ethnicity F1=0.8889 P=0.8889 R=0.8889 (81/81/72) |
| religious_belief F1=0.8936 P=0.8750 R=0.9130 (46/48/42) |
| sexuality F1=0.9667 P=1.0000 R=0.9355 (31/29/29) |
| ssn F1=0.9440 P=0.9365 R=0.9516 (62/63/59) |
| state F1=0.9198 P=0.9399 R=0.9005 (191/183/172) |
| street_address F1=0.9894 P=0.9842 R=0.9947 (188/190/187) |
| swift_bic F1=0.9905 P=0.9811 R=1.0000 (52/53/52) |
| tax_id F1=1.0000 P=1.0000 R=1.0000 (15/15/15) |
| time F1=0.8209 P=0.8514 R=0.7926 (188/175/149) |
| unique_id F1=0.9600 P=1.0000 R=0.9231 (13/12/12) |
| url F1=0.9725 P=0.9687 R=0.9763 (380/383/371) |
| user_name F1=0.9497 P=0.9264 R=0.9742 (155/163/151) |
| vehicle_identifier F1=0.9815 P=0.9636 R=1.0000 (53/55/53) |
| -- B: haremb -- |
| account_number F1=0.9751 P=0.9716 R=0.9786 (140/141/137) |
| age F1=0.8571 P=0.8211 R=0.8966 (87/95/78) |
| api_key F1=0.9921 P=0.9844 R=1.0000 (63/64/63) |
| bank_routing_number F1=0.9933 P=1.0000 R=0.9867 (75/74/74) |
| biometric_identifier F1=1.0000 P=1.0000 R=1.0000 (113/113/113) |
| blood_type F1=1.0000 P=1.0000 R=1.0000 (62/62/62) |
| certificate_license_number F1=0.9855 P=0.9714 R=1.0000 (34/35/34) |
| city F1=0.8932 P=0.9109 R=0.8762 (210/202/184) |
| company_name F1=0.7766 P=0.8120 R=0.7442 (563/516/419) |
| coordinate F1=1.0000 P=1.0000 R=1.0000 (55/55/55) |
| country F1=0.9543 P=0.9457 R=0.9631 (217/221/209) |
| county F1=0.9340 P=0.9252 R=0.9429 (105/107/99) |
| credit_debit_card F1=0.9934 P=0.9868 R=1.0000 (150/152/150) |
| customer_id F1=0.9779 P=0.9707 R=0.9851 (202/205/199) |
| cvv F1=0.9792 P=0.9792 R=0.9792 (48/48/47) |
| date F1=0.9510 P=0.9599 R=0.9423 (814/799/767) |
| date_of_birth F1=0.9939 P=1.0000 R=0.9878 (164/162/162) |
| date_time F1=0.9635 P=0.9429 R=0.9851 (134/140/132) |
| device_identifier F1=0.9515 P=0.9423 R=0.9608 (51/52/49) |
| education_level F1=0.9091 P=0.9524 R=0.8696 (92/84/80) |
| email F1=0.9912 P=0.9883 R=0.9941 (511/514/508) |
| employee_id F1=0.9895 P=1.0000 R=0.9792 (96/94/94) |
| employment_status F1=0.9562 P=0.9600 R=0.9524 (126/125/120) |
| fax_number F1=0.9396 P=0.9722 R=0.9091 (77/72/70) |
| first_name F1=0.9299 P=0.9231 R=0.9369 (871/884/816) |
| gender F1=0.9870 P=0.9870 R=0.9870 (77/77/76) |
| health_plan_beneficiary_number F1=1.0000 P=1.0000 R=1.0000 (103/103/103) |
| http_cookie F1=0.9608 P=0.9608 R=0.9608 (51/51/49) |
| ipv4 F1=1.0000 P=1.0000 R=1.0000 (59/59/59) |
| ipv6 F1=1.0000 P=1.0000 R=1.0000 (21/21/21) |
| language F1=0.8966 P=0.9286 R=0.8667 (90/84/78) |
| last_name F1=0.9308 P=0.9457 R=0.9164 (646/626/592) |
| license_plate F1=1.0000 P=1.0000 R=1.0000 (55/55/55) |
| mac_address F1=1.0000 P=1.0000 R=1.0000 (30/30/30) |
| medical_record_number F1=0.9903 P=0.9903 R=0.9903 (103/103/102) |
| national_id F1=0.9825 P=0.9655 R=1.0000 (28/29/28) |
| occupation F1=0.5981 P=0.7440 R=0.5000 (372/250/186) |
| password F1=0.9391 P=0.9818 R=0.9000 (60/55/54) |
| phone_number F1=0.9730 P=0.9512 R=0.9957 (235/246/234) |
| pin F1=0.9508 P=0.9667 R=0.9355 (62/60/58) |
| political_view F1=0.8723 P=0.8367 R=0.9111 (45/49/41) |
| postcode F1=0.9934 P=0.9868 R=1.0000 (75/76/75) |
| race_ethnicity F1=0.8590 P=0.8933 R=0.8272 (81/75/67) |
| religious_belief F1=0.9348 P=0.9348 R=0.9348 (46/46/43) |
| sexuality F1=0.9492 P=1.0000 R=0.9032 (31/28/28) |
| ssn F1=0.9688 P=0.9394 R=1.0000 (62/66/62) |
| state F1=0.9105 P=0.9153 R=0.9058 (191/189/173) |
| street_address F1=0.9894 P=0.9894 R=0.9894 (188/188/186) |
| swift_bic F1=0.9905 P=0.9811 R=1.0000 (52/53/52) |
| tax_id F1=0.9655 P=1.0000 R=0.9333 (15/14/14) |
| time F1=0.8421 P=0.8786 R=0.8085 (188/173/152) |
| unique_id F1=0.8571 P=0.8000 R=0.9231 (13/15/12) |
| url F1=0.9752 P=0.9688 R=0.9816 (380/385/373) |
| user_name F1=0.9416 P=0.9477 R=0.9355 (155/153/145) |
| vehicle_identifier F1=0.9630 P=0.9455 R=0.9811 (53/55/52) |
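
The per-category triples can be produced by extracting spans from the tag sequences and counting exact matches. The sketch below assumes BIO tags and exact-boundary, exact-category matching, neither of which is confirmed by the report; `bio_spans` and `per_category_counts` are hypothetical names:

```python
from collections import defaultdict


def bio_spans(tags):
    """Return {(start, end, category)} for a BIO tag sequence (end exclusive)."""
    spans, start, cat = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        if tag.startswith("I-") and cat == tag[2:]:
            continue  # still inside the current span
        if cat is not None:
            spans.add((start, i, cat))
            start, cat = None, None
        if tag != "O":
            # B-X opens a span; a stray I-X is also treated as an opener here.
            start, cat = i, tag[2:]
    return spans


def per_category_counts(gold_tags, pred_tags):
    """Map category -> [gold, predicted, correct] span counts."""
    gold, pred = bio_spans(gold_tags), bio_spans(pred_tags)
    counts = defaultdict(lambda: [0, 0, 0])
    for _, _, cat in gold:
        counts[cat][0] += 1
    for _, _, cat in pred:
        counts[cat][1] += 1
    for _, _, cat in gold & pred:  # exact boundary and category match
        counts[cat][2] += 1
    return dict(counts)


# Toy sequences, for illustration only.
gold = ["B-city", "I-city", "O", "B-date"]
pred = ["B-city", "O", "O", "B-date"]
print(per_category_counts(gold, pred))
```

In the toy example the truncated `city` span counts as both a miss and a false positive, which is why span-level scores can sit well below token accuracy.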
|
|