Benchmark: A: openmed-base  vs  B: haremb
Dataset: nvidia/Nemotron-PII, split=test, eval_pct=1.0, ctx=1024, seed=42, n_docs=1000
Eval tokens scored: 212,909

=== Aggregate ===
A: openmed-base  RAW      span_F1=0.9174  P=0.9125  R=0.9223  token_acc=0.9895  non_o_recall=0.9685  spans=8627/8720/7957
A: openmed-base  VITERBI  span_F1=0.9434  P=0.9531  R=0.9338  token_acc=0.9900  non_o_recall=0.9703  spans=8627/8452/8056
B: haremb        RAW      span_F1=0.7741  P=0.7186  R=0.8388  token_acc=0.9831  non_o_recall=0.9467  spans=8627/10069/7236
B: haremb        VITERBI  span_F1=0.9288  P=0.9396  R=0.9182  token_acc=0.9885  non_o_recall=0.9637  spans=8627/8430/7921

Gap B vs A (viterbi span_F1): -0.0146

Throughput:
A: openmed-base  3293 tok/s  (1399.61M params)
B: haremb        6343 tok/s  (287.11M params)

=== Performance ===
metric                           A: openmed-base  B: haremb  B vs A
-------------------------------  ---------------  ---------  -------------
total params (M)                 1399.61          287.11     4.87× smaller
dense params (M)                 139.35           129.58     1.08× smaller
MoE expert params (M)            1260.26          157.53     8.00× smaller
active params/token (M, mem)     178.73           134.50     1.33× less
compute params/token (M, FLOPs)  50.69            6.46       7.85× cheaper
GFLOP / token                    0.1014           0.0129     7.85× cheaper
disk size (MiB)                                   547.6
weights in RAM (MiB)             2669.5           547.6      4.87× smaller
peak GPU mem eval (MiB)          3376.2           1248.6     2.70× less
throughput (tok/s)               3293             6343       1.93× faster

=== Pairwise (viterbi, all gold tokens) — A: openmed-base vs B: haremb ===
both_correct    209830  (98.55%)
only_A_correct     958  (0.45%)
only_B_correct     633  (0.30%)
both_wrong        1488  (0.70%)

=== Pairwise (viterbi, gold non-O tokens) — A: openmed-base vs B: haremb ===
both_correct     43902  (95.47%)
only_A_correct     717  (1.56%)
only_B_correct     417  (0.91%)
both_wrong         951  (2.07%)

=== Worst B-net wins by gold category — A: openmed-base ahead (top 15) ===
company_name                net_B= -142  A_only= 216  B_only=  74  both_wrong= 119
first_name                  net_B=  -75  A_only=  82  B_only=   7  both_wrong=  19
last_name                   net_B=  -65  A_only=  67  B_only=   2  both_wrong=  38
occupation                  net_B=  -55  A_only=  79  B_only=  24  both_wrong= 286
device_identifier           net_B=  -29  A_only=  29  B_only=   0  both_wrong=   0
user_name                   net_B=  -26  A_only=  29  B_only=   3  both_wrong=  10
city                        net_B=  -16  A_only=  32  B_only=  16  both_wrong=  36
street_address              net_B=  -13  A_only=  14  B_only=   1  both_wrong=   0
date_of_birth               net_B=  -12  A_only=  12  B_only=   0  both_wrong=   0
email                       net_B=   -8  A_only=   9  B_only=   1  both_wrong=   0
medical_record_number       net_B=   -7  A_only=   7  B_only=   0  both_wrong=   0
phone_number                net_B=   -6  A_only=   6  B_only=   0  both_wrong=   0
account_number              net_B=   -6  A_only=  10  B_only=   4  both_wrong=   0
tax_id                      net_B=   -6  A_only=   6  B_only=   0  both_wrong=   0
race_ethnicity              net_B=   -5  A_only=  15  B_only=  10  both_wrong=  11

=== Best B-net wins by gold category — B: haremb ahead (top 15) ===
date                        net_B=  +46  A_only=  15  B_only=  61  both_wrong= 145
fax_number                  net_B=  +29  A_only=   1  B_only=  30  both_wrong=  44
unique_id                   net_B=  +26  A_only=   4  B_only=  30  both_wrong=   0
ssn                         net_B=  +18  A_only=   0  B_only=  18  both_wrong=   0
time                        net_B=  +12  A_only=   9  B_only=  21  both_wrong=  87
political_view              net_B=  +11  A_only=   3  B_only=  14  both_wrong=   6
coordinate                  net_B=  +11  A_only=   0  B_only=  11  both_wrong=   0
customer_id                 net_B=   +8  A_only=   4  B_only=  12  both_wrong=   7
certificate_license_number  net_B=   +7  A_only=   0  B_only=   7  both_wrong=   0
education_level             net_B=   +6  A_only=   2  B_only=   8  both_wrong=  28
state                       net_B=   +6  A_only=  13  B_only=  19  both_wrong=  20
blood_type                  net_B=   +6  A_only=   0  B_only=   6  both_wrong=   0
gender                      net_B=   +2  A_only=   0  B_only=   2  both_wrong=   2
http_cookie                 net_B=   +2  A_only=   0  B_only=   2  both_wrong=   3
country                     net_B=   +2  A_only=   0  B_only=   2  both_wrong=  19

=== Per-category span F1 (viterbi) ===

-- A: openmed-base --
account_number                  F1=0.9929  P=0.9929  R=0.9929  (140/140/139)
age                             F1=0.8840  P=0.8511  R=0.9195  (87/94/80)
api_key                         F1=0.9921  P=0.9844  R=1.0000  (63/64/63)
bank_routing_number             F1=0.9867  P=0.9867  R=0.9867  (75/75/74)
biometric_identifier            F1=1.0000  P=1.0000  R=1.0000  (113/113/113)
blood_type                      F1=0.9032  P=0.9032  R=0.9032  (62/62/56)
certificate_license_number      F1=0.9697  P=1.0000  R=0.9412  (34/32/32)
city                            F1=0.9154  P=0.9583  R=0.8762  (210/192/184)
company_name                    F1=0.8824  P=0.9143  R=0.8526  (563/525/480)
coordinate                      F1=0.8000  P=0.8000  R=0.8000  (55/55/44)
country                         F1=0.9431  P=0.9324  R=0.9539  (217/222/207)
county                          F1=0.9519  P=0.9612  R=0.9429  (105/103/99)
credit_debit_card               F1=0.9967  P=0.9934  R=1.0000  (150/151/150)
customer_id                     F1=0.9849  P=1.0000  R=0.9703  (202/196/196)
cvv                             F1=0.9787  P=1.0000  R=0.9583  (48/46/46)
date                            F1=0.9440  P=0.9571  R=0.9312  (814/792/758)
date_of_birth                   F1=1.0000  P=1.0000  R=1.0000  (164/164/164)
date_time                       F1=0.9635  P=0.9429  R=0.9851  (134/140/132)
device_identifier               F1=0.9714  P=0.9444  R=1.0000  (51/54/51)
education_level                 F1=0.9091  P=0.9524  R=0.8696  (92/84/80)
email                           F1=0.9971  P=0.9961  R=0.9980  (511/512/510)
employee_id                     F1=0.9948  P=1.0000  R=0.9896  (96/95/95)
employment_status               F1=0.9478  P=0.9593  R=0.9365  (126/123/118)
fax_number                      F1=0.9091  P=0.9848  R=0.8442  (77/66/65)
first_name                      F1=0.9766  P=0.9716  R=0.9816  (871/880/855)
gender                          F1=0.9737  P=0.9867  R=0.9610  (77/75/74)
health_plan_beneficiary_number  F1=1.0000  P=1.0000  R=1.0000  (103/103/103)
http_cookie                     F1=0.9307  P=0.9400  R=0.9216  (51/50/47)
ipv4                            F1=1.0000  P=1.0000  R=1.0000  (59/59/59)
ipv6                            F1=1.0000  P=1.0000  R=1.0000  (21/21/21)
language                        F1=0.9000  P=0.9000  R=0.9000  (90/90/81)
last_name                       F1=0.9744  P=0.9767  R=0.9721  (646/643/628)
license_plate                   F1=1.0000  P=1.0000  R=1.0000  (55/55/55)
mac_address                     F1=1.0000  P=1.0000  R=1.0000  (30/30/30)
medical_record_number           F1=1.0000  P=1.0000  R=1.0000  (103/103/103)
national_id                     F1=1.0000  P=1.0000  R=1.0000  (28/28/28)
occupation                      F1=0.6522  P=0.7721  R=0.5645  (372/272/210)
password                        F1=0.9217  P=0.9636  R=0.8833  (60/55/53)
phone_number                    F1=0.9751  P=0.9514  R=1.0000  (235/247/235)
pin                             F1=0.9302  P=0.8955  R=0.9677  (62/67/60)
political_view                  F1=0.8387  P=0.8125  R=0.8667  (45/48/39)
postcode                        F1=0.9934  P=0.9868  R=1.0000  (75/76/75)
race_ethnicity                  F1=0.8889  P=0.8889  R=0.8889  (81/81/72)
religious_belief                F1=0.8936  P=0.8750  R=0.9130  (46/48/42)
sexuality                       F1=0.9667  P=1.0000  R=0.9355  (31/29/29)
ssn                             F1=0.9440  P=0.9365  R=0.9516  (62/63/59)
state                           F1=0.9198  P=0.9399  R=0.9005  (191/183/172)
street_address                  F1=0.9894  P=0.9842  R=0.9947  (188/190/187)
swift_bic                       F1=0.9905  P=0.9811  R=1.0000  (52/53/52)
tax_id                          F1=1.0000  P=1.0000  R=1.0000  (15/15/15)
time                            F1=0.8209  P=0.8514  R=0.7926  (188/175/149)
unique_id                       F1=0.9600  P=1.0000  R=0.9231  (13/12/12)
url                             F1=0.9725  P=0.9687  R=0.9763  (380/383/371)
user_name                       F1=0.9497  P=0.9264  R=0.9742  (155/163/151)
vehicle_identifier              F1=0.9815  P=0.9636  R=1.0000  (53/55/53)

-- B: haremb --
account_number                  F1=0.9751  P=0.9716  R=0.9786  (140/141/137)
age                             F1=0.8571  P=0.8211  R=0.8966  (87/95/78)
api_key                         F1=0.9921  P=0.9844  R=1.0000  (63/64/63)
bank_routing_number             F1=0.9933  P=1.0000  R=0.9867  (75/74/74)
biometric_identifier            F1=1.0000  P=1.0000  R=1.0000  (113/113/113)
blood_type                      F1=1.0000  P=1.0000  R=1.0000  (62/62/62)
certificate_license_number      F1=0.9855  P=0.9714  R=1.0000  (34/35/34)
city                            F1=0.8932  P=0.9109  R=0.8762  (210/202/184)
company_name                    F1=0.7766  P=0.8120  R=0.7442  (563/516/419)
coordinate                      F1=1.0000  P=1.0000  R=1.0000  (55/55/55)
country                         F1=0.9543  P=0.9457  R=0.9631  (217/221/209)
county                          F1=0.9340  P=0.9252  R=0.9429  (105/107/99)
credit_debit_card               F1=0.9934  P=0.9868  R=1.0000  (150/152/150)
customer_id                     F1=0.9779  P=0.9707  R=0.9851  (202/205/199)
cvv                             F1=0.9792  P=0.9792  R=0.9792  (48/48/47)
date                            F1=0.9510  P=0.9599  R=0.9423  (814/799/767)
date_of_birth                   F1=0.9939  P=1.0000  R=0.9878  (164/162/162)
date_time                       F1=0.9635  P=0.9429  R=0.9851  (134/140/132)
device_identifier               F1=0.9515  P=0.9423  R=0.9608  (51/52/49)
education_level                 F1=0.9091  P=0.9524  R=0.8696  (92/84/80)
email                           F1=0.9912  P=0.9883  R=0.9941  (511/514/508)
employee_id                     F1=0.9895  P=1.0000  R=0.9792  (96/94/94)
employment_status               F1=0.9562  P=0.9600  R=0.9524  (126/125/120)
fax_number                      F1=0.9396  P=0.9722  R=0.9091  (77/72/70)
first_name                      F1=0.9299  P=0.9231  R=0.9369  (871/884/816)
gender                          F1=0.9870  P=0.9870  R=0.9870  (77/77/76)
health_plan_beneficiary_number  F1=1.0000  P=1.0000  R=1.0000  (103/103/103)
http_cookie                     F1=0.9608  P=0.9608  R=0.9608  (51/51/49)
ipv4                            F1=1.0000  P=1.0000  R=1.0000  (59/59/59)
ipv6                            F1=1.0000  P=1.0000  R=1.0000  (21/21/21)
language                        F1=0.8966  P=0.9286  R=0.8667  (90/84/78)
last_name                       F1=0.9308  P=0.9457  R=0.9164  (646/626/592)
license_plate                   F1=1.0000  P=1.0000  R=1.0000  (55/55/55)
mac_address                     F1=1.0000  P=1.0000  R=1.0000  (30/30/30)
medical_record_number           F1=0.9903  P=0.9903  R=0.9903  (103/103/102)
national_id                     F1=0.9825  P=0.9655  R=1.0000  (28/29/28)
occupation                      F1=0.5981  P=0.7440  R=0.5000  (372/250/186)
password                        F1=0.9391  P=0.9818  R=0.9000  (60/55/54)
phone_number                    F1=0.9730  P=0.9512  R=0.9957  (235/246/234)
pin                             F1=0.9508  P=0.9667  R=0.9355  (62/60/58)
political_view                  F1=0.8723  P=0.8367  R=0.9111  (45/49/41)
postcode                        F1=0.9934  P=0.9868  R=1.0000  (75/76/75)
race_ethnicity                  F1=0.8590  P=0.8933  R=0.8272  (81/75/67)
religious_belief                F1=0.9348  P=0.9348  R=0.9348  (46/46/43)
sexuality                       F1=0.9492  P=1.0000  R=0.9032  (31/28/28)
ssn                             F1=0.9688  P=0.9394  R=1.0000  (62/66/62)
state                           F1=0.9105  P=0.9153  R=0.9058  (191/189/173)
street_address                  F1=0.9894  P=0.9894  R=0.9894  (188/188/186)
swift_bic                       F1=0.9905  P=0.9811  R=1.0000  (52/53/52)
tax_id                          F1=0.9655  P=1.0000  R=0.9333  (15/14/14)
time                            F1=0.8421  P=0.8786  R=0.8085  (188/173/152)
unique_id                       F1=0.8571  P=0.8000  R=0.9231  (13/15/12)
url                             F1=0.9752  P=0.9688  R=0.9816  (380/385/373)
user_name                       F1=0.9416  P=0.9477  R=0.9355  (155/153/145)
vehicle_identifier              F1=0.9630  P=0.9455  R=0.9811  (53/55/52)
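
How the reported scores relate to the span counts can be sketched as follows. The log does not label the three numbers in spans=... or in the parenthesized per-category triples; the sketch assumes they are gold/predicted/correct span counts, a reading inferred from the fact that it reproduces the printed P, R and F1. The helper name span_prf is hypothetical and is not taken from the benchmark script.

```python
def span_prf(gold: int, pred: int, correct: int) -> tuple[float, float, float]:
    """Precision, recall and F1 from span counts.

    Assumed order of the triple in the log: gold / predicted / correct.
    """
    precision = correct / pred if pred else 0.0
    recall = correct / gold if gold else 0.0
    denom = gold + pred
    # F1 is the harmonic mean of P and R, which simplifies to 2*correct/(gold+pred).
    f1 = 2 * correct / denom if denom else 0.0
    return precision, recall, f1


# Aggregate VITERBI rows, counts copied from the log.
p_a, r_a, f1_a = span_prf(8627, 8452, 8056)  # A: openmed-base
p_b, r_b, f1_b = span_prf(8627, 8430, 7921)  # B: haremb

print(f"A: P={p_a:.4f} R={r_a:.4f} F1={f1_a:.4f}")
print(f"B: P={p_b:.4f} R={r_b:.4f} F1={f1_b:.4f}")
print(f"Gap B vs A (viterbi span_F1): {f1_b - f1_a:+.4f}")
```

Under the same assumption, the per-category triples can be passed to the same function, for example span_prf(372, 272, 210) for the occupation row of A.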