| Inference benchmark: A: openmed-base vs B: haremb |
| device : cuda dtype: torch.bfloat16 |
| ctx : 1024 |
|
|
| A: openmed-base (reference / teacher) |
| load : 0.71s |
| eval : 64.66s on 212,909 tokens (3293 tok/s) |
| Performance: |
| total params : 1399.61M (139.35M dense + 1260.26M MoE-experts) |
| active params / token : 178.73M (memory footprint: embed lookup + top-4 of 128 experts = 128.04M embed + 39.38M MoE-active + 11.31M attn/norm/head) |
| compute params / token : 50.69M (matmul FLOPs only; embedding lookup excluded) |
| GFLOP / token (fwd, MAC×2): 0.101 |
| weights size (on disk) : — |
| weights size (in RAM) : 2.61 GiB |
| weights resident (GPU) : 2.61 GiB |
| peak GPU mem (eval, ctx=1024) : 3.30 GiB |
|
|
| B: haremb (this checkpoint) |
| load : 0.10s |
| eval : 33.56s on 212,909 tokens (6343 tok/s) |
| Performance: |
| total params : 287.11M (129.58M dense + 157.53M MoE-experts) |
| active params / token : 134.50M (memory footprint: embed lookup + top-4 of 128 experts = 128.04M embed + 4.92M MoE-active + 1.54M attn/norm/head) |
| compute params / token : 6.46M (matmul FLOPs only; embedding lookup excluded) |
| GFLOP / token (fwd, MAC×2): 0.013 |
| weights size (on disk) : 547.6 MiB |
| weights size (in RAM) : 547.6 MiB |
| weights resident (GPU) : 548.3 MiB |
| peak GPU mem (eval, ctx=1024) : 1.22 GiB |
|
|
| B vs A (haremb vs openmed-base): |
| total params : 4.87× smaller |
| active params / token : 1.33× less [memory] |
| compute params / token : 7.85× cheaper [FLOPs] |
| GFLOP / token : 7.85× cheaper |
| weights size (on disk) : — |
| weights in RAM : 4.87× smaller |
| peak GPU mem (eval) : 2.70× less |
| throughput : 1.93× faster |
|
|
| Sample inference (load → tokenize → forward → viterbi-decode → spans): |
| text: 'Patient Sarah Johnson (DOB 03/15/1985), MRN 4872910, phone 415-555-0123, email sarah.johnson@example.com, credit card 4111-1111-1111-1111.' |
| forward latency: 65.8ms (53 tokens) |
| detected 7 spans: |
| [ 1, 2) first_name 'Sarah' |
| [ 2, 3) last_name 'Johnson' |
| [ 6, 12) date '03/15/1985' |
| [ 16, 19) phone_number '4872910' |
| [ 22, 28) phone_number '415-555-0123' |
| [ 30, 37) email 'sarah.johnson@example.com' |
| [ 41, 52) credit_debit_card '4111-1111-1111-1111' |
|
|
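The derived rows in the report (active params, compute params, GFLOP per token, bf16 weight size, and the B vs A ratios) follow from the component counts by simple arithmetic. The sketch below recomputes them from the rounded figures printed above. The dictionary layout and the `derive` helper are illustrative names, not part of the benchmark script, and the 2 bytes per parameter assumes all weights are stored in bfloat16. Because the inputs are already rounded, results should agree with the report only to within rounding.

```python
# Component counts in millions of parameters, copied from the report.
MODELS = {
    "openmed-base": {"embed": 128.04, "moe_total": 1260.26, "other": 11.31},
    "haremb":       {"embed": 128.04, "moe_total": 157.53,  "other": 1.54},
}
TOP_K, NUM_EXPERTS = 4, 128   # top-4 of 128 experts, as stated in the report
BYTES_PER_PARAM = 2           # assumption: every weight is bfloat16


def derive(m):
    """Recompute the report's derived rows for one model (hypothetical helper)."""
    moe_active = m["moe_total"] * TOP_K / NUM_EXPERTS   # experts used per token
    compute = moe_active + m["other"]                   # matmul params, no embedding
    active = m["embed"] + compute                       # params touched per token
    total = m["embed"] + m["other"] + m["moe_total"]
    return {
        "total_M": total,
        "active_M": active,
        "compute_M": compute,
        # One multiply-accumulate per compute param, counted as 2 FLOPs.
        "gflop_per_token": 2 * compute * 1e6 / 1e9,
        "weights_MiB": total * 1e6 * BYTES_PER_PARAM / 2**20,
    }


a = derive(MODELS["openmed-base"])
b = derive(MODELS["haremb"])

ratios = {
    "total params": a["total_M"] / b["total_M"],
    "active params / token": a["active_M"] / b["active_M"],
    "compute params / token": a["compute_M"] / b["compute_M"],
    "throughput": 64.66 / 33.56,   # eval seconds on the same 212,909 tokens
}

for name, value in ratios.items():
    print(f"{name:24s}: {value:.2f}x")
```

Working from the rounded inputs, the compute ratio can differ from the reported 7.85× in the second decimal place; the other ratios are less sensitive to rounding.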