Inference benchmark: A: openmed-base vs B: haremb
device : cuda dtype: torch.bfloat16
ctx : 1024
A: openmed-base (reference / teacher)
load : 0.71s
eval : 64.66s on 212,909 tokens (3293 tok/s)
Performance:
total params : 1399.61M (139.35M dense + 1260.26M MoE-experts)
active params / token : 178.73M (memory footprint — embed lookup + top_4/128 experts: 128.04M embed + 39.38M MoE-active + 11.31M attn/norm/head)
compute params / token : 50.69M (matmul FLOPs only — embedding lookup excluded)
GFLOP / token (fwd, MAC×2): 0.101
weights size (on disk) : —
weights size (in RAM) : 2.61 GiB
weights resident (GPU) : 2.61 GiB
peak GPU mem (eval, ctx=1024) : 3.30 GiB
B: haremb (this checkpoint)
load : 0.10s
eval : 33.56s on 212,909 tokens (6343 tok/s)
Performance:
total params : 287.11M (129.58M dense + 157.53M MoE-experts)
active params / token : 134.50M (memory footprint — embed lookup + top_4/128 experts: 128.04M embed + 4.92M MoE-active + 1.54M attn/norm/head)
compute params / token : 6.46M (matmul FLOPs only — embedding lookup excluded)
GFLOP / token (fwd, MAC×2): 0.013
weights size (on disk) : 547.6 MiB
weights size (in RAM) : 547.6 MiB
weights resident (GPU) : 548.3 MiB
peak GPU mem (eval, ctx=1024) : 1.22 GiB
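The per-token figures for both models appear to follow from simple accounting over the logged component counts. The sketch below reproduces them. The helper names `per_token_accounting` and `bf16_weights_gib` are hypothetical, and it rests on two assumptions the log does not state outright: that the MoE-active count is the expert total scaled by 4/128 (top-4 of 128 experts), and that the in-RAM size is 2 bytes per parameter (bfloat16).

```python
def per_token_accounting(embed_m, moe_total_m, other_m, top_k=4, n_experts=128):
    """Recompute the log's per-token figures; all counts are in millions of params.

    other_m is the attn/norm/head count from the log.
    Assumption: only top_k of n_experts experts are touched per token.
    """
    moe_active = moe_total_m * top_k / n_experts  # experts actually routed to
    compute = moe_active + other_m                # matmul params, embedding lookup excluded
    active = embed_m + compute                    # memory-footprint view
    gflop = 2 * compute * 1e6 / 1e9               # one MAC counted as 2 FLOPs
    return {"moe_active": moe_active, "compute": compute,
            "active": active, "gflop": gflop}


def bf16_weights_gib(total_params_m):
    """Weight size in GiB, assuming 2 bytes per parameter (bfloat16)."""
    return total_params_m * 1e6 * 2 / 2**30


# Component counts copied from the log above.
a = per_token_accounting(embed_m=128.04, moe_total_m=1260.26, other_m=11.31)
b = per_token_accounting(embed_m=128.04, moe_total_m=157.53, other_m=1.54)

print(a)  # compute ~ 50.69, active ~ 178.73, gflop ~ 0.101
print(b)  # compute ~ 6.46,  active ~ 134.50, gflop ~ 0.013
print(bf16_weights_gib(1399.61), bf16_weights_gib(287.11) * 1024)  # ~2.61 GiB, ~547.6 MiB
```

Because both models share the same 128.04M embedding table, the active-parameter gap (memory) is much narrower than the compute gap (FLOPs).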
B vs A (haremb vs openmed-base):
total params : 4.87× smaller
active params / token : 1.33× less [memory]
compute params / token : 7.85× cheaper [FLOPs]
GFLOP / token : 7.85× cheaper
weights size (on disk) : —
weights in RAM : 4.87× smaller
peak GPU mem (eval) : 2.70× less
throughput : 1.93× faster
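The comparison block can be rechecked directly from the two per-model reports. This is a minimal sketch using only numbers printed in the log; the dictionary keys are illustrative names, not fields of the benchmark script.

```python
# Figures copied from the A and B reports in the log.
a = {"total_m": 1399.61, "active_m": 178.73, "compute_m": 50.69,
     "peak_gib": 3.30, "tok_s": 3293}
b = {"total_m": 287.11, "active_m": 134.50, "compute_m": 6.46,
     "peak_gib": 1.22, "tok_s": 6343}

# "Smaller / less / cheaper" ratios are A divided by B.
ratios = {key: a[key] / b[key]
          for key in ("total_m", "active_m", "compute_m", "peak_gib")}
# Throughput is a speedup, so it is B divided by A.
ratios["throughput"] = b["tok_s"] / a["tok_s"]

for name, value in ratios.items():
    print(f"{name:12s}: {value:.2f}x")
```

Rounded to two decimals, these agree with the logged 4.87×, 1.33×, 7.85×, 2.70×, and 1.93×.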
Sample inference (load → tokenize → forward → viterbi-decode → spans):
text: 'Patient Sarah Johnson (DOB 03/15/1985), MRN 4872910, phone 415-555-0123, email sarah.johnson@example.com, credit card 4111-1111-1111-1111.'
forward latency: 65.8ms (53 tokens)
detected 7 spans:
[ 1, 2) first_name 'Sarah'
[ 2, 3) last_name 'Johnson'
[ 6, 12) date '03/15/1985'
[ 16, 19) phone_number '4872910'
[ 22, 28) phone_number '415-555-0123'
[ 30, 37) email 'sarah.johnson@example.com'
[ 41, 52) credit_debit_card '4111-1111-1111-1111'
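The last two stages of the sample pipeline (viterbi-decode → spans) can be illustrated with a generic sketch. The log does not show the checkpoint's tag scheme, transition scores, or decoding code, so everything here is an assumption: a BIO tag scheme, a hand-written transition table, and the hypothetical helpers `viterbi_decode` and `bio_to_spans`. Span ends are exclusive, matching the `[start, end)` token ranges printed above.

```python
def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag path.

    emissions:   list of T rows, each with K per-tag scores for one token.
    transitions: K x K table; transitions[i][j] scores moving from tag i to tag j.
    """
    n_tags = len(transitions)
    score = list(emissions[0])
    backptrs = []
    for emit in emissions[1:]:
        new_score, ptrs = [], []
        for j in range(n_tags):
            # Best previous tag for landing on tag j at this step.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            ptrs.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
        score = new_score
        backptrs.append(ptrs)
    # Backtrack from the best final tag.
    path = [max(range(n_tags), key=lambda j: score[j])]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    return path[::-1]


def bio_to_spans(tags):
    """Group BIO tags into (start, end, label) token spans, end exclusive."""
    spans, start, label = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and tag[2:] == label
        if start is not None and not continues:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


# Toy example: three tags, with the O -> I-x transition effectively forbidden.
tag_names = ["O", "B-x", "I-x"]
transitions = [[0.0, 0.0, -1e4],
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
emissions = [[2.0, 1.0, 0.0],
             [0.0, 0.0, 3.0],
             [0.0, 0.0, 1.0]]

path = viterbi_decode(emissions, transitions)
spans = bio_to_spans([tag_names[k] for k in path])
print(path, spans)
```

In the toy example, a per-token argmax would pick `O` for the first token and then an orphaned `I-x`; the transition penalty makes the decoder choose `B-x` instead, so the three tokens form one well-formed span.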