pastapaul committed on
Commit 4c80bed · verified · 1 parent: f3911aa

README: B300 HBM3e 288 GB spec

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -77,7 +77,7 @@ Acceptance does not degrade under batching — flat at 88.0% ± 0.4% across c=1

  ## Recommended serving config

- TP=4 on 4× B300 (or equivalent Blackwell SXM6 with ≥250 GB HBM each). On this artifact, TP=8 is **slower** than TP=4 at c≥4 batched concurrencies — by up to 21.6% at c=16. Per-rank MoE expert shards at TP=8 are small enough to underutilize NVFP4 tensor-core kernels on B300. TP=4 is the right operating point for this artifact in production.
+ TP=4 on 4× B300 SXM6 (288 GB HBM3e per GPU). On this artifact, TP=8 is **slower** than TP=4 at c≥4 batched concurrencies — by up to 21.6% at c=16. Per-rank MoE expert shards at TP=8 are small enough to underutilize NVFP4 tensor-core kernels on B300. TP=4 is the right operating point for this artifact in production.

  ## Quick start
