README: B300 HBM3e 288 GB spec
README.md
CHANGED
@@ -77,7 +77,7 @@ Acceptance does not degrade under batching — flat at 88.0% ± 0.4% across c=1
 
 ## Recommended serving config
 
-TP=4 on 4× B300
+TP=4 on 4× B300 SXM6 (288 GB HBM3e per GPU). On this artifact, TP=8 is **slower** than TP=4 at c≥4 batched concurrencies — by up to 21.6% at c=16. Per-rank MoE expert shards at TP=8 are small enough to underutilize NVFP4 tensor-core kernels on B300. TP=4 is the right operating point for this artifact in production.
 
 ## Quick start
 
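The added README line attributes the TP=8 slowdown to small per-rank MoE expert shards. A minimal sketch of that arithmetic follows, assuming each expert's FFN intermediate dimension is split evenly across tensor-parallel ranks (a common scheme, not confirmed by the README for this artifact). The name `per_rank_expert_shard` and the value `MOE_INTERMEDIATE_SIZE = 2048` are hypothetical and are not taken from the model's configuration.

```python
def per_rank_expert_shard(moe_intermediate_size: int, tp: int) -> int:
    """Width of one expert's FFN intermediate dimension held by each TP rank.

    Assumes the intermediate dimension is split evenly across ranks.
    """
    if tp <= 0:
        raise ValueError("tp must be positive")
    if moe_intermediate_size % tp != 0:
        raise ValueError("intermediate size must divide evenly across TP ranks")
    return moe_intermediate_size // tp


# Hypothetical expert width, chosen only for illustration (not from the README).
MOE_INTERMEDIATE_SIZE = 2048

# Compare the two tensor-parallel degrees discussed in the README line.
shards = {tp: per_rank_expert_shard(MOE_INTERMEDIATE_SIZE, tp) for tp in (4, 8)}

for tp, width in shards.items():
    print(f"TP={tp}: {width} intermediate columns per expert per rank")
```

Under this assumption, doubling TP from 4 to 8 halves the per-rank width of every expert GEMM, which is consistent with the README's explanation that the smaller matrices underutilize the NVFP4 tensor-core kernels. The sketch only illustrates the shard sizes; it does not model or reproduce the measured 21.6% gap.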