Correct BF16 source size (V3 carryover 1.3 TB → V4 ~600 GB)

V4-Flash is ~284B params (~13B activated), not the 671B of V3. The 1.3 TB
BF16 source figure was V3-era math (671B × 2 bytes). V4-Flash with the
MTP block is ~600 GB BF16. The 172 GB measured artifact size is unchanged.
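
The arithmetic behind both figures can be checked with a short sketch. It assumes decimal units (1 GB = 10^9 bytes) and 2 bytes per BF16 parameter; the helper name `bf16_size_gb` is illustrative and does not come from the repository. The parameter counts are the approximate ones quoted above, so the results are estimates, and the ~600 GB figure additionally folds in the MTP block, which this sketch does not size on its own.

```python
BYTES_PER_BF16 = 2  # BF16 stores each parameter in 16 bits


def bf16_size_gb(params_billions: float) -> float:
    """Decimal gigabytes needed to hold the given parameter count in BF16."""
    return params_billions * 1e9 * BYTES_PER_BF16 / 1e9


# V3-era figure that was carried over by mistake: 671B params.
v3_gb = bf16_size_gb(671)        # 1342.0 GB, i.e. roughly 1.3 TB

# V4-Flash main model at ~284B params, before counting the MTP block.
v4_main_gb = bf16_size_gb(284)   # 568.0 GB, consistent with ~600 GB once MTP is included

if __name__ == "__main__":
    print(f"V3 BF16:       {v3_gb:.0f} GB")
    print(f"V4-Flash BF16: {v4_main_gb:.0f} GB (excluding the MTP block)")
```

Against the measured 172 GB artifact, a ~600 GB source works out to roughly a 3.5× size reduction, rather than the ~7.6× that the stale 1.3 TB figure implied.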

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Files changed (1)

README.md +1 -1

@@ -19,7 +19,7 @@ A DeepSeek-V4-Flash NVFP4-FP8 quantization that retains the MTP (multi-token-pre
 ## What this is
-- 172 GB across 35 safetensors shards (vs 1.3 TB BF16 source).
+- 172 GB across 35 safetensors shards (vs ~600 GB BF16 source, MTP block included).
 - Same quantization scheme as `RedHatAI/DeepSeek-V4-Flash-NVFP4-FP8`: NVFP4 (group=16, FP8 e4m3 scales) on routed FFN experts, FP8_BLOCK 128×128 on attention.
 - MTP block (`mtp.0.*`, 799 tensors) kept at BF16 — not dropped at load time, not double-quantized when the MTP draft model is constructed.