Is the 247 GB file size correct for the 8-bit version, while the 4-bit version is 837 GB?

#1
by Seth-TW - opened

as title

MLX Community org

Known state: we are asking whether 837 GB for the “4-bit” release can make architectural sense relative to 247 GB for the “8-bit” release.

Short answer

Yes, it is plausible, and it may indicate that the 4-bit artifact stores much more of the full MoE structure.

Why this may happen

Clue from the paper

From the DeepSeek-V4 paper:

  • 1.6T total parameters
  • 49B active parameters per token (MoE activated) 

That means:

  • an enormous dormant expert pool
  • only a small slice (roughly 49B / 1.6T ≈ 3%) is active for any given token

Hypothesis (likely)

247 GB 8-bit may be a highly compressed deployment artifact focused on inference.

837 GB 4-bit may be closer to:

  • many/all experts preserved
  • larger shard count (your 8-bit screenshot showed 185 safetensors shards)
  • less aggressive packing
  • mixed-precision tensors (4-bit weights plus higher-precision routing/index tensors)
  • group-size / codebook overhead (see the sketch after this list)
  • possibly closer to “full MoE storage” than a compact inference pack
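
On the group-size / codebook point, here is a minimal back-of-envelope sketch of how per-group metadata inflates a “quantized” checkpoint. It assumes MLX-style group-wise affine quantization with one fp16 scale and one fp16 bias per group of 64 weights; the group size and the “everything quantized” assumption are mine, not read from the actual files:

```python
# Minimal sketch: storage cost of group-wise affine quantization.
# Assumptions (mine, not read from the checkpoint): one fp16 scale and
# one fp16 bias per group of 64 weights, and every tensor is quantized.

def quantized_bytes(n_params: float, bits: int, group_size: int = 64) -> float:
    packed = n_params * bits / 8                  # packed weight payload
    metadata = (n_params / group_size) * (2 + 2)  # fp16 scale + fp16 bias per group
    return packed + metadata

GB = 1e9
total_params = 1.6e12  # full 1.6T expert pool

for bits in (4, 8):
    size = quantized_bytes(total_params, bits)
    raw = total_params * bits / 8
    print(f"{bits}-bit full pool: {size / GB:,.0f} GB "
          f"(+{size / raw - 1:.1%} group metadata)")
```

Under these assumptions the full 1.6T pool lands around 900 GB at 4 bits and well above 1.6 TB at 8 bits, so a 247 GB 8-bit artifact cannot contain the whole pool at that precision.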

Rough sanity check

If only the 49B active parameters were stored at 4 bits:

$$49\text{B} \times 0.5\ \text{byte} \approx 24.5\ \text{GB raw}$$

But the total expert pool:

$$1.6\text{T} \times 0.5\ \text{byte} \approx 800\ \text{GB}$$

That is strikingly close to:

837 GB

That may not be coincidence.
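
For completeness, the same sanity check as a few lines of Python (pure arithmetic on the paper's numbers; no model files involved):

```python
# Reproduces the sanity check above, using decimal GB (1e9 bytes).
GB = 1e9
active_params = 49e9    # active per token, per the paper
total_params  = 1.6e12  # total expert pool, per the paper

print(f"active @ 4-bit: {active_params * 0.5 / GB:.1f} GB")   # ~24.5 GB
print(f"total  @ 4-bit: {total_params * 0.5 / GB:.0f} GB")    # ~800 GB
print(f"837 GB vs raw : {837e9 / (total_params * 0.5) - 1:.1%} over")  # ~4.6%
```

The observed 837 GB sits only about 4.6% above the raw 4-bit payload of the full pool, which is the right order of magnitude for quantization metadata plus a handful of tensors kept at higher precision (the exact overhead depends on group size and which tensors stay unquantized).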

My interpretation

Very plausible:

  • 837 GB 4-bit may approximate quantized storage of nearly the full 1.6T expert pool
  • 247 GB 8-bit may be a compressed inference-oriented representation

That would explain the paradox.

This actually aligns better than I expected.

The 837 GB figure may be evidence that the “4-bit” release is more complete, not “more compressed.”

Interesting side note: the smaller Flash models behave normally (4-bit < 8-bit), which strengthens the idea that Pro is special due to its MoE packaging.

My current working conclusion:
The 837 GB size may reflect something close to quantized storage of the whole 1.6T model.
