Mac M3 Ultra 256GB

#16

by ragallo - opened Mar 5

Mar 5

With the new update, is the documentation accurate? My question is at the bottom.
https://unsloth.ai/docs/models/qwen3.5#qwen3.5-397b-a17b

"
Qwen3.5-397B-A17B
Qwen3.5-397B-A17B is in the same performance tier as Gemini 3 Pro, Claude Opus 4.5, and GPT-5.2. The full 397B checkpoint is ~807GB on disk, but via Unsloth's 397B GGUFs you can run:
• 3-bit: fits on 192GB RAM systems (e.g., a 192GB Mac)
• 4-bit (MXFP4): fits on 256GB RAM. Unsloth 4-bit dynamic UD-Q4_K_XL is ~214GB on disk - loads directly on a 256GB M3 Ultra
• Runs on a single 24GB GPU + 256GB system RAM via MoE offloading, reaching 25+ tokens/s
• 8-bit needs ~512GB RAM/VRAM
"

Which is the ideal model to load onto an M3 Ultra with 256GB and 60 GPU cores? Simply whichever fits?
Additionally, are there specific quants that work best on this Mac Studio, or does it not matter? E.g., q8 vs q8_X_X for the other updated models?

Thank you again for the uploads and updates!

KeenZZZ

25 days ago

I recommend trying UD-Q3_K_XL. Regarding the storage, just using an SSD is fine.
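
On the "simply whichever fits?" part, here is a rough sanity check using the numbers quoted from the docs above (UD-Q4_K_XL at ~214GB on disk, 256GB of RAM). This is only a sketch: the function names and the 32GB reserve for the OS, KV cache, and other apps are assumptions for illustration, not figures from Unsloth or Apple. Plug in the actual file size of whichever quant you download.

```python
def headroom_gb(model_gb: float, total_ram_gb: float) -> float:
    """Memory left over once the model weights are loaded."""
    return total_ram_gb - model_gb


def fits_in_memory(model_gb: float, total_ram_gb: float,
                   reserve_gb: float = 32.0) -> bool:
    """True if the weights plus an assumed reserve fit in RAM.

    reserve_gb is a hypothetical allowance for the OS, KV cache,
    and context buffers; tune it for your own workload.
    """
    return model_gb + reserve_gb <= total_ram_gb


if __name__ == "__main__":
    q4_size_gb = 214   # UD-Q4_K_XL, per the quoted docs
    ram_gb = 256       # M3 Ultra in this thread

    print("headroom:", headroom_gb(q4_size_gb, ram_gb), "GB")
    print("fits with 32GB reserve:", fits_in_memory(q4_size_gb, ram_gb))
    print("fits with 64GB reserve:",
          fits_in_memory(q4_size_gb, ram_gb, reserve_gb=64))
```

If the margin looks thin for the context length you want, dropping to a smaller quant leaves more room, which may be part of why a 3-bit quant is a comfortable choice on this machine.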
