OOM on 96GB H20 with FLUX.2-dev (BF16) - Is 96GB not enough?

#31
by jasonhuang21 - opened

First off, congrats on the FLUX.2 release. The performance and prompt adherence are truly next-level—it's a huge step forward for the community.

I'm currently trying to run the BF16 version on a single NVIDIA H20 (96 GB VRAM), but I'm hitting CUDA out-of-memory errors. Is 96 GB VRAM officially insufficient for FLUX.2, or is there a specific peak memory spike during loading that I should optimize for? Are there any recommended settings to fit it onto a single 96 GB card without OOM?
Here's how I load FLUX.2-dev (BF16):

Thanks again for the incredible work.

Hey, I ran into a similar issue and documented it here; I hope it helps: https://huggingface.co/black-forest-labs/FLUX.2-dev/discussions/35

I hit this problem trying to use 2x 48 GB cards with NVLink; the issue is often that the text encoder is a full LLM in its own right, and it's what blows out the VRAM.

I used the 60 GB flux2-dev.safetensors in ComfyUI with 96 GB of system RAM and a 5070 Ti (16 GB of VRAM). I took the standard "Flux.2 Dev Text to Image" template and replaced the 30 GB FP8 version (flux2_dev_fp8mixed.safetensors) with the full 60 GB file. It barely works, but it works: RAM and VRAM are almost full, and a 1280 x 720 image takes 2-4 minutes. But the precise, mostly first-try results are worth it. The 30 GB FP8 file is faster and uses much less RAM, and even that file delivers really impressive results.
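For context, some rough weights-only arithmetic (the parameter count here is my assumption, back-solved from the ~60 GB BF16 file; activations, the text encoder, and the VAE all come on top of this):

```python
# Back-of-the-envelope VRAM math: bytes per parameter times parameter count.
# A ~60 GB BF16 checkpoint (2 bytes/param) implies roughly 32B parameters,
# which is an assumption, not an official figure.
def weight_gib(params_billion: float, bytes_per_param: float) -> float:
    """GiB needed to hold the weights alone at the given precision."""
    return params_billion * 1e9 * bytes_per_param / 2**30


print(f"BF16: {weight_gib(32, 2):.1f} GiB")  # ~59.6 GiB, matching the file size
print(f"FP8:  {weight_gib(32, 1):.1f} GiB")  # ~29.8 GiB, matching the FP8 file
```

This is why the full BF16 model plus its text encoder can overrun even 96 GB without offloading, and why the FP8 file roughly halves the footprint.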
