GFX1151
Dear expert, do you have any method for getting GFX1151 to use all of the shared memory under Windows 11 with the ROCm backend? Currently I'm trying to use Vulkan to load models with weights exceeding 100G and get GPU and VRAM speeds, but ROCm fails! I wonder if you have any relevant experience to share. Thank you!
Hey man, definitely not an expert haha. For context I'm running a GMKtec EVO-X2 (Strix Halo, 128 GB) on Fedora with ROCm 7.2 in a toolbox container. The way I get gfx1151 to use all of system RAM is mostly Linux-specific, but at least one piece is hardware-level and worth trying on Windows too:
- BIOS `UMA Frame Buffer Size` set to the minimum, `1GB` on my GMKtec EVO-X2. This is firmware-level and applies on either OS. Counter-intuitive, but the trick is shrinking the BAR-mapped “dedicated VRAM” so almost everything goes through shared memory instead. Worth trying first since you can flip it independently.
- `amdgpu.no_system_mem_limit=1` kernel parameter, and `/sys/module/amdgpu/parameters/no_system_mem_limit` set to `Y` at runtime. This one is Linux-specific.
- `llama.cpp` built from source with `-DGGML_HIP_NO_VMM=ON`. `VMM=ON` crashes on `gfx1151`, see ROCm issue #6146. Build flag applies on either OS.
With those three together on my Linux box, a single hipMalloc of ~`120GB` succeeds and the model lives in system RAM addressable by the iGPU.
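If it helps anyone on Linux, here is a minimal sketch for confirming that the kernel parameter actually took effect before blaming the build. The sysfs path is the one from the list above; the function name `system_mem_limit_disabled` is just something I made up for illustration, and I'm assuming the file reads `Y` or `N` the way boolean module parameters normally do. It does nothing useful on Windows, where that path doesn't exist.

```python
from pathlib import Path

# Path quoted in the post above. It only exists on Linux with amdgpu loaded.
PARAM_PATH = "/sys/module/amdgpu/parameters/no_system_mem_limit"


def system_mem_limit_disabled(param_path=PARAM_PATH):
    """Report whether the amdgpu system memory limit is switched off.

    Returns True if the parameter reads as enabled, False if it reads as
    disabled, and None if the file cannot be read (for example on Windows
    or when the amdgpu module is not loaded).
    """
    try:
        value = Path(param_path).read_text().strip()
    except OSError:
        return None
    # Assumption: boolean module parameters are exposed as "Y" / "N".
    return value in ("Y", "y", "1")


if __name__ == "__main__":
    state = system_mem_limit_disabled()
    if state is None:
        print("parameter not readable (not Linux, or amdgpu not loaded?)")
    elif state:
        print("no_system_mem_limit is on")
    else:
        print("no_system_mem_limit is off; check the kernel command line")
```

The path is a function argument so the check can be pointed at a different file if your setup exposes the parameter somewhere else.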
Whether the BIOS and VMM-off bits alone are enough on Windows ROCm I genuinely don’t know - let me know if you try it!
When I get a gap I’ll have another go at the IQ quants - my last batch failed miserably.
I tried setting the minimum video memory to 0.5G on Windows 11, but it didn't work any miracles. It even cost Vulkan its ability to break through the 96G limit, and ROCm fared worse still, recognizing only about half of the video memory, which is frustrating! I also tried manual mode with the allocatable video memory maxed out at 124G, but there was still no breakthrough. Finally I decided to try setting it to 96G. Surprisingly, Vulkan loaded a 103G-weight model straight into video memory without an OOM or any other error message. This is a MiniMax-M2.7 model, and peak speed during the first round of dialog reaches 30 tokens/s. So I am sure that some of the unified memory allocated to system memory was mistakenly treated as video memory, otherwise it couldn't have reached that speed! If you have any new findings, please let me know. And of course, if I discover anything new under Windows, I'll be very happy to share it with you. Thank you again for your reply!