Recommendations for running on Strix Halo.
Since this is such a large model, do you have any recommendations for arguments to use when running it on Strix Halo?
The main thing is to leave your UMA on default and increase the maximum TTM allocation (https://strixhalo.wiki/AI/AI_Capabilities_Overview#memory-limits). In my case I went with 112GiB // 4KiB == 29360128 pages, as well as disabling the IOMMU.
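The pages value is just the desired byte budget divided by TTM's 4 KiB page size; a quick sanity check of the arithmetic (the 112 GiB figure is from the post, the variable names are mine):

```python
# ttm.pages_limit is counted in 4 KiB pages, so a byte budget
# converts to pages by dividing by the page size.
GIB = 1024 ** 3
PAGE_SIZE = 4 * 1024  # 4 KiB

budget_gib = 112
pages = budget_gib * GIB // PAGE_SIZE
print(pages)  # 29360128
```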
Then in llama.cpp you don't need much else besides a reasonable fit target:
```ini
[qwen35-122b-instruct]
hf-repo = Beinsezii/Qwen3.5-122B-A10B-GGUF-HALO
fit-target = 12288
cache-ram = 4096
reasoning-budget = 0
no-context-shift = true
ubatch-size = 1024
batch-size = 1024
direct-io = true
```
mmap behaves badly with UMA, so use direct-io instead. Depending on the application, it might also be worth increasing the checkpoint count.
```yaml
services:
  qwen35-122b:
    image: ghcr.io/ggml-org/llama.cpp:server-vulkan
    container_name: qwen35-122b
    ports:
      - "8081:8080"
    devices:
      - /dev/dri:/dev/dri  # For Vulkan/iGPU (Strix Halo)
      - /dev/kfd:/dev/kfd  # For ROCm/Compute
    volumes:
      - ./models/Beinsezii/Qwen3.5-122B-A10B-GGUF-HALO/qwen35-122b-a10b-q80-q6k_ffn.gguf:/model.gguf:ro
      - ./models/Beinsezii/Qwen3.5-122B-A10B-GGUF-HALO/mmproj-F16.gguf:/mmproj.gguf:ro
    environment:
      LLAMA_ARG_MODEL: /model.gguf
      LLAMA_ARG_MMPROJ: /mmproj.gguf
      LLAMA_MODEL_ALIAS: "qwen35-122b"
      LLAMA_ARG_CTX_SIZE: "262144"
      LLAMA_ARG_N_GPU_LAYERS: "99"
      LLAMA_ARG_FLASH_ATTN: "1"
      LLAMA_ARG_THREADS: "7"
      LLAMA_ARG_N_PARALLEL: "1"
      LLAMA_ARG_BATCH_SIZE: "2048"
      LLAMA_ARG_UBATCH_SIZE: "1024"
      LLAMA_ARG_PORT: "8080"
      LLAMA_ARG_HOST: "0.0.0.0"
      LLAMA_ARG_API: "1"
      LLAMA_ARG_ENDPOINT_METRICS: "1"
```

And the GRUB settings:

```
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash iommu=pt amdgpu.gttsize=126976 ttm.pages_limit=32505856"
```
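Note that the two kernel parameters use different units (amdgpu.gttsize is in MiB, ttm.pages_limit is in 4 KiB pages); a quick sanity check, with my own variable names, that both values above describe the same 124 GiB budget:

```python
# amdgpu.gttsize is specified in MiB; ttm.pages_limit in 4 KiB pages.
MIB = 1024 ** 2
GIB = 1024 ** 3
PAGE_SIZE = 4 * 1024

gttsize_mib = 126976
pages_limit = 32505856

print(gttsize_mib * MIB // GIB)        # 124 (GiB)
print(pages_limit * PAGE_SIZE // GIB)  # 124 (GiB)
```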
That's my docker-compose.yml and GRUB settings for my Strix Halo. Your build of this model is superior to all the others I tried, including Unsloth's Q6_K; I get better agentic performance and results with this build on both ROCm and Vulkan.