Model deployment failed in sglang
Has anyone successfully deployed this on an NVIDIA B200 GPU? I ran into a problem when trying to deploy it: SGLang failed to load the model weights.
The SGLang error message is as follows:
```
File "/sgl-workspace/sglang/python/sglang/srt/model_loader/loader.py", line 492, in _get_weights_iterator
    hf_folder, hf_weights_files, use_safetensors = self._prepare_weights(
                                                   ^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/model_loader/loader.py", line 477, in _prepare_weights
    raise RuntimeError(
RuntimeError: Cannot find any model weights with `/model`
[2026-04-29 04:13:15] Received sigquit from a child process. It usually means the child failed.
```
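Before digging into the engine itself, it may help to confirm which weight-file naming scheme is actually present under the mounted directory. A minimal sketch (the `classify_weight_files` helper is hypothetical, and `/model` is just the mount point from my compose file):

```python
import os
import re

# Standard HF shard names look like model-00001-of-00064.safetensors.
HF_SHARD = re.compile(r"^model-\d{5}-of-\d{5}\.safetensors$")

def classify_weight_files(folder):
    """Split the .safetensors files in `folder` into standard HF shards
    and everything else (e.g. training shards like model_pp*_shard*)."""
    hf_shards, other = [], []
    for name in sorted(os.listdir(folder)):
        if not name.endswith(".safetensors"):
            continue
        (hf_shards if HF_SHARD.match(name) else other).append(name)
    return hf_shards, other

# Example (inside the container or against the host path):
#   hf, other = classify_weight_files("/model")
#   print(len(hf), "standard HF shards;", len(other), "other safetensors files")
```

If `hf` comes back empty while `other` is full of `model_pp*` files, the loader has nothing matching the names it expects.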
The Docker Compose container orchestration configuration is as follows:
```yaml
version: "3.9"
services:
  mimo-sglang:
    image: lmsysorg/sglang:dev-cu13-mimo-v2.5
    container_name: mimo-25-sglang
    runtime: nvidia
    ipc: host
    privileged: true
    ports:
      - "30000:30000"
    volumes:
      - /data/MiMo-V2.5/MiMo:/model
    environment:
      - SGLANG_ENABLE_SPEC_V2=1
      - CUDA_VISIBLE_DEVICES=0,1,2,3
    command:
      - python3
      - -m
      - sglang.launch_server
      - --trust-remote-code
      - --model-path
      - /model
      - --tp
      - "4"
      - --attention-backend
      - torch_native
      - --mm-attention-backend
      - "sdpa"
      - --mem-fraction-static
      - "0.65"
      - --chunked-prefill-size
      - "16384"
      - --reasoning-parser
      - mimo
      - --tool-call-parser
      - mimo
      - --api-key
      - xxxxxxxxxxxxxxx
      - --host
      - "0.0.0.0"
      - --port
      - "30000"
      - --served-model-name
      - MiMo-V2.5
```
My suspicion is this: the weight files in the model directory released by Xiaomi do not match the filenames referenced by the `weight_map` in `model.safetensors.index.json`, so the SGLang inference engine cannot resolve any usable weight file from the index.
The `model.safetensors.index.json` file points to standard HF shards (`model-000xx-of-00064`), for example `model-00001-of-00064.safetensors`.
However, what was actually released are training-shard weight files (`model_pp*_shard*`), for example `model_pp0_ep*_shard*.safetensors`.
The weights released for GLM 5.1, Qwen, and DeepSeek are standard HF weight files of the `model-00001-of-00064.safetensors` form, and their `model.safetensors.index.json` files point to those standard shards.
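This hypothesis is easy to test mechanically: list every shard filename the index references and subtract what is actually on disk. A minimal sketch (`weight_map` is the standard key in HF index files; the `missing_shards` helper and the `/model` layout are assumptions from my setup):

```python
import json
import os

def missing_shards(model_dir):
    """Return shard filenames referenced by model.safetensors.index.json
    that are absent from model_dir. An empty set means the index is
    consistent with the files on disk."""
    index_path = os.path.join(model_dir, "model.safetensors.index.json")
    with open(index_path) as f:
        referenced = set(json.load(f)["weight_map"].values())
    return referenced - set(os.listdir(model_dir))

# Example:
#   gaps = missing_shards("/data/MiMo-V2.5/MiMo")
#   print(f"{len(gaps)} referenced shards are missing")
```

If every `model-000xx-of-00064.safetensors` entry shows up as missing while only `model_pp*` files exist, that would confirm the mismatch.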
The index JSON was updated half a day after the weights were uploaded. It still doesn't run for me, though...
I will update the model files (specifically `model.safetensors.index.json`) around noon today and try deploying again. What problem did you encounter, and what error message did you get?
