Model deployment failed in sglang
Has anyone successfully deployed this on an NVIDIA B200 GPU? I ran into a problem when trying to deploy it: SGLang failed to load the model weights.
The SGLang error message is as follows:
```
File "/sgl-workspace/sglang/python/sglang/srt/model_loader/loader.py", line 492, in _get_weights_iterator
    hf_folder, hf_weights_files, use_safetensors = self._prepare_weights(
                                                   ^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/model_loader/loader.py", line 477, in _prepare_weights
    raise RuntimeError(
RuntimeError: Cannot find any model weights with `/model`
[2026-04-29 04:13:15] Received sigquit from a child process. It usually means the child failed.
```
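Before digging into the engine itself, it may help to confirm which weight-file naming scheme is actually present under the mounted directory. A minimal sketch (the `classify_weight_files` helper is hypothetical, and `/model` is just the mount point from my compose file):

```python
import os
import re

# Standard HF shard names look like model-00001-of-00064.safetensors.
HF_SHARD = re.compile(r"^model-\d{5}-of-\d{5}\.safetensors$")

def classify_weight_files(folder):
    """Split the .safetensors files in `folder` into standard HF shards
    and everything else (e.g. training shards like model_pp*_shard*)."""
    hf_shards, other = [], []
    for name in sorted(os.listdir(folder)):
        if not name.endswith(".safetensors"):
            continue
        (hf_shards if HF_SHARD.match(name) else other).append(name)
    return hf_shards, other

# Example (inside the container or against the host path):
#   hf, other = classify_weight_files("/model")
#   print(len(hf), "standard HF shards;", len(other), "other safetensors files")
```

If `hf` comes back empty while `other` is full of `model_pp*` files, the loader has nothing matching the names it expects.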
The Docker Compose container orchestration configuration is as follows:
```yaml
version: "3.9"
services:
  mimo-sglang:
    image: lmsysorg/sglang:dev-cu13-mimo-v2.5
    container_name: mimo-25-sglang
    runtime: nvidia
    ipc: host
    privileged: true
    ports:
      - "30000:30000"
    volumes:
      - /data/MiMo-V2.5/MiMo:/model
    environment:
      - SGLANG_ENABLE_SPEC_V2=1
      - CUDA_VISIBLE_DEVICES=0,1,2,3
    command:
      - python3
      - -m
      - sglang.launch_server
      - --trust-remote-code
      - --model-path
      - /model
      - --tp
      - "4"
      - --attention-backend
      - torch_native
      - --mm-attention-backend
      - "sdpa"
      - --mem-fraction-static
      - "0.65"
      - --chunked-prefill-size
      - "16384"
      - --reasoning-parser
      - mimo
      - --tool-call-parser
      - mimo
      - --api-key
      - xxxxxxxxxxxxxxx
      - --host
      - "0.0.0.0"
      - --port
      - "30000"
      - --served-model-name
      - MiMo-V2.5
```
My suspicion is this: the weight files in the model directory released by Xiaomi do not match the filenames referenced by the `weight_map` in `model.safetensors.index.json`, so the SGLang inference engine cannot resolve any usable weight file from the index.
The `model.safetensors.index.json` file points to standard HF shards (`model-000xx-of-00064`), for example `model-00001-of-00064.safetensors`.
However, what was actually released are training-shard weight files (`model_pp*_shard*`), for example `model_pp0_ep*_shard*.safetensors`.
The weights released for GLM 5.1, Qwen, and DeepSeek are standard HF weight files of the `model-00001-of-00064.safetensors` form, and their `model.safetensors.index.json` files point to those standard shards.
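This hypothesis is easy to test mechanically: list every shard filename the index references and subtract what is actually on disk. A minimal sketch (`weight_map` is the standard key in HF index files; the `missing_shards` helper and the `/model` layout are assumptions from my setup):

```python
import json
import os

def missing_shards(model_dir):
    """Return shard filenames referenced by model.safetensors.index.json
    that are absent from model_dir. An empty set means the index is
    consistent with the files on disk."""
    index_path = os.path.join(model_dir, "model.safetensors.index.json")
    with open(index_path) as f:
        referenced = set(json.load(f)["weight_map"].values())
    return referenced - set(os.listdir(model_dir))

# Example:
#   gaps = missing_shards("/data/MiMo-V2.5/MiMo")
#   print(f"{len(gaps)} referenced shards are missing")
```

If every `model-000xx-of-00064.safetensors` entry shows up as missing while only `model_pp*` files exist, that would confirm the mismatch.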
The index JSON was updated half a day after the weights were uploaded. It still doesn't run for me, though...
I will update the model files (specifically `model.safetensors.index.json`) around noon today and try deploying again. What problem did you encounter, and what error message did you get?
