Deploying with sglang: weight names not matching

#5
by Kyoma001 - opened

sglang version: 0.5.10.post1

python -m sglang.launch_server \
  --model-path Qwen/Qwen3.6-27B-FP8 \
  --host 0.0.0.0 \
  --port 5000 \
  --mem-fraction-static 0.75 \
  --reasoning-parser qwen3 \
  --tool-call-parser qwen3_coder \
  --served-model-name Qwen3.6-27B-FP8 \
  --max-running-requests 4 \
  --speculative-algo NEXTN \
  --speculative-num-steps 3 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 4 \
  --mamba-scheduler-strategy extra_buffer
[2026-04-22 11:36:45] Found local HF snapshot for Qwen/Qwen3.6-27B-FP8 at /home/yiding/.cache/huggingface/hub/models--Qwen--Qwen3.6-27B-FP8/snapshots/ec4160bf26124fa57e6451d070ee0c459a36d5b7; skipping download.
Multi-thread loading shards:   0% Completed | 0/66 [00:00<?, ?it/s]
[2026-04-22 11:36:46] Parameter model.layers.45.mlp.gate_gate_up_proj.weight_scale_inv not found in params_dict
[2026-04-22 11:36:46] Parameter model.layers.45.mlp.gate_up_proj.weight_scale_inv not found in params_dict
Multi-thread loading shards:   2% Completed | 1/66 [00:01<01:07,  1.04s/it]
[2026-04-22 11:36:47] Parameter model.layers.4.mlp.gate_gate_up_proj.weight_scale_inv not found in params_dict
[2026-04-22 11:36:47] Parameter model.layers.4.mlp.gate_up_proj.weight_scale_inv not found in params_dict
Multi-thread loading shards:   5% Completed | 3/66 [00:02<00:41,  1.52it/s]
[2026-04-22 11:36:48] Parameter model.layers.33.mlp.gate_gate_up_proj.weight_scale_inv not found in params_dict
...
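
To see which weight names the loader failed to match without reading the whole log, the warnings can be reduced to a de-duplicated list. This is a rough sketch only: it assumes the server output was saved to a file (the name `server.log` is hypothetical), and `missing_params` is a helper made up for illustration, not part of sglang.

```shell
# Sketch: list the distinct weight names the loader could not match.
# Assumption: the server output was captured to a file passed as $1
# (for example a hypothetical server.log).
missing_params() {
  # Extract each "Parameter <name> not found in params_dict" message,
  # keep only <name>, then sort and de-duplicate.
  grep -o 'Parameter [^ ]* not found in params_dict' "$1" \
    | awk '{print $2}' \
    | sort -u
}
```

One possible way to use it is to capture the launch output with `2>&1 | tee server.log` and then run `missing_params server.log`. In the log above, the names fall into two families, `gate_gate_up_proj.weight_scale_inv` and `gate_up_proj.weight_scale_inv`.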

The responses are also abnormal: the model keeps repeating the word "ability".

Has anyone deployed properly with sglang?

Apply this commit: https://github.com/sgl-project/sglang/commit/4323fce82a091fab154bf36baa5820659ec0fd16
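
For anyone unsure how to apply a single commit, one possible approach is to cherry-pick it onto a local source checkout. This is a minimal sketch under assumptions not stated in the thread: that sglang is installed from a git checkout rather than a prebuilt wheel, that the commit applies cleanly on top of the checked-out version, and that GitHub serves the commit when fetched by its full SHA. `apply_fix` is a hypothetical helper name.

```shell
# Sketch: cherry-pick the referenced fix onto a local sglang source checkout.
# Assumption: the first argument is the path to that checkout.
FIX_COMMIT="4323fce82a091fab154bf36baa5820659ec0fd16"

apply_fix() {
  local repo_dir="$1"
  # Refuse to run outside a git work tree.
  if ! git -C "$repo_dir" rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    echo "not a git checkout: $repo_dir" >&2
    return 1
  fi
  # Fetch the commit from the upstream repository, then apply it on top
  # of the current branch. A conflict here means the commit does not
  # apply cleanly to this version and needs manual resolution.
  git -C "$repo_dir" fetch https://github.com/sgl-project/sglang.git "$FIX_COMMIT" &&
    git -C "$repo_dir" cherry-pick "$FIX_COMMIT"
}
```

It would be called as `apply_fix /path/to/sglang`, where the path is a placeholder. If the package was installed in editable mode from that checkout, restarting the server should pick up the change; otherwise it would need to be reinstalled from the patched source.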

That works! Thanks!

Kyoma001 changed discussion status to closed
Kyoma001 changed discussion status to open