Deploying with sglang, weight name not matching
#5
by Kyoma001 - opened
sglang version: 0.5.10.post1
python -m sglang.launch_server \
--model-path Qwen/Qwen3.6-27B-FP8 \
--host 0.0.0.0 \
--port 5000 \
--mem-fraction-static 0.75 \
--reasoning-parser qwen3 \
--tool-call-parser qwen3_coder \
--served-model-name Qwen3.6-27B-FP8 \
--max-running-requests 4 \
--speculative-algo NEXTN \
--speculative-num-steps 3 \
--speculative-eagle-topk 1 \
--speculative-num-draft-tokens 4 \
--mamba-scheduler-strategy extra_buffer
[2026-04-22 11:36:45] Found local HF snapshot for Qwen/Qwen3.6-27B-FP8 at /home/yiding/.cache/huggingface/hub/models--Qwen--Qwen3.6-27B-FP8/snapshots/ec4160bf26124fa57e6451d070ee0c459a36d5b7; skipping download.
Multi-thread loading shards: 0% Completed | 0/66 [00:00<?, ?it/s]
[2026-04-22 11:36:46] Parameter model.layers.45.mlp.gate_gate_up_proj.weight_scale_inv not found in params_dict
[2026-04-22 11:36:46] Parameter model.layers.45.mlp.gate_up_proj.weight_scale_inv not found in params_dict
Multi-thread loading shards: 2% Completed | 1/66 [00:01<01:07, 1.04s/it]
[2026-04-22 11:36:47] Parameter model.layers.4.mlp.gate_gate_up_proj.weight_scale_inv not found in params_dict
[2026-04-22 11:36:47] Parameter model.layers.4.mlp.gate_up_proj.weight_scale_inv not found in params_dict
Multi-thread loading shards: 5% Completed | 3/66 [00:02<00:41, 1.52it/s]
[2026-04-22 11:36:48] Parameter model.layers.33.mlp.gate_gate_up_proj.weight_scale_inv not found in params_dict
...
The server starts, but the responses are abnormal: the model keeps repeating the word "ability".
Has anyone deployed this model successfully with sglang?
Apply this commit: https://github.com/sgl-project/sglang/commit/4323fce82a091fab154bf36baa5820659ec0fd16
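For anyone unsure how to pick up that commit, one possible way is to install sglang from a source checkout at that commit. This is only a rough sketch, not an official procedure: the function name `apply_sglang_fix` and the default checkout directory are made up for illustration, and it assumes the usual from-source install of the repository's `python` directory. Installing from source may also change other dependency versions, so a fresh virtual environment is safer.

```shell
# Commit linked in this thread.
SGLANG_FIX_SHA="4323fce82a091fab154bf36baa5820659ec0fd16"

# Hypothetical helper (name and default directory are assumptions):
# check out sglang at the linked commit and install it in editable mode.
apply_sglang_fix() {
  repo_dir="${1:-sglang}"

  # Clone only if no checkout exists at the target directory yet.
  if [ ! -d "$repo_dir/.git" ]; then
    git clone https://github.com/sgl-project/sglang.git "$repo_dir" || return 1
  fi

  (
    cd "$repo_dir" || exit 1
    # Fetch the exact commit in case it is not on an already-fetched branch.
    git fetch origin "$SGLANG_FIX_SHA" || exit 1
    # Detached HEAD at the commit that contains the fix.
    git checkout "$SGLANG_FIX_SHA" || exit 1
  ) || return 1

  # Assumed from-source install of the Python package directory.
  pip install -e "$repo_dir/python"
}

# Usage (uncomment to run; needs network access):
# apply_sglang_fix "$HOME/src/sglang"
```

After installing, rerun the same `python -m sglang.launch_server` command and check that the "not found in params_dict" warnings no longer appear.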
That works! Thanks!
Kyoma001 changed discussion status to closed
Kyoma001 changed discussion status to open