caizhi1 committed
Commit 48b05ee · verified · 1 Parent(s): ad45621

Update README.md

Files changed (1)
  1. README.md +3 -8
README.md CHANGED
@@ -26,17 +26,17 @@ We have conducted a comprehensive evaluation of Ling-2.6-flash across multiple a
 
 Beyond agent tasks, Ling-2.6-flash also delivers strong performance across **general knowledge**,**mathematical reasoning**, **instruction following**, and **long-context understanding**, remains well aligned with SOTA models in the same size class.
 <div align="center">
-<img src="https://mdn.alipayobjects.com/huamei_3p6pd0/afts/img/FOkPQZDdKtkAAAAAgCAAAAgADryCAQFr/original" width="8001" title="" crop="0,0,1,1" id="u4a7a4034" class="ne-image">
+<img src="https://mdn.alipayobjects.com/huamei_3p6pd0/afts/img/KhFxSrxyF5IAAAAAgCAAAAgADryCAQFr/original" width="8001" title="" crop="0,0,1,1" id="u4a7a4034" class="ne-image">
 </div>
 
 <div align="center">
-<img src="https://mdn.alipayobjects.com/huamei_3p6pd0/afts/img/_rp_TKkkG4wAAAAAgBAAAAgADryCAQFr/original" width="8001" title="" crop="0,0,1,1" id="uc95688f2" class="ne-image">
+<img src="https://mdn.alipayobjects.com/huamei_3p6pd0/afts/img/4bI1SK8pNM8AAAAAgBAAAAgADryCAQFr/original" width="8001" title="" crop="0,0,1,1" id="uc95688f2" class="ne-image">
 </div>
 
 > + **<font style="color:rgb(38, 38, 38);">PinchBench</font>**<font style="color:rgb(38, 38, 38);">: Comparative scores are retrieved directly from the official PinchBench leaderboard (as of April 20, 2026), adhering to their evaluation modes (potentially Reasoning Mode). </font>
 > + **<font style="color:rgb(38, 38, 38);">Claw-Eval</font>**<font style="color:rgb(38, 38, 38);">: Comparative scores are sourced from the official Claw-Eval leaderboard (version dated 2026-03-25), adhering to their evaluation modes (potentially Reasoning Mode). Official scores for GPT-OSS-120B and GPT-5.4-mini are currently unavailable and have been omitted.</font>
 > + **<font style="color:rgb(38, 38, 38);">TAU2-Bench</font>**<font style="color:rgb(38, 38, 38);">: Evaluations are conducted using official v1.0.0 code and datasets. Following the GLM-5 evaluation protocol, we applied minor prompt adjustments in the Retail and Telecom domains to ensure users express requests clearly and to prevent premature session termination. Additionally, GPT-5.2 was utilized as the User Agent across all evaluated domains.</font>
-> + **<font style="color:rgb(38, 38, 38);">IFBench</font>**<font style="color:rgb(38, 38, 38);">: Scores for GPT-OSS-120B (low) and GPT-5.4-mini (Non-Reasoning) are sourced from the AA (Artificial Analysis) Leaderboard. All other model performance data are based on internal evaluation results.</font>
+> + **<font style="color:rgb(38, 38, 38);">IFBench</font>**<font style="color:rgb(38, 38, 38);">: Scores for GPT-OSS-120B (low) and GPT-5.4-mini (Non-Reasoning) are sourced from the AA(Artificial Analysis) Leaderboard. All other model performance data are based on internal evaluation results.</font>
 >
 
 ### Architecture
@@ -79,9 +79,6 @@ Both BF16 and FP8 models are supported by SGLang now. It depends on the dtype of
 **Server**
 
 **1. Standard Inference (Without MTP)**
-
-For standard, auto-regressive generation, you can load and run the model using the default `transformers` pipeline.
-
 ```bash
 python -m sglang.launch_server \
 --model-path $MODEL_PATH \
@@ -95,10 +92,8 @@
 ```
 
 **2. Inference with MTP (Multi-Token Prediction)**
-To significantly accelerate text generation, this model supports Multi-Token Prediction (MTP). You can enable it by passing the relevant flags during model initialization or generation.
 
 ```bash
-# mtp
 python -m sglang.launch_server \
 --model-path $MODEL_PATH \
 --tp-size 4 \
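
Both launch commands are cut off by the diff context. For orientation, here is a minimal, hypothetical sketch of the standard (non-MTP) server launch; `MODEL_PATH`, `--host`, and `--port` below are illustrative assumptions, not values taken from the README.

```bash
# Hypothetical sketch of the standard (non-MTP) launch; the README's exact
# flags are truncated by the diff, so all values below are illustrative.
MODEL_PATH=/path/to/Ling-2.6-flash   # assumed local checkpoint path

python -m sglang.launch_server \
  --model-path $MODEL_PATH \
  --tp-size 4 \
  --host 0.0.0.0 \
  --port 30000 \
  --trust-remote-code
```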
 
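The MTP hunk likewise ends at `--tp-size 4`. SGLang drives MTP through its generic speculative-decoding options; the sketch below uses those CLI flags with placeholder values, since the model-specific settings are not visible in this commit.

```bash
# Hypothetical MTP launch; the speculative-decoding values are placeholders,
# not the README's actual settings (the diff truncates the command here).
python -m sglang.launch_server \
  --model-path $MODEL_PATH \
  --tp-size 4 \
  --speculative-algorithm NEXTN \
  --speculative-num-steps 3 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 4
```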
 
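Once either server variant is up, it exposes SGLang's OpenAI-compatible API. A quick smoke test, assuming the port 30000 used in the sketches above:

```bash
# Smoke test against the OpenAI-compatible endpoint; adjust the port if the
# server was launched with a different --port, and set "model" to the
# served model name (by default it is derived from the model path).
curl -s http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Ling-2.6-flash",
        "messages": [{"role": "user", "content": "Say hello."}],
        "max_tokens": 32
      }'
```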