Beyond agent tasks, Ling-2.6-flash also delivers strong performance across **general knowledge**, **mathematical reasoning**, **instruction following**, and **long-context understanding**, remaining well aligned with SOTA models in the same size class.
<div align="center">
<img src="https://mdn.alipayobjects.com/huamei_3p6pd0/afts/img/KhFxSrxyF5IAAAAAgCAAAAgADryCAQFr/original" width="800">
</div>
<div align="center">
<img src="https://mdn.alipayobjects.com/huamei_3p6pd0/afts/img/4bI1SK8pNM8AAAAAgBAAAAgADryCAQFr/original" width="800">
</div>
> + **PinchBench**: Comparative scores are retrieved directly from the official PinchBench leaderboard (as of April 20, 2026), adhering to their evaluation modes (potentially Reasoning Mode).
> + **Claw-Eval**: Comparative scores are sourced from the official Claw-Eval leaderboard (version dated 2026-03-25), adhering to their evaluation modes (potentially Reasoning Mode). Official scores for GPT-OSS-120B and GPT-5.4-mini are currently unavailable and have been omitted.
> + **TAU2-Bench**: Evaluations are conducted using the official v1.0.0 code and datasets. Following the GLM-5 evaluation protocol, we applied minor prompt adjustments in the Retail and Telecom domains to ensure users express requests clearly and to prevent premature session termination. Additionally, GPT-5.2 was used as the User Agent across all evaluated domains.
> + **IFBench**: Scores for GPT-OSS-120B (low) and GPT-5.4-mini (Non-Reasoning) are sourced from the AA (Artificial Analysis) leaderboard. All other model performance data are based on internal evaluation results.
>
### Architecture

Both BF16 and FP8 models are supported by SGLang now, depending on the dtype of the model.

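Which precision you get therefore follows from the checkpoint you download, not from a server flag. As a minimal sketch for inspecting a local checkpoint: Hugging Face-style checkpoints typically record their precision in `config.json` via a `torch_dtype` field, with FP8 releases additionally carrying a `quantization_config` block. These field names are the usual HF convention, assumed here rather than taken from this README:

```python
import json
import tempfile
from pathlib import Path

def checkpoint_dtype(model_path: str) -> str:
    """Report the serialized precision of an HF-style checkpoint.

    Reads config.json and returns 'fp8' if a quantization_config block
    is present, otherwise the declared torch_dtype (e.g. 'bfloat16').
    """
    cfg = json.loads((Path(model_path) / "config.json").read_text())
    if "quantization_config" in cfg:  # FP8 checkpoints carry this block
        return "fp8"
    return cfg.get("torch_dtype", "unknown")

# Tiny self-contained demo with a synthetic config.json:
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "config.json").write_text(json.dumps({"torch_dtype": "bfloat16"}))
    print(checkpoint_dtype(d))  # -> bfloat16
```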
**Server**

**1. Standard Inference (Without MTP)**

For standard, auto-regressive generation, you can launch the server with its default decoding settings.

```bash
python -m sglang.launch_server \
    --model-path $MODEL_PATH \
```
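Once the server is up, SGLang exposes an OpenAI-compatible `/v1/chat/completions` endpoint (by default on port 30000). The sketch below assembles a request payload and shows how it would be POSTed; the host, port, and model name are illustrative assumptions, and only the payload construction runs offline:

```python
import json
import urllib.request

def build_chat_request(prompt: str, model: str = "Ling-2.6-flash",
                       temperature: float = 0.7, max_tokens: int = 512) -> dict:
    """Assemble an OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }

def query_server(prompt: str, base_url: str = "http://localhost:30000") -> str:
    """POST the payload to a running server and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Inspect the request that would be sent (no server needed for this part):
print(json.dumps(build_chat_request("Hello!"), indent=2))
```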

**2. Inference with MTP (Multi-Token Prediction)**

To significantly accelerate text generation, this model supports Multi-Token Prediction (MTP). You can enable it by passing the relevant flags when launching the server.

```bash
# Launch the server with MTP enabled
python -m sglang.launch_server \
    --model-path $MODEL_PATH \
    --tp-size 4 \
```
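Conceptually, MTP accelerates decoding through speculative execution: a cheap draft head proposes several future tokens, the main model verifies them in one pass, and the longest agreeing prefix is accepted, so the output is identical to plain decoding but takes fewer sequential steps. The toy sketch below illustrates only that accept-prefix rule with stand-in "models"; it is not the model's actual implementation:

```python
def target_next(seq):
    # Stand-in "main model": a deterministic next-token rule.
    return (seq[-1] * 3 + 1) % 7

def draft_next(seq):
    # Stand-in "MTP head": cheap and usually (not always) right.
    guess = target_next(seq)
    return (guess + 1) % 7 if len(seq) % 5 == 0 else guess

def plain_decode(prompt, n_tokens):
    """Ordinary one-token-at-a-time decoding (the reference output)."""
    seq = list(prompt)
    for _ in range(n_tokens):
        seq.append(target_next(seq))
    return seq[len(prompt):]

def speculative_decode(prompt, n_tokens, k=4):
    """Draft k tokens per step, verify them, keep the agreed prefix."""
    seq = list(prompt)
    while len(seq) - len(prompt) < n_tokens:
        # 1. The draft head proposes k tokens autoregressively.
        draft, ctx = [], list(seq)
        for _ in range(k):
            draft.append(draft_next(ctx))
            ctx.append(draft[-1])
        # 2. The main model scores every draft position (one "batched" pass).
        verified, ctx = [], list(seq)
        for d in draft:
            verified.append(target_next(ctx))
            ctx.append(d)
        # 3. Accept the agreed prefix; at the first mismatch, keep the
        #    verifier's token instead, so output matches plain decoding.
        n = 0
        while n < k and draft[n] == verified[n]:
            n += 1
        seq += draft[:n]
        if n < k:
            seq.append(verified[n])
    return seq[len(prompt):len(prompt) + n_tokens]

# Speculative decoding is lossless: same tokens, fewer sequential steps.
assert speculative_decode([1], 12) == plain_decode([1], 12)
```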