Missing space (typo) and missing "Text Generation" tag!
README.md
CHANGED
```diff
@@ -2,6 +2,7 @@
 license: mit
 language:
 - en
+pipeline_tag: text-generation
 ---
 ## Ling-2.6-flash: Faster Responses, Stronger Execution, Higher Token Efficiency
 ### Introduction
@@ -16,7 +17,7 @@ At a high level, Ling-2.6-flash is built around three core strengths:
 + **Hybrid linear architecture for higher inference efficiency.**
 By introducing a hybrid linear architecture, we improve computational efficiency at the foundation level. On a 4× H20 setup, Ling-2.6-flash reaches inference speeds of up to **340 tokens/s**. In other words, it completes tasks with significantly better cost-performance efficiency.
 + **Token-efficiency optimization for a better intelligence-efficiency tradeoff.**
-During training, we specifically optimized for token efficiency, with the goal of accomplishing tasks using more concise outputs. On the full **Artificial Analysis** evaluation suite, Ling-2.6-flash uses only **15M tokens**while still delivering competitive performance. This translates into a meaningfully stronger intelligence-efficiency profile.
+During training, we specifically optimized for token efficiency, with the goal of accomplishing tasks using more concise outputs. On the full **Artificial Analysis** evaluation suite, Ling-2.6-flash uses only **15M tokens** while still delivering competitive performance. This translates into a meaningfully stronger intelligence-efficiency profile.
 + **Targeted improvements for agent scenarios.**
 For the agent use cases seeing the strongest demand today, we continuously refined Ling-2.6-flash in tool use, multi-step planning, and task execution. As a result, the model achieves performance that is competitive with, and in some cases reaches **SOTA level** against, models with larger active parameter counts on benchmarks including **BFCL-V4, TAU2-bench, SWE-bench Verified, Claw-Eval, and PinchBench**.
 
@@ -177,4 +178,4 @@ Ling-2.6-flash has already made meaningful progress in our pursuit of an extreme
 
 At the same time, we are fully aware that pushing intelligence efficiency to the limit comes with tradeoffs. In some highly complex scenarios, the model can still exhibit **tool hallucinations** due to limited reasoning depth. In addition, there is still room for improvement in areas such as **natural bilingual switching between Chinese and English** and **compliance with highly complex instructions**.
 
-Looking ahead, we will continue exploring the frontier of intelligence efficiency. While preserving the model’s high-efficiency inference characteristics, we aim to further improve the balance between **output quality** and **token efficiency**, and to continuously strengthen the model’s **stability, usability, and interaction experience across a wider range of real-world scenarios**.
+Looking ahead, we will continue exploring the frontier of intelligence efficiency. While preserving the model’s high-efficiency inference characteristics, we aim to further improve the balance between **output quality** and **token efficiency**, and to continuously strengthen the model’s **stability, usability, and interaction experience across a wider range of real-world scenarios**.
```
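With this commit applied, the README's YAML frontmatter resolves to the block below (reconstructed from the hunks in this diff; the `pipeline_tag` field is what makes the Hub tag the model as Text Generation):

```yaml
---
license: mit
language:
- en
pipeline_tag: text-generation
---
```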