Missing space (typo) and missing "Text Generation" tag!
README.md
CHANGED
```diff
@@ -2,6 +2,7 @@
 license: mit
 language:
 - en
+pipeline_tag: text-generation
 ---
 ## Ling-2.6-flash: Faster Responses, Stronger Execution, Higher Token Efficiency
 ### Introduction
@@ -16,7 +17,7 @@ At a high level, Ling-2.6-flash is built around three core strengths:
 + **Hybrid linear architecture for higher inference efficiency.**
 By introducing a hybrid linear architecture, we improve computational efficiency at the foundation level. On a 4× H20 setup, Ling-2.6-flash reaches inference speeds of up to **340 tokens/s**. In other words, it completes tasks with significantly better cost-performance efficiency.
 + **Token-efficiency optimization for a better intelligence-efficiency tradeoff.**
-During training, we specifically optimized for token efficiency, with the goal of accomplishing tasks using more concise outputs. On the full **Artificial Analysis** evaluation suite, Ling-2.6-flash uses only **15M tokens**while still delivering competitive performance. This translates into a meaningfully stronger intelligence-efficiency profile.
+During training, we specifically optimized for token efficiency, with the goal of accomplishing tasks using more concise outputs. On the full **Artificial Analysis** evaluation suite, Ling-2.6-flash uses only **15M tokens** while still delivering competitive performance. This translates into a meaningfully stronger intelligence-efficiency profile.
 + **Targeted improvements for agent scenarios.**
 For the agent use cases seeing the strongest demand today, we continuously refined Ling-2.6-flash in tool use, multi-step planning, and task execution. As a result, the model achieves performance that is competitive with, and in some cases reaches **SOTA level** against, models with larger active parameter counts on benchmarks including **BFCL-V4, TAU2-bench, SWE-bench Verified, Claw-Eval, and PinchBench**.
 
@@ -177,4 +178,4 @@ Ling-2.6-flash has already made meaningful progress in our pursuit of an extreme
 
 At the same time, we are fully aware that pushing intelligence efficiency to the limit comes with tradeoffs. In some highly complex scenarios, the model can still exhibit **tool hallucinations** due to limited reasoning depth. In addition, there is still room for improvement in areas such as **natural bilingual switching between Chinese and English** and **compliance with highly complex instructions**.
 
-Looking ahead, we will continue exploring the frontier of intelligence efficiency. While preserving the model’s high-efficiency inference characteristics, we aim to further improve the balance between **output quality** and **token efficiency**, and to continuously strengthen the model’s **stability, usability, and interaction experience across a wider range of real-world scenarios**.
+Looking ahead, we will continue exploring the frontier of intelligence efficiency. While preserving the model’s high-efficiency inference characteristics, we aim to further improve the balance between **output quality** and **token efficiency**, and to continuously strengthen the model’s **stability, usability, and interaction experience across a wider range of real-world scenarios**.
```
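With this commit applied, the README's YAML frontmatter resolves to the block below (reconstructed from the hunks in this diff; the `pipeline_tag` field is what makes the Hub tag the model as Text Generation):

```yaml
---
license: mit
language:
- en
pipeline_tag: text-generation
---
```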