RichardBian FCL90 committed
Commit 948ecac · 1 Parent(s): 2a84598

Create README.md (#3)

Co-authored-by: FCL <FCL90@users.noreply.huggingface.co>

Files changed (1): README.md (+204 -0)

README.md ADDED
---
license: mit
pipeline_tag: text-generation
library_name: transformers
---
<p align="center">
<img src="https://mdn.alipayobjects.com/huamei_qa8qxu/afts/img/A*4QxcQrBlTiAAAAAAQXAAAAgAemJ7AQ/original" width="100"/>
</p>
<p align="center">🤗 <a href="https://huggingface.co/inclusionAI">Hugging Face</a>&nbsp;&nbsp; | &nbsp;&nbsp;🤖 <a href="https://modelscope.cn/organization/inclusionAI">ModelScope</a>&nbsp;&nbsp; | &nbsp;&nbsp;🐙 Experience Link Coming Soon~</p>
<!-- <a href="https://zenmux.ai/inclusionai/ling-1t?utm_source=hf_inclusionAI">Experience Now</a> -->

## Ling-2.6-1T: A Trillion-Parameter Flagship Model for Complex Tasks

Today, we are thrilled to open-source **Ling-2.6-1T** from the Ling family.

Tailored for real-world, complex scenarios, this trillion-parameter model introduces targeted optimizations across inference efficiency, token overhead, and agentic capabilities, making it highly effective for **coding and daily workflows**.

Key upgrades in **Ling-2.6-1T** include:

* **High Inference Efficiency:** By adopting a hybrid architecture combining **MLA and Linear Attention**, we dramatically reduce latency and VRAM footprint for long contexts. It delivers superior throughput and lower per-token computational cost without sacrificing expressivity, ensuring real-time responsiveness for complex reasoning and tool calling (a toy sketch of the linear-attention idea follows this list).
* **Lower Token Overhead via "Fast Thinking":** We introduce a *Contextual Process Redundancy Suppression* reward strategy during post-training. This reduces reliance on verbose chains of thought (CoT), using a "fast thinking" mechanism to reach answers directly and compress output costs while maintaining top-tier intelligence.
* **Reliable Multi-Step Execution:** With enhanced reasoning, agentic coding, and instruction following, Ling-2.6-1T achieves **open-source SOTA** on execution-heavy benchmarks, including AIME26, SWE-bench Verified, BFCL-V4, TAU2-Bench, and IFBench.
* **Production-Ready for Agent Workflows:** Designed for end-to-end engineering, from code generation to bug fixing, Ling-2.6-1T integrates seamlessly with mainstream agent frameworks like *Claude Code, OpenClaw, OpenCode, and CodeBuddy*, effortlessly handling multi-tool, multi-step constraints in enterprise environments.
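
To make the efficiency point concrete, here is a toy sketch of the generic linear-attention reordering. This illustrates the technique in general, not Ling's actual MLA/linear-attention hybrid; the feature map and shapes are illustrative assumptions:

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention materializes an (n, n) score matrix: O(n^2 * d).
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return (weights / weights.sum(axis=-1, keepdims=True)) @ V

def linear_attention(Q, K, V):
    # With a positive feature map phi, attention can be reordered as
    # phi(Q) @ (phi(K)^T V), which costs O(n * d^2): linear in length n.
    phi = lambda x: np.maximum(x, 0.0) + 1e-6
    Qf, Kf = phi(Q), phi(K)
    kv = Kf.T @ V                    # (d, d) summary, independent of n
    denom = Qf @ Kf.sum(axis=0)      # per-query normalizer, shape (n,)
    return (Qf @ kv) / denom[:, None]

n, d = 4096, 64                      # long sequence, small head dimension
Q, K, V = (np.random.randn(n, d) for _ in range(3))
print(linear_attention(Q, K, V).shape)  # (4096, 64); no (n, n) matrix built
```

The quadratic variant is shown only for contrast; on long contexts, the reordered form is what keeps per-token cost and memory flat.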

### **Unlocking Robust Intelligence with Superior Efficiency**

On [Artificial Analysis](https://artificialanalysis.ai/), **Ling-2.6-1T** achieved an **Intelligence Index of 34** with approximately 16M output tokens, representing a significant generational leap over the previous Ling-1T. This positioning underscores its ability to deliver high-tier intelligence with optimized token consumption.

<p align="center">
<img src="https://mdn.alipayobjects.com/huamei_fst7or/afts/img/48cCTY8XJgUAAAAAZvAAAAgADpRXAQJr/original" />
</p>

<p align="center">
<img src="https://mdn.alipayobjects.com/huamei_fst7or/afts/img/AmTNT5tQHDYAAAAAaSAAAAgADpRXAQJr/original" width="48%"/>
<img src="https://mdn.alipayobjects.com/huamei_fst7or/afts/img/Wv_8Toxbl7IAAAAAaRAAAAgADpRXAQJr/original" width="48%"/>
</p>

### **Enhancing Execution Stability for Complex Multi-Step Tasks**

Ling-2.6-1T demonstrates balanced excellence across reasoning, coding, and tool calling, achieving **open-source SOTA** status on multiple execution-heavy benchmarks:

* **Advanced Reasoning:** Significantly leads non-thinking models on *AIME26*, showcasing superior complex problem-solving capabilities.
* **First-Tier Agent Execution:** Ranks among the top models on *SWE-bench Verified, TAU2-Bench, Claw-Eval, BFCL-V4, and PinchBench*, proving high reliability in real-world workflows.
* **Context & Constraints:** Strong performance on *MRCR (16K–256K)* and *IFBench* ensures logical consistency and precision under complex instructions and long contexts.

<p align="center">
<img src="https://mdn.alipayobjects.com/huamei_fst7or/afts/img/Ykl9QZamkj0AAAAAgBAAAAgADpRXAQJr/original" />
</p>

## Model Downloads

You can download Ling-2.6-1T from the table below. If you are located in mainland China, we also provide the model on ModelScope.cn to speed up the download process.

<center>

| **Model** | **Context Length** | **Download** |
| :---------: | :----------------: | :-----------------------------------------------: |
| Ling-2.6-1T | 256K -> 1M (YaRN) | [🤗 HuggingFace]() &nbsp;&nbsp; [🤖 ModelScope]() |

</center>

Note: If you are interested in previous versions, please visit the past model collections on [Hugging Face](https://huggingface.co/inclusionAI) or [ModelScope](https://modelscope.cn/organization/inclusionAI).
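
For programmatic downloads, here is a minimal sketch using `huggingface_hub`, assuming the repo id `inclusionAI/Ling-2.6-1T` (matching the `--model-path` used below); the `local_dir` path is illustrative:

```python
from huggingface_hub import snapshot_download

# Fetch all weight shards and config files for local deployment.
# A trillion-parameter checkpoint is large; ensure the target disk has room.
snapshot_download(
    repo_id="inclusionAI/Ling-2.6-1T",  # assumed repo id
    local_dir="/models/Ling-2.6-1T",    # illustrative local path
)
```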

## Quickstart

### 🚀 Try Online

Coming Soon

### 🔌 API Usage

Ling-2.6-1T is available as a free endpoint on OpenRouter: https://openrouter.ai/inclusionai/ling-2.6-1t:free
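
Since OpenRouter exposes an OpenAI-compatible API, here is a minimal sketch using the official `openai` Python client; the `OPENROUTER_API_KEY` environment variable is an assumption (any variable holding your key works):

```python
import os
from openai import OpenAI

# OpenRouter speaks the OpenAI chat-completions protocol.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],  # assumed env var for your key
)

response = client.chat.completions.create(
    model="inclusionai/ling-2.6-1t:free",  # slug from the link above
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
```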

## Deployment

### SGLang

#### Environment Preparation

```shell
pip install uv

uv venv ~/my_ling_env

source ~/my_ling_env/bin/activate

# uv pip install "sglang-kernel>=0.4.1"
uv pip install "sglang[all]>=0.5.10.post1" --prerelease=allow
```

#### Run Inference

Here is an example of running Ling-2.6-1T on 8 GPUs, with the server listening on port ${PORT}:

**Server**

**1. Standard Inference (Without MTP)**
```bash
sglang serve \
  --model-path inclusionAI/Ling-2.6-1T \
  --tp-size 8 \
  --max-running-requests 32 \
  --mem-fraction-static 0.92 \
  --chunked-prefill-size 8192 \
  --context-length 262144 \
  --trust-remote-code \
  --model-loader-extra-config '{"enable_multithread_load":"true","num_threads":64}' \
  --tool-call-parser qwen25
```

**2. Inference with MTP (Multi-Token Prediction)**

_The current official SGLang implementation of MTP contains a bug. For better inference performance, we recommend installing our patched version. Our fix is under review and is expected to be merged into the official SGLang repository shortly._

**Install our patched SGLang**
```bash
git clone -b ling_2_6 git@github.com:antgroup/sglang.git
cd sglang

pip install --upgrade pip
pip install -e "python"
```

**Start the server**
```bash
sglang serve \
  --model-path inclusionAI/Ling-2.6-1T \
  --tp-size 8 \
  --max-running-requests 32 \
  --mem-fraction-static 0.92 \
  --chunked-prefill-size 8192 \
  --context-length 262144 \
  --trust-remote-code \
  --speculative-algorithm EAGLE \
  --speculative-num-steps 3 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 4 \
  --mamba-scheduler-strategy extra_buffer \
  --mamba-full-memory-ratio 1.4 \
  --model-loader-extra-config '{"enable_multithread_load":"true","num_threads":64}' \
  --tool-call-parser qwen25
```

**Client**

```bash
curl -s http://${MASTER_IP}:${PORT}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "auto", "messages": [{"role": "user", "content": "What is the capital of France?"}]}'
```
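
Because the server speaks the OpenAI protocol, you can also call it from Python. Here is a minimal sketch that exercises tool calling (enabled above via `--tool-call-parser qwen25`); the `get_weather` tool schema is hypothetical, and `MASTER_IP`/`PORT` are the same placeholders as in the curl example:

```python
import os
from openai import OpenAI

# Point the OpenAI client at the local SGLang server (same endpoint as above).
client = OpenAI(
    base_url=f"http://{os.environ['MASTER_IP']}:{os.environ['PORT']}/v1",
    api_key="EMPTY",  # a local server typically does not check the key
)

# A hypothetical tool definition to exercise the qwen25 tool-call parser.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)
```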

More usage examples can be found [here](https://docs.sglang.io/cookbook/autoregressive/InclusionAI/Ling-2.6#3-2-ling-2-6-1t).

### vLLM

#### Environment Preparation
```bash
pip install uv

uv venv ~/my_ling_env

source ~/my_ling_env/bin/activate

git clone https://github.com/vllm-project/vllm.git
cd vllm

VLLM_USE_PRECOMPILED=1 uv pip install --editable . --torch-backend=auto
```

#### Run Inference

**Server**
```bash
vllm serve $MODEL_PATH \
  --port $PORT \
  --served-model-name my_model \
  --trust-remote-code \
  --tensor-parallel-size 8 \
  --gpu-memory-utilization 0.85
```

**Client**

```bash
curl -s http://${MASTER_IP}:${PORT}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "my_model", "messages": [{"role": "user", "content": "What is the capital of France?"}]}'
```

Note that the `model` field must match the `--served-model-name` passed to `vllm serve` (here, `my_model`).
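
For interactive use, the same endpoint supports streaming. Here is a minimal sketch with the `openai` client; the host/port environment variables are illustrative, and `my_model` matches `--served-model-name` above:

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url=f"http://{os.environ['MASTER_IP']}:{os.environ['PORT']}/v1",
    api_key="EMPTY",  # vLLM does not require a key unless configured with one
)

# Stream tokens as they are generated instead of waiting for the full reply.
stream = client.chat.completions.create(
    model="my_model",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```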

## Limitations & Future Plans

While Ling-2.6-1T excels in reasoning and agentic efficiency, our future development will focus on:

* **Intelligence-Efficiency Balance:** Further optimizing token efficiency for knowledge-intensive tasks.
* **Long-Range Consistency:** Enhancing global consistency in long-term planning and complex information retrieval.
* **Dynamic Alignment:** Refining cross-lingual alignment to eliminate occasional language-switching errors under complex instructions.

We remain committed to pushing the boundaries of model performance to enhance delivery efficiency across all complex scenarios.

## License

This code repository is licensed under [the MIT License](https://github.com/inclusionAI/Ling-V2/blob/main/LICENSE).