add python code to code blocks

#2
.eval_results/gpqa.yaml DELETED
@@ -1,7 +0,0 @@
- - dataset:
-     id: Idavidrein/gpqa
-     task_id: diamond
-     value: 76.3
-   source:
-     url: https://huggingface.co/arcee-ai/Trinity-Large-Thinking
-     name: Model Card

.eval_results/mmlu-pro.yaml DELETED
@@ -1,7 +0,0 @@
- - dataset:
-     id: TIGER-Lab/MMLU-Pro
-     task_id: mmlu_pro
-     value: 83.4
-   source:
-     url: https://huggingface.co/arcee-ai/Trinity-Large-Thinking
-     name: Model Card

.eval_results/swe-bench_verified.yaml DELETED
@@ -1,7 +0,0 @@
- - dataset:
-     id: SWE-bench/SWE-bench_Verified
-     task_id: swe_bench_%_resolved
-     value: 63.2
-   source:
-     url: https://huggingface.co/arcee-ai/Trinity-Large-Thinking
-     name: Model Card

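All three deleted files shared one schema: a top-level list whose entry holds a `dataset` block (`id`, `task_id`, `value`) and a `source` block (`url`, `name`). As a reading aid, here is a minimal sketch of that record as a Python structure; the `EvalResult` class name and the flattened field names are my own, only the values come from the gpqa file above:

```python
from dataclasses import dataclass

@dataclass
class EvalResult:
    """One entry from a .eval_results YAML file (schema per the deleted files)."""
    dataset_id: str    # dataset.id, e.g. "Idavidrein/gpqa"
    task_id: str       # dataset.task_id, e.g. "diamond"
    value: float       # reported score
    source_url: str    # source.url
    source_name: str   # source.name

# The gpqa.yaml entry, transcribed from the deletion above
gpqa = EvalResult(
    dataset_id="Idavidrein/gpqa",
    task_id="diamond",
    value=76.3,
    source_url="https://huggingface.co/arcee-ai/Trinity-Large-Thinking",
    source_name="Model Card",
)
```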
.gitattributes CHANGED
@@ -34,5 +34,3 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
  tokenizer.json filter=lfs diff=lfs merge=lfs -text
- All[[:space:]]charts.jpg filter=lfs diff=lfs merge=lfs -text
- All_charts.jpg filter=lfs diff=lfs merge=lfs -text

All_charts.jpg DELETED

Git LFS Details

  • SHA256: 7780a4bc991ece46293e7ba4f5209f992efcc7052c7cd949e1676cb970d2007a
  • Pointer size: 131 Bytes
  • Size of remote file: 140 kB
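The repo itself held only a Git LFS pointer for this image; the object is addressed by the SHA256 oid above. A generic sketch (standard library only, not part of the diff) of checking a downloaded LFS object against a recorded oid:

```python
import hashlib

def lfs_oid(path: str) -> str:
    """Compute the sha256 hex digest Git LFS uses as an object id."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large objects don't load fully into memory
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# For the deleted image, a downloaded copy should satisfy:
# lfs_oid("All_charts.jpg") == "7780a4bc991ece46293e7ba4f5209f992efcc7052c7cd949e1676cb970d2007a"
```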
README.md CHANGED
@@ -84,7 +84,6 @@ Trinity-Large-Thinking shares the same sparse MoE architecture as Trinity-Large-
  | Architecture | Sparse MoE (AfmoeForCausalLM) |

  ## Benchmarks
- ![Benchmark charts](https://huggingface.co/arcee-ai/Trinity-Large-Thinking/resolve/main/All_charts.jpg)

  | Benchmark | Trinity-Large-Thinking | Opus-4.6 | GLM-5 | MiniMax-M2.7 | Kimi-K2.5 |
  |---|---:|---:|---:|---:|---:|
@@ -107,37 +106,29 @@ Trinity-Large-Thinking produces reasoning traces inside `<think>...</think>` blo
  This means:

  1. **Multi-turn conversations**: When building chat applications, include the full assistant response (thinking + answer) in the conversation history for subsequent turns.
- 2. **Agentic loops**: When using Trinity-Large-Thinking as the backbone of an agent (OpenClaw, Hermes Agent, or custom), ensure your tool-calling loop preserves reasoning in the message history between steps.
+ 2. **Agentic loops**: When using Trinity-Large-Thinking as the backbone of an agent (OpenClaw, Hermes Agent, or custom), ensure your tool-calling loop preserves `<think>` blocks in the message history between steps.
  3. **Context window management**: The 512k extended context window accommodates long reasoning chains across many agentic steps. If you must truncate history, prefer removing older turns entirely rather than stripping thinking tokens from recent turns.

  ### How thinking works

- The model reasons internally before producing its response. When served via vLLM, the reasoning is separated into a dedicated field in the API response:
-
- ```json
- // API response structure
- {
-   "message": {
-     "role": "assistant",
-     "reasoning": "The user wants flight information. I need to determine the date for next Tuesday, search for flights SFO → JFK, and filter by price < $300.",
-     "content": "\n",
-     "tool_calls": [{
-       "function": {
-         "name": "search_flights",
-         "arguments": "{\"origin\": \"SFO\", \"destination\": \"JFK\", \"date\": \"2026-04-07\", \"max_price\": 300}"
-       }
-     }]
-   }
- }
- ```
-
- ### Preserving reasoning in multi-turn conversations
-
- When building multi-turn agentic loops, you **must** pass the reasoning field back on assistant messages in subsequent requests. The chat template reads this field and re-wraps it in `<think>...</think>` tags during tokenization, maintaining the model's chain-of-thought across turns.
-
- **⚠️ Field name compatibility**: In vLLM OpenAI-compatible chat APIs, input compatibility for `reasoning_content` can vary by version, and some versions only honor `reasoning` ([related issue](https://github.com/vllm-project/vllm/issues/38488)). For maximum compatibility in multi-turn loops, send assistant reasoning back as `reasoning`. If your SDK exposes `reasoning_content` in responses, map it to `reasoning` when appending assistant turns.
-
- **What happens if reasoning is omitted entirely?** If the assistant message has no reasoning field at all (neither `reasoning` nor `reasoning_content`), or if `content` is `null`, the model can lose prior chain-of-thought context. On simple tasks this may work fine, but on complex multi-step agentic tasks, the model can produce malformed tool calls (e.g., tool call XML appearing inside the reasoning field instead of as structured `tool_calls`). For best results, always preserve the reasoning field and use `""` instead of `null` for content on tool-call turns.
+ The model reasons internally before producing its response. When served via vLLM, the reasoning is separated into a dedicated `reasoning_content` field in the API response:
+
+ // API response structure
+ {
+   "message": {
+     "role": "assistant",
+     "reasoning_content": "The user wants flight information. I need to determine the date for next Tuesday, search for flights SFO → JFK, and filter by price < $300.",
+     "content": "\n",
+     "tool_calls": [{
+       "function": {
+         "name": "search_flights",
+         "arguments": "{\"origin\": \"SFO\", \"destination\": \"JFK\", \"date\": \"2026-04-07\", \"max_price\": 300}"
+       }
+     }]
+   }
+ }
+
+ When building multi-turn agentic loops, include the `reasoning_content` back in the conversation history (re-wrapped in `<think>...</think>` tags within the assistant message) so the model retains its prior reasoning chain.

  ## Training Configuration

@@ -169,20 +160,18 @@ When building multi-turn agentic loops, you **must** pass the reasoning field ba

  Supported in vLLM 0.11.1+. For agentic use with both reasoning and tool calling:

- ```bash
- vllm serve arcee-ai/Trinity-Large-Thinking \
-   --dtype bfloat16 \
-   --reasoning-parser deepseek_r1 \
-   --enable-auto-tool-choice \
-   --tool-call-parser qwen3_coder
- ```
+ vllm serve arcee-ai/Trinity-Large-Thinking \
+   --dtype bfloat16 \
+   --enable-reasoning \
+   --reasoning-parser deepseek_r1 \
+   --enable-auto-tool-choice \
+   --tool-call-parser qwen3_coder

  This configuration:
- - `--reasoning-parser deepseek_r1` — Parses `<think>...</think>` reasoning blocks and exposes them via the `reasoning` field in the API response
+ - `--reasoning-parser deepseek_r1` — Parses `<think>...</think>` reasoning blocks and exposes them via the `reasoning_content` field in the API response
  - `--tool-call-parser qwen3_coder` — Parses structured tool calls from the model output into the OpenAI-compatible `tool_calls` array

-
- #### Single-turn example
+ **Extracting reasoning content from the API response:**

  ```python
  from openai import OpenAI
@@ -194,18 +183,22 @@ response = client.chat.completions.create(
      messages=[
          {"role": "user", "content": "What's the weather like in Paris?"}
      ],
-     tools=[{
-         "type": "function",
-         "function": {
-             "name": "get_weather",
-             "description": "Get current weather for a location",
-             "parameters": {
-                 "type": "object",
-                 "properties": {"location": {"type": "string"}},
-                 "required": ["location"]
-             }
-         }
-     }],
+     tools=[  # your tool definitions here
+         {
+             "type": "function",
+             "function": {
+                 "name": "get_weather",
+                 "description": "Get current weather for a location",
+                 "parameters": {
+                     "type": "object",
+                     "properties": {
+                         "location": {"type": "string"}
+                     },
+                     "required": ["location"]
+                 }
+             }
+         }
+     ],
  )

  # Access reasoning (thinking) content
@@ -216,87 +209,7 @@ content = response.choices[0].message.content
  tool_calls = response.choices[0].message.tool_calls
  ```

- #### Multi-turn agentic loop example
-
- The key pattern: after each turn, append the **full** assistant response (including reasoning) back to the message history, then append tool results, and send the updated history for the next turn.
-
- ```python
- import json
- from openai import OpenAI
-
- client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")
- MODEL = "arcee-ai/Trinity-Large-Thinking"
-
- tools = [
-     {"type": "function", "function": {
-         "name": "get_customer_by_email",
-         "description": "Look up a customer by email.",
-         "parameters": {"type": "object", "properties": {"email": {"type": "string"}}, "required": ["email"]}
-     }},
-     {"type": "function", "function": {
-         "name": "cancel_subscription",
-         "description": "Cancel a subscription. Requires customer_id.",
-         "parameters": {"type": "object", "properties": {"customer_id": {"type": "string"}, "reason": {"type": "string"}}, "required": ["customer_id"]}
-     }}
- ]
-
- def execute_tool(name, arguments):
-     """Simulate tool execution — replace with real implementations."""
-     args = json.loads(arguments)
-     if name == "get_customer_by_email":
-         return json.dumps({"customer_id": "C2001", "name": "Jane Doe", "plan": "Premium", "status": "active"})
-     elif name == "cancel_subscription":
-         return json.dumps({"success": True, "message": f"Subscription cancelled for {args['customer_id']}"})
-
- messages = [
-     {"role": "system", "content": "You are a helpful customer service agent."},
-     {"role": "user", "content": "I want to cancel my subscription. My email is jane@example.com"}
- ]
-
- # Agent loop
- while True:
-     response = client.chat.completions.create(
-         model=MODEL, messages=messages, tools=tools,
-         tool_choice="auto", temperature=0, max_tokens=1000
-     )
-     msg = response.choices[0].message
-
-     # Build assistant message — PRESERVE the reasoning field
-     assistant_msg = {"role": "assistant", "content": msg.content}
-     if msg.reasoning_content:
-         assistant_msg["reasoning"] = msg.reasoning_content  # ← critical for multi-turn
-     if msg.tool_calls:
-         assistant_msg["tool_calls"] = [
-             {"id": tc.id, "type": "function", "function": {"name": tc.function.name, "arguments": tc.function.arguments}}
-             for tc in msg.tool_calls
-         ]
-     messages.append(assistant_msg)
-
-     # If no tool calls, model gave its final response — done
-     if not msg.tool_calls:
-         print(f"Final response: {msg.content}")
-         break
-
-     # Execute tool calls and append results
-     for tc in msg.tool_calls:
-         result = execute_tool(tc.function.name, tc.function.arguments)
-         print(f"  Tool: {tc.function.name}({tc.function.arguments}) → {result}")
-         messages.append({"role": "tool", "tool_call_id": tc.id, "content": result})
- ```
-
- Expected output:
- ```
- Tool: get_customer_by_email({"email": "jane@example.com"}) → {"customer_id": "C2001", ...}
- Tool: cancel_subscription({"customer_id": "C2001", ...}) → {"success": true, ...}
- Final response: Your subscription has been cancelled successfully.
- ```
-
- The critical line is:
- ```python
- assistant_msg["reasoning"] = msg.reasoning_content  # ← pass reasoning back as "reasoning"
- ```
-
- The OpenAI SDK exposes the field as `reasoning_content` on the response object, but vLLM 0.18+ expects `reasoning` on input messages. The chat template then re-wraps it in `<think>...</think>` tags automatically.
+ **Note on thinking-in-context with vLLM**: When building multi-turn agentic loops, include both `reasoning_content` and `content` in the conversation history you send back to the model. The reasoning content should be re-wrapped in `<think>...</think>` tags within the assistant message.

  ### Transformers

@@ -340,32 +253,20 @@ print(response)

  ### API

- #### OpenRouter
-
- Available on [OpenRouter](https://openrouter.ai/) with full reasoning and tool calling support:
-
- ```bash
- curl -X POST "https://openrouter.ai/v1/chat/completions" \
-   -H "Authorization: Bearer $OPENROUTER_API_KEY" \
-   -H "Content-Type: application/json" \
-   -d '{
-     "model": "arcee-ai/trinity-large-thinking",
-     "messages": [
-       {
-         "role": "user",
-         "content": "What are some fun things to do in New York?"
-       }
-     ]
-   }'
- ```
-
- **Multi-turn with OpenRouter**: OpenRouter returns reasoning in a `reasoning_details` object (their unified reasoning shape). For multi-turn conversations, pass `reasoning_details` back as-is on assistant messages in subsequent requests — OpenRouter handles model-specific upstream translation (for Trinity, this is sent as `reasoning_content` on assistant turns upstream). For debugging, enable echo to inspect the upstream API call:
-
- ```json
- {"debug": {"echo_upstream_body": true}}
- ```
-
- See [OpenRouter debugging docs](https://openrouter.ai/docs/api/reference/errors-and-debugging#debugging) for details.
+ Available on OpenRouter:
+
+ curl -X POST "https://openrouter.ai/v1/chat/completions" \
+   -H "Authorization: Bearer $OPENROUTER_API_KEY" \
+   -H "Content-Type: application/json" \
+   -d '{
+     "model": "arcee-ai/trinity-large-thinking",
+     "messages": [
+       {
+         "role": "user",
+         "content": "What are some fun things to do in New York?"
+       }
+     ]
+   }'

  ## Agentic Use Cases

@@ -375,8 +276,6 @@ Trinity-Large-Thinking is optimized for deployment as the reasoning backbone of

  Trinity-Large-Thinking works as a drop-in brain for OpenClaw agents. Its native tool-calling format is compatible with OpenClaw's execution loop, and the extended reasoning enables reliable multi-step task completion — from email triage to code generation to meeting scheduling. Our 91.9% PinchBench score reflects real-world OpenClaw task performance.

- **Deploying for OpenClaw users**: OpenClaw preserves full assistant turns across steps. For vLLM compatibility in public deployments, ensure the assistant reasoning is forwarded on the next turn as `reasoning` (not only `reasoning_content`) and keep assistant `content` non-null (empty string is fine). If your SDK emits `reasoning_content`, add a small adapter at your gateway to map it to `reasoning` before sending requests to vLLM.
-
  ### Hermes Agent

  Compatible with the Hermes Agent framework from Nous Research. Trinity-Large-Thinking's reasoning traces pair naturally with Hermes's skill-learning loop — the model's explicit chain-of-thought makes skill extraction more reliable, and its strong tool-calling capabilities integrate directly via the Hermes tool-use protocol.
@@ -386,14 +285,12 @@ Compatible with the Hermes Agent framework from Nous Research. Trinity-Large-Thi
  For custom implementations, the key integration pattern is:

  1. Send the user message with tool definitions
- 2. Receive the response with `reasoning` + `content` + `tool_calls`
+ 2. Receive the response with `<think>` reasoning + tool calls
  3. Execute the tool calls
- 4. Append the **full** assistant response (reasoning + content + tool calls) and tool results to the message history
+ 4. Append the **full** assistant response (thinking + content + tool calls) and tool results to the message history
  5. Send the updated history back for the next step
  6. Repeat until the model produces a final response without tool calls

- > **Important**: Step 4 must include the `reasoning` field on the assistant message. The chat template reads this field and re-wraps it in `<think>...</think>` tags during tokenization. Omitting it degrades multi-step performance — see [Preserving reasoning in multi-turn conversations](#preserving-reasoning-in-multi-turn-conversations) for details.
-
  ## License

  Trinity-Large-Thinking is released under the Apache License, Version 2.0.
@@ -402,15 +299,13 @@ Trinity-Large-Thinking is released under the Apache License, Version 2.0.

  If you use this model, please cite:

- ```bibtex
- @misc{singh2026arceetrinity,
-   title = {Arcee Trinity Large Technical Report},
-   author = {Varun Singh and Lucas Krauss and Sami Jaghouar and Matej Sirovatka and Charles Goddard and Fares Obied and Jack Min Ong and Jannik Straube and Fern and Aria Harley and Conner Stewart and Colin Kealty and Maziyar Panahi and Simon Kirsten and Anushka Deshpande and Anneketh Vij and Arthur Bresnu and Pranav Veldurthi and Raghav Ravishankar and Hardik Bishnoi and DatologyAI Team and Arcee AI Team and Prime Intellect Team and Mark McQuade and Johannes Hagemann and Lucas Atkins},
-   year = {2026},
-   eprint = {2602.17004},
-   archivePrefix = {arXiv},
-   primaryClass = {cs.LG},
-   doi = {10.48550/arXiv.2602.17004},
-   url = {https://arxiv.org/abs/2602.17004}
- }
- ```
+ @misc{singh2026arceetrinity,
+   title = {Arcee Trinity Large Technical Report},
+   author = {Varun Singh and Lucas Krauss and Sami Jaghouar and Matej Sirovatka and Charles Goddard and Fares Obied and Jack Min Ong and Jannik Straube and Fern and Aria Harley and Conner Stewart and Colin Kealty and Maziyar Panahi and Simon Kirsten and Anushka Deshpande and Anneketh Vij and Arthur Bresnu and Pranav Veldurthi and Raghav Ravishankar and Hardik Bishnoi and DatologyAI Team and Arcee AI Team and Prime Intellect Team and Mark McQuade and Johannes Hagemann and Lucas Atkins},
+   year = {2026},
+   eprint = {2602.17004},
+   archivePrefix = {arXiv},
+   primaryClass = {cs.LG},
+   doi = {10.48550/arXiv.2602.17004},
+   url = {https://arxiv.org/abs/2602.17004}
+ }
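The multi-turn guidance removed and rewritten in this diff boils down to one pattern: echo the model's reasoning back on the assistant turn. A minimal sketch of building that history entry from an OpenAI-SDK-style response message; the helper name `to_history_message` is my own, and the field handling (accepting either `reasoning_content` or `reasoning`, and using `""` rather than null content on tool-call turns) follows the README text in the diff above:

```python
def to_history_message(msg) -> dict:
    """Convert a chat-completions response message into a history entry
    that preserves reasoning for the next turn. SDKs expose the field as
    `reasoning_content`; some vLLM versions accept it back as `reasoning`."""
    # Empty string, not None/null, on tool-call turns
    entry = {"role": "assistant", "content": msg.content or ""}
    reasoning = getattr(msg, "reasoning_content", None) or getattr(msg, "reasoning", None)
    if reasoning:
        entry["reasoning"] = reasoning
    if getattr(msg, "tool_calls", None):
        entry["tool_calls"] = [
            {"id": tc.id, "type": "function",
             "function": {"name": tc.function.name, "arguments": tc.function.arguments}}
            for tc in msg.tool_calls
        ]
    return entry

# Example with a stand-in for the SDK's response message object:
from types import SimpleNamespace

msg = SimpleNamespace(
    content=None,
    reasoning_content="Need to look up the customer first.",
    tool_calls=[SimpleNamespace(
        id="call_1",
        function=SimpleNamespace(name="get_customer_by_email",
                                 arguments='{"email": "jane@example.com"}'),
    )],
)
entry = to_history_message(msg)
```

Checking both field names on the way in makes the adapter tolerant of the vLLM-version drift the diff's compatibility note describes.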