Image-Text-to-Text
Transformers
Safetensors
qwen3_5
text-generation-inference
unsloth
reasoning
chain-of-thought
lora
sft
agent
tool-use
function-calling
coder
conversational
Jackrong commited on
Commit
796fc1b
Β·
verified Β·
1 Parent(s): 2d724be

Create README.md

Browse files
Files changed (1) hide show
  1. README.md +347 -0
README.md ADDED
@@ -0,0 +1,347 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ base_model:
3
+ - Jackrong/Qwopus3.5-9B-v3.5
4
+ tags:
5
+ - text-generation-inference
6
+ - transformers
7
+ - unsloth
8
+ - qwen3_5
9
+ - reasoning
10
+ - chain-of-thought
11
+ - lora
12
+ - sft
13
+ - agent
14
+ - tool-use
15
+ - function-calling
16
+ - coder
17
+ license: apache-2.0
18
+ language:
19
+ - en
20
+ - zh
21
+ - es
22
+ - ru
23
+ - ja
24
+ pipeline_tag: image-text-to-text
25
+ datasets:
26
+ - lambda/hermes-agent-reasoning-traces
27
+ ---
28
+
29
+ # 🌟 Qwopus3.5-9B-coder
30
+
31
+ ## πŸ’‘ Base Model Overview
32
+
33
+ As the base model of this model, **Qwopus3.5-9B-v3.5** is already a model with powerful capabilities. On this foundation, **Qwopus3.5-9B-coder** is specially optimized and fine-tuned for high-performance Agentic Coding, complex Tool Calling, and deep logical reasoning.
34
+
35
+
36
+ ![image](https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/8qFQVuCxbgkWqKa2B_Vph.jpeg)
37
+
38
+
39
+
40
+ ---
41
+
42
+ ## πŸš€ Model Fine-Tuning and Logical Alignment (Qwopus3.5-9B-coder)
43
+
44
+ πŸͺ**Qwopus3.5-9B-coder** is a programming agent reasoning-enhanced model that is specifically fine-tuned on the basis of **Qwopus3.5-9B-v3.5**.
45
+
46
+ ### πŸ›  Training Strategy
47
+
48
+ The fine-tuning process of this model deeply integrates **Trace Inversion** data augmentation technology with high-quality **Agent Traces**. This systematic approach not only strengthens the model's ability to solve complex programming tasks, but also greatly improves its logical coherence and accuracy when using various tools.
49
+
50
+ This model is designed specifically for the following goals:
51
+
52
+ - 🧩 More structured and stronger logical reasoning capabilities, reducing repetitive thinking
53
+ - πŸ’» More powerful capabilities in code writing, debugging, and repository-level task processing
54
+ - πŸ›  More stable and accurate Tool Calling capabilities for terminal commands, file operations, and browsers
55
+ - πŸ” Better cross-data source distillation alignment
56
+
57
+ > [!WARNING]
58
+ > **Community Release Notice**: Qwopus3.5-9B-coder is released purely as an experimental community version, aiming to explore the combination of Agent capabilities and deep reasoning, and is only for research and exploration use.
59
+ > **Warning**: Because this model is vertically fine-tuned for programming agents and deep reasoning, and has not undergone comprehensive general performance evaluation, its capabilities in general domains or specific non-programming tasks may suffer from Capability Decay. Users are advised to be aware of its limitations in other scenarios while exploring its core capabilities.
60
+
61
+ ---
62
+
63
+
64
+ ## πŸ“Š Baseline Performance Comparison
65
+
66
+ To verify the execution efficiency and logical robustness of **Qwopus3.5-9B-coder** in actual agent scenarios, we adopted the open-source testing framework [benchlocal](https://github.com/stevibe/benchlocal).
67
+
68
+ ### Test Configuration
69
+ - **Hardware Environment**: Apple Silicon (Mac)
70
+ - **Inference Backend**: LM Studio / MLX / GGUF
71
+ - **Testing Platform**: [benchlocal](https://github.com/stevibe/benchlocal) - An evaluation suite focusing on local model agent capabilities.
72
+
73
+ ### πŸ§ͺ Benchmark Results
74
+
75
+ <div style="display: inline-block; padding: 6px 16px; background: #e0f2fe; color: #0369a1; border: 1px solid #bae6fd; border-radius: 8px; font-weight: 700; font-size: 16px; margin-bottom: 12px;">1. Complex Agent Performance - HermesAgent-20</div>
76
+ The following is the comparative performance under the HermesAgent-20 (containing 20 complex agent scenarios) task set:
77
+
78
+ <table style="width: 100%; border-collapse: collapse; font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif;">
79
+ <thead>
80
+ <tr>
81
+ <td colspan="4" style="padding: 8px 12px; font-weight: 600; color: #7c3aed; border-bottom: 1px solid rgba(124, 58, 237, 0.2); background: rgba(124, 58, 237, 0.05);">HermesAgent-20 Performance Metrics</td>
82
+ </tr>
83
+ <tr style="background: rgba(128, 128, 128, 0.02);">
84
+ <th style="padding: 7px 7px; padding-left: 20px; text-align: left; border-bottom: 1px solid rgba(128, 128, 128, 0.15); font-size: 13px; color: #666;">Model</th>
85
+ <th style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15); font-size: 13px; color: #666;">Test Set</th>
86
+ <th style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15); font-size: 13px; color: #666;">Comprehensive Score</th>
87
+ <th style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15); font-size: 13px; color: #666;">Core Dimensions (M/O/S/S/B)</th>
88
+ </tr>
89
+ </thead>
90
+ <tbody>
91
+ <tr>
92
+ <td style="padding: 7px 7px; padding-left: 20px; border-bottom: 1px solid rgba(128, 128, 128, 0.15);"><b><a href="https://huggingface.co/Jackrong/Qwopus3.5-9B-coder-GGUF" style="color: #7c3aed; text-decoration: none;">Qwopus3.5-9B-coder</a></b></td>
93
+ <td style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15);">HermesAgent-20</td>
94
+ <td style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15); color: #7c3aed; font-weight: bold;">85</td>
95
+ <td style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15);">84 / 93 / 88 / 75 / 84</td>
96
+ </tr>
97
+ <tr>
98
+ <td style="padding: 7px 7px; padding-left: 20px; border-bottom: 1px solid rgba(128, 128, 128, 0.15);"><a href="https://huggingface.co/Qwen/Qwen3.5-9B" style="color: #666; text-decoration: none;">Qwen/Qwen3.5-9B</a></td>
99
+ <td style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15);">HermesAgent-20</td>
100
+ <td style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15);">71</td>
101
+ <td style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15);">75 / 58 / 100 / 53 / 69</td>
102
+ </tr>
103
+ <tr>
104
+ <td style="padding: 7px 7px; padding-left: 20px; border-bottom: 1px solid rgba(128, 128, 128, 0.15);"><a href="https://huggingface.co/armand0e/Qwen3.5-9B-Agent" style="color: #666; text-decoration: none;">armand0e/Qwen3.5-9B-Agent</a></td>
105
+ <td style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15);">HermesAgent-20</td>
106
+ <td style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15);">68</td>
107
+ <td style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15);">71 / 83 / 43 / 61 / 80</td>
108
+ </tr>
109
+ <tr>
110
+ <td style="padding: 7px 7px; padding-left: 20px; border-bottom: 1px solid rgba(128, 128, 128, 0.15);"><a href="https://huggingface.co/DJLougen/Harmonic-Hermes-9B" style="color: #666; text-decoration: none;">DJLougen/Harmonic-Hermes-9B</a></td>
111
+ <td style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15);">HermesAgent-20</td>
112
+ <td style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15);">47</td>
113
+ <td style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15);">60 / 45 / 23 / 69 / 38</td>
114
+ </tr>
115
+ </tbody>
116
+ </table>
117
+
118
+ <div style="display: inline-block; padding: 6px 16px; background: #e0f2fe; color: #0369a1; border: 1px solid #bae6fd; border-radius: 8px; font-weight: 700; font-size: 16px; margin-bottom: 12px;">2. Tool Call Stability - ToolCall-15</div>
119
+ This is a ToolCall-15 test set targeting the stability of tool calls, aiming to test the stability of the model in tool calling:
120
+
121
+ <table style="width: 100%; border-collapse: collapse; font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif;">
122
+ <thead>
123
+ <tr>
124
+ <td colspan="4" style="padding: 8px 12px; font-weight: 600; color: #7c3aed; border-bottom: 1px solid rgba(124, 58, 237, 0.2); background: rgba(124, 58, 237, 0.05);">ToolCall-15 Stability Metrics</td>
125
+ </tr>
126
+ <tr style="background: rgba(128, 128, 128, 0.02);">
127
+ <th style="padding: 7px 7px; padding-left: 20px; text-align: left; border-bottom: 1px solid rgba(128, 128, 128, 0.15); font-size: 13px; color: #666;">Model</th>
128
+ <th style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15); font-size: 13px; color: #666;">Test Set</th>
129
+ <th style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15); font-size: 13px; color: #666;">Comprehensive Score</th>
130
+ <th style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15); font-size: 13px; color: #666;">Dimension Scores (A/B/C/D/E)</th>
131
+ </tr>
132
+ </thead>
133
+ <tbody>
134
+ <tr>
135
+ <td style="padding: 7px 7px; padding-left: 20px; border-bottom: 1px solid rgba(128, 128, 128, 0.15);"><b><a href="https://huggingface.co/Jackrong/Qwopus3.5-9B-coder-GGUF" style="color: #7c3aed; text-decoration: none;">Qwopus3.5-9B-coder</a></b></td>
136
+ <td style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15);">ToolCall-15</td>
137
+ <td style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15); color: #7c3aed; font-weight: bold;">100</td>
138
+ <td style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15);">100 / 100 / 100 / 100 / 100</td>
139
+ </tr>
140
+ <tr>
141
+ <td style="padding: 7px 7px; padding-left: 20px; border-bottom: 1px solid rgba(128, 128, 128, 0.15);"><a href="https://huggingface.co/Qwen/Qwen3.5-9B" style="color: #666; text-decoration: none;">Qwen/Qwen3.5-9B</a></td>
142
+ <td style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15);">ToolCall-15</td>
143
+ <td style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15); color: #7c3aed; font-weight: bold;">100</td>
144
+ <td style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15);">100 / 100 / 100 / 100 / 100</td>
145
+ </tr>
146
+ <tr>
147
+ <td style="padding: 7px 7px; padding-left: 20px; border-bottom: 1px solid rgba(128, 128, 128, 0.15);"><a href="https://huggingface.co/armand0e/Qwen3.5-9B-Agent" style="color: #666; text-decoration: none;">armand0e/Qwen3.5-9B-Agent</a></td>
148
+ <td style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15);">ToolCall-15</td>
149
+ <td style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15);">93</td>
150
+ <td style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15);">100 / 100 / 100 / 67 / 100</td>
151
+ </tr>
152
+ </tbody>
153
+ </table>
154
+
155
+ <div style="display: inline-block; padding: 6px 16px; background: #e0f2fe; color: #0369a1; border: 1px solid #bae6fd; border-radius: 8px; font-weight: 700; font-size: 16px; margin-bottom: 12px;">3. Code Debugging & Bug Fixing - BugFind-15</div>
156
+ BugFind-15 is a test set containing 15 scenarios from shallow to deep, aiming to evaluate the real debugging capabilities of the model in discovering and fixing syntax, logical errors, and "trap" code in multiple programming languages through deterministic environment runtime verification.
157
+
158
+ <table style="width: 100%; border-collapse: collapse; font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif;">
159
+ <thead>
160
+ <tr>
161
+ <td colspan="4" style="padding: 8px 12px; font-weight: 600; color: #7c3aed; border-bottom: 1px solid rgba(124, 58, 237, 0.2); background: rgba(124, 58, 237, 0.05);">BugFind-15 Performance Metrics</td>
162
+ </tr>
163
+ <tr style="background: rgba(128, 128, 128, 0.02);">
164
+ <th style="padding: 7px 7px; padding-left: 20px; text-align: left; border-bottom: 1px solid rgba(128, 128, 128, 0.15); font-size: 13px; color: #666;">Model</th>
165
+ <th style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15); font-size: 13px; color: #666;">Test Set</th>
166
+ <th style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15); font-size: 13px; color: #666;">Comprehensive Score</th>
167
+ <th style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15); font-size: 13px; color: #666;">Dimension Scores (A/B/C/D/E)</th>
168
+ </tr>
169
+ </thead>
170
+ <tbody>
171
+ <tr>
172
+ <td style="padding: 7px 7px; padding-left: 20px; border-bottom: 1px solid rgba(128, 128, 128, 0.15);"><b><a href="https://huggingface.co/Jackrong/Qwopus3.5-9B-coder-GGUF" style="color: #7c3aed; text-decoration: none;">Qwopus3.5-9B-coder</a></b></td>
173
+ <td style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15);">BugFind-15</td>
174
+ <td style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15); color: #7c3aed; font-weight: bold;">79</td>
175
+ <td style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15);">67 / 87 / 100 / 77 / 43</td>
176
+ </tr>
177
+ <tr>
178
+ <td style="padding: 7px 7px; padding-left: 20px; border-bottom: 1px solid rgba(128, 128, 128, 0.15);"><a href="https://huggingface.co/Jackrong/MLX-Qwen3.5-9B-DeepSeek-V4-Flash-8bit" style="color: #666; text-decoration: none;">Jackrong/MLX-Qwen3.5-9B-DeepSeek-V4-Flash</a></td>
179
+ <td style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15);">BugFind-15</td>
180
+ <td style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15);">75</td>
181
+ <td style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15);">67 / 100 / 67 / 57 / 80</td>
182
+ </tr>
183
+ <tr>
184
+ <td style="padding: 7px 7px; padding-left: 20px; border-bottom: 1px solid rgba(128, 128, 128, 0.15);"><a href="https://huggingface.co/armand0e/Qwen3.5-9B-Agent" style="color: #666; text-decoration: none;">armand0e/Qwen3.5-9B-Agent</a></td>
185
+ <td style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15);">BugFind-15</td>
186
+ <td style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15);">58</td>
187
+ <td style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15);">29 / 87 / 73 / 20 / 67</td>
188
+ </tr>
189
+ </tbody>
190
+ </table>
191
+
192
+
193
+
194
+
195
+ > [!IMPORTANT]
196
+ > All tests were conducted with a temperature of 1 as officially recommended by qwen3.5. All errors and model issues were attempted to be regenerated twice after a test failure. If both attempts fail, it is considered a failure.
197
+ > Screenshots of all test interfaces are uploaded to the image folder of the repository.
198
+
199
+
200
+ ---
201
+
202
+ ### πŸ§ͺ Core Dataset Usage: Trace Inversion and High-Quality Agent Traces
203
+
204
+ In order to break through the "reasoning bubble" limitation of the model in actual programming and tool usage, and to endow it with real Agent behavioral capabilities, this model introduced core augmented datasets during training:
205
+
206
+ #### 1. Reasoning Synthetic Data Combining Trace Inversion
207
+ **Currently, based on public information, commercial models such as OpenAI's GPT series and Anthropic's Claude series have very clearly hidden the true internal reasoning chains of their models. For these models, what we can ultimately see in the API or front-end interface can often only be considered a highly compressed "Reasoning Bubble".**
208
+
209
+ To break through this limitation, we adopted the **Trace Inversion** technology. This technology utilizes an external "surrogate model" to reconstruct a complete and logically coherent deep reasoning chain based on the "question + final answer + compressed reasoning summary" published by commercial models. The "reasoning bubble", which originally consisted of only a few sentences and logical leaps, is expanded into a high-quality deep learning trace with complete derivation, calculation, and logical verification, providing step-by-step logical learning signals for the model.
210
+
211
+
212
+ ![a_high_resolution_infographic_slide_style_figure](https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/Jo2bm_rUJQmfK3Na4Uja2.png)
213
+
214
+
215
+
216
+
217
+ #### 2. GLM-5.1 Agent Real Trace Data: lambda/hermes-agent-reasoning-traces
218
+ To significantly enhance the model's execution and coding capabilities in real environments, this model additionally introduced the **`lambda/hermes-agent-reasoning-traces`** dataset.
219
+
220
+
221
+ ![Screenshot 2026-05-16 at 5.06.53β€―PM](https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/_U04B3HyUY403mQpW9Mz2.png)
222
+
223
+ - **Data Source and Scale**: This data subset contains approximately 10,000 high-quality multi-turn Tool Calling Trajectories generated based on the ZhipuAI GLM-5.1 and kimi-4.6 models.
224
+ - **Real Agent Behavior**: Unlike traditional synthetic data, these samples represent real Agent conversations. Each sample not only contains the step-by-step reasoning process in the `<think>` tags, but also includes actual tool execution results (rather than fabricated outputs out of thin air).
225
+ - **Extensive Domain Coverage**:
226
+ - **Terminal & Coding**: Script writing, code debugging, environment configuration, and data processing.
227
+ - **Repository Tasks**: Involving real code repository work, such as bug fixes, refactoring, and code review.
228
+ - **Browser Automation**: Web navigation, scraping, and form filling.
229
+ - **Agent Tools**: Memory persistence, task delegation, skill management, etc.
230
+
231
+ By learning these Agent trajectories that contain real feedback and thoughtful processes, Qwopus3.5-9B-coder can exhibit thinking and operational modes closer to human experts when facing complex programming and system operations tasks.
232
+
233
+ ---
234
+
235
+ ## πŸ—ΊοΈ Training Pipeline Overview
236
+
237
+ The training of this model integrates a phased learning pipeline of **Trace Inversion** data augmentation technology and **high-quality Agent Trajectories data**. Its core logic lies in restoring the highly compressed "reasoning bubble" of commercial models into a deep path for learning, and combining it with real agent operational traces to comprehensively improve the model's logical reasoning and code execution capabilities.
238
+
239
+ ```text
240
+ [ πŸ—ΊοΈ Trace Inversion: Full Process of Data Inversion and "Attack" Distillation ]
241
+
242
+ A. Surrogate Model Training
243
+ Open Source Model (GLM-5.1 / DS-V4) ──► Complete Reasoning Chain ──► [ Qwen3-235B Compression ] ──► Reasoning Bubbles
244
+ β”‚ β”‚
245
+ └──────────► [ Training ] β—„β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
246
+ (Base: Qwen3-4B-Instruct)
247
+ (Result: Trace-Inverter-4B)
248
+
249
+ B. Inversion Phase: "Attacking" Claude-4.7-Max
250
+ _______________________________________________________
251
+ | |
252
+ | Claude-4.7-Max API ──► Compressed Bubbles + Final Answer |
253
+ |_______________________________________________________|
254
+ β”‚
255
+ β–Ό
256
+ [ 🧠 Trace-Inverter-4B (Logical Reconstructor) ] ────► Synthetic CoT
257
+ β”‚
258
+ β–Ό
259
+ [ 🧩 Data Splicing ] ◄────────── (Original Prompt + Response)
260
+ (Embed the inverted chain of thought into <think> tags, and splice with the original Q&A pair for restoration)
261
+ β”‚
262
+ β–Ό
263
+ (Result: claude-opus-4.6/4.7 Inversion Set)
264
+
265
+ C. Final SFT Pipeline
266
+ ___________________________________________
267
+ | |
268
+ | Base Model (Qwopus3.5-9B-v3.5) |
269
+ |___________________________________________|
270
+ β”‚
271
+ β–Ό
272
+ [ πŸ“¦ Stage 1: Format Establishment and Logic Injection ] ───────► [ πŸ› οΈ Stage 2: Agent Trajectories and Programming Reinforcement ]
273
+ (Integrate inverted reasoning data, stabilize thinking format) (Introduce GLM-5.1 Agent Trajectories, reinforce interaction and execution)
274
+ β”‚ β”‚
275
+ β”‚ β–Ό
276
+ β”‚ __________________________________________________
277
+ β”‚ | πŸ” Hermes Agent Trace Sample Structure Breakdown (GLM-5.1) |
278
+ β”‚ | 1. [πŸ› οΈ System] -> JSON Tool Definition |
279
+ β”‚ | 2. [πŸ‘€ Human] -> Initial Task Instruction |
280
+ β”‚ | β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” |
281
+ β”‚ | β”‚ πŸ” Multi-turn Loop: β”‚ |
282
+ β”‚ | β”‚ 3. [🧠 GPT] -> <think> Logical Reasoning/Reflection β”‚ |
283
+ β”‚ | β”‚ 4. [πŸ€– GPT] -> Tool Call Execution Action β”‚ |
284
+ β”‚ | β”‚ 5. [βš™οΈ Tool] -> Real Feedback β”‚ |
285
+ β”‚ | β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ |
286
+ β”‚ |__________________________________________________|
287
+ β”‚ β”‚
288
+ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
289
+ β–Ό
290
+ ___________________________________
291
+ | |
292
+ | 🌟 Final Model: Qwopus3.5-9B-coder |
293
+ |___________________________________|
294
+ ```
295
+
296
+
297
+ > [!NOTE]
298
+ > Because agent trajectory datasets are complex and diverse. The datasets have undergone rigorous cleaning and formatting.
299
+
300
+ ---
301
+
302
+ ## 🀝 Collaboration & Training Details
303
+
304
+ This model is the result of continuous exploration in Agentic AI and reasoning capabilities.
305
+
306
+ **Training Infrastructure & Configuration:**
307
+ - πŸ–₯️ **Hardware:** Local compute devices / Cloud GPUs (e.g. GB10 / H100 / RTX 5090 / A100)
308
+ - βš™οΈ **Framework:** Unsloth for efficient fine-tuning
309
+
310
+ ---
311
+
312
+ ## ⚠️ IMPORTANT
313
+
314
+ > [!CAUTION]
315
+ > **Compatibility and Deployment Notice**
316
+ > - **Tool Calling Format**: When using this model for tool calling, please ensure that you use a Prompt format and System Prompt that match the training data to activate its Agent capabilities.
317
+ > - **Reasoning Output Extraction**: The model's thinking process is typically wrapped in `<think>` and `</think>` tags. When deploying to front-end applications, these tags may need to be parsed and hidden.
318
+
319
+ ---
320
+
321
+ ## πŸ“š Resources & Guides
322
+
323
+ πŸ‘‰ **[GitHub Repository: Jackrong-llm-finetuning-guide](https://github.com/R6410418/Jackrong-llm-finetuning-guide.git)**
324
+ Visit the repository to dive into our fine-tuning codebase and guides.
325
+
326
+ ---
327
+
328
+ ## πŸ™ Acknowledgements
329
+
330
+ Special thanks to:
331
+ - The Qwen team for providing a powerful foundation model.
332
+ - The open-source datasets provided by the community, especially **`lambda/hermes-agent-reasoning-traces`**, which has greatly helped the Agent capabilities of this model.
333
+ - The Unsloth team for their continuous maintenance of the efficient fine-tuning framework.
334
+ -
335
+
336
+ ---
337
+
338
+ ## πŸ“– Citation
339
+
340
+ ```bibtex
341
+ @misc{jackrong_qwopus35_9b_coder,
342
+ title = {Qwopus3.5-9B-coder},
343
+ author = {Jackrong},
344
+ year = {2026},
345
+ publisher = {Hugging Face}
346
+ }
347
+ ```