Image-Text-to-Text
Transformers
Safetensors
qwen3_5
text-generation-inference
unsloth
reasoning
chain-of-thought
lora
sft
agent
tool-use
function-calling
coder
conversational
Jackrong committed on
Commit
2f5d9a2
·
verified ·
1 Parent(s): e744108

Update README.md

Files changed (1)
  1. README.md +12 -2
README.md CHANGED
@@ -185,7 +185,6 @@ BugFind-15 is a test set containing 15 scenarios from shallow to deep, aiming to
185
 
186
 
187
 
188
-
189
  > [!IMPORTANT]
190
  > All tests were conducted at a temperature of 1, as officially recommended for Qwen3.5. After a failed test, whether caused by an error or a model issue, the response was regenerated up to two times; if both regenerations also failed, the test was counted as a failure.
191
  > Screenshots of all test interfaces are uploaded to the image folder of the repository.
@@ -211,8 +210,8 @@ To break through this limitation, we adopted the **Trace Inversion** technology.
211
  #### 2. GLM-5.1 Agent Real Trace Data: lambda/hermes-agent-reasoning-traces
212
  To strengthen the model's execution and coding capabilities in real environments, training additionally incorporates the **`lambda/hermes-agent-reasoning-traces`** dataset.
213
 
 
214
 
215
- ![Screenshot 2026-05-16 at 5.06.53 PM](https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/_U04B3HyUY403mQpW9Mz2.png)
216
 
217
  - **Data Source and Scale**: This data subset contains approximately 10,000 high-quality multi-turn tool-calling trajectories generated with the ZhipuAI GLM-5.1 and kimi-4.6 models.
218
  - **Real Agent Behavior**: Unlike traditional synthetic data, these samples are real agent conversations. Each sample contains the step-by-step reasoning process inside `<think>` tags as well as actual tool execution results, rather than fabricated outputs.
@@ -291,6 +290,17 @@ The training of this model integrates a phased learning pipeline of **Trace Inve
291
  > [!NOTE]
292
  > Because agent trajectory datasets are complex and diverse, they have undergone rigorous cleaning and formatting.
293
 
 
 
 
 
 
 
 
 
 
 
 
294
  ---
295
 
296
  ## 🤝 Collaboration & Training Details
 
185
 
186
 
187
 
 
188
  > [!IMPORTANT]
189
  > All tests were conducted at a temperature of 1, as officially recommended for Qwen3.5. After a failed test, whether caused by an error or a model issue, the response was regenerated up to two times; if both regenerations also failed, the test was counted as a failure.
190
  > Screenshots of all test interfaces are uploaded to the image folder of the repository.
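
The retry rule in the note above can be sketched as a small helper. This is only an illustration under one reading of the rule (one initial attempt plus at most two regenerations); `passes_with_regeneration`, `generate`, and `is_correct` are hypothetical names and are not taken from the repository's test harness.

```python
from typing import Callable


def passes_with_regeneration(
    generate: Callable[[], str],
    is_correct: Callable[[str], bool],
    max_regenerations: int = 2,
) -> bool:
    """Return True if the first attempt or any regeneration passes the check."""
    # One initial attempt plus up to `max_regenerations` retries.
    for _ in range(1 + max_regenerations):
        if is_correct(generate()):
            return True
    # Every attempt failed, so the scenario counts as a failure.
    return False


# Usage with a stand-in generator that succeeds on its third call.
attempts = iter(["wrong", "wrong", "fixed"])
result = passes_with_regeneration(lambda: next(attempts), lambda out: out == "fixed")
```

In a real run, `generate` would call the model at temperature 1 and `is_correct` would encode the pass criterion for one scenario.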
 
210
  #### 2. GLM-5.1 Agent Real Trace Data: lambda/hermes-agent-reasoning-traces
211
  To strengthen the model's execution and coding capabilities in real environments, training additionally incorporates the **`lambda/hermes-agent-reasoning-traces`** dataset.
212
 
213
+ ![Screenshot 2026-05-16 at 5.34.59 PM](https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/BTusWFqYaOS5GmRYvBuPq.png)
214
 
 
215
 
216
  - **Data Source and Scale**: This data subset contains approximately 10,000 high-quality multi-turn tool-calling trajectories generated with the ZhipuAI GLM-5.1 and kimi-4.6 models.
217
  - **Real Agent Behavior**: Unlike traditional synthetic data, these samples are real agent conversations. Each sample contains the step-by-step reasoning process inside `<think>` tags as well as actual tool execution results, rather than fabricated outputs.
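
To make the "reasoning inside `<think>` tags" point concrete, here is a minimal sketch that separates the reasoning from the visible part of one assistant message. It assumes the reasoning is closed by a matching `</think>` tag; the `AssistantTurn` class and `split_reasoning` function are hypothetical helpers, and the actual field layout of the dataset is not described in this README.

```python
import re
from dataclasses import dataclass

# Assumption: reasoning is wrapped as <think>...</think> within the message text.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


@dataclass
class AssistantTurn:
    reasoning: str  # text found inside the think tags
    visible: str    # everything outside the think tags


def split_reasoning(text: str) -> AssistantTurn:
    """Split one assistant message into its reasoning and its visible content."""
    match = THINK_RE.search(text)
    if match is None:
        return AssistantTurn(reasoning="", visible=text.strip())
    reasoning = match.group(1).strip()
    visible = (text[: match.start()] + text[match.end():]).strip()
    return AssistantTurn(reasoning=reasoning, visible=visible)


turn = split_reasoning("<think>List the directory first.</think>Calling the shell tool.")
```

Here `turn.reasoning` holds the step-by-step trace and `turn.visible` holds what the agent emitted after it.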
 
290
  > [!NOTE]
291
  > Because agent trajectory datasets are complex and diverse, they have undergone rigorous cleaning and formatting.
292
 
293
+ ## 🎯 Three-Stage Curriculum Learning
294
+
295
+ **Qwopus3.5-9B-coder** adopts a phased reasoning data mixture strategy similar to Curriculum Learning, gradually increasing the difficulty and complexity of training signals:
296
+
297
+ 1. **Early Stage (Format Establishment):** Focuses on short-to-medium-length reasoning samples with stable formats. The primary goal of this stage is to establish a reliable, structured new reasoning format without overwhelming the model with extreme complexity.
298
+
299
+ 2. **Middle Stage (Complexity Scaling & Multi-Teacher Distillation):** Gradually increases the proportion of complex reasoning samples from multiple teacher models.
300
+ - The distillation data is sourced from more powerful models whose style distribution closely matches that of the base model, so the capability gap stays narrow enough for efficient learning.
301
+
302
+ 3. **Late Stage (Long-Context Reinforcement & Drift Prevention):** Reinforces reasoning capabilities in long contexts. Crucially, this stage retains **short-sample replay** to ensure the model maintains its short-context instruction-following capability and minimizes capability drift.
303
+
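
The phased mixture described above can be sketched as a per-stage weighted sampler. This is an illustration only: the README gives no numeric proportions, so the weights in `STAGE_MIX`, the bucket names (`short`, `complex`, `long`), and the `sample_batch` helper are all hypothetical.

```python
import random

# Hypothetical relative weights per stage; the source states no actual numbers.
STAGE_MIX = {
    "early": {"short": 7, "complex": 3, "long": 0},
    "middle": {"short": 3, "complex": 6, "long": 1},
    "late": {"short": 2, "complex": 3, "long": 5},  # "short" acts as replay here
}


def sample_batch(pools, stage, batch_size, rng):
    """Draw a batch whose bucket proportions follow the given stage's mix."""
    weights = STAGE_MIX[stage]
    buckets = list(weights)
    # Pick a bucket for every slot in the batch, then one sample from that bucket.
    chosen = rng.choices(buckets, weights=[weights[b] for b in buckets], k=batch_size)
    return [rng.choice(pools[b]) for b in chosen]


# Placeholder pools standing in for real training samples.
pools = {
    "short": ["short-0", "short-1"],
    "complex": ["complex-0", "complex-1"],
    "long": ["long-0", "long-1"],
}
rng = random.Random(0)
early = sample_batch(pools, "early", 1000, rng)
late = sample_batch(pools, "late", 1000, rng)
```

With these assumed weights, the early stage never draws long-context samples, while the late stage still draws from the short bucket, which mirrors the short-sample replay idea.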
304
  ---
305
 
306
  ## 🤝 Collaboration & Training Details