---
base_model: Qwen/Qwen3.5-9B
tags:
- text-generation-inference
- transformers
- unsloth
- qwen3_5
license: apache-2.0
language:
- en
datasets:
- TeichAI/claude-4.5-opus-high-reasoning-250x
- armand0e/badlogicgames-pi-mono-opus-filtered
---
# Qwen3.5 9B - Opus Agent
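This is a LoRA fine-tune of Qwen/Qwen3.5-9B on two small datasets of Opus traces: chat data from `TeichAI/claude-4.5-opus-high-reasoning-250x` and agent traces from `armand0e/badlogicgames-pi-mono-opus-filtered`. Reasoning was left untouched: reasoning tokens were masked out of the training loss (`train_on_reasoning=False` in the training script).

Total training time: 4 hours.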
## Benchmarks
### General benchmarks
![Benchmark Comparison](https://cdn-uploads.huggingface.co/production/uploads/66bcb202eb4f43ee8aa6bbfb/fhMQE5H-AsxdEEQlYmdvr.png)
Benchmarks provided by [@nightmedia](https://huggingface.co/nightmedia), as always thanks for taking the time :)
```
Model                            arc    arc/e  boolq
armand0e/Qwen3.5-9B-Opus-Agent   0.589  0.747  0.901
Jackrong/Qwopus3.5-9B-Coder      0.561  0.721  0.89
Qwen3.5-9B                       0.571  0.719  0.895
```
### Targeted benchmarks
Conducted via [BenchLocal](https://github.com/stevibe/BenchLocal). All benchmarks are 2-shot (1 retry on failure) for ease of comparison with the numbers reported for [Jackrong's Qwopus3.5 Coder](https://huggingface.co/Jackrong/Qwopus3.5-9B-Coder-GGUF).

Benchmarks for the other models were run at Q8_0; only this model's benchmarks were run at Q4_K_M.
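
The 2-shot protocol can be read as: each test case gets one attempt, and a failed case gets exactly one more. The sketch below illustrates that reading only. It is not BenchLocal's implementation, and `passes_with_retry` and the `attempt` callback are hypothetical names.

```python
from typing import Callable


def passes_with_retry(attempt: Callable[[int], bool], max_attempts: int = 2) -> bool:
    """Return True if any of up to `max_attempts` tries succeeds.

    `attempt` is a hypothetical callback that runs one test case and
    reports pass/fail; it receives the zero-based try index.
    """
    for try_index in range(max_attempts):
        if attempt(try_index):
            return True  # stop at the first success, no further retries
    return False


# A case that fails first and passes on the retry counts as a pass.
print(passes_with_retry(lambda try_index: try_index == 1))  # True
# A case that always fails is still a fail after the single retry.
print(passes_with_retry(lambda try_index: False))           # False
```
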
<h3 style="font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif; margin: 22px 0 6px;">1. Instruction Following - InstructFollow-15</h3>
<p style="font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif; margin: 0 0 10px; color: #555;">InstructFollow-15 evaluates formatting, count, numbering, sentence, and length constraints.</p>
<table style="width: 100%; border-collapse: collapse; font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif; margin-bottom: 24px;">
<thead>
<tr>
<td colspan="4" style="padding: 8px 12px; font-weight: 600; color: #cc785c; border-bottom: 1px solid rgba(204, 120, 92, 0.25); background: rgba(204, 120, 92, 0.06);">Instruction Following - InstructFollow-15 Metrics</td>
</tr>
<tr style="background: rgba(128, 128, 128, 0.02);">
<th style="padding: 7px 7px; padding-left: 20px; text-align: left; border-bottom: 1px solid rgba(128, 128, 128, 0.15); font-size: 13px; color: #666;">Model</th>
<th style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15); font-size: 13px; color: #666;">Test Set</th>
<th style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15); font-size: 13px; color: #666;">Comprehensive Score</th>
<th style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15); font-size: 13px; color: #666;">Dimension Scores (A/B/C/D/E)</th>
</tr>
</thead>
<tbody>
<tr>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); padding-left: 20px;"><b><a href="https://huggingface.co/armand0e/Qwen3.5-9B-Opus-Agent" style="color: #cc785c; text-decoration: none;">armand0e/Qwen3.5-9B-Opus-Agent</a></b></td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center;">InstructFollow-15</td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center; color: #cc785c; font-weight: bold;">97</td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center; color: #cc785c; font-weight: bold;">100 / 100 / 100 / 85 / 100</td>
</tr>
<tr>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); padding-left: 20px;"><a href="https://huggingface.co/Jackrong/Qwopus3.5-9B-coder-GGUF" style="color: #666; text-decoration: none;">Jackrong/Qwopus3.5-9B-coder</a></td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center;">InstructFollow-15</td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center;">93</td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center;">100 / 100 / 100 / 67 / 100</td>
</tr>
</tbody>
</table>
<h3 style="font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif; margin: 22px 0 6px;">2. Code Debugging &amp; Bug Fixing - BugFind-15</h3>
<p style="font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif; margin: 0 0 10px; color: #555;">BugFind-15 evaluates real debugging capability across syntax bugs, logic errors, and trap code.</p>
<table style="width: 100%; border-collapse: collapse; font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif; margin-bottom: 24px;">
<thead>
<tr>
<td colspan="4" style="padding: 8px 12px; font-weight: 600; color: #cc785c; border-bottom: 1px solid rgba(204, 120, 92, 0.25); background: rgba(204, 120, 92, 0.06);">Code Debugging &amp; Bug Fixing - BugFind-15 Metrics</td>
</tr>
<tr style="background: rgba(128, 128, 128, 0.02);">
<th style="padding: 7px 7px; padding-left: 20px; text-align: left; border-bottom: 1px solid rgba(128, 128, 128, 0.15); font-size: 13px; color: #666;">Model</th>
<th style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15); font-size: 13px; color: #666;">Test Set</th>
<th style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15); font-size: 13px; color: #666;">Comprehensive Score</th>
<th style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15); font-size: 13px; color: #666;">Dimension Scores (A/B/C/D/E)</th>
</tr>
</thead>
<tbody>
<tr>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); padding-left: 20px;"><b><a href="https://huggingface.co/armand0e/Qwen3.5-9B-Opus-Agent" style="color: #cc785c; text-decoration: none;">armand0e/Qwen3.5-9B-Opus-Agent</a></b></td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center;">BugFind-15</td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center; color: #cc785c; font-weight: bold;">84</td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center; color: #cc785c; font-weight: bold;">67 / 100 / 87 / 67 / 90</td>
</tr>
<tr>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); padding-left: 20px;"><a href="https://huggingface.co/Jackrong/Qwopus3.5-9B-coder-GGUF" style="color: #666; text-decoration: none;">Jackrong/Qwopus3.5-9B-coder</a></td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center;">BugFind-15</td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center;">79</td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center;">67 / 87 / 100 / 77 / 43</td>
</tr>
<tr>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); padding-left: 20px;"><a href="https://huggingface.co/Jackrong/MLX-Qwen3.5-9B-DeepSeek-V4-Flash-8bit" style="color: #666; text-decoration: none;">Jackrong/MLX-Qwen3.5-9B-DeepSeek-V4-Flash</a></td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center;">BugFind-15</td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center;">75</td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center;">67 / 100 / 67 / 57 / 80</td>
</tr>
<tr>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); padding-left: 20px;"><a href="https://huggingface.co/armand0e/Qwen3.5-9B-Agent" style="color: #666; text-decoration: none;">armand0e/Qwen3.5-9B-Agent</a></td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center;">BugFind-15</td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center;">58</td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center;">29 / 87 / 73 / 20 / 67</td>
</tr>
</tbody>
</table>
<h3 style="font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif; margin: 22px 0 6px;">3. Tool Call Stability - ToolCall-15</h3>
<p style="font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif; margin: 0 0 10px; color: #555;">ToolCall-15 targets stability and precision in direct tool-calling behavior.</p>
<table style="width: 100%; border-collapse: collapse; font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif; margin-bottom: 24px;">
<thead>
<tr>
<td colspan="4" style="padding: 8px 12px; font-weight: 600; color: #cc785c; border-bottom: 1px solid rgba(204, 120, 92, 0.25); background: rgba(204, 120, 92, 0.06);">Tool Call Stability - ToolCall-15 Metrics</td>
</tr>
<tr style="background: rgba(128, 128, 128, 0.02);">
<th style="padding: 7px 7px; padding-left: 20px; text-align: left; border-bottom: 1px solid rgba(128, 128, 128, 0.15); font-size: 13px; color: #666;">Model</th>
<th style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15); font-size: 13px; color: #666;">Test Set</th>
<th style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15); font-size: 13px; color: #666;">Comprehensive Score</th>
<th style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15); font-size: 13px; color: #666;">Dimension Scores (A/B/C/D/E)</th>
</tr>
</thead>
<tbody>
<tr>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); padding-left: 20px;"><b><a href="https://huggingface.co/armand0e/Qwen3.5-9B-Opus-Agent" style="color: #cc785c; text-decoration: none;">armand0e/Qwen3.5-9B-Opus-Agent</a></b></td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center;">ToolCall-15</td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center; color: #cc785c; font-weight: bold;">100</td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center; color: #cc785c; font-weight: bold;">100 / 100 / 100 / 100 / 100</td>
</tr>
<tr>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); padding-left: 20px;"><b><a href="https://huggingface.co/Jackrong/Qwopus3.5-9B-coder-GGUF" style="color: #cc785c; text-decoration: none;">Jackrong/Qwopus3.5-9B-coder</a></b></td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center;">ToolCall-15</td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center; color: #cc785c; font-weight: bold;">100</td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center; color: #cc785c; font-weight: bold;">100 / 100 / 100 / 100 / 100</td>
</tr>
<tr>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); padding-left: 20px;"><b><a href="https://huggingface.co/Qwen/Qwen3.5-9B" style="color: #cc785c; text-decoration: none;">Qwen/Qwen3.5-9B</a></b></td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center;">ToolCall-15</td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center; color: #cc785c; font-weight: bold;">100</td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center; color: #cc785c; font-weight: bold;">100 / 100 / 100 / 100 / 100</td>
</tr>
<tr>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); padding-left: 20px;"><a href="https://huggingface.co/armand0e/Qwen3.5-9B-Agent" style="color: #666; text-decoration: none;">armand0e/Qwen3.5-9B-Agent</a></td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center;">ToolCall-15</td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center;">93</td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center;">100 / 100 / 100 / 67 / 100</td>
</tr>
</tbody>
</table>
<h3 style="font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif; margin: 22px 0 6px;">4. Complex Agent Performance - HermesAgent-20</h3>
<p style="font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif; margin: 0 0 10px; color: #555;">HermesAgent-20 evaluates complex agent behavior across memory, orchestration, skill use, scheduling, and delegation.</p>
<table style="width: 100%; border-collapse: collapse; font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif; margin-bottom: 24px;">
<thead>
<tr>
<td colspan="4" style="padding: 8px 12px; font-weight: 600; color: #cc785c; border-bottom: 1px solid rgba(204, 120, 92, 0.25); background: rgba(204, 120, 92, 0.06);">Complex Agent Performance - HermesAgent-20 Metrics</td>
</tr>
<tr style="background: rgba(128, 128, 128, 0.02);">
<th style="padding: 7px 7px; padding-left: 20px; text-align: left; border-bottom: 1px solid rgba(128, 128, 128, 0.15); font-size: 13px; color: #666;">Model</th>
<th style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15); font-size: 13px; color: #666;">Test Set</th>
<th style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15); font-size: 13px; color: #666;">Comprehensive Score</th>
<th style="padding: 7px 7px; text-align: center; border-bottom: 1px solid rgba(128, 128, 128, 0.15); font-size: 13px; color: #666;">Core Dimensions (Memory / Orchestration / Skills / Scheduling / Boundaries)</th>
</tr>
</thead>
<tbody>
<tr>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); padding-left: 20px;"><b><a href="https://huggingface.co/Jackrong/Qwopus3.5-9B-coder-GGUF" style="color: #cc785c; text-decoration: none;">Jackrong/Qwopus3.5-9B-coder</a></b></td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center;">HermesAgent-20</td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center; color: #cc785c; font-weight: bold;">85</td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center; color: #cc785c; font-weight: bold;">84 / 93 / 88 / 75 / 84</td>
</tr>
<tr>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); padding-left: 20px;"><a href="https://huggingface.co/armand0e/Qwen3.5-9B-Opus-Agent" style="color: #666; text-decoration: none;">armand0e/Qwen3.5-9B-Opus-Agent</a></td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center;">HermesAgent-20</td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center;">80</td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center;">100 / 93 / 80 / 75 / 50</td>
</tr>
<tr>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); padding-left: 20px;"><a href="https://huggingface.co/Qwen/Qwen3.5-9B" style="color: #666; text-decoration: none;">Qwen/Qwen3.5-9B</a></td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center;">HermesAgent-20</td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center;">71</td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center;">75 / 58 / 100 / 53 / 69</td>
</tr>
<tr>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); padding-left: 20px;"><a href="https://huggingface.co/armand0e/Qwen3.5-9B-Agent" style="color: #666; text-decoration: none;">armand0e/Qwen3.5-9B-Agent</a></td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center;">HermesAgent-20</td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center;">68</td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center;">71 / 83 / 43 / 61 / 80</td>
</tr>
<tr>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); padding-left: 20px;"><a href="https://huggingface.co/DJLougen/Harmonic-Hermes-9B" style="color: #666; text-decoration: none;">DJLougen/Harmonic-Hermes-9B</a></td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center;">HermesAgent-20</td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center;">47</td>
<td style="padding: 7px 7px; border-bottom: 1px solid rgba(128, 128, 128, 0.15); text-align: center;">60 / 45 / 23 / 69 / 38</td>
</tr>
</tbody>
</table>
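
The scoring formula is not stated here, but the HermesAgent-20 comprehensive scores in the table agree with the rounded mean of the five dimension scores. The snippet below checks that arithmetic. Treating the comprehensive score as an unweighted mean is an assumption, and `rows` simply copies the table values. The BugFind-15 rows do not follow the same pattern, so that benchmark presumably weights its dimensions differently.

```python
# (comprehensive score, five dimension scores) copied from the HermesAgent-20 table
rows = {
    "Jackrong/Qwopus3.5-9B-coder": (85, [84, 93, 88, 75, 84]),
    "armand0e/Qwen3.5-9B-Opus-Agent": (80, [100, 93, 80, 75, 50]),
    "Qwen/Qwen3.5-9B": (71, [75, 58, 100, 53, 69]),
    "armand0e/Qwen3.5-9B-Agent": (68, [71, 83, 43, 61, 80]),
    "DJLougen/Harmonic-Hermes-9B": (47, [60, 45, 23, 69, 38]),
}

for model, (reported, dims) in rows.items():
    mean = sum(dims) / len(dims)
    # Assumption: the comprehensive score is an unweighted mean, rounded.
    print(f"{model}: mean={mean:.1f} reported={reported} match={round(mean) == reported}")
```
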
<details>
<summary>Click to show screenshots</summary>

*Ignore the gemma-4 llama.cpp alias in the screenshots; it was an old setting that I forgot to change.*

#### ToolCall-15
![image](https://cdn-uploads.huggingface.co/production/uploads/66bcb202eb4f43ee8aa6bbfb/Mfh_EiXuHX9G8yHZaBwUz.png)
#### HermesAgent-20
![image](https://cdn-uploads.huggingface.co/production/uploads/66bcb202eb4f43ee8aa6bbfb/9sdPJzdwl10FpoRO1qRJD.png)
#### BugFind-15
![image](https://cdn-uploads.huggingface.co/production/uploads/66bcb202eb4f43ee8aa6bbfb/8J4GVlZMkYCegLUBwon9K.png)
#### InstructFollow-15
![image](https://cdn-uploads.huggingface.co/production/uploads/66bcb202eb4f43ee8aa6bbfb/5gnnjvnroS6OAN94-urpj.png)
</details>
## Training Script
<details>
<summary>Training Script</summary>
```py
# -*- coding: utf-8 -*-
import os

# Unsloth should be imported before trl/transformers so its patches apply.
from unsloth import FastModel
import torch
from trl import SFTConfig, SFTTrainer
from teich import mask_data, prepare_data

MAX_SEQ_LEN = 32768
MODEL_NAME = os.environ.get("MODEL_NAME", "Qwen/Qwen3.5-9B")
OUTPUT_DIR = os.environ.get("OUTPUT_DIR", "outputs/qwen-tool-sft")
HUB_REPO_ID = os.environ.get("HUB_REPO_ID", "armand0e/Qwen3.5-9B-Opus-Agent")
HF_TOKEN = os.environ.get("HF_TOKEN", "")

# Load the base model without 4-bit or 8-bit quantization; LoRA adapters are added below.
model, tokenizer = FastModel.from_pretrained(
    model_name=MODEL_NAME,
    max_seq_length=MAX_SEQ_LEN,
    load_in_4bit=False,
    load_in_8bit=False,
    full_finetuning=False,
)

model = FastModel.get_peft_model(
    model,
    finetune_vision_layers=False,     # Text-only fine-tune: vision layers get no adapters
    finetune_language_layers=True,    # Should leave on!
    finetune_attention_modules=True,  # Adapt the attention modules
    finetune_mlp_modules=True,        # Should leave on always!
    r=32,                             # Larger = higher accuracy, but might overfit
    lora_alpha=64,                    # 2x the rank (recommended: alpha >= r)
    lora_dropout=0,
    bias="none",
    random_state=3407,
)

# Build the tokenized training set from the two source datasets.
train_dataset = prepare_data(
    {
        "chat": {
            "source": "TeichAI/claude-4.5-opus-high-reasoning-250x",
        },
        "opus-agent": {
            "source": "armand0e/badlogicgames-pi-mono-opus-filtered",
        },
    },
    tokenizer,
    split="train",
    hf_token=HF_TOKEN,
    chat_template_kwargs={"enable_thinking": True},
    max_length=MAX_SEQ_LEN,
    drop_oversized_examples=True,
    trim_oversized_followups=True,
    tokenize=True,
    strict=True,
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    eval_dataset=None,
    args=SFTConfig(
        dataset_text_field="text",
        dataset_num_proc=1,
        max_length=MAX_SEQ_LEN,
        packing=False,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        warmup_steps=5,
        num_train_epochs=2,
        learning_rate=2e-5,
        logging_steps=1,
        save_steps=100,
        save_total_limit=3,
        optim="adamw_8bit",
        weight_decay=0.01,
        max_grad_norm=0.3,
        lr_scheduler_type="linear",
        output_dir=OUTPUT_DIR,
        seed=3407,
        report_to="none",
    ),
)

# Train on final answers and tool tokens only; reasoning tokens are masked out.
trainer = mask_data(
    trainer,
    tokenizer=tokenizer,
    train_on_reasoning=False,
    train_on_final_answers=True,
    train_on_tools=True,
)
print(trainer.train_dataset.preview())

trainer_stats = trainer.train(resume_from_checkpoint=False)

# Push the LoRA adapter and tokenizer, then the merged 16-bit model.
model.push_to_hub(f"{HUB_REPO_ID}-LoRA", token=HF_TOKEN)
tokenizer.push_to_hub(f"{HUB_REPO_ID}-LoRA", token=HF_TOKEN)
model.push_to_hub_merged(HUB_REPO_ID, tokenizer, save_method="merged_16bit", token=HF_TOKEN)
```
</details>
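
Two derived quantities help when reading the `SFTConfig` and LoRA settings in the script: the effective batch size and the LoRA scaling factor. The values below are copied from the script. The single-GPU setting (`num_devices = 1`) is an assumption, and `alpha / r` is the standard LoRA scaling convention rather than anything specific to this run.

```python
# Values copied from the training script
per_device_train_batch_size = 1
gradient_accumulation_steps = 8
num_devices = 1  # assumption: single GPU

# Sequences contributing to each optimizer step
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_devices

r = 32
lora_alpha = 64
# Standard LoRA convention: the adapter update is scaled by alpha / r
lora_scale = lora_alpha / r

print(effective_batch_size)  # 8
print(lora_scale)            # 2.0
```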
---
The data for this model was easily formatted and masked with [Teich](https://github.com/TeichAI/teich)
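
Masking here means choosing which tokens contribute to the training loss. A common convention in Hugging Face training code is to set the label of every excluded token to `-100`, which PyTorch's cross-entropy loss ignores by default. The sketch below illustrates that idea with the same three switches used in the training script. It is not Teich's implementation: `build_labels`, the per-token `kinds` tags, and the choice to always mask prompt tokens are hypothetical.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_labels(token_ids, kinds, train_on_reasoning=False,
                 train_on_final_answers=True, train_on_tools=True):
    """Copy token ids into labels, masking tokens that should not be trained on.

    `kinds` is a hypothetical per-token tag: "prompt", "reasoning",
    "answer", or "tool".
    """
    trainable = {
        "prompt": False,  # assumption: prompt tokens never contribute to the loss
        "reasoning": train_on_reasoning,
        "answer": train_on_final_answers,
        "tool": train_on_tools,
    }
    return [tid if trainable[kind] else IGNORE_INDEX
            for tid, kind in zip(token_ids, kinds)]


token_ids = [11, 12, 21, 22, 31, 41]
kinds = ["prompt", "prompt", "reasoning", "reasoning", "tool", "answer"]
print(build_labels(token_ids, kinds))
# [-100, -100, -100, -100, 31, 41]
```
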
- **Developed by:** armand0e
- **License:** apache-2.0
- **Finetuned from model:** Qwen/Qwen3.5-9B

This qwen3_5 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)