Qwen3.5-9B-Agent / README.md
armand0e's picture
Update README.md
6c2396c verified
---
base_model: unsloth/Qwen3.5-9B
tags:
- text-generation-inference
- transformers
- unsloth
- qwen3_5
- agent
license: apache-2.0
language:
- en
datasets:
- armand0e/badlogicgames-pi-mono-opus-filtered
- armand0e/kimi-k2.6-claude-code-traces
- armand0e/kimi-k2.6-agent
- armand0e/minimax-m2.7-agent
- TeichAI/Claude-Opus-4.6-Reasoning-887x
---
This model was trained on the following datasets using the qwen3.6 chat template (training was done with enable_thinking and preserve_thinking set to `True`):
- `armand0e/badlogicgames-pi-mono-opus-filtered` - Pi traces from Claude Opus (mainly 4.5)
- `armand0e/kimi-k2.6-claude-code-traces` - Claude Code traces from kimi k2.6
- `armand0e/kimi-k2.6-agent` - Codex traces from kimi k2.6
- `armand0e/minimax-m2.7-agent` - Pi traces from minimax m2.7
- `TeichAI/Claude-Opus-4.6-Reasoning-887x` (Downsampled to 200 examples, only present to stabilize chat behavior)
For now this model is just a test to make sure the `teich` package was working correctly. less than 400 examples were given to the model in this training run so expect an overall performance degradation. A much more comprehensive DeepSeek Coder will be released in the coming days once training is complete
I recommend using the following sampling parameters:
- temp: 1.0
- top_k: 20 (though higher values like 40 still seem to work and be stable with tool calling and agentic tasks)
- top_p: 0.95
- min_p: 0.00
- repeat_penalty: 1.0
- presence_penalty: 1.5
Training code:
```py
MAX_SEQ_LEN = 49152
from unsloth import FastModel
import torch
model = FastModel.get_peft_model(
model,
finetune_vision_layers = False, # Turn off for just text!
finetune_language_layers = True, # Should leave on!
finetune_attention_modules = True, # Attention good for GRPO
finetune_mlp_modules = True, # Should leave on always!
r = 64, # Larger = higher accuracy, but might overfit
lora_alpha = 64, # Recommended alpha == r at least
lora_dropout = 0,
bias = "none",
random_state = 3407,
)
from teich import prepare_data
train_dataset = prepare_data(
{
"opus-agent": {
"source": "armand0e/badlogicgames-pi-mono-opus-filtered",
},
"kimi-claude": {
"source": "armand0e/kimi-k2.6-claude-code-traces",
},
"kimi-codex": {
"source": "armand0e/kimi-k2.6-agent",
},
"minimax-m2.7": {
"source": "armand0e/minimax-m2.7-agent",
},
"chat": {
"source": "TeichAI/Claude-Opus-4.6-Reasoning-887x",
"max_examples": 200,
}
},
tokenizer,
split="train",
hf_token=HF_TOKEN,
chat_template_kwargs={"enable_thinking": True, "preserve_thinking": True},
max_length=MAX_SEQ_LEN,
drop_oversized_examples=True,
trim_oversized_followups=True,
tokenize=True,
strict=True,
)
from trl import SFTConfig, SFTTrainer
trainer = SFTTrainer(
model=model,
tokenizer=tokenizer,
train_dataset=train_dataset,
eval_dataset=None,
args=SFTConfig(
dataset_text_field="text",
dataset_num_proc=1,
max_length=MAX_SEQ_LEN,
packing=False,
per_device_train_batch_size=1,
gradient_accumulation_steps=8,
warmup_steps= 5,
num_train_epochs=2,
learning_rate=2e-4,
logging_steps=1,
save_steps=100,
save_total_limit=3,
optim="adamw_8bit",
weight_decay=0.01,
lr_scheduler_type="linear",
output_dir=OUTPUT_DIR,
seed=3407,
report_to="none",
),
)
from teich import mask_data
trainer = mask_data(
trainer,
tokenizer=tokenizer,
train_on_reasoning=True,
train_on_final_answers=True,
train_on_tools=True,
)
```
This tune was very data limited, but still impresses me. I encourage everyone to generate their own high quality data for their own use cases, they can all be aggregated together.
---
# Uploaded finetuned model
- **Developed by:** armand0e
- **License:** apache-2.0
- **Finetuned from model :** unsloth/Qwen3.5-9B
This qwen3_5 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)