---
base_model: unsloth/Qwen3.5-9B
tags:
- text-generation-inference
- transformers
- unsloth
- qwen3_5
- agent
license: apache-2.0
language:
- en
datasets:
- armand0e/badlogicgames-pi-mono-opus-filtered
- armand0e/kimi-k2.6-claude-code-traces
- armand0e/kimi-k2.6-agent
- armand0e/minimax-m2.7-agent
- TeichAI/Claude-Opus-4.6-Reasoning-887x
---

This model was trained on the following datasets using the qwen3.6 chat template (training was done with enable_thinking and preserve_thinking set to `True`):

  - `armand0e/badlogicgames-pi-mono-opus-filtered` - Pi traces from Claude Opus (mainly 4.5)
  - `armand0e/kimi-k2.6-claude-code-traces` - Claude Code traces from kimi k2.6
  - `armand0e/kimi-k2.6-agent` - Codex traces from kimi k2.6
  - `armand0e/minimax-m2.7-agent` - Pi traces from minimax m2.7
  - `TeichAI/Claude-Opus-4.6-Reasoning-887x` (Downsampled to 200 examples, only present to stabilize chat behavior)

For now this model is just a test to make sure the `teich` package was working correctly. less than 400 examples were given to the model in this training run so expect an overall performance degradation. A much more comprehensive DeepSeek Coder will be released in the coming days once training is complete

I recommend using the following sampling parameters:

- temp: 1.0
- top_k: 20 (though higher values like 40 still seem to work and be stable with tool calling and agentic tasks)
- top_p: 0.95
- min_p: 0.00
- repeat_penalty: 1.0
- presence_penalty: 1.5

Training code:

```py
MAX_SEQ_LEN = 49152

from unsloth import FastModel
import torch

model = FastModel.get_peft_model(
    model,
    finetune_vision_layers     = False, # Turn off for just text!
    finetune_language_layers   = True,  # Should leave on!
    finetune_attention_modules = True,  # Attention good for GRPO
    finetune_mlp_modules       = True,  # Should leave on always!

    r = 64,           # Larger = higher accuracy, but might overfit
    lora_alpha = 64,  # Recommended alpha == r at least
    lora_dropout = 0,
    bias = "none",
    random_state = 3407,
)

from teich import prepare_data

train_dataset = prepare_data(
    {
        "opus-agent": {
            "source": "armand0e/badlogicgames-pi-mono-opus-filtered",
        },
        "kimi-claude": {
            "source": "armand0e/kimi-k2.6-claude-code-traces",
        },
        "kimi-codex": {
            "source": "armand0e/kimi-k2.6-agent",
        },
        "minimax-m2.7": {
            "source": "armand0e/minimax-m2.7-agent",
        },
        "chat": {
            "source": "TeichAI/Claude-Opus-4.6-Reasoning-887x",
            "max_examples": 200,
        }
    },
    tokenizer,
    split="train",
    hf_token=HF_TOKEN,
    chat_template_kwargs={"enable_thinking": True, "preserve_thinking": True},
    max_length=MAX_SEQ_LEN,
    drop_oversized_examples=True,
    trim_oversized_followups=True,
    tokenize=True,
    strict=True,
)

from trl import SFTConfig, SFTTrainer

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    eval_dataset=None,
    args=SFTConfig(
        dataset_text_field="text",
        dataset_num_proc=1,
        max_length=MAX_SEQ_LEN,
        packing=False,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        warmup_steps= 5,
        num_train_epochs=2,
        learning_rate=2e-4,
        logging_steps=1,
        save_steps=100,
        save_total_limit=3,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        output_dir=OUTPUT_DIR,
        seed=3407,
        report_to="none",
    ),
)

from teich import mask_data

trainer = mask_data(
    trainer,
    tokenizer=tokenizer,
    train_on_reasoning=True,
    train_on_final_answers=True,
    train_on_tools=True,
)
```

This tune was very data limited, but still impresses me. I encourage everyone to generate their own high quality data for their own use cases, they can all be aggregated together.

---
# Uploaded finetuned  model

- **Developed by:** armand0e
- **License:** apache-2.0
- **Finetuned from model :** unsloth/Qwen3.5-9B

This qwen3_5 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)