--- base_model: unsloth/Qwen3.5-9B tags: - text-generation-inference - transformers - unsloth - qwen3_5 - agent license: apache-2.0 language: - en datasets: - armand0e/badlogicgames-pi-mono-opus-filtered - armand0e/kimi-k2.6-claude-code-traces - armand0e/kimi-k2.6-agent - armand0e/minimax-m2.7-agent - TeichAI/Claude-Opus-4.6-Reasoning-887x --- This model was trained on the following datasets using the qwen3.6 chat template (training was done with enable_thinking and preserve_thinking set to `True`): - `armand0e/badlogicgames-pi-mono-opus-filtered` - Pi traces from Claude Opus (mainly 4.5) - `armand0e/kimi-k2.6-claude-code-traces` - Claude Code traces from kimi k2.6 - `armand0e/kimi-k2.6-agent` - Codex traces from kimi k2.6 - `armand0e/minimax-m2.7-agent` - Pi traces from minimax m2.7 - `TeichAI/Claude-Opus-4.6-Reasoning-887x` (Downsampled to 200 examples, only present to stabilize chat behavior) For now this model is just a test to make sure the `teich` package was working correctly. less than 400 examples were given to the model in this training run so expect an overall performance degradation. A much more comprehensive DeepSeek Coder will be released in the coming days once training is complete I recommend using the following sampling parameters: - temp: 1.0 - top_k: 20 (though higher values like 40 still seem to work and be stable with tool calling and agentic tasks) - top_p: 0.95 - min_p: 0.00 - repeat_penalty: 1.0 - presence_penalty: 1.5 Training code: ```py MAX_SEQ_LEN = 49152 from unsloth import FastModel import torch model = FastModel.get_peft_model( model, finetune_vision_layers = False, # Turn off for just text! finetune_language_layers = True, # Should leave on! finetune_attention_modules = True, # Attention good for GRPO finetune_mlp_modules = True, # Should leave on always! r = 64, # Larger = higher accuracy, but might overfit lora_alpha = 64, # Recommended alpha == r at least lora_dropout = 0, bias = "none", random_state = 3407, ) from teich import prepare_data train_dataset = prepare_data( { "opus-agent": { "source": "armand0e/badlogicgames-pi-mono-opus-filtered", }, "kimi-claude": { "source": "armand0e/kimi-k2.6-claude-code-traces", }, "kimi-codex": { "source": "armand0e/kimi-k2.6-agent", }, "minimax-m2.7": { "source": "armand0e/minimax-m2.7-agent", }, "chat": { "source": "TeichAI/Claude-Opus-4.6-Reasoning-887x", "max_examples": 200, } }, tokenizer, split="train", hf_token=HF_TOKEN, chat_template_kwargs={"enable_thinking": True, "preserve_thinking": True}, max_length=MAX_SEQ_LEN, drop_oversized_examples=True, trim_oversized_followups=True, tokenize=True, strict=True, ) from trl import SFTConfig, SFTTrainer trainer = SFTTrainer( model=model, tokenizer=tokenizer, train_dataset=train_dataset, eval_dataset=None, args=SFTConfig( dataset_text_field="text", dataset_num_proc=1, max_length=MAX_SEQ_LEN, packing=False, per_device_train_batch_size=1, gradient_accumulation_steps=8, warmup_steps= 5, num_train_epochs=2, learning_rate=2e-4, logging_steps=1, save_steps=100, save_total_limit=3, optim="adamw_8bit", weight_decay=0.01, lr_scheduler_type="linear", output_dir=OUTPUT_DIR, seed=3407, report_to="none", ), ) from teich import mask_data trainer = mask_data( trainer, tokenizer=tokenizer, train_on_reasoning=True, train_on_final_answers=True, train_on_tools=True, ) ``` This tune was very data limited, but still impresses me. I encourage everyone to generate their own high quality data for their own use cases, they can all be aggregated together. --- # Uploaded finetuned model - **Developed by:** armand0e - **License:** apache-2.0 - **Finetuned from model :** unsloth/Qwen3.5-9B This qwen3_5 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library. [](https://github.com/unslothai/unsloth)