armand0e committed on
Commit
852e726
·
verified ·
1 Parent(s): 4706d0e

Update README.md

Files changed (1)
  1. README.md +103 -0
README.md CHANGED
@@ -5,11 +5,114 @@ tags:
  - transformers
  - unsloth
  - qwen3_5
+ - agent
  license: apache-2.0
  language:
  - en
  ---
 
+ This model was trained on the following datasets using the Qwen3.6 chat template, with `enable_thinking` and `preserve_thinking` both set to `True`:
+
+ - armand0e/badlogicgames-pi-mono-opus-filtered - Pi traces from Claude Opus (mainly 4.5)
+ - armand0e/kimi-k2.6-claude-code-traces - Claude Code traces from Kimi K2.6
+ - armand0e/kimi-k2.6-agent - Codex traces from Kimi K2.6
+ - armand0e/minimax-m2.7-agent - Pi traces from MiniMax M2.7
+ - TeichAI/Claude-Opus-4.6-Reasoning-887x - downsampled to 200 examples, included only to stabilize chat behavior
+
+ Training specs:
+ ```
+ MAX_SEQ_LEN = 49152
+
+ from unsloth import FastModel
+ import torch
+
+ model = FastModel.get_peft_model(
+     model,
+     finetune_vision_layers = False, # Turn off for just text!
+     finetune_language_layers = True, # Should leave on!
+     finetune_attention_modules = True, # Attention good for GRPO
+     finetune_mlp_modules = True, # Should leave on always!
+
+     r = 64, # Larger = higher accuracy, but might overfit
+     lora_alpha = 64, # Recommended alpha == r at least
+     lora_dropout = 0,
+     bias = "none",
+     random_state = 3407,
+ )
+
+ from teich import prepare_data
+
+ train_dataset = prepare_data(
+     {
+         "opus-agent": {
+             "source": "armand0e/badlogicgames-pi-mono-opus-filtered",
+         },
+         "kimi-claude": {
+             "source": "armand0e/kimi-k2.6-claude-code-traces",
+         },
+         "kimi-codex": {
+             "source": "armand0e/kimi-k2.6-agent",
+         },
+         "minimax-m2.7": {
+             "source": "armand0e/ag-datagen-v2-test",
+         },
+         "chat": {
+             "source": "TeichAI/Claude-Opus-4.6-Reasoning-887x",
+             "max_examples": 200,
+         }
+     },
+     tokenizer,
+     split="train",
+     hf_token=HF_TOKEN,
+     chat_template_kwargs={"enable_thinking": True, "preserve_thinking": True},
+     max_length=MAX_SEQ_LEN,
+     drop_oversized_examples=True,
+     trim_oversized_followups=True,
+     tokenize=True,
+     strict=True,
+ )
+
+ from trl import SFTConfig, SFTTrainer
+
+ trainer = SFTTrainer(
+     model=model,
+     tokenizer=tokenizer,
+     train_dataset=train_dataset,
+     eval_dataset=None,
+     args=SFTConfig(
+         dataset_text_field="text",
+         dataset_num_proc=1,
+         max_length=MAX_SEQ_LEN,
+         packing=False,
+         per_device_train_batch_size=1,
+         gradient_accumulation_steps=8,
+         warmup_steps=5,
+         num_train_epochs=2,
+         learning_rate=2e-4,
+         logging_steps=1,
+         save_steps=100,
+         save_total_limit=3,
+         optim="adamw_8bit",
+         weight_decay=0.01,
+         lr_scheduler_type="linear",
+         output_dir=OUTPUT_DIR,
+         seed=3407,
+         report_to="none",
+     ),
+ )
+
+ from teich import mask_data
+
+ trainer = mask_data(
+     trainer,
+     tokenizer=tokenizer,
+     train_on_reasoning=True,
+     train_on_final_answers=True,
+     train_on_tools=True,
+ )
+ ```
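A few quantities implied by the training config can be worked out by hand. The sketch below is plain Python and does not call Unsloth or TRL. It assumes a single GPU, standard LoRA scaling of `lora_alpha / r` (rank-stabilized LoRA not enabled), and the usual linear warmup and decay shape for `lr_scheduler_type="linear"`. The helper names and `total_steps = 100` are hypothetical, since the commit states neither the dataset size nor the number of optimizer steps.

```python
def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Number of sequences contributing to each optimizer step."""
    return per_device * grad_accum * num_devices


def lora_scale(lora_alpha: float, r: int) -> float:
    """Standard LoRA scaling factor applied to the low-rank update (alpha / r)."""
    return lora_alpha / r


def linear_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Values taken from the config in the diff; num_devices is an assumption.
batch = effective_batch_size(per_device=1, grad_accum=8, num_devices=1)
scale = lora_scale(lora_alpha=64, r=64)

# Hypothetical step count, chosen only to make the schedule concrete.
total_steps = 100
schedule = [linear_lr(s, 2e-4, 5, total_steps) for s in range(total_steps + 1)]

print("sequences per optimizer step:", batch)
print("LoRA scale:", scale)
print("peak learning rate:", max(schedule))
```

With `packing=False` and a per-device batch of 1, each optimizer step under these assumptions sees 8 unpacked sequences of up to `MAX_SEQ_LEN` tokens each.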
+
+ ---
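The `mask_data` call at the end of the training snippet selects which parts of each trace contribute to the loss. The commit does not show how `teich` implements this, so the following is only a minimal sketch of the general label-masking technique used in supervised fine-tuning: tokens that should not be trained on get the label `-100`, which PyTorch's cross-entropy loss ignores by default. The function `build_labels`, the per-token kind tags, and the mapping from the three `train_on_*` flags to those tags are all assumptions made for illustration.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_labels(token_ids, segment_kinds, train_on):
    """Copy token ids into labels, masking segments excluded from the loss.

    token_ids:     list of int, one id per token
    segment_kinds: list of str, one tag per token (hypothetical tags:
                   "prompt", "reasoning", "tool", "answer")
    train_on:      set of tags whose tokens contribute to the loss
    """
    if len(token_ids) != len(segment_kinds):
        raise ValueError("token_ids and segment_kinds must have the same length")
    return [
        tok if kind in train_on else IGNORE_INDEX
        for tok, kind in zip(token_ids, segment_kinds)
    ]


# Toy example: two prompt tokens, two reasoning tokens, one tool token, two answer tokens.
token_ids = [11, 12, 21, 22, 31, 41, 42]
kinds = ["prompt", "prompt", "reasoning", "reasoning", "tool", "answer", "answer"]

# Analogous to enabling all three train_on_* flags: only the prompt is masked.
labels_all = build_labels(token_ids, kinds, {"reasoning", "tool", "answer"})

# Analogous to training on final answers only.
labels_answers = build_labels(token_ids, kinds, {"answer"})

print(labels_all)
print(labels_answers)
```

With all three flags enabled, as in the commit, this sketch masks only the prompt side of each trace and leaves reasoning, tool, and answer tokens in the loss.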
  # Uploaded finetuned model
 
  - **Developed by:** armand0e