armand0e committed
Commit aad730d · verified · 1 Parent(s): 852e726

Update README.md

  1. README.md +14 -6
README.md CHANGED
@@ -9,15 +9,21 @@ tags:
9
  license: apache-2.0
10
  language:
11
  - en
 
 
 
 
 
 
12
  ---
13
 
14
  This model was trained on the following datasets using the qwen3.6 chat template (training was done with enable_thinking and preserve_thinking set to `True`):
15
 
16
- - armand0e/badlogicgames-pi-mono-opus-filtered - Pi traces from Claude Opus (mainly 4.5)
17
- - armand0e/kimi-k2.6-claude-code-traces - Claude Code traces from kimi k2.6
18
- - armand0e/kimi-k2.6-agent - Codex traces from kimi k2.6
19
- - armand0e/minimax-m2.7-agent - Pi traces from minimax m2.7
20
- - TeichAI/Claude-Opus-4.6-Reasoning-887x (Downsampled to 200 examples, only present to stabilize chat behavior)
21
 
22
  Training specs:
23
  ```
@@ -112,6 +118,8 @@ trainer = mask_data(
112
  )
113
  ```
114
 
 
 
115
  ---
116
  # Uploaded finetuned model
117
 
@@ -121,4 +129,4 @@ trainer = mask_data(
121
 
122
  This qwen3_5 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
123
 
124
- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
 
9
  license: apache-2.0
10
  language:
11
  - en
12
+ datasets:
13
+ - armand0e/badlogicgames-pi-mono-opus-filtered
14
+ - armand0e/kimi-k2.6-claude-code-traces
15
+ - armand0e/kimi-k2.6-agent
16
+ - armand0e/minimax-m2.7-agent
17
+ - TeichAI/Claude-Opus-4.6-Reasoning-887x
18
  ---
19
 
20
  This model was trained on the following datasets using the qwen3.6 chat template (training was done with enable_thinking and preserve_thinking set to `True`):
21
 
22
+ - `armand0e/badlogicgames-pi-mono-opus-filtered` - Pi traces from Claude Opus (mainly 4.5)
23
+ - `armand0e/kimi-k2.6-claude-code-traces` - Claude Code traces from kimi k2.6
24
+ - `armand0e/kimi-k2.6-agent` - Codex traces from kimi k2.6
25
+ - `armand0e/minimax-m2.7-agent` - Pi traces from minimax m2.7
26
+ - `TeichAI/Claude-Opus-4.6-Reasoning-887x` (Downsampled to 200 examples, only present to stabilize chat behavior)
27
 
28
  Training specs:
29
  ```
 
118
  )
119
  ```
120
 
121
+ This tune was very data-limited, but it still impresses me. I encourage everyone to generate their own high-quality data for their own use cases; such datasets can all be aggregated together.
122
+
123
  ---
124
  # Uploaded finetuned model
125
 
 
129
 
130
  This qwen3_5 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
131
 
132
+ [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
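
The README notes that `TeichAI/Claude-Opus-4.6-Reasoning-887x` was downsampled to 200 examples and that separately generated datasets can be aggregated together. The sketch below illustrates that idea in plain Python. It is not the author's actual preprocessing: the helper names `downsample` and `aggregate`, the seed, and the placeholder rows and their sizes are all assumptions made for illustration.

```python
import random


def downsample(examples, k, seed=0):
    """Return up to k examples drawn uniformly at random, without replacement.

    Hypothetical helper; the seed is fixed so the subset is reproducible.
    """
    examples = list(examples)
    if k >= len(examples):
        return examples
    rng = random.Random(seed)
    return rng.sample(examples, k)


def aggregate(*sources):
    """Concatenate several example lists into one training mix (hypothetical helper)."""
    mixed = []
    for source in sources:
        mixed.extend(source)
    return mixed


# Placeholder rows standing in for real dataset records; the sizes are arbitrary.
agent_traces = [{"source": "agent", "id": i} for i in range(1000)]
chat_examples = [{"source": "chat", "id": i} for i in range(500)]

# Keep only 200 chat examples, then combine them with the agent traces.
chat_subset = downsample(chat_examples, 200)
train_mix = aggregate(agent_traces, chat_subset)

print(len(chat_subset), len(train_mix))  # prints "200 1200"
```

Fixing the seed keeps the 200-example subset stable across runs, which makes it easier to compare training runs that differ only in the agent-trace data.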