armand0e committed on
Commit
8ab4058
·
verified ·
1 Parent(s): 666a56c

Update README.md

Files changed (1)
  1. README.md +12 -3
README.md CHANGED
@@ -1,5 +1,5 @@
  ---
- base_model: unsloth/Qwen3.5-9B
+ base_model: armand0e/Qwen3.5-9B-Agent
  tags:
  - text-generation-inference
  - transformers
@@ -25,7 +25,16 @@ This model was trained on the following datasets using the qwen3.6 chat template
  - `armand0e/minimax-m2.7-agent` - Pi traces from minimax m2.7
  - `TeichAI/Claude-Opus-4.6-Reasoning-887x` (Downsampled to 200 examples, only present to stabilize chat behavior)
 
- Training specs:
+ I recommend using the following sampling parameters:
+
+ - temp: 1.0
+ - top_k: 20 (though higher values like 40 still seem to work and be stable with tool calling and agentic tasks)
+ - top_p: 0.95
+ - min_p: 0.00
+ - repeat_penalty: 1.0
+ - presence_penalty: 1.5
+
+ Training code:
 
  ```py
  MAX_SEQ_LEN = 49152
@@ -61,7 +70,7 @@ train_dataset = prepare_data(
  "source": "armand0e/kimi-k2.6-agent",
  },
  "minimax-m2.7": {
- "source": "armand0e/ag-datagen-v2-test",
+ "source": "armand0e/minimax-m2.7-agent",
  },
  "chat": {
  "source": "TeichAI/Claude-Opus-4.6-Reasoning-887x",