Text Generation
Transformers
Safetensors
abstract-cot
latent-reasoning
math-reasoning
qwen3
leapeto committed
Commit a555798 · verified · 1 parent: 4e32eaf

Add files using upload-large-folder tool

Files changed (50)
  1. README.md +163 -0
  2. adapters/pi1_phaseA/train_log.json +75 -0
  3. adapters/pi1_phaseB/README.md +207 -0
  4. adapters/pi1_phaseB/adapter_config.json +51 -0
  5. adapters/pi1_phaseB/chat_template.jinja +89 -0
  6. adapters/pi1_phaseB/tokenizer_config.json +30 -0
  7. adapters/pi1_phaseB/train_log.json +75 -0
  8. adapters/pi2_phaseA/README.md +207 -0
  9. adapters/pi2_phaseA/adapter_config.json +51 -0
  10. adapters/pi2_phaseA/chat_template.jinja +89 -0
  11. adapters/pi2_phaseA/tokenizer_config.json +30 -0
  12. adapters/pi2_phaseA/train_log.json +75 -0
  13. adapters/pi2_phaseB/README.md +207 -0
  14. adapters/pi2_phaseB/adapter_config.json +51 -0
  15. adapters/pi2_phaseB/chat_template.jinja +89 -0
  16. adapters/pi2_phaseB/tokenizer_config.json +30 -0
  17. adapters/pi2_phaseB/train_log.json +75 -0
  18. adapters/pi3_phaseA/README.md +207 -0
  19. adapters/pi3_phaseA/adapter_config.json +51 -0
  20. adapters/pi3_phaseA/chat_template.jinja +89 -0
  21. adapters/pi3_phaseA/tokenizer_config.json +30 -0
  22. adapters/pi3_phaseA/train_log.json +75 -0
  23. adapters/pi3_phaseB/README.md +207 -0
  24. adapters/pi3_phaseB/adapter_config.json +51 -0
  25. adapters/pi3_phaseB/chat_template.jinja +89 -0
  26. adapters/pi3_phaseB/tokenizer_config.json +30 -0
  27. adapters/pi3_phaseB/train_log.json +75 -0
  28. docs/20260511.md +198 -0
  29. final/chat_template.jinja +89 -0
  30. final/config.json +71 -0
  31. final/generation_config.json +13 -0
  32. final/tokenizer_config.json +30 -0
  33. results/abstract_math500_T3_N5000.jsonl +0 -0
  34. round1/chat_template.jinja +89 -0
  35. round1/config.json +71 -0
  36. round1/generation_config.json +13 -0
  37. round1/tokenizer_config.json +30 -0
  38. round2/chat_template.jinja +89 -0
  39. round2/config.json +71 -0
  40. round2/generation_config.json +13 -0
  41. round2/tokenizer_config.json +30 -0
  42. teacher_traces/pi1_phaseB_teacher_traces.jsonl +0 -0
  43. teacher_traces/pi2_phaseA_teacher_traces.jsonl +0 -0
  44. teacher_traces/pi2_phaseB_teacher_traces.jsonl +0 -0
  45. teacher_traces/pi3_phaseA_teacher_traces.jsonl +0 -0
  46. teacher_traces/pi3_phaseB_teacher_traces.jsonl +0 -0
  47. train_logs/pi1_phaseA.json +75 -0
  48. train_logs/pi1_phaseB.json +75 -0
  49. train_logs/pi2_phaseA.json +75 -0
  50. train_logs/pi2_phaseB.json +75 -0
README.md ADDED
@@ -0,0 +1,163 @@
1
+ ---
2
+ license: apache-2.0
3
+ library_name: transformers
4
+ base_model: Qwen/Qwen3-4B
5
+ pipeline_tag: text-generation
6
+ tags:
7
+ - abstract-cot
8
+ - latent-reasoning
9
+ - math-reasoning
10
+ - qwen3
11
+ datasets:
12
+ - HuggingFaceH4/MATH-500
13
+ - allenai/Dolci-Think-SFT-7B
14
+ ---
15
+
16
+ # Qwen3-4B-AbstractCoT-warmup
17
+
18
+ Qwen3-4B fine-tuned with the **Abstract Chain-of-Thought (Abstract-CoT)** warm-up procedure from "[Thinking Without Words: Efficient Latent Reasoning with Abstract Chain-of-Thought](https://arxiv.org/abs/2604.22709v2)" (Ramji, Naseem, Fernandez Astudillo, IBM Research AI, 2026). The model is taught to compress its reasoning into a short sequence (~16–22 tokens) drawn from a reserved 64-symbol *abstract vocabulary* `V_abs = {<TOKEN_A>, …, <TOKEN_BL>}`, used as a discrete latent scratchpad before emitting the answer.
19
+
20
+ ```
21
+ prompt ─► <beginabstract> z_1 ... z_m <endabstract> answer
22
+           └─────── z̃ ∈ V_abs^m, m ≤ 128 ───────┘
23
+ ```
24
+
25
+ This is the SFT half of the paper only — no RL stage. The comparison row is the paper's "Abstract-CoT (Warm-up)" line in Table 1.
26
+
27
+ ## Headline result
28
+
29
+ | | MATH-500 acc | Mean tokens |
30
+ |---|---|---|
31
+ | Paper Baseline (Qwen3-4B verbal CoT) | 83.2 | 1087 |
32
+ | **Our Baseline** (Qwen3-4B verbal CoT, this hardware) | 84.60 | 1045 |
33
+ | Paper Abstract-CoT Warm-up | 86.2 | 168 |
34
+ | **This model** (T=3 PI, N=5k, 1 epoch, LoRA, seq 8k) | **72.00** | **432** |
35
+
36
+ The accuracy gap to the paper's 86.2 is driven by reduced data scale (5k vs 600k), LoRA vs full fine-tuning, and 1 vs 3 epochs per phase. See `docs/20260511_reader.md` for a full discussion.
37
+
38
+ ## Repository layout
39
+
40
+ ```
41
+ final/          ← end-of-round-3 merged model (THE warm-up checkpoint)
42
+ round2/         ← end-of-round-2 merged model
43
+ round1/         ← end-of-round-1 merged model
44
+ adapters/       ← all 6 LoRA adapters (pi{1,2,3}_phase{A,B})
45
+ results/        ← per-example eval JSONL (baseline + abstract)
46
+ teacher_traces/ ← on-policy V_abs traces used as Phase B/A teachers
47
+ train_logs/     ← per-phase loss + LR curves (verifies cosine fix)
48
+ docs/           ← run reports (technical + reader-oriented)
49
+ ```
50
+
51
+ ## How it was trained
52
+
53
+ Three policy-iteration rounds, each with two phases:
54
+
55
+ - **Phase A — Bottleneck SFT.** Train on `[prompt; verbal-CoT; z̃; answer]` with the answer blocked from attending to the verbal CoT, forcing all CoT→answer signal through `z̃` (a minimal mask sketch follows this list).
56
+   - Round 1: `z̃` is random V_abs tokens.
57
+   - Rounds 2+: `z̃` is sampled on-policy from the previous round's model.
58
+ - **Phase B — Self-distillation.** Train on `[prompt; z̃; answer]` with standard causal attention, where `z̃` is now generated from the prompt alone.
59
+
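+ Below is a minimal sketch (not the repo's training code) of how the Phase A bottleneck can be expressed as an additive attention mask; the segment boundaries `cot_start`, `cot_end`, `answer_start` are illustrative names for the `[prompt; verbal-CoT; z̃; answer]` layout described above.
+
+ ```python
+ import torch
+
+ def bottleneck_mask(seq_len, cot_start, cot_end, answer_start, dtype=torch.float32):
+     """Causal mask in which answer tokens cannot attend to the verbal-CoT span."""
+     # Standard causal mask: query i may attend to keys j <= i.
+     allowed = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
+     # Bottleneck: answer queries may NOT see the verbal CoT, so the only
+     # CoT -> answer path runs through the abstract tokens z~.
+     allowed[answer_start:, cot_start:cot_end] = False
+     mask = torch.zeros(seq_len, seq_len, dtype=dtype)
+     mask.masked_fill_(~allowed, float("-inf"))
+     return mask[None, None]  # (1, 1, seq_len, seq_len), additive
+
+ # Illustrative layout: 10 prompt, 50 verbal-CoT, 16 abstract, remaining answer tokens.
+ mask = bottleneck_mask(seq_len=100, cot_start=10, cot_end=60, answer_start=76)
+ ```
+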
60
+ Training config:
61
+ - Base: `Qwen/Qwen3-4B`, extended with V_abs (M=64) + `<beginabstract>` + `<endabstract>` (151 669 → 151 735 tokens); see the extension sketch after this list.
62
+ - LoRA r=32, α=64 on attention + MLP projections. Embedding table + LM head trained fully (so the new abstract-vocab rows can move freely). 842.9 M / 4.86 B trainable (17.3%).
63
+ - Data: 5 000 examples from `allenai/Dolci-Think-SFT-7B`, filtered to assistant messages with `<think>` blocks ≥ 200 chars.
64
+ - max_len 8192, batch 32, lr 1e-4, cosine schedule, 5% warmup.
65
+ - 2× A100-SXM4-80GB, ~11 hours wall.
66
+
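+ A minimal sketch of the vocabulary extension described in the first bullet (base checkpoint and token names follow this card; the helper and output path are illustrative, not the repo's `scripts/01_extend_model.sh`):
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ base_id = "Qwen/Qwen3-4B"
+ tok = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained(base_id, trust_remote_code=True)
+
+ # 64 abstract symbols <TOKEN_A> ... <TOKEN_BL>, plus the two delimiters.
+ def abs_name(i):
+     if i < 26:
+         return f"<TOKEN_{chr(ord('A') + i)}>"
+     j = i - 26
+     return f"<TOKEN_{chr(ord('A') + j // 26)}{chr(ord('A') + j % 26)}>"
+
+ new_tokens = [abs_name(i) for i in range(64)] + ["<beginabstract>", "<endabstract>"]
+ tok.add_tokens(new_tokens, special_tokens=True)
+
+ # Grow the embedding table and LM head to cover the new rows (151 669 -> 151 735).
+ model.resize_token_embeddings(len(tok))
+ model.save_pretrained("qwen3-4b-extended")
+ tok.save_pretrained("qwen3-4b-extended")
+ ```
+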
67
+ ## Using the model
68
+
69
+ ### Inference (vLLM, recommended)
70
+
71
+ ```python
72
+ from vllm import LLM, SamplingParams
73
+ from transformers import AutoTokenizer
74
+ from huggingface_hub import snapshot_download
75
+
76
+ # Download the final checkpoint
77
+ model_path = snapshot_download(
78
+     "leapeto/Qwen3-4B-AbstractCoT-warmup",
79
+     allow_patterns=["final/*"],
80
+ )
81
+
82
+ tok = AutoTokenizer.from_pretrained(f"{model_path}/final", trust_remote_code=True)
83
+
84
+ # Abstract token ids
85
+ abs_tokens = []
86
+ for i in range(64):
87
+     if i < 26:
88
+         abs_tokens.append(f"<TOKEN_{chr(ord('A')+i)}>")
89
+     else:
90
+         j = i - 26
91
+         abs_tokens.append(f"<TOKEN_{chr(ord('A')+j//26)}{chr(ord('A')+j%26)}>")
92
+ end_id = tok.convert_tokens_to_ids("<endabstract>")
93
+ abs_ids = tok.convert_tokens_to_ids(abs_tokens)
94
+ allowed = list(set(abs_ids + [end_id]))
95
+
96
+ llm = LLM(model=f"{model_path}/final", tensor_parallel_size=2,
97
+           dtype="bfloat16", trust_remote_code=True)
98
+
99
+ # Two-stage decode: (1) constrained abstract trace, (2) unconstrained answer
100
+ prompt = "What is the integral of x^2 from 0 to 1? Put your final answer in \\boxed{}."
101
+ messages = [
102
+     {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
103
+     {"role": "user", "content": prompt},
104
+ ]
105
+ prefix = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=False)
106
+ prefix += "<beginabstract>"
107
+
108
+ # Stage 1: V_abs only, stop at <endabstract>
109
+ sp1 = SamplingParams(temperature=0.7, max_tokens=128,
110
+                      allowed_token_ids=allowed, stop_token_ids=[end_id],
111
+                      skip_special_tokens=False)
112
+ abstract = llm.generate([prefix], sp1)[0].outputs[0].text
113
+ prompt2 = prefix + abstract + "<endabstract>\n"
114
+
115
+ # Stage 2: unconstrained answer
116
+ sp2 = SamplingParams(temperature=0.0, max_tokens=2048)
117
+ answer = llm.generate([prompt2], sp2)[0].outputs[0].text
118
+ print(answer)
119
+ ```
120
+
121
+ ### Loading the LoRA adapters (peft)
122
+
123
+ If you want to inspect individual round outputs without downloading the merged models:
124
+
125
+ ```python
126
+ from peft import PeftModel
127
+ from transformers import AutoModelForCausalLM
128
+ from huggingface_hub import snapshot_download
129
+
130
+ # You'll need the extended base model first — produce it locally via scripts/01_extend_model.sh
131
+ # OR start from one of our merged checkpoints and load a later adapter on top.
132
+
133
+ base = AutoModelForCausalLM.from_pretrained("path/to/extended/base", trust_remote_code=True)
134
+ adapter_path = snapshot_download(
135
+     "leapeto/Qwen3-4B-AbstractCoT-warmup",
136
+     allow_patterns=["adapters/pi3_phaseB/*"],
137
+ )
138
+ model = PeftModel.from_pretrained(base, f"{adapter_path}/adapters/pi3_phaseB")
139
+ ```
140
+
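+ To turn an adapter into a standalone checkpoint (like the `round*/` and `final/` folders here), PEFT's `merge_and_unload()` folds the LoRA deltas into the base weights; the output directory below is illustrative:
+
+ ```python
+ merged = model.merge_and_unload()       # returns a plain transformers model
+ merged.save_pretrained("pi3_phaseB_merged")
+ ```
+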
141
+ ## Files of interest
142
+
143
+ | File | What |
144
+ |---|---|
145
+ | `final/` | End-of-round-3 merged model. **This is the main artifact.** |
146
+ | `round1/`, `round2/` | Intermediate merged models for studying T=1 → T=2 → T=3 progression |
147
+ | `adapters/pi{1,2,3}_phase{A,B}/` | LoRA-only checkpoints from each phase |
148
+ | `results/baseline_math500.jsonl` | Qwen3-4B verbal-CoT eval (84.60% / 1045 tok) |
149
+ | `results/abstract_math500_T3_N5000.jsonl` | This model's eval (72.00% / 432 tok) |
150
+ | `train_logs/*.json` | Per-step loss + LR curves for each phase |
151
+ | `docs/20260511.md` | Technical report (full breakdown) |
152
+ | `docs/20260511_reader.md` | Reader-oriented report (concepts + reasoning) |
153
+
154
+ ## Citation
155
+
156
+ ```bibtex
157
+ @article{ramji2026thinking,
158
+ title={Thinking Without Words: Efficient Latent Reasoning with Abstract Chain-of-Thought},
159
+ author={Ramji, Keshav and Naseem, Tahira and Fernandez Astudillo, Ramón},
160
+ journal={arXiv preprint arXiv:2604.22709},
161
+ year={2026}
162
+ }
163
+ ```
adapters/pi1_phaseA/train_log.json ADDED
@@ -0,0 +1,75 @@
1
+ {
2
+ "losses": [
3
+ 3.488373909890652,
4
+ 2.407912985235453,
5
+ 1.7763818703591823,
6
+ 1.4919575069099664,
7
+ 0.9603039052337408,
8
+ 0.9057566102594137,
9
+ 0.8390422958880663,
10
+ 0.9064983982592821,
11
+ 0.7787921145558357,
12
+ 0.78125288952142,
13
+ 0.8250404935330152,
14
+ 0.8201050635427236,
15
+ 0.8527244713157416,
16
+ 0.7438110310584307,
17
+ 0.9024393826723098,
18
+ 0.8181035034358501,
19
+ 0.8188389737159014,
20
+ 0.8235138654708862,
21
+ 0.8209620092064143,
22
+ 0.8652047950774431,
23
+ 0.7606889367103576,
24
+ 0.884891077503562,
25
+ 0.7458537224680185,
26
+ 0.9200425513088704,
27
+ 0.8810815073549747,
28
+ 0.8085172940045595,
29
+ 0.8518438508734107,
30
+ 0.813002785295248,
31
+ 0.8703699089586735,
32
+ 0.8568720713257789,
33
+ 0.8453036542981863
34
+ ],
35
+ "lrs": [
36
+ 6.666666666666667e-05,
37
+ 9.993008576227247e-05,
38
+ 9.937194443381972e-05,
39
+ 9.826190093588563e-05,
40
+ 9.661236384224129e-05,
41
+ 9.444177243274618e-05,
42
+ 9.177439057064683e-05,
43
+ 8.864003547001915e-05,
44
+ 8.507374438531607e-05,
45
+ 8.111538294891684e-05,
46
+ 7.680919953486048e-05,
47
+ 7.220333063028872e-05,
48
+ 6.734926274378312e-05,
49
+ 6.230125686563068e-05,
50
+ 5.7115741913664264e-05,
51
+ 5.185068394501791e-05,
52
+ 4.6564938185035956e-05,
53
+ 4.131759111665349e-05,
54
+ 3.616729998467365e-05,
55
+ 3.1171637098265064e-05,
56
+ 2.638644626136587e-05,
57
+ 2.1865218525109495e-05,
58
+ 1.7658494240397126e-05,
59
+ 1.3813298094746491e-05,
60
+ 1.037261344883343e-05,
61
+ 7.374901848832683e-06,
62
+ 4.853673085668947e-06,
63
+ 2.8371106072518195e-06,
64
+ 1.3477564710088098e-06,
65
+ 4.02259358460233e-07,
66
+ 1.1188468644907079e-08
67
+ ],
68
+ "wallclock_s": 10715,
69
+ "n_examples": 5000,
70
+ "epochs": 1,
71
+ "mode": "bottleneck",
72
+ "lora_rank": 32,
73
+ "total_opt_steps": 156,
74
+ "num_processes": 2
75
+ }
adapters/pi1_phaseB/README.md ADDED
@@ -0,0 +1,207 @@
1
+ ---
2
+ base_model: /workspace/ThinkingWithoutWordsRepro/runs/qwen3-4b-abs/pi1_phaseA_merged
3
+ library_name: peft
4
+ pipeline_tag: text-generation
5
+ tags:
6
+ - base_model:adapter:/workspace/ThinkingWithoutWordsRepro/runs/qwen3-4b-abs/pi1_phaseA_merged
7
+ - lora
8
+ - transformers
9
+ ---
10
+
11
+ # Model Card for Model ID
12
+
13
+ <!-- Provide a quick summary of what the model is/does. -->
14
+
15
+
16
+
17
+ ## Model Details
18
+
19
+ ### Model Description
20
+
21
+ <!-- Provide a longer summary of what this model is. -->
22
+
23
+
24
+
25
+ - **Developed by:** [More Information Needed]
26
+ - **Funded by [optional]:** [More Information Needed]
27
+ - **Shared by [optional]:** [More Information Needed]
28
+ - **Model type:** [More Information Needed]
29
+ - **Language(s) (NLP):** [More Information Needed]
30
+ - **License:** [More Information Needed]
31
+ - **Finetuned from model [optional]:** [More Information Needed]
32
+
33
+ ### Model Sources [optional]
34
+
35
+ <!-- Provide the basic links for the model. -->
36
+
37
+ - **Repository:** [More Information Needed]
38
+ - **Paper [optional]:** [More Information Needed]
39
+ - **Demo [optional]:** [More Information Needed]
40
+
41
+ ## Uses
42
+
43
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
44
+
45
+ ### Direct Use
46
+
47
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
48
+
49
+ [More Information Needed]
50
+
51
+ ### Downstream Use [optional]
52
+
53
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
54
+
55
+ [More Information Needed]
56
+
57
+ ### Out-of-Scope Use
58
+
59
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
60
+
61
+ [More Information Needed]
62
+
63
+ ## Bias, Risks, and Limitations
64
+
65
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
66
+
67
+ [More Information Needed]
68
+
69
+ ### Recommendations
70
+
71
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
72
+
73
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
74
+
75
+ ## How to Get Started with the Model
76
+
77
+ Use the code below to get started with the model.
78
+
79
+ [More Information Needed]
80
+
81
+ ## Training Details
82
+
83
+ ### Training Data
84
+
85
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
86
+
87
+ [More Information Needed]
88
+
89
+ ### Training Procedure
90
+
91
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
92
+
93
+ #### Preprocessing [optional]
94
+
95
+ [More Information Needed]
96
+
97
+
98
+ #### Training Hyperparameters
99
+
100
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
101
+
102
+ #### Speeds, Sizes, Times [optional]
103
+
104
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
105
+
106
+ [More Information Needed]
107
+
108
+ ## Evaluation
109
+
110
+ <!-- This section describes the evaluation protocols and provides the results. -->
111
+
112
+ ### Testing Data, Factors & Metrics
113
+
114
+ #### Testing Data
115
+
116
+ <!-- This should link to a Dataset Card if possible. -->
117
+
118
+ [More Information Needed]
119
+
120
+ #### Factors
121
+
122
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
123
+
124
+ [More Information Needed]
125
+
126
+ #### Metrics
127
+
128
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
129
+
130
+ [More Information Needed]
131
+
132
+ ### Results
133
+
134
+ [More Information Needed]
135
+
136
+ #### Summary
137
+
138
+
139
+
140
+ ## Model Examination [optional]
141
+
142
+ <!-- Relevant interpretability work for the model goes here -->
143
+
144
+ [More Information Needed]
145
+
146
+ ## Environmental Impact
147
+
148
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
149
+
150
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
151
+
152
+ - **Hardware Type:** [More Information Needed]
153
+ - **Hours used:** [More Information Needed]
154
+ - **Cloud Provider:** [More Information Needed]
155
+ - **Compute Region:** [More Information Needed]
156
+ - **Carbon Emitted:** [More Information Needed]
157
+
158
+ ## Technical Specifications [optional]
159
+
160
+ ### Model Architecture and Objective
161
+
162
+ [More Information Needed]
163
+
164
+ ### Compute Infrastructure
165
+
166
+ [More Information Needed]
167
+
168
+ #### Hardware
169
+
170
+ [More Information Needed]
171
+
172
+ #### Software
173
+
174
+ [More Information Needed]
175
+
176
+ ## Citation [optional]
177
+
178
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
179
+
180
+ **BibTeX:**
181
+
182
+ [More Information Needed]
183
+
184
+ **APA:**
185
+
186
+ [More Information Needed]
187
+
188
+ ## Glossary [optional]
189
+
190
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
191
+
192
+ [More Information Needed]
193
+
194
+ ## More Information [optional]
195
+
196
+ [More Information Needed]
197
+
198
+ ## Model Card Authors [optional]
199
+
200
+ [More Information Needed]
201
+
202
+ ## Model Card Contact
203
+
204
+ [More Information Needed]
205
+ ### Framework versions
206
+
207
+ - PEFT 0.19.1
adapters/pi1_phaseB/adapter_config.json ADDED
@@ -0,0 +1,51 @@
1
+ {
2
+ "alora_invocation_tokens": null,
3
+ "alpha_pattern": {},
4
+ "arrow_config": null,
5
+ "auto_mapping": null,
6
+ "base_model_name_or_path": "/workspace/ThinkingWithoutWordsRepro/runs/qwen3-4b-abs/pi1_phaseA_merged",
7
+ "bias": "none",
8
+ "corda_config": null,
9
+ "ensure_weight_tying": false,
10
+ "eva_config": null,
11
+ "exclude_modules": null,
12
+ "fan_in_fan_out": false,
13
+ "inference_mode": true,
14
+ "init_lora_weights": true,
15
+ "layer_replication": null,
16
+ "layers_pattern": null,
17
+ "layers_to_transform": null,
18
+ "loftq_config": {},
19
+ "lora_alpha": 64,
20
+ "lora_bias": false,
21
+ "lora_dropout": 0.0,
22
+ "lora_ga_config": null,
23
+ "megatron_config": null,
24
+ "megatron_core": "megatron.core",
25
+ "modules_to_save": [
26
+ "embed_tokens",
27
+ "lm_head"
28
+ ],
29
+ "peft_type": "LORA",
30
+ "peft_version": "0.19.1",
31
+ "qalora_group_size": 16,
32
+ "r": 32,
33
+ "rank_pattern": {},
34
+ "revision": null,
35
+ "target_modules": [
36
+ "k_proj",
37
+ "down_proj",
38
+ "q_proj",
39
+ "gate_proj",
40
+ "o_proj",
41
+ "v_proj",
42
+ "up_proj"
43
+ ],
44
+ "target_parameters": null,
45
+ "task_type": "CAUSAL_LM",
46
+ "trainable_token_indices": null,
47
+ "use_bdlora": null,
48
+ "use_dora": false,
49
+ "use_qalora": false,
50
+ "use_rslora": false
51
+ }
adapters/pi1_phaseB/chat_template.jinja ADDED
@@ -0,0 +1,89 @@
1
+ {%- if tools %}
2
+ {{- '<|im_start|>system\n' }}
3
+ {%- if messages[0].role == 'system' %}
4
+ {{- messages[0].content + '\n\n' }}
5
+ {%- endif %}
6
+ {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
7
+ {%- for tool in tools %}
8
+ {{- "\n" }}
9
+ {{- tool | tojson }}
10
+ {%- endfor %}
11
+ {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
12
+ {%- else %}
13
+ {%- if messages[0].role == 'system' %}
14
+ {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
15
+ {%- endif %}
16
+ {%- endif %}
17
+ {%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
18
+ {%- for message in messages[::-1] %}
19
+ {%- set index = (messages|length - 1) - loop.index0 %}
20
+ {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
21
+ {%- set ns.multi_step_tool = false %}
22
+ {%- set ns.last_query_index = index %}
23
+ {%- endif %}
24
+ {%- endfor %}
25
+ {%- for message in messages %}
26
+ {%- if message.content is string %}
27
+ {%- set content = message.content %}
28
+ {%- else %}
29
+ {%- set content = '' %}
30
+ {%- endif %}
31
+ {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
32
+ {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
33
+ {%- elif message.role == "assistant" %}
34
+ {%- set reasoning_content = '' %}
35
+ {%- if message.reasoning_content is string %}
36
+ {%- set reasoning_content = message.reasoning_content %}
37
+ {%- else %}
38
+ {%- if '</think>' in content %}
39
+ {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
40
+ {%- set content = content.split('</think>')[-1].lstrip('\n') %}
41
+ {%- endif %}
42
+ {%- endif %}
43
+ {%- if loop.index0 > ns.last_query_index %}
44
+ {%- if loop.last or (not loop.last and reasoning_content) %}
45
+ {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
46
+ {%- else %}
47
+ {{- '<|im_start|>' + message.role + '\n' + content }}
48
+ {%- endif %}
49
+ {%- else %}
50
+ {{- '<|im_start|>' + message.role + '\n' + content }}
51
+ {%- endif %}
52
+ {%- if message.tool_calls %}
53
+ {%- for tool_call in message.tool_calls %}
54
+ {%- if (loop.first and content) or (not loop.first) %}
55
+ {{- '\n' }}
56
+ {%- endif %}
57
+ {%- if tool_call.function %}
58
+ {%- set tool_call = tool_call.function %}
59
+ {%- endif %}
60
+ {{- '<tool_call>\n{"name": "' }}
61
+ {{- tool_call.name }}
62
+ {{- '", "arguments": ' }}
63
+ {%- if tool_call.arguments is string %}
64
+ {{- tool_call.arguments }}
65
+ {%- else %}
66
+ {{- tool_call.arguments | tojson }}
67
+ {%- endif %}
68
+ {{- '}\n</tool_call>' }}
69
+ {%- endfor %}
70
+ {%- endif %}
71
+ {{- '<|im_end|>\n' }}
72
+ {%- elif message.role == "tool" %}
73
+ {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
74
+ {{- '<|im_start|>user' }}
75
+ {%- endif %}
76
+ {{- '\n<tool_response>\n' }}
77
+ {{- content }}
78
+ {{- '\n</tool_response>' }}
79
+ {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
80
+ {{- '<|im_end|>\n' }}
81
+ {%- endif %}
82
+ {%- endif %}
83
+ {%- endfor %}
84
+ {%- if add_generation_prompt %}
85
+ {{- '<|im_start|>assistant\n' }}
86
+ {%- if enable_thinking is defined and enable_thinking is false %}
87
+ {{- '<think>\n\n</think>\n\n' }}
88
+ {%- endif %}
89
+ {%- endif %}
adapters/pi1_phaseB/tokenizer_config.json ADDED
@@ -0,0 +1,30 @@
1
+ {
2
+ "add_prefix_space": false,
3
+ "backend": "tokenizers",
4
+ "bos_token": null,
5
+ "clean_up_tokenization_spaces": false,
6
+ "eos_token": "<|im_end|>",
7
+ "errors": "replace",
8
+ "extra_special_tokens": [
9
+ "<|im_start|>",
10
+ "<|im_end|>",
11
+ "<|object_ref_start|>",
12
+ "<|object_ref_end|>",
13
+ "<|box_start|>",
14
+ "<|box_end|>",
15
+ "<|quad_start|>",
16
+ "<|quad_end|>",
17
+ "<|vision_start|>",
18
+ "<|vision_end|>",
19
+ "<|vision_pad|>",
20
+ "<|image_pad|>",
21
+ "<|video_pad|>"
22
+ ],
23
+ "is_local": true,
24
+ "local_files_only": false,
25
+ "model_max_length": 131072,
26
+ "pad_token": "<|endoftext|>",
27
+ "split_special_tokens": false,
28
+ "tokenizer_class": "Qwen2Tokenizer",
29
+ "unk_token": null
30
+ }
adapters/pi1_phaseB/train_log.json ADDED
@@ -0,0 +1,75 @@
1
+ {
2
+ "losses": [
3
+ 0.5806382041424513,
4
+ 0.5243994968011976,
5
+ 0.49262564033269884,
6
+ 0.4333665704354644,
7
+ 0.373029216285795,
8
+ 0.3593967686872929,
9
+ 0.39847223716787994,
10
+ 0.37207798319868746,
11
+ 0.3787895118817687,
12
+ 0.3720124014187604,
13
+ 0.3636292540933937,
14
+ 0.3601828854967607,
15
+ 0.3554463139735162,
16
+ 0.3865779357030988,
17
+ 0.32862058384343984,
18
+ 0.36783338263630866,
19
+ 0.3428428391227499,
20
+ 0.34551644229795786,
21
+ 0.3680351444054395,
22
+ 0.3469195322133601,
23
+ 0.3622684331610799,
24
+ 0.37623543343506755,
25
+ 0.36850376506336036,
26
+ 0.345283712586388,
27
+ 0.3425974382087588,
28
+ 0.4011214487836696,
29
+ 0.3654101203195751,
30
+ 0.3157559605082497,
31
+ 0.36133123533800243,
32
+ 0.35812310164328665,
33
+ 0.34157210728153586
34
+ ],
35
+ "lrs": [
36
+ 6.666666666666667e-05,
37
+ 9.993008576227247e-05,
38
+ 9.937194443381972e-05,
39
+ 9.826190093588563e-05,
40
+ 9.661236384224129e-05,
41
+ 9.444177243274618e-05,
42
+ 9.177439057064683e-05,
43
+ 8.864003547001915e-05,
44
+ 8.507374438531607e-05,
45
+ 8.111538294891684e-05,
46
+ 7.680919953486048e-05,
47
+ 7.220333063028872e-05,
48
+ 6.734926274378312e-05,
49
+ 6.230125686563068e-05,
50
+ 5.7115741913664264e-05,
51
+ 5.185068394501791e-05,
52
+ 4.6564938185035956e-05,
53
+ 4.131759111665349e-05,
54
+ 3.616729998467365e-05,
55
+ 3.1171637098265064e-05,
56
+ 2.638644626136587e-05,
57
+ 2.1865218525109495e-05,
58
+ 1.7658494240397126e-05,
59
+ 1.3813298094746491e-05,
60
+ 1.037261344883343e-05,
61
+ 7.374901848832683e-06,
62
+ 4.853673085668947e-06,
63
+ 2.8371106072518195e-06,
64
+ 1.3477564710088098e-06,
65
+ 4.02259358460233e-07,
66
+ 1.1188468644907079e-08
67
+ ],
68
+ "wallclock_s": 942,
69
+ "n_examples": 5000,
70
+ "epochs": 1,
71
+ "mode": "distill",
72
+ "lora_rank": 32,
73
+ "total_opt_steps": 156,
74
+ "num_processes": 2
75
+ }
adapters/pi2_phaseA/README.md ADDED
@@ -0,0 +1,207 @@
1
+ ---
2
+ base_model: /workspace/ThinkingWithoutWordsRepro/runs/qwen3-4b-abs/pi1_phaseB_merged
3
+ library_name: peft
4
+ pipeline_tag: text-generation
5
+ tags:
6
+ - base_model:adapter:/workspace/ThinkingWithoutWordsRepro/runs/qwen3-4b-abs/pi1_phaseB_merged
7
+ - lora
8
+ - transformers
9
+ ---
10
+
11
+ # Model Card for Model ID
12
+
13
+ <!-- Provide a quick summary of what the model is/does. -->
14
+
15
+
16
+
17
+ ## Model Details
18
+
19
+ ### Model Description
20
+
21
+ <!-- Provide a longer summary of what this model is. -->
22
+
23
+
24
+
25
+ - **Developed by:** [More Information Needed]
26
+ - **Funded by [optional]:** [More Information Needed]
27
+ - **Shared by [optional]:** [More Information Needed]
28
+ - **Model type:** [More Information Needed]
29
+ - **Language(s) (NLP):** [More Information Needed]
30
+ - **License:** [More Information Needed]
31
+ - **Finetuned from model [optional]:** [More Information Needed]
32
+
33
+ ### Model Sources [optional]
34
+
35
+ <!-- Provide the basic links for the model. -->
36
+
37
+ - **Repository:** [More Information Needed]
38
+ - **Paper [optional]:** [More Information Needed]
39
+ - **Demo [optional]:** [More Information Needed]
40
+
41
+ ## Uses
42
+
43
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
44
+
45
+ ### Direct Use
46
+
47
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
48
+
49
+ [More Information Needed]
50
+
51
+ ### Downstream Use [optional]
52
+
53
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
54
+
55
+ [More Information Needed]
56
+
57
+ ### Out-of-Scope Use
58
+
59
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
60
+
61
+ [More Information Needed]
62
+
63
+ ## Bias, Risks, and Limitations
64
+
65
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
66
+
67
+ [More Information Needed]
68
+
69
+ ### Recommendations
70
+
71
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
72
+
73
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
74
+
75
+ ## How to Get Started with the Model
76
+
77
+ Use the code below to get started with the model.
78
+
79
+ [More Information Needed]
80
+
81
+ ## Training Details
82
+
83
+ ### Training Data
84
+
85
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
86
+
87
+ [More Information Needed]
88
+
89
+ ### Training Procedure
90
+
91
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
92
+
93
+ #### Preprocessing [optional]
94
+
95
+ [More Information Needed]
96
+
97
+
98
+ #### Training Hyperparameters
99
+
100
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
101
+
102
+ #### Speeds, Sizes, Times [optional]
103
+
104
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
105
+
106
+ [More Information Needed]
107
+
108
+ ## Evaluation
109
+
110
+ <!-- This section describes the evaluation protocols and provides the results. -->
111
+
112
+ ### Testing Data, Factors & Metrics
113
+
114
+ #### Testing Data
115
+
116
+ <!-- This should link to a Dataset Card if possible. -->
117
+
118
+ [More Information Needed]
119
+
120
+ #### Factors
121
+
122
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
123
+
124
+ [More Information Needed]
125
+
126
+ #### Metrics
127
+
128
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
129
+
130
+ [More Information Needed]
131
+
132
+ ### Results
133
+
134
+ [More Information Needed]
135
+
136
+ #### Summary
137
+
138
+
139
+
140
+ ## Model Examination [optional]
141
+
142
+ <!-- Relevant interpretability work for the model goes here -->
143
+
144
+ [More Information Needed]
145
+
146
+ ## Environmental Impact
147
+
148
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
149
+
150
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
151
+
152
+ - **Hardware Type:** [More Information Needed]
153
+ - **Hours used:** [More Information Needed]
154
+ - **Cloud Provider:** [More Information Needed]
155
+ - **Compute Region:** [More Information Needed]
156
+ - **Carbon Emitted:** [More Information Needed]
157
+
158
+ ## Technical Specifications [optional]
159
+
160
+ ### Model Architecture and Objective
161
+
162
+ [More Information Needed]
163
+
164
+ ### Compute Infrastructure
165
+
166
+ [More Information Needed]
167
+
168
+ #### Hardware
169
+
170
+ [More Information Needed]
171
+
172
+ #### Software
173
+
174
+ [More Information Needed]
175
+
176
+ ## Citation [optional]
177
+
178
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
179
+
180
+ **BibTeX:**
181
+
182
+ [More Information Needed]
183
+
184
+ **APA:**
185
+
186
+ [More Information Needed]
187
+
188
+ ## Glossary [optional]
189
+
190
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
191
+
192
+ [More Information Needed]
193
+
194
+ ## More Information [optional]
195
+
196
+ [More Information Needed]
197
+
198
+ ## Model Card Authors [optional]
199
+
200
+ [More Information Needed]
201
+
202
+ ## Model Card Contact
203
+
204
+ [More Information Needed]
205
+ ### Framework versions
206
+
207
+ - PEFT 0.19.1
adapters/pi2_phaseA/adapter_config.json ADDED
@@ -0,0 +1,51 @@
1
+ {
2
+ "alora_invocation_tokens": null,
3
+ "alpha_pattern": {},
4
+ "arrow_config": null,
5
+ "auto_mapping": null,
6
+ "base_model_name_or_path": "/workspace/ThinkingWithoutWordsRepro/runs/qwen3-4b-abs/pi1_phaseB_merged",
7
+ "bias": "none",
8
+ "corda_config": null,
9
+ "ensure_weight_tying": false,
10
+ "eva_config": null,
11
+ "exclude_modules": null,
12
+ "fan_in_fan_out": false,
13
+ "inference_mode": true,
14
+ "init_lora_weights": true,
15
+ "layer_replication": null,
16
+ "layers_pattern": null,
17
+ "layers_to_transform": null,
18
+ "loftq_config": {},
19
+ "lora_alpha": 64,
20
+ "lora_bias": false,
21
+ "lora_dropout": 0.0,
22
+ "lora_ga_config": null,
23
+ "megatron_config": null,
24
+ "megatron_core": "megatron.core",
25
+ "modules_to_save": [
26
+ "embed_tokens",
27
+ "lm_head"
28
+ ],
29
+ "peft_type": "LORA",
30
+ "peft_version": "0.19.1",
31
+ "qalora_group_size": 16,
32
+ "r": 32,
33
+ "rank_pattern": {},
34
+ "revision": null,
35
+ "target_modules": [
36
+ "up_proj",
37
+ "o_proj",
38
+ "down_proj",
39
+ "gate_proj",
40
+ "q_proj",
41
+ "v_proj",
42
+ "k_proj"
43
+ ],
44
+ "target_parameters": null,
45
+ "task_type": "CAUSAL_LM",
46
+ "trainable_token_indices": null,
47
+ "use_bdlora": null,
48
+ "use_dora": false,
49
+ "use_qalora": false,
50
+ "use_rslora": false
51
+ }
adapters/pi2_phaseA/chat_template.jinja ADDED
@@ -0,0 +1,89 @@
1
+ {%- if tools %}
2
+ {{- '<|im_start|>system\n' }}
3
+ {%- if messages[0].role == 'system' %}
4
+ {{- messages[0].content + '\n\n' }}
5
+ {%- endif %}
6
+ {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
7
+ {%- for tool in tools %}
8
+ {{- "\n" }}
9
+ {{- tool | tojson }}
10
+ {%- endfor %}
11
+ {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
12
+ {%- else %}
13
+ {%- if messages[0].role == 'system' %}
14
+ {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
15
+ {%- endif %}
16
+ {%- endif %}
17
+ {%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
18
+ {%- for message in messages[::-1] %}
19
+ {%- set index = (messages|length - 1) - loop.index0 %}
20
+ {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
21
+ {%- set ns.multi_step_tool = false %}
22
+ {%- set ns.last_query_index = index %}
23
+ {%- endif %}
24
+ {%- endfor %}
25
+ {%- for message in messages %}
26
+ {%- if message.content is string %}
27
+ {%- set content = message.content %}
28
+ {%- else %}
29
+ {%- set content = '' %}
30
+ {%- endif %}
31
+ {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
32
+ {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
33
+ {%- elif message.role == "assistant" %}
34
+ {%- set reasoning_content = '' %}
35
+ {%- if message.reasoning_content is string %}
36
+ {%- set reasoning_content = message.reasoning_content %}
37
+ {%- else %}
38
+ {%- if '</think>' in content %}
39
+ {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
40
+ {%- set content = content.split('</think>')[-1].lstrip('\n') %}
41
+ {%- endif %}
42
+ {%- endif %}
43
+ {%- if loop.index0 > ns.last_query_index %}
44
+ {%- if loop.last or (not loop.last and reasoning_content) %}
45
+ {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
46
+ {%- else %}
47
+ {{- '<|im_start|>' + message.role + '\n' + content }}
48
+ {%- endif %}
49
+ {%- else %}
50
+ {{- '<|im_start|>' + message.role + '\n' + content }}
51
+ {%- endif %}
52
+ {%- if message.tool_calls %}
53
+ {%- for tool_call in message.tool_calls %}
54
+ {%- if (loop.first and content) or (not loop.first) %}
55
+ {{- '\n' }}
56
+ {%- endif %}
57
+ {%- if tool_call.function %}
58
+ {%- set tool_call = tool_call.function %}
59
+ {%- endif %}
60
+ {{- '<tool_call>\n{"name": "' }}
61
+ {{- tool_call.name }}
62
+ {{- '", "arguments": ' }}
63
+ {%- if tool_call.arguments is string %}
64
+ {{- tool_call.arguments }}
65
+ {%- else %}
66
+ {{- tool_call.arguments | tojson }}
67
+ {%- endif %}
68
+ {{- '}\n</tool_call>' }}
69
+ {%- endfor %}
70
+ {%- endif %}
71
+ {{- '<|im_end|>\n' }}
72
+ {%- elif message.role == "tool" %}
73
+ {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
74
+ {{- '<|im_start|>user' }}
75
+ {%- endif %}
76
+ {{- '\n<tool_response>\n' }}
77
+ {{- content }}
78
+ {{- '\n</tool_response>' }}
79
+ {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
80
+ {{- '<|im_end|>\n' }}
81
+ {%- endif %}
82
+ {%- endif %}
83
+ {%- endfor %}
84
+ {%- if add_generation_prompt %}
85
+ {{- '<|im_start|>assistant\n' }}
86
+ {%- if enable_thinking is defined and enable_thinking is false %}
87
+ {{- '<think>\n\n</think>\n\n' }}
88
+ {%- endif %}
89
+ {%- endif %}
adapters/pi2_phaseA/tokenizer_config.json ADDED
@@ -0,0 +1,30 @@
1
+ {
2
+ "add_prefix_space": false,
3
+ "backend": "tokenizers",
4
+ "bos_token": null,
5
+ "clean_up_tokenization_spaces": false,
6
+ "eos_token": "<|im_end|>",
7
+ "errors": "replace",
8
+ "extra_special_tokens": [
9
+ "<|im_start|>",
10
+ "<|im_end|>",
11
+ "<|object_ref_start|>",
12
+ "<|object_ref_end|>",
13
+ "<|box_start|>",
14
+ "<|box_end|>",
15
+ "<|quad_start|>",
16
+ "<|quad_end|>",
17
+ "<|vision_start|>",
18
+ "<|vision_end|>",
19
+ "<|vision_pad|>",
20
+ "<|image_pad|>",
21
+ "<|video_pad|>"
22
+ ],
23
+ "is_local": true,
24
+ "local_files_only": false,
25
+ "model_max_length": 131072,
26
+ "pad_token": "<|endoftext|>",
27
+ "split_special_tokens": false,
28
+ "tokenizer_class": "Qwen2Tokenizer",
29
+ "unk_token": null
30
+ }
adapters/pi2_phaseA/train_log.json ADDED
@@ -0,0 +1,75 @@
1
+ {
2
+ "losses": [
3
+ 0.35318275502650065,
4
+ 0.30725667767983394,
5
+ 0.34089534296654167,
6
+ 0.3377619634848088,
7
+ 0.3068910479894839,
8
+ 0.3112259633140638,
9
+ 0.35649422819260507,
10
+ 0.3305382497259416,
11
+ 0.3459838313516229,
12
+ 0.3343647971516475,
13
+ 0.33608090435154736,
14
+ 0.3339404923695838,
15
+ 0.32868876951106357,
16
+ 0.364680266357027,
17
+ 0.31102049429900946,
18
+ 0.3481577365193516,
19
+ 0.3247197750257328,
20
+ 0.33266925923526286,
21
+ 0.35477414953056724,
22
+ 0.33790302132838407,
23
+ 0.35382214419078084,
24
+ 0.36831805281108243,
25
+ 0.3632078022346832,
26
+ 0.34023994151502845,
27
+ 0.3407576463650912,
28
+ 0.40222772396809886,
29
+ 0.3644849105272442,
30
+ 0.31388317197561266,
31
+ 0.36352235367521646,
32
+ 0.35906730571296064,
33
+ 0.34190756385796706
34
+ ],
35
+ "lrs": [
36
+ 6.666666666666667e-05,
37
+ 9.993008576227247e-05,
38
+ 9.937194443381972e-05,
39
+ 9.826190093588563e-05,
40
+ 9.661236384224129e-05,
41
+ 9.444177243274618e-05,
42
+ 9.177439057064683e-05,
43
+ 8.864003547001915e-05,
44
+ 8.507374438531607e-05,
45
+ 8.111538294891684e-05,
46
+ 7.680919953486048e-05,
47
+ 7.220333063028872e-05,
48
+ 6.734926274378312e-05,
49
+ 6.230125686563068e-05,
50
+ 5.7115741913664264e-05,
51
+ 5.185068394501791e-05,
52
+ 4.6564938185035956e-05,
53
+ 4.131759111665349e-05,
54
+ 3.616729998467365e-05,
55
+ 3.1171637098265064e-05,
56
+ 2.638644626136587e-05,
57
+ 2.1865218525109495e-05,
58
+ 1.7658494240397126e-05,
59
+ 1.3813298094746491e-05,
60
+ 1.037261344883343e-05,
61
+ 7.374901848832683e-06,
62
+ 4.853673085668947e-06,
63
+ 2.8371106072518195e-06,
64
+ 1.3477564710088098e-06,
65
+ 4.02259358460233e-07,
66
+ 1.1188468644907079e-08
67
+ ],
68
+ "wallclock_s": 10542,
69
+ "n_examples": 5000,
70
+ "epochs": 1,
71
+ "mode": "bottleneck",
72
+ "lora_rank": 32,
73
+ "total_opt_steps": 156,
74
+ "num_processes": 2
75
+ }
adapters/pi2_phaseB/README.md ADDED
@@ -0,0 +1,207 @@
1
+ ---
2
+ base_model: /workspace/ThinkingWithoutWordsRepro/runs/qwen3-4b-abs/pi2_phaseA_merged
3
+ library_name: peft
4
+ pipeline_tag: text-generation
5
+ tags:
6
+ - base_model:adapter:/workspace/ThinkingWithoutWordsRepro/runs/qwen3-4b-abs/pi2_phaseA_merged
7
+ - lora
8
+ - transformers
9
+ ---
10
+
11
+ # Model Card for Model ID
12
+
13
+ <!-- Provide a quick summary of what the model is/does. -->
14
+
15
+
16
+
17
+ ## Model Details
18
+
19
+ ### Model Description
20
+
21
+ <!-- Provide a longer summary of what this model is. -->
22
+
23
+
24
+
25
+ - **Developed by:** [More Information Needed]
26
+ - **Funded by [optional]:** [More Information Needed]
27
+ - **Shared by [optional]:** [More Information Needed]
28
+ - **Model type:** [More Information Needed]
29
+ - **Language(s) (NLP):** [More Information Needed]
30
+ - **License:** [More Information Needed]
31
+ - **Finetuned from model [optional]:** [More Information Needed]
32
+
33
+ ### Model Sources [optional]
34
+
35
+ <!-- Provide the basic links for the model. -->
36
+
37
+ - **Repository:** [More Information Needed]
38
+ - **Paper [optional]:** [More Information Needed]
39
+ - **Demo [optional]:** [More Information Needed]
40
+
41
+ ## Uses
42
+
43
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
44
+
45
+ ### Direct Use
46
+
47
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
48
+
49
+ [More Information Needed]
50
+
51
+ ### Downstream Use [optional]
52
+
53
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
54
+
55
+ [More Information Needed]
56
+
57
+ ### Out-of-Scope Use
58
+
59
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
60
+
61
+ [More Information Needed]
62
+
63
+ ## Bias, Risks, and Limitations
64
+
65
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
66
+
67
+ [More Information Needed]
68
+
69
+ ### Recommendations
70
+
71
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
72
+
73
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
74
+
75
+ ## How to Get Started with the Model
76
+
77
+ Use the code below to get started with the model.
78
+
79
+ [More Information Needed]
80
+
81
+ ## Training Details
82
+
83
+ ### Training Data
84
+
85
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
86
+
87
+ [More Information Needed]
88
+
89
+ ### Training Procedure
90
+
91
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
92
+
93
+ #### Preprocessing [optional]
94
+
95
+ [More Information Needed]
96
+
97
+
98
+ #### Training Hyperparameters
99
+
100
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
101
+
102
+ #### Speeds, Sizes, Times [optional]
103
+
104
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
105
+
106
+ [More Information Needed]
107
+
108
+ ## Evaluation
109
+
110
+ <!-- This section describes the evaluation protocols and provides the results. -->
111
+
112
+ ### Testing Data, Factors & Metrics
113
+
114
+ #### Testing Data
115
+
116
+ <!-- This should link to a Dataset Card if possible. -->
117
+
118
+ [More Information Needed]
119
+
120
+ #### Factors
121
+
122
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
123
+
124
+ [More Information Needed]
125
+
126
+ #### Metrics
127
+
128
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
129
+
130
+ [More Information Needed]
131
+
132
+ ### Results
133
+
134
+ [More Information Needed]
135
+
136
+ #### Summary
137
+
138
+
139
+
140
+ ## Model Examination [optional]
141
+
142
+ <!-- Relevant interpretability work for the model goes here -->
143
+
144
+ [More Information Needed]
145
+
146
+ ## Environmental Impact
147
+
148
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
149
+
150
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
151
+
152
+ - **Hardware Type:** [More Information Needed]
153
+ - **Hours used:** [More Information Needed]
154
+ - **Cloud Provider:** [More Information Needed]
155
+ - **Compute Region:** [More Information Needed]
156
+ - **Carbon Emitted:** [More Information Needed]
157
+
158
+ ## Technical Specifications [optional]
159
+
160
+ ### Model Architecture and Objective
161
+
162
+ [More Information Needed]
163
+
164
+ ### Compute Infrastructure
165
+
166
+ [More Information Needed]
167
+
168
+ #### Hardware
169
+
170
+ [More Information Needed]
171
+
172
+ #### Software
173
+
174
+ [More Information Needed]
175
+
176
+ ## Citation [optional]
177
+
178
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
179
+
180
+ **BibTeX:**
181
+
182
+ [More Information Needed]
183
+
184
+ **APA:**
185
+
186
+ [More Information Needed]
187
+
188
+ ## Glossary [optional]
189
+
190
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
191
+
192
+ [More Information Needed]
193
+
194
+ ## More Information [optional]
195
+
196
+ [More Information Needed]
197
+
198
+ ## Model Card Authors [optional]
199
+
200
+ [More Information Needed]
201
+
202
+ ## Model Card Contact
203
+
204
+ [More Information Needed]
205
+ ### Framework versions
206
+
207
+ - PEFT 0.19.1
adapters/pi2_phaseB/adapter_config.json ADDED
@@ -0,0 +1,51 @@
1
+ {
2
+ "alora_invocation_tokens": null,
3
+ "alpha_pattern": {},
4
+ "arrow_config": null,
5
+ "auto_mapping": null,
6
+ "base_model_name_or_path": "/workspace/ThinkingWithoutWordsRepro/runs/qwen3-4b-abs/pi2_phaseA_merged",
7
+ "bias": "none",
8
+ "corda_config": null,
9
+ "ensure_weight_tying": false,
10
+ "eva_config": null,
11
+ "exclude_modules": null,
12
+ "fan_in_fan_out": false,
13
+ "inference_mode": true,
14
+ "init_lora_weights": true,
15
+ "layer_replication": null,
16
+ "layers_pattern": null,
17
+ "layers_to_transform": null,
18
+ "loftq_config": {},
19
+ "lora_alpha": 64,
20
+ "lora_bias": false,
21
+ "lora_dropout": 0.0,
22
+ "lora_ga_config": null,
23
+ "megatron_config": null,
24
+ "megatron_core": "megatron.core",
25
+ "modules_to_save": [
26
+ "embed_tokens",
27
+ "lm_head"
28
+ ],
29
+ "peft_type": "LORA",
30
+ "peft_version": "0.19.1",
31
+ "qalora_group_size": 16,
32
+ "r": 32,
33
+ "rank_pattern": {},
34
+ "revision": null,
35
+ "target_modules": [
36
+ "o_proj",
37
+ "down_proj",
38
+ "v_proj",
39
+ "k_proj",
40
+ "q_proj",
41
+ "gate_proj",
42
+ "up_proj"
43
+ ],
44
+ "target_parameters": null,
45
+ "task_type": "CAUSAL_LM",
46
+ "trainable_token_indices": null,
47
+ "use_bdlora": null,
48
+ "use_dora": false,
49
+ "use_qalora": false,
50
+ "use_rslora": false
51
+ }
adapters/pi2_phaseB/chat_template.jinja ADDED
@@ -0,0 +1,89 @@
1
+ {%- if tools %}
2
+ {{- '<|im_start|>system\n' }}
3
+ {%- if messages[0].role == 'system' %}
4
+ {{- messages[0].content + '\n\n' }}
5
+ {%- endif %}
6
+ {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
7
+ {%- for tool in tools %}
8
+ {{- "\n" }}
9
+ {{- tool | tojson }}
10
+ {%- endfor %}
11
+ {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
12
+ {%- else %}
13
+ {%- if messages[0].role == 'system' %}
14
+ {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
15
+ {%- endif %}
16
+ {%- endif %}
17
+ {%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
18
+ {%- for message in messages[::-1] %}
19
+ {%- set index = (messages|length - 1) - loop.index0 %}
20
+ {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
21
+ {%- set ns.multi_step_tool = false %}
22
+ {%- set ns.last_query_index = index %}
23
+ {%- endif %}
24
+ {%- endfor %}
25
+ {%- for message in messages %}
26
+ {%- if message.content is string %}
27
+ {%- set content = message.content %}
28
+ {%- else %}
29
+ {%- set content = '' %}
30
+ {%- endif %}
31
+ {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
32
+ {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
33
+ {%- elif message.role == "assistant" %}
34
+ {%- set reasoning_content = '' %}
35
+ {%- if message.reasoning_content is string %}
36
+ {%- set reasoning_content = message.reasoning_content %}
37
+ {%- else %}
38
+ {%- if '</think>' in content %}
39
+ {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
40
+ {%- set content = content.split('</think>')[-1].lstrip('\n') %}
41
+ {%- endif %}
42
+ {%- endif %}
43
+ {%- if loop.index0 > ns.last_query_index %}
44
+ {%- if loop.last or (not loop.last and reasoning_content) %}
45
+ {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
46
+ {%- else %}
47
+ {{- '<|im_start|>' + message.role + '\n' + content }}
48
+ {%- endif %}
49
+ {%- else %}
50
+ {{- '<|im_start|>' + message.role + '\n' + content }}
51
+ {%- endif %}
52
+ {%- if message.tool_calls %}
53
+ {%- for tool_call in message.tool_calls %}
54
+ {%- if (loop.first and content) or (not loop.first) %}
55
+ {{- '\n' }}
56
+ {%- endif %}
57
+ {%- if tool_call.function %}
58
+ {%- set tool_call = tool_call.function %}
59
+ {%- endif %}
60
+ {{- '<tool_call>\n{"name": "' }}
61
+ {{- tool_call.name }}
62
+ {{- '", "arguments": ' }}
63
+ {%- if tool_call.arguments is string %}
64
+ {{- tool_call.arguments }}
65
+ {%- else %}
66
+ {{- tool_call.arguments | tojson }}
67
+ {%- endif %}
68
+ {{- '}\n</tool_call>' }}
69
+ {%- endfor %}
70
+ {%- endif %}
71
+ {{- '<|im_end|>\n' }}
72
+ {%- elif message.role == "tool" %}
73
+ {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
74
+ {{- '<|im_start|>user' }}
75
+ {%- endif %}
76
+ {{- '\n<tool_response>\n' }}
77
+ {{- content }}
78
+ {{- '\n</tool_response>' }}
79
+ {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
80
+ {{- '<|im_end|>\n' }}
81
+ {%- endif %}
82
+ {%- endif %}
83
+ {%- endfor %}
84
+ {%- if add_generation_prompt %}
85
+ {{- '<|im_start|>assistant\n' }}
86
+ {%- if enable_thinking is defined and enable_thinking is false %}
87
+ {{- '<think>\n\n</think>\n\n' }}
88
+ {%- endif %}
89
+ {%- endif %}
adapters/pi2_phaseB/tokenizer_config.json ADDED
@@ -0,0 +1,30 @@
1
+ {
2
+ "add_prefix_space": false,
3
+ "backend": "tokenizers",
4
+ "bos_token": null,
5
+ "clean_up_tokenization_spaces": false,
6
+ "eos_token": "<|im_end|>",
7
+ "errors": "replace",
8
+ "extra_special_tokens": [
9
+ "<|im_start|>",
10
+ "<|im_end|>",
11
+ "<|object_ref_start|>",
12
+ "<|object_ref_end|>",
13
+ "<|box_start|>",
14
+ "<|box_end|>",
15
+ "<|quad_start|>",
16
+ "<|quad_end|>",
17
+ "<|vision_start|>",
18
+ "<|vision_end|>",
19
+ "<|vision_pad|>",
20
+ "<|image_pad|>",
21
+ "<|video_pad|>"
22
+ ],
23
+ "is_local": true,
24
+ "local_files_only": false,
25
+ "model_max_length": 131072,
26
+ "pad_token": "<|endoftext|>",
27
+ "split_special_tokens": false,
28
+ "tokenizer_class": "Qwen2Tokenizer",
29
+ "unk_token": null
30
+ }
adapters/pi2_phaseB/train_log.json ADDED
@@ -0,0 +1,75 @@
1
+ {
2
+ "losses": [
3
+ 0.30659591216535775,
4
+ 0.263701402238803,
5
+ 0.2917288892320357,
6
+ 0.2979889345937409,
7
+ 0.2673945975548122,
8
+ 0.27389297530753537,
9
+ 0.31817070026881994,
10
+ 0.29204057007445955,
11
+ 0.3173439687816426,
12
+ 0.2996893103467301,
13
+ 0.3109925156692043,
14
+ 0.31033597550704145,
15
+ 0.30306430828932207,
16
+ 0.34484060467220845,
17
+ 0.29502744183409957,
18
+ 0.332037893566303,
19
+ 0.31040111857582814,
20
+ 0.32000000202679074,
21
+ 0.34648944581858815,
22
+ 0.33038036652142183,
23
+ 0.34962533053476363,
24
+ 0.3640994099027012,
25
+ 0.3611091356840916,
26
+ 0.3375700225355104,
27
+ 0.3406386500224471,
28
+ 0.4031711193849333,
29
+ 0.3656635078601539,
30
+ 0.31611953389365227,
31
+ 0.36588236822863107,
32
+ 0.36202733669779263,
33
+ 0.34657563852961176
34
+ ],
35
+ "lrs": [
36
+ 6.666666666666667e-05,
37
+ 9.993008576227247e-05,
38
+ 9.937194443381972e-05,
39
+ 9.826190093588563e-05,
40
+ 9.661236384224129e-05,
41
+ 9.444177243274618e-05,
42
+ 9.177439057064683e-05,
43
+ 8.864003547001915e-05,
44
+ 8.507374438531607e-05,
45
+ 8.111538294891684e-05,
46
+ 7.680919953486048e-05,
47
+ 7.220333063028872e-05,
48
+ 6.734926274378312e-05,
49
+ 6.230125686563068e-05,
50
+ 5.7115741913664264e-05,
51
+ 5.185068394501791e-05,
52
+ 4.6564938185035956e-05,
53
+ 4.131759111665349e-05,
54
+ 3.616729998467365e-05,
55
+ 3.1171637098265064e-05,
56
+ 2.638644626136587e-05,
57
+ 2.1865218525109495e-05,
58
+ 1.7658494240397126e-05,
59
+ 1.3813298094746491e-05,
60
+ 1.037261344883343e-05,
61
+ 7.374901848832683e-06,
62
+ 4.853673085668947e-06,
63
+ 2.8371106072518195e-06,
64
+ 1.3477564710088098e-06,
65
+ 4.02259358460233e-07,
66
+ 1.1188468644907079e-08
67
+ ],
68
+ "wallclock_s": 962,
69
+ "n_examples": 5000,
70
+ "epochs": 1,
71
+ "mode": "distill",
72
+ "lora_rank": 32,
73
+ "total_opt_steps": 156,
74
+ "num_processes": 2
75
+ }
adapters/pi3_phaseA/README.md ADDED
@@ -0,0 +1,207 @@
1
+ ---
2
+ base_model: /workspace/ThinkingWithoutWordsRepro/runs/qwen3-4b-abs/pi2_phaseB_merged
3
+ library_name: peft
4
+ pipeline_tag: text-generation
5
+ tags:
6
+ - base_model:adapter:/workspace/ThinkingWithoutWordsRepro/runs/qwen3-4b-abs/pi2_phaseB_merged
7
+ - lora
8
+ - transformers
9
+ ---
10
+
11
+ # Model Card for Model ID
12
+
13
+ <!-- Provide a quick summary of what the model is/does. -->
14
+
15
+
16
+
17
+ ## Model Details
18
+
19
+ ### Model Description
20
+
21
+ <!-- Provide a longer summary of what this model is. -->
22
+
23
+
24
+
25
+ - **Developed by:** [More Information Needed]
26
+ - **Funded by [optional]:** [More Information Needed]
27
+ - **Shared by [optional]:** [More Information Needed]
28
+ - **Model type:** [More Information Needed]
29
+ - **Language(s) (NLP):** [More Information Needed]
30
+ - **License:** [More Information Needed]
31
+ - **Finetuned from model [optional]:** [More Information Needed]
32
+
33
+ ### Model Sources [optional]
34
+
35
+ <!-- Provide the basic links for the model. -->
36
+
37
+ - **Repository:** [More Information Needed]
38
+ - **Paper [optional]:** [More Information Needed]
39
+ - **Demo [optional]:** [More Information Needed]
40
+
41
+ ## Uses
42
+
43
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
44
+
45
+ ### Direct Use
46
+
47
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
48
+
49
+ [More Information Needed]
50
+
51
+ ### Downstream Use [optional]
52
+
53
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
54
+
55
+ [More Information Needed]
56
+
57
+ ### Out-of-Scope Use
58
+
59
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
60
+
61
+ [More Information Needed]
62
+
63
+ ## Bias, Risks, and Limitations
64
+
65
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
66
+
67
+ [More Information Needed]
68
+
69
+ ### Recommendations
70
+
71
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
72
+
73
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
74
+
75
+ ## How to Get Started with the Model
76
+
77
+ Use the code below to get started with the model.
78
+
79
+ [More Information Needed]
80
+
81
+ ## Training Details
82
+
83
+ ### Training Data
84
+
85
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
86
+
87
+ [More Information Needed]
88
+
89
+ ### Training Procedure
90
+
91
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
92
+
93
+ #### Preprocessing [optional]
94
+
95
+ [More Information Needed]
96
+
97
+
98
+ #### Training Hyperparameters
99
+
100
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
101
+
102
+ #### Speeds, Sizes, Times [optional]
103
+
104
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
105
+
106
+ [More Information Needed]
107
+
108
+ ## Evaluation
109
+
110
+ <!-- This section describes the evaluation protocols and provides the results. -->
111
+
112
+ ### Testing Data, Factors & Metrics
113
+
114
+ #### Testing Data
115
+
116
+ <!-- This should link to a Dataset Card if possible. -->
117
+
118
+ [More Information Needed]
119
+
120
+ #### Factors
121
+
122
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
123
+
124
+ [More Information Needed]
125
+
126
+ #### Metrics
127
+
128
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
129
+
130
+ [More Information Needed]
131
+
132
+ ### Results
133
+
134
+ [More Information Needed]
135
+
136
+ #### Summary
137
+
138
+
139
+
140
+ ## Model Examination [optional]
141
+
142
+ <!-- Relevant interpretability work for the model goes here -->
143
+
144
+ [More Information Needed]
145
+
146
+ ## Environmental Impact
147
+
148
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
149
+
150
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
151
+
152
+ - **Hardware Type:** [More Information Needed]
153
+ - **Hours used:** [More Information Needed]
154
+ - **Cloud Provider:** [More Information Needed]
155
+ - **Compute Region:** [More Information Needed]
156
+ - **Carbon Emitted:** [More Information Needed]
157
+
158
+ ## Technical Specifications [optional]
159
+
160
+ ### Model Architecture and Objective
161
+
162
+ [More Information Needed]
163
+
164
+ ### Compute Infrastructure
165
+
166
+ [More Information Needed]
167
+
168
+ #### Hardware
169
+
170
+ [More Information Needed]
171
+
172
+ #### Software
173
+
174
+ [More Information Needed]
175
+
176
+ ## Citation [optional]
177
+
178
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
179
+
180
+ **BibTeX:**
181
+
182
+ [More Information Needed]
183
+
184
+ **APA:**
185
+
186
+ [More Information Needed]
187
+
188
+ ## Glossary [optional]
189
+
190
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
191
+
192
+ [More Information Needed]
193
+
194
+ ## More Information [optional]
195
+
196
+ [More Information Needed]
197
+
198
+ ## Model Card Authors [optional]
199
+
200
+ [More Information Needed]
201
+
202
+ ## Model Card Contact
203
+
204
+ [More Information Needed]
205
+ ### Framework versions
206
+
207
+ - PEFT 0.19.1
adapters/pi3_phaseA/adapter_config.json ADDED
@@ -0,0 +1,51 @@
1
+ {
2
+ "alora_invocation_tokens": null,
3
+ "alpha_pattern": {},
4
+ "arrow_config": null,
5
+ "auto_mapping": null,
6
+ "base_model_name_or_path": "/workspace/ThinkingWithoutWordsRepro/runs/qwen3-4b-abs/pi2_phaseB_merged",
7
+ "bias": "none",
8
+ "corda_config": null,
9
+ "ensure_weight_tying": false,
10
+ "eva_config": null,
11
+ "exclude_modules": null,
12
+ "fan_in_fan_out": false,
13
+ "inference_mode": true,
14
+ "init_lora_weights": true,
15
+ "layer_replication": null,
16
+ "layers_pattern": null,
17
+ "layers_to_transform": null,
18
+ "loftq_config": {},
19
+ "lora_alpha": 64,
20
+ "lora_bias": false,
21
+ "lora_dropout": 0.0,
22
+ "lora_ga_config": null,
23
+ "megatron_config": null,
24
+ "megatron_core": "megatron.core",
25
+ "modules_to_save": [
26
+ "embed_tokens",
27
+ "lm_head"
28
+ ],
29
+ "peft_type": "LORA",
30
+ "peft_version": "0.19.1",
31
+ "qalora_group_size": 16,
32
+ "r": 32,
33
+ "rank_pattern": {},
34
+ "revision": null,
35
+ "target_modules": [
36
+ "q_proj",
37
+ "gate_proj",
38
+ "up_proj",
39
+ "v_proj",
40
+ "o_proj",
41
+ "k_proj",
42
+ "down_proj"
43
+ ],
44
+ "target_parameters": null,
45
+ "task_type": "CAUSAL_LM",
46
+ "trainable_token_indices": null,
47
+ "use_bdlora": null,
48
+ "use_dora": false,
49
+ "use_qalora": false,
50
+ "use_rslora": false
51
+ }
adapters/pi3_phaseA/chat_template.jinja ADDED
@@ -0,0 +1,89 @@
1
+ {%- if tools %}
2
+ {{- '<|im_start|>system\n' }}
3
+ {%- if messages[0].role == 'system' %}
4
+ {{- messages[0].content + '\n\n' }}
5
+ {%- endif %}
6
+ {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
7
+ {%- for tool in tools %}
8
+ {{- "\n" }}
9
+ {{- tool | tojson }}
10
+ {%- endfor %}
11
+ {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
12
+ {%- else %}
13
+ {%- if messages[0].role == 'system' %}
14
+ {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
15
+ {%- endif %}
16
+ {%- endif %}
17
+ {%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
18
+ {%- for message in messages[::-1] %}
19
+ {%- set index = (messages|length - 1) - loop.index0 %}
20
+ {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
21
+ {%- set ns.multi_step_tool = false %}
22
+ {%- set ns.last_query_index = index %}
23
+ {%- endif %}
24
+ {%- endfor %}
25
+ {%- for message in messages %}
26
+ {%- if message.content is string %}
27
+ {%- set content = message.content %}
28
+ {%- else %}
29
+ {%- set content = '' %}
30
+ {%- endif %}
31
+ {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
32
+ {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
33
+ {%- elif message.role == "assistant" %}
34
+ {%- set reasoning_content = '' %}
35
+ {%- if message.reasoning_content is string %}
36
+ {%- set reasoning_content = message.reasoning_content %}
37
+ {%- else %}
38
+ {%- if '</think>' in content %}
39
+ {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
40
+ {%- set content = content.split('</think>')[-1].lstrip('\n') %}
41
+ {%- endif %}
42
+ {%- endif %}
43
+ {%- if loop.index0 > ns.last_query_index %}
44
+ {%- if loop.last or (not loop.last and reasoning_content) %}
45
+ {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
46
+ {%- else %}
47
+ {{- '<|im_start|>' + message.role + '\n' + content }}
48
+ {%- endif %}
49
+ {%- else %}
50
+ {{- '<|im_start|>' + message.role + '\n' + content }}
51
+ {%- endif %}
52
+ {%- if message.tool_calls %}
53
+ {%- for tool_call in message.tool_calls %}
54
+ {%- if (loop.first and content) or (not loop.first) %}
55
+ {{- '\n' }}
56
+ {%- endif %}
57
+ {%- if tool_call.function %}
58
+ {%- set tool_call = tool_call.function %}
59
+ {%- endif %}
60
+ {{- '<tool_call>\n{"name": "' }}
61
+ {{- tool_call.name }}
62
+ {{- '", "arguments": ' }}
63
+ {%- if tool_call.arguments is string %}
64
+ {{- tool_call.arguments }}
65
+ {%- else %}
66
+ {{- tool_call.arguments | tojson }}
67
+ {%- endif %}
68
+ {{- '}\n</tool_call>' }}
69
+ {%- endfor %}
70
+ {%- endif %}
71
+ {{- '<|im_end|>\n' }}
72
+ {%- elif message.role == "tool" %}
73
+ {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
74
+ {{- '<|im_start|>user' }}
75
+ {%- endif %}
76
+ {{- '\n<tool_response>\n' }}
77
+ {{- content }}
78
+ {{- '\n</tool_response>' }}
79
+ {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
80
+ {{- '<|im_end|>\n' }}
81
+ {%- endif %}
82
+ {%- endif %}
83
+ {%- endfor %}
84
+ {%- if add_generation_prompt %}
85
+ {{- '<|im_start|>assistant\n' }}
86
+ {%- if enable_thinking is defined and enable_thinking is false %}
87
+ {{- '<think>\n\n</think>\n\n' }}
88
+ {%- endif %}
89
+ {%- endif %}
adapters/pi3_phaseA/tokenizer_config.json ADDED
@@ -0,0 +1,30 @@
1
+ {
2
+ "add_prefix_space": false,
3
+ "backend": "tokenizers",
4
+ "bos_token": null,
5
+ "clean_up_tokenization_spaces": false,
6
+ "eos_token": "<|im_end|>",
7
+ "errors": "replace",
8
+ "extra_special_tokens": [
9
+ "<|im_start|>",
10
+ "<|im_end|>",
11
+ "<|object_ref_start|>",
12
+ "<|object_ref_end|>",
13
+ "<|box_start|>",
14
+ "<|box_end|>",
15
+ "<|quad_start|>",
16
+ "<|quad_end|>",
17
+ "<|vision_start|>",
18
+ "<|vision_end|>",
19
+ "<|vision_pad|>",
20
+ "<|image_pad|>",
21
+ "<|video_pad|>"
22
+ ],
23
+ "is_local": true,
24
+ "local_files_only": false,
25
+ "model_max_length": 131072,
26
+ "pad_token": "<|endoftext|>",
27
+ "split_special_tokens": false,
28
+ "tokenizer_class": "Qwen2Tokenizer",
29
+ "unk_token": null
30
+ }
adapters/pi3_phaseA/train_log.json ADDED
@@ -0,0 +1,75 @@
1
+ {
2
+ "losses": [
3
+ 0.2751981415989576,
4
+ 0.22893165266577853,
5
+ 0.24834042781731114,
6
+ 0.2638388895960816,
7
+ 0.23417620072723366,
8
+ 0.24215587118524126,
9
+ 0.2851439667691011,
10
+ 0.2613255390126142,
11
+ 0.29196599322604017,
12
+ 0.27143954291314004,
13
+ 0.2879517031600699,
14
+ 0.28859320177070913,
15
+ 0.28058985095995015,
16
+ 0.32684289703611286,
17
+ 0.2814401665353216,
18
+ 0.316986553161405,
19
+ 0.29619569072965535,
20
+ 0.31013335563475264,
21
+ 0.3364840148482472,
22
+ 0.3246558667509817,
23
+ 0.3450019852258265,
24
+ 0.36042039859457875,
25
+ 0.359003718016902,
26
+ 0.33632316675502805,
27
+ 0.34056199092883616,
28
+ 0.4061928411683766,
29
+ 0.3674649982713163,
30
+ 0.3189254213997629,
31
+ 0.3698235720337834,
32
+ 0.3649632065091282,
33
+ 0.34908437901176514
34
+ ],
35
+ "lrs": [
36
+ 6.666666666666667e-05,
37
+ 9.993008576227247e-05,
38
+ 9.937194443381972e-05,
39
+ 9.826190093588563e-05,
40
+ 9.661236384224129e-05,
41
+ 9.444177243274618e-05,
42
+ 9.177439057064683e-05,
43
+ 8.864003547001915e-05,
44
+ 8.507374438531607e-05,
45
+ 8.111538294891684e-05,
46
+ 7.680919953486048e-05,
47
+ 7.220333063028872e-05,
48
+ 6.734926274378312e-05,
49
+ 6.230125686563068e-05,
50
+ 5.7115741913664264e-05,
51
+ 5.185068394501791e-05,
52
+ 4.6564938185035956e-05,
53
+ 4.131759111665349e-05,
54
+ 3.616729998467365e-05,
55
+ 3.1171637098265064e-05,
56
+ 2.638644626136587e-05,
57
+ 2.1865218525109495e-05,
58
+ 1.7658494240397126e-05,
59
+ 1.3813298094746491e-05,
60
+ 1.037261344883343e-05,
61
+ 7.374901848832683e-06,
62
+ 4.853673085668947e-06,
63
+ 2.8371106072518195e-06,
64
+ 1.3477564710088098e-06,
65
+ 4.02259358460233e-07,
66
+ 1.1188468644907079e-08
67
+ ],
68
+ "wallclock_s": 10572,
69
+ "n_examples": 5000,
70
+ "epochs": 1,
71
+ "mode": "bottleneck",
72
+ "lora_rank": 32,
73
+ "total_opt_steps": 156,
74
+ "num_processes": 2
75
+ }
adapters/pi3_phaseB/README.md ADDED
@@ -0,0 +1,207 @@
1
+ ---
2
+ base_model: /workspace/ThinkingWithoutWordsRepro/runs/qwen3-4b-abs/pi3_phaseA_merged
3
+ library_name: peft
4
+ pipeline_tag: text-generation
5
+ tags:
6
+ - base_model:adapter:/workspace/ThinkingWithoutWordsRepro/runs/qwen3-4b-abs/pi3_phaseA_merged
7
+ - lora
8
+ - transformers
9
+ ---
10
+
11
+ # Model Card for Model ID
12
+
13
+ <!-- Provide a quick summary of what the model is/does. -->
14
+
15
+
16
+
17
+ ## Model Details
18
+
19
+ ### Model Description
20
+
21
+ <!-- Provide a longer summary of what this model is. -->
22
+
23
+
24
+
25
+ - **Developed by:** [More Information Needed]
26
+ - **Funded by [optional]:** [More Information Needed]
27
+ - **Shared by [optional]:** [More Information Needed]
28
+ - **Model type:** [More Information Needed]
29
+ - **Language(s) (NLP):** [More Information Needed]
30
+ - **License:** [More Information Needed]
31
+ - **Finetuned from model [optional]:** [More Information Needed]
32
+
33
+ ### Model Sources [optional]
34
+
35
+ <!-- Provide the basic links for the model. -->
36
+
37
+ - **Repository:** [More Information Needed]
38
+ - **Paper [optional]:** [More Information Needed]
39
+ - **Demo [optional]:** [More Information Needed]
40
+
41
+ ## Uses
42
+
43
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
44
+
45
+ ### Direct Use
46
+
47
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
48
+
49
+ [More Information Needed]
50
+
51
+ ### Downstream Use [optional]
52
+
53
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
54
+
55
+ [More Information Needed]
56
+
57
+ ### Out-of-Scope Use
58
+
59
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
60
+
61
+ [More Information Needed]
62
+
63
+ ## Bias, Risks, and Limitations
64
+
65
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
66
+
67
+ [More Information Needed]
68
+
69
+ ### Recommendations
70
+
71
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
72
+
73
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
74
+
75
+ ## How to Get Started with the Model
76
+
77
+ Use the code below to get started with the model.
78
+
79
+ [More Information Needed]
80
+
81
+ ## Training Details
82
+
83
+ ### Training Data
84
+
85
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
86
+
87
+ [More Information Needed]
88
+
89
+ ### Training Procedure
90
+
91
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
92
+
93
+ #### Preprocessing [optional]
94
+
95
+ [More Information Needed]
96
+
97
+
98
+ #### Training Hyperparameters
99
+
100
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
101
+
102
+ #### Speeds, Sizes, Times [optional]
103
+
104
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
105
+
106
+ [More Information Needed]
107
+
108
+ ## Evaluation
109
+
110
+ <!-- This section describes the evaluation protocols and provides the results. -->
111
+
112
+ ### Testing Data, Factors & Metrics
113
+
114
+ #### Testing Data
115
+
116
+ <!-- This should link to a Dataset Card if possible. -->
117
+
118
+ [More Information Needed]
119
+
120
+ #### Factors
121
+
122
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
123
+
124
+ [More Information Needed]
125
+
126
+ #### Metrics
127
+
128
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
129
+
130
+ [More Information Needed]
131
+
132
+ ### Results
133
+
134
+ [More Information Needed]
135
+
136
+ #### Summary
137
+
138
+
139
+
140
+ ## Model Examination [optional]
141
+
142
+ <!-- Relevant interpretability work for the model goes here -->
143
+
144
+ [More Information Needed]
145
+
146
+ ## Environmental Impact
147
+
148
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
149
+
150
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
151
+
152
+ - **Hardware Type:** [More Information Needed]
153
+ - **Hours used:** [More Information Needed]
154
+ - **Cloud Provider:** [More Information Needed]
155
+ - **Compute Region:** [More Information Needed]
156
+ - **Carbon Emitted:** [More Information Needed]
157
+
158
+ ## Technical Specifications [optional]
159
+
160
+ ### Model Architecture and Objective
161
+
162
+ [More Information Needed]
163
+
164
+ ### Compute Infrastructure
165
+
166
+ [More Information Needed]
167
+
168
+ #### Hardware
169
+
170
+ [More Information Needed]
171
+
172
+ #### Software
173
+
174
+ [More Information Needed]
175
+
176
+ ## Citation [optional]
177
+
178
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
179
+
180
+ **BibTeX:**
181
+
182
+ [More Information Needed]
183
+
184
+ **APA:**
185
+
186
+ [More Information Needed]
187
+
188
+ ## Glossary [optional]
189
+
190
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
191
+
192
+ [More Information Needed]
193
+
194
+ ## More Information [optional]
195
+
196
+ [More Information Needed]
197
+
198
+ ## Model Card Authors [optional]
199
+
200
+ [More Information Needed]
201
+
202
+ ## Model Card Contact
203
+
204
+ [More Information Needed]
205
+ ### Framework versions
206
+
207
+ - PEFT 0.19.1
adapters/pi3_phaseB/adapter_config.json ADDED
@@ -0,0 +1,51 @@
1
+ {
2
+ "alora_invocation_tokens": null,
3
+ "alpha_pattern": {},
4
+ "arrow_config": null,
5
+ "auto_mapping": null,
6
+ "base_model_name_or_path": "/workspace/ThinkingWithoutWordsRepro/runs/qwen3-4b-abs/pi3_phaseA_merged",
7
+ "bias": "none",
8
+ "corda_config": null,
9
+ "ensure_weight_tying": false,
10
+ "eva_config": null,
11
+ "exclude_modules": null,
12
+ "fan_in_fan_out": false,
13
+ "inference_mode": true,
14
+ "init_lora_weights": true,
15
+ "layer_replication": null,
16
+ "layers_pattern": null,
17
+ "layers_to_transform": null,
18
+ "loftq_config": {},
19
+ "lora_alpha": 64,
20
+ "lora_bias": false,
21
+ "lora_dropout": 0.0,
22
+ "lora_ga_config": null,
23
+ "megatron_config": null,
24
+ "megatron_core": "megatron.core",
25
+ "modules_to_save": [
26
+ "embed_tokens",
27
+ "lm_head"
28
+ ],
29
+ "peft_type": "LORA",
30
+ "peft_version": "0.19.1",
31
+ "qalora_group_size": 16,
32
+ "r": 32,
33
+ "rank_pattern": {},
34
+ "revision": null,
35
+ "target_modules": [
36
+ "o_proj",
37
+ "v_proj",
38
+ "down_proj",
39
+ "up_proj",
40
+ "gate_proj",
41
+ "k_proj",
42
+ "q_proj"
43
+ ],
44
+ "target_parameters": null,
45
+ "task_type": "CAUSAL_LM",
46
+ "trainable_token_indices": null,
47
+ "use_bdlora": null,
48
+ "use_dora": false,
49
+ "use_qalora": false,
50
+ "use_rslora": false
51
+ }
adapters/pi3_phaseB/chat_template.jinja ADDED
@@ -0,0 +1,89 @@
1
+ {%- if tools %}
2
+ {{- '<|im_start|>system\n' }}
3
+ {%- if messages[0].role == 'system' %}
4
+ {{- messages[0].content + '\n\n' }}
5
+ {%- endif %}
6
+ {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
7
+ {%- for tool in tools %}
8
+ {{- "\n" }}
9
+ {{- tool | tojson }}
10
+ {%- endfor %}
11
+ {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
12
+ {%- else %}
13
+ {%- if messages[0].role == 'system' %}
14
+ {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
15
+ {%- endif %}
16
+ {%- endif %}
17
+ {%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
18
+ {%- for message in messages[::-1] %}
19
+ {%- set index = (messages|length - 1) - loop.index0 %}
20
+ {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
21
+ {%- set ns.multi_step_tool = false %}
22
+ {%- set ns.last_query_index = index %}
23
+ {%- endif %}
24
+ {%- endfor %}
25
+ {%- for message in messages %}
26
+ {%- if message.content is string %}
27
+ {%- set content = message.content %}
28
+ {%- else %}
29
+ {%- set content = '' %}
30
+ {%- endif %}
31
+ {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
32
+ {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
33
+ {%- elif message.role == "assistant" %}
34
+ {%- set reasoning_content = '' %}
35
+ {%- if message.reasoning_content is string %}
36
+ {%- set reasoning_content = message.reasoning_content %}
37
+ {%- else %}
38
+ {%- if '</think>' in content %}
39
+ {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
40
+ {%- set content = content.split('</think>')[-1].lstrip('\n') %}
41
+ {%- endif %}
42
+ {%- endif %}
43
+ {%- if loop.index0 > ns.last_query_index %}
44
+ {%- if loop.last or (not loop.last and reasoning_content) %}
45
+ {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
46
+ {%- else %}
47
+ {{- '<|im_start|>' + message.role + '\n' + content }}
48
+ {%- endif %}
49
+ {%- else %}
50
+ {{- '<|im_start|>' + message.role + '\n' + content }}
51
+ {%- endif %}
52
+ {%- if message.tool_calls %}
53
+ {%- for tool_call in message.tool_calls %}
54
+ {%- if (loop.first and content) or (not loop.first) %}
55
+ {{- '\n' }}
56
+ {%- endif %}
57
+ {%- if tool_call.function %}
58
+ {%- set tool_call = tool_call.function %}
59
+ {%- endif %}
60
+ {{- '<tool_call>\n{"name": "' }}
61
+ {{- tool_call.name }}
62
+ {{- '", "arguments": ' }}
63
+ {%- if tool_call.arguments is string %}
64
+ {{- tool_call.arguments }}
65
+ {%- else %}
66
+ {{- tool_call.arguments | tojson }}
67
+ {%- endif %}
68
+ {{- '}\n</tool_call>' }}
69
+ {%- endfor %}
70
+ {%- endif %}
71
+ {{- '<|im_end|>\n' }}
72
+ {%- elif message.role == "tool" %}
73
+ {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
74
+ {{- '<|im_start|>user' }}
75
+ {%- endif %}
76
+ {{- '\n<tool_response>\n' }}
77
+ {{- content }}
78
+ {{- '\n</tool_response>' }}
79
+ {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
80
+ {{- '<|im_end|>\n' }}
81
+ {%- endif %}
82
+ {%- endif %}
83
+ {%- endfor %}
84
+ {%- if add_generation_prompt %}
85
+ {{- '<|im_start|>assistant\n' }}
86
+ {%- if enable_thinking is defined and enable_thinking is false %}
87
+ {{- '<think>\n\n</think>\n\n' }}
88
+ {%- endif %}
89
+ {%- endif %}
adapters/pi3_phaseB/tokenizer_config.json ADDED
@@ -0,0 +1,30 @@
1
+ {
2
+ "add_prefix_space": false,
3
+ "backend": "tokenizers",
4
+ "bos_token": null,
5
+ "clean_up_tokenization_spaces": false,
6
+ "eos_token": "<|im_end|>",
7
+ "errors": "replace",
8
+ "extra_special_tokens": [
9
+ "<|im_start|>",
10
+ "<|im_end|>",
11
+ "<|object_ref_start|>",
12
+ "<|object_ref_end|>",
13
+ "<|box_start|>",
14
+ "<|box_end|>",
15
+ "<|quad_start|>",
16
+ "<|quad_end|>",
17
+ "<|vision_start|>",
18
+ "<|vision_end|>",
19
+ "<|vision_pad|>",
20
+ "<|image_pad|>",
21
+ "<|video_pad|>"
22
+ ],
23
+ "is_local": true,
24
+ "local_files_only": false,
25
+ "model_max_length": 131072,
26
+ "pad_token": "<|endoftext|>",
27
+ "split_special_tokens": false,
28
+ "tokenizer_class": "Qwen2Tokenizer",
29
+ "unk_token": null
30
+ }
adapters/pi3_phaseB/train_log.json ADDED
@@ -0,0 +1,75 @@
1
+ {
2
+ "losses": [
3
+ 0.24705615827115252,
4
+ 0.19863037412578705,
5
+ 0.20961245244543533,
6
+ 0.23088445312132536,
7
+ 0.20231807196396404,
8
+ 0.21607737393860588,
9
+ 0.2541877782714437,
10
+ 0.23250892728756298,
11
+ 0.26798237174225503,
12
+ 0.2450843573300517,
13
+ 0.26578180299547965,
14
+ 0.26796957241640484,
15
+ 0.25621661408949875,
16
+ 0.30977868895424765,
17
+ 0.26767159106384497,
18
+ 0.3011666447739117,
19
+ 0.2829906645929441,
20
+ 0.2978396859412896,
21
+ 0.3276536007644609,
22
+ 0.3177644655283075,
23
+ 0.3392833017860539,
24
+ 0.35701846808369736,
25
+ 0.3581092601059936,
26
+ 0.3342221752507612,
27
+ 0.34096981105394664,
28
+ 0.4076683118997607,
29
+ 0.36950865692924706,
30
+ 0.32229581456631423,
31
+ 0.37231990886793936,
32
+ 0.36889135412639007,
33
+ 0.35545787583687344
34
+ ],
35
+ "lrs": [
36
+ 6.666666666666667e-05,
37
+ 9.993008576227247e-05,
38
+ 9.937194443381972e-05,
39
+ 9.826190093588563e-05,
40
+ 9.661236384224129e-05,
41
+ 9.444177243274618e-05,
42
+ 9.177439057064683e-05,
43
+ 8.864003547001915e-05,
44
+ 8.507374438531607e-05,
45
+ 8.111538294891684e-05,
46
+ 7.680919953486048e-05,
47
+ 7.220333063028872e-05,
48
+ 6.734926274378312e-05,
49
+ 6.230125686563068e-05,
50
+ 5.7115741913664264e-05,
51
+ 5.185068394501791e-05,
52
+ 4.6564938185035956e-05,
53
+ 4.131759111665349e-05,
54
+ 3.616729998467365e-05,
55
+ 3.1171637098265064e-05,
56
+ 2.638644626136587e-05,
57
+ 2.1865218525109495e-05,
58
+ 1.7658494240397126e-05,
59
+ 1.3813298094746491e-05,
60
+ 1.037261344883343e-05,
61
+ 7.374901848832683e-06,
62
+ 4.853673085668947e-06,
63
+ 2.8371106072518195e-06,
64
+ 1.3477564710088098e-06,
65
+ 4.02259358460233e-07,
66
+ 1.1188468644907079e-08
67
+ ],
68
+ "wallclock_s": 947,
69
+ "n_examples": 5000,
70
+ "epochs": 1,
71
+ "mode": "distill",
72
+ "lora_rank": 32,
73
+ "total_opt_steps": 156,
74
+ "num_processes": 2
75
+ }
docs/20260511.md ADDED
@@ -0,0 +1,198 @@
1
+ # Abstract-CoT (arXiv:2604.22709v2) — Production Run on Qwen3-4B
2
+
3
+ **Date:** 2026-05-11
4
+ **Scope:** Full T=3 PI warm-up at N=5000, seq_len=8192, LoRA — comparison target is the same paper's "Abstract-CoT (Warm-up)" row in Table 1.
5
+ **Hardware:** 2× NVIDIA A100-SXM4-80GB (this machine, fresh clone). New checkpoint.
6
+ **Status:** Pipeline ran end-to-end in **~11 hours**, under the 12 hr budget. All three engineering wins (vLLM gen_traces, LR-schedule fix, seq_len 8k) shipped. T=3 on-policy iteration ran cleanly per-round but did **not** improve over T=1 at this LoRA/data scale (within noise).
7
+
8
+ ---
9
+
10
+ ## Headline numbers
11
+
12
+ | Method | MATH-500 acc | Mean tokens (reasoning + response) |
13
+ |---|---|---|
14
+ | Paper Baseline (Qwen3-4B, verbal CoT) | 83.2 | 1087 |
15
+ | Smoke Baseline (prior repo, 2× A100-40GB) | 83.60 | 1067 |
16
+ | **This-run Baseline** (2× A100-80GB, vLLM 0.19.1) | **84.60** | **1045** |
17
+ | Paper Abstract-CoT (Warm-up) | 86.2 | 168 |
18
+ | Smoke Warm-up (T=1, N=5k, 1ep, LoRA, seq 2k, T=0.7, m_min=16) | 73.20 | 433 |
19
+ | **Validation** (T=1, N=500, 1ep, LoRA, seq 8k, T=0.7, m_min=16) | 73.40 | 558 |
20
+ | **This-run Warm-up** (T=3, N=5k, 1ep, LoRA, seq 8k, T=0.7, m_min=16) | **72.00** | **432** |
21
+
22
+ Reading:
23
+ - **Baseline reproduces paper.** vLLM 0.19.1 (downgraded from 0.20.2 to match CUDA 12.8) works correctly on this box.
24
+ - **T=3 did not beat T=1** at this scale: 72.0 vs 73.2/73.4, well within the noise of temp=0.7 abstract-trace sampling.
25
+ - **Mean total tokens dropped** against the matched seq-8k T=1 validation run (432 vs 558; the seq-2k smoke was already at 433), suggesting the on-policy traces *did* push the model toward shorter responses; accuracy just didn't lift.
26
+
27
+ ---
28
+
29
+ ## Hardware actually available
30
+
31
+ ```
32
+ GPU 0: A100-SXM4-80GB vol ECC unc: 0 matmul OK, vLLM OK, sustained 100% util for 11 hr clean
33
+ GPU 1: A100-SXM4-80GB vol ECC unc: 0 matmul OK, vLLM OK, sustained 100% util for 11 hr clean
34
+ ```
35
+
36
+ Both GPUs usable. CUDA 12.8 / driver 570.195.03. 1.4 TiB system RAM, 128 CPUs. 146 GB free overlay disk at start; ended at 88 GB used.
37
+
38
+ Compute per GPU is identical to the smoke's 40GB cards (same GA100 silicon). The 80GB lifts the seq_len cap and unblocks the full-FT path (not used here — we kept LoRA per the smoke's recommendation for this budget).
39
+
40
+ **vLLM TP**: Qwen3-4B has 32 attention heads → TP must divide 32. **TP=2 fits perfectly** on this 2-GPU box (no idle card during eval).
41
+
42
+ ---
43
+
44
+ ## What changed vs. the smoke
45
+
46
+ Listed in rough order of impact / engineering work.
47
+
48
+ ### 1. **vLLM port for `gen_traces`** (biggest engineering win)
49
+
50
+ Replaced HF `model.generate()` + custom `LogitsProcessor` with vLLM `LLM.generate()` + `SamplingParams.allowed_token_ids` to enforce the V_abs ∪ {END_ABS} alphabet directly in the sampler. No custom logits processor needed — vLLM's `allowed_token_ids` does exactly this efficiently inside the kernels.
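+
+ A minimal sketch of that decode path (reconstructed here, not the repo's exact code; `abs_token_ids`, `end_abs_id`, `prompts` and the model path are stand-ins for whatever the tokenizer-extension and data steps actually produce):
+
+ ```python
+ from vllm import LLM, SamplingParams
+
+ # abs_token_ids: the 64 V_abs ids; end_abs_id: the END_ABS delimiter id
+ # (both assumed recorded when the vocabulary was extended)
+ llm = LLM(model="runs/qwen3-4b-abs/pi1_phaseA_merged",
+           tensor_parallel_size=2, max_model_len=8192)
+
+ params = SamplingParams(
+     temperature=0.7,
+     min_tokens=16,                                    # m_min
+     max_tokens=128,                                   # m_max
+     allowed_token_ids=abs_token_ids + [end_abs_id],   # sampler only ever sees V_abs ∪ {END_ABS}
+     stop_token_ids=[end_abs_id],
+ )
+
+ outputs = llm.generate(prompts, params)               # prompts = X (plus the verbal CoT for the Phase A teacher)
+ traces = [o.outputs[0].token_ids for o in outputs]    # one Z̃ per example
+ ```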
51
+
52
+ Measured throughput on N=5000:
53
+
54
+ | Mode | Prefix | max_model_len | Wall | Rate |
55
+ |---|---|---|---|---|
56
+ | Phase B teacher (no CoT) | ~150 tok | 4096 | **17 s** | **294/s** |
57
+ | Phase A teacher (with CoT) | ~5500 tok | 8192 | **887–891 s** | **5.6/s** |
58
+ | Smoke HF baseline | — | — | 11 min (=660 s) on 5k | 7.6/s |
59
+
60
+ vLLM speedup: **40× on Phase B teacher**, **9× on Phase A teacher** vs. the smoke's HF generate. The Phase A teacher path is prefill-dominated by 7800-token CoT prefixes; even there vLLM beats batched HF generate.
61
+
62
+ ### 2. **Cosine LR schedule bug — root-cause fix**
63
+
64
+ The smoke report described an LR curve that went `1e-4 → 5e-7 → bounce back to 1e-4`, but its diagnosis ("`total_steps` was computed before `accelerator.prepare()`") was wrong — the source code already computed `total_steps` after `prepare()`.
65
+
66
+ **Actual root cause:** `accelerator.prepare(sched)` returns an `AcceleratedScheduler` that, under default settings (`split_batches=False`, `step_with_optimizer=True`), advances the underlying scheduler `num_processes` times per `sched.step()` call. With 2 GPUs, the cosine completes in half the calls, then bounces back to peak (the `get_cosine_schedule_with_warmup` function's `num_cycles=0.5` curve returns to max once `progress > 1`).
67
+
68
+ **Fix** in `src/train_phase_lora.py`:
69
+
70
+ ```python
71
+ total_opt_steps = steps_per_epoch * args.epochs
72
+ total_steps = total_opt_steps * accelerator.num_processes # NEW
73
+ sched = get_cosine_schedule_with_warmup(
74
+ opt, num_warmup_steps=max(1, total_steps // 20),
75
+ num_training_steps=total_steps,
76
+ )
77
+ ```
78
+
79
+ **Verified** end-to-end. Round-3 Phase A train log: peak 1.0e-4 at step 5, monotonic cosine descent to 1.12e-8 at step 155, **no bounce-back**. Identical curve on every Phase A and Phase B of every round.
80
+
81
+ A standalone reproduction of both the bug and the fix (no GPU needed) is in this session's log; can be reconstructed by instantiating the scheduler with `total_steps = total_opt_steps * 2` and stepping it `total_opt_steps × 2` times.
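+
+ A sketch of that reproduction (reconstructed from the description above; the inner loop emulates `AcceleratedScheduler` stepping the wrapped scheduler once per process):
+
+ ```python
+ import torch
+ from transformers import get_cosine_schedule_with_warmup
+
+ def lr_curve(num_training_steps, opt_steps=156, num_processes=2):
+     p = torch.nn.Parameter(torch.zeros(1))
+     opt = torch.optim.AdamW([p], lr=1e-4)
+     sched = get_cosine_schedule_with_warmup(
+         opt, num_warmup_steps=max(1, num_training_steps // 20),
+         num_training_steps=num_training_steps)
+     lrs = []
+     for _ in range(opt_steps):
+         opt.step()
+         for _ in range(num_processes):   # what AcceleratedScheduler does per sched.step() call
+             sched.step()
+         lrs.append(sched.get_last_lr()[0])
+     return lrs
+
+ buggy = lr_curve(num_training_steps=156)        # reaches ~0 by opt step 78, then climbs back toward 1e-4
+ fixed = lr_curve(num_training_steps=156 * 2)    # monotone cosine descent, ~1e-8 at opt step 156
+ ```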
82
+
83
+ Also added `train_log.json` "lrs" key alongside "losses" so future audits can verify offline.
84
+
85
+ ### 3. **`max_len` 2048 → 8192**
86
+
87
+ The smoke truncated 98% of Dolci-Think CoTs from the right (median CoT is 18.8k tokens). At seq_len 8192, ~60% of CoTs fit fully and the rest only have their tail removed — meaningful reasoning makes it into the bottleneck. Measured per-step time at seq_len 8192 on this box: **71.9 s/step** (vs. 12.7 s/step at seq_len 2048 in the smoke), a 5.66× slowdown: more than the 4× that the linear-cost terms alone would give for a 4× longer window, with the excess coming from self-attention's quadratic term.
88
+
89
+ ### 4. **T=1 → T=3 (full PI warm-up)**
90
+
91
+ The smoke only did one PI round (random Z̃ → bottleneck SFT → self-distill). This run did three: round 2 and round 3 use on-policy Z̃ generated via constrained decoding from the previous round's model. Per-round loss curves (Phase A `[bottleneck]`):
92
+
93
+ | Round | step 5 loss | step 155 loss | Notes |
94
+ |---|---|---|---|
95
+ | 1 | 3.49 | 0.85 | starts from random Z̃ — model is learning the bottleneck structure |
96
+ | 2 | 0.35 | 0.34 | Z̃ now carries signal — model converges fast |
97
+ | 3 | 0.27 | 0.35 | even cleaner start; the on-policy traces are doing what they should |
98
+
99
+ Phase B `[distill]` starting loss: 0.49 → 0.29 → 0.21 across rounds — same story.
100
+
101
+ So the optimizer is clearly working with the on-policy bottleneck signal. The accuracy lift just didn't show up at this LoRA/data scale (see "Quality observation" below).
102
+
103
+ ### 5. **Misc fixes**
104
+
105
+ - **Shell syntax bug** in `scripts/03_phase_a.sh`: the apostrophe inside `${OUT:?OUT must be the output dir for this phase's LoRA adapter}` opened an unterminated single-quoted string under bash 5.2. Replaced with apostrophe-free wording.
106
+ - **`max_model_len` too tight on Phase B teacher** (first production attempt): set to 1024, but some Dolci user prompts are 1.5–2.5k tokens. Validation at N=500 didn't sample the tail. Bumped to **3072 / 4096** (prefix / model_len), and added defensive left-truncation of X when even X alone exceeds the budget (sketched after this list). Re-runs completed instantly.
107
+ - `run_smoke.sh` now accepts `DATA_FILE` and `SKIP_BASELINE` env overrides so the same 6k-row dolci file can serve both validation (N=500) and production (N=5000).
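+
+ The left-truncation guard from the `max_model_len` fix is small enough to sketch here (a hypothetical helper with illustrative names, not the repo's exact API):
+
+ ```python
+ def truncate_prompt_left(tokenizer, prompt: str, max_prefix_tokens: int = 3072) -> str:
+     """Keep only the tail of X when X alone exceeds the prefix budget."""
+     ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
+     if len(ids) <= max_prefix_tokens:
+         return prompt
+     return tokenizer.decode(ids[-max_prefix_tokens:])
+ ```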
108
+
109
+ ---
110
+
111
+ ## Per-stage wall times (this run)
112
+
113
+ | Stage | Per-occurrence | × T=3 |
114
+ |---|---|---|
115
+ | Phase A (`bottleneck`, 156 opt steps @ ~72 s/step, seq 8k) | **~2.94 hr** | **8.81 hr** |
116
+ | Phase B (`distill`, 156 opt steps @ ~6 s/step) | **~15.7 min** | **47 min** |
117
+ | gen_traces — Phase A teacher (vLLM TP=2, max_model_len=8192) | **~14.85 min** | ~29.7 min (rounds 2, 3 only) |
118
+ | gen_traces — Phase B teacher (vLLM TP=2, max_model_len=4096) | **17 s** | 51 s |
119
+ | Merge LoRA → full HF (CPU-bound write) | ~30 s | 3 min |
120
+ | Final eval (MATH-500, vLLM TP=2, T=0.7 abstract / T=0 answer) | — | **23 s** for the 3-stage decode |
121
+ | Pre-flight (extend Qwen3-4B, baseline calibration) | — | ~10 min |
122
+ | **Total wall** | | **~11h 0m** (04:07 → 15:08) |
123
+
124
+ Includes ~5 min of failed gen_traces + restart from the `max_model_len=1024` issue.
125
+
126
+ ---
127
+
128
+ ## Configuration used
129
+
130
+ ```bash
131
+ RUNS_DIR=$PWD/runs \
132
+ DATA_FILE=$PWD/data/dolci_6000.jsonl \
133
+ SKIP_BASELINE=1 \
134
+ N=5000 T=3 EPOCHS=1 \
135
+ bash scripts/run_smoke.sh
136
+ ```
137
+
138
+ With the in-script defaults:
139
+ - `MAX_LEN=8192` (Phase A + Phase B training cap)
140
+ - `MICRO_BATCH=1`, `GRAD_ACCUM=16`, **effective batch 32** (2 GPUs × 1 × 16)
141
+ - `LR=1e-4`, cosine schedule, 5% warmup
142
+ - LoRA: `r=32`, `alpha=64`, target `{q,k,v,o,gate,up,down}_proj`, `modules_to_save=["embed_tokens","lm_head"]` (842.9 M / 4.86 B = 17.3% trainable); see the equivalent `LoraConfig` sketch after this list
143
+ - Abstract eval: `m_min=16`, `m_max=128`, `abs_temp=0.7`, answer `temp=0.0`, `tp=2`
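+
+ The LoRA bullet corresponds to roughly this `peft` configuration (a sketch of equivalent settings, not the repo's exact constructor call):
+
+ ```python
+ from peft import LoraConfig
+
+ lora_cfg = LoraConfig(
+     r=32,
+     lora_alpha=64,
+     lora_dropout=0.0,
+     bias="none",
+     task_type="CAUSAL_LM",
+     target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
+                     "gate_proj", "up_proj", "down_proj"],
+     # the new V_abs rows live in these tables, so both embedding ends are trained in full
+     modules_to_save=["embed_tokens", "lm_head"],
+ )
+ ```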
144
+
145
+ ---
146
+
147
+ ## Quality observation — why didn't T=3 help?
148
+
149
+ Per-round Phase A starting loss dropped 3.49 → 0.35 → 0.27, showing the on-policy abstract traces are doing what the paper says they should: they begin to carry signal from CoT through Z̃. But MATH-500 accuracy stayed in the 72.0–73.4 band across T=1 and T=3.
150
+
151
+ Hypotheses, in rough order of credibility:
152
+
153
+ 1. **LoRA caps the gain.** With ~17% trainable params and the embedding table dominating those, the model's base "answer-from-prompt" reflex is too strong for the bottleneck to redirect at this scale. The smoke report flagged this as the biggest gap to the paper, and our T=3 result is consistent.
154
+ 2. **N=5000 is still tiny.** The paper used 600k; we used 5k. On-policy refinement needs enough novel `(x, c)` pairs to keep producing diverse `Z̃` shapes; at 5k the same examples just get revisited with marginally different traces.
155
+ 3. **Eval stochasticity.** Abstract trace decode uses temp=0.7. 1–2 pt variance between runs at N=500 is normal. The validation result (73.4) was likely a lucky upper; production (72.0) is within noise of the smoke (73.2).
156
+ 4. **seq_len 8k may have let too much CoT signal "leak" through Z̃ during teacher generation**, making `Z̃` less of a compression target. Counterintuitive — the smoke argued for longer seq — but the bottleneck quality is the *delta* between what reaches Z̃ and what Y can use directly. Worth ablating.
157
+
158
+ ---
159
+
160
+ ## What's next, ranked
161
+
162
+ 1. **Full fine-tuning** instead of LoRA. With 2× 80GB and `enforce_eager=True` for the optimizer side, ZeRO-3 (no offload) becomes feasible on Qwen3-4B at seq_len 8k. Estimated ~16-20 hr at the current config; biggest expected lift.
163
+ 2. **N → 30k–60k** (still T=3, still seq 8k, still LoRA). Roughly extrapolates to ~30–60 hr — out of a one-day budget but the right next step if we get a 2–3 day budget. The full-FT path at 5k would be more diagnostic per hour.
164
+ 3. **More epochs.** Paper uses 3 epochs/phase; we used 1. 3× more wall but should help the Adam states settle.
165
+ 4. Re-eval the current `pi3_phaseB_merged` at multiple seeds + temperatures to bound the eval stochasticity tighter. 5–10 min each.
166
+
167
+ ---
168
+
169
+ ## File layout (under `/workspace/ThinkingWithoutWordsRepro/`)
170
+
171
+ ```
172
+ runs/
173
+ baseline_math500.jsonl # 84.60% (this-run baseline)
174
+ abstract_math500_T3_N5000.jsonl # 72.00% (final result)
175
+ qwen3-4b-abs/
176
+ base/ # Qwen3-4B + V_abs (M=64) + delimiters
177
+ pi1_phaseA/ pi1_phaseA_merged/ # round 1 LoRA + merged
178
+ pi1_phaseB/ pi1_phaseB_merged/ # round 1 Phase B
179
+ pi1_phaseB_teacher_traces.jsonl # on-policy Z̃ for round-1 self-distill
180
+ pi2_phaseA_teacher_traces.jsonl # bottleneck teacher for round 2 (full-CoT)
181
+ pi2_phaseA/ pi2_phaseA_merged/ # round 2
182
+ pi2_phaseB_teacher_traces.jsonl
183
+ pi2_phaseB/ pi2_phaseB_merged/
184
+ pi3_phaseA_teacher_traces.jsonl
185
+ pi3_phaseA/ pi3_phaseA_merged/
186
+ pi3_phaseB_teacher_traces.jsonl
187
+ pi3_phaseB/ pi3_phaseB_merged/ # ← FINAL warm-up model
188
+
189
+ data/
190
+ math500.jsonl # 500 problems
191
+ dolci_6000.jsonl # 6k filtered Dolci-Think examples (used N=5000 of them)
192
+
193
+ docs/
194
+ 20260510SMOKE_REPORT.md # prior run on 2× A100-40GB
195
+ 20260511.md # this report
196
+ ```
197
+
198
+ Train logs (`runs/qwen3-4b-abs/pi*/train_log.json`) include `losses`, `lrs`, `total_opt_steps`, `num_processes`, `wallclock_s` per phase — sufficient to plot and re-verify the LR fix offline.
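+
+ For example, a standard-library check that the post-warmup curve never bounces back (path per the layout above):
+
+ ```python
+ import json
+
+ log = json.load(open("runs/qwen3-4b-abs/pi3_phaseA/train_log.json"))
+ lrs = log["lrs"]
+ peak_i = lrs.index(max(lrs))
+ assert peak_i <= 2, "warmup should peak within the first few logged values"
+ tail = lrs[peak_i:]
+ assert all(a >= b for a, b in zip(tail, tail[1:])), "LR must decay monotonically after warmup"
+ print(f"peak {max(lrs):.2e} -> final {lrs[-1]:.2e} over {log['total_opt_steps']} opt steps")
+ ```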
final/chat_template.jinja ADDED
@@ -0,0 +1,89 @@
1
+ {%- if tools %}
2
+ {{- '<|im_start|>system\n' }}
3
+ {%- if messages[0].role == 'system' %}
4
+ {{- messages[0].content + '\n\n' }}
5
+ {%- endif %}
6
+ {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
7
+ {%- for tool in tools %}
8
+ {{- "\n" }}
9
+ {{- tool | tojson }}
10
+ {%- endfor %}
11
+ {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
12
+ {%- else %}
13
+ {%- if messages[0].role == 'system' %}
14
+ {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
15
+ {%- endif %}
16
+ {%- endif %}
17
+ {%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
18
+ {%- for message in messages[::-1] %}
19
+ {%- set index = (messages|length - 1) - loop.index0 %}
20
+ {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
21
+ {%- set ns.multi_step_tool = false %}
22
+ {%- set ns.last_query_index = index %}
23
+ {%- endif %}
24
+ {%- endfor %}
25
+ {%- for message in messages %}
26
+ {%- if message.content is string %}
27
+ {%- set content = message.content %}
28
+ {%- else %}
29
+ {%- set content = '' %}
30
+ {%- endif %}
31
+ {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
32
+ {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
33
+ {%- elif message.role == "assistant" %}
34
+ {%- set reasoning_content = '' %}
35
+ {%- if message.reasoning_content is string %}
36
+ {%- set reasoning_content = message.reasoning_content %}
37
+ {%- else %}
38
+ {%- if '</think>' in content %}
39
+ {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
40
+ {%- set content = content.split('</think>')[-1].lstrip('\n') %}
41
+ {%- endif %}
42
+ {%- endif %}
43
+ {%- if loop.index0 > ns.last_query_index %}
44
+ {%- if loop.last or (not loop.last and reasoning_content) %}
45
+ {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
46
+ {%- else %}
47
+ {{- '<|im_start|>' + message.role + '\n' + content }}
48
+ {%- endif %}
49
+ {%- else %}
50
+ {{- '<|im_start|>' + message.role + '\n' + content }}
51
+ {%- endif %}
52
+ {%- if message.tool_calls %}
53
+ {%- for tool_call in message.tool_calls %}
54
+ {%- if (loop.first and content) or (not loop.first) %}
55
+ {{- '\n' }}
56
+ {%- endif %}
57
+ {%- if tool_call.function %}
58
+ {%- set tool_call = tool_call.function %}
59
+ {%- endif %}
60
+ {{- '<tool_call>\n{"name": "' }}
61
+ {{- tool_call.name }}
62
+ {{- '", "arguments": ' }}
63
+ {%- if tool_call.arguments is string %}
64
+ {{- tool_call.arguments }}
65
+ {%- else %}
66
+ {{- tool_call.arguments | tojson }}
67
+ {%- endif %}
68
+ {{- '}\n</tool_call>' }}
69
+ {%- endfor %}
70
+ {%- endif %}
71
+ {{- '<|im_end|>\n' }}
72
+ {%- elif message.role == "tool" %}
73
+ {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
74
+ {{- '<|im_start|>user' }}
75
+ {%- endif %}
76
+ {{- '\n<tool_response>\n' }}
77
+ {{- content }}
78
+ {{- '\n</tool_response>' }}
79
+ {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
80
+ {{- '<|im_end|>\n' }}
81
+ {%- endif %}
82
+ {%- endif %}
83
+ {%- endfor %}
84
+ {%- if add_generation_prompt %}
85
+ {{- '<|im_start|>assistant\n' }}
86
+ {%- if enable_thinking is defined and enable_thinking is false %}
87
+ {{- '<think>\n\n</think>\n\n' }}
88
+ {%- endif %}
89
+ {%- endif %}
final/config.json ADDED
@@ -0,0 +1,71 @@
1
+ {
2
+ "architectures": [
3
+ "Qwen3ForCausalLM"
4
+ ],
5
+ "attention_bias": false,
6
+ "attention_dropout": 0.0,
7
+ "bos_token_id": 151643,
8
+ "dtype": "bfloat16",
9
+ "eos_token_id": 151645,
10
+ "head_dim": 128,
11
+ "hidden_act": "silu",
12
+ "hidden_size": 2560,
13
+ "initializer_range": 0.02,
14
+ "intermediate_size": 9728,
15
+ "layer_types": [
16
+ "full_attention",
17
+ "full_attention",
18
+ "full_attention",
19
+ "full_attention",
20
+ "full_attention",
21
+ "full_attention",
22
+ "full_attention",
23
+ "full_attention",
24
+ "full_attention",
25
+ "full_attention",
26
+ "full_attention",
27
+ "full_attention",
28
+ "full_attention",
29
+ "full_attention",
30
+ "full_attention",
31
+ "full_attention",
32
+ "full_attention",
33
+ "full_attention",
34
+ "full_attention",
35
+ "full_attention",
36
+ "full_attention",
37
+ "full_attention",
38
+ "full_attention",
39
+ "full_attention",
40
+ "full_attention",
41
+ "full_attention",
42
+ "full_attention",
43
+ "full_attention",
44
+ "full_attention",
45
+ "full_attention",
46
+ "full_attention",
47
+ "full_attention",
48
+ "full_attention",
49
+ "full_attention",
50
+ "full_attention",
51
+ "full_attention"
52
+ ],
53
+ "max_position_embeddings": 40960,
54
+ "max_window_layers": 36,
55
+ "model_type": "qwen3",
56
+ "num_attention_heads": 32,
57
+ "num_hidden_layers": 36,
58
+ "num_key_value_heads": 8,
59
+ "pad_token_id": null,
60
+ "rms_norm_eps": 1e-06,
61
+ "rope_parameters": {
62
+ "rope_theta": 1000000,
63
+ "rope_type": "default"
64
+ },
65
+ "sliding_window": null,
66
+ "tie_word_embeddings": false,
67
+ "transformers_version": "5.8.0",
68
+ "use_cache": true,
69
+ "use_sliding_window": false,
70
+ "vocab_size": 151735
71
+ }
final/generation_config.json ADDED
@@ -0,0 +1,13 @@
1
+ {
2
+ "bos_token_id": 151643,
3
+ "do_sample": true,
4
+ "eos_token_id": [
5
+ 151645,
6
+ 151643
7
+ ],
8
+ "pad_token_id": 151643,
9
+ "temperature": 0.6,
10
+ "top_k": 20,
11
+ "top_p": 0.95,
12
+ "transformers_version": "5.8.0"
13
+ }
final/tokenizer_config.json ADDED
@@ -0,0 +1,30 @@
1
+ {
2
+ "add_prefix_space": false,
3
+ "backend": "tokenizers",
4
+ "bos_token": null,
5
+ "clean_up_tokenization_spaces": false,
6
+ "eos_token": "<|im_end|>",
7
+ "errors": "replace",
8
+ "extra_special_tokens": [
9
+ "<|im_start|>",
10
+ "<|im_end|>",
11
+ "<|object_ref_start|>",
12
+ "<|object_ref_end|>",
13
+ "<|box_start|>",
14
+ "<|box_end|>",
15
+ "<|quad_start|>",
16
+ "<|quad_end|>",
17
+ "<|vision_start|>",
18
+ "<|vision_end|>",
19
+ "<|vision_pad|>",
20
+ "<|image_pad|>",
21
+ "<|video_pad|>"
22
+ ],
23
+ "is_local": true,
24
+ "local_files_only": false,
25
+ "model_max_length": 131072,
26
+ "pad_token": "<|endoftext|>",
27
+ "split_special_tokens": false,
28
+ "tokenizer_class": "Qwen2Tokenizer",
29
+ "unk_token": null
30
+ }
results/abstract_math500_T3_N5000.jsonl ADDED
The diff for this file is too large to render. See raw diff
 
round1/chat_template.jinja ADDED
@@ -0,0 +1,89 @@
1
+ {%- if tools %}
2
+ {{- '<|im_start|>system\n' }}
3
+ {%- if messages[0].role == 'system' %}
4
+ {{- messages[0].content + '\n\n' }}
5
+ {%- endif %}
6
+ {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
7
+ {%- for tool in tools %}
8
+ {{- "\n" }}
9
+ {{- tool | tojson }}
10
+ {%- endfor %}
11
+ {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
12
+ {%- else %}
13
+ {%- if messages[0].role == 'system' %}
14
+ {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
15
+ {%- endif %}
16
+ {%- endif %}
17
+ {%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
18
+ {%- for message in messages[::-1] %}
19
+ {%- set index = (messages|length - 1) - loop.index0 %}
20
+ {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
21
+ {%- set ns.multi_step_tool = false %}
22
+ {%- set ns.last_query_index = index %}
23
+ {%- endif %}
24
+ {%- endfor %}
25
+ {%- for message in messages %}
26
+ {%- if message.content is string %}
27
+ {%- set content = message.content %}
28
+ {%- else %}
29
+ {%- set content = '' %}
30
+ {%- endif %}
31
+ {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
32
+ {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
33
+ {%- elif message.role == "assistant" %}
34
+ {%- set reasoning_content = '' %}
35
+ {%- if message.reasoning_content is string %}
36
+ {%- set reasoning_content = message.reasoning_content %}
37
+ {%- else %}
38
+ {%- if '</think>' in content %}
39
+ {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
40
+ {%- set content = content.split('</think>')[-1].lstrip('\n') %}
41
+ {%- endif %}
42
+ {%- endif %}
43
+ {%- if loop.index0 > ns.last_query_index %}
44
+ {%- if loop.last or (not loop.last and reasoning_content) %}
45
+ {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
46
+ {%- else %}
47
+ {{- '<|im_start|>' + message.role + '\n' + content }}
48
+ {%- endif %}
49
+ {%- else %}
50
+ {{- '<|im_start|>' + message.role + '\n' + content }}
51
+ {%- endif %}
52
+ {%- if message.tool_calls %}
53
+ {%- for tool_call in message.tool_calls %}
54
+ {%- if (loop.first and content) or (not loop.first) %}
55
+ {{- '\n' }}
56
+ {%- endif %}
57
+ {%- if tool_call.function %}
58
+ {%- set tool_call = tool_call.function %}
59
+ {%- endif %}
60
+ {{- '<tool_call>\n{"name": "' }}
61
+ {{- tool_call.name }}
62
+ {{- '", "arguments": ' }}
63
+ {%- if tool_call.arguments is string %}
64
+ {{- tool_call.arguments }}
65
+ {%- else %}
66
+ {{- tool_call.arguments | tojson }}
67
+ {%- endif %}
68
+ {{- '}\n</tool_call>' }}
69
+ {%- endfor %}
70
+ {%- endif %}
71
+ {{- '<|im_end|>\n' }}
72
+ {%- elif message.role == "tool" %}
73
+ {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
74
+ {{- '<|im_start|>user' }}
75
+ {%- endif %}
76
+ {{- '\n<tool_response>\n' }}
77
+ {{- content }}
78
+ {{- '\n</tool_response>' }}
79
+ {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
80
+ {{- '<|im_end|>\n' }}
81
+ {%- endif %}
82
+ {%- endif %}
83
+ {%- endfor %}
84
+ {%- if add_generation_prompt %}
85
+ {{- '<|im_start|>assistant\n' }}
86
+ {%- if enable_thinking is defined and enable_thinking is false %}
87
+ {{- '<think>\n\n</think>\n\n' }}
88
+ {%- endif %}
89
+ {%- endif %}
round1/config.json ADDED
@@ -0,0 +1,71 @@
+ {
+ "architectures": [
+ "Qwen3ForCausalLM"
+ ],
+ "attention_bias": false,
+ "attention_dropout": 0.0,
+ "bos_token_id": 151643,
+ "dtype": "bfloat16",
+ "eos_token_id": 151645,
+ "head_dim": 128,
+ "hidden_act": "silu",
+ "hidden_size": 2560,
+ "initializer_range": 0.02,
+ "intermediate_size": 9728,
+ "layer_types": [
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention"
+ ],
+ "max_position_embeddings": 40960,
+ "max_window_layers": 36,
+ "model_type": "qwen3",
+ "num_attention_heads": 32,
+ "num_hidden_layers": 36,
+ "num_key_value_heads": 8,
+ "pad_token_id": null,
+ "rms_norm_eps": 1e-06,
+ "rope_parameters": {
+ "rope_theta": 1000000,
+ "rope_type": "default"
+ },
+ "sliding_window": null,
+ "tie_word_embeddings": false,
+ "transformers_version": "5.8.0",
+ "use_cache": true,
+ "use_sliding_window": false,
+ "vocab_size": 151735
+ }
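The config above keeps the stock Qwen3-4B architecture (36 full-attention layers, hidden size 2560) but reports a non-stock `vocab_size` of 151735, presumably reflecting the embedding resize for the reserved abstract tokens. Below is a minimal, hedged sanity check that the saved config and tokenizer agree before loading weights; the local directory name and the `round1` subfolder argument are assumptions about how this repo is laid out on disk, not an official loading recipe.

```python
# Hedged sanity check: confirm the saved config and tokenizer agree on vocabulary size.
# The local path and the "round1" subfolder are assumptions, not a released API.
from transformers import AutoConfig, AutoTokenizer

repo_dir = "./Qwen3-4B-AbstractCoT-warmup"  # placeholder local path

config = AutoConfig.from_pretrained(repo_dir, subfolder="round1")
tokenizer = AutoTokenizer.from_pretrained(repo_dir, subfolder="round1")

print("config vocab_size :", config.vocab_size)  # 151735 per the file above
print("tokenizer length  :", len(tokenizer))     # should not exceed config.vocab_size
assert len(tokenizer) <= config.vocab_size, "embedding matrix smaller than tokenizer"
```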
round1/generation_config.json ADDED
@@ -0,0 +1,13 @@
+ {
+ "bos_token_id": 151643,
+ "do_sample": true,
+ "eos_token_id": [
+ 151645,
+ 151643
+ ],
+ "pad_token_id": 151643,
+ "temperature": 0.6,
+ "top_k": 20,
+ "top_p": 0.95,
+ "transformers_version": "5.8.0"
+ }
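The generation config pins sampling-on decoding with temperature 0.6, top-p 0.95, and top-k 20. The sketch below shows one way to run a round1 checkpoint with exactly those settings; the repository path is a placeholder, and using `round1` as a `subfolder` argument is an assumption about the file layout.

```python
# Minimal generation sketch using the sampling defaults above (assumed local layout).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_dir = "./Qwen3-4B-AbstractCoT-warmup"  # placeholder local path
tokenizer = AutoTokenizer.from_pretrained(repo_dir, subfolder="round1")
model = AutoModelForCausalLM.from_pretrained(
    repo_dir, subfolder="round1", torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Compute 17 * 24."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# These kwargs simply restate the defaults stored in generation_config.json.
output = model.generate(
    input_ids, max_new_tokens=256, do_sample=True, temperature=0.6, top_k=20, top_p=0.95
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=False))
```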
round1/tokenizer_config.json ADDED
@@ -0,0 +1,30 @@
+ {
+ "add_prefix_space": false,
+ "backend": "tokenizers",
+ "bos_token": null,
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<|im_end|>",
+ "errors": "replace",
+ "extra_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "is_local": true,
+ "local_files_only": false,
+ "model_max_length": 131072,
+ "pad_token": "<|endoftext|>",
+ "split_special_tokens": false,
+ "tokenizer_class": "Qwen2Tokenizer",
+ "unk_token": null
+ }
round2/chat_template.jinja ADDED
@@ -0,0 +1,89 @@
+ {%- if tools %}
+ {{- '<|im_start|>system\n' }}
+ {%- if messages[0].role == 'system' %}
+ {{- messages[0].content + '\n\n' }}
+ {%- endif %}
+ {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+ {%- for tool in tools %}
+ {{- "\n" }}
+ {{- tool | tojson }}
+ {%- endfor %}
+ {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+ {%- else %}
+ {%- if messages[0].role == 'system' %}
+ {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
+ {%- endif %}
+ {%- endif %}
+ {%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
+ {%- for message in messages[::-1] %}
+ {%- set index = (messages|length - 1) - loop.index0 %}
+ {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
+ {%- set ns.multi_step_tool = false %}
+ {%- set ns.last_query_index = index %}
+ {%- endif %}
+ {%- endfor %}
+ {%- for message in messages %}
+ {%- if message.content is string %}
+ {%- set content = message.content %}
+ {%- else %}
+ {%- set content = '' %}
+ {%- endif %}
+ {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
+ {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
+ {%- elif message.role == "assistant" %}
+ {%- set reasoning_content = '' %}
+ {%- if message.reasoning_content is string %}
+ {%- set reasoning_content = message.reasoning_content %}
+ {%- else %}
+ {%- if '</think>' in content %}
+ {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
+ {%- set content = content.split('</think>')[-1].lstrip('\n') %}
+ {%- endif %}
+ {%- endif %}
+ {%- if loop.index0 > ns.last_query_index %}
+ {%- if loop.last or (not loop.last and reasoning_content) %}
+ {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
+ {%- else %}
+ {{- '<|im_start|>' + message.role + '\n' + content }}
+ {%- endif %}
+ {%- else %}
+ {{- '<|im_start|>' + message.role + '\n' + content }}
+ {%- endif %}
+ {%- if message.tool_calls %}
+ {%- for tool_call in message.tool_calls %}
+ {%- if (loop.first and content) or (not loop.first) %}
+ {{- '\n' }}
+ {%- endif %}
+ {%- if tool_call.function %}
+ {%- set tool_call = tool_call.function %}
+ {%- endif %}
+ {{- '<tool_call>\n{"name": "' }}
+ {{- tool_call.name }}
+ {{- '", "arguments": ' }}
+ {%- if tool_call.arguments is string %}
+ {{- tool_call.arguments }}
+ {%- else %}
+ {{- tool_call.arguments | tojson }}
+ {%- endif %}
+ {{- '}\n</tool_call>' }}
+ {%- endfor %}
+ {%- endif %}
+ {{- '<|im_end|>\n' }}
+ {%- elif message.role == "tool" %}
+ {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
+ {{- '<|im_start|>user' }}
+ {%- endif %}
+ {{- '\n<tool_response>\n' }}
+ {{- content }}
+ {{- '\n</tool_response>' }}
+ {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+ {{- '<|im_end|>\n' }}
+ {%- endif %}
+ {%- endif %}
+ {%- endfor %}
+ {%- if add_generation_prompt %}
+ {{- '<|im_start|>assistant\n' }}
+ {%- if enable_thinking is defined and enable_thinking is false %}
+ {{- '<think>\n\n</think>\n\n' }}
+ {%- endif %}
+ {%- endif %}
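This template appears unchanged from the stock Qwen3 chat template: `<think>` blocks are stripped from earlier assistant turns, tool calls are serialized into `<tool_call>` tags, and passing `enable_thinking=False` pre-fills an empty think block in the generation prompt. A small rendering check is sketched below; the local path and the `round2` subfolder are assumptions about where this repo was downloaded.

```python
# Render the template in both thinking modes to see what the model is conditioned on.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("./Qwen3-4B-AbstractCoT-warmup", subfolder="round2")
messages = [{"role": "user", "content": "Factor x^2 - 5x + 6."}]

default_prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
no_think_prompt = tok.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)

print(default_prompt)    # ends with "<|im_start|>assistant\n"
print(no_think_prompt)   # additionally ends with an empty "<think>\n\n</think>\n\n" block
```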
round2/config.json ADDED
@@ -0,0 +1,71 @@
+ {
+ "architectures": [
+ "Qwen3ForCausalLM"
+ ],
+ "attention_bias": false,
+ "attention_dropout": 0.0,
+ "bos_token_id": 151643,
+ "dtype": "bfloat16",
+ "eos_token_id": 151645,
+ "head_dim": 128,
+ "hidden_act": "silu",
+ "hidden_size": 2560,
+ "initializer_range": 0.02,
+ "intermediate_size": 9728,
+ "layer_types": [
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention"
+ ],
+ "max_position_embeddings": 40960,
+ "max_window_layers": 36,
+ "model_type": "qwen3",
+ "num_attention_heads": 32,
+ "num_hidden_layers": 36,
+ "num_key_value_heads": 8,
+ "pad_token_id": null,
+ "rms_norm_eps": 1e-06,
+ "rope_parameters": {
+ "rope_theta": 1000000,
+ "rope_type": "default"
+ },
+ "sliding_window": null,
+ "tie_word_embeddings": false,
+ "transformers_version": "5.8.0",
+ "use_cache": true,
+ "use_sliding_window": false,
+ "vocab_size": 151735
+ }
round2/generation_config.json ADDED
@@ -0,0 +1,13 @@
+ {
+ "bos_token_id": 151643,
+ "do_sample": true,
+ "eos_token_id": [
+ 151645,
+ 151643
+ ],
+ "pad_token_id": 151643,
+ "temperature": 0.6,
+ "top_k": 20,
+ "top_p": 0.95,
+ "transformers_version": "5.8.0"
+ }
round2/tokenizer_config.json ADDED
@@ -0,0 +1,30 @@
+ {
+ "add_prefix_space": false,
+ "backend": "tokenizers",
+ "bos_token": null,
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<|im_end|>",
+ "errors": "replace",
+ "extra_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "is_local": true,
+ "local_files_only": false,
+ "model_max_length": 131072,
+ "pad_token": "<|endoftext|>",
+ "split_special_tokens": false,
+ "tokenizer_class": "Qwen2Tokenizer",
+ "unk_token": null
+ }
teacher_traces/pi1_phaseB_teacher_traces.jsonl ADDED
The diff for this file is too large to render. See raw diff
 
teacher_traces/pi2_phaseA_teacher_traces.jsonl ADDED
The diff for this file is too large to render. See raw diff
 
teacher_traces/pi2_phaseB_teacher_traces.jsonl ADDED
The diff for this file is too large to render. See raw diff
 
teacher_traces/pi3_phaseA_teacher_traces.jsonl ADDED
The diff for this file is too large to render. See raw diff
 
teacher_traces/pi3_phaseB_teacher_traces.jsonl ADDED
The diff for this file is too large to render. See raw diff
 
train_logs/pi1_phaseA.json ADDED
@@ -0,0 +1,75 @@
+ {
+ "losses": [
+ 3.488373909890652,
+ 2.407912985235453,
+ 1.7763818703591823,
+ 1.4919575069099664,
+ 0.9603039052337408,
+ 0.9057566102594137,
+ 0.8390422958880663,
+ 0.9064983982592821,
+ 0.7787921145558357,
+ 0.78125288952142,
+ 0.8250404935330152,
+ 0.8201050635427236,
+ 0.8527244713157416,
+ 0.7438110310584307,
+ 0.9024393826723098,
+ 0.8181035034358501,
+ 0.8188389737159014,
+ 0.8235138654708862,
+ 0.8209620092064143,
+ 0.8652047950774431,
+ 0.7606889367103576,
+ 0.884891077503562,
+ 0.7458537224680185,
+ 0.9200425513088704,
+ 0.8810815073549747,
+ 0.8085172940045595,
+ 0.8518438508734107,
+ 0.813002785295248,
+ 0.8703699089586735,
+ 0.8568720713257789,
+ 0.8453036542981863
+ ],
+ "lrs": [
+ 6.666666666666667e-05,
+ 9.993008576227247e-05,
+ 9.937194443381972e-05,
+ 9.826190093588563e-05,
+ 9.661236384224129e-05,
+ 9.444177243274618e-05,
+ 9.177439057064683e-05,
+ 8.864003547001915e-05,
+ 8.507374438531607e-05,
+ 8.111538294891684e-05,
+ 7.680919953486048e-05,
+ 7.220333063028872e-05,
+ 6.734926274378312e-05,
+ 6.230125686563068e-05,
+ 5.7115741913664264e-05,
+ 5.185068394501791e-05,
+ 4.6564938185035956e-05,
+ 4.131759111665349e-05,
+ 3.616729998467365e-05,
+ 3.1171637098265064e-05,
+ 2.638644626136587e-05,
+ 2.1865218525109495e-05,
+ 1.7658494240397126e-05,
+ 1.3813298094746491e-05,
+ 1.037261344883343e-05,
+ 7.374901848832683e-06,
+ 4.853673085668947e-06,
+ 2.8371106072518195e-06,
+ 1.3477564710088098e-06,
+ 4.02259358460233e-07,
+ 1.1188468644907079e-08
+ ],
+ "wallclock_s": 10715,
+ "n_examples": 5000,
+ "epochs": 1,
+ "mode": "bottleneck",
+ "lora_rank": 32,
+ "total_opt_steps": 156,
+ "num_processes": 2
+ }
train_logs/pi1_phaseB.json ADDED
@@ -0,0 +1,75 @@
+ {
+ "losses": [
+ 0.5806382041424513,
+ 0.5243994968011976,
+ 0.49262564033269884,
+ 0.4333665704354644,
+ 0.373029216285795,
+ 0.3593967686872929,
+ 0.39847223716787994,
+ 0.37207798319868746,
+ 0.3787895118817687,
+ 0.3720124014187604,
+ 0.3636292540933937,
+ 0.3601828854967607,
+ 0.3554463139735162,
+ 0.3865779357030988,
+ 0.32862058384343984,
+ 0.36783338263630866,
+ 0.3428428391227499,
+ 0.34551644229795786,
+ 0.3680351444054395,
+ 0.3469195322133601,
+ 0.3622684331610799,
+ 0.37623543343506755,
+ 0.36850376506336036,
+ 0.345283712586388,
+ 0.3425974382087588,
+ 0.4011214487836696,
+ 0.3654101203195751,
+ 0.3157559605082497,
+ 0.36133123533800243,
+ 0.35812310164328665,
+ 0.34157210728153586
+ ],
+ "lrs": [
+ 6.666666666666667e-05,
+ 9.993008576227247e-05,
+ 9.937194443381972e-05,
+ 9.826190093588563e-05,
+ 9.661236384224129e-05,
+ 9.444177243274618e-05,
+ 9.177439057064683e-05,
+ 8.864003547001915e-05,
+ 8.507374438531607e-05,
+ 8.111538294891684e-05,
+ 7.680919953486048e-05,
+ 7.220333063028872e-05,
+ 6.734926274378312e-05,
+ 6.230125686563068e-05,
+ 5.7115741913664264e-05,
+ 5.185068394501791e-05,
+ 4.6564938185035956e-05,
+ 4.131759111665349e-05,
+ 3.616729998467365e-05,
+ 3.1171637098265064e-05,
+ 2.638644626136587e-05,
+ 2.1865218525109495e-05,
+ 1.7658494240397126e-05,
+ 1.3813298094746491e-05,
+ 1.037261344883343e-05,
+ 7.374901848832683e-06,
+ 4.853673085668947e-06,
+ 2.8371106072518195e-06,
+ 1.3477564710088098e-06,
+ 4.02259358460233e-07,
+ 1.1188468644907079e-08
+ ],
+ "wallclock_s": 942,
+ "n_examples": 5000,
+ "epochs": 1,
+ "mode": "distill",
+ "lora_rank": 32,
+ "total_opt_steps": 156,
+ "num_processes": 2
+ }
train_logs/pi2_phaseA.json ADDED
@@ -0,0 +1,75 @@
+ {
+ "losses": [
+ 0.35318275502650065,
+ 0.30725667767983394,
+ 0.34089534296654167,
+ 0.3377619634848088,
+ 0.3068910479894839,
+ 0.3112259633140638,
+ 0.35649422819260507,
+ 0.3305382497259416,
+ 0.3459838313516229,
+ 0.3343647971516475,
+ 0.33608090435154736,
+ 0.3339404923695838,
+ 0.32868876951106357,
+ 0.364680266357027,
+ 0.31102049429900946,
+ 0.3481577365193516,
+ 0.3247197750257328,
+ 0.33266925923526286,
+ 0.35477414953056724,
+ 0.33790302132838407,
+ 0.35382214419078084,
+ 0.36831805281108243,
+ 0.3632078022346832,
+ 0.34023994151502845,
+ 0.3407576463650912,
+ 0.40222772396809886,
+ 0.3644849105272442,
+ 0.31388317197561266,
+ 0.36352235367521646,
+ 0.35906730571296064,
+ 0.34190756385796706
+ ],
+ "lrs": [
+ 6.666666666666667e-05,
+ 9.993008576227247e-05,
+ 9.937194443381972e-05,
+ 9.826190093588563e-05,
+ 9.661236384224129e-05,
+ 9.444177243274618e-05,
+ 9.177439057064683e-05,
+ 8.864003547001915e-05,
+ 8.507374438531607e-05,
+ 8.111538294891684e-05,
+ 7.680919953486048e-05,
+ 7.220333063028872e-05,
+ 6.734926274378312e-05,
+ 6.230125686563068e-05,
+ 5.7115741913664264e-05,
+ 5.185068394501791e-05,
+ 4.6564938185035956e-05,
+ 4.131759111665349e-05,
+ 3.616729998467365e-05,
+ 3.1171637098265064e-05,
+ 2.638644626136587e-05,
+ 2.1865218525109495e-05,
+ 1.7658494240397126e-05,
+ 1.3813298094746491e-05,
+ 1.037261344883343e-05,
+ 7.374901848832683e-06,
+ 4.853673085668947e-06,
+ 2.8371106072518195e-06,
+ 1.3477564710088098e-06,
+ 4.02259358460233e-07,
+ 1.1188468644907079e-08
+ ],
+ "wallclock_s": 10542,
+ "n_examples": 5000,
+ "epochs": 1,
+ "mode": "bottleneck",
+ "lora_rank": 32,
+ "total_opt_steps": 156,
+ "num_processes": 2
+ }
train_logs/pi2_phaseB.json ADDED
@@ -0,0 +1,75 @@
+ {
+ "losses": [
+ 0.30659591216535775,
+ 0.263701402238803,
+ 0.2917288892320357,
+ 0.2979889345937409,
+ 0.2673945975548122,
+ 0.27389297530753537,
+ 0.31817070026881994,
+ 0.29204057007445955,
+ 0.3173439687816426,
+ 0.2996893103467301,
+ 0.3109925156692043,
+ 0.31033597550704145,
+ 0.30306430828932207,
+ 0.34484060467220845,
+ 0.29502744183409957,
+ 0.332037893566303,
+ 0.31040111857582814,
+ 0.32000000202679074,
+ 0.34648944581858815,
+ 0.33038036652142183,
+ 0.34962533053476363,
+ 0.3640994099027012,
+ 0.3611091356840916,
+ 0.3375700225355104,
+ 0.3406386500224471,
+ 0.4031711193849333,
+ 0.3656635078601539,
+ 0.31611953389365227,
+ 0.36588236822863107,
+ 0.36202733669779263,
+ 0.34657563852961176
+ ],
+ "lrs": [
+ 6.666666666666667e-05,
+ 9.993008576227247e-05,
+ 9.937194443381972e-05,
+ 9.826190093588563e-05,
+ 9.661236384224129e-05,
+ 9.444177243274618e-05,
+ 9.177439057064683e-05,
+ 8.864003547001915e-05,
+ 8.507374438531607e-05,
+ 8.111538294891684e-05,
+ 7.680919953486048e-05,
+ 7.220333063028872e-05,
+ 6.734926274378312e-05,
+ 6.230125686563068e-05,
+ 5.7115741913664264e-05,
+ 5.185068394501791e-05,
+ 4.6564938185035956e-05,
+ 4.131759111665349e-05,
+ 3.616729998467365e-05,
+ 3.1171637098265064e-05,
+ 2.638644626136587e-05,
+ 2.1865218525109495e-05,
+ 1.7658494240397126e-05,
+ 1.3813298094746491e-05,
+ 1.037261344883343e-05,
+ 7.374901848832683e-06,
+ 4.853673085668947e-06,
+ 2.8371106072518195e-06,
+ 1.3477564710088098e-06,
+ 4.02259358460233e-07,
+ 1.1188468644907079e-08
+ ],
+ "wallclock_s": 962,
+ "n_examples": 5000,
+ "epochs": 1,
+ "mode": "distill",
+ "lora_rank": 32,
+ "total_opt_steps": 156,
+ "num_processes": 2
+ }
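All of the logs in train_logs/ share one schema: 31 logged loss values, 31 learning-rate values (a short warmup followed by decay toward zero over 156 optimizer steps), and run metadata such as the training mode (`bottleneck` vs `distill`), wall-clock time, and LoRA rank. The sketch below summarizes them side by side; it assumes only that the train_logs/ directory from this repo is available locally.

```python
# Summarize every training log in train_logs/ (assumes a local copy of that directory).
import json
from pathlib import Path

for path in sorted(Path("train_logs").glob("*.json")):
    log = json.loads(path.read_text())
    print(
        f"{path.stem:13s} mode={log['mode']:10s} "
        f"loss {log['losses'][0]:.3f} -> {log['losses'][-1]:.3f} "
        f"steps={log['total_opt_steps']} wallclock_s={log['wallclock_s']}"
    )
```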