ersanbil committed
Commit e6d643c · verified · 1 Parent(s): 42d8b56

Roka v0.2 initial release — decontaminated SFT, Apache 2.0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+gguf/roka-v0.2-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
LICENSE ADDED
@@ -0,0 +1,202 @@
                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!) The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright [yyyy] [name of copyright owner]

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
README.md ADDED
@@ -0,0 +1,243 @@
---
language:
- tr
- en
license: apache-2.0
library_name: transformers
base_model: AlicanKiraz0/Kara-Kumru-v1.0-2B
pipeline_tag: text-generation
tags:
- turkish
- tool-calling
- function-calling
- hermes
- kara-kumru
- mistral
- gguf
---

# Roka — Turkish Tool-Calling Fine-Tune of Kara-Kumru 2B

Roka is a supervised fine-tune of `AlicanKiraz0/Kara-Kumru-v1.0-2B` that teaches a 2B-parameter Turkish language model to use five tools (web search, calculator, date/time, weather, URL reader) via a Hermes-style `<tool_call>…</tool_call>` output format.

This is a **v0.2 research preview**, released for reproducibility and community feedback. It is not a production-grade tool-calling agent and has known weaknesses (see *Limitations*).

The v0.2 training set is fully decontaminated against the evaluation set: no test-set query appears verbatim in train or validation.

## Model at a glance

| | |
|---|---|
| **Base model** | `AlicanKiraz0/Kara-Kumru-v1.0-2B` (Mistral architecture, Llama-3 chat template, Turkish-pretrained) |
| **Upstream base** | `vngrs-ai/Kumru-2B` |
| **Parameters** | ~2.15B |
| **Fine-tuning** | Full fine-tuning, 3 epochs, LR 5e-5 linear, bf16, TRL SFTTrainer |
| **Hardware** | Single NVIDIA A6000 (~65 min total, ~22 min/epoch) |
| **Languages** | Primarily Turkish; ~13% of the training mix is English (Glaive-sourced synthetic tool-calling examples) |
| **License** | Apache 2.0 (inherited from base chain) |

## Tool set

| Tool | Description |
|---|---|
| `web_search` | Internet search (DuckDuckGo) |
| `calculator` | Arithmetic expression evaluator |
| `datetime` | Date/time and calendar arithmetic (9 actions: `today`, `now`, `day_of_week`, `add_days`, `date_diff`, `days_until`, `day_of_year`, `end_of_month`, `days_until_weekday`) |
| `hava_durumu` | Weather query by city name |
| `sayfa_oku` | URL content reader |

The model is trained to emit tool calls as:

```
<tool_call>
{"name": "datetime", "arguments": {"action": "today"}}
</tool_call>
```

Tool results are fed back to the model wrapped in `<tool_response>…</tool_response>` inside a user turn, and the model synthesizes a final Turkish answer.

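In code, one round of that protocol looks roughly like the sketch below. `generate` stands for any callable that runs the model on a chat; `run_tool` is a toy dispatcher invented for this example, and the `expression` argument name and message schema are assumptions (the real agent logic lives behind `scripts/serve_ui.py`):

```python
import json
import re
from datetime import date

TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def run_tool(name: str, arguments: dict):
    """Toy dispatcher covering two of the five tools, for illustration only."""
    if name == "calculator":
        # "expression" is an assumed argument name; builtins stripped so only
        # arithmetic survives the eval
        return {"result": eval(arguments["expression"], {"__builtins__": {}})}
    if name == "datetime" and arguments.get("action") == "today":
        return {"date": date.today().isoformat()}
    raise ValueError(f"unhandled tool: {name}")

def answer(generate, messages: list[dict]) -> str:
    """One tool round: generate, execute the tool if one was called, generate again."""
    reply = generate(messages)
    m = TOOL_CALL_RE.search(reply)
    if m is None:
        return reply  # no tool needed; plain Turkish answer
    call = json.loads(m.group(1))
    result = run_tool(call["name"], call["arguments"])
    messages = messages + [
        {"role": "assistant", "content": reply},
        # The result goes back inside a *user* turn, matching the training format.
        {"role": "user", "content": "<tool_response>\n"
                                    + json.dumps(result, ensure_ascii=False)
                                    + "\n</tool_response>"},
    ]
    return generate(messages)
```
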
## Evaluation

The test set contains 260 Turkish prompts spread over six categories (simple tool calls, fullflow multi-step, parallel, multiple tools, irrelevance, adversarial). Scoring uses an alignment-aware harness (`scripts/rescore_aligned.py`) that normalizes equivalent datetime actions and accepts semantically equivalent arithmetic expressions.

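To make "alignment-aware" concrete, the sketch below shows the two normalizations in spirit. The equivalence table and function names are illustrative, not the actual contents of `scripts/rescore_aligned.py`:

```python
# Illustrative only; the real rules live in scripts/rescore_aligned.py.
DATETIME_EQUIV = {"now": "today"}  # assumed example of interchangeable actions

def normalize_call(call: dict) -> dict:
    """Map equivalent datetime actions onto one canonical form."""
    args = dict(call["arguments"])
    if call["name"] == "datetime" and "action" in args:
        args["action"] = DATETIME_EQUIV.get(args["action"], args["action"])
    return {"name": call["name"], "arguments": args}

def same_expression(a: str, b: str) -> bool:
    """Accept semantically equivalent arithmetic: '120*0.18' matches '0.18 * 120'."""
    try:
        # evaluate with builtins stripped so only arithmetic survives
        return eval(a, {"__builtins__": {}}) == eval(b, {"__builtins__": {}})
    except Exception:
        return a.strip() == b.strip()
```
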
### Overall results (Roka v0.2, April 2026)

| View | n | Full-Match | Tool-Call Acc. | Name Acc. | Arg Acc. |
|---|---|---|---|---|---|
| **All test (held-out)** | 260 | **73.5%** | 93.1% | 71.9% | 60.6% |

Every test query was verified to be absent from both `data/train.jsonl` and `data/val.jsonl`, so the 73.5% number above is a genuinely held-out measurement. See *Decontamination history* below for why this is lower than an earlier, un-decontaminated run.

### Per-subcategory results

| Subcategory | n | Full-Match |
|---|---|---|
| simple/web_search | 30 | **93.3%** |
| simple/weather | 20 | **100.0%** |
| simple/url_reader | 15 | **100.0%** |
| simple/calculator | 20 | 70.0% |
| simple/datetime | 15 | 46.7% |
| fullflow | 35 | **80.0%** |
| multiple | 45 | 64.4% |
| parallel | 15 | **0.0%** |
| adversarial/turkish_special | 10 | 90.0% |
| adversarial/edge_case | 5 | 40.0% |
| adversarial/ambiguous | 15 | 26.7% |
| irrelevance/greeting | 15 | **100.0%** |
| irrelevance/identity | 10 | **100.0%** |
| irrelevance/opinion | 10 | **100.0%** |

**Parallel tool calls score 0% because the training mix does not contain parallel-call examples.** This is a known gap, not a reproducibility failure.

### Decontamination history

During preparation for this release we audited the training set and found that **44 of the 260 test queries appeared verbatim in train/val** (8 in simple/datetime, 6 in simple/web_search, 7 in multiple, 19 in irrelevance/identity and irrelevance/greeting, and 4 scattered singles; see the full table under *Contamination verification*). We removed all 76 matching train examples and 6 matching val examples, and retrained on the clean split. That retraining is the model reported above.

For transparency we also report the before-and-after numbers on the 216 test queries that were **not** affected by the decontamination (i.e., the genuinely held-out subset from the *pre-cleanup* model's perspective):

| Model | Training data | Clean-216 FM |
|---|---|---|
| v0.1 pre-clean | original (with 76 overlaps) | 78.2% |
| **v0.2** (released) | decontaminated | **73.6%** |

The ~4.6-point drop is informative: it is *not* contamination inflation. The removed training examples were pattern-providing (datetime variants, fullflow web-search turns, distractor augmentations of the same base queries), and losing them cost about 4.6 points of generalization even on held-out queries. The cost of honest decontamination was larger than a narrow definition of "memorization gain" would predict. We report the post-decontamination number because it is the only one defensible as a held-out measurement. A future v0.3 will attempt to recover the gap by adding clean synthetic replacements for the removed examples.

## Development journey (brief)

Arriving at the final model involved a fair number of dead ends.

1. **Baseline (Run 10)** — 62.7% aligned FM with an earlier pipeline, before any of the spec-005 data work.
2. **Phase A v1–v4 collapse** — four consecutive training runs where loss converged to near zero but test-set Full-Match stayed at 0/260. All of them passed loss-based sanity checks, so the failure was invisible from inside the run.
3. **Root cause** — TRL issue [#3910](https://github.com/huggingface/trl/issues/3910): the `max_seq_length` argument was silently renamed to `max_length` (default 1024) in TRL 0.20+. Every assistant turn longer than 1024 tokens (≈75% of our fullflow examples) was being truncated before it contributed to the loss. The model trained to completion on fragments, not on full tool-calling traces. Fix: pass `max_length=4096` explicitly.
4. **Data iterations**
   - Removed the `unit` argument from all `hava_durumu` training examples (the test set does not supply it). `simple/weather` Full-Match rose from 10% to 100%.
   - Added 45 supplementary `datetime` examples covering `day_of_year`, `end_of_month`, and `days_until_weekday` — test actions that were absent from the R10 training data.
   - Those supplementary examples caused a regression on `day_of_week` queries ("23 Nisan hangi güne denk geliyor?", "Which day of the week does April 23 fall on?", was mis-routed to `day_of_year`). A targeted set of 30 `day_of_week` contrast examples fixed it.
5. **Final v0.1 model** — 4,778 training / 509 validation examples, 795 optimizer steps. Reported 76.9% all-test, 78.2% on the clean-216 subset.
6. **v0.2 — decontamination** — 76 train and 6 val examples whose first user turn matched a test query were removed, producing a 4,702 / 503 split. Retraining on this split gave the 73.5% number now reported above. The 4.6-point drop on the clean-216 subset between v0.1 and v0.2 is the cost of honest decontamination — see *Decontamination history*.

Total compute used across Phase A and v0.2: ~5 A6000-hours.

## Limitations

- **Multi-turn pattern lock-in.** The SFT mix contains very few multi-turn tool-calling sequences. If the user starts with a chit-chat turn ("selam", "hi"), the model tends to stay in plain-chat mode on subsequent turns and skip the tool call. The provided `scripts/serve_ui.py` works around this by feeding only the current user message (without prior turns) into the tool-decision loop.
- **Parallel tool calls: 0%.** Not trained.
- **`hava_durumu` has no temporal parameter.** Queries like "yarın İstanbul'da hava" ("the weather in İstanbul tomorrow") still produce `{"city": "İstanbul"}` because that is what the schema allows. The fix is a schema change plus data regeneration, not a prompt change.
- **Adversarial/ambiguous: 26.7%.** The model is easily nudged off-task by ambiguous phrasing.
- **Long-passage synthesis is brittle.** When `sayfa_oku` returns several paragraphs, the synthesized summary sometimes fragments quotes in an unnatural way.
- **Hermes parser coupling.** Native OpenAI-style `tool_calls` parsing via `llama-server` requires the provided `training/roka_tool_template.jinja` chat template and requires the client to pass the full list of 5 tools. Passing a subset confuses llama.cpp's Hermes detector (see the client sketch after this list).
- **Scoring discrepancy.** The in-training `training/eval.py` scorer disagrees slightly with the alignment-aware rescorer. Only the rescored numbers are reported above. Resolving the discrepancy is open work.

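For the Hermes-coupling point, a well-behaved client against `llama-server`'s OpenAI-compatible endpoint looks roughly like this. `get_system_prompt` is the helper documented in `training/tools.py`; `ALL_TOOLS` is an assumed name for the five schemas there, and the port and model name are placeholders:

```python
from openai import OpenAI

from training.tools import ALL_TOOLS, get_system_prompt  # ALL_TOOLS is an assumed name

# Assumes llama-server was started with:
#   --chat-template-file training/roka_tool_template.jinja
client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

resp = client.chat.completions.create(
    model="roka-v0.2-Q4_K_M",  # placeholder model name
    messages=[
        {"role": "system", "content": get_system_prompt()},  # exact training prompt
        {"role": "user", "content": "Bugün günlerden ne?"},  # "What day is it today?"
    ],
    tools=ALL_TOOLS,  # always the full list of 5; a subset confuses the detector
)
print(resp.choices[0].message.tool_calls)
```
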
## Training data

- **4,702 train / 503 validation** examples after decontamination (4,778 / 509 before; see *Contamination verification*), Hermes-format chat turns.
- **~72% Turkish, ~13% English, ~15% short/symbolic.** The English fraction is Glaive-sourced synthetic tool-calling data retained for multi-tool pattern coverage.
- **Deterministic generators** for `calculator`, `datetime`, `hava_durumu` (in `training/generators/`); a sketch of the pattern follows this list.
- **Real DuckDuckGo search results** cached in `data/ddg_cache.json` and used to construct `web_search` fullflow examples.
- **PII scan**: only two flagged matches in user-facing content, both false positives (embedded WSJ article IDs). No email addresses, Turkish ID numbers, credit cards, or IP addresses found.

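The generator pattern, in spirit (the real generators in `training/generators/` differ in structure; the `expression` argument name is the same assumption used earlier):

```python
import json
import random

def gen_calculator(n: int, seed: int = 42):
    """Deterministic: the same seed always regenerates the same examples."""
    rng = random.Random(seed)
    for _ in range(n):
        a, b = rng.randint(2, 999), rng.randint(2, 999)
        op = rng.choice("+-*")
        expr = f"{a} {op} {b}"
        call = {"name": "calculator", "arguments": {"expression": expr}}
        yield {"messages": [
            {"role": "user", "content": f"{expr} kaç eder?"},  # "how much is ...?"
            {"role": "assistant",
             "content": f"<tool_call>\n{json.dumps(call, ensure_ascii=False)}\n</tool_call>"},
        ]}
```
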
## Contamination verification

The released v0.2 model is trained on a split where **no test query appears verbatim** in either train or validation. The decontamination script (`scripts/decontaminate.py`) normalizes whitespace and case before matching. The pre-decontamination overlap distribution (all removed in v0.2) was:

| Subcategory | Overlap (removed) |
|---|---|
| irrelevance/identity | 8 / 10 |
| irrelevance/greeting | 11 / 15 |
| simple/datetime | 8 / 15 |
| simple/web_search | 6 / 30 |
| multiple | 7 / 45 |
| adversarial/turkish_special | 1 / 10 |
| irrelevance/opinion | 1 / 10 |
| simple/weather | 1 / 20 |
| fullflow | 1 / 35 |

Because augmentation variants of each base query (masked/distractor versions) shared the same user turn, removing 44 unique queries deleted 76 train examples and 6 val examples in total. The remaining 4,702 / 503 split is what v0.2 was trained on.

This decontamination is **exact-string**, not fuzzy. Near-duplicates (paraphrases that return the same tool call) are still present. Closing the paraphrase loophole requires a more elaborate embedding-based deduplication pass, which is left for v0.3.

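The matching rule as described fits in a few lines. A sketch, assuming the JSONL examples carry a `messages` list; `scripts/decontaminate.py` is the authoritative version:

```python
import json
import re

def norm(text: str) -> str:
    """Normalize whitespace and case before exact matching."""
    return re.sub(r"\s+", " ", text).strip().lower()

def first_user_turn(example: dict) -> str:
    return next(m["content"] for m in example["messages"] if m["role"] == "user")

def filter_split(path: str, test_queries: list[str]) -> list[dict]:
    """Drop every train/val example whose first user turn is a test query."""
    banned = {norm(q) for q in test_queries}
    with open(path, encoding="utf-8") as f:
        examples = [json.loads(line) for line in f]
    return [ex for ex in examples if norm(first_user_turn(ex)) not in banned]
```
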
## Repository layout

```
src/                            Inference clients (transformers & llama-server)
training/
  tools.py                      Tool schemas + training system prompt
  train.py                      TRL SFTTrainer entry point
  eval.py                       Test-set scorer (in-training)
  roka_tool_template.jinja      llama-server chat template with Hermes detection hook
  generators/                   Deterministic data generators per tool
scripts/
  work_pipeline.py              End-to-end pod orchestration
  pod_run_and_dump.py           On-pod training → prediction dump → HF upload
  rescore_aligned.py            Alignment-aware rescorer (authoritative numbers)
  serve_ui.py                   FastAPI chat UI wrapping the agent
data/
  train.jsonl, val.jsonl, test_set.json
specs/005-post-run10-75/        Spec, plan, and task list for this iteration
```

## Reproducibility

1. Clone the repo and install requirements:
   ```bash
   pip install -r requirements.txt
   ```
2. Regenerate the training set (deterministic):
   ```bash
   python -m training.build_dataset
   ```
3. Train (RunPod-hosted, ~1 GPU-hour on an A6000):
   ```bash
   python -m scripts.work_pipeline
   ```
4. Rescore predictions with the alignment-aware harness:
   ```bash
   python -m scripts.rescore_aligned --predictions .work/artifacts/predictions/<run_id>.json
   ```

The training recipe is fully specified in `training/config.yaml`. The only hyperparameter that is unusually specific is `max_length: 4096` in `training/train.py` — removing it reproduces the Phase A v1–v4 collapse described above.

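A sketch of the load-bearing part of that recipe against TRL 0.20+. Hyperparameters are taken from the table above; the exact field layout in `training/config.yaml` may differ:

```python
from trl import SFTConfig

config = SFTConfig(
    output_dir="out/roka-v0.2",  # placeholder path
    num_train_epochs=3,
    learning_rate=5e-5,
    lr_scheduler_type="linear",
    bf16=True,
    # TRL 0.20+ silently renamed max_seq_length to max_length (default 1024).
    # At the default, ~75% of fullflow assistant turns are truncated before
    # they reach the loss; that is the Phase A v1-v4 collapse.
    max_length=4096,
)
```
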
## Intended use and out-of-scope use

**Intended**: Turkish-language tool-calling agents for well-defined tools, research on small-model function calling, educational demonstrations of the SFT pipeline.

**Out of scope**:
- Safety-critical applications. The model has not been evaluated for harmful-content refusal beyond what Kara-Kumru inherits from its base.
- Parallel / agentic planning over large tool catalogs.
- Multi-turn conversational agents that need to preserve long prior context.
- Any application that requires the model to use tools not present in the training schema.

## License

This repository and the released weights are distributed under the **Apache License 2.0**, inherited from both `AlicanKiraz0/Kara-Kumru-v1.0-2B` and its upstream base `vngrs-ai/Kumru-2B`. See `LICENSE`.

## Citation

If you use Roka in research, please cite both the base model and this work:

```bibtex
@misc{roka_2026,
  title  = {Roka: Turkish Tool-Calling Fine-Tune of Kara-Kumru 2B},
  author = {Bilgin, Ersan},
  year   = {2026},
  url    = {https://huggingface.co/ersanbil/roka}
}

@misc{karakumru_2025,
  title  = {Kara-Kumru-v1.0-2B},
  author = {Kiraz, Alican},
  year   = {2025},
  url    = {https://huggingface.co/AlicanKiraz0/Kara-Kumru-v1.0-2B}
}
```

## Acknowledgements

- **vngrs-ai** for the open Turkish base model `Kumru-2B`.
- **Alican Kiraz** for the Turkish-conversational fine-tune `Kara-Kumru-v1.0-2B`.
- **Hugging Face TRL / Unsloth** for the training stack.
- **Glaive-AI function-calling dataset** for the English portion of the multi-tool synthetic mix.

## Contact and feedback

Issues and pull requests are welcome on the GitHub mirror. This is a research preview — please file bugs for any behavior that contradicts the documented limitations above; those are the interesting cases.
chat_template.jinja ADDED
@@ -0,0 +1,5 @@
{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>

'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>

' }}{% endif %}
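A quick way to confirm this template renders the Llama-3 header format (a sketch; assumes the published repo id is `ersanbil/roka`, per the citation in the README):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("ersanbil/roka")  # assumed repo id
prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Merhaba!"}],  # "Hello!"
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)
# expected shape: <BOS><|start_header_id|>user<|end_header_id|> ... <|eot_id|>
# followed by the assistant header opened by add_generation_prompt
```
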
config.json ADDED
@@ -0,0 +1,30 @@
{
  "architectures": [
    "MistralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 2,
  "dtype": "bfloat16",
  "eos_token_id": 3,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 3072,
  "initializer_range": 0.02,
  "intermediate_size": 10752,
  "max_position_embeddings": 8192,
  "model_type": "mistral",
  "num_attention_heads": 16,
  "num_hidden_layers": 18,
  "num_key_value_heads": 4,
  "pad_token_id": 0,
  "rms_norm_eps": 1e-05,
  "rope_parameters": {
    "rope_theta": 500000,
    "rope_type": "default"
  },
  "sliding_window": null,
  "tie_word_embeddings": false,
  "transformers_version": "5.5.4",
  "use_cache": false,
  "vocab_size": 50180
}
generation_config.json ADDED
@@ -0,0 +1,9 @@
{
  "_from_model_config": true,
  "bos_token_id": 2,
  "eos_token_id": [
    3
  ],
  "pad_token_id": 0,
  "transformers_version": "5.5.4"
}
gguf/roka-v0.2-Q4_K_M.gguf ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e565e14a0cc4c5b0de9376fd7838143bdc09321e5cbf0845a6cf14c4790b7671
size 1458575232
model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8bf89c2de19eaef8351c7c9d34d6dc6d696fa2d79898d187420192d57cd7519c
size 4750344848
roka_tool_template.jinja ADDED
@@ -0,0 +1,73 @@
{#-
  Roka tool-calling chat template — Llama-3 header format + Hermes-style <tool_call> parsing.

  Compatible with llama.cpp's Hermes-2-Pro output parser (detected by the literal
  "<tool_call>" string in this template source). Uses Kara-Kumru's Llama-3 special
  tokens: <|start_header_id|>, <|end_header_id|>, <|eot_id|>.

  Design notes:
  - The model was fine-tuned with an exact Turkish system prompt from
    `training/tools.py::get_system_prompt()`. That prompt already contains the
    tool schemas and calling conventions. We therefore DO NOT synthesize a
    generic Hermes-English tool block from `tools=[...]` — doing so made the
    model go meta / mis-route, because it never saw that format in training.
  - Callers should pass the training system prompt via `messages[0]` (role=system).
    Any `tools=[...]` parameter is accepted but ignored when building the prompt;
    its only job is to let llama.cpp expose the parsed result via OpenAI's
    `tool_calls` field.
  - The literal string "<tool_call>" below (inside a comment hint) is what
    llama.cpp uses to detect Hermes-2-Pro output format: do not remove it.

  Format hint for llama.cpp detector: the assistant wraps tool calls as
  <tool_call>{"name": "...", "arguments": {...}}</tool_call>.
-#}
{%- macro llama3_header(role) -%}
<|start_header_id|>{{ role }}<|end_header_id|>

{% endmacro -%}

{{- bos_token -}}

{%- for message in messages -%}
{%- if message.role == 'system' -%}
{{- llama3_header('system') -}}
{{- message.content | trim -}}
<|eot_id|>
{%- elif message.role == 'user' -%}
{{- llama3_header('user') -}}
{{- message.content | trim -}}
<|eot_id|>
{%- elif message.role == 'assistant' -%}
{{- llama3_header('assistant') -}}
{%- if message.content -%}
{{- message.content | trim -}}
{%- endif -%}
{%- if message.tool_calls is defined and message.tool_calls -%}
{%- for tc in message.tool_calls -%}
{%- if tc.function is defined -%}
{%- set call = tc.function -%}
{%- else -%}
{%- set call = tc -%}
{%- endif %}
<tool_call>
{%- if call.arguments is string %}
{"name": "{{ call.name }}", "arguments": {{ call.arguments }}}
{%- else %}
{{ {'name': call.name, 'arguments': call.arguments} | tojson }}
{%- endif %}
</tool_call>
{%- endfor -%}
{%- endif -%}
<|eot_id|>
{%- elif message.role == 'tool' -%}
{{- llama3_header('user') -}}
<tool_response>
{{ message.content }}
</tool_response>
<|eot_id|>
{%- endif -%}
{%- endfor -%}

{%- if add_generation_prompt -%}
{{- llama3_header('assistant') -}}
{%- endif -%}
tokenizer.json ADDED
The diff for this file is too large to render.
 
tokenizer_config.json ADDED
@@ -0,0 +1,17 @@
{
  "backend": "tokenizers",
  "bos_token": "<BOS>",
  "clean_up_tokenization_spaces": true,
  "eos_token": "<EOS>",
  "extra_special_tokens": [
    "<tool_call>",
    "</tool_call>",
    "<tool_response>",
    "</tool_response>"
  ],
  "is_local": false,
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": "<PAD>",
  "tokenizer_class": "TokenizersBackend",
  "unk_token": "<UNK>"
}