athena129 committed on
Commit 04b86fd · verified · 1 Parent(s): 543338e

Initial release: Gemma4Defense-2B v3.4 (private)

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,271 @@
1
+ ---
2
+ license: gemma
3
+ license_link: https://ai.google.dev/gemma/terms
4
+ library_name: transformers
5
+ pipeline_tag: text-generation
6
+ base_model: google/gemma-4-E2B-it
7
+ tags:
8
+ - cybersecurity
9
+ - cti
10
+ - cwe-classification
11
+ - vulnerability-analysis
12
+ - security
13
+ - lora
14
+ - peft
15
+ language:
16
+ - en
17
+ metrics:
18
+ - accuracy
19
+ model-index:
20
+ - name: Gemma4Defense-2B
21
+ results:
22
+ - task:
23
+ type: text-classification
24
+ name: CWE Classification (CTI-RCM)
25
+ dataset:
26
+ name: CTI-Bench
27
+ type: cti-bench
28
+ split: cti-rcm
29
+ metrics:
30
+ - type: accuracy
31
+ value: 0.6754
32
+ name: strict_acc (5-trial mean)
33
+ verified: false
34
+ - task:
35
+ type: multiple-choice
36
+ name: Cyber Threat Intel Multiple Choice (CTI-MCQ)
37
+ dataset:
38
+ name: CTI-Bench
39
+ type: cti-bench
40
+ split: cti-mcq
41
+ metrics:
42
+ - type: accuracy
43
+ value: 0.6042
44
+ name: strict_acc (5-trial mean)
45
+ verified: false
46
+ ---
47
+
48
+ # Gemma4Defense-2B — Model Card
49
+
50
+ ## Model Information
51
+
52
+ Gemma4Defense-2B is a 2.3B-parameter language model specialized for defensive cybersecurity tasks, fine-tuned from Google's [Gemma-4-E2B-it](https://huggingface.co/google/gemma-4-E2B-it). It is purpose-built for two evaluation skills measured by [CTI-Bench](https://github.com/xashru/cti-bench): mapping CVE descriptions to their CWE category (CTI-RCM) and answering cyber threat intelligence multiple-choice questions (CTI-MCQ).
53
+
54
+ Under the evaluation protocol of [Foundation-Sec-8B (arXiv:2504.21039)](https://arxiv.org/abs/2504.21039), Gemma4Defense-2B retains **98.6% of Foundation-Sec-Instruct-8B's CTI-RCM accuracy** (0.6754 vs. 0.6850) while exceeding its CTI-MCQ accuracy by **+10.5 points** (0.6042 vs. 0.4996), at roughly 29% of the parameter count (2.3B vs. 8B).
55
+
56
+ A companion model trained with the same recipe on Qwen3-4B-Instruct-2507, [CyberSecQwen-4B](https://huggingface.co/athena129/CyberSecQwen-4B), reaches a CTI-RCM accuracy within 0.9 points of this model (0.6664 vs. 0.6754), which suggests that the result is driven by the recipe rather than by the choice of base model.
57
+
58
+ | | |
59
+ |---|---|
60
+ | Base model | google/gemma-4-E2B-it |
61
+ | Parameters | 2.3B effective |
62
+ | Architecture | Gemma-4 (text + vision + audio; fine-tuned for text-only inference) |
63
+ | Adapter | LoRA r=64, alpha=64, dropout=0.05 |
64
+ | Precision | bfloat16 |
65
+ | Languages | English |
66
+ | License | Gemma Terms of Use |
67
+
68
+ ## Intended Use
69
+
70
+ ### Intended Use Cases
71
+
72
+ Gemma4Defense-2B is intended for security practitioners, researchers, and engineers working on:
73
+
74
+ - **CWE classification** — mapping vulnerability descriptions (CVEs, advisories) to MITRE CWE categories
75
+ - **Cyber threat intelligence Q&A** — answering structured questions about cybersecurity concepts, attacks, controls
76
+ - **Defensive analysis assistants** — supporting human analysts who triage CVEs, prioritize patches, or document threat-actor behavior
77
+ - **Cybersecurity benchmarking** — as a reference for compact-model performance on CTI-Bench RCM/MCQ subsets
78
+
79
+ ### Downstream Use
80
+
81
+ The model can be used as a building block in:
82
+
83
+ - Security operations center (SOC) ticket triage tools that suggest a likely CWE for an incoming CVE
84
+ - Vulnerability management dashboards that pre-classify CVE feeds before human review
85
+ - Educational tutoring assistants for cybersecurity coursework grounded in CTI-Bench-style content
86
+ - Internal cyber knowledge bases / chat assistants for security teams
87
+
88
+ ### Out-of-Scope Use
89
+
90
+ The following uses are out-of-scope and are neither recommended nor intended use cases:
91
+
92
+ 1. **Generating harmful content** — the model must not be used to produce exploit code, weaponized proof-of-concept payloads, attacker tradecraft, or instructions that materially aid offensive operations.
93
+ 2. **Critical security decisions without human oversight** — the model should not auto-execute remediation, blocklist updates, account lockouts, or any action whose reversal carries cost; outputs are advisory and require qualified human review.
94
+ 3. **Legal or medical advice** — the model is trained on cybersecurity domain content and is not appropriate for legal, medical, or other regulated-advice contexts.
95
+ 4. **Non-security use cases** — general chat, code generation, summarization, translation, or other domains outside its specialization will produce lower-quality output than purpose-built models.
96
+ 5. **Violation of laws or regulations** — including but not limited to unauthorized vulnerability scanning, illegal data access, or misuse contrary to applicable cybersecurity statutes (CFAA, GDPR, etc.).
97
+
98
+ ## How to Get Started with the Model
99
+
100
+ ```python
101
+ from transformers import AutoModelForCausalLM, AutoTokenizer
102
+ import torch
103
+
104
+ model_id = "athena129/Gemma4Defense-2B"
105
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
106
+ model = AutoModelForCausalLM.from_pretrained(
107
+ model_id,
108
+ torch_dtype=torch.bfloat16,
109
+ device_map="auto",
110
+ )
111
+
112
+ cve = ("A deserialization vulnerability in the destruct() function of Laravel "
113
+        "v8.5.9 allows attackers to execute arbitrary commands.")
114
+
115
+ messages = [{
116
+     "role": "user",
117
+     "content": (
118
+         "Analyze the following CVE description and map it to the appropriate CWE. "
119
+         "Provide a brief justification for your choice. "
120
+         "Ensure the last line of your response contains only the CWE ID.\n\n"
121
+         f"CVE Description: {cve}"
122
+     ),
123
+ }]
124
+ prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
125
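+ # The chat template already emits the BOS token, so do not add special tokens again.
+ inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)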
126
+ output = model.generate(**inputs, max_new_tokens=256, temperature=0.3, do_sample=True)
127
+ print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
128
+ ```
129
+
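The prompt above asks the model to end its answer with a line containing only the CWE ID. A minimal sketch of pulling that ID out of the decoded text follows. It assumes IDs have the form `CWE-<digits>`; `extract_cwe_id` is a hypothetical helper for illustration and is not part of this repository.

```python
import re
from typing import Optional

# Assumed ID shape: "CWE-" followed by digits (for example CWE-502).
_CWE_RE = re.compile(r"CWE-\d+", re.IGNORECASE)


def extract_cwe_id(response: str) -> Optional[str]:
    """Return the CWE ID found on the last non-empty line, or None if absent."""
    lines = [ln.strip() for ln in response.splitlines() if ln.strip()]
    if not lines:
        return None
    match = _CWE_RE.search(lines[-1])
    return match.group(0).upper() if match else None


if __name__ == "__main__":
    sample = "Untrusted data is deserialized without validation.\nCWE-502"
    print(extract_cwe_id(sample))
```

Looking only at the last non-empty line keeps the parser aligned with the prompt's instruction; a stricter or looser matcher may be appropriate depending on how an evaluation harness scores answers.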
130
+ ## Training and Evaluation
131
+
132
+ ### Training Data
133
+
134
+ The model was trained on a combined cybersecurity corpus of approximately **12,500 supervised records**:
135
+
136
+ - **CTI-RCM 2021 (decontaminated)** — CVE → CWE classification examples drawn from MITRE/NVD public records dated 2021. Items appearing in the CTI-Bench evaluation splits were explicitly removed prior to training. (~6,776 records)
137
+ - **CVE / CTI synthetic Q&A** — defensive-analyst-style cyber question–answer pairs grounded in CVE descriptions, designed to teach domain reasoning while preserving terse-answer formats. (~5,776 records)
138
+
139
+ Decontamination matters here: an earlier internal version (v3) of this work showed roughly 72% test-set overlap when trained on undeduplicated CTI corpora, producing inflated CTI-RCM scores that did not generalize. The released v3.4 model trains exclusively on the 2021 cohort with overlap items removed.
140
+
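The decontamination step described above can be sketched as follows. The card does not say how overlap was detected, so this sketch assumes exact matching on whitespace- and case-normalized description text; the `description` field name and both helper functions are hypothetical.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences still match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def decontaminate(train_records, eval_descriptions):
    """Drop training records that also appear in the evaluation set.

    Returns the kept records and the fraction of evaluation items
    that were found in the original training data.
    """
    eval_keys = {normalize(d) for d in eval_descriptions}
    train_keys = {normalize(r["description"]) for r in train_records}
    kept = [r for r in train_records if normalize(r["description"]) not in eval_keys]
    hits = sum(1 for key in eval_keys if key in train_keys)
    overlap = hits / len(eval_keys) if eval_keys else 0.0
    return kept, overlap


if __name__ == "__main__":
    # Toy records for illustration only; not drawn from the real corpus.
    train = [
        {"description": "SQL injection in login form.", "cwe": "CWE-89"},
        {"description": "Buffer overflow in  parser.", "cwe": "CWE-120"},
        {"description": "Path traversal in upload handler.", "cwe": "CWE-22"},
    ]
    evaluation = ["buffer overflow in parser.", "XSS in comment field."]
    kept, overlap = decontaminate(train, evaluation)
    print(len(kept), overlap)
```

Exact matching is the simplest possible criterion; near-duplicate detection (for example by CVE ID or fuzzy text similarity) would catch more overlap and may be closer to what a production pipeline needs.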
141
+ ### Training Setup
142
+
143
+ | Hyperparameter | Value |
144
+ |---|---|
145
+ | Adapter | LoRA, r=64, alpha=64, dropout=0.05 |
146
+ | Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
147
+ | Learning rate | 5e-5 |
148
+ | Schedule | cosine, warmup_ratio=0.05 |
149
+ | Weight decay | 0.01 |
150
+ | Per-device batch size | 2 |
151
+ | Gradient accumulation | 8 (effective batch = 16) |
152
+ | Epochs | 10 (cumulative across v3.1 → v3.4 incremental training, with adapter resumption) |
153
+ | Max sequence length | 4096 |
154
+ | Precision | bfloat16 |
155
+ | Attention implementation | sdpa |
156
+ | Random seed | 42 |
157
+
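As a worked example of the table above, the sketch below computes the effective batch size, the optimizer steps per epoch, and a cosine learning-rate schedule with linear warmup. It assumes a single device, the approximate record counts from the Training Data section, and a decay to zero; it mirrors the usual shape of such a schedule rather than the trainer's exact code.

```python
import math


def lr_at_step(step, total_steps, peak_lr=5e-5, warmup_ratio=0.05):
    """Linear warmup to peak_lr, then cosine decay to zero (assumed shape)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# num_devices=1 is an assumption consistent with "effective batch = 16".
per_device_batch, grad_accum, num_devices = 2, 8, 1
effective_batch = per_device_batch * grad_accum * num_devices

# Approximate record counts quoted in the Training Data section.
records = 6776 + 5776
steps_per_epoch = math.ceil(records / effective_batch)

if __name__ == "__main__":
    print(effective_batch, steps_per_epoch)
    total = steps_per_epoch  # one epoch, for illustration only
    for step in (0, total // 20, total // 2, total):
        print(step, f"{lr_at_step(step, total):.2e}")
```

Because the 10 epochs were accumulated across several resumed runs, the real schedule was presumably restarted per run; the single-epoch horizon here is only for illustration.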
158
+ Note on attention: Gemma-4 uses two head dimensions depending on layer type (head_dim=256 on sliding-attention layers, 512 on global-attention layers). On AMD MI300X (gfx942), FlashAttention-2 via Composable Kernels is bounded at head_dim=256 by the hardware shared-memory budget, so the 512-wide global layers do not fit and this model was trained with PyTorch's `sdpa` implementation rather than FA2. The companion CyberSecQwen-4B model uses FA2 because Qwen3-4B's head_dim=128 fits within the limit.
159
+
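The attention choice described in the note can be expressed as a small decision rule. This is a sketch under the note's own assumption that the FlashAttention-2 path tops out at head_dim=256 on this hardware; `pick_attn_implementation` is a hypothetical helper, and the head dimensions are the ones stated in the note and in this commit's `config.json`.

```python
# Limit taken from the note above for MI300X; treated here as a given, not verified.
FA2_MAX_HEAD_DIM = 256


def pick_attn_implementation(head_dims, fa2_limit=FA2_MAX_HEAD_DIM):
    """Use FlashAttention-2 only if every layer's head_dim fits within the limit."""
    return "flash_attention_2" if max(head_dims) <= fa2_limit else "sdpa"


# head_dim / global_head_dim as listed in this commit's config.json.
gemma4_e2b = {"head_dim": 256, "global_head_dim": 512}
# head_dim for Qwen3-4B as stated in the note.
qwen3_4b = {"head_dim": 128}

if __name__ == "__main__":
    print(pick_attn_implementation(gemma4_e2b.values()))
    print(pick_attn_implementation(qwen3_4b.values()))
```

The rule is deliberately conservative: a single layer type that exceeds the limit forces the fallback for the whole model, which matches how one attention implementation is selected per model load.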
160
+ The base model was Gemma-4-E2B-it, an instruction-tuned variant. Training was performed on AMD MI300X 192GB hardware via the AMD Developer Cloud, using PyTorch + ROCm + Hugging Face transformers, peft, and trl 0.29.1 inside the official `vllm/vllm-openai-rocm` Docker image.
161
+
162
+ ### Evaluation
163
+
164
+ Evaluated under the [Foundation-Sec-8B protocol (arXiv:2504.21039 §B.3-B.4)](https://arxiv.org/abs/2504.21039): zero-shot for instruction-tuned models, 5-shot for pretrained base models, the dataset's own `Prompt` column as the user message, no system prompt, temperature 0.3, max-tokens 512, concurrency 32. Reported numbers are the mean of **5 independent trials** with random sampling seeds; standard deviations are reported alongside.
165
+
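A minimal sketch of how a per-trial accuracy and a multi-trial summary might be computed follows. It assumes that `strict_acc` means exact string match between the predicted and gold label, and that the reported spread is the sample standard deviation; the card states neither explicitly, and the toy data is illustrative only.

```python
from statistics import mean, stdev


def strict_acc(predictions, golds):
    """Fraction of items whose predicted label exactly equals the gold label."""
    if len(predictions) != len(golds):
        raise ValueError("predictions and golds must have the same length")
    correct = sum(p == g for p, g in zip(predictions, golds))
    return correct / len(golds)


def summarize(trial_accuracies):
    """Mean and sample standard deviation across independent trials."""
    return mean(trial_accuracies), stdev(trial_accuracies)


if __name__ == "__main__":
    # Toy labels and predictions, not real benchmark output.
    golds = ["CWE-79", "CWE-89", "CWE-502", "CWE-22"]
    trials = [
        ["CWE-79", "CWE-89", "CWE-502", "CWE-20"],
        ["CWE-79", "CWE-89", "CWE-94", "CWE-20"],
        ["CWE-79", "CWE-89", "CWE-502", "CWE-22"],
    ]
    accs = [strict_acc(t, golds) for t in trials]
    m, s = summarize(accs)
    print(accs, f"{m:.4f} ± {s:.4f}")
```

Sampling at temperature 0.3 makes each trial slightly different, which is why a mean and spread over several trials is more informative than a single run.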
166
+ #### Headline result
167
+
168
+ | Benchmark | Metric | Gemma4Defense-2B | Foundation-Sec-Instruct-8B | Δ |
169
+ |---|---|---:|---:|---:|
170
+ | **CTI-MCQ** (2,500 items) | strict_acc, 5-trial mean ± std | **0.6042 ± 0.0090** | 0.4996 | **+10.5 pp ✅** |
171
+ | **CTI-RCM** (1,000 items) | strict_acc, 5-trial mean ± std | **0.6754 ± 0.0035** | 0.6850 | -1.0 pp (within ~3σ of measurement noise) |
172
+
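The Δ column and the retention figure quoted earlier follow directly from the table. The sketch below reproduces that arithmetic from the published means and standard deviations; the ratio of the gap to our trial spread is a rough indicator only, not a significance test, since no spread is available for the reference model.

```python
# Means and 5-trial standard deviations from the headline table.
ours = {"CTI-MCQ": (0.6042, 0.0090), "CTI-RCM": (0.6754, 0.0035)}
# Foundation-Sec-Instruct-8B values from the same table.
reference = {"CTI-MCQ": 0.4996, "CTI-RCM": 0.6850}


def compare(name):
    """Return (delta in percentage points, retained fraction, |delta| / our std)."""
    acc, std = ours[name]
    ref = reference[name]
    delta_pp = (acc - ref) * 100
    retained = acc / ref
    gap_in_std = abs(acc - ref) / std
    return delta_pp, retained, gap_in_std


if __name__ == "__main__":
    for name in ours:
        delta_pp, retained, gap_in_std = compare(name)
        print(f"{name}: delta={delta_pp:+.1f} pp, "
              f"retained={retained:.1%}, |delta|/std={gap_in_std:.1f}")
```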
173
+ #### Pre / post fine-tune comparison
174
+
175
+ The improvement attributable to this fine-tune over its starting checkpoint:
176
+
177
+ | Stage | CTI-RCM | CTI-MCQ |
178
+ |---|---:|---:|
179
+ | Gemma-4-E2B-it (raw, instruction-tuned base) | 0.580 | 0.578 |
180
+ | **Gemma4Defense-2B (this fine-tune)** | **0.6754** | **0.6042** |
181
+ | **Lift** | **+9.5 pp** | **+2.6 pp** |
182
+
183
+ The CTI-MCQ lift is intentionally small in absolute terms: Gemma-4-E2B-it already has strong multiple-choice format priors, and the fine-tune is designed to preserve that ability while specializing on CTI-RCM rather than displacing it. The much smaller displacement effect of the `instruction-tuned-then-domain-SFT` setup is documented in the project's accompanying lessons.
184
+
185
+ #### Comparison to other cybersecurity-relevant models we evaluated
186
+
187
+ All numbers below were measured by us under the protocol above (with the noted shot count), not quoted from third-party papers. The CyberPal-2.0-20B numbers come from a single-trial run under our protocol. Its own paper reports 0.874 / 0.757 using a different prompt template (Figure 11 of arXiv:2510.14113); our CTI-MCQ result lands within about 2 pp of the paper's figure, which we take as validation of our harness, while the larger CTI-RCM gap likely reflects the template difference.
188
+
189
+ | Model | Size | CTI-RCM | CTI-MCQ | Notes |
190
+ |---|---:|---:|---:|---|
191
+ | Foundation-Sec-8B (base) | 8B | 0.745 | 0.655 | 5-shot pretrained reference |
192
+ | Foundation-Sec-Instruct-8B | 8B | **0.685** | **0.500** | 0-shot, our target |
193
+ | CyberPal-2.0-20B (cyber-pal-security/CyberOss-2.0-20B) | 20B | 0.728* | 0.738* | independently verified at our protocol |
194
+ | **Gemma4Defense-2B** (this model) | 2.3B | **0.6754 ± 0.0035** | **0.6042 ± 0.0090** | 5-trial mean ± std |
195
+ | [CyberSecQwen-4B](https://huggingface.co/athena129/CyberSecQwen-4B) (companion) | 4B | 0.6664 ± 0.0023 | 0.5868 ± 0.0029 | same recipe, different substrate |
196
+ | Gemma-4-E4B-it (raw) | 5.1B effective | 0.618 | 0.666 | 0-shot |
197
+ | Gemma-4-E2B-it (raw) | 2.3B | 0.580 | 0.578 | 0-shot, our base |
198
+ | Gemma-4-E4B-base (raw) | 5.1B effective | 0.588 | 0.666 | 5-shot |
199
+ | Gemma-4-E2B-base (raw) | 2.3B | 0.490 | 0.570 | 5-shot |
200
+
201
+ \* Single-trial values from our independent reproduction.
202
+
203
+ #### Key highlights
204
+
205
+ - Beats Foundation-Sec-Instruct-8B on CTI-MCQ by +10.5 points at roughly 29% of the parameter count (2.3B vs. 8B).
206
+ - Stays within ~1 point of Foundation-Sec-Instruct-8B on CTI-RCM under the same evaluation protocol.
207
+ - Cross-substrate companion ([CyberSecQwen-4B](https://huggingface.co/athena129/CyberSecQwen-4B)) reproduces the CTI-RCM result within 0.9 points using the same recipe on a different model family.
208
+ - Independent reproduction of CyberPal-2.0-20B at the Foundation-Sec protocol confirms its CTI-MCQ accuracy within 2 points of its paper claim.
209
+
210
+ ## Limitations
211
+
212
+ 1. **Domain-specific knowledge limitations.** The model is trained on cybersecurity domain text and is not a general assistant. Tasks outside this domain will produce lower-quality output than purpose-built general models.
213
+
214
+ 2. **Time-anchored training data.** The CTI-RCM training cohort is drawn from 2021 records. Vulnerability classes that emerged or rose in prevalence after 2021 (e.g., AI/ML-specific weaknesses, recent supply-chain CWEs) are under-represented in training and will be classified less accurately.
215
+
216
+ 3. **English-only.** All training and evaluation data are in English; multilingual cyber tasks will degrade.
217
+
218
+ 4. **CTI-RCM gap.** Foundation-Sec-Instruct-8B remains slightly stronger on CTI-RCM under this protocol (a 1.0-point gap, about 2.7 standard deviations of our 5-trial spread, so small but probably not just noise). Production deployments where CWE classification is the primary metric should benchmark both models on their specific input distribution.
219
+
220
+ 5. **No safety RLHF.** The model is supervised-fine-tuned only; the training data emphasizes defensive-analyst framing but no formal reinforcement-learning safety alignment was applied.
221
+
222
+ 6. **Multimodal architecture inherited.** Gemma-4 ships as a multimodal base with vision and audio towers. This release contains only the text-language-model weights extracted post-merge; downstream tooling that expects the multimodal config should consume the published `Gemma4ForCausalLM` config (already declared in the repo).
223
+
224
+ ### Recommendations
225
+
226
+ 1. **Always have qualified security professionals review model outputs before implementation** for any operational use case (patch prioritization, ticket routing, blocklisting).
227
+ 2. **Use this model as an assistive tool rather than a replacement for expert human judgment**, especially for novel vulnerability classes outside the 2021 training cohort.
228
+ 3. **Validate on your own input distribution** before deployment. Public CTI-Bench performance does not perfectly transfer to internal advisory feeds, vendor-proprietary CWE taxonomies, or non-English content.
229
+ 4. **Monitor for drift.** As new CVE / CWE patterns emerge, periodically re-evaluate; consider supplementing with retrieval over a current vulnerability knowledge base for time-sensitive queries.
230
+ 5. **Apply standard prompt-injection mitigations** when wrapping the model in agentic workflows that accept external content (advisory feeds, scraped pages); domain-SFT does not confer prompt-injection resistance.
231
+
232
+ ## Companion Model
233
+
234
+ [CyberSecQwen-4B](https://huggingface.co/athena129/CyberSecQwen-4B) is a sister release fine-tuned with the same training corpus and hyperparameters, on the Qwen3-4B-Instruct-2507 base. The two models converge to within 0.9 points on CTI-RCM (0.6754 Gemma vs 0.6664 Qwen, 5-trial mean) — the same recipe produces equivalent task performance across two distinct model families. The Qwen variant is licensed Apache 2.0 and is available for use cases where the Gemma terms are not a fit.
235
+
236
+ ## Citation
237
+
238
+ If you use this model, please cite:
239
+
240
+ ```bibtex
241
+ @misc{gemma4defense2026,
242
+ title = {Gemma4Defense-2B: A Compact CTI Specialist Fine-Tuned from Gemma-4-E2B-it},
243
+ author = {Mulia, Samuel},
244
+ year = {2026},
245
+ publisher = {Hugging Face},
246
+ url = {https://huggingface.co/athena129/Gemma4Defense-2B}
247
+ }
248
+ ```
249
+
250
+ The evaluation protocol is from:
251
+
252
+ ```bibtex
253
+ @article{foundation-sec-8b,
254
+ title = {Foundation-Sec-8B: A Cybersecurity-Specialized Language Model},
255
+ author = {Cisco Foundation AI},
256
+ journal = {arXiv preprint arXiv:2504.21039},
257
+ year = {2025},
258
+ url = {https://arxiv.org/abs/2504.21039}
259
+ }
260
+ ```
261
+
262
+ The benchmark is from:
263
+
264
+ ```bibtex
265
+ @misc{cti-bench,
266
+ title = {CTI-Bench: A Benchmark Suite for Cybersecurity LLMs},
267
+ author = {Alam, Md Tanvirul and Bhusal, Dipkamal and Park, Youngja and Rastogi, Nidhi},
268
+ year = {2024},
269
+ url = {https://github.com/xashru/cti-bench}
270
+ }
271
+ ```
chat_template.jinja ADDED
@@ -0,0 +1,351 @@
1
+ {%- macro format_parameters(properties, required, filter_keys=false) -%}
2
+ {%- set standard_keys = ['description', 'type', 'properties', 'required', 'nullable'] -%}
3
+ {%- set ns = namespace(found_first=false) -%}
4
+ {%- for key, value in properties | dictsort -%}
5
+ {%- set add_comma = false -%}
6
+ {%- if not filter_keys or key not in standard_keys -%}
7
+ {%- if ns.found_first %},{% endif -%}
8
+ {%- set ns.found_first = true -%}
9
+ {{ key }}:{
10
+ {%- if value['description'] -%}
11
+ description:<|"|>{{ value['description'] }}<|"|>
12
+ {%- set add_comma = true -%}
13
+ {%- endif -%}
14
+ {%- if value['type'] | upper == 'STRING' -%}
15
+ {%- if value['enum'] -%}
16
+ {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
17
+ enum:{{ format_argument(value['enum']) }}
18
+ {%- endif -%}
19
+ {%- elif value['type'] | upper == 'ARRAY' -%}
20
+ {%- if value['items'] is mapping and value['items'] -%}
21
+ {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
22
+ items:{
23
+ {%- set ns_items = namespace(found_first=false) -%}
24
+ {%- for item_key, item_value in value['items'] | dictsort -%}
25
+ {%- if item_value is not none -%}
26
+ {%- if ns_items.found_first %},{% endif -%}
27
+ {%- set ns_items.found_first = true -%}
28
+ {%- if item_key == 'properties' -%}
29
+ properties:{
30
+ {%- if item_value is mapping -%}
31
+ {{- format_parameters(item_value, value['items']['required'] | default([])) -}}
32
+ {%- endif -%}
33
+ }
34
+ {%- elif item_key == 'required' -%}
35
+ required:[
36
+ {%- for req_item in item_value -%}
37
+ <|"|>{{- req_item -}}<|"|>
38
+ {%- if not loop.last %},{% endif -%}
39
+ {%- endfor -%}
40
+ ]
41
+ {%- elif item_key == 'type' -%}
42
+ {%- if item_value is string -%}
43
+ type:{{ format_argument(item_value | upper) }}
44
+ {%- else -%}
45
+ type:{{ format_argument(item_value | map('upper') | list) }}
46
+ {%- endif -%}
47
+ {%- else -%}
48
+ {{ item_key }}:{{ format_argument(item_value) }}
49
+ {%- endif -%}
50
+ {%- endif -%}
51
+ {%- endfor -%}
52
+ }
53
+ {%- endif -%}
54
+ {%- endif -%}
55
+ {%- if value['nullable'] %}
56
+ {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
57
+ nullable:true
58
+ {%- endif -%}
59
+ {%- if value['type'] | upper == 'OBJECT' -%}
60
+ {%- if value['properties'] is defined and value['properties'] is mapping -%}
61
+ {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
62
+ properties:{
63
+ {{- format_parameters(value['properties'], value['required'] | default([])) -}}
64
+ }
65
+ {%- elif value is mapping -%}
66
+ {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
67
+ properties:{
68
+ {{- format_parameters(value, value['required'] | default([]), filter_keys=true) -}}
69
+ }
70
+ {%- endif -%}
71
+ {%- if value['required'] -%}
72
+ {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
73
+ required:[
74
+ {%- for item in value['required'] | default([]) -%}
75
+ <|"|>{{- item -}}<|"|>
76
+ {%- if not loop.last %},{% endif -%}
77
+ {%- endfor -%}
78
+ ]
79
+ {%- endif -%}
80
+ {%- endif -%}
81
+ {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
82
+ type:<|"|>{{ value['type'] | upper }}<|"|>}
83
+ {%- endif -%}
84
+ {%- endfor -%}
85
+ {%- endmacro -%}
86
+ {%- macro format_function_declaration(tool_data) -%}
87
+ declaration:{{- tool_data['function']['name'] -}}{description:<|"|>{{- tool_data['function']['description'] -}}<|"|>
88
+ {%- set params = tool_data['function']['parameters'] -%}
89
+ {%- if params -%}
90
+ ,parameters:{
91
+ {%- if params['properties'] -%}
92
+ properties:{ {{- format_parameters(params['properties'], params['required']) -}} },
93
+ {%- endif -%}
94
+ {%- if params['required'] -%}
95
+ required:[
96
+ {%- for item in params['required'] -%}
97
+ <|"|>{{- item -}}<|"|>
98
+ {{- ',' if not loop.last -}}
99
+ {%- endfor -%}
100
+ ],
101
+ {%- endif -%}
102
+ {%- if params['type'] -%}
103
+ type:<|"|>{{- params['type'] | upper -}}<|"|>}
104
+ {%- endif -%}
105
+ {%- endif -%}
106
+ {%- if 'response' in tool_data['function'] -%}
107
+ {%- set response_declaration = tool_data['function']['response'] -%}
108
+ ,response:{
109
+ {%- if response_declaration['description'] -%}
110
+ description:<|"|>{{- response_declaration['description'] -}}<|"|>,
111
+ {%- endif -%}
112
+ {%- if response_declaration['type'] | upper == 'OBJECT' -%}
113
+ type:<|"|>{{- response_declaration['type'] | upper -}}<|"|>}
114
+ {%- endif -%}
115
+ {%- endif -%}
116
+ }
117
+ {%- endmacro -%}
118
+ {%- macro format_argument(argument, escape_keys=True) -%}
119
+ {%- if argument is string -%}
120
+ {{- '<|"|>' + argument + '<|"|>' -}}
121
+ {%- elif argument is boolean -%}
122
+ {{- 'true' if argument else 'false' -}}
123
+ {%- elif argument is mapping -%}
124
+ {{- '{' -}}
125
+ {%- set ns = namespace(found_first=false) -%}
126
+ {%- for key, value in argument | dictsort -%}
127
+ {%- if ns.found_first %},{% endif -%}
128
+ {%- set ns.found_first = true -%}
129
+ {%- if escape_keys -%}
130
+ {{- '<|"|>' + key + '<|"|>' -}}
131
+ {%- else -%}
132
+ {{- key -}}
133
+ {%- endif -%}
134
+ :{{- format_argument(value, escape_keys=escape_keys) -}}
135
+ {%- endfor -%}
136
+ {{- '}' -}}
137
+ {%- elif argument is sequence -%}
138
+ {{- '[' -}}
139
+ {%- for item in argument -%}
140
+ {{- format_argument(item, escape_keys=escape_keys) -}}
141
+ {%- if not loop.last %},{% endif -%}
142
+ {%- endfor -%}
143
+ {{- ']' -}}
144
+ {%- else -%}
145
+ {{- argument -}}
146
+ {%- endif -%}
147
+ {%- endmacro -%}
148
+ {%- macro strip_thinking(text) -%}
149
+ {%- set ns = namespace(result='') -%}
150
+ {%- for part in text.split('<channel|>') -%}
151
+ {%- if '<|channel>' in part -%}
152
+ {%- set ns.result = ns.result + part.split('<|channel>')[0] -%}
153
+ {%- else -%}
154
+ {%- set ns.result = ns.result + part -%}
155
+ {%- endif -%}
156
+ {%- endfor -%}
157
+ {{- ns.result | trim -}}
158
+ {%- endmacro -%}
159
+
160
+ {%- macro format_tool_response_block(tool_name, response) -%}
161
+ {{- '<|tool_response>' -}}
162
+ {%- if response is mapping -%}
163
+ {{- 'response:' + tool_name + '{' -}}
164
+ {%- for key, value in response | dictsort -%}
165
+ {{- key -}}:{{- format_argument(value, escape_keys=False) -}}
166
+ {%- if not loop.last %},{% endif -%}
167
+ {%- endfor -%}
168
+ {{- '}' -}}
169
+ {%- else -%}
170
+ {{- 'response:' + tool_name + '{value:' + format_argument(response, escape_keys=False) + '}' -}}
171
+ {%- endif -%}
172
+ {{- '<tool_response|>' -}}
173
+ {%- endmacro -%}
174
+
175
+ {%- set ns = namespace(prev_message_type=None) -%}
176
+ {%- set loop_messages = messages -%}
177
+ {{- bos_token -}}
178
+ {#- Handle System/Tool Definitions Block -#}
179
+ {%- if (enable_thinking is defined and enable_thinking) or tools or messages[0]['role'] in ['system', 'developer'] -%}
180
+ {{- '<|turn>system\n' -}}
181
+ {#- Inject Thinking token at the very top of the FIRST system turn -#}
182
+ {%- if enable_thinking is defined and enable_thinking -%}
183
+ {{- '<|think|>\n' -}}
184
+ {%- set ns.prev_message_type = 'think' -%}
185
+ {%- endif -%}
186
+ {%- if messages[0]['role'] in ['system', 'developer'] -%}
187
+ {%- if messages[0]['content'] is string -%}
188
+ {{- messages[0]['content'] | trim -}}
189
+ {%- elif messages[0]['content'] is sequence -%}
190
+ {%- for item in messages[0]['content'] -%}
191
+ {{- item['text'] | trim + ' '-}}
192
+ {%- endfor -%}
193
+ {%- endif -%}
194
+ {%- set loop_messages = messages[1:] -%}
195
+ {%- endif -%}
196
+ {%- if tools -%}
197
+ {%- for tool in tools %}
198
+ {{- '<|tool>' -}}
199
+ {{- format_function_declaration(tool) | trim -}}
200
+ {{- '<tool|>' -}}
201
+ {%- endfor %}
202
+ {%- set ns.prev_message_type = 'tool' -%}
203
+ {%- endif -%}
204
+ {{- '<turn|>\n' -}}
205
+ {%- endif %}
206
+
207
+ {#- Pre-scan: find last user message index for reasoning guard -#}
208
+ {%- set ns_turn = namespace(last_user_idx=-1) -%}
209
+ {%- for i in range(loop_messages | length) -%}
210
+ {%- if loop_messages[i]['role'] == 'user' -%}
211
+ {%- set ns_turn.last_user_idx = i -%}
212
+ {%- endif -%}
213
+ {%- endfor -%}
214
+
215
+ {#- Loop through messages -#}
216
+ {%- for message in loop_messages -%}
217
+ {%- if message['role'] != 'tool' -%}
218
+ {%- set ns.prev_message_type = None -%}
219
+ {%- set role = 'model' if message['role'] == 'assistant' else message['role'] -%}
220
+ {#- Detect continuation: suppress duplicate <|turn>model when previous non-tool message was also assistant -#}
221
+ {%- set prev_nt = namespace(role=None, found=false) -%}
222
+ {%- if loop.index0 > 0 -%}
223
+ {%- for j in range(loop.index0 - 1, -1, -1) -%}
224
+ {%- if not prev_nt.found -%}
225
+ {%- if loop_messages[j]['role'] != 'tool' -%}
226
+ {%- set prev_nt.role = loop_messages[j]['role'] -%}
227
+ {%- set prev_nt.found = true -%}
228
+ {%- endif -%}
229
+ {%- endif -%}
230
+ {%- endfor -%}
231
+ {%- endif -%}
232
+ {%- set continue_same_model_turn = (role == 'model' and prev_nt.role == 'assistant') -%}
233
+ {%- if not continue_same_model_turn -%}
234
+ {{- '<|turn>' + role + '\n' }}
235
+ {%- endif -%}
236
+
237
+ {#- Render reasoning/reasoning_content as thinking channel -#}
238
+ {%- set thinking_text = message.get('reasoning') or message.get('reasoning_content') -%}
239
+ {%- if thinking_text and loop.index0 > ns_turn.last_user_idx and message.get('tool_calls') -%}
240
+ {{- '<|channel>thought\n' + thinking_text + '\n<channel|>' -}}
241
+ {%- endif -%}
242
+
243
+ {%- if message['tool_calls'] -%}
244
+ {%- for tool_call in message['tool_calls'] -%}
245
+ {%- set function = tool_call['function'] -%}
246
+ {{- '<|tool_call>call:' + function['name'] + '{' -}}
247
+ {%- if function['arguments'] is mapping -%}
248
+ {%- set ns_args = namespace(found_first=false) -%}
249
+ {%- for key, value in function['arguments'] | dictsort -%}
250
+ {%- if ns_args.found_first %},{% endif -%}
251
+ {%- set ns_args.found_first = true -%}
252
+ {{- key -}}:{{- format_argument(value, escape_keys=False) -}}
253
+ {%- endfor -%}
254
+ {%- elif function['arguments'] is string -%}
255
+ {{- function['arguments'] -}}
256
+ {%- endif -%}
257
+ {{- '}<tool_call|>' -}}
258
+ {%- endfor -%}
259
+ {%- set ns.prev_message_type = 'tool_call' -%}
260
+ {%- endif -%}
261
+
262
+ {%- set ns_tr_out = namespace(flag=false) -%}
263
+ {%- if message.get('tool_responses') -%}
264
+ {#- Legacy: tool_responses embedded on the assistant message (Google/Gemma native) -#}
265
+ {%- for tool_response in message['tool_responses'] -%}
266
+ {{- format_tool_response_block(tool_response['name'] | default('unknown'), tool_response['response']) -}}
267
+ {%- set ns_tr_out.flag = true -%}
268
+ {%- set ns.prev_message_type = 'tool_response' -%}
269
+ {%- endfor -%}
270
+ {%- elif message.get('tool_calls') -%}
271
+ {#- OpenAI Chat Completions: forward-scan consecutive role:tool messages -#}
272
+ {%- set ns_tool_scan = namespace(stopped=false) -%}
273
+ {%- for k in range(loop.index0 + 1, loop_messages | length) -%}
274
+ {%- if ns_tool_scan.stopped -%}
275
+ {%- elif loop_messages[k]['role'] != 'tool' -%}
276
+ {%- set ns_tool_scan.stopped = true -%}
277
+ {%- else -%}
278
+ {%- set follow = loop_messages[k] -%}
279
+ {#- Resolve tool_call_id to function name -#}
280
+ {%- set ns_tname = namespace(name=follow.get('name') | default('unknown')) -%}
281
+ {%- for tc in message['tool_calls'] -%}
282
+ {%- if tc.get('id') == follow.get('tool_call_id') -%}
283
+ {%- set ns_tname.name = tc['function']['name'] -%}
284
+ {%- endif -%}
285
+ {%- endfor -%}
286
+ {#- Handle content as string or content-parts array -#}
287
+ {%- set tool_body = follow.get('content') -%}
+                     {%- if tool_body is string -%}
+                         {{- format_tool_response_block(ns_tname.name, tool_body) -}}
+                     {%- elif tool_body is sequence and tool_body is not string -%}
+                         {%- set ns_txt = namespace(s='') -%}
+                         {%- for part in tool_body -%}
+                             {%- if part.get('type') == 'text' -%}
+                                 {%- set ns_txt.s = ns_txt.s + (part.get('text') | default('')) -%}
+                             {%- endif -%}
+                         {%- endfor -%}
+                         {{- format_tool_response_block(ns_tname.name, ns_txt.s) -}}
+                     {%- else -%}
+                         {{- format_tool_response_block(ns_tname.name, tool_body) -}}
+                     {%- endif -%}
+                     {%- set ns_tr_out.flag = true -%}
+                     {%- set ns.prev_message_type = 'tool_response' -%}
+                 {%- endif -%}
+             {%- endfor -%}
+         {%- endif -%}
+
+         {%- set captured_content -%}
+             {%- if message['content'] is string -%}
+                 {%- if role == 'model' -%}
+                     {{- strip_thinking(message['content']) -}}
+                 {%- else -%}
+                     {{- message['content'] | trim -}}
+                 {%- endif -%}
+             {%- elif message['content'] is sequence -%}
+                 {%- for item in message['content'] -%}
+                     {%- if item['type'] == 'text' -%}
+                         {%- if role == 'model' -%}
+                             {{- strip_thinking(item['text']) -}}
+                         {%- else -%}
+                             {{- item['text'] | trim -}}
+                         {%- endif -%}
+                     {%- elif item['type'] == 'image' -%}
+                         {{- '<|image|>' -}}
+                         {%- set ns.prev_message_type = 'image' -%}
+                     {%- elif item['type'] == 'audio' -%}
+                         {{- '<|audio|>' -}}
+                         {%- set ns.prev_message_type = 'audio' -%}
+                     {%- elif item['type'] == 'video' -%}
+                         {{- '<|video|>' -}}
+                         {%- set ns.prev_message_type = 'video' -%}
+                     {%- endif -%}
+                 {%- endfor -%}
+             {%- endif -%}
+         {%- endset -%}
+
+         {{- captured_content -}}
+         {%- set has_content = captured_content | trim | length > 0 -%}
+
+         {%- if ns.prev_message_type == 'tool_call' and not ns_tr_out.flag -%}
+             {{- '<|tool_response>' -}}
+         {%- elif not (ns_tr_out.flag and not has_content) -%}
+             {{- '<turn|>\n' -}}
+         {%- endif -%}
+     {%- endif -%}
+ {%- endfor -%}
+
+ {%- if add_generation_prompt -%}
+     {%- if ns.prev_message_type != 'tool_response' and ns.prev_message_type != 'tool_call' -%}
+         {{- '<|turn>model\n' -}}
+     {%- endif -%}
+ {%- endif -%}
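
The tool-response branch at the top of this hunk accepts a tool body that is a string, a list of typed parts, or some other value. The following is a minimal Python sketch of that normalization, assuming the parts are plain dicts. `flatten_tool_body` is a hypothetical name used only for illustration; it is not part of the template or of transformers.

```python
def flatten_tool_body(tool_body):
    """Approximate how the template reduces a tool message body to one value."""
    # Plain strings are passed through untouched.
    if isinstance(tool_body, str):
        return tool_body
    # Lists of parts: keep only the 'text' parts and concatenate them in order.
    # Note: Jinja's `sequence` test is broader than list/tuple; this sketch
    # deliberately narrows it, which is an assumption of the example.
    if isinstance(tool_body, (list, tuple)):
        return "".join(
            (part.get("text") or "")
            for part in tool_body
            if isinstance(part, dict) and part.get("type") == "text"
        )
    # Anything else is handed on unchanged, like the template's final else branch.
    return tool_body


parts = [
    {"type": "text", "text": "CWE-79: "},
    {"type": "image"},
    {"type": "text", "text": "cross-site scripting"},
]
print(flatten_tool_body(parts))
```

Non-text parts are silently dropped from tool responses in this sketch, matching the template's loop, which only appends parts whose `type` is `text`.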
config.json ADDED
@@ -0,0 +1,88 @@
+ {
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "attention_k_eq_v": false,
+   "bos_token_id": 2,
+   "dtype": "bfloat16",
+   "enable_moe_block": false,
+   "eos_token_id": 1,
+   "expert_intermediate_size": null,
+   "final_logit_softcapping": 30.0,
+   "global_head_dim": 512,
+   "head_dim": 256,
+   "hidden_activation": "gelu_pytorch_tanh",
+   "hidden_size": 1536,
+   "hidden_size_per_layer_input": 256,
+   "initializer_range": 0.02,
+   "intermediate_size": 6144,
+   "layer_types": [
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention"
+   ],
+   "max_position_embeddings": 131072,
+   "model_type": "gemma4_text",
+   "num_attention_heads": 8,
+   "num_experts": null,
+   "num_global_key_value_heads": null,
+   "num_hidden_layers": 35,
+   "num_key_value_heads": 1,
+   "num_kv_shared_layers": 20,
+   "pad_token_id": 0,
+   "rms_norm_eps": 1e-06,
+   "rope_parameters": {
+     "full_attention": {
+       "partial_rotary_factor": 0.25,
+       "rope_theta": 1000000.0,
+       "rope_type": "proportional"
+     },
+     "sliding_attention": {
+       "rope_theta": 10000.0,
+       "rope_type": "default"
+     }
+   },
+   "sliding_window": 512,
+   "tie_word_embeddings": true,
+   "top_k_experts": null,
+   "use_bidirectional_attention": null,
+   "use_cache": true,
+   "use_double_wide_mlp": true,
+   "vocab_size": 262144,
+   "vocab_size_per_layer_input": 262144,
+   "architectures": [
+     "Gemma4ForCausalLM"
+   ],
+   "transformers_version": "5.5.0.dev0"
+ }
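
The `layer_types` array in this config follows a regular pattern: four `sliding_attention` layers followed by one `full_attention` layer, repeated across all 35 layers. The Python sketch below regenerates that list so the pattern can be checked at a glance. `build_layer_types` and `period` are illustrative names chosen for this example; they are not config keys or transformers APIs.

```python
from collections import Counter


def build_layer_types(num_hidden_layers=35, period=5):
    """Return the attention type for each layer: every `period`-th layer is full."""
    return [
        "full_attention" if (index + 1) % period == 0 else "sliding_attention"
        for index in range(num_hidden_layers)
    ]


layer_types = build_layer_types()
# Count how many layers of each kind the pattern produces.
print(Counter(layer_types))
```

With the values from the config (`num_hidden_layers` of 35), the pattern yields 28 sliding-window layers and 7 full-attention layers, and the final layer is a full-attention one.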
generation_config.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "bos_token_id": 2,
+   "do_sample": true,
+   "eos_token_id": [
+     1,
+     106,
+     50
+   ],
+   "pad_token_id": 0,
+   "temperature": 1.0,
+   "top_k": 64,
+   "top_p": 0.95,
+   "transformers_version": "5.5.0.dev0"
+ }
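
The generation defaults above enable sampling with `temperature`, `top_k`, and `top_p`. As a rough illustration of what those three knobs do to a next-token distribution, here is a simplified pure-Python sketch. It is not the transformers implementation, and `filter_top_k_top_p` is a hypothetical helper written for this example only.

```python
import math


def filter_top_k_top_p(logits, temperature=1.0, top_k=64, top_p=0.95):
    """Return {token_index: probability} for the tokens that survive filtering."""
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    weights = [math.exp(value - peak) for value in scaled]

    # Top-k: keep only the k most likely tokens, then renormalize among them.
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    ranked = ranked[:top_k]
    top_k_mass = sum(weights[i] for i in ranked)

    # Top-p (nucleus): keep the smallest prefix whose cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += weights[i] / top_k_mass
        if cumulative >= top_p:
            break

    # Renormalize the survivors so their probabilities sum to 1.
    kept_mass = sum(weights[i] for i in kept)
    return {i: weights[i] / kept_mass for i in kept}


print(filter_top_k_top_p([2.0, 1.0, 0.0, -1.0], top_k=3, top_p=0.8))
```

A token is then drawn from the returned distribution. Generation stops when the sampled token is any of the ids listed under `eos_token_id`.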
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7d0cb2919216614ad21ac1912119ca65ad94b6bacfe6d8de0a03bffade1f3574
+ size 9302047606
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cc8d3a0ce36466ccc1278bf987df5f71db1719b9ca6b4118264f45cb627bfe0f
+ size 32169626
tokenizer_config.json ADDED
@@ -0,0 +1,74 @@
+ {
+   "audio_token": "<|audio|>",
+   "backend": "tokenizers",
+   "boa_token": "<|audio>",
+   "boi_token": "<|image>",
+   "bos_token": "<bos>",
+   "eoa_token": "<audio|>",
+   "eoc_token": "<channel|>",
+   "eoi_token": "<image|>",
+   "eos_token": "<eos>",
+   "eot_token": "<turn|>",
+   "escape_token": "<|\"|>",
+   "etc_token": "<tool_call|>",
+   "etd_token": "<tool|>",
+   "etr_token": "<tool_response|>",
+   "extra_special_tokens": [
+     "<|video|>"
+   ],
+   "image_token": "<|image|>",
+   "mask_token": "<mask>",
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": "<pad>",
+   "padding_side": "left",
+   "processor_class": "Gemma4Processor",
+   "response_schema": {
+     "type": "object",
+     "properties": {
+       "role": {
+         "const": "assistant"
+       },
+       "thinking": {
+         "type": "string"
+       },
+       "content": {
+         "type": "string"
+       },
+       "tool_calls": {
+         "x-regex-iterator": "<\\|tool_call>(.*?)<tool_call\\|>",
+         "type": "array",
+         "items": {
+           "type": "object",
+           "properties": {
+             "type": {
+               "const": "function"
+             },
+             "function": {
+               "type": "object",
+               "x-regex": "call\\:(?P<name>\\w+)(?P<arguments>\\{.*\\})",
+               "properties": {
+                 "name": {
+                   "type": "string"
+                 },
+                 "arguments": {
+                   "type": "object",
+                   "x-parser": "gemma4-tool-call",
+                   "additionalProperties": {}
+                 }
+               }
+             }
+           }
+         }
+       }
+     },
+     "x-regex": "(\\<\\|channel\\>thought\\n(?P<thinking>.*?)\\<channel\\|\\>)?(?P<tool_calls>\\<\\|tool_call\\>.*\\<tool_call\\|\\>)?(?P<content>(?:(?!\\<turn\\|\\>)(?!\\<\\|tool_response\\>).)+)?(?:\\<turn\\|\\>|\\<\\|tool_response\\>)?"
+   },
+   "soc_token": "<|channel>",
+   "sot_token": "<|turn>",
+   "stc_token": "<|tool_call>",
+   "std_token": "<|tool>",
+   "str_token": "<|tool_response>",
+   "think_token": "<|think|>",
+   "tokenizer_class": "GemmaTokenizer",
+   "unk_token": "<unk>"
+ }
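
The `response_schema` above describes, through its `x-regex` entries, how a raw completion splits into thinking text, tool calls, and visible content. The Python sketch below applies equivalent patterns with the standard `re` module. Several things here are assumptions for illustration: redundant backslashes before `<`, `>`, and `:` are dropped, `re.DOTALL` is assumed so that `.` spans newlines, `parse_response` is a hypothetical helper, and the sample argument text `{id:79}` is invented. The `gemma4-tool-call` argument parser is not reproduced, so arguments are left as raw strings.

```python
import re

# Top-level split: optional thinking channel, optional tool calls, optional content.
RESPONSE_RE = re.compile(
    r"(<\|channel>thought\n(?P<thinking>.*?)<channel\|>)?"
    r"(?P<tool_calls><\|tool_call>.*<tool_call\|>)?"
    r"(?P<content>(?:(?!<turn\|>)(?!<\|tool_response>).)+)?"
    r"(?:<turn\|>|<\|tool_response>)?",
    re.DOTALL,
)
# Iterates over individual tool-call blocks (the schema's x-regex-iterator).
TOOL_CALL_RE = re.compile(r"<\|tool_call>(.*?)<tool_call\|>", re.DOTALL)
# Splits one tool call into a function name and its raw argument text.
FUNCTION_RE = re.compile(r"call:(?P<name>\w+)(?P<arguments>\{.*\})", re.DOTALL)


def parse_response(text):
    """Split a raw completion into thinking, content, and tool calls."""
    match = RESPONSE_RE.match(text)  # every group is optional, so this always matches
    calls = []
    for body in TOOL_CALL_RE.findall(match.group("tool_calls") or ""):
        function = FUNCTION_RE.match(body)
        if function:
            calls.append(
                {"name": function.group("name"), "arguments": function.group("arguments")}
            )
    return {
        "thinking": match.group("thinking"),
        "content": match.group("content"),
        "tool_calls": calls,
    }


print(parse_response("<|channel>thought\nCheck the sink.<channel|>CWE-79<turn|>"))
print(parse_response("<|tool_call>call:lookup_cwe{id:79}<tool_call|><|tool_response>"))
```

Because every group in the top-level pattern is optional, a completion with no thinking block or no tool calls still parses; the missing fields come back as `None` or an empty list.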