hmahadik committed on
Commit
eef4acc
·
verified ·
1 Parent(s): 837c537

v9: schema-free inference, 100% smoke, sub-second cold prefill

Files changed (5)
  1. .gitattributes +1 -0
  2. README.md +148 -210
  3. functiongemma-physical-ai-v9-Q5_K_M.gguf +3 -0
  4. token_map.json +20 -27
  5. tools.json +33 -44
.gitattributes CHANGED
@@ -42,3 +42,4 @@ functiongemma-physical-ai-v6-Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
42
  functiongemma-physical-ai-v7-Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
43
  moonshine/decoder.vmfb filter=lfs diff=lfs merge=lfs -text
44
  moonshine/decoder_with_past.vmfb filter=lfs diff=lfs merge=lfs -text
 
 
42
  functiongemma-physical-ai-v7-Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
43
  moonshine/decoder.vmfb filter=lfs diff=lfs merge=lfs -text
44
  moonshine/decoder_with_past.vmfb filter=lfs diff=lfs merge=lfs -text
45
+ functiongemma-physical-ai-v9-Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -17,100 +17,114 @@ pipeline_tag: text-generation
17
  inference: false
18
  ---
19
 
20
- # FunctionGemma 270M — Physical AI
21
 
22
  Fine-tuned [`google/functiongemma-270m-it`](https://huggingface.co/google/functiongemma-270m-it)
23
  for voice-controlled physical-AI / household-IoT actions on a Synaptics
24
  SL2619 "Coral" edge board (Google IO 2026 demo).
25
 
26
- | Revision | File | Tool count | Notes |
27
- |----------|------|-----------:|-------|
28
- | **v7 (current)** | [`functiongemma-physical-ai-v7-Q5_K_M.gguf`](./functiongemma-physical-ai-v7-Q5_K_M.gguf) | 10 | `list_alarms` removed; alarm-query prompts route via `respond()`. 250-row eval: **86.8%** overall, **92.8%** single-tool, **75.0%** multi-tool exact-match, **0.0%** parse failure. |
29
- | v6 (previous) | [`functiongemma-physical-ai-v6-Q5_K_M.gguf`](./functiongemma-physical-ai-v6-Q5_K_M.gguf) | 11 | Camera + vision dropped. Single-tool routing 95.5%, multi-tool exact-match 23.9%. |
30
- | v4c (legacy) | [`functiongemma-physical-ai-Q4_K_M.gguf`](./functiongemma-physical-ai-Q4_K_M.gguf) | 13 | Earlier checkpoint, includes camera/scene tools. |
 
31
 
32
- Schema ships as [`tools.json`](./tools.json) (10 tools, current). Token-to-tool
33
  mapping is in [`token_map.json`](./token_map.json).
34
 
35
- ## Output format: function tokens
36
-
37
- Tool calls emit as **function tokens**: each tool name compiles to a single
38
- special-vocabulary token (`<tool_0>` … `<tool_9>` for v7) and a single
39
- `<end>` terminator. A complete call decodes in roughly 8–15 output tokens,
40
- vs ~30–80 for native FunctionGemma's
41
  `<start_function_call>call:NAME{...}<end_function_call>` syntax. On a
42
  2-core Cortex-A55 this is the difference between sub-second and 2–5 s
43
  voice-UX latency.
44
 
45
- Sample output: `<tool_3>(3,"red")<end>` for `blink_lights(count=3, color="red")`.
46
-
47
- `<tool_0>` → `turn_on_lights`, `<tool_3>` → `blink_lights`,
48
- `<tool_8>` → `get_system_status`, `<tool_9>` → `respond` (v7 numbering;
49
- v6 used `<tool_9>` and `<tool_10>` for those — bumped down by one when
50
- `list_alarms` was removed). Full mapping in [`token_map.json`](./token_map.json).
51
 
52
  > ⚠️ Inference servers MUST stop generation on `<end_of_turn>` (or `<eos>`),
53
- > NOT on `<end>`. Multi-tool sequences emit `<tool_A>(args)<end><tool_B>(args)<end>`,
54
- > so stopping at the first `<end>` truncates legitimate multi-tool output.
 
55
 
56
  ## Quick start (Ollama)
57
 
58
  ```bash
59
  hf download BrinqAI/functiongemma-270m-physical-ai \
60
- functiongemma-physical-ai-Q4_K_M.gguf Modelfile tools.json token_map.json \
61
  --local-dir ./fg-physical-ai
62
 
63
  cd fg-physical-ai
64
  ollama create functiongemma-physical-ai -f Modelfile
65
  ```
66
 
67
- `ollama create -f Modelfile` is the documented install path because the
68
- shipped `Modelfile` bakes in the stop tokens (`<end>`, `<end_of_turn>`,
69
- `<eos>`) and decode parameters (`temperature=0`, `num_ctx=1024`,
70
- `num_predict=80`). Direct `ollama pull hf.co/...` does not apply these,
71
- and the function-token output will run past `<end>` until it hits
72
- `num_predict`. Only use the direct-pull path if your client injects stops
73
- at request time (the Python snippet below does this via `options.stop`).
74
 
75
  ## Calling the model
76
 
77
- The model expects prompts built via the FunctionGemma chat template
78
- (developer role + user role, with the tools list passed in via
79
- `tokenizer.apply_chat_template(..., tools=tools)`). Send to Ollama with
80
- `raw=true` so it forwards the prompt verbatim. Plain `ollama run` from the
81
- CLI does **not** pass tools and will degenerate to chat-style refusals.
82
-
83
- Standalone client (depends on `transformers` for the chat template, plus
84
- the `tools.json` and `token_map.json` files in the same directory):
85
 
86
  ```python
87
  import json
88
  import re
89
  import urllib.request
90
- from transformers import AutoTokenizer
91
 
92
  OLLAMA_URL = "http://localhost:11434"
93
  MODEL = "functiongemma-physical-ai"
94
 
95
- tools = json.load(open("tools.json"))["tools"]
96
  reverse_token_map = json.load(open("token_map.json"))["reverse"]
97
- tokenizer = AutoTokenizer.from_pretrained("google/functiongemma-270m-it")
98
 
99
 
100
  def build_prompt(user_text: str) -> str:
101
- msgs = [
102
- {
103
- "role": "developer",
104
- "content": (
105
- "You are a model that can do function calling with the "
106
- "following functions\n"
107
- ),
108
- "tool_calls": None,
109
- },
110
- {"role": "user", "content": user_text, "tool_calls": None},
111
- ]
112
- return tokenizer.apply_chat_template(
113
- msgs, tools=tools, tokenize=False, add_generation_prompt=True
114
  )
115
 
116
 
@@ -124,7 +138,7 @@ def call_model(user_text: str) -> str:
124
  "temperature": 0.0,
125
  "top_p": 1.0,
126
  "num_predict": 80,
127
- "stop": ["<end>", "<end_of_turn>", "<eos>"],
128
  },
129
  }).encode()
130
  req = urllib.request.Request(
@@ -145,79 +159,55 @@ def parse_call(raw: str) -> tuple[str | None, str]:
145
  return reverse_token_map.get(tok), args
146
 
147
 
148
- raw = call_model("Turn on the lights")
149
- print(raw) # e.g. <tool_0>()<end>
150
- print(parse_call(raw)) # ('turn_on_lights', '')
151
  ```
152
 
 
 
 
 
153
  ## Training data
154
 
155
- ### v7 (current)
156
-
157
- - **Size**: 2,000 train / 250 eval (`coral_v7_compact.jsonl`).
158
- - **Schema change**: `list_alarms` removed. Out-of-scope alarm-query prompts
159
- ("what alarms do I have?") are deliberately routed through `respond()`
160
- rather than answered by a query tool. Compact token map shifted accordingly:
161
- `get_system_status` is now `<tool_8>` (was `<tool_9>`), `respond` is
162
- `<tool_9>` (was `<tool_10>`).
163
- - **Multi-tool**: 84 of 250 eval rows (33.6%) are multi-tool sequences,
164
- matching the Google mobile-actions distribution.
165
- - **GGUF eval (Q5_K_M, greedy)**: overall **86.8%** (217/250), single-tool
166
- **92.8%** (154/166), multi-tool exact-match **75.0%** (63/84), parse
167
- failure **0.0%** (0/250). Per-tool F1 ranges from 0.74 (`respond`) to
168
- 1.00 (`cancel_alarm`).
169
- - **Known weak spots** (informal on-device REPL): "tell me a joke" / "what
170
- alarms do I have" tend to misroute to `play_buzzer` instead of `respond`;
171
- more `respond()` negatives sharing keywords with physical-action tools
172
- would help in v8.
173
-
174
- ### v6 (previous)
175
-
176
- - **Size**: 1,400 train / 150 eval (v5/v6 dataset lineage, `coral_v5_compact.jsonl`).
177
- - **Tool count**: 11. Cameras / vision tools dropped from earlier
178
- checkpoints; alarm-list tool kept.
179
-
180
- ### v4 (legacy)
181
-
182
- - **Size**: 367 train / 100 eval.
183
- - **Multi-tool**: 13% (vs Google mobile-actions 33.4%).
184
- - **Buzzer schema**: pattern-only (binary GPIO on the reference HAT — no
185
- PWM). Old `frequency_hz` / `duration_seconds` prompts are routed
186
- through `respond()` as out-of-scope negatives.
187
 
188
  ## Methodology
189
 
190
- This model uses the **functional-token** approach introduced by Octopus v2
191
- (Chen and Li, 2024): special vocabulary tokens are added for each callable
192
- function so a tool call decodes in a single output token rather than a
193
- multi-token JSON string. On-device this collapses ~30–80-token native
194
- FunctionGemma calls down to ~8–15 tokens, enabling sub-second decode on a
195
- 2-core Cortex-A55.
196
-
197
- The training recipe is a direct port of Brinq's SmartPanel v14 trainer
198
- (full bf16, mean-init for new tokens, completion-only loss mask), adapted
199
- for a smaller dataset:
200
-
201
- - **Full bf16 fine-tune (no LoRA)**.
202
- - **Mean-init** for new `<tool_0>..<tool_9>` and `<end>` special tokens
203
- (init = mean of existing input embeddings; random init under-converges
204
- for tiny models on small datasets).
205
- - **Completion-only loss mask**: hand-rolled, masking everything before
206
- `<start_of_turn>model\n`. TRL 0.25's `completion_only_loss=True` is a
207
- no-op on flat-text data and FunctionGemma's chat template lacks
208
- `{% generation %}` markers required for `assistant_only_loss`.
209
- - **8 epochs**, lr `3e-5`, cosine schedule, 0.1 warmup. (2,000 examples in
210
- v7 — fewer epochs than v4's 15 because dataset size grew 5×.)
211
- - **Tool-token loss weight 4.0** to keep the new function tokens learning
212
- faster than the rest of the vocabulary (Gemma3's 262k-vocab dilutes the
213
- signal otherwise).
214
- - **Effective batch 16** = `per_device_train_batch_size=2 ×
215
- gradient_accumulation_steps=8` (kept this way to avoid the 8 GiB
216
- cross-entropy logit allocation OOM that bites Gemma3's 262k vocab).
217
- - **`max_length=1024`** to fit the full 13-tool schema in the prompt.
218
- - bf16, gradient checkpointing, `adamw_torch_fused`, `weight_decay=0.01`.
219
- - Trained inside `unsloth/unsloth:latest` Docker container with GPU
220
- passthrough.
221
 
222
  ### Citation
223
 
@@ -231,105 +221,54 @@ for a smaller dataset:
231
  }
232
  ```
233
 
234
- ## Eval results
235
-
236
- **v7 checkpoint (2,000 train / 250 eval), Q5_K_M GGUF, greedy decode:**
237
-
238
- | Metric | Result |
239
- |--------|--------|
240
- | Overall accuracy | 217 / 250 = **86.8%** |
241
- | Single-tool accuracy | 154 / 166 = **92.8%** |
242
- | Multi-tool exact-match | 63 / 84 = **75.0%** |
243
- | Parse failure rate | 0 / 250 = **0.0%** |
244
 
245
- Per-tool F1: `cancel_alarm` 1.00, `get_system_status` 0.96, `set_alarm` 0.93,
246
- `set_neopixel_pattern` 0.92, `turn_on_lights` 0.90, `blink_lights` 0.89,
247
- `turn_off_lights` 0.89, `set_led_color` 0.88, `play_buzzer` 0.83,
248
- `respond` 0.74. (`respond` is the lowest because the model occasionally
249
- chooses a physical-action tool with a hallucinated text argument when the
250
- prompt shares keywords with one — an issue the dispatcher's enum validation
251
- catches at runtime.)
252
 
253
- **On-device latency** (SL2619 / 2× Cortex-A55 @ 2 GHz, `performance` governor):
254
- ~42 s cold prefill (one-time), ~1.6 s / turn warm — measured across a 33-prompt
255
- exhaustive REPL run on the actual Coralboard.
 
 
256
 
257
- ## Latency
258
 
259
- - **~1.1–1.3 s** per call on a laptop CPU (Ollama / standalone client above).
260
- - **~1.6 s / turn warm**, ~42 s cold prefill on SL2619 (2× Cortex-A55 @ 2 GHz)
261
- with the CPU governor pinned to `performance`. Measured 2026-05-05 on the
262
- Grinn Coralboard with the v7 GGUF + the `Function_calling/` demo from
263
- [BrinqAI/sl2610-examples](https://github.com/BrinqAI/sl2610-examples/tree/coralboard/functiongemma/Function_calling).
264
 
265
- ## ONNX exports (for compiler toolchains)
 
 
 
 
 
266
 
267
- For compiler-targeted backends (ONNX Runtime, IREE/MLIR, OpenVINO, TensorRT,
268
- Synaptics Torq), the model is also published as ONNX with KV-cache support
269
- (`text-generation-with-past`). Both exports are derived from the same
270
- `coral-functiongemma-v4c-compact` checkpoint as the GGUF above.
271
 
272
- | Path | Precision | Weight init dtype | Size | ORT runnable |
273
- |------|-----------|-------------------|------|--------------|
274
- | `onnx/compact-fp32/model.onnx` | fp32 | 237 / 237 FLOAT | 1.7 GB | yes |
275
- | `onnx/compact-fp16/model.onnx` | fp16 | 237 / 237 FLOAT16 | 833 MB | no (see note) |
276
-
277
- Both files are structurally valid (`onnx.checker.check_model(..., full_check=True)`
278
- passes). Each export ships with the matching tokenizer and `config.json` so it
279
- can be loaded directly:
280
-
281
- ```python
282
- from transformers import AutoTokenizer
283
- import onnxruntime as ort
284
- import numpy as np, json
285
-
286
- MODEL = "onnx/compact-fp32" # or downloaded local path
287
- tok = AutoTokenizer.from_pretrained(MODEL)
288
- sess = ort.InferenceSession(f"{MODEL}/model.onnx", providers=["CPUExecutionProvider"])
289
-
290
- tools = json.load(open("tools.json"))["tools"]
291
- prompt = tok.apply_chat_template(
292
- [{"role": "developer",
293
- "content": "You are a model that can do function calling with the following functions\n",
294
- "tool_calls": None},
295
- {"role": "user", "content": "Turn on the lights", "tool_calls": None}],
296
- tools=tools, tokenize=False, add_generation_prompt=True,
297
- )
298
- # Then feed input_ids + empty past_key_values.* (shape (1, num_kv_heads, 0, head_dim))
299
- # greedy-decode in a loop, stop on <end>. See repo for full snippet.
300
- ```
301
-
302
- Smoke decode of "Turn on the lights" against the fp32 ONNX returns
303
- `<tool_0>()<end>` (= `turn_on_lights()`), matching the GGUF output.
304
-
305
- ### fp16 + ONNX Runtime caveat
306
-
307
- The fp16 ONNX file is structurally valid but **does not currently load in
308
- ONNX Runtime ≥ 1.20** for this model: ORT's `SimplifiedLayerNormFusion` pass
309
- chokes on the `InsertedPrecisionFreeCast_*` nodes that the fp16 conversion
310
- inserts around Gemma3's RMSNorm layers. The error is graph-optimizer-internal
311
- and reproduces with `ORT_DISABLE_ALL`. This is an ORT bug, not an ONNX-spec
312
- issue — the file passes `onnx.checker` and the graph is well-formed.
313
-
314
- For compiler frontends that consume ONNX directly (IREE / MLIR, TensorRT,
315
- OpenVINO, Synaptics Torq), the fp16 file should ingest fine. For runtime
316
- inference via `onnxruntime` itself, use the fp32 export and let your compiler
317
- or runtime do its own dtype conversion / quantization downstream.
318
 
319
  ## Files
320
 
321
  ```
322
- functiongemma-physical-ai-v7-Q5_K_M.gguf # 248 MB, GGUF Q5_K_M, 10-tool v7 schema (current)
323
- functiongemma-physical-ai-v6-Q5_K_M.gguf # 248 MB, GGUF Q5_K_M, 11-tool v6 schema (previous)
324
- functiongemma-physical-ai-Q4_K_M.gguf # 253 MB, GGUF Q4_K_M, v4c (legacy)
325
- Modelfile # Ollama Modelfile (function-token format)
326
- tools.json # 10-tool schema (mobile-actions format, current)
327
- token_map.json # function-token <-> tool-name map
328
- onnx/compact-fp32/ # ONNX export, fp32, with KV cache (1.7 GB)
329
- onnx/compact-fp16/ # ONNX export, fp16, with KV cache (833 MB) — see ORT caveat above
330
  README.md # this file
331
  ```
332
 
 
 
 
 
333
  ## License
334
 
335
  Released under the [Gemma Terms of Use](https://ai.google.dev/gemma/terms).
@@ -340,8 +279,7 @@ By using this model you agree to those terms. Base model:
340
 
341
  - Base model: <https://huggingface.co/google/functiongemma-270m-it>
342
  - Octopus v2 paper: <https://arxiv.org/abs/2404.01744>
343
- - Hardware demo (Coralboard, Google IO 2026: full physical setup,
344
- WLED-over-USB-CDC, Grinn HAT, end-to-end voice + text REPL):
345
- <https://github.com/BrinqAI/sl2610-examples/tree/coralboard/functiongemma/Function_calling>
346
- (BrinqAI fork of the upstream Synaptics demo repo,
347
- [synaptics-astra-demos/sl2610-examples](https://github.com/synaptics-astra-demos/sl2610-examples)).
 
17
  inference: false
18
  ---
19
 
20
+ # FunctionGemma 270M — Physical AI (v9, Octopus v2)
21
 
22
  Fine-tuned [`google/functiongemma-270m-it`](https://huggingface.co/google/functiongemma-270m-it)
23
  for voice-controlled physical-AI / household-IoT actions on a Synaptics
24
  SL2619 "Coral" edge board (Google IO 2026 demo).
25
 
26
+ | Revision | File | Tool count | Headline result |
27
+ |----------|------|-----------:|-----------------|
28
+ | **v9 (current)** | [`functiongemma-physical-ai-v9-Q5_K_M.gguf`](./functiongemma-physical-ai-v9-Q5_K_M.gguf) | 8 | 30/30 (100 %) routing on held-out smoke prompts; **0.55 s cold prefill** on the 2-core A55 (vs ~57 s for v7's schema-in-prompt build — **105× faster**). |
29
+ | v7 (legacy) | [`functiongemma-physical-ai-v7-Q5_K_M.gguf`](./functiongemma-physical-ai-v7-Q5_K_M.gguf) | 10 | 86.8 % overall on a 250-row eval; schema-in-prompt build. |
30
+ | v6 (legacy) | [`functiongemma-physical-ai-v6-Q5_K_M.gguf`](./functiongemma-physical-ai-v6-Q5_K_M.gguf) | 11 | Camera + vision dropped from earlier revs. Schema-in-prompt build. |
31
+ | v4c (legacy) | [`functiongemma-physical-ai-Q4_K_M.gguf`](./functiongemma-physical-ai-Q4_K_M.gguf) | 13 | Earliest published checkpoint. |
32
 
33
+ Schema ships as [`tools.json`](./tools.json) (8 tools, v9). Token-to-tool
34
  mapping is in [`token_map.json`](./token_map.json).
35
 
36
+ ## What changed in v9
37
+
38
+ v9 is a structural rewrite of the training pipeline, not just a dataset
39
+ refresh. Earlier revisions used the upstream FunctionGemma prompt format,
40
+ which injects the full tool schema (~1088 tokens) into every request as a
41
+ `<start_function_declaration>` developer turn. On a 2-core Cortex-A55 that
42
+ prefill cost ~57 s on the first turn — incompatible with a sub-second
43
+ voice UX.
44
+
45
+ v9 follows [Octopus v2](https://arxiv.org/abs/2404.01744) end to end:
46
+
47
+ | | Pre-v9 (schema-in-prompt) | v9 (Octopus v2) |
48
+ |---|---|---|
49
+ | Prompt format | `<start_of_turn>developer\n<start_function_declaration>...{schema}...<end_function_declaration>\n<end_of_turn>\n<start_of_turn>user\n{q}<end_of_turn>\n<start_of_turn>model\n` | `<start_of_turn>user\n{q}<end_of_turn>\n<start_of_turn>model\n` |
50
+ | Tokens per prompt | ~1088 | ~13 |
51
+ | Cold prefill on SL2619 (2-core A55) | ~57 s | **~0.55 s** |
52
+ | Tool routing | learned from in-context schema | learned from `<tool_N>` token weights |
53
+ | Training data shape | `{tools, messages: [dev, user, asst]}` with schema embedded | `{input, output, split}` — flat |
54
+
55
+ The schema in `tools.json` is still the source of truth for dispatcher
56
+ arg validation and is embedded in the GGUF metadata for schema-drift
57
+ checks, but it is **not** loaded into the inference prompt.
58
+
59
+ ## Tool surface (v9, 8 tools)
60
+
61
+ | Token | Name | Args | Purpose |
62
+ |---|---|---|---|
63
+ | `<tool_0>` | `set_status_led` | `led`, `state`, `brightness?` | On/off one or all HAT status LEDs |
64
+ | `<tool_1>` | `blink_status_led` | `led`, `count?`, `speed?` | Discrete blink |
65
+ | `<tool_2>` | `set_neopixel_effect` | `effect`, `color?`, `palette?`, `speed?`, `intensity?` | Animated effect on the ring |
66
+ | `<tool_3>` | `play_buzzer` | `pattern` | `beep`, `double_beep`, `chirp`, `siren`, `alarm`, `success`, `error` |
67
+ | `<tool_4>` | `set_alarm` | `duration` or `time`, `label?` | Timer |
68
+ | `<tool_5>` | `cancel_alarm` | `label?` | Cancel one or all |
69
+ | `<tool_6>` | `get_system_status` | `metric` | `cpu`, `memory`, `temperature`, `npu`, or `all` |
70
+ | `<tool_7>` | `respond` | `message` | Natural-language fallback when no tool fits |
71
+
72
+ Surface routing keyword: `set_neopixel_effect` requires the literal
73
+ substring `neopixels` in the user input. LED-vs-ring ambiguous prompts
74
+ ("turn off the lights") route to `respond()` asking the user to
75
+ disambiguate.
76
+
77
+ ## Output format — functional tokens
78
+
79
+ Tool calls emit as **functional tokens**: each tool name compiles to a
80
+ single special-vocabulary token (`<tool_0>` … `<tool_7>`) plus a single
81
+ `<end>` terminator. A complete call decodes in 8–15 output tokens, vs
82
+ ~30–80 for the upstream native FunctionGemma
83
  `<start_function_call>call:NAME{...}<end_function_call>` syntax. On a
84
  2-core Cortex-A55 this is the difference between sub-second and 2–5 s
85
  voice-UX latency.
86
 
87
+ Sample output: `<tool_0>("red","on")<end>` for `set_status_led(led="red", state="on")`.
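The positional form in the sample output can be turned back into named arguments on the dispatcher side. The following is a minimal sketch, not the shipped dispatcher: `PARAM_ORDER` and `bind_args` are hypothetical names, the parameter order is transcribed from the tool-surface table on the assumption that it matches the declaration order in `tools.json`, and the argument list is assumed to be a comma-separated run of JSON literals.

```python
import json

# Positional parameter order per tool, transcribed from the v9 tool-surface
# table (assumed to match the schema's declaration order). set_alarm is left
# out because its first slot can be either `duration` or `time`.
PARAM_ORDER = {
    "set_status_led": ["led", "state", "brightness"],
    "blink_status_led": ["led", "count", "speed"],
    "set_neopixel_effect": ["effect", "color", "palette", "speed", "intensity"],
    "play_buzzer": ["pattern"],
    "cancel_alarm": ["label"],
    "get_system_status": ["metric"],
    "respond": ["message"],
}


def bind_args(tool_name, raw_args):
    """Map a positional argument string such as '"red","on"' to named kwargs."""
    # Assumption: the arguments are JSON literals separated by commas.
    values = json.loads("[" + raw_args + "]")
    names = PARAM_ORDER[tool_name]
    if len(values) > len(names):
        raise ValueError(f"too many arguments for {tool_name}")
    # Trailing optional parameters that were not emitted are simply absent.
    return dict(zip(names, values))


print(bind_args("set_status_led", '"red","on"'))
```

Keeping the order table next to the schema means a schema change shows up as a single diff in one place; validating the bound values against `tools.json` would be a natural next step.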
 
 
 
 
 
88
 
89
  > ⚠️ Inference servers MUST stop generation on `<end_of_turn>` (or `<eos>`),
90
+ > NOT on `<end>`. The v9 model can emit multi-tool sequences
91
+ > `<tool_A>(args)<end><tool_B>(args)<end>`, so stopping at the first
92
+ > `<end>` truncates legitimate multi-tool output.
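Because generation only stops at `<end_of_turn>`, the client has to split the raw output into individual calls itself. A minimal sketch of that split, assuming the `<tool_N>(args)<end>` output format and the v9 mapping from `token_map.json`; `split_calls` and `CALL_RE` are hypothetical names, and the regex assumes no argument contains the literal text `)<end>`.

```python
import re
from typing import List, Optional, Tuple

# v9 token-to-tool mapping, copied from the "reverse" table in token_map.json.
REVERSE_TOKEN_MAP = {
    "<tool_0>": "set_status_led",
    "<tool_1>": "blink_status_led",
    "<tool_2>": "set_neopixel_effect",
    "<tool_3>": "play_buzzer",
    "<tool_4>": "set_alarm",
    "<tool_5>": "cancel_alarm",
    "<tool_6>": "get_system_status",
    "<tool_7>": "respond",
}

# One call looks like <tool_N>(args)<end>; the lazy match stops at the first
# ")<end>", so each call in a sequence is captured separately.
CALL_RE = re.compile(r"(<tool_\d+>)\((.*?)\)<end>", re.DOTALL)


def split_calls(raw: str) -> List[Tuple[Optional[str], str]]:
    """Return (tool_name, raw_args) for every call found in a model output."""
    return [
        (REVERSE_TOKEN_MAP.get(token), args)  # unknown tokens map to None
        for token, args in CALL_RE.findall(raw)
    ]


for name, args in split_calls('<tool_0>("red","on")<end><tool_3>("beep")<end>'):
    print(name, args)
```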
93
 
94
  ## Quick start (Ollama)
95
 
96
  ```bash
97
  hf download BrinqAI/functiongemma-270m-physical-ai \
98
+ functiongemma-physical-ai-v9-Q5_K_M.gguf Modelfile tools.json token_map.json \
99
  --local-dir ./fg-physical-ai
100
 
101
  cd fg-physical-ai
102
  ollama create functiongemma-physical-ai -f Modelfile
103
  ```
104
 
105
+ The shipped `Modelfile` bakes in the stop tokens (`<end_of_turn>`, `<eos>`)
106
+ and decode parameters (`temperature=0`, `num_ctx=1024`, `num_predict=80`).
 
 
 
 
 
107
 
108
  ## Calling the model
109
 
110
+ The v9 model expects a **bare user turn**: no schema, no tools list. Send
111
+ to Ollama with `raw=true`:
 
 
 
 
 
 
112
 
113
  ```python
114
  import json
115
  import re
116
  import urllib.request
 
117
 
118
  OLLAMA_URL = "http://localhost:11434"
119
  MODEL = "functiongemma-physical-ai"
120
 
 
121
  reverse_token_map = json.load(open("token_map.json"))["reverse"]
 
122
 
123
 
124
  def build_prompt(user_text: str) -> str:
125
+ return (
126
+ f"<start_of_turn>user\n{user_text}<end_of_turn>\n"
127
+ f"<start_of_turn>model\n"
 
 
 
 
 
 
 
 
 
 
128
  )
129
 
130
 
 
138
  "temperature": 0.0,
139
  "top_p": 1.0,
140
  "num_predict": 80,
141
+ "stop": ["<end_of_turn>", "<eos>"],
142
  },
143
  }).encode()
144
  req = urllib.request.Request(
 
159
  return reverse_token_map.get(tok), args
160
 
161
 
162
+ raw = call_model("turn the red LED on")
163
+ print(raw) # e.g. '<tool_0>("red","on")<end>'
164
+ print(parse_call(raw)) # ('set_status_led', '"red","on"')
165
  ```
166
 
167
+ For `llama-cpp-python` directly, use `detokenize(..., special=True)` so
168
+ the `<tool_N>` and `<end>` tokens render in the output instead of being
169
+ stripped.
170
+
171
  ## Training data
172
 
173
+ v9's training data was generated from Haiku-authored phrasing templates
174
+ crossed with deterministic entity pools, then lightly augmented with
175
+ Moonshine-flavored ASR noise (dropped function words, lowercased traces,
176
+ filler-word prepends). The shape matches Brinq's SmartPanel v15 trainer:
177
+ flat `{input, output, split}` records, no tools / messages array.
178
+
179
+ | | v9 |
180
+ |---|---|
181
+ | Train rows | 6,127 |
182
+ | Eval rows | 1,339 |
183
+ | Tools | 8 |
184
+ | Multi-tool fraction | low — single-tool emphasis; multi-tool routines composed at dispatch time |
185
+ | Augmentation | Moonshine-sim noise on ~30 % of records |
186
+
187
+ Per-tool train counts range from 217 to 1,199; `cancel_alarm` and `play_buzzer` are the
188
+ narrowest classes because their natural phrasing variation is smaller,
189
+ not because of a coverage gap.
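The three noise types named in this section (dropped function words, lowercasing, filler-word prepends) can be sketched as below. This is an illustration only, not the project's augmentation script: the word lists, the per-word drop probability, and the function names are all hypothetical, and only the flat `{input, output, split}` record shape and the ~30 % rate come from the text.

```python
import random

# Hypothetical word lists; the real pools are not given in this README.
FUNCTION_WORDS = {"the", "a", "an", "to", "of"}
FILLERS = ["um", "uh", "okay"]


def add_asr_noise(text, rng, drop_prob=0.3):
    """Apply the three noise types to one input string."""
    words = text.lower().split()  # lowercased trace
    kept = [
        w for w in words
        if w not in FUNCTION_WORDS or rng.random() >= drop_prob
    ]
    if not kept:  # never emit an empty prompt
        kept = words
    return " ".join([rng.choice(FILLERS)] + kept)  # filler-word prepend


def augment(records, rate=0.3, seed=0):
    """Return copies of the records with noise applied to roughly `rate` of them."""
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    out = []
    for rec in records:
        rec = dict(rec)  # leave the caller's record untouched
        if rng.random() < rate:
            rec["input"] = add_asr_noise(rec["input"], rng)
        out.append(rec)
    return out
```

Only the `input` field is perturbed; the `output` target stays fixed so the model learns that noisy and clean phrasings route to the same call.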
190
 
191
  ## Methodology
192
 
193
+ Direct port of the SmartPanel v15 trainer:
194
+
195
+ - **Full bf16 fine-tune** (no LoRA).
196
+ - **Functional tokens**: `<tool_0>` … `<tool_7>` + `<end>` added as
197
+ `additional_special_tokens`; new embeddings **mean-initialized** from the
198
+ existing input embedding matrix (random init under-converges on small
199
+ datasets).
200
+ - **Completion-only loss mask**: hand-rolled; labels before
201
+ `<start_of_turn>model\n` are masked to `-100`. The model learns only from
202
+ the assistant turn, not the user prompt.
203
+ - **5 epochs**, lr `3e-5`, cosine schedule, 0.1 warmup, weight decay 0.01.
204
+ - **Effective batch = 16** (`per_device_train_batch_size=8 ×
205
+ gradient_accumulation_steps=2`).
206
+ - **`max_length=256`**: the trained prompt format is ~13 tokens and the
207
+ assistant turn fits comfortably under 64 tokens, including respond()
208
+ messages.
209
+ - bf16, gradient checkpointing, `adamw_torch_fused`, `metric_for_best_model="eval_loss"` + `load_best_model_at_end=True`.
210
+ - Training wallclock: **5 min on a single H100** (~15–20 min on a 4090).
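The completion-only loss mask from the list above can be sketched in plain Python. This is an illustration under stated assumptions rather than the trainer's code: `completion_only_labels` and `find_subsequence` are hypothetical names, the token IDs in the usage line are made up, and the marker tokens themselves are masked along with the prompt (the text says only that labels before the marker are masked).

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss


def find_subsequence(seq, pattern):
    """Return the index just past the first occurrence of pattern, or -1."""
    n = len(pattern)
    for i in range(len(seq) - n + 1):
        if seq[i:i + n] == pattern:
            return i + n
    return -1


def completion_only_labels(input_ids, marker_ids):
    """Copy input_ids to labels, masking everything up to and including the marker.

    marker_ids is the tokenization of "<start_of_turn>model\n".
    """
    start = find_subsequence(input_ids, marker_ids)
    if start < 0:
        # No model turn found: mask the whole row so it contributes no loss.
        return [IGNORE_INDEX] * len(input_ids)
    return [IGNORE_INDEX] * start + list(input_ids[start:])


# Made-up IDs: [20, 21] stands in for the model-turn marker.
print(completion_only_labels([2, 10, 11, 12, 20, 21, 30, 31, 32], [20, 21]))
```

With a ~13-token prompt and a short assistant turn, most of each row is still prompt, so without the mask the loss would be dominated by tokens the model never has to generate.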
211
 
212
  ### Citation
213
 
 
221
  }
222
  ```
223
 
224
+ ## Results
 
 
 
 
 
 
 
 
 
225
 
226
+ ### Training metrics (final epoch)
 
 
 
 
 
 
227
 
228
+ | | v9 |
229
+ |---|---|
230
+ | Final train loss | 0.555 |
231
+ | Final eval loss | **0.090** |
232
+ | Mean token accuracy (eval) | **97.9 %** |
233
 
234
+ ### Held-out smoke test (post-train, 30 prompts spanning all 8 tools)
235
 
236
+ | | v9 |
237
+ |---|---|
238
+ | Smoke-test routing accuracy | **30 / 30 (100 %)** |
 
 
239
 
240
+ The 30-prompt suite covers single-tool happy paths for every tool plus
241
+ the failure modes that broke v8: ambiguous LED prompts ("turn off the
242
+ lights"), effect-name without `neopixels` keyword ("do the aurora"),
243
+ unsupported features ("play a tone at 2000 hz"), and out-of-scope
244
+ appliances ("turn on the TV"). All 8 of those route to `respond()` with a
245
+ helpful explanation.
246
 
247
+ ### On-device benchmark (Coralboard, 2-core Cortex-A55 @ 2 GHz, Q5_K_M GGUF)
 
 
 
248
 
249
+ | | v7 (schema-in-prompt, 10 tools) | v9 (Octopus v2, 8 tools) |
250
+ |---|---|---|
251
+ | Prompt tokens | ~1088 | ~13 |
252
+ | **Cold prefill (turn 1)** | **57.3 s** | **0.55 s** (105× faster) |
253
+ | Warm prefill (turn 2+) | ~3 s | ~0.4 s |
254
+ | Decode for a typical call | 0.5–1.2 s | 0.5–1.2 s |
255
+ | End-to-end first-turn (model load 2.3 s + prefill + decode) | ~62 s | ~3 s |
256
+ | Routing on a 29-prompt board bench | n/a (not directly comparable) | **29 / 29 (100 %)** |
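The headline ratios can be re-derived from the rounded values in the table above. A small sketch of the arithmetic; the constant names are hypothetical. The rounded inputs give a cold-prefill ratio slightly above 104, which is consistent with the quoted 105× if that figure was computed from unrounded timings (an assumption).

```python
# Rounded values copied from the on-device benchmark table.
V7_COLD_PREFILL_S = 57.3
V9_COLD_PREFILL_S = 0.55
V7_PROMPT_TOKENS = 1088
V9_PROMPT_TOKENS = 13

# Ratio of first-turn prefill times and of prompt lengths.
speedup = V7_COLD_PREFILL_S / V9_COLD_PREFILL_S
token_ratio = V7_PROMPT_TOKENS / V9_PROMPT_TOKENS

print(f"cold-prefill speedup: {speedup:.1f}x")
print(f"prompt-token reduction: {token_ratio:.1f}x")
```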
257
 
258
  ## Files
259
 
260
  ```
261
+ functiongemma-physical-ai-v9-Q5_K_M.gguf # ~248 MB, v9 GGUF Q5_K_M weights (Ollama / llama.cpp)
262
+ Modelfile # Ollama Modelfile (functional-token format)
263
+ tools.json # 8-tool schema (v9, canonical mobile-actions format)
264
+ token_map.json # functional-token <-> tool-name map (v9)
 
 
 
 
265
  README.md # this file
266
  ```
267
 
268
+ Legacy v6/v7 GGUFs are kept in repo history for reproducibility but should
269
+ not be used for new deployments — they require the schema-in-prompt
270
+ inference wrapper and pay the ~57 s cold-prefill cost.
271
+
272
  ## License
273
 
274
  Released under the [Gemma Terms of Use](https://ai.google.dev/gemma/terms).
 
279
 
280
  - Base model: <https://huggingface.co/google/functiongemma-270m-it>
281
  - Octopus v2 paper: <https://arxiv.org/abs/2404.01744>
282
+ - Hardware demo + integration code (Synaptics Coralboard, Grinn HAT,
283
+ WLED-over-USB-CDC, full PyQt UI):
284
+ <https://github.com/synaptics-astra-demos/sl2610-examples>
285
+ `Function_calling/`
 
functiongemma-physical-ai-v9-Q5_K_M.gguf ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:efa8d6f922ba40e0064b8efa3c330f36f843e933b62086c60964fbcafea22852
3
+ size 260047008
token_map.json CHANGED
@@ -1,30 +1,25 @@
1
  {
2
- "version": "0.3.0",
3
- "description": "Compressed token map for FunctionGemma CPU inference on SL2619. 10 tools — list_alarms removed in v7 (no alarm-query path; users cannot list alarms by design).",
4
  "tokens": {
5
- "turn_on_lights": "<tool_0>",
6
- "turn_off_lights": "<tool_1>",
7
- "set_led_color": "<tool_2>",
8
- "blink_lights": "<tool_3>",
9
- "set_neopixel_pattern": "<tool_4>",
10
- "play_buzzer": "<tool_5>",
11
- "set_alarm": "<tool_6>",
12
- "cancel_alarm": "<tool_7>",
13
- "get_system_status": "<tool_8>",
14
- "respond": "<tool_9>"
15
  },
16
  "reverse": {
17
- "<tool_0>": "turn_on_lights",
18
- "<tool_1>": "turn_off_lights",
19
- "<tool_2>": "set_led_color",
20
- "<tool_3>": "blink_lights",
21
- "<tool_4>": "set_neopixel_pattern",
22
- "<tool_5>": "play_buzzer",
23
- "<tool_6>": "set_alarm",
24
- "<tool_7>": "cancel_alarm",
25
- "<tool_8>": "get_system_status",
26
- "<tool_9>": "respond",
27
- "<tool_none>": null
28
  },
29
  "special_tokens": [
30
  "<tool_0>",
@@ -35,11 +30,9 @@
35
  "<tool_5>",
36
  "<tool_6>",
37
  "<tool_7>",
38
- "<tool_8>",
39
- "<tool_9>",
40
- "<tool_none>",
41
  "<end>"
42
  ],
43
  "output_format": "<tool_N>(\"arg1\",\"arg2\",...)<end>",
44
- "notes": "Argument order positional per canonical schema's required-first then optional declaration order. Camera/vision tools (capture_photo, describe_scene) removed in v6. list_alarms removed in v7 — users cannot query alarms; out-of-scope alarm-query prompts route via respond()."
 
45
  }
 
1
  {
2
+ "version": "0.4.0",
3
+ "description": "Compressed token map for FunctionGemma CPU inference on SL2619. v9: 8 tools (set_status_led, blink_status_led, set_neopixel_effect, play_buzzer, set_alarm, cancel_alarm, get_system_status, respond) + <end> terminator. Trained Octopus v2 style: functional tokens are the entire output vocabulary the model uses for routing; the tool schema (tools.json) is NOT loaded into the inference prompt. v9 drops the unused <tool_none> sentinel that v8's training pipeline reserved but never emitted.",
4
  "tokens": {
5
+ "set_status_led": "<tool_0>",
6
+ "blink_status_led": "<tool_1>",
7
+ "set_neopixel_effect": "<tool_2>",
8
+ "play_buzzer": "<tool_3>",
9
+ "set_alarm": "<tool_4>",
10
+ "cancel_alarm": "<tool_5>",
11
+ "get_system_status": "<tool_6>",
12
+ "respond": "<tool_7>"
 
 
13
  },
14
  "reverse": {
15
+ "<tool_0>": "set_status_led",
16
+ "<tool_1>": "blink_status_led",
17
+ "<tool_2>": "set_neopixel_effect",
18
+ "<tool_3>": "play_buzzer",
19
+ "<tool_4>": "set_alarm",
20
+ "<tool_5>": "cancel_alarm",
21
+ "<tool_6>": "get_system_status",
22
+ "<tool_7>": "respond"
 
 
 
23
  },
24
  "special_tokens": [
25
  "<tool_0>",
 
30
  "<tool_5>",
31
  "<tool_6>",
32
  "<tool_7>",
 
 
 
33
  "<end>"
34
  ],
35
  "output_format": "<tool_N>(\"arg1\",\"arg2\",...)<end>",
36
+ "prompt_format": "<start_of_turn>user\\n{user_text}<end_of_turn>\\n<start_of_turn>model\\n",
37
+ "notes": "Argument order positional per canonical schema's required-first then optional declaration order. v9 trains Octopus v2 pure (no schema in prompt) — see prompt_format. set_neopixel_effect routing keyword: 'neopixels' (literal substring, case-insensitive) required in user prompt; otherwise the model routes to respond() asking the user to disambiguate (HAT status LEDs vs. neopixel ring)."
38
  }
tools.json CHANGED
@@ -1,95 +1,84 @@
  {
- "version": "0.2.0",
- "description": "Physical AI tool schema for Coral Dev Board (SL2619) FunctionGemma demo. Canonical mobile-actions format. v7: 10 tools (list_alarms removed; out-of-scope alarm-query prompts route via respond()).",
  "tools": [
  {
  "function": {
- "name": "turn_on_lights",
- "description": "Turn on all RGB LEDs and the Neopixel strip to default white.",
- "parameters": {
- "type": "OBJECT",
- "properties": {}
- }
- }
- },
- {
- "function": {
- "name": "turn_off_lights",
- "description": "Turn off all RGB LEDs and the Neopixel strip.",
- "parameters": {
- "type": "OBJECT",
- "properties": {}
- }
- }
- },
- {
- "function": {
- "name": "set_led_color",
- "description": "Set the color of RGB LEDs on the HAT or Neopixel strip.",
  "parameters": {
  "type": "OBJECT",
  "properties": {
- "color": {
  "type": "STRING",
- "description": "Color name (e.g. 'red', 'green', 'blue', 'white', 'orange', 'purple') or 6-digit hex (e.g. '#FF8800')."
  },
- "target": {
  "type": "STRING",
- "description": "Which lights to set: 'all' (default), 'hat' (RGB LEDs on HAT), or 'strip' (Neopixels)."
  },
  "brightness": {
  "type": "INTEGER",
- "description": "Brightness 0-100. Optional, default 100."
  }
  },
- "required": ["color"]
  }
  }
  },
  {
  "function": {
- "name": "blink_lights",
- "description": "Blink LEDs a given number of times, optionally in a specific color and speed.",
  "parameters": {
  "type": "OBJECT",
  "properties": {
  "count": {
  "type": "INTEGER",
  "description": "Number of blinks. Default 3."
  },
- "color": {
- "type": "STRING",
- "description": "Color to blink. Default current color or white."
- },
  "speed": {
  "type": "STRING",
  "description": "One of 'slow', 'normal', 'fast'. Default 'normal'."
  }
- }
  }
  }
  },
  {
  "function": {
- "name": "set_neopixel_pattern",
- "description": "Play an animated pattern on the Neopixel strip.",
  "parameters": {
  "type": "OBJECT",
  "properties": {
- "pattern": {
  "type": "STRING",
- "description": "One of 'rainbow', 'chase', 'fade', 'pulse', 'sparkle', 'solid'."
  },
  "color": {
  "type": "STRING",
- "description": "Color for patterns that need one (chase/fade/pulse/solid). Ignored for rainbow."
  },
  "speed": {
  "type": "STRING",
  "description": "One of 'slow', 'normal', 'fast'. Default 'normal'."
  }
  },
- "required": ["pattern"]
  }
  }
  },
@@ -165,7 +154,7 @@
  {
  "function": {
  "name": "respond",
- "description": "Reply to the user in natural language without taking any physical action. Use this when the request is out of scope (no matching tool), ambiguous (needs clarification), purely conversational (greetings, thanks), or impossible on this device. Do NOT use this when any physical-action tool fits the request.",
  "parameters": {
  "type": "OBJECT",
  "properties": {
 
  {
+ "version": "0.4.0",
+ "description": "Physical AI tool schema for Coral Dev Board (SL2619) FunctionGemma demo. Canonical mobile-actions format. v9: same 8-tool surface as v8, but the model is trained Octopus v2 style — functional tokens (<tool_0>..<tool_7>) emitted directly from a minimal user-only prompt with NO tool schema in the context. This shrinks the cold prefill from ~1088 prompt tokens (v7/v8 with the FunctionGemma developer turn) to ~13, taking on-board cold first-turn from ~57s to ~0.5s on a 2-core A55. The schema in this file is purely a developer/runtime contract (dispatcher arg validation, GGUF metadata, documentation) — it is NOT injected into the inference prompt. Surface routing keyword: 'neopixels' (literal) for the ring; 'LED' / 'light' / 'red light' / 'green light' / 'blue light' for the HAT status LEDs.",
  "tools": [
  {
  "function": {
+ "name": "set_status_led",
+ "description": "Turn one or all of the HAT status LEDs on or off. The HAT has three individual LEDs (red, green, blue), each independently addressable. Invoke when the user mentions 'LED', 'LEDs', or 'the <color> light'.",
  "parameters": {
  "type": "OBJECT",
  "properties": {
+ "led": {
  "type": "STRING",
+ "description": "Which LED: 'red', 'green', 'blue', or 'all'."
  },
+ "state": {
  "type": "STRING",
+ "description": "'on' or 'off'."
  },
  "brightness": {
  "type": "INTEGER",
+ "description": "Brightness 0-100. Optional, default 100. Ignored when state is 'off'."
  }
  },
+ "required": ["led", "state"]
  }
  }
  },
  {
  "function": {
+ "name": "blink_status_led",
+ "description": "Blink one or all HAT status LEDs a given number of times. Invoke for 'blink/flash the <color> light' or 'blink the LEDs' style requests.",
  "parameters": {
  "type": "OBJECT",
  "properties": {
+ "led": {
+ "type": "STRING",
+ "description": "Which LED: 'red', 'green', 'blue', or 'all'."
+ },
  "count": {
  "type": "INTEGER",
  "description": "Number of blinks. Default 3."
  },
  "speed": {
  "type": "STRING",
  "description": "One of 'slow', 'normal', 'fast'. Default 'normal'."
  }
+ },
+ "required": ["led"]
  }
  }
  },
  {
  "function": {
+ "name": "set_neopixel_effect",
+ "description": "Play a visual effect on the neopixel ring (48-pixel WS2812B ring around the 7\" display, driven by WLED). Only invoke when the user explicitly mentions 'neopixels'. Use effect='off' to turn the ring off.",
  "parameters": {
  "type": "OBJECT",
  "properties": {
+ "effect": {
  "type": "STRING",
+ "description": "One of: 'solid', 'pulse', 'fade', 'chase', 'rainbow', 'sparkle', 'off', 'aurora', 'plasma', 'comet', 'twinkle', 'fireworks', 'police', 'heartbeat', 'loading', 'lightning', 'glitter', 'fire', 'sunrise'."
  },
  "color": {
  "type": "STRING",
+ "description": "Color name (e.g. 'red', 'teal', 'amber', 'gold', 'violet') or 6-digit hex like '#FF8800'. Used by effects that take a primary color (solid, pulse, fade, chase, sparkle, comet). Ignored for rainbow and palette-driven effects."
+ },
+ "palette": {
+ "type": "STRING",
+ "description": "Color palette: 'auto', 'ocean', 'lava', 'forest', 'sunset', 'party', 'sherbet', 'c9', 'aurora', 'beach', 'fire', 'sakura', 'splash', 'pastel'. Most useful with aurora, plasma, fire, twinkle, comet."
  },
  "speed": {
  "type": "STRING",
  "description": "One of 'slow', 'normal', 'fast'. Default 'normal'."
+ },
+ "intensity": {
+ "type": "STRING",
+ "description": "One of 'low', 'medium', 'high'. Default 'medium'. Controls effect density / depth (sparkle density, fire height, comet tail length, aurora width)."
  }
  },
+ "required": ["effect"]
  }
  }
  },

  {
  "function": {
  "name": "respond",
+ "description": "Reply to the user in natural language without taking any physical action. Use this when the request is out of scope (no matching tool), ambiguous (needs clarification — e.g. surface keyword missing for LED requests), purely conversational (greetings, thanks), or impossible on this device. Do NOT use this when any physical-action tool fits the request.",
  "parameters": {
  "type": "OBJECT",
  "properties": {