---
license: gemma
license_link: https://ai.google.dev/gemma/terms
base_model: google/functiongemma-270m-it
language:
  - en
tags:
  - function-calling
  - edge
  - on-device
  - physical-ai
  - iot
  - octopus-v2
  - synaptics-sl2619
  - gemma3
pipeline_tag: text-generation
inference: false
---

# FunctionGemma 270M: Physical AI (v10, Octopus v2)

Fine-tuned [`google/functiongemma-270m-it`](https://huggingface.co/google/functiongemma-270m-it)
for voice-controlled physical-AI / household-IoT actions on a Synaptics
SL2619 "Coral" edge board (Google I/O 2026 demo).

**Current revision:** [`functiongemma-physical-ai-v10-Q5_K_M.gguf`](./functiongemma-physical-ai-v10-Q5_K_M.gguf):
6 tools, ~248 MB at Q5_K_M, ~0.48 s cold prefill on the 2-core
Cortex-A55, and 97.9 % mean token accuracy on the eval set.

Schema ships as [`tools.json`](./tools.json). Token-to-tool mapping is
in [`token_map.json`](./token_map.json).

## Tool surface (6 tools)

| Token | Name | Args | Purpose |
|---|---|---|---|
| `<tool_0>` | `set_lights` | `color?`, `effect?`, `state?` | Drive whatever lights are connected: the HAT's three indicator LEDs or a WLED-driven addressable strip or ring. All three args are optional; the model emits only the ones the user implied. |
| `<tool_1>` | `play_buzzer` | `pattern` | Named pattern on the piezo buzzer: `beep`, `double_beep`, `chirp`, `siren`, `alarm`, `success`, `error`. |
| `<tool_2>` | `set_alarm` | `duration` or `time`, `label?` | Schedule an alarm. Fires the buzzer plus a visible flash. |
| `<tool_3>` | `cancel_alarm` | `label?` | Cancel one alarm by label, or all if no label given. |
| `<tool_4>` | `get_system_status` | `metric` | `cpu`, `memory`, `temperature`, `npu`, or `all`. |
| `<tool_5>` | `respond` | `message` | Natural-language reply when no physical-action tool fits, or when the request is ambiguous and the model needs to ask for clarification. |
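The table above is enough to sketch the kind of argument check a dispatcher might run before acting on a parsed call. This is a minimal illustration only: `TOOL_SCHEMA` and `validate_call` are hypothetical names, hand-copied from the table rather than loaded from the shipped `tools.json` (whose layout is not reproduced here). The table does not enumerate values for `color`, `effect`, `state`, `duration`, `time`, `label`, or `message`, so those are treated as free-form strings, and treating `duration` / `time` as mutually exclusive is an assumption.

```python
from typing import Optional

# Hypothetical schema hand-copied from the tool table (NOT the tools.json layout).
# "args" maps each accepted arg to its allowed values; None means free-form text.
BUZZER_PATTERNS = {"beep", "double_beep", "chirp", "siren", "alarm", "success", "error"}
METRICS = {"cpu", "memory", "temperature", "npu", "all"}

TOOL_SCHEMA = {
    "set_lights": {"args": {"color": None, "effect": None, "state": None}, "required": []},
    "play_buzzer": {"args": {"pattern": BUZZER_PATTERNS}, "required": ["pattern"]},
    "set_alarm": {"args": {"duration": None, "time": None, "label": None}, "required": []},
    "cancel_alarm": {"args": {"label": None}, "required": []},
    "get_system_status": {"args": {"metric": METRICS}, "required": ["metric"]},
    "respond": {"args": {"message": None}, "required": ["message"]},
}


def validate_call(tool: Optional[str], kwargs: dict[str, str]) -> list[str]:
    """Return a list of problems with a parsed call; an empty list means it looks valid."""
    if tool not in TOOL_SCHEMA:
        return [f"unknown tool: {tool!r}"]
    spec = TOOL_SCHEMA[tool]
    problems = [f"missing required arg: {n}" for n in spec["required"] if n not in kwargs]
    for name, value in kwargs.items():
        if name not in spec["args"]:
            problems.append(f"unexpected arg: {name}")
        elif spec["args"][name] is not None and value not in spec["args"][name]:
            problems.append(f"bad value for {name}: {value!r}")
    # Assumption: set_alarm takes exactly one of duration / time.
    if tool == "set_alarm" and ("duration" in kwargs) == ("time" in kwargs):
        problems.append("set_alarm needs exactly one of duration or time")
    return problems


print(validate_call("play_buzzer", {"pattern": "siren"}))     # []
print(validate_call("get_system_status", {"metric": "gpu"}))  # ["bad value for metric: 'gpu'"]
```

Returning a list of problems rather than raising lets the caller decide whether to reject the call, log it, or fall back to a `respond()` clarification.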

The model is **hardware-agnostic** for lighting: it parses user intent
into semantic args (`color`, `effect`, `state`) and leaves it to the
dispatcher to map those onto whatever LED hardware is detected at launch:
the HAT's three indicator LEDs, a WLED-driven strip, or a Neopixel ring.
The user vocabulary is hardware-agnostic too: "lights", "LEDs", "strip",
and "indicators" all refer to whatever is wired up.
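One way to keep that split is to route every `set_lights` call through a small backend interface chosen at launch. The sketch below is hypothetical: `LightBackend`, `HatLeds`, `WledStrip`, and `dispatch_set_lights` are illustrative stand-ins that return a description of what they would do instead of touching hardware, and they are not the demo's real integration code (see the Links section). How each backend degrades unsupported args (for example, ignoring effects on the HAT) is an assumption.

```python
from typing import Optional, Protocol


class LightBackend(Protocol):
    """Hypothetical interface that each LED driver implements."""

    def apply(self, color: Optional[str], effect: Optional[str], state: Optional[str]) -> str: ...


class HatLeds:
    """Stand-in for the HAT's three indicator LEDs; describes instead of driving GPIO."""

    def apply(self, color: Optional[str], effect: Optional[str], state: Optional[str]) -> str:
        if state == "off":
            return "hat: all LEDs off"
        # Assumptions: the indicators cannot animate, so effects are ignored,
        # and a request with no color falls back to white.
        return f"hat: LEDs {color or 'white'}"


class WledStrip:
    """Stand-in for a WLED-driven strip or ring; describes instead of sending commands."""

    def apply(self, color: Optional[str], effect: Optional[str], state: Optional[str]) -> str:
        if state == "off":
            return "wled: off"
        parts = [p for p in (color, effect) if p]
        return "wled: " + (" ".join(parts) or "on")


def dispatch_set_lights(backend: LightBackend, kwargs: dict[str, str]) -> str:
    """Pass the model's semantic args through unchanged; only the backend differs."""
    return backend.apply(kwargs.get("color"), kwargs.get("effect"), kwargs.get("state"))


# Same model output, two different boards.
call = {"color": "red", "effect": "sparkle"}
print(dispatch_set_lights(WledStrip(), call))  # wled: red sparkle
print(dispatch_set_lights(HatLeds(), call))    # hat: LEDs red
```

Because the model never sees which backend is active, swapping hardware needs no retraining, only a different backend object.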

## Prompt format

The v10 model is trained in the
[Octopus v2](https://arxiv.org/abs/2404.01744) style: the prompt carries
no schema and no tools list, just a bare user turn.

```
<start_of_turn>user
{user_text}<end_of_turn>
<start_of_turn>model

```

Tool semantics live in the model weights (via the special functional
tokens `<tool_0>` … `<tool_5>` plus `<end>`), not in the prompt. The
`tools.json` schema in this repo is the dispatcher's arg-validation
contract and is embedded in the GGUF metadata for schema-drift checks,
but it is **not** loaded into the inference prompt. Typical prompts are
~13 tokens.
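A schema-drift check can be as simple as comparing a canonical form of the local `tools.json` with the copy embedded in the GGUF. The sketch below assumes the embedded copy has already been read out of the GGUF metadata as a JSON string; the metadata key it lives under is not documented here, so that step is left out. `schema_fingerprint` and `has_schema_drift` are hypothetical helper names. Encoding with sorted keys and fixed separators makes the comparison insensitive to key order and whitespace.

```python
import hashlib
import json


def schema_fingerprint(schema) -> str:
    """SHA-256 of a canonical JSON encoding (sorted keys, compact separators)."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def has_schema_drift(local_schema, embedded_json: str) -> bool:
    """True if the dispatcher's schema differs from the one baked into the GGUF."""
    return schema_fingerprint(local_schema) != schema_fingerprint(json.loads(embedded_json))


# Key order and whitespace do not count as drift; a changed value does.
local = {"name": "set_lights", "args": ["color", "effect", "state"]}
print(has_schema_drift(local, '{"args": ["color","effect","state"], "name": "set_lights"}'))  # False
print(has_schema_drift(local, '{"name": "set_lights", "args": ["color"]}'))  # True
```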

## Output format: functional tokens, named args

Tool calls emit as **functional tokens with named arguments**, per the
Mercedes-Benz Octopus v2 convention
([arXiv 2501.02342](https://arxiv.org/abs/2501.02342)). Each tool name
compiles to a single special-vocabulary token (`<tool_0>` … `<tool_5>`);
arguments are written as `name="value"` pairs; a single `<end>` token
terminates the call. The model emits **only the args the user implied**;
absent args are simply omitted.

Examples:

| User says | Model emits | Resolves to |
|---|---|---|
| `turn the lights red` | `<tool_0>(color="red")<end>` | `set_lights(color="red")` |
| `rainbow on the strip` | `<tool_0>(effect="rainbow")<end>` | `set_lights(effect="rainbow")` |
| `lights off` | `<tool_0>(state="off")<end>` | `set_lights(state="off")` |
| `red sparkle` | `<tool_0>(color="red", effect="sparkle")<end>` | `set_lights(color="red", effect="sparkle")` |
| `set an alarm in 5 minutes` | `<tool_2>(duration="5 minutes")<end>` | `set_alarm(duration="5 minutes")` |
| `cancel all alarms` | `<tool_3>()<end>` | `cancel_alarm()` |
| `what's the cpu` | `<tool_4>(metric="cpu")<end>` | `get_system_status(metric="cpu")` |
| `good morning` | `<tool_5>(message="Good morning. ...")<end>` | `respond(message="...")` |

A typical single-tool call decodes in only a few output tokens (3–8 in
the on-device benchmark below), roughly 0.3–0.8 s of decode at ~9.7 tok/s
on the 2-core Cortex-A55; longer `respond()` messages take a few seconds.

> ⚠️ Inference servers MUST stop generation on `<end_of_turn>` (or
> `<eos>`), NOT on `<end>`. The model can emit multi-tool sequences
> `<tool_A>(args)<end><tool_B>(args)<end>`, so stopping at the first
> `<end>` truncates legitimate multi-tool output.
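Because one response can carry several calls, a dispatcher has to split it before reading arguments. A minimal sketch, assuming every call is well-formed and every argument value is double-quoted as in the examples above; `parse_calls`, `CALL_RE`, and `ARG_RE` are hypothetical names. It returns the raw functional tokens, which `token_map.json` can then map to tool names.

```python
import re

# One call: functional token, parenthesised body, then the <end> token.
# The body is a run of non-quote/non-paren characters or double-quoted strings,
# so parentheses inside a quoted value (e.g. a respond() message) are allowed.
CALL_RE = re.compile(r'(<tool_\d+>)\(((?:[^"()]|"(?:[^"\\]|\\.)*")*)\)<end>')
# name="value" pairs; negated classes also match newlines, so multi-line values parse.
ARG_RE = re.compile(r'(\w+)\s*=\s*"((?:[^"\\]|\\.)*)"')


def parse_calls(raw: str) -> list[tuple[str, dict[str, str]]]:
    """Split a response into (functional_token, kwargs) pairs, in emission order."""
    return [(tok, dict(ARG_RE.findall(body))) for tok, body in CALL_RE.findall(raw)]


print(parse_calls('<tool_0>(color="red")<end><tool_1>(pattern="success")<end>'))
# [('<tool_0>', {'color': 'red'}), ('<tool_1>', {'pattern': 'success'})]
```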

## Quick start (Ollama)

```bash
hf download BrinqAI/functiongemma-270m-physical-ai \
  functiongemma-physical-ai-v10-Q5_K_M.gguf Modelfile tools.json token_map.json \
  --local-dir ./fg-physical-ai

cd fg-physical-ai
ollama create functiongemma-physical-ai -f Modelfile
```

The shipped `Modelfile` bakes in the stop tokens (`<end_of_turn>`,
`<eos>`) and decode parameters (`temperature=0`, `num_ctx=1024`,
`num_predict=80`).

## Calling the model

Send a **bare user turn** with no schema and no tools list. With Ollama,
set `"raw": true` so the server does not apply its own prompt template:

```python
import json
import re
import urllib.request

OLLAMA_URL = "http://localhost:11434"
MODEL = "functiongemma-physical-ai"

with open("token_map.json") as f:
    # "reverse" maps each functional token (e.g. "<tool_0>") to its tool name.
    reverse_token_map = json.load(f)["reverse"]

NAMED_ARG_RE = re.compile(r'(\w+)\s*=\s*"((?:[^"\\]|\\.)*)"')


def build_prompt(user_text: str) -> str:
    return (
        f"<start_of_turn>user\n{user_text}<end_of_turn>\n"
        f"<start_of_turn>model\n"
    )


def call_model(user_text: str) -> str:
    body = json.dumps({
        "model": MODEL,
        "prompt": build_prompt(user_text),
        "raw": True,
        "stream": False,
        "options": {
            "temperature": 0.0,
            "top_p": 1.0,
            "num_predict": 80,
            "stop": ["<end_of_turn>", "<eos>"],
        },
    }).encode()
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())["response"]


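def parse_call(raw: str) -> tuple[str | None, dict[str, str]]:
    """Return (tool_name, kwargs) for the first call; tool_name is None on parse fail."""
    # DOTALL lets the args span newlines (e.g. multi-line respond() messages).
    m = re.match(r"\s*(<tool_\d+>)\((.*?)\)<end>", raw, re.DOTALL)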
    if not m:
        return None, {}
    tok, body = m.group(1), m.group(2)
    kwargs = {k: v for k, v in NAMED_ARG_RE.findall(body)}
    return reverse_token_map.get(tok), kwargs


raw = call_model("turn the lights red")
print(raw)               # e.g. '<tool_0>(color="red")<end>'
print(parse_call(raw))   # ('set_lights', {'color': 'red'})
```

When using `llama-cpp-python` directly, call `detokenize(..., special=True)`
so the `<tool_N>` and `<end>` tokens appear in the output instead of being
stripped.
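A minimal llama-cpp-python sketch of the same flow, under stated assumptions: the GGUF sits in the working directory (`MODEL_PATH` is a placeholder), decoding is greedy to match `temperature=0`, generation stops on `<eos>` or `<end_of_turn>` rather than `<end>` (per the warning above), and the 80-token cap mirrors the Modelfile's `num_predict`. It tokenizes the prompt with `special=True` so the turn markers become single control tokens, and detokenizes with `special=True` so `<tool_N>` and `<end>` survive. `generate_raw` is a hypothetical helper name, not part of the released files.

```python
MODEL_PATH = "functiongemma-physical-ai-v10-Q5_K_M.gguf"  # assumed local path
MAX_NEW_TOKENS = 80  # mirrors num_predict in the shipped Modelfile


def build_prompt(user_text: str) -> str:
    # Bare user turn; tokenize(add_bos=True) below prepends <bos>.
    return f"<start_of_turn>user\n{user_text}<end_of_turn>\n<start_of_turn>model\n"


def generate_raw(user_text: str, model_path: str = MODEL_PATH) -> str:
    from llama_cpp import Llama  # imported lazily so build_prompt works without it

    llm = Llama(model_path=model_path, n_ctx=1024, n_threads=2, verbose=False)
    prompt = llm.tokenize(build_prompt(user_text).encode("utf-8"), add_bos=True, special=True)
    end_of_turn = llm.tokenize(b"<end_of_turn>", add_bos=False, special=True)
    assert len(end_of_turn) == 1, "expected <end_of_turn> to be a single token"
    # Stop on <eos> or <end_of_turn>, never on <end>: output may hold several calls.
    stop_ids = {llm.token_eos(), end_of_turn[0]}
    out = []
    for tok in llm.generate(prompt, temp=0.0, top_p=1.0):  # temp=0.0 -> greedy
        if tok in stop_ids or len(out) >= MAX_NEW_TOKENS:
            break
        out.append(tok)
    # special=True keeps <tool_N> and <end> in the decoded text.
    return llm.detokenize(out, special=True).decode("utf-8", errors="replace")
```

Loading the model inside `generate_raw` keeps the sketch short; a real service would construct `Llama` once at startup, since model load dominates first-turn latency in the benchmark below.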

## Training data

Training data was generated from Haiku-authored phrasing templates
crossed with deterministic entity pools, then lightly augmented with
Moonshine-flavored ASR noise (dropped function words, lowercased traces,
filler-word prepends). Each record is a flat `{input, output}` pair,
with no tools or messages array and no chat template.

| Property | Value |
|---|---|
| Train rows | 5,222 |
| Eval rows | 920 |
| Tools | 6 |
| Per-template entity expansion | color × effect × state pools for `set_lights`; pattern pool for `play_buzzer`; duration / time pools for `set_alarm`; metric pool for `get_system_status` |
| ASR-style augmentation | Moonshine-sim noise on a fraction of records (dropped articles, lowercased traces, filler prepends) |
| Multi-tool fraction | None: single-tool emphasis; multi-tool routines are composed at dispatch time |

The training set also includes explicit **failure-mode rows** that
route bare, ambiguous prompts to `respond()`: e.g. "rainbow" alone
("Did you mean the lights? Try 'rainbow on the lights'."), "siren" alone
(steers the user toward `play_buzzer`), and bare "on" / "off" (asks what
the user wants to act on).
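The template × entity-pool expansion, ASR-style noise, and failure-mode rows described above can be sketched in a few lines. Everything concrete here is hypothetical: the templates, pools, noise probabilities, and the `FILLERS` / `FUNCTION_WORDS` lists are illustrative, not the project's actual generator or data; only the "rainbow" message is taken from this README. Records use the flat `{input, output}` shape and the functional-token output format.

```python
import itertools
import random

# Hypothetical templates and pools; the real templates were Haiku-authored.
TEMPLATES = ["turn the lights {color}", "make the strip {color}", "set the lights to {color}"]
COLORS = ["red", "blue", "green"]
FILLERS = ["um", "uh", "okay"]
FUNCTION_WORDS = {"the", "a", "an", "to"}

# Failure-mode row: a bare effect name routes to respond().
FAILURE_ROWS = [
    {"input": "rainbow",
     "output": "<tool_5>(message=\"Did you mean the lights? Try 'rainbow on the lights'.\")<end>"},
]


def expand() -> list[dict[str, str]]:
    """Cross every template with every pool entry into flat {input, output} records."""
    return [
        {"input": t.format(color=c), "output": f'<tool_0>(color="{c}")<end>'}
        for t, c in itertools.product(TEMPLATES, COLORS)
    ]


def asr_noise(text: str, rng: random.Random) -> str:
    """Noise sketch: drop ~half the function words, lowercase, sometimes prepend a filler."""
    words = [w for w in text.split() if w.lower() not in FUNCTION_WORDS or rng.random() < 0.5]
    noisy = " ".join(words).lower()
    if rng.random() < 0.3:
        noisy = f"{rng.choice(FILLERS)} {noisy}"
    return noisy


def build_dataset(noise_fraction: float = 0.2, seed: int = 0) -> list[dict[str, str]]:
    """Expand templates, noise a fraction of inputs (targets stay clean), add failure rows."""
    rng = random.Random(seed)
    records = expand()
    for r in records:
        if rng.random() < noise_fraction:
            r["input"] = asr_noise(r["input"], rng)
    return records + [dict(r) for r in FAILURE_ROWS]
```

Noise touches only the `input` side, so the model learns to map messy transcripts onto the same clean call.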

## Methodology

- **Full bf16 fine-tune** (no LoRA).
- **Functional tokens**: `<tool_0>` … `<tool_5>` + `<end>` added as
  `additional_special_tokens`; new embeddings **mean-initialized** from
  the existing input-embedding matrix (random init under-converges on
  small datasets at this scale).
- **Completion-only loss mask**: hand-rolled; labels before
  `<start_of_turn>model\n` are masked to `-100`, so the model learns only
  from the assistant turn, not the user prompt.
- **5 epochs**, lr `3e-5`, cosine schedule, warmup ratio 0.1, weight decay
  0.01.
- **Effective batch = 16**
  (`per_device_train_batch_size=8 × gradient_accumulation_steps=2`).
- **`max_length=256`**: the trained prompt format is ~13 tokens and the
  assistant turn fits comfortably under 64 tokens, including `respond()`
  messages.
- bf16, gradient checkpointing, `adamw_torch_fused`,
  `metric_for_best_model="eval_loss"` + `load_best_model_at_end=True`.
- Training wallclock: **5 min on a single H100** (~15–20 min on a 4090).
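The hand-rolled completion-only mask amounts to finding where the assistant turn starts and setting every earlier label to `-100`. A framework-free sketch, assuming token ids are plain Python lists and that the caller supplies the ids of `<start_of_turn>model\n` as produced by the model's tokenizer (the ids in the usage line are made up). Masking the marker itself, so the model is never asked to predict its own turn header, is a choice made here; `completion_only_labels` is a hypothetical name.

```python
IGNORE_INDEX = -100  # PyTorch cross-entropy skips this label value by default


def find_subsequence(seq: list[int], sub: list[int]) -> int:
    """Index of the first occurrence of `sub` in `seq`, or -1 if absent."""
    for i in range(len(seq) - len(sub) + 1):
        if seq[i : i + len(sub)] == sub:
            return i
    return -1


def completion_only_labels(input_ids: list[int], response_marker: list[int]) -> list[int]:
    """Copy input_ids as labels, masking everything up to and including the marker."""
    start = find_subsequence(input_ids, response_marker)
    if start < 0:
        # No assistant turn found: mask the whole example rather than train on the prompt.
        return [IGNORE_INDEX] * len(input_ids)
    cut = start + len(response_marker)
    return [IGNORE_INDEX] * cut + input_ids[cut:]


print(completion_only_labels([2, 10, 11, 5, 7, 42, 43], [5, 7]))
# [-100, -100, -100, -100, -100, 42, 43]
```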

### Citation

```bibtex
@article{chen2024octopusv2,
  title   = {Octopus v2: On-device language model for super agent},
  author  = {Chen, Wei and Li, Zhiyuan},
  journal = {arXiv preprint arXiv:2404.01744},
  year    = {2024},
  url     = {https://arxiv.org/abs/2404.01744}
}

@article{merc2025octopusv2,
  title   = {Octopus v2 named-arg function calling},
  journal = {arXiv preprint arXiv:2501.02342},
  year    = {2025},
  url     = {https://arxiv.org/abs/2501.02342}
}
```

## Results

### Training metrics (final epoch)

| Metric | Value |
|---|---|
| Final train loss | 0.493 |
| Final eval loss | **0.046** |
| Mean token accuracy (eval) | **97.9 %** |

### Held-out smoke test (post-train, 36 prompts spanning all 6 tools)

| Metric | Value |
|---|---|
| Smoke-test routing accuracy | **35 / 36 (97.2 %)** |

The 36-prompt suite covers single-tool happy paths for every tool plus
failure modes the model is expected to deflect: ambiguous color words
without a target ("make it red"), effect names without a target
("rainbow"), unsupported features ("play a tone at 2000 hz"), and
out-of-scope appliances. Failure-mode prompts all route to `respond()`
with a helpful clarification message.
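A harness in the style of the smoke test only needs `(prompt, expected_tool)` pairs and a function that maps a prompt to a tool name, for example `parse_call(call_model(prompt))[0]` from the Ollama example above. This sketch is hypothetical: the cases are a handful of prompts quoted elsewhere in this README (the 36-prompt suite itself is not reproduced here), and `stub_route` stands in for the model so the block runs on its own.

```python
from typing import Callable, Optional

# Illustrative cases taken from prompts quoted in this README; not the real suite.
CASES = [
    ("turn the lights red", "set_lights"),
    ("what's the cpu", "get_system_status"),
    ("cancel all alarms", "cancel_alarm"),
    ("rainbow", "respond"),                 # failure mode: effect with no target
    ("play a tone at 2000 hz", "respond"),  # failure mode: unsupported feature
]


def routing_accuracy(cases: list[tuple[str, str]], route: Callable[[str], Optional[str]]):
    """Return (correct, total, misses); misses lists (prompt, expected, got)."""
    misses = []
    for prompt, expected in cases:
        got = route(prompt)
        if got != expected:
            misses.append((prompt, expected, got))
    return len(cases) - len(misses), len(cases), misses


def stub_route(prompt: str) -> Optional[str]:
    # Stand-in for the real model; deliberately wrong on "rainbow".
    table = {"turn the lights red": "set_lights", "what's the cpu": "get_system_status",
             "cancel all alarms": "cancel_alarm", "rainbow": "set_lights"}
    return table.get(prompt, "respond")


correct, total, misses = routing_accuracy(CASES, stub_route)
print(f"{correct} / {total} ({100 * correct / total:.1f} %)")  # 4 / 5 (80.0 %)
print(misses)  # [('rainbow', 'respond', 'set_lights')]
```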

### On-device benchmark (Coralboard, 2-core Cortex-A55 @ 2 GHz, Q5_K_M GGUF)

Measured with `llama-cpp-python` 0.3.16, `n_ctx=1024`, `n_threads=2`,
CPU governor `performance`, 8 representative prompts spanning all 6
tools.

| Measurement | Value |
|---|---|
| Model load | 2.23 s |
| Prompt tokens | 11–16 (mean ~13) |
| **Cold prefill (turn 1)** | **0.48 s** |
| Warm prefill (turn 2+, avg) | 0.47 s |
| Decode rate | **~9.7 tok/s** |
| Decode time, typical tool call (3–8 output tokens) | 0.3–0.8 s |
| Decode time, `respond()` (~25 output tokens) | ~2.6 s |
| End-to-end first turn (model load + prefill + decode) | ~3.4 s |
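The decode rows follow from simple arithmetic: decode time ≈ output tokens ÷ decode rate, and a first turn adds model load and cold prefill on top. The helpers below (hypothetical names) only reproduce that estimate from the table's own numbers; they ignore tokenization and sampling overhead, so treat them as a rough planning aid rather than a measurement.

```python
MODEL_LOAD_S = 2.23     # "Model load" row
COLD_PREFILL_S = 0.48   # "Cold prefill (turn 1)" row
DECODE_TOK_PER_S = 9.7  # "Decode rate" row


def decode_time_s(output_tokens: int) -> float:
    """Decode-only time at the measured rate."""
    return output_tokens / DECODE_TOK_PER_S


def first_turn_s(output_tokens: int) -> float:
    """Model load + cold prefill + decode for the very first request."""
    return MODEL_LOAD_S + COLD_PREFILL_S + decode_time_s(output_tokens)


print(f"{decode_time_s(3):.2f}-{decode_time_s(8):.2f} s")  # 0.31-0.82 s (typical tool call)
print(f"{decode_time_s(25):.2f} s")                         # 2.58 s (respond())
print(f"{first_turn_s(3):.2f}-{first_turn_s(8):.2f} s")    # 3.02-3.53 s (first turn)
```

The first-turn range brackets the table's ~3.4 s end-to-end figure, and it shows that model load, not decoding, dominates that number.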

## Files

```
functiongemma-physical-ai-v10-Q5_K_M.gguf  # ~248 MB, Q5_K_M weights (Ollama / llama.cpp)
Modelfile                                  # Ollama Modelfile (functional-token format)
tools.json                                 # 6-tool schema, canonical mobile-actions format
token_map.json                             # functional-token <-> tool-name map
README.md                                  # this file
```

Earlier checkpoint GGUFs from the project's development history
(`functiongemma-physical-ai-v9-Q5_K_M.gguf`,
`functiongemma-physical-ai-v7-Q5_K_M.gguf`,
`functiongemma-physical-ai-v6-Q5_K_M.gguf`,
`functiongemma-physical-ai-Q4_K_M.gguf`) remain in the repo for
reproducibility. They use different tool surfaces and (for v7 and
earlier) a different inference-prompt format; new deployments should use
the v10 file above.

## License

Released under the [Gemma Terms of Use](https://ai.google.dev/gemma/terms).
By using this model you agree to those terms. Base model:
[`google/functiongemma-270m-it`](https://huggingface.co/google/functiongemma-270m-it).

## Links

- Base model: <https://huggingface.co/google/functiongemma-270m-it>
- Octopus v2 paper: <https://arxiv.org/abs/2404.01744>
- Mercedes-Benz Octopus v2 (named-arg variant): <https://arxiv.org/abs/2501.02342>
- Hardware demo + integration code (Synaptics Coralboard, Grinn HAT,
  WLED-over-USB-CDC, full PyQt UI):
  <https://github.com/synaptics-astra-demos/sl2610-examples> →
  `Function_calling/`