---
license: gemma
license_link: https://ai.google.dev/gemma/terms
base_model: google/functiongemma-270m-it
language:
  - en
tags:
  - function-calling
  - edge
  - on-device
  - physical-ai
  - iot
  - octopus-v2
  - synaptics-sl2619
  - gemma3
pipeline_tag: text-generation
inference: false
---

# FunctionGemma 270M — Physical AI (v10, Octopus v2)

Fine-tuned [`google/functiongemma-270m-it`](https://huggingface.co/google/functiongemma-270m-it)
for voice-controlled physical-AI / household-IoT actions on a Synaptics
SL2619 "Coral" edge board (Google IO 2026 demo).

**Current revision:** [`functiongemma-physical-ai-v10-Q5_K_M.gguf`](./functiongemma-physical-ai-v10-Q5_K_M.gguf)
— 6 tools, ~248 MB Q5_K_M, ~0.48 s cold prefill on the 2-core
Cortex-A55, 97.9 % mean token accuracy on eval.

Schema ships as [`tools.json`](./tools.json). Token-to-tool mapping is
in [`token_map.json`](./token_map.json).

## Tool surface (6 tools)

| Token | Name | Args | Purpose |
|---|---|---|---|
| `<tool_0>` | `set_lights` | `color?`, `effect?`, `state?` | Drive whatever lights are connected: the HAT's three indicator LEDs or a WLED-driven addressable strip or ring. All three args are optional; the model emits only what the user implied. |
| `<tool_1>` | `play_buzzer` | `pattern` | Named pattern on the piezo buzzer: `beep`, `double_beep`, `chirp`, `siren`, `alarm`, `success`, `error`. |
| `<tool_2>` | `set_alarm` | `duration` or `time`, `label?` | Schedule an alarm. Fires the buzzer plus a visible flash. |
| `<tool_3>` | `cancel_alarm` | `label?` | Cancel one alarm by label, or all if no label given. |
| `<tool_4>` | `get_system_status` | `metric` | `cpu`, `memory`, `temperature`, `npu`, or `all`. |
| `<tool_5>` | `respond` | `message` | Natural-language reply when no physical-action tool fits, or when the request is ambiguous and the model needs to ask for clarification. |

The model is **hardware-agnostic** for lighting: it parses user intent
into semantic args (`color`, `effect`, `state`) and leaves it to the
dispatcher to map those onto whatever LED hardware is detected at launch
(the HAT's three indicator LEDs, a WLED-driven strip, or a NeoPixel ring).
The user's vocabulary is equally generic: "lights", "LEDs", "strip", and
"indicators" all refer to whatever is wired up.

## Prompt format

The v10 model is trained in the
[Octopus v2](https://arxiv.org/abs/2404.01744) style: the prompt carries
no schema and no tools list, just a bare user turn.

```
<start_of_turn>user
{user_text}<end_of_turn>
<start_of_turn>model

```

Tool semantics live in the model weights (via the special functional
tokens `<tool_0>` … `<tool_5>` plus `<end>`), not in the prompt. The
`tools.json` schema in this repo is the dispatcher's arg-validation
contract and is embedded in the GGUF metadata for schema-drift checks,
but it is **not** loaded into the inference prompt. Typical prompts are
~13 tokens.
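On the dispatcher side, that contract boils down to checking each parsed call before acting on it. A minimal sketch: the inline `SCHEMA` is a hypothetical, hand-trimmed stand-in that mirrors the tool table above (the shipped `tools.json` uses the mobile-actions layout, which is not reproduced here), and `validate` is an illustrative name.

```python
# Hypothetical stand-in for tools.json, mirroring the tool table.
SCHEMA = {
    "set_lights": {"optional": {"color", "effect", "state"}},
    "play_buzzer": {
        "required": {"pattern"},
        "enum": {"pattern": {"beep", "double_beep", "chirp", "siren", "alarm", "success", "error"}},
    },
    "set_alarm": {"one_of": {"duration", "time"}, "optional": {"label"}},
    "cancel_alarm": {"optional": {"label"}},
    "get_system_status": {
        "required": {"metric"},
        "enum": {"metric": {"cpu", "memory", "temperature", "npu", "all"}},
    },
    "respond": {"required": {"message"}},
}


def validate(tool: str, kwargs: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the call is dispatchable."""
    spec = SCHEMA.get(tool)
    if spec is None:
        return [f"unknown tool {tool!r}"]
    given = set(kwargs)
    required = spec.get("required", set())
    one_of = spec.get("one_of", set())
    allowed = required | one_of | spec.get("optional", set())
    errors = [f"unexpected arg {k!r}" for k in kwargs if k not in allowed]
    errors += [f"missing arg {k!r}" for k in sorted(required - given)]
    # e.g. set_alarm needs exactly one of duration / time
    if one_of and len(one_of & given) != 1:
        errors.append(f"need exactly one of {sorted(one_of)}")
    for name, values in spec.get("enum", {}).items():
        if name in kwargs and kwargs[name] not in values:
            errors.append(f"bad value {kwargs[name]!r} for {name!r}")
    return errors
```

A call that fails validation can be turned into a `respond()`-style clarification instead of reaching the hardware.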

## Output format — functional tokens, named args

Tool calls are emitted as **functional tokens with named arguments**, per the
Mercedes-Benz Octopus v2 convention
([arXiv 2501.02342](https://arxiv.org/abs/2501.02342)). Each tool name
compiles to a single special-vocabulary token (`<tool_0>` … `<tool_5>`);
arguments are written as `name="value"` pairs; a single `<end>` token
terminates the call. The model emits **only the args the user implied**
— absent args are simply not present.

Examples:

| User says | Model emits | Resolves to |
|---|---|---|
| `turn the lights red` | `<tool_0>(color="red")<end>` | `set_lights(color="red")` |
| `rainbow on the strip` | `<tool_0>(effect="rainbow")<end>` | `set_lights(effect="rainbow")` |
| `lights off` | `<tool_0>(state="off")<end>` | `set_lights(state="off")` |
| `red sparkle` | `<tool_0>(color="red", effect="sparkle")<end>` | `set_lights(color="red", effect="sparkle")` |
| `set an alarm in 5 minutes` | `<tool_2>(duration="5 minutes")<end>` | `set_alarm(duration="5 minutes")` |
| `cancel all alarms` | `<tool_3>()<end>` | `cancel_alarm()` |
| `what's the cpu` | `<tool_4>(metric="cpu")<end>` | `get_system_status(metric="cpu")` |
| `good morning` | `<tool_5>(message="Good morning. ...")<end>` | `respond(message="...")` |
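The target strings in the table are regular enough to generate mechanically, for example when building training rows. A minimal sketch: `TOKEN_FOR` mirrors the tool table, while backslash-escaping quotes inside values is an assumption (it matches the escape handling in the client regex under **Calling the model**, not a documented rule). `emit_call` and `quote` are illustrative names.

```python
# Tool name -> functional token, mirroring the tool table.
TOKEN_FOR = {
    "set_lights": "<tool_0>",
    "play_buzzer": "<tool_1>",
    "set_alarm": "<tool_2>",
    "cancel_alarm": "<tool_3>",
    "get_system_status": "<tool_4>",
    "respond": "<tool_5>",
}


def quote(value: str) -> str:
    """Escape backslashes, then double quotes (assumed escaping convention)."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def emit_call(tool: str, **kwargs: str) -> str:
    """Render one call as <tool_N>(name="value", ...)<end>, keeping only the given args."""
    args = ", ".join(f'{name}="{quote(value)}"' for name, value in kwargs.items())
    return f"{TOKEN_FOR[tool]}({args})<end>"
```

For instance, `emit_call("set_lights", color="red", effect="sparkle")` reproduces the `red sparkle` row, and `emit_call("cancel_alarm")` yields the empty-args form.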

A typical tool call decodes in roughly 3–8 output tokens (0.3–0.8 s at the
measured ~9.7 tok/s on the 2-core Cortex-A55); a `respond()` reply runs
longer, around 25 tokens (~2.6 s).

> ⚠️ Inference servers MUST stop generation on `<end_of_turn>` (or
> `<eos>`), NOT on `<end>`. The model can emit multi-tool sequences
> `<tool_A>(args)<end><tool_B>(args)<end>`, so stopping at the first
> `<end>` truncates legitimate multi-tool output.
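Client code should therefore be ready for more than one call per response. A sketch of a parser that returns every call in emission order; `parse_calls`, `unescape`, and the regexes are illustrative, and un-escaping `\"` inside values is an assumption about how quotes are encoded.

```python
import re

# One functional-token call: <tool_N>( ... )<end>. DOTALL lets a respond()
# message span lines; the lazy body stops at the first ")<end>".
CALL_RE = re.compile(r"(<tool_\d+>)\((.*?)\)<end>", re.DOTALL)
NAMED_ARG_RE = re.compile(r'(\w+)\s*=\s*"((?:[^"\\]|\\.)*)"')


def unescape(value: str) -> str:
    """Turn \\" back into " and \\\\ back into \\ (assumed escaping scheme)."""
    return re.sub(r"\\(.)", r"\1", value)


def parse_calls(raw: str) -> list[tuple[str, dict[str, str]]]:
    """Return every (functional token, kwargs) pair in the order emitted."""
    return [
        (tok, {k: unescape(v) for k, v in NAMED_ARG_RE.findall(body)})
        for tok, body in CALL_RE.findall(raw)
    ]
```

Each token can then be mapped to a tool name through the `reverse` table in `token_map.json` and dispatched in sequence.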

## Quick start (Ollama)

```bash
hf download BrinqAI/functiongemma-270m-physical-ai \
  functiongemma-physical-ai-v10-Q5_K_M.gguf Modelfile tools.json token_map.json \
  --local-dir ./fg-physical-ai

cd fg-physical-ai
ollama create functiongemma-physical-ai -f Modelfile
```

The shipped `Modelfile` bakes in the stop tokens (`<end_of_turn>`,
`<eos>`) and decode parameters (`temperature=0`, `num_ctx=1024`,
`num_predict=80`).

## Calling the model

Send a **bare user turn** with no schema and no tools list. With Ollama,
set `"raw": true` so the server does not apply its own prompt template:

```python
import json
import re
import urllib.request

OLLAMA_URL = "http://localhost:11434"
MODEL = "functiongemma-physical-ai"

# Functional token -> tool name, e.g. "<tool_0>" -> "set_lights".
with open("token_map.json") as f:
    reverse_token_map = json.load(f)["reverse"]

NAMED_ARG_RE = re.compile(r'(\w+)\s*=\s*"((?:[^"\\]|\\.)*)"')


def build_prompt(user_text: str) -> str:
    return (
        f"<start_of_turn>user\n{user_text}<end_of_turn>\n"
        f"<start_of_turn>model\n"
    )


def call_model(user_text: str) -> str:
    body = json.dumps({
        "model": MODEL,
        "prompt": build_prompt(user_text),
        "raw": True,
        "stream": False,
        "options": {
            "temperature": 0.0,
            "top_p": 1.0,
            "num_predict": 80,
            "stop": ["<end_of_turn>", "<eos>"],
        },
    }).encode()
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())["response"]


def parse_call(raw: str) -> tuple[str | None, dict[str, str]]:
    """Return (tool_name, kwargs) for the first call; tool_name is None on parse failure."""
    # DOTALL so a respond() message that spans lines still parses.
    m = re.match(r"\s*(<tool_\d+>)\((.*?)\)<end>", raw, re.DOTALL)
    if not m:
        return None, {}
    tok, body = m.group(1), m.group(2)
    kwargs = dict(NAMED_ARG_RE.findall(body))
    return reverse_token_map.get(tok), kwargs


raw = call_model("turn the lights red")
print(raw)               # e.g. '<tool_0>(color="red")<end>'
print(parse_call(raw))   # ('set_lights', {'color': 'red'})
```

For `llama-cpp-python` directly, use `detokenize(..., special=True)` so
the `<tool_N>` and `<end>` tokens render in the output instead of being
stripped.
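A minimal sketch of that path, assuming the `llama-cpp-python` 0.3.x `Llama` methods `tokenize(..., add_bos=, special=)`, `generate(..., temp=)`, `token_eos()`, and `detokenize(..., special=True)`. `generate_raw` is a hypothetical helper; it greedy-decodes and stops on `<end_of_turn>` or EOS, never on `<end>`.

```python
def generate_raw(llm, user_text: str, max_tokens: int = 80) -> str:
    """Greedy-decode one bare user turn with a llama_cpp.Llama-like object."""
    prompt = (
        f"<start_of_turn>user\n{user_text}<end_of_turn>\n"
        f"<start_of_turn>model\n"
    )
    # special=True parses the turn markers as single special tokens, not text.
    tokens = llm.tokenize(prompt.encode("utf-8"), add_bos=True, special=True)
    # Stop on <end_of_turn> or EOS only; <end> merely closes one call.
    end_of_turn = llm.tokenize(b"<end_of_turn>", add_bos=False, special=True)[-1]
    stop_ids = {end_of_turn, llm.token_eos()}

    out: list[int] = []
    for tok in llm.generate(tokens, temp=0.0):
        if tok in stop_ids or len(out) >= max_tokens:
            break
        out.append(tok)
    # special=True keeps <tool_N> and <end> in the rendered text.
    return llm.detokenize(out, special=True).decode("utf-8", errors="replace")
```

Usage, assuming the v10 GGUF sits in the working directory: `llm = Llama(model_path="functiongemma-physical-ai-v10-Q5_K_M.gguf", n_ctx=1024, n_threads=2, verbose=False)`, then `generate_raw(llm, "turn the lights red")`.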

## Training data

Training data was generated from Haiku-authored phrasing templates
crossed with deterministic entity pools, then lightly augmented with
Moonshine-flavored ASR noise (dropped function words, lowercased traces,
filler-word prepends). Each record is a flat `{input, output}` pair —
no tools / messages array, no chat template.
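A rough sketch of the record format and the noise step. The article list, filler pool, and probabilities are hypothetical placeholders, not the actual Moonshine-sim settings, and `asr_noise` / `make_record` are illustrative names.

```python
import random

# Hypothetical pools and rates.
ARTICLES = {"a", "an", "the"}
FILLERS = ["um", "uh", "so", "okay"]


def asr_noise(text: str, rng: random.Random, p_drop: float = 0.5, p_filler: float = 0.3) -> str:
    """Imitate ASR output: lowercase, randomly drop articles, maybe prepend a filler."""
    words = text.lower().split()
    # rng.random() is only drawn for articles, thanks to short-circuiting.
    words = [w for w in words if w not in ARTICLES or rng.random() >= p_drop]
    if rng.random() < p_filler:
        words.insert(0, rng.choice(FILLERS))
    return " ".join(words)


def make_record(user_text: str, target: str, rng: random.Random, p_noisy: float = 0.3) -> dict:
    """One flat {input, output} pair; only a fraction of inputs receive noise."""
    noisy = asr_noise(user_text, rng) if rng.random() < p_noisy else user_text
    return {"input": noisy, "output": target}
```

Passing a seeded `random.Random` keeps the augmented dataset reproducible between runs.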

|  |  |
|---|---|
| Train rows | 5,222 |
| Eval rows | 920 |
| Tools | 6 |
| Per-template entity expansion | color × effect × state pools for `set_lights`; pattern pool for `play_buzzer`; duration / time pools for `set_alarm`; metric pool for `get_system_status` |
| ASR-style augmentation | Moonshine-sim noise on a fraction of records (dropped articles, lowercased traces, filler prepends) |
| Multi-tool fraction | None — single-tool emphasis; multi-tool routines composed at dispatch time |
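The per-template entity expansion amounts to a Cartesian product over whichever pools a template references. A minimal sketch with hypothetical templates and tiny pools (the real Haiku-authored templates and pool contents are not published in this repo); `expand` is an illustrative name.

```python
import itertools
import string

# Hypothetical pools; insertion order fixes the order args appear in the target.
POOLS = {
    "color": ["red", "blue"],
    "effect": ["rainbow", "sparkle"],
    "state": ["on", "off"],
}


def expand(template: str, token: str = "<tool_0>") -> list[dict[str, str]]:
    """Cross one phrasing template with every entity pool it references."""
    referenced = {name for _, name, _, _ in string.Formatter().parse(template) if name}
    fields = [f for f in POOLS if f in referenced]  # canonical arg order
    rows = []
    for values in itertools.product(*(POOLS[f] for f in fields)):
        args = dict(zip(fields, values))
        arg_text = ", ".join(f'{k}="{v}"' for k, v in args.items())
        rows.append({"input": template.format(**args), "output": f"{token}({arg_text})<end>"})
    return rows
```

Because args follow the pool order rather than the template's word order, `"{effect} in {color}"` and `"{color} {effect}"` produce identical targets, which keeps the output side of the dataset consistent.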

The `set_lights` tool also gets explicit **failure-mode rows** that
route bare ambiguous prompts to `respond()` — e.g. "rainbow" alone
("Did you mean the lights? Try 'rainbow on the lights'."), "siren" alone
(prompts the user toward `play_buzzer`), and bare "on" / "off"
(asks what the user wants to act on).

## Methodology

- **Full bf16 fine-tune** (no LoRA).
- **Functional tokens**: `<tool_0>` … `<tool_5>` + `<end>` added as
  `additional_special_tokens`; new embeddings **mean-initialized** from
  the existing input-embedding matrix (random init under-converges on
  small datasets at this scale).
- **Completion-only loss mask**: hand-rolled — labels before
  `<start_of_turn>model\n` are masked to `-100`. The model learns only
  from the assistant turn, not the user prompt.
- **5 epochs**, lr `3e-5`, cosine schedule, warmup ratio 0.1, weight decay
  0.01.
- **Effective batch = 16**
  (`per_device_train_batch_size=8 × gradient_accumulation_steps=2`).
- **`max_length=256`** — the trained prompt format is ~13 tokens and
  the assistant turn fits comfortably under 64 tokens, including
  `respond()` messages.
- bf16, gradient checkpointing, `adamw_torch_fused`,
  `metric_for_best_model="eval_loss"` + `load_best_model_at_end=True`.
- Training wallclock: **5 min on a single H100** (~15–20 min on a 4090).
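Two of the steps above are small enough to show in plain Python. This is a sketch, not the training script: a real run would apply the same logic to token-id tensors, and the mean-init would overwrite the rows that `model.resize_token_embeddings(len(tokenizer))` appends after `tokenizer.add_special_tokens(...)`. The function names and toy ids are hypothetical, and masking the marker itself (not only what precedes it) is an assumption.

```python
def completion_only_labels(input_ids: list[int], marker: list[int]) -> list[int]:
    """Copy input_ids to labels, masking everything up to and including the
    first occurrence of the `<start_of_turn>model\\n` marker ids with -100.
    (Hugging Face causal-LM models shift labels internally, so no manual shift.)"""
    n = len(marker)
    for start in range(len(input_ids) - n + 1):
        if input_ids[start:start + n] == marker:
            cut = start + n
            return [-100] * cut + input_ids[cut:]
    # No marker found: mask the whole row so it contributes no loss.
    return [-100] * len(input_ids)


def mean_init_new_rows(embeddings: list[list[float]], n_new: int) -> list[list[float]]:
    """Append n_new rows, each equal to the mean of the existing rows."""
    dim = len(embeddings[0])
    count = len(embeddings)
    mean = [sum(row[d] for row in embeddings) / count for d in range(dim)]
    return embeddings + [list(mean) for _ in range(n_new)]
```

Mean-initialized rows start near the center of the existing embedding distribution, so the new functional tokens begin with typical-norm vectors rather than random noise.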

### Citation

```bibtex
@article{chen2024octopusv2,
  title   = {Octopus v2: On-device language model for super agent},
  author  = {Chen, Wei and Li, Zhiyuan},
  journal = {arXiv preprint arXiv:2404.01744},
  year    = {2024},
  url     = {https://arxiv.org/abs/2404.01744}
}

@article{merc2025octopusv2,
  title   = {Octopus v2 named-arg function calling},
  journal = {arXiv preprint arXiv:2501.02342},
  year    = {2025},
  url     = {https://arxiv.org/abs/2501.02342}
}
```

## Results

### Training metrics (final epoch)

|  |  |
|---|---|
| Final train loss | 0.493 |
| Final eval loss | **0.046** |
| Mean token accuracy (eval) | **97.9 %** |

### Held-out smoke test (post-train, 36 prompts spanning all 6 tools)

|  |  |
|---|---|
| Smoke-test routing accuracy | **35 / 36 (97.2 %)** |

The 36-prompt suite covers single-tool happy paths for every tool plus
failure modes the model is expected to deflect: ambiguous color words
without a target ("make it red"), effect names without a target
("rainbow"), unsupported features ("play a tone at 2000 hz"), and
out-of-scope appliances. Failure-mode prompts all route to `respond()`
with a helpful clarification message.
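Scoring a suite like this only needs the routed tool name per prompt. A sketch: `SUITE` is a hypothetical four-prompt excerpt built from examples in this card, and `route` is any callable returning a tool name, for instance `lambda p: parse_call(call_model(p))[0]` with the client helpers from **Calling the model**.

```python
from typing import Callable, Optional

# Hypothetical excerpt: (prompt, expected tool name).
SUITE = [
    ("turn the lights red", "set_lights"),
    ("what's the cpu", "get_system_status"),
    ("make it red", "respond"),  # color word with no target
    ("rainbow", "respond"),      # effect name with no target
]


def routing_accuracy(
    route: Callable[[str], Optional[str]],
    suite: list[tuple[str, str]] = SUITE,
) -> tuple[int, int, list[str]]:
    """Return (hits, total, missed prompts), judging only the routed tool name."""
    misses = [prompt for prompt, expected in suite if route(prompt) != expected]
    return len(suite) - len(misses), len(suite), misses
```

This checks routing only; argument correctness (and the wording of `respond()` messages) needs a separate comparison.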

### On-device benchmark (Coralboard, 2-core Cortex-A55 @ 2 GHz, Q5_K_M GGUF)

Measured with `llama-cpp-python` 0.3.16, `n_ctx=1024`, `n_threads=2`,
CPU governor `performance`, 8 representative prompts spanning all 6
tools.

|  |  |
|---|---|
| Model load | 2.23 s |
| Prompt tokens | 11–16 (mean ~13) |
| **Cold prefill (turn 1)** | **0.48 s** |
| Warm prefill (turn 2+, avg) | 0.47 s |
| Decode rate | **~9.7 tok/s** |
| Decode time, typical tool call (3–8 output tokens) | 0.3–0.8 s |
| Decode time, `respond()` (~25 output tokens) | ~2.6 s |
| End-to-end first turn (model load + prefill + decode) | ~3.4 s |
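Reading the table as a simple linear model gives a quick budget check. This assumes prefill and decode times simply add and that the decode rate stays flat across a call; `estimate_turn_s` is an illustrative name.

```python
# Figures from the benchmark table (Coralboard, Q5_K_M, n_threads=2).
LOAD_S = 2.23
PREFILL_S = {"cold": 0.48, "warm": 0.47}
DECODE_TOK_PER_S = 9.7


def estimate_turn_s(out_tokens: int, turn: str = "warm", include_load: bool = False) -> float:
    """Linear latency model: optional model load + prefill + out_tokens / decode rate."""
    total = PREFILL_S[turn] + out_tokens / DECODE_TOK_PER_S
    return total + (LOAD_S if include_load else 0.0)
```

Under this model a warm 8-token tool call lands around 1.3 s, a warm 25-token `respond()` around 3.05 s, and a cold first turn with a ~7-token call near 3.43 s, consistent with the measured ~3.4 s end-to-end figure.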

## Files

```
functiongemma-physical-ai-v10-Q5_K_M.gguf  # ~248 MB, Q5_K_M weights (Ollama / llama.cpp)
Modelfile                                  # Ollama Modelfile (functional-token format)
tools.json                                 # 6-tool schema, canonical mobile-actions format
token_map.json                             # functional-token <-> tool-name map
README.md                                  # this file
```

Earlier checkpoint GGUFs from the project's development history
(`functiongemma-physical-ai-v9-Q5_K_M.gguf`,
`functiongemma-physical-ai-v7-Q5_K_M.gguf`,
`functiongemma-physical-ai-v6-Q5_K_M.gguf`,
`functiongemma-physical-ai-Q4_K_M.gguf`) remain in the repo for
reproducibility. They use different tool surfaces and (for v7 and
earlier) a different inference-prompt format; new deployments should use
the v10 file above.

## License

Released under the [Gemma Terms of Use](https://ai.google.dev/gemma/terms).
By using this model you agree to those terms. Base model:
[`google/functiongemma-270m-it`](https://huggingface.co/google/functiongemma-270m-it).

## Links

- Base model: <https://huggingface.co/google/functiongemma-270m-it>
- Octopus v2 paper: <https://arxiv.org/abs/2404.01744>
- Mercedes-Benz Octopus v2 (named-arg variant): <https://arxiv.org/abs/2501.02342>
- Hardware demo + integration code (Synaptics Coralboard, Grinn HAT,
  WLED-over-USB-CDC, full PyQt UI):
  <https://github.com/synaptics-astra-demos/sl2610-examples> →
  `Function_calling/`