---
license: gemma
license_link: https://ai.google.dev/gemma/terms
base_model: google/functiongemma-270m-it
language:
- en
tags:
- function-calling
- edge
- on-device
- physical-ai
- iot
- octopus-v2
- synaptics-sl2619
- gemma3
pipeline_tag: text-generation
inference: false
---

# FunctionGemma 270M — Physical AI (v10, Octopus v2)

Fine-tuned [`google/functiongemma-270m-it`](https://huggingface.co/google/functiongemma-270m-it) for voice-controlled physical-AI / household-IoT actions on a Synaptics SL2619 "Coral" edge board (Google I/O 2026 demo).

**Current revision:** [`functiongemma-physical-ai-v10-Q5_K_M.gguf`](./functiongemma-physical-ai-v10-Q5_K_M.gguf): 6 tools, ~248 MB Q5_K_M, ~0.48 s cold prefill on the 2-core Cortex-A55, 97.9 % mean token accuracy on eval. The schema ships as [`tools.json`](./tools.json); the token-to-tool mapping is in [`token_map.json`](./token_map.json).

## Tool surface (6 tools)

| Token | Name | Args | Purpose |
|---|---|---|---|
| `` | `set_lights` | `color?`, `effect?`, `state?` | Drive whatever lights are connected: HAT 3-LED indicators or a WLED-driven addressable strip / ring. All three args are optional; the model emits only what the user implied. |
| `` | `play_buzzer` | `pattern` | Named pattern on the piezo buzzer: `beep`, `double_beep`, `chirp`, `siren`, `alarm`, `success`, `error`. |
| `` | `set_alarm` | `duration` or `time`, `label?` | Schedule an alarm. Fires the buzzer plus a visible flash. |
| `` | `cancel_alarm` | `label?` | Cancel one alarm by label, or all alarms if no label is given. |
| `` | `get_system_status` | `metric` | `cpu`, `memory`, `temperature`, `npu`, or `all`. |
| `` | `respond` | `message` | Natural-language reply when no physical-action tool fits, or when the request is ambiguous and the model needs to ask for clarification. |

The model is **hardware-agnostic** for lighting: it parses user intent into semantic args (`color`, `effect`, `state`) and leaves the dispatcher to map those onto whatever LED hardware is detected at launch (the HAT's three indicator LEDs, a WLED-driven strip, or a Neopixel ring). The user vocabulary is hardware-agnostic too: "lights", "LEDs", "strip", and "indicators" all refer to whatever is wired up.

## Prompt format

The v10 model is trained [Octopus v2](https://arxiv.org/abs/2404.01744) style: no schema, no tools list, just a bare user turn.

```
user
{user_text}
model
```

Tool semantics live in the model weights (via the special functional tokens `` … `` plus ``), not in the prompt. The `tools.json` schema in this repo is the dispatcher's arg-validation contract and is embedded in the GGUF metadata for schema-drift checks, but it is **not** loaded into the inference prompt. Typical prompts are ~13 tokens.

## Output format — functional tokens, named args

Tool calls are emitted as **functional tokens with named arguments**, following the Mercedes-Benz Octopus v2 convention ([arXiv 2501.02342](https://arxiv.org/abs/2501.02342)). Each tool name compiles to a single special-vocabulary token (`` … ``); arguments are written as `name="value"` pairs; a single `` token terminates the call. The model emits **only the args the user implied**; absent args are simply omitted.
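Because the model can emit several calls back to back, a dispatcher has to split the output on functional tokens instead of assuming a single call. The sketch below is illustrative only: `parse_calls` and `_call_regex` are hypothetical helpers, it assumes the token strings come from the `reverse` map in `token_map.json` (functional token to tool name), and the `<tok_…>` strings in the usage line are made-up placeholders, not the model's real tokens.

```python
import re

# name="value" pairs; values may contain backslash-escaped characters such as \".
NAMED_ARG_RE = re.compile(r'(\w+)\s*=\s*"((?:[^"\\]|\\.)*)"')


def _call_regex(reverse_token_map: dict[str, str]) -> re.Pattern[str]:
    """Build a regex matching <functional token>(args) for every known token."""
    # Longest tokens first so a token is never shadowed by a shorter prefix of it.
    tokens = sorted(reverse_token_map, key=len, reverse=True)
    alternation = "|".join(re.escape(t) for t in tokens)
    # The arg list may contain parentheses, but only inside quoted values.
    args = r'(?:[^()"]|"(?:[^"\\]|\\.)*")*'
    return re.compile(rf"({alternation})\(({args})\)")


def parse_calls(raw: str, reverse_token_map: dict[str, str]) -> list[tuple[str, dict[str, str]]]:
    """Return every (tool_name, kwargs) call found in raw model output, in order."""
    calls = []
    for m in _call_regex(reverse_token_map).finditer(raw):
        tok, body = m.group(1), m.group(2)
        # Unescape \" and \\ inside the quoted values.
        kwargs = {k: re.sub(r"\\(.)", r"\1", v) for k, v in NAMED_ARG_RE.findall(body)}
        calls.append((reverse_token_map[tok], kwargs))
    return calls


# Hypothetical placeholder tokens; the real ones are listed in token_map.json.
demo_map = {"<tok_a>": "set_lights", "<tok_b>": "play_buzzer", "<tok_r>": "respond"}
print(parse_calls('<tok_a>(color="red", effect="sparkle")<tok_b>(pattern="beep")', demo_map))
# [('set_lights', {'color': 'red', 'effect': 'sparkle'}), ('play_buzzer', {'pattern': 'beep'})]
```

Allowing parentheses only inside quoted values keeps a `respond()` message such as `"Try 'rainbow on the lights' (or 'lights off')."` from ending the match early.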
Examples:

| User says | Model emits | Resolves to |
|---|---|---|
| `turn the lights red` | `(color="red")` | `set_lights(color="red")` |
| `rainbow on the strip` | `(effect="rainbow")` | `set_lights(effect="rainbow")` |
| `lights off` | `(state="off")` | `set_lights(state="off")` |
| `red sparkle` | `(color="red", effect="sparkle")` | `set_lights(color="red", effect="sparkle")` |
| `set an alarm in 5 minutes` | `(duration="5 minutes")` | `set_alarm(duration="5 minutes")` |
| `cancel all alarms` | `()` | `cancel_alarm()` |
| `what's the cpu` | `(metric="cpu")` | `get_system_status(metric="cpu")` |
| `good morning` | `(message="Good morning. ...")` | `respond(message="...")` |

A typical tool call decodes in 3–8 output tokens, or 0.3–0.8 s at the measured ~9.7 tok/s on the 2-core Cortex-A55, which fits a sub-second voice-UX budget; longer `respond()` messages (~25 output tokens) take ~2.6 s.

> ⚠️ Inference servers MUST stop generation on `` (or
> ``), NOT on ``. The model can emit multi-tool sequences
> `(args)(args)`, so stopping at the first
> `` truncates legitimate multi-tool output.

## Quick start (Ollama)

```bash
hf download BrinqAI/functiongemma-270m-physical-ai \
  functiongemma-physical-ai-v10-Q5_K_M.gguf Modelfile tools.json token_map.json \
  --local-dir ./fg-physical-ai
cd fg-physical-ai
ollama create functiongemma-physical-ai -f Modelfile
```

The shipped `Modelfile` bakes in the stop tokens (``, ``) and the decode parameters (`temperature=0`, `num_ctx=1024`, `num_predict=80`).

## Calling the model

Send a **bare user turn**: no schema, no tools list.
With Ollama, use `raw=true`:

```python
import json
import re
import urllib.request

OLLAMA_URL = "http://localhost:11434"
MODEL = "functiongemma-physical-ai"

# token_map.json's "reverse" map goes functional token -> tool name.
with open("token_map.json") as f:
    reverse_token_map = json.load(f)["reverse"]

# name="value" pairs; values may contain backslash-escaped characters.
NAMED_ARG_RE = re.compile(r'(\w+)\s*=\s*"((?:[^"\\]|\\.)*)"')

# A known functional token (longest first, so no token is shadowed by a shorter
# prefix of it) followed by its argument list. Parentheses are allowed only
# inside quoted values, e.g. in respond(message="...").
CALL_RE = re.compile(
    r"\s*("
    + "|".join(re.escape(t) for t in sorted(reverse_token_map, key=len, reverse=True))
    + r')\(((?:[^()"]|"(?:[^"\\]|\\.)*")*)\)'
)


def build_prompt(user_text: str) -> str:
    # Bare user turn: no schema, no tools list.
    return (
        f"user\n{user_text}\n"
        f"model\n"
    )


def call_model(user_text: str) -> str:
    body = json.dumps({
        "model": MODEL,
        "prompt": build_prompt(user_text),
        "raw": True,  # send the prompt verbatim, bypassing Ollama's template
        "stream": False,
        "options": {
            "temperature": 0.0,  # greedy decoding
            "top_p": 1.0,
            "num_predict": 80,
            "stop": ["", ""],
        },
    }).encode()
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())["response"]


def parse_call(raw: str) -> tuple[str | None, dict[str, str]]:
    """Return (tool_name, kwargs) for the first call. tool_name is None on parse failure."""
    m = CALL_RE.match(raw)
    if not m:
        return None, {}
    tok, body = m.group(1), m.group(2)
    kwargs = dict(NAMED_ARG_RE.findall(body))
    return reverse_token_map.get(tok), kwargs


raw = call_model("turn the lights red")
print(raw)              # e.g. '(color="red")'
print(parse_call(raw))  # ('set_lights', {'color': 'red'})
```

When using `llama-cpp-python` directly, call `detokenize(..., special=True)` so the `` and `` tokens are rendered in the output instead of being stripped.

## Training data

Training data was generated from Haiku-authored phrasing templates crossed with deterministic entity pools, then lightly augmented with Moonshine-flavored ASR noise (dropped function words, lowercased traces, filler-word prepends). Each record is a flat `{input, output}` pair: no tools / messages array, no chat template.
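The generation pipeline itself is not included here, so the following is only a minimal sketch of the "templates × entity pools, then ASR noise" idea under loudly stated assumptions: the templates, color pool, function-word and filler lists, noise probabilities, and the `<set_lights_tok>` / `<end_tok>` placeholders are all hypothetical (the real token strings live in `token_map.json`), and `expand`, `asr_noise`, and `augment` are illustrative helpers, not the project's code.

```python
import itertools
import random

# Hypothetical placeholders; the real functional tokens are listed in token_map.json.
SET_LIGHTS_TOKEN = "<set_lights_tok>"
END_TOKEN = "<end_tok>"

# Hypothetical phrasing templates and entity pool.
TEMPLATES = ["turn the lights {color}", "make the strip {color}", "set the leds to {color}"]
COLORS = ["red", "green", "blue"]

# Hypothetical noise vocabularies for the ASR-style augmentation.
FUNCTION_WORDS = {"the", "a", "an", "to", "please"}
FILLERS = ["um", "uh", "okay"]


def expand(templates: list[str], colors: list[str]) -> list[dict[str, str]]:
    """Cross every template with every color into flat {input, output} records."""
    return [
        {
            "input": tpl.format(color=color),
            "output": f'{SET_LIGHTS_TOKEN}(color="{color}"){END_TOKEN}',
        }
        for tpl, color in itertools.product(templates, colors)
    ]


def asr_noise(text: str, rng: random.Random, p_drop: float = 0.3, p_filler: float = 0.2) -> str:
    """Drop some function words, lowercase, and maybe prepend a filler word."""
    kept = [w for w in text.split() if not (w.lower() in FUNCTION_WORDS and rng.random() < p_drop)]
    noisy = " ".join(kept or text.split()).lower()
    if rng.random() < p_filler:
        noisy = f"{rng.choice(FILLERS)} {noisy}"
    return noisy


def augment(records: list[dict[str, str]], rng: random.Random, fraction: float = 0.25) -> list[dict[str, str]]:
    """Keep every clean record and add a noisy copy of roughly `fraction` of them."""
    out: list[dict[str, str]] = []
    for rec in records:
        out.append(rec)
        if rng.random() < fraction:
            # Only the input is perturbed; the target call stays exact.
            out.append({"input": asr_noise(rec["input"], rng), "output": rec["output"]})
    return out


records = augment(expand(TEMPLATES, COLORS), random.Random(0))
```

Perturbing only the input while keeping the target untouched is what teaches a model to map a noisy transcript such as "um turn lights red" onto the same clean call.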
| | |
|---|---|
| Train rows | 5,222 |
| Eval rows | 920 |
| Tools | 6 |
| Per-template entity expansion | color × effect × state pools for `set_lights`; pattern pool for `play_buzzer`; duration / time pools for `set_alarm`; metric pool for `get_system_status` |
| ASR-style augmentation | Moonshine-sim noise on a fraction of records (dropped articles, lowercased traces, filler prepends) |
| Multi-tool fraction | None (single-tool emphasis; multi-tool routines are composed at dispatch time) |

The `set_lights` tool also gets explicit **failure-mode rows** that route bare, ambiguous prompts to `respond()`: e.g. "rainbow" alone ("Did you mean the lights? Try 'rainbow on the lights'."), "siren" alone (prompts the user toward `play_buzzer`), and bare "on" / "off" (asks what the user wants to act on).

## Methodology

- **Full bf16 fine-tune** (no LoRA).
- **Functional tokens**: `` … `` + `` added as `additional_special_tokens`; the new embeddings are **mean-initialized** from the existing input-embedding matrix (random init under-converges on small datasets at this scale).
- **Completion-only loss mask**: hand-rolled; labels before `model\n` are masked to `-100`, so the model learns only from the assistant turn, not the user prompt.
- **5 epochs**, lr `3e-5`, cosine schedule, 0.1 warmup, weight decay 0.01.
- **Effective batch = 16** (`per_device_train_batch_size=8 × gradient_accumulation_steps=2`).
- **`max_length=256`**: the trained prompt format is ~13 tokens and the assistant turn fits comfortably under 64 tokens, including `respond()` messages.
- bf16, gradient checkpointing, `adamw_torch_fused`, `metric_for_best_model="eval_loss"` + `load_best_model_at_end=True`.
- Training wallclock: **5 min on a single H100** (~15–20 min on a 4090).
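The completion-only mask can be sketched at the token-id level. This is a minimal illustration rather than the project's training code: it assumes the prompt/response boundary is located by searching for the token ids of the model-turn prefix (the `model\n` marker), it masks every position up to and including that marker, and `find_subsequence` / `completion_only_labels` are hypothetical helper names. The integer ids in the test are made up.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy (its default ignore_index)


def find_subsequence(seq: list[int], sub: list[int]) -> int:
    """Return the start index of the first occurrence of sub in seq, or -1."""
    for i in range(len(seq) - len(sub) + 1):
        if seq[i : i + len(sub)] == sub:
            return i
    return -1


def completion_only_labels(input_ids: list[int], response_marker: list[int]) -> list[int]:
    """Copy input_ids into labels, masking everything up to and including the marker."""
    start = find_subsequence(input_ids, response_marker)
    if start < 0:
        # No assistant turn found: mask the whole example so it contributes no loss.
        return [IGNORE_INDEX] * len(input_ids)
    boundary = start + len(response_marker)
    # Prompt positions get IGNORE_INDEX; the assistant turn keeps its real token ids.
    return [IGNORE_INDEX] * boundary + input_ids[boundary:]
```

With prompts of ~13 tokens and assistant turns under 64, most of each sequence's loss positions belong to the short tool call, which is what lets a 270M model learn the functional-token format from ~5k rows.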
### Citation

```bibtex
@article{chen2024octopusv2,
  title   = {Octopus v2: On-device language model for super agent},
  author  = {Chen, Wei and Li, Zhiyuan},
  journal = {arXiv preprint arXiv:2404.01744},
  year    = {2024},
  url     = {https://arxiv.org/abs/2404.01744}
}

@article{merc2025octopusv2,
  title   = {Octopus v2 named-arg function calling},
  journal = {arXiv preprint arXiv:2501.02342},
  year    = {2025},
  url     = {https://arxiv.org/abs/2501.02342}
}
```

## Results

### Training metrics (final epoch)

| | |
|---|---|
| Final train loss | 0.493 |
| Final eval loss | **0.046** |
| Mean token accuracy (eval) | **97.9 %** |

### Held-out smoke test (post-train, 36 prompts spanning all 6 tools)

| | |
|---|---|
| Smoke-test routing accuracy | **35 / 36 (97.2 %)** |

The 36-prompt suite covers single-tool happy paths for every tool, plus failure modes the model is expected to deflect: ambiguous color words without a target ("make it red"), effect names without a target ("rainbow"), unsupported features ("play a tone at 2000 hz"), and out-of-scope appliances. All failure-mode prompts route to `respond()` with a helpful clarification message.

### On-device benchmark (Coralboard, 2-core Cortex-A55 @ 2 GHz, Q5_K_M GGUF)

Measured with `llama-cpp-python` 0.3.16, `n_ctx=1024`, `n_threads=2`, CPU governor `performance`, over 8 representative prompts spanning all 6 tools.
| | |
|---|---|
| Model load | 2.23 s |
| Prompt tokens | 11–16 (mean ~13) |
| **Cold prefill (turn 1)** | **0.48 s** |
| Warm prefill (turn 2+, avg) | 0.47 s |
| Decode rate | **~9.7 tok/s** |
| Decode time, typical tool call (3–8 output tokens) | 0.3–0.8 s |
| Decode time, `respond()` (~25 output tokens) | ~2.6 s |
| End-to-end first turn (model load + prefill + decode) | ~3.4 s |

## Files

```
functiongemma-physical-ai-v10-Q5_K_M.gguf   # ~248 MB, Q5_K_M weights (Ollama / llama.cpp)
Modelfile                                   # Ollama Modelfile (functional-token format)
tools.json                                  # 6-tool schema, canonical mobile-actions format
token_map.json                              # functional-token <-> tool-name map
README.md                                   # this file
```

Earlier checkpoint GGUFs from the project's development history (`functiongemma-physical-ai-v9-Q5_K_M.gguf`, `functiongemma-physical-ai-v7-Q5_K_M.gguf`, `functiongemma-physical-ai-v6-Q5_K_M.gguf`, `functiongemma-physical-ai-Q4_K_M.gguf`) remain in the repo for reproducibility. They use different tool surfaces and (for v7 and earlier) a different inference-prompt format; new deployments should use the v10 file above.

## License

Released under the [Gemma Terms of Use](https://ai.google.dev/gemma/terms). By using this model you agree to those terms. Base model: [`google/functiongemma-270m-it`](https://huggingface.co/google/functiongemma-270m-it).

## Links

- Base model: [`google/functiongemma-270m-it`](https://huggingface.co/google/functiongemma-270m-it)
- Octopus v2 paper: [arXiv 2404.01744](https://arxiv.org/abs/2404.01744)
- Mercedes-Benz Octopus v2 (named-arg variant): [arXiv 2501.02342](https://arxiv.org/abs/2501.02342)
- Hardware demo + integration code (Synaptics Coralboard, Grinn HAT, WLED-over-USB-CDC, full PyQt UI): → `Function_calling/`