---
license: gemma
license_link: https://ai.google.dev/gemma/terms
base_model: google/functiongemma-270m-it
language:
- en
tags:
- function-calling
- edge
- on-device
- physical-ai
- iot
- octopus-v2
- synaptics-sl2619
- gemma3
pipeline_tag: text-generation
inference: false
---
# FunctionGemma 270M: Physical AI (v10, Octopus v2)
Fine-tuned [`google/functiongemma-270m-it`](https://huggingface.co/google/functiongemma-270m-it)
for voice-controlled physical-AI / household-IoT actions on a Synaptics
SL2619 "Coral" edge board (Google I/O 2026 demo).
**Current revision:** [`functiongemma-physical-ai-v10-Q5_K_M.gguf`](./functiongemma-physical-ai-v10-Q5_K_M.gguf),
with 6 tools, ~248 MB Q5_K_M, ~0.48 s cold prefill on the 2-core
Cortex-A55, and 97.9 % mean token accuracy on eval.
Schema ships as [`tools.json`](./tools.json). Token-to-tool mapping is
in [`token_map.json`](./token_map.json).
## Tool surface (6 tools)
| Token | Name | Args | Purpose |
|---|---|---|---|
| `<tool_0>` | `set_lights` | `color?`, `effect?`, `state?` | Drive whatever lights are connected: HAT 3-LED indicators or a WLED-driven addressable strip / ring. All three args optional; the model emits only what the user implied. |
| `<tool_1>` | `play_buzzer` | `pattern` | Named pattern on the piezo buzzer: `beep`, `double_beep`, `chirp`, `siren`, `alarm`, `success`, `error`. |
| `<tool_2>` | `set_alarm` | `duration` or `time`, `label?` | Schedule an alarm. Fires the buzzer plus a visible flash. |
| `<tool_3>` | `cancel_alarm` | `label?` | Cancel one alarm by label, or all if no label given. |
| `<tool_4>` | `get_system_status` | `metric` | `cpu`, `memory`, `temperature`, `npu`, or `all`. |
| `<tool_5>` | `respond` | `message` | Natural-language reply when no physical-action tool fits, or when the request is ambiguous and the model needs to ask for clarification. |
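
A dispatcher could check each parsed call against this table before touching hardware. The sketch below is only an illustration: the three dictionaries are hand-copied from the table rather than loaded from `tools.json` (whose layout is not assumed here), `validate_call` is a hypothetical helper name, and reading "`duration` or `time`" as "exactly one of the two" is an assumption.

```python
# Hypothetical validation tables, hand-copied from the tool table above.
# The shipped tools.json remains the real contract.
ALLOWED_ARGS = {
    "set_lights": {"color", "effect", "state"},
    "play_buzzer": {"pattern"},
    "set_alarm": {"duration", "time", "label"},
    "cancel_alarm": {"label"},
    "get_system_status": {"metric"},
    "respond": {"message"},
}
REQUIRED_ARGS = {
    "play_buzzer": {"pattern"},
    "get_system_status": {"metric"},
    "respond": {"message"},
}
ENUMS = {
    ("play_buzzer", "pattern"): {
        "beep", "double_beep", "chirp", "siren", "alarm", "success", "error",
    },
    ("get_system_status", "metric"): {"cpu", "memory", "temperature", "npu", "all"},
}


def validate_call(tool: str, kwargs: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the call looks valid."""
    if tool not in ALLOWED_ARGS:
        return [f"unknown tool: {tool}"]
    problems = []
    for name in sorted(set(kwargs) - ALLOWED_ARGS[tool]):
        problems.append(f"unexpected arg: {name}")
    for name in sorted(REQUIRED_ARGS.get(tool, set()) - set(kwargs)):
        problems.append(f"missing arg: {name}")
    # Assumption: set_alarm takes exactly one of duration / time.
    if tool == "set_alarm" and ("duration" in kwargs) == ("time" in kwargs):
        problems.append("set_alarm needs exactly one of duration or time")
    for name, value in kwargs.items():
        allowed = ENUMS.get((tool, name))
        if allowed is not None and value not in allowed:
            problems.append(f"invalid {name}: {value}")
    return problems
```

Returning a list of problems instead of raising lets the caller decide whether to drop the call or speak a clarification to the user.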
The model is **hardware-agnostic** for lighting: it parses user intent
into semantic args (`color`, `effect`, `state`) and leaves the dispatcher
to map those onto whatever LED hardware is detected at launch: the
HAT's three indicator LEDs, a WLED-driven strip, or a Neopixel ring. The
user vocabulary is hardware-agnostic too: "lights", "LEDs", "strip",
"indicators" all refer to whatever is wired up.
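
One way to picture that split is a small backend interface that receives the same semantic args regardless of hardware. This is a sketch, not the demo's dispatcher: `LightBackend`, `HatLeds`, `WledStrip`, and the fallback values (`white`, `solid`, `unchanged`) are all hypothetical, and the backends return a description string instead of driving real LEDs.

```python
from typing import Optional, Protocol


class LightBackend(Protocol):
    def apply(self, color: Optional[str], effect: Optional[str],
              state: Optional[str]) -> str:
        ...


class HatLeds:
    """Three indicator LEDs: no animated effects, so `effect` is ignored."""

    def apply(self, color, effect, state):
        if state == "off":
            return "hat: all LEDs off"
        return f"hat: LEDs on, color={color or 'white'}"


class WledStrip:
    """Addressable strip: supports both colors and named effects."""

    def apply(self, color, effect, state):
        if state == "off":
            return "wled: strip off"
        return f"wled: color={color or 'unchanged'}, effect={effect or 'solid'}"


def set_lights(backend: LightBackend, color=None, effect=None, state=None) -> str:
    # The model's output is identical for every backend; only this call differs.
    return backend.apply(color, effect, state)
```

With this shape, `set_lights(HatLeds(), color="red")` and `set_lights(WledStrip(), color="red")` consume the same model output and differ only in how the hardware layer interprets it.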
## Prompt format
The v10 model is trained in the
[Octopus v2](https://arxiv.org/abs/2404.01744) style: no schema, no
tools list, just a bare user turn.
```
<start_of_turn>user
{user_text}<end_of_turn>
<start_of_turn>model
```
Tool semantics live in the model weights (via the special functional
tokens `<tool_0>` … `<tool_5>` plus `<end>`), not in the prompt. The
`tools.json` schema in this repo is the dispatcher's arg-validation
contract and is embedded in the GGUF metadata for schema-drift checks,
but it is **not** loaded into the inference prompt. Typical prompts are
~13 tokens.
## Output format: functional tokens, named args
Tool calls emit as **functional tokens with named arguments**, per the
Mercedes-Benz Octopus v2 convention
([arXiv 2501.02342](https://arxiv.org/abs/2501.02342)). Each tool name
compiles to a single special-vocabulary token (`<tool_0>` … `<tool_5>`);
arguments are written as `name="value"` pairs; a single `<end>` token
terminates the call. The model emits **only the args the user implied**;
absent args are simply not present.
Examples:
| User says | Model emits | Resolves to |
|---|---|---|
| `turn the lights red` | `<tool_0>(color="red")<end>` | `set_lights(color="red")` |
| `rainbow on the strip` | `<tool_0>(effect="rainbow")<end>` | `set_lights(effect="rainbow")` |
| `lights off` | `<tool_0>(state="off")<end>` | `set_lights(state="off")` |
| `red sparkle` | `<tool_0>(color="red", effect="sparkle")<end>` | `set_lights(color="red", effect="sparkle")` |
| `set an alarm in 5 minutes` | `<tool_2>(duration="5 minutes")<end>` | `set_alarm(duration="5 minutes")` |
| `cancel all alarms` | `<tool_3>()<end>` | `cancel_alarm()` |
| `what's the cpu` | `<tool_4>(metric="cpu")<end>` | `get_system_status(metric="cpu")` |
| `good morning` | `<tool_5>(message="Good morning. ...")<end>` | `respond(message="...")` |
A complete call decodes in roughly 8–20 output tokens, well inside the
sub-second voice-UX budget on a 2-core Cortex-A55.
> ⚠️ Inference servers MUST stop generation on `<end_of_turn>` (or
> `<eos>`), NOT on `<end>`. The model can emit multi-tool sequences
> `<tool_A>(args)<end><tool_B>(args)<end>`, so stopping at the first
> `<end>` truncates legitimate multi-tool output.
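
Because several calls can arrive in one completion, a client would want to collect every `<tool_N>(...)<end>` group rather than only the first. A minimal sketch, assuming the output format shown in the examples table; `parse_calls` and `CALL_RE` are hypothetical names, and the result keeps the raw functional token so it does not depend on `token_map.json`:

```python
import re

# One functional-token call: <tool_N>( ...args... )<end>
CALL_RE = re.compile(r'(<tool_\d+>)\((.*?)\)<end>', re.DOTALL)
# name="value" pairs, allowing backslash-escaped characters inside the value.
NAMED_ARG_RE = re.compile(r'(\w+)\s*=\s*"((?:[^"\\]|\\.)*)"')


def parse_calls(raw: str) -> list[tuple[str, dict[str, str]]]:
    """Return every (token, kwargs) pair found in the raw output, in order."""
    return [
        (m.group(1), dict(NAMED_ARG_RE.findall(m.group(2))))
        for m in CALL_RE.finditer(raw)
    ]


raw = '<tool_0>(color="red")<end><tool_1>(pattern="beep")<end>'
for token, kwargs in parse_calls(raw):
    print(token, kwargs)
```

Text that does not match the call pattern is skipped silently here; a stricter client might treat leftover text as a parse failure instead.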
## Quick start (Ollama)
```bash
hf download BrinqAI/functiongemma-270m-physical-ai \
functiongemma-physical-ai-v10-Q5_K_M.gguf Modelfile tools.json token_map.json \
--local-dir ./fg-physical-ai
cd fg-physical-ai
ollama create functiongemma-physical-ai -f Modelfile
```
The shipped `Modelfile` bakes in the stop tokens (`<end_of_turn>`,
`<eos>`) and decode parameters (`temperature=0`, `num_ctx=1024`,
`num_predict=80`).
## Calling the model
Send a **bare user turn**: no schema, no tools list. With Ollama, use
`raw=true`:
```python
import json
import re
import urllib.request

OLLAMA_URL = "http://localhost:11434"
MODEL = "functiongemma-physical-ai"

# token_map.json ships with the model; its "reverse" table is keyed by token.
with open("token_map.json") as f:
    reverse_token_map = json.load(f)["reverse"]

# Matches name="value" pairs, allowing backslash-escaped characters.
NAMED_ARG_RE = re.compile(r'(\w+)\s*=\s*"((?:[^"\\]|\\.)*)"')


def build_prompt(user_text: str) -> str:
    return (
        f"<start_of_turn>user\n{user_text}<end_of_turn>\n"
        f"<start_of_turn>model\n"
    )


def call_model(user_text: str) -> str:
    body = json.dumps({
        "model": MODEL,
        "prompt": build_prompt(user_text),
        "raw": True,
        "stream": False,
        "options": {
            "temperature": 0.0,
            "top_p": 1.0,
            "num_predict": 80,
            "stop": ["<end_of_turn>", "<eos>"],
        },
    }).encode()
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())["response"]


def parse_call(raw: str) -> tuple[str | None, dict[str, str]]:
    """Return (tool_name, kwargs). tool_name is None on parse fail."""
    m = re.match(r"\s*(<tool_\d+>)\((.*?)\)<end>", raw)
    if not m:
        return None, {}
    tok, body = m.group(1), m.group(2)
    kwargs = {k: v for k, v in NAMED_ARG_RE.findall(body)}
    return reverse_token_map.get(tok), kwargs


raw = call_model("turn the lights red")
print(raw)              # e.g. '<tool_0>(color="red")<end>'
print(parse_call(raw))  # ('set_lights', {'color': 'red'})
```
For `llama-cpp-python` directly, use `detokenize(..., special=True)` so
the `<tool_N>` and `<end>` tokens render in the output instead of being
stripped.
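
As a rough illustration of that note, the sketch below drives `llama-cpp-python` at the token level. It is untested against this model and makes several assumptions: `generate_raw` is a hypothetical helper, the `n_ctx` / `n_threads` values are borrowed from the benchmark settings on this card, `add_bos=True` is a guess at what the model expects, and the `<end_of_turn>` id is looked up by tokenizing the string with `special=True`.

```python
def build_prompt(user_text: str) -> str:
    return f"<start_of_turn>user\n{user_text}<end_of_turn>\n<start_of_turn>model\n"


def generate_raw(model_path: str, user_text: str, max_tokens: int = 80) -> str:
    # Imported lazily so this module loads even without llama-cpp-python.
    from llama_cpp import Llama

    llm = Llama(model_path=model_path, n_ctx=1024, n_threads=2, verbose=False)

    # special=True parses <start_of_turn> etc. as single special tokens.
    prompt_ids = llm.tokenize(
        build_prompt(user_text).encode("utf-8"), add_bos=True, special=True
    )

    # Stop on <eos> / <end_of_turn>, never on <end> (multi-tool output).
    stop_ids = {llm.token_eos()}
    eot = llm.tokenize(b"<end_of_turn>", add_bos=False, special=True)
    if len(eot) == 1:
        stop_ids.add(eot[0])

    out_ids = []
    for tok in llm.generate(prompt_ids, temp=0.0):  # temp=0.0: greedy decode
        if tok in stop_ids or len(out_ids) >= max_tokens:
            break
        out_ids.append(tok)

    # special=True keeps <tool_N> and <end> visible in the decoded text.
    return llm.detokenize(out_ids, special=True).decode("utf-8", errors="replace")
```

Decoding the whole id list once at the end, rather than token by token, avoids splitting multi-byte UTF-8 sequences inside `respond()` messages.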
## Training data
Training data was generated from Haiku-authored phrasing templates
crossed with deterministic entity pools, then lightly augmented with
Moonshine-flavored ASR noise (dropped function words, lowercased traces,
filler-word prepends). Each record is a flat `{input, output}` pair:
no tools / messages array, no chat template.
| | |
|---|---|
| Train rows | 5,222 |
| Eval rows | 920 |
| Tools | 6 |
| Per-template entity expansion | color × effect × state pools for `set_lights`; pattern pool for `play_buzzer`; duration / time pools for `set_alarm`; metric pool for `get_system_status` |
| ASR-style augmentation | Moonshine-sim noise on a fraction of records (dropped articles, lowercased traces, filler prepends) |
| Multi-tool fraction | None (single-tool emphasis; multi-tool routines composed at dispatch time) |
The `set_lights` tool also gets explicit **failure-mode rows** that
route bare ambiguous prompts to `respond()`, e.g. "rainbow" alone
("Did you mean the lights? Try 'rainbow on the lights'."), "siren" alone
(prompts the user toward `play_buzzer`), and bare "on" / "off"
(asks what the user wants to act on).
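
The template-crossing and noise steps described in this section could look roughly like the following. The real templates, entity pools, and noise rates are not published in this card, so every constant here (`TEMPLATES`, `COLORS`, `FILLERS`, the 0.5 and 0.3 probabilities) is a made-up placeholder, and `expand` / `add_asr_noise` are hypothetical names.

```python
import itertools
import random

# Placeholder templates and pools; the project's real ones are not shown here.
TEMPLATES = ["turn the lights {color}", "make the strip {color} please"]
COLORS = ["red", "blue"]
FILLERS = ["um", "uh", "okay"]
DROPPABLE = {"the", "a", "an"}


def expand():
    """Cross every template with every entity into flat {input, output} rows."""
    for template, color in itertools.product(TEMPLATES, COLORS):
        yield {
            "input": template.format(color=color),
            "output": f'<tool_0>(color="{color}")<end>',
        }


def add_asr_noise(text: str, rng: random.Random) -> str:
    """Lowercase, randomly drop articles, and sometimes prepend a filler word."""
    words = [
        w for w in text.lower().split()
        if not (w in DROPPABLE and rng.random() < 0.5)
    ]
    if rng.random() < 0.3:
        words.insert(0, rng.choice(FILLERS))
    return " ".join(words)


rng = random.Random(0)
for row in expand():
    print(add_asr_noise(row["input"], rng), "->", row["output"])
```

Only the `input` side is perturbed; the `output` stays clean so the model learns to map noisy transcripts onto exact calls.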
## Methodology
- **Full bf16 fine-tune** (no LoRA).
- **Functional tokens**: `<tool_0>` … `<tool_5>` + `<end>` added as
`additional_special_tokens`; new embeddings **mean-initialized** from
the existing input-embedding matrix (random init under-converges on
small datasets at this scale).
- **Completion-only loss mask**: hand-rolled; labels before
`<start_of_turn>model\n` are masked to `-100`. The model learns only
from the assistant turn, not the user prompt.
- **5 epochs**, lr `3e-5`, cosine schedule, 0.1 warmup, weight decay
0.01.
- **Effective batch = 16**
(`per_device_train_batch_size=8 × gradient_accumulation_steps=2`).
- **`max_length=256`**: the trained prompt format is ~13 tokens and
the assistant turn fits comfortably under 64 tokens, including
`respond()` messages.
- bf16, gradient checkpointing, `adamw_torch_fused`,
`metric_for_best_model="eval_loss"` + `load_best_model_at_end=True`.
- Training wallclock: **5 min on a single H100** (~15–20 min on a 4090).
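
The completion-only mask from the list above can be sketched in plain Python on token-id lists. This is an illustration of the idea, not the project's training code: `mask_prompt_labels` and `find_subsequence` are hypothetical names, the token ids in the example are invented, and masking through the end of the marker (so the marker tokens contribute no loss either) is an assumption about where the boundary sits.

```python
IGNORE_INDEX = -100  # label value that cross-entropy loss skips


def find_subsequence(haystack: list[int], needle: list[int]) -> int:
    """Return the index of the first occurrence of needle, or -1."""
    n = len(needle)
    for i in range(len(haystack) - n + 1):
        if haystack[i:i + n] == needle:
            return i
    return -1


def mask_prompt_labels(input_ids: list[int], marker_ids: list[int]) -> list[int]:
    """Copy input_ids to labels, masking everything through the marker."""
    start = find_subsequence(input_ids, marker_ids)
    if start < 0:
        # No assistant turn found: mask the whole row so it adds no loss.
        return [IGNORE_INDEX] * len(input_ids)
    cut = start + len(marker_ids)
    return [IGNORE_INDEX] * cut + list(input_ids[cut:])


# Invented ids: [7, 8, 9] stands in for the tokenized "<start_of_turn>model\n".
labels = mask_prompt_labels([1, 2, 3, 7, 8, 9, 40, 41], [7, 8, 9])
print(labels)  # [-100, -100, -100, -100, -100, -100, 40, 41]
```

In a real pipeline `marker_ids` would come from tokenizing the marker string with the same tokenizer used for the training rows.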
### Citation
```bibtex
@article{chen2024octopusv2,
title = {Octopus v2: On-device language model for super agent},
author = {Chen, Wei and Li, Zhiyuan},
journal = {arXiv preprint arXiv:2404.01744},
year = {2024},
url = {https://arxiv.org/abs/2404.01744}
}
@article{merc2025octopusv2,
title = {Octopus v2 named-arg function calling},
journal = {arXiv preprint arXiv:2501.02342},
year = {2025},
url = {https://arxiv.org/abs/2501.02342}
}
```
## Results
### Training metrics (final epoch)
| | |
|---|---|
| Final train loss | 0.493 |
| Final eval loss | **0.046** |
| Mean token accuracy (eval) | **97.9 %** |
### Held-out smoke test (post-train, 36 prompts spanning all 6 tools)
| | |
|---|---|
| Smoke-test routing accuracy | **35 / 36 (97.2 %)** |
The 36-prompt suite covers single-tool happy paths for every tool plus
failure modes the model is expected to deflect: ambiguous color words
without a target ("make it red"), effect names without a target
("rainbow"), unsupported features ("play a tone at 2000 hz"), and
out-of-scope appliances. Failure-mode prompts all route to `respond()`
with a helpful clarification message.
### On-device benchmark (Coralboard, 2-core Cortex-A55 @ 2 GHz, Q5_K_M GGUF)
Measured with `llama-cpp-python` 0.3.16, `n_ctx=1024`, `n_threads=2`,
CPU governor `performance`, 8 representative prompts spanning all 6
tools.
| | |
|---|---|
| Model load | 2.23 s |
| Prompt tokens | 11–16 (mean ~13) |
| **Cold prefill (turn 1)** | **0.48 s** |
| Warm prefill (turn 2+, avg) | 0.47 s |
| Decode rate | **~9.7 tok/s** |
| Decode time, typical tool call (3–8 output tokens) | 0.3–0.8 s |
| Decode time, `respond()` (~25 output tokens) | ~2.6 s |
| End-to-end first turn (model load + prefill + decode) | ~3.4 s |
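
The decode-time rows follow from the measured decode rate, which the short calculation below reproduces. The constants are taken from the table; the 7-token call used for the first-turn estimate is an assumption chosen for illustration, since the card does not state which output length the ~3.4 s figure used.

```python
# Figures from the benchmark table above.
LOAD_S = 2.23
COLD_PREFILL_S = 0.48
DECODE_TOK_PER_S = 9.7


def decode_seconds(n_tokens: int) -> float:
    """Decode time is output length divided by decode rate."""
    return n_tokens / DECODE_TOK_PER_S


def first_turn_seconds(n_tokens: int) -> float:
    """Model load + cold prefill + decode for the very first request."""
    return LOAD_S + COLD_PREFILL_S + decode_seconds(n_tokens)


print(round(decode_seconds(3), 2))      # 0.31 (short tool call)
print(round(decode_seconds(8), 2))      # 0.82 (long tool call)
print(round(decode_seconds(25), 1))     # 2.6  (respond() message)
print(round(first_turn_seconds(7), 1))  # 3.4  (assumed 7-token call)
```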
## Files
```
functiongemma-physical-ai-v10-Q5_K_M.gguf # ~248 MB, Q5_K_M weights (Ollama / llama.cpp)
Modelfile # Ollama Modelfile (functional-token format)
tools.json # 6-tool schema, canonical mobile-actions format
token_map.json # functional-token <-> tool-name map
README.md # this file
```
Earlier checkpoint GGUFs from the project's development history
(`functiongemma-physical-ai-v9-Q5_K_M.gguf`,
`functiongemma-physical-ai-v7-Q5_K_M.gguf`,
`functiongemma-physical-ai-v6-Q5_K_M.gguf`,
`functiongemma-physical-ai-Q4_K_M.gguf`) remain in the repo for
reproducibility. They use different tool surfaces and (for v7 and
earlier) a different inference-prompt format; new deployments should use
the v10 file above.
## License
Released under the [Gemma Terms of Use](https://ai.google.dev/gemma/terms).
By using this model you agree to those terms. Base model:
[`google/functiongemma-270m-it`](https://huggingface.co/google/functiongemma-270m-it).
## Links
- Base model: <https://huggingface.co/google/functiongemma-270m-it>
- Octopus v2 paper: <https://arxiv.org/abs/2404.01744>
- Mercedes-Benz Octopus v2 (named-arg variant): <https://arxiv.org/abs/2501.02342>
- Hardware demo + integration code (Synaptics Coralboard, Grinn HAT,
WLED-over-USB-CDC, full PyQt UI):
<https://github.com/synaptics-astra-demos/sl2610-examples>, under
`Function_calling/`