---
license: gemma
license_link: https://ai.google.dev/gemma/terms
base_model: google/functiongemma-270m-it
language:
- en
tags:
- function-calling
- edge
- on-device
- physical-ai
- iot
- octopus-v2
- synaptics-sl2619
- gemma3
pipeline_tag: text-generation
inference: false
---

# FunctionGemma 270M: Physical AI (v10, Octopus v2)

Fine-tuned [`google/functiongemma-270m-it`](https://huggingface.co/google/functiongemma-270m-it)
for voice-controlled physical-AI / household-IoT actions on a Synaptics
SL2619 "Coral" edge board (Google I/O 2026 demo).

**Current revision:** [`functiongemma-physical-ai-v10-Q5_K_M.gguf`](./functiongemma-physical-ai-v10-Q5_K_M.gguf):
6 tools, ~248 MB Q5_K_M, ~0.48 s cold prefill on the 2-core
Cortex-A55, 97.9 % mean token accuracy on eval.

The tool schema ships as [`tools.json`](./tools.json). The token-to-tool
mapping is in [`token_map.json`](./token_map.json).

## Tool surface (6 tools)

| Token | Name | Args | Purpose |
|---|---|---|---|
| `<tool_0>` | `set_lights` | `color?`, `effect?`, `state?` | Drive whatever lights are connected: HAT 3-LED indicators or a WLED-driven addressable strip / ring. All three args optional; the model emits only what the user implied. |
| `<tool_1>` | `play_buzzer` | `pattern` | Named pattern on the piezo buzzer: `beep`, `double_beep`, `chirp`, `siren`, `alarm`, `success`, `error`. |
| `<tool_2>` | `set_alarm` | `duration` or `time`, `label?` | Schedule an alarm. Fires the buzzer plus a visible flash. |
| `<tool_3>` | `cancel_alarm` | `label?` | Cancel one alarm by label, or all if no label given. |
| `<tool_4>` | `get_system_status` | `metric` | `cpu`, `memory`, `temperature`, `npu`, or `all`. |
| `<tool_5>` | `respond` | `message` | Natural-language reply when no physical-action tool fits, or when the request is ambiguous and the model needs to ask for clarification. |

The model is **hardware-agnostic** for lighting: it parses user intent
into semantic args (`color`, `effect`, `state`) and leaves the dispatcher
to map those onto whatever LED hardware is detected at launch: the
HAT's three indicator LEDs, a WLED-driven strip, or a Neopixel ring. The
user vocabulary is hardware-agnostic too: "lights", "LEDs", "strip",
"indicators" all refer to whatever is wired up.

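
The split between semantic args and hardware can be sketched as a small dispatcher. This is only an illustration of the idea, not the demo's integration code: `HatLeds`, `WledStrip`, `dispatch_set_lights`, and the rule that the HAT backend ignores `effect` are all hypothetical, and the backends merely record what they would do.

```python
from dataclasses import dataclass, field
from typing import Optional

ALLOWED_ARGS = {"color", "effect", "state"}  # the three optional set_lights args


@dataclass
class HatLeds:
    """Hypothetical stand-in for the HAT's three indicator LEDs."""
    log: list = field(default_factory=list)

    def apply(self, color: Optional[str], effect: Optional[str], state: Optional[str]) -> str:
        # Assumption: discrete indicator LEDs cannot render strip effects,
        # so `effect` is silently ignored by this backend.
        action = f"hat color={color or 'unchanged'} state={state or 'unchanged'}"
        self.log.append(action)
        return action


@dataclass
class WledStrip:
    """Hypothetical stand-in for a WLED-driven addressable strip."""
    log: list = field(default_factory=list)

    def apply(self, color: Optional[str], effect: Optional[str], state: Optional[str]) -> str:
        action = (
            f"wled color={color or 'unchanged'} "
            f"effect={effect or 'unchanged'} state={state or 'unchanged'}"
        )
        self.log.append(action)
        return action


def dispatch_set_lights(backend, kwargs: dict) -> str:
    """Validate the model's semantic args, then hand them to the detected backend."""
    unknown = set(kwargs) - ALLOWED_ARGS
    if unknown:
        raise ValueError(f"unexpected set_lights args: {sorted(unknown)}")
    # Args the model did not emit arrive as None, meaning "leave as is".
    return backend.apply(kwargs.get("color"), kwargs.get("effect"), kwargs.get("state"))


if __name__ == "__main__":
    print(dispatch_set_lights(WledStrip(), {"color": "red", "effect": "sparkle"}))
    print(dispatch_set_lights(HatLeds(), {"state": "off"}))
```

The model output is identical for both backends; only the dispatcher decides what each piece of hardware can do with it.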
## Prompt format

The v10 model is trained in
[Octopus v2](https://arxiv.org/abs/2404.01744) style: no schema, no
tools list, just a bare user turn.

```
<start_of_turn>user
{user_text}<end_of_turn>
<start_of_turn>model
```

Tool semantics live in the model weights (via the special functional
tokens `<tool_0>` … `<tool_5>` plus `<end>`), not in the prompt. The
`tools.json` schema in this repo is the dispatcher's arg-validation
contract and is embedded in the GGUF metadata for schema-drift checks,
but it is **not** loaded into the inference prompt. Typical prompts are
~13 tokens.

## Output format: functional tokens, named args

Tool calls emit as **functional tokens with named arguments**, per the
Mercedes-Benz Octopus v2 convention
([arXiv 2501.02342](https://arxiv.org/abs/2501.02342)). Each tool name
compiles to a single special-vocabulary token (`<tool_0>` … `<tool_5>`);
arguments are written as `name="value"` pairs; a single `<end>` token
terminates the call. The model emits **only the args the user implied**;
absent args are simply not present.

Examples:

| User says | Model emits | Resolves to |
|---|---|---|
| `turn the lights red` | `<tool_0>(color="red")<end>` | `set_lights(color="red")` |
| `rainbow on the strip` | `<tool_0>(effect="rainbow")<end>` | `set_lights(effect="rainbow")` |
| `lights off` | `<tool_0>(state="off")<end>` | `set_lights(state="off")` |
| `red sparkle` | `<tool_0>(color="red", effect="sparkle")<end>` | `set_lights(color="red", effect="sparkle")` |
| `set an alarm in 5 minutes` | `<tool_2>(duration="5 minutes")<end>` | `set_alarm(duration="5 minutes")` |
| `cancel all alarms` | `<tool_3>()<end>` | `cancel_alarm()` |
| `what's the cpu` | `<tool_4>(metric="cpu")<end>` | `get_system_status(metric="cpu")` |
| `good morning` | `<tool_5>(message="Good morning. ...")<end>` | `respond(message="...")` |

A complete call decodes in roughly 8–20 output tokens, well inside the
sub-second voice-UX budget on a 2-core Cortex-A55.

> ⚠️ Inference servers MUST stop generation on `<end_of_turn>` (or
> `<eos>`), NOT on `<end>`. The model can emit multi-tool sequences
> `<tool_A>(args)<end><tool_B>(args)<end>`, so stopping at the first
> `<end>` truncates legitimate multi-tool output.

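
Because one generation can contain several calls, a parser should collect every `<tool_N>(...)<end>` group rather than only the first. The sketch below is one way to do that. The token-to-name table is copied from the tool-surface table for illustration (a real deployment would load `token_map.json`), and `parse_calls` is a hypothetical helper name.

```python
import re

# One functional-token call: <tool_N>(args)<end>
CALL_RE = re.compile(r"(<tool_\d+>)\((.*?)\)<end>", re.DOTALL)
# name="value" pairs, allowing backslash-escaped characters inside the value
ARG_RE = re.compile(r'(\w+)\s*=\s*"((?:[^"\\]|\\.)*)"')

# Copied from the tool-surface table; normally read from token_map.json.
TOKEN_TO_TOOL = {
    "<tool_0>": "set_lights",
    "<tool_1>": "play_buzzer",
    "<tool_2>": "set_alarm",
    "<tool_3>": "cancel_alarm",
    "<tool_4>": "get_system_status",
    "<tool_5>": "respond",
}


def parse_calls(raw: str) -> list:
    """Return a list of (tool_name, kwargs) for every complete call in `raw`."""
    calls = []
    for token, body in CALL_RE.findall(raw):
        name = TOKEN_TO_TOOL.get(token)
        if name is None:
            continue  # unknown functional token: skip rather than guess
        calls.append((name, dict(ARG_RE.findall(body))))
    return calls


if __name__ == "__main__":
    raw = '<tool_0>(color="red")<end><tool_1>(pattern="beep")<end>'
    for name, kwargs in parse_calls(raw):
        print(name, kwargs)
```

Had the server stopped at the first `<end>`, the `play_buzzer` call in the example string would never have reached the parser.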
## Quick start (Ollama)

```bash
hf download BrinqAI/functiongemma-270m-physical-ai \
  functiongemma-physical-ai-v10-Q5_K_M.gguf Modelfile tools.json token_map.json \
  --local-dir ./fg-physical-ai
cd fg-physical-ai
ollama create functiongemma-physical-ai -f Modelfile
```

The shipped `Modelfile` bakes in the stop tokens (`<end_of_turn>`,
`<eos>`) and decode parameters (`temperature=0`, `num_ctx=1024`,
`num_predict=80`).

## Calling the model

Send a **bare user turn** (no schema, no tools list). With Ollama, set
`raw` to `true` so the server applies no prompt formatting of its own:

```python
import json
import re
import urllib.request

OLLAMA_URL = "http://localhost:11434"
MODEL = "functiongemma-physical-ai"

# The "reverse" table in token_map.json maps functional tokens
# (e.g. "<tool_0>") back to tool names.
with open("token_map.json") as f:
    reverse_token_map = json.load(f)["reverse"]

# Matches name="value" pairs; values may contain backslash-escaped characters.
NAMED_ARG_RE = re.compile(r'(\w+)\s*=\s*"((?:[^"\\]|\\.)*)"')


def build_prompt(user_text: str) -> str:
    """Wrap the user text in the bare turn format the model was trained on."""
    return (
        f"<start_of_turn>user\n{user_text}<end_of_turn>\n"
        f"<start_of_turn>model\n"
    )


def call_model(user_text: str) -> str:
    """POST a raw prompt to Ollama and return the generated text."""
    body = json.dumps({
        "model": MODEL,
        "prompt": build_prompt(user_text),
        "raw": True,
        "stream": False,
        "options": {
            "temperature": 0.0,
            "top_p": 1.0,
            "num_predict": 80,
            # Stop on end of turn, never on <end> (see the warning above).
            "stop": ["<end_of_turn>", "<eos>"],
        },
    }).encode()
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())["response"]


def parse_call(raw: str) -> tuple[str | None, dict[str, str]]:
    """Return (tool_name, kwargs) for the first call. tool_name is None on parse fail."""
    m = re.match(r"\s*(<tool_\d+>)\((.*?)\)<end>", raw)
    if not m:
        return None, {}
    tok, body = m.group(1), m.group(2)
    kwargs = dict(NAMED_ARG_RE.findall(body))
    return reverse_token_map.get(tok), kwargs


raw = call_model("turn the lights red")
print(raw)              # e.g. '<tool_0>(color="red")<end>'
print(parse_call(raw))  # ('set_lights', {'color': 'red'})
```

For `llama-cpp-python` directly, use `detokenize(..., special=True)` so
the `<tool_N>` and `<end>` tokens render in the output instead of being
stripped.

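
A minimal sketch of that `llama-cpp-python` path follows. It assumes a recent `llama-cpp-python` release in which `Llama.tokenize` and `Llama.detokenize` accept a `special` argument; `generate_raw` is a hypothetical helper name, and the stop handling is an illustration rather than the demo's actual loop.

```python
def generate_raw(llm, prompt: str, max_tokens: int = 80) -> str:
    """Greedy-decode with a llama_cpp.Llama instance, keeping special tokens visible.

    `llm` is expected to expose tokenize / generate / detokenize / token_eos
    the way llama_cpp.Llama does.
    """
    # special=True so <start_of_turn> etc. are parsed as single special tokens.
    prompt_tokens = llm.tokenize(prompt.encode("utf-8"), add_bos=True, special=True)

    # Stop on <eos> or <end_of_turn>, but NOT on <end>.
    stop_ids = {llm.token_eos()}
    end_of_turn = llm.tokenize(b"<end_of_turn>", add_bos=False, special=True)
    if len(end_of_turn) == 1:
        stop_ids.add(end_of_turn[0])

    out = []
    for tok in llm.generate(prompt_tokens, temp=0.0):  # temp=0.0 selects greedy decoding
        if tok in stop_ids or len(out) >= max_tokens:
            break
        out.append(tok)

    # special=True renders <tool_N> and <end> instead of dropping them.
    return llm.detokenize(out, special=True).decode("utf-8", errors="replace")


# Usage (requires llama-cpp-python and the GGUF file on disk):
#   from llama_cpp import Llama
#   llm = Llama(model_path="functiongemma-physical-ai-v10-Q5_K_M.gguf",
#               n_ctx=1024, n_threads=2)
#   prompt = "<start_of_turn>user\nturn the lights red<end_of_turn>\n<start_of_turn>model\n"
#   print(generate_raw(llm, prompt))
```

The higher-level completion helpers return already-decoded text, which is why this sketch drives the token loop itself and detokenizes at the end.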
## Training data

Training data was generated from Haiku-authored phrasing templates
crossed with deterministic entity pools, then lightly augmented with
Moonshine-flavored ASR noise (dropped function words, lowercased traces,
filler-word prepends). Each record is a flat `{input, output}` pair:
no tools / messages array, no chat template.

| Property | Value |
|---|---|
| Train rows | 5,222 |
| Eval rows | 920 |
| Tools | 6 |
| Per-template entity expansion | color × effect × state pools for `set_lights`; pattern pool for `play_buzzer`; duration / time pools for `set_alarm`; metric pool for `get_system_status` |
| ASR-style augmentation | Moonshine-sim noise on a fraction of records (dropped articles, lowercased traces, filler prepends) |
| Multi-tool fraction | None (single-tool emphasis; multi-tool routines composed at dispatch time) |

The `set_lights` tool also gets explicit **failure-mode rows** that
route bare ambiguous prompts to `respond()`, e.g. "rainbow" alone
("Did you mean the lights? Try 'rainbow on the lights'."), "siren" alone
(prompts the user toward `play_buzzer`), and bare "on" / "off"
(asks what the user wants to act on).

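
The template-times-pool expansion with ASR-style noise can be sketched roughly as follows. Everything concrete here is a placeholder chosen for illustration: the two templates, the colour pool, the filler list, and the 30 % noise fraction are assumptions, not the project's actual generator or data.

```python
import itertools
import random

# Hypothetical templates and pools, for illustration only.
TEMPLATES = ["turn the lights {color}", "make the strip {color}"]
COLORS = ["red", "green", "blue"]
FILLERS = ["um", "uh", "so"]
ARTICLES = {"the", "a", "an"}


def expand(templates, colors):
    """Cross every template with every colour into flat {input, output} records."""
    return [
        {"input": t.format(color=c), "output": f'<tool_0>(color="{c}")<end>'}
        for t, c in itertools.product(templates, colors)
    ]


def add_asr_noise(text, rng):
    """Apply one randomly chosen noise type to the user text."""
    kind = rng.choice(["drop_articles", "lowercase", "filler"])
    if kind == "drop_articles":
        return " ".join(w for w in text.split() if w.lower() not in ARTICLES)
    if kind == "lowercase":
        return text.lower()
    return f"{rng.choice(FILLERS)} {text}"


def build_dataset(noise_fraction=0.3, seed=0):
    """Expand the templates, then add noise to a fraction of the inputs only."""
    rng = random.Random(seed)  # fixed seed keeps generation deterministic
    records = expand(TEMPLATES, COLORS)
    for rec in records:
        if rng.random() < noise_fraction:
            rec["input"] = add_asr_noise(rec["input"], rng)
    return records


if __name__ == "__main__":
    for rec in build_dataset():
        print(rec)
```

Noise touches only the `input` side, so the target call stays exact while the phrasing drifts toward what a speech recognizer might produce.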
## Methodology

- **Full bf16 fine-tune** (no LoRA).
- **Functional tokens**: `<tool_0>` … `<tool_5>` + `<end>` added as
  `additional_special_tokens`; new embeddings **mean-initialized** from
  the existing input-embedding matrix (random init under-converges on
  small datasets at this scale).
- **Completion-only loss mask**: hand-rolled; labels before
  `<start_of_turn>model\n` are masked to `-100`. The model learns only
  from the assistant turn, not the user prompt.
- **5 epochs**, lr `3e-5`, cosine schedule, warmup ratio 0.1, weight decay
  0.01.
- **Effective batch = 16**
  (`per_device_train_batch_size=8 × gradient_accumulation_steps=2`).
- **`max_length=256`**: the trained prompt format is ~13 tokens and
  the assistant turn fits comfortably under 64 tokens, including
  `respond()` messages.
- bf16, gradient checkpointing, `adamw_torch_fused`,
  `metric_for_best_model="eval_loss"` + `load_best_model_at_end=True`.
- Training wallclock: **5 min on a single H100** (~15–20 min on a 4090).

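
The completion-only mask from the list above can be illustrated on plain token-id lists. This is a sketch under assumptions, not the project's training code: the helper names are hypothetical, the toy ids are made up, and masking the marker tokens themselves (as well as everything before them) is one reading of the description.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def find_subsequence(seq, sub):
    """Return the index of the first occurrence of `sub` in `seq`, or -1."""
    for i in range(len(seq) - len(sub) + 1):
        if seq[i:i + len(sub)] == sub:
            return i
    return -1


def mask_prompt_labels(input_ids, marker_ids):
    """Copy input_ids to labels, masking everything up to and including the marker.

    `marker_ids` are the token ids of "<start_of_turn>model\n" for the tokenizer in use.
    """
    start = find_subsequence(input_ids, marker_ids)
    if start < 0:
        # No assistant turn found: mask the whole record so it contributes no loss.
        return [IGNORE_INDEX] * len(input_ids)
    cut = start + len(marker_ids)
    return [IGNORE_INDEX] * cut + list(input_ids[cut:])


if __name__ == "__main__":
    # Toy ids: [bos, user tokens..., marker (8, 9), assistant tokens (20, 21, 22)]
    ids = [2, 5, 6, 7, 8, 9, 20, 21, 22]
    print(mask_prompt_labels(ids, marker_ids=[8, 9]))
```

Only the unmasked tail (the functional token, its args, and `<end>`) produces gradient, which is what "learns only from the assistant turn" means in practice.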
### Citation

```bibtex
@article{chen2024octopusv2,
  title   = {Octopus v2: On-device language model for super agent},
  author  = {Chen, Wei and Li, Zhiyuan},
  journal = {arXiv preprint arXiv:2404.01744},
  year    = {2024},
  url     = {https://arxiv.org/abs/2404.01744}
}
@article{merc2025octopusv2,
  title   = {Octopus v2 named-arg function calling},
  journal = {arXiv preprint arXiv:2501.02342},
  year    = {2025},
  url     = {https://arxiv.org/abs/2501.02342}
}
```

## Results

### Training metrics (final epoch)

| Metric | Value |
|---|---|
| Final train loss | 0.493 |
| Final eval loss | **0.046** |
| Mean token accuracy (eval) | **97.9 %** |

### Held-out smoke test (post-train, 36 prompts spanning all 6 tools)

| Metric | Value |
|---|---|
| Smoke-test routing accuracy | **35 / 36 (97.2 %)** |

The 36-prompt suite covers single-tool happy paths for every tool plus
failure modes the model is expected to deflect: ambiguous color words
without a target ("make it red"), effect names without a target
("rainbow"), unsupported features ("play a tone at 2000 hz"), and
out-of-scope appliances. Failure-mode prompts all route to `respond()`
with a helpful clarification message.

### On-device benchmark (Coralboard, 2-core Cortex-A55 @ 2 GHz, Q5_K_M GGUF)

Measured with `llama-cpp-python` 0.3.16, `n_ctx=1024`, `n_threads=2`,
CPU governor `performance`, 8 representative prompts spanning all 6
tools.

| Metric | Value |
|---|---|
| Model load | 2.23 s |
| Prompt tokens | 11–16 (mean ~13) |
| **Cold prefill (turn 1)** | **0.48 s** |
| Warm prefill (turn 2+, avg) | 0.47 s |
| Decode rate | **~9.7 tok/s** |
| Decode time, typical tool call (3–8 output tokens) | 0.3–0.8 s |
| Decode time, `respond()` (~25 output tokens) | ~2.6 s |
| End-to-end first turn (model load + prefill + decode) | ~3.4 s |

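
The derived rows in the table follow from the measured load time, prefill time, and decode rate. The short calculation below reproduces them. The constants are taken from the table; the 7-token call used for the end-to-end figure is an assumed value inside the reported 3–8 token range, and the function names are hypothetical.

```python
LOAD_S = 2.23       # model load, paid once per process
PREFILL_S = 0.48    # cold prefill, turn 1
DECODE_TOK_S = 9.7  # measured decode rate in tokens per second


def decode_seconds(output_tokens, tok_per_s=DECODE_TOK_S):
    """Time to decode `output_tokens` tokens at the measured rate."""
    return output_tokens / tok_per_s


def turn_seconds(output_tokens, include_load=False):
    """Prefill plus decode for one turn, optionally including the one-time model load."""
    total = PREFILL_S + decode_seconds(output_tokens)
    return total + LOAD_S if include_load else total


if __name__ == "__main__":
    for n in (3, 8, 25):
        print(f"{n:>2} output tokens: decode {decode_seconds(n):.2f} s, "
              f"turn {turn_seconds(n):.2f} s")
    # Assumed 7-token tool call on the very first turn, including model load.
    print(f"first turn: {turn_seconds(7, include_load=True):.2f} s")
```

After the first turn the load cost disappears, so a short tool call costs roughly one prefill plus under a second of decode.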
## Files

```
functiongemma-physical-ai-v10-Q5_K_M.gguf   # ~248 MB, Q5_K_M weights (Ollama / llama.cpp)
Modelfile                                   # Ollama Modelfile (functional-token format)
tools.json                                  # 6-tool schema, canonical mobile-actions format
token_map.json                              # functional-token <-> tool-name map
README.md                                   # this file
```

Earlier checkpoint GGUFs from the project's development history
(`functiongemma-physical-ai-v9-Q5_K_M.gguf`,
`functiongemma-physical-ai-v7-Q5_K_M.gguf`,
`functiongemma-physical-ai-v6-Q5_K_M.gguf`,
`functiongemma-physical-ai-Q4_K_M.gguf`) remain in the repo for
reproducibility. They use different tool surfaces and (for v7 and
earlier) a different inference-prompt format; new deployments should use
the v10 file above.

## License

Released under the [Gemma Terms of Use](https://ai.google.dev/gemma/terms).
By using this model you agree to those terms. Base model:
[`google/functiongemma-270m-it`](https://huggingface.co/google/functiongemma-270m-it).

## Links

- Base model: <https://huggingface.co/google/functiongemma-270m-it>
- Octopus v2 paper: <https://arxiv.org/abs/2404.01744>
- Mercedes-Benz Octopus v2 (named-arg variant): <https://arxiv.org/abs/2501.02342>
- Hardware demo + integration code (Synaptics Coralboard, Grinn HAT,
  WLED-over-USB-CDC, full PyQt UI):
  <https://github.com/synaptics-astra-demos/sl2610-examples>, under
  `Function_calling/`