Text Generation
ONNX
GGUF
English
function-calling
edge
on-device
physical-ai
iot
octopus-v2
synaptics-sl2619
gemma3
conversational
v9: schema-free inference, 100% smoke, sub-second cold prefill
- .gitattributes +1 -0
- README.md +148 -210
- functiongemma-physical-ai-v9-Q5_K_M.gguf +3 -0
- token_map.json +20 -27
- tools.json +33 -44
.gitattributes
CHANGED
|
@@ -42,3 +42,4 @@ functiongemma-physical-ai-v6-Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
|
|
| 42 |
functiongemma-physical-ai-v7-Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
|
| 43 |
moonshine/decoder.vmfb filter=lfs diff=lfs merge=lfs -text
|
| 44 |
moonshine/decoder_with_past.vmfb filter=lfs diff=lfs merge=lfs -text
|
|
|
|
|
|
| 42 |
functiongemma-physical-ai-v7-Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
|
| 43 |
moonshine/decoder.vmfb filter=lfs diff=lfs merge=lfs -text
|
| 44 |
moonshine/decoder_with_past.vmfb filter=lfs diff=lfs merge=lfs -text
|
| 45 |
+
functiongemma-physical-ai-v9-Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
|
README.md
CHANGED
|
@@ -17,100 +17,114 @@ pipeline_tag: text-generation
|
|
| 17 |
inference: false
|
| 18 |
---
|
| 19 |
|
| 20 |
-
# FunctionGemma 270M — Physical AI
|
| 21 |
|
| 22 |
Fine-tuned [`google/functiongemma-270m-it`](https://huggingface.co/google/functiongemma-270m-it)
|
| 23 |
for voice-controlled physical-AI / household-IoT actions on a Synaptics
|
| 24 |
SL2619 "Coral" edge board (Google IO 2026 demo).
|
| 25 |
|
| 26 |
-
| Revision | File | Tool count |
|
| 27 |
-
|----------|------|-----------:|-------|
|
| 28 |
-
| **
|
| 29 |
-
|
|
| 30 |
-
|
|
|
|
|
| 31 |
|
| 32 |
-
Schema ships as [`tools.json`](./tools.json) (
|
| 33 |
mapping is in [`token_map.json`](./token_map.json).
|
| 34 |
|
| 35 |
-
##
|
| 36 |
-
|
| 37 |
-
|
| 38 |
-
|
| 39 |
-
|
| 40 |
-
|
| 41 |
`<start_function_call>call:NAME{...}<end_function_call>` syntax. On a
|
| 42 |
2-core Cortex-A55 this is the difference between sub-second and 2–5 s
|
| 43 |
voice-UX latency.
|
| 44 |
|
| 45 |
-
Sample output: `<
|
| 46 |
-
|
| 47 |
-
`<tool_0>` → `turn_on_lights`, `<tool_3>` → `blink_lights`,
|
| 48 |
-
`<tool_8>` → `get_system_status`, `<tool_9>` → `respond` (v7 numbering;
|
| 49 |
-
v6 used `<tool_9>` and `<tool_10>` for those — bumped down by one when
|
| 50 |
-
`list_alarms` was removed). Full mapping in [`token_map.json`](./token_map.json).
|
| 51 |
|
| 52 |
> ⚠️ Inference servers MUST stop generation on `<end_of_turn>` (or `<eos>`),
|
| 53 |
-
> NOT on `<end>`.
|
| 54 |
-
>
|
|
|
|
| 55 |
|
| 56 |
## Quick start (Ollama)
|
| 57 |
|
| 58 |
```bash
|
| 59 |
hf download BrinqAI/functiongemma-270m-physical-ai \
|
| 60 |
-
functiongemma-physical-ai-
|
| 61 |
--local-dir ./fg-physical-ai
|
| 62 |
|
| 63 |
cd fg-physical-ai
|
| 64 |
ollama create functiongemma-physical-ai -f Modelfile
|
| 65 |
```
|
| 66 |
|
| 67 |
-
|
| 68 |
-
|
| 69 |
-
`<eos>`) and decode parameters (`temperature=0`, `num_ctx=1024`,
|
| 70 |
-
`num_predict=80`). Direct `ollama pull hf.co/...` does not apply these,
|
| 71 |
-
and the function-token output will run past `<end>` until it hits
|
| 72 |
-
`num_predict`. Only use the direct-pull path if your client injects stops
|
| 73 |
-
at request time (the Python snippet below does this via `options.stop`).
|
| 74 |
|
| 75 |
## Calling the model
|
| 76 |
|
| 77 |
-
The model expects
|
| 78 |
-
|
| 79 |
-
`tokenizer.apply_chat_template(..., tools=tools)`). Send to Ollama with
|
| 80 |
-
`raw=true` so it forwards the prompt verbatim. Plain `ollama run` from the
|
| 81 |
-
CLI does **not** pass tools and will degenerate to chat-style refusals.
|
| 82 |
-
|
| 83 |
-
Standalone client (depends on `transformers` for the chat template, plus
|
| 84 |
-
the `tools.json` and `token_map.json` files in the same directory):
|
| 85 |
|
| 86 |
```python
|
| 87 |
import json
|
| 88 |
import re
|
| 89 |
import urllib.request
|
| 90 |
-
from transformers import AutoTokenizer
|
| 91 |
|
| 92 |
OLLAMA_URL = "http://localhost:11434"
|
| 93 |
MODEL = "functiongemma-physical-ai"
|
| 94 |
|
| 95 |
-
tools = json.load(open("tools.json"))["tools"]
|
| 96 |
reverse_token_map = json.load(open("token_map.json"))["reverse"]
|
| 97 |
-
tokenizer = AutoTokenizer.from_pretrained("google/functiongemma-270m-it")
|
| 98 |
|
| 99 |
|
| 100 |
def build_prompt(user_text: str) -> str:
|
| 101 |
-
|
| 102 |
-
{
|
| 103 |
-
|
| 104 |
-
"content": (
|
| 105 |
-
"You are a model that can do function calling with the "
|
| 106 |
-
"following functions\n"
|
| 107 |
-
),
|
| 108 |
-
"tool_calls": None,
|
| 109 |
-
},
|
| 110 |
-
{"role": "user", "content": user_text, "tool_calls": None},
|
| 111 |
-
]
|
| 112 |
-
return tokenizer.apply_chat_template(
|
| 113 |
-
msgs, tools=tools, tokenize=False, add_generation_prompt=True
|
| 114 |
)
|
| 115 |
|
| 116 |
|
|
@@ -124,7 +138,7 @@ def call_model(user_text: str) -> str:
|
|
| 124 |
"temperature": 0.0,
|
| 125 |
"top_p": 1.0,
|
| 126 |
"num_predict": 80,
|
| 127 |
-
"stop": ["<
|
| 128 |
},
|
| 129 |
}).encode()
|
| 130 |
req = urllib.request.Request(
|
|
@@ -145,79 +159,55 @@ def parse_call(raw: str) -> tuple[str | None, str]:
|
|
| 145 |
return reverse_token_map.get(tok), args
|
| 146 |
|
| 147 |
|
| 148 |
-
raw = call_model("
|
| 149 |
-
print(raw)
|
| 150 |
-
print(parse_call(raw))
|
| 151 |
```
|
| 152 |
|
| 153 |
## Training data
|
| 154 |
|
| 155 |
-
|
| 156 |
-
|
| 157 |
-
-
|
| 158 |
-
-
|
| 159 |
-
|
| 160 |
-
|
| 161 |
-
|
| 162 |
-
|
| 163 |
-
|
| 164 |
-
|
| 165 |
-
|
| 166 |
-
|
| 167 |
-
|
| 168 |
-
|
| 169 |
-
-
|
| 170 |
-
|
| 171 |
-
|
| 172 |
-
would help in v8.
|
| 173 |
-
|
| 174 |
-
### v6 (previous)
|
| 175 |
-
|
| 176 |
-
- **Size**: 1,400 train / 150 eval (v5/v6 dataset lineage, `coral_v5_compact.jsonl`).
|
| 177 |
-
- **Tool count**: 11. Cameras / vision tools dropped from earlier
|
| 178 |
-
checkpoints; alarm-list tool kept.
|
| 179 |
-
|
| 180 |
-
### v4 (legacy)
|
| 181 |
-
|
| 182 |
-
- **Size**: 367 train / 100 eval.
|
| 183 |
-
- **Multi-tool**: 13% (vs Google mobile-actions 33.4%).
|
| 184 |
-
- **Buzzer schema**: pattern-only (binary GPIO on the reference HAT — no
|
| 185 |
-
PWM). Old `frequency_hz` / `duration_seconds` prompts are routed
|
| 186 |
-
through `respond()` as out-of-scope negatives.
|
| 187 |
|
| 188 |
## Methodology
|
| 189 |
|
| 190 |
-
|
| 191 |
-
|
| 192 |
-
|
| 193 |
-
|
| 194 |
-
|
| 195 |
-
|
| 196 |
-
|
| 197 |
-
|
| 198 |
-
|
| 199 |
-
|
| 200 |
-
|
| 201 |
-
- **
|
| 202 |
-
|
| 203 |
-
|
| 204 |
-
|
| 205 |
-
|
| 206 |
-
|
| 207 |
-
|
| 208 |
-
`{% generation %}` markers required for `assistant_only_loss`.
|
| 209 |
-
- **8 epochs**, lr `3e-5`, cosine schedule, 0.1 warmup. (2,000 examples in
|
| 210 |
-
v7 — fewer epochs than v4's 15 because dataset size grew 5×.)
|
| 211 |
-
- **Tool-token loss weight 4.0** to keep the new function tokens learning
|
| 212 |
-
faster than the rest of the vocabulary (Gemma3's 262k-vocab dilutes the
|
| 213 |
-
signal otherwise).
|
| 214 |
-
- **Effective batch 16** = `per_device_train_batch_size=2 ×
|
| 215 |
-
gradient_accumulation_steps=8` (kept this way to avoid the 8 GiB
|
| 216 |
-
cross-entropy logit allocation OOM that bites Gemma3's 262k vocab).
|
| 217 |
-
- **`max_length=1024`** to fit the full 13-tool schema in the prompt.
|
| 218 |
-
- bf16, gradient checkpointing, `adamw_torch_fused`, `weight_decay=0.01`.
|
| 219 |
-
- Trained inside `unsloth/unsloth:latest` Docker container with GPU
|
| 220 |
-
passthrough.
|
| 221 |
|
| 222 |
### Citation
|
| 223 |
|
|
@@ -231,105 +221,54 @@ for a smaller dataset:
|
|
| 231 |
}
|
| 232 |
```
|
| 233 |
|
| 234 |
-
##
|
| 235 |
-
|
| 236 |
-
**v7 checkpoint (2,000 train / 250 eval), Q5_K_M GGUF, greedy decode:**
|
| 237 |
-
|
| 238 |
-
| Metric | Result |
|
| 239 |
-
|--------|--------|
|
| 240 |
-
| Overall accuracy | 217 / 250 = **86.8%** |
|
| 241 |
-
| Single-tool accuracy | 154 / 166 = **92.8%** |
|
| 242 |
-
| Multi-tool exact-match | 63 / 84 = **75.0%** |
|
| 243 |
-
| Parse failure rate | 0 / 250 = **0.0%** |
|
| 244 |
|
| 245 |
-
|
| 246 |
-
`set_neopixel_pattern` 0.92, `turn_on_lights` 0.90, `blink_lights` 0.89,
|
| 247 |
-
`turn_off_lights` 0.89, `set_led_color` 0.88, `play_buzzer` 0.83,
|
| 248 |
-
`respond` 0.74. (`respond` is the lowest because the model occasionally
|
| 249 |
-
chooses a physical-action tool with a hallucinated text argument when the
|
| 250 |
-
prompt shares keywords with one — an issue the dispatcher's enum validation
|
| 251 |
-
catches at runtime.)
|
| 252 |
|
| 253 |
-
|
| 254 |
-
|
| 255 |
-
|
|
|
|
|
|
|
| 256 |
|
| 257 |
-
##
|
| 258 |
|
| 259 |
-
|
| 260 |
-
-
|
| 261 |
-
|
| 262 |
-
Grinn Coralboard with the v7 GGUF + the `Function_calling/` demo from
|
| 263 |
-
[BrinqAI/sl2610-examples](https://github.com/BrinqAI/sl2610-examples/tree/coralboard/functiongemma/Function_calling).
|
| 264 |
|
| 265 |
-
|
| 266 |
|
| 267 |
-
|
| 268 |
-
Synaptics Torq), the model is also published as ONNX with KV-cache support
|
| 269 |
-
(`text-generation-with-past`). Both exports are derived from the same
|
| 270 |
-
`coral-functiongemma-v4c-compact` checkpoint as the GGUF above.
|
| 271 |
|
| 272 |
-
|
|
| 273 |
-
|---
|
| 274 |
-
|
|
| 275 |
-
|
|
| 276 |
-
|
| 277 |
-
|
| 278 |
-
|
| 279 |
-
|
| 280 |
-
|
| 281 |
-
```python
|
| 282 |
-
from transformers import AutoTokenizer
|
| 283 |
-
import onnxruntime as ort
|
| 284 |
-
import numpy as np, json
|
| 285 |
-
|
| 286 |
-
MODEL = "onnx/compact-fp32" # or downloaded local path
|
| 287 |
-
tok = AutoTokenizer.from_pretrained(MODEL)
|
| 288 |
-
sess = ort.InferenceSession(f"{MODEL}/model.onnx", providers=["CPUExecutionProvider"])
|
| 289 |
-
|
| 290 |
-
tools = json.load(open("tools.json"))["tools"]
|
| 291 |
-
prompt = tok.apply_chat_template(
|
| 292 |
-
[{"role": "developer",
|
| 293 |
-
"content": "You are a model that can do function calling with the following functions\n",
|
| 294 |
-
"tool_calls": None},
|
| 295 |
-
{"role": "user", "content": "Turn on the lights", "tool_calls": None}],
|
| 296 |
-
tools=tools, tokenize=False, add_generation_prompt=True,
|
| 297 |
-
)
|
| 298 |
-
# Then feed input_ids + empty past_key_values.* (shape (1, num_kv_heads, 0, head_dim))
|
| 299 |
-
# greedy-decode in a loop, stop on <end>. See repo for full snippet.
|
| 300 |
-
```
|
| 301 |
-
|
| 302 |
-
Smoke decode of "Turn on the lights" against the fp32 ONNX returns
|
| 303 |
-
`<tool_0>()<end>` (= `turn_on_lights()`), matching the GGUF output.
|
| 304 |
-
|
| 305 |
-
### fp16 + ONNX Runtime caveat
|
| 306 |
-
|
| 307 |
-
The fp16 ONNX file is structurally valid but **does not currently load in
|
| 308 |
-
ONNX Runtime ≥ 1.20** for this model: ORT's `SimplifiedLayerNormFusion` pass
|
| 309 |
-
chokes on the `InsertedPrecisionFreeCast_*` nodes that the fp16 conversion
|
| 310 |
-
inserts around Gemma3's RMSNorm layers. The error is graph-optimizer-internal
|
| 311 |
-
and reproduces with `ORT_DISABLE_ALL`. This is an ORT bug, not an ONNX-spec
|
| 312 |
-
issue — the file passes `onnx.checker` and the graph is well-formed.
|
| 313 |
-
|
| 314 |
-
For compiler frontends that consume ONNX directly (IREE / MLIR, TensorRT,
|
| 315 |
-
OpenVINO, Synaptics Torq), the fp16 file should ingest fine. For runtime
|
| 316 |
-
inference via `onnxruntime` itself, use the fp32 export and let your compiler
|
| 317 |
-
or runtime do its own dtype conversion / quantization downstream.
|
| 318 |
|
| 319 |
## Files
|
| 320 |
|
| 321 |
```
|
| 322 |
-
functiongemma-physical-ai-
|
| 323 |
-
|
| 324 |
-
|
| 325 |
-
|
| 326 |
-
tools.json # 10-tool schema (mobile-actions format, current)
|
| 327 |
-
token_map.json # function-token <-> tool-name map
|
| 328 |
-
onnx/compact-fp32/ # ONNX export, fp32, with KV cache (1.7 GB)
|
| 329 |
-
onnx/compact-fp16/ # ONNX export, fp16, with KV cache (833 MB) — see ORT caveat above
|
| 330 |
README.md # this file
|
| 331 |
```
|
| 332 |
|
| 333 |
## License
|
| 334 |
|
| 335 |
Released under the [Gemma Terms of Use](https://ai.google.dev/gemma/terms).
|
|
@@ -340,8 +279,7 @@ By using this model you agree to those terms. Base model:
|
|
| 340 |
|
| 341 |
- Base model: <https://huggingface.co/google/functiongemma-270m-it>
|
| 342 |
- Octopus v2 paper: <https://arxiv.org/abs/2404.01744>
|
| 343 |
-
- Hardware demo
|
| 344 |
-
WLED-over-USB-CDC,
|
| 345 |
-
<https://github.com/
|
| 346 |
-
|
| 347 |
-
[synaptics-astra-demos/sl2610-examples](https://github.com/synaptics-astra-demos/sl2610-examples)).
|
|
|
|
| 17 |
inference: false
|
| 18 |
---
|
| 19 |
|
| 20 |
+
# FunctionGemma 270M — Physical AI (v9, Octopus v2)
|
| 21 |
|
| 22 |
Fine-tuned [`google/functiongemma-270m-it`](https://huggingface.co/google/functiongemma-270m-it)
|
| 23 |
for voice-controlled physical-AI / household-IoT actions on a Synaptics
|
| 24 |
SL2619 "Coral" edge board (Google IO 2026 demo).
|
| 25 |
|
| 26 |
+
| Revision | File | Tool count | Headline result |
|
| 27 |
+
|----------|------|-----------:|-----------------|
|
| 28 |
+
| **v9 (current)** | [`functiongemma-physical-ai-v9-Q5_K_M.gguf`](./functiongemma-physical-ai-v9-Q5_K_M.gguf) | 8 | 30/30 (100 %) routing on held-out smoke prompts; **0.55 s cold prefill** on the 2-core A55 (vs ~57 s for v7's schema-in-prompt build — **105× faster**). |
|
| 29 |
+
| v7 (legacy) | [`functiongemma-physical-ai-v7-Q5_K_M.gguf`](./functiongemma-physical-ai-v7-Q5_K_M.gguf) | 10 | 86.8 % overall on a 250-row eval; schema-in-prompt build. |
|
| 30 |
+
| v6 (legacy) | [`functiongemma-physical-ai-v6-Q5_K_M.gguf`](./functiongemma-physical-ai-v6-Q5_K_M.gguf) | 11 | Camera + vision dropped from earlier revs. Schema-in-prompt build. |
|
| 31 |
+
| v4c (legacy) | [`functiongemma-physical-ai-Q4_K_M.gguf`](./functiongemma-physical-ai-Q4_K_M.gguf) | 13 | Earliest published checkpoint. |
|
| 32 |
|
| 33 |
+
Schema ships as [`tools.json`](./tools.json) (8 tools, v9). Token-to-tool
|
| 34 |
mapping is in [`token_map.json`](./token_map.json).
|
| 35 |
|
| 36 |
+
## What changed in v9
|
| 37 |
+
|
| 38 |
+
v9 is a structural rewrite of the training pipeline, not just a dataset
|
| 39 |
+
refresh. Earlier revisions used the upstream FunctionGemma prompt format,
|
| 40 |
+
which injects the full tool schema (~1088 tokens) into every request as a
|
| 41 |
+
`<start_function_declaration>` developer turn. On a 2-core Cortex-A55 that
|
| 42 |
+
prefill cost ~57 s on the first turn — incompatible with a sub-second
|
| 43 |
+
voice UX.
|
| 44 |
+
|
| 45 |
+
v9 follows [Octopus v2](https://arxiv.org/abs/2404.01744) end to end:
|
| 46 |
+
|
| 47 |
+
| | Pre-v9 (schema-in-prompt) | v9 (Octopus v2) |
|
| 48 |
+
|---|---|---|
|
| 49 |
+
| Prompt format | `<start_of_turn>developer\n<start_function_declaration>...{schema}...<end_function_declaration>\n<end_of_turn>\n<start_of_turn>user\n{q}<end_of_turn>\n<start_of_turn>model\n` | `<start_of_turn>user\n{q}<end_of_turn>\n<start_of_turn>model\n` |
|
| 50 |
+
| Tokens per prompt | ~1088 | ~13 |
|
| 51 |
+
| Cold prefill on SL2619 (2-core A55) | ~57 s | **~0.55 s** |
|
| 52 |
+
| Tool routing | learned from in-context schema | learned from `<tool_N>` token weights |
|
| 53 |
+
| Training data shape | `{tools, messages: [dev, user, asst]}` with schema embedded | `{input, output, split}` — flat |
|
| 54 |
+
|
| 55 |
+
The schema in `tools.json` is still the source of truth for dispatcher
|
| 56 |
+
arg validation and is embedded in the GGUF metadata for schema-drift
|
| 57 |
+
checks, but it is **not** loaded into the inference prompt.
|
| 58 |
+
|
| 59 |
+
## Tool surface (v9, 8 tools)
|
| 60 |
+
|
| 61 |
+
| Token | Name | Args | Purpose |
|
| 62 |
+
|---|---|---|---|
|
| 63 |
+
| `<tool_0>` | `set_status_led` | `led`, `state`, `brightness?` | On/off one or all HAT status LEDs |
|
| 64 |
+
| `<tool_1>` | `blink_status_led` | `led`, `count?`, `speed?` | Discrete blink |
|
| 65 |
+
| `<tool_2>` | `set_neopixel_effect` | `effect`, `color?`, `palette?`, `speed?`, `intensity?` | Animated effect on the ring |
|
| 66 |
+
| `<tool_3>` | `play_buzzer` | `pattern` | `beep`, `double_beep`, `chirp`, `siren`, `alarm`, `success`, `error` |
|
| 67 |
+
| `<tool_4>` | `set_alarm` | `duration` or `time`, `label?` | Timer |
|
| 68 |
+
| `<tool_5>` | `cancel_alarm` | `label?` | Cancel one or all |
|
| 69 |
+
| `<tool_6>` | `get_system_status` | `metric` | `cpu`, `memory`, `temperature`, `npu`, or `all` |
|
| 70 |
+
| `<tool_7>` | `respond` | `message` | Natural-language fallback when no tool fits |
|
| 71 |
+
|
| 72 |
+
Surface routing keyword: `set_neopixel_effect` requires the literal
|
| 73 |
+
substring `neopixels` in the user input. Prompts that are ambiguous between
|
| 74 |
+
the status LEDs and the ring ("turn off the lights") route to `respond()`,
|
| 75 |
+
which asks the user to disambiguate.
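
The enum values listed in the tool table lend themselves to a dispatcher-side check before any hardware is touched. The following is a minimal sketch, assuming the dispatcher receives a tool name plus positional string arguments; `ALLOWED`, `validate_call`, and `neopixel_keyword_present` are hypothetical names, not part of the repo, and the case-insensitive keyword match is an assumption.

```python
# Hypothetical dispatcher-side checks; names are illustrative, not from the repo.

# Enum values copied from the tool surface table (first positional argument).
ALLOWED = {
    "play_buzzer": {"beep", "double_beep", "chirp", "siren", "alarm", "success", "error"},
    "get_system_status": {"cpu", "memory", "temperature", "npu", "all"},
}


def validate_call(tool, args):
    """Return True if the first argument is a listed enum value,
    or if this sketch records no enum for the tool."""
    allowed = ALLOWED.get(tool)
    if allowed is None:
        return True  # no enum recorded here for that tool
    return len(args) >= 1 and args[0] in allowed


def neopixel_keyword_present(user_text):
    """Check for the routing keyword that set_neopixel_effect relies on.

    Lowercasing first is an assumption; the README only says "literal substring".
    """
    return "neopixels" in user_text.lower()
```

A dispatcher could reject a `play_buzzer` call whose pattern is not in the set and fall back to a `respond()`-style message instead of driving the GPIO.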
|
| 76 |
+
|
| 77 |
+
## Output format — functional tokens
|
| 78 |
+
|
| 79 |
+
Tool calls emit as **functional tokens**: each tool name compiles to a
|
| 80 |
+
single special-vocabulary token (`<tool_0>` … `<tool_7>`) plus a single
|
| 81 |
+
`<end>` terminator. A complete call decodes in 8–15 output tokens, vs
|
| 82 |
+
~30–80 for the upstream native FunctionGemma
|
| 83 |
`<start_function_call>call:NAME{...}<end_function_call>` syntax. On a
|
| 84 |
2-core Cortex-A55 this is the difference between sub-second and 2–5 s
|
| 85 |
voice-UX latency.
|
| 86 |
|
| 87 |
+
Sample output: `<tool_0>("red","on")<end>` for `set_status_led(led="red", state="on")`.
|
| 88 |
|
| 89 |
> ⚠️ Inference servers MUST stop generation on `<end_of_turn>` (or `<eos>`),
|
| 90 |
+
> NOT on `<end>`. The v9 model can emit multi-tool sequences
|
| 91 |
+
> `<tool_A>(args)<end><tool_B>(args)<end>`, so stopping at the first
|
| 92 |
+
> `<end>` truncates legitimate multi-tool output.
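
Because several calls can arrive in one generation, a client needs to split the raw text on each `<end>` rather than expect a single call. A minimal sketch of such a splitter follows; `CALL_RE` and `parse_calls` are hypothetical names, the mapping is copied from the v9 `token_map.json`, and arguments are returned as raw strings without further parsing.

```python
import re

# One call = functional token, parenthesised args, then the <end> terminator.
CALL_RE = re.compile(r"(<tool_\d+>)\((.*?)\)<end>", re.DOTALL)

# v9 mapping, copied from the "reverse" section of token_map.json.
REVERSE = {
    "<tool_0>": "set_status_led",
    "<tool_1>": "blink_status_led",
    "<tool_2>": "set_neopixel_effect",
    "<tool_3>": "play_buzzer",
    "<tool_4>": "set_alarm",
    "<tool_5>": "cancel_alarm",
    "<tool_6>": "get_system_status",
    "<tool_7>": "respond",
}


def parse_calls(raw):
    """Split raw model output into (tool_name, raw_args) pairs, in emission order.

    Unknown tokens map to None so the caller can decide how to handle them.
    """
    return [(REVERSE.get(tok), args) for tok, args in CALL_RE.findall(raw)]
```

For an illustrative (made-up) string such as `<tool_0>("red","on")<end><tool_3>("beep")<end>`, this yields two pairs, one for `set_status_led` and one for `play_buzzer`.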
|
| 93 |
|
| 94 |
## Quick start (Ollama)
|
| 95 |
|
| 96 |
```bash
|
| 97 |
hf download BrinqAI/functiongemma-270m-physical-ai \
|
| 98 |
+
functiongemma-physical-ai-v9-Q5_K_M.gguf Modelfile tools.json token_map.json \
|
| 99 |
--local-dir ./fg-physical-ai
|
| 100 |
|
| 101 |
cd fg-physical-ai
|
| 102 |
ollama create functiongemma-physical-ai -f Modelfile
|
| 103 |
```
|
| 104 |
|
| 105 |
+
The shipped `Modelfile` bakes in the stop tokens (`<end_of_turn>`, `<eos>`)
|
| 106 |
+
and decode parameters (`temperature=0`, `num_ctx=1024`, `num_predict=80`).
|
| 107 |
|
| 108 |
## Calling the model
|
| 109 |
|
| 110 |
+
The v9 model expects a **bare user turn** — no schema, no tools list. Send
|
| 111 |
+
to Ollama with `raw=true`:
|
| 112 |
|
| 113 |
```python
|
| 114 |
import json
|
| 115 |
import re
|
| 116 |
import urllib.request
|
|
|
|
| 117 |
|
| 118 |
OLLAMA_URL = "http://localhost:11434"
|
| 119 |
MODEL = "functiongemma-physical-ai"
|
| 120 |
|
|
|
|
| 121 |
reverse_token_map = json.load(open("token_map.json"))["reverse"]
|
|
|
|
| 122 |
|
| 123 |
|
| 124 |
def build_prompt(user_text: str) -> str:
|
| 125 |
+
return (
|
| 126 |
+
f"<start_of_turn>user\n{user_text}<end_of_turn>\n"
|
| 127 |
+
f"<start_of_turn>model\n"
|
| 128 |
)
|
| 129 |
|
| 130 |
|
|
|
|
| 138 |
"temperature": 0.0,
|
| 139 |
"top_p": 1.0,
|
| 140 |
"num_predict": 80,
|
| 141 |
+
"stop": ["<end_of_turn>", "<eos>"],
|
| 142 |
},
|
| 143 |
}).encode()
|
| 144 |
req = urllib.request.Request(
|
|
|
|
| 159 |
return reverse_token_map.get(tok), args
|
| 160 |
|
| 161 |
|
| 162 |
+
raw = call_model("turn the red LED on")
|
| 163 |
+
print(raw) # e.g. '<tool_0>("red","on")<end>'
|
| 164 |
+
print(parse_call(raw)) # ('set_status_led', '"red","on"')
|
| 165 |
```
|
| 166 |
|
| 167 |
+
For `llama-cpp-python` directly, use `detokenize(..., special=True)` so
|
| 168 |
+
the `<tool_N>` and `<end>` tokens render in the output instead of being
|
| 169 |
+
stripped.
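
A rough sketch of that `llama-cpp-python` path is shown here. It assumes `llm` is a `llama_cpp.Llama` instance (the function only relies on its `tokenize`, `generate`, `detokenize`, and `token_eos` methods); `generate_with_special` is a hypothetical helper name, and `add_bos=True` and `temp=0.0` for greedy decoding are assumptions rather than settings taken from the repo.

```python
def generate_with_special(llm, prompt, max_tokens=80):
    """Decode token by token and keep special tokens in the returned text.

    `llm` is expected to behave like llama_cpp.Llama.
    """
    # special=True so <start_of_turn> etc. in the prompt become single token ids.
    prompt_ids = llm.tokenize(prompt.encode("utf-8"), add_bos=True, special=True)

    # Stop on <eos> or <end_of_turn>, never on <end> (multi-tool output needs it).
    stop_ids = {llm.token_eos()}
    eot = llm.tokenize(b"<end_of_turn>", add_bos=False, special=True)
    if len(eot) == 1:
        stop_ids.add(eot[0])

    out_ids = []
    for tok in llm.generate(prompt_ids, temp=0.0):
        if tok in stop_ids or len(out_ids) >= max_tokens:
            break
        out_ids.append(tok)

    # special=True keeps <tool_N> and <end> in the decoded text.
    return llm.detokenize(out_ids, special=True).decode("utf-8", errors="replace")
```

With a real model this might be called as `generate_with_special(Llama(model_path="functiongemma-physical-ai-v9-Q5_K_M.gguf"), build_prompt("turn the red LED on"))`, reusing `build_prompt` from the client above.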
|
| 170 |
+
|
| 171 |
## Training data
|
| 172 |
|
| 173 |
+
v9's training data was generated from Haiku-authored phrasing templates
|
| 174 |
+
crossed with deterministic entity pools, then lightly augmented with
|
| 175 |
+
Moonshine-flavored ASR noise (dropped function words, lowercased traces,
|
| 176 |
+
filler-word prepends). The shape matches Brinq's SmartPanel v15 trainer:
|
| 177 |
+
flat `{input, output, split}` records, no tools / messages array.
|
| 178 |
+
|
| 179 |
+
| | v9 |
|
| 180 |
+
|---|---|
|
| 181 |
+
| Train rows | 6,127 |
|
| 182 |
+
| Eval rows | 1,339 |
|
| 183 |
+
| Tools | 8 |
|
| 184 |
+
| Multi-tool fraction | low — single-tool emphasis; multi-tool routines composed at dispatch time |
|
| 185 |
+
| Augmentation | Moonshine-sim noise on ~30 % of records |
|
| 186 |
+
|
| 187 |
+
Per-tool train counts range from 217 to 1,199. `cancel_alarm` and
|
| 188 |
+
`play_buzzer` are the narrowest classes because their natural phrasing
|
| 189 |
+
variation is smaller; this is not a coverage gap.
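
The three noise types named above (dropped function words, lowercased traces, filler-word prepends) can be sketched as follows. This is an illustration only: the word lists, probabilities, and the names `add_asr_noise` and `augment` are assumptions, since the actual v9 augmentation code and pools are not part of this README.

```python
import random

# Illustrative word lists; the real pools used for v9 are not published here.
FUNCTION_WORDS = {"the", "a", "an", "to", "of"}
FILLERS = ["um", "uh", "okay"]


def add_asr_noise(text, rng, drop_prob=0.3, filler_prob=0.5):
    """Apply the three noise types to one training input string."""
    words = text.lower().split()  # lowercased trace
    # Randomly drop function words.
    kept = [w for w in words
            if not (w in FUNCTION_WORDS and rng.random() < drop_prob)]
    if not kept:  # never return an empty utterance
        kept = words
    if rng.random() < filler_prob:  # filler-word prepend
        kept.insert(0, rng.choice(FILLERS))
    return " ".join(kept)


def augment(records, fraction=0.3, seed=0):
    """Noise roughly `fraction` of the flat {input, output, split} records.

    Only the "input" field changes; "output" and "split" pass through.
    """
    rng = random.Random(seed)
    out = []
    for rec in records:
        if rng.random() < fraction:
            rec = {**rec, "input": add_asr_noise(rec["input"], rng)}
        out.append(rec)
    return out
```

Seeding the generator keeps the augmented set reproducible across runs, which matters when comparing eval loss between revisions.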
|
| 190 |
|
| 191 |
## Methodology
|
| 192 |
|
| 193 |
+
Direct port of the SmartPanel v15 trainer:
|
| 194 |
+
|
| 195 |
+
- **Full bf16 fine-tune** (no LoRA).
|
| 196 |
+
- **Functional tokens**: `<tool_0>` … `<tool_7>` + `<end>` added as
|
| 197 |
+
`additional_special_tokens`; new embeddings **mean-initialized** from the
|
| 198 |
+
existing input embedding matrix (random init under-converges on small
|
| 199 |
+
datasets).
|
| 200 |
+
- **Completion-only loss mask**: hand-rolled — labels before
|
| 201 |
+
`<start_of_turn>model\n` are masked to `-100`. The model learns only from
|
| 202 |
+
the assistant turn, not the user prompt.
|
| 203 |
+
- **5 epochs**, lr `3e-5`, cosine schedule, 0.1 warmup, weight decay 0.01.
|
| 204 |
+
- **Effective batch = 16** (`per_device_train_batch_size=8 ×
|
| 205 |
+
gradient_accumulation_steps=2`).
|
| 206 |
+
- **`max_length=256`** — the trained prompt format is ~13 tokens and the
|
| 207 |
+
assistant turn fits comfortably under 64 tokens, including respond()
|
| 208 |
+
messages.
|
| 209 |
+
- bf16, gradient checkpointing, `adamw_torch_fused`, `metric_for_best_model="eval_loss"` + `load_best_model_at_end=True`.
|
| 210 |
+
- Training wall-clock time: **5 min on a single H100** (~15–20 min on a 4090).
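
Two of the steps in the list above, the completion-only loss mask and the mean initialization of new token embeddings, can be sketched in plain Python. This is not the trainer's code: `mask_prompt_labels` and `mean_init_rows` are hypothetical names, masking through the end of the `<start_of_turn>model\n` marker is an assumption, and "mean-initialized" is read here as the column-wise mean over all existing embedding rows.

```python
IGNORE_INDEX = -100  # label value ignored by PyTorch's cross-entropy loss by default


def mask_prompt_labels(input_ids, marker_ids):
    """Copy input_ids to labels, masking everything up to and including the marker.

    marker_ids is the token-id sequence for "<start_of_turn>model\n".
    """
    n, m = len(input_ids), len(marker_ids)
    for start in range(n - m + 1):
        if list(input_ids[start:start + m]) == list(marker_ids):
            cut = start + m
            return [IGNORE_INDEX] * cut + list(input_ids[cut:])
    # Marker missing: mask the whole sequence so it contributes no loss.
    return [IGNORE_INDEX] * n


def mean_init_rows(embedding, n_new):
    """Return n_new rows, each the column-wise mean of the existing rows."""
    dim = len(embedding[0])
    mean = [sum(row[j] for row in embedding) / len(embedding) for j in range(dim)]
    return [list(mean) for _ in range(n_new)]
```

In a real trainer the same logic would operate on tensors; for v9, `n_new` would be 9 (eight `<tool_N>` tokens plus `<end>`).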
|
| 211 |
|
| 212 |
### Citation
|
| 213 |
|
|
|
|
| 221 |
}
|
| 222 |
```
|
| 223 |
|
| 224 |
+
## Results
|
| 225 |
|
| 226 |
+
### Training metrics (final epoch)
|
| 227 |
|
| 228 |
+
| | v9 |
|
| 229 |
+
|---|---|
|
| 230 |
+
| Final train loss | 0.555 |
|
| 231 |
+
| Final eval loss | **0.090** |
|
| 232 |
+
| Mean token accuracy (eval) | **97.9 %** |
|
| 233 |
|
| 234 |
+
### Held-out smoke test (post-train, 30 prompts spanning all 8 tools)
|
| 235 |
|
| 236 |
+
| | v9 |
|
| 237 |
+
|---|---|
|
| 238 |
+
| Smoke-test routing accuracy | **30 / 30 (100 %)** |
|
|
|
|
|
|
|
| 239 |
|
| 240 |
+
The 30-prompt suite covers single-tool happy paths for every tool plus
|
| 241 |
+
the failure modes that broke v8: ambiguous LED prompts ("turn off the
|
| 242 |
+
lights"), effect-name without `neopixels` keyword ("do the aurora"),
|
| 243 |
+
unsupported features ("play a tone at 2000 hz"), and out-of-scope
|
| 244 |
+
appliances ("turn on the TV"). All 8 of those route to `respond()` with a
|
| 245 |
+
helpful explanation.
|
| 246 |
|
| 247 |
+
### On-device benchmark (Coralboard, 2-core Cortex-A55 @ 2 GHz, Q5_K_M GGUF)
|
|
|
|
|
|
|
|
|
|
| 248 |
|
| 249 |
+
| | v7 (schema-in-prompt, 10 tools) | v9 (Octopus v2, 8 tools) |
|
| 250 |
+
|---|---|---|
|
| 251 |
+
| Prompt tokens | ~1088 | ~13 |
|
| 252 |
+
| **Cold prefill (turn 1)** | **57.3 s** | **0.55 s** (105× faster) |
|
| 253 |
+
| Warm prefill (turn 2+) | ~3 s | ~0.4 s |
|
| 254 |
+
| Decode for a typical call | 0.5–1.2 s | 0.5–1.2 s |
|
| 255 |
+
| End-to-end first-turn (model load 2.3 s + prefill + decode) | ~62 s | ~3 s |
|
| 256 |
+
| Routing on a 29-prompt board bench | n/a (not directly comparable) | **29 / 29 (100 %)** |
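
The derived figures in the table can be re-computed from its rounded inputs. The sketch below uses only numbers from the table; the helper names are hypothetical. The rounded inputs give a cold-prefill ratio of about 104× and a v7 first turn of roughly 60 to 61 s, so the published 105× and ~62 s presumably come from unrounded measurements or include overhead not itemized here.

```python
# Figures copied from the benchmark table.
V7 = {"prompt_tokens": 1088, "cold_prefill_s": 57.3}
V9 = {"prompt_tokens": 13, "cold_prefill_s": 0.55}
MODEL_LOAD_S = 2.3
DECODE_S = (0.5, 1.2)  # typical call, same range for both builds


def prefill_rate(build):
    """Prompt tokens processed per second during cold prefill."""
    return build["prompt_tokens"] / build["cold_prefill_s"]


def first_turn_range(build):
    """(low, high) end-to-end seconds: model load + cold prefill + decode."""
    base = MODEL_LOAD_S + build["cold_prefill_s"]
    return base + DECODE_S[0], base + DECODE_S[1]


speedup = V7["cold_prefill_s"] / V9["cold_prefill_s"]
print(f"cold-prefill speedup: {speedup:.0f}x")
print(f"prefill rate: v7 {prefill_rate(V7):.0f} tok/s, v9 {prefill_rate(V9):.0f} tok/s")
print("first turn (s): v7 %.1f-%.1f, v9 %.1f-%.1f"
      % (*first_turn_range(V7), *first_turn_range(V9)))
```

The per-token prefill rate is similar for both builds, which is consistent with the README's point that the gain comes from the shorter prompt rather than from faster compute.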
|
| 257 |
|
| 258 |
## Files
|
| 259 |
|
| 260 |
```
|
| 261 |
+
functiongemma-physical-ai-v9-Q5_K_M.gguf # ~248 MiB (260,047,008 bytes), v9 GGUF Q5_K_M weights (Ollama / llama.cpp)
|
| 262 |
+
Modelfile # Ollama Modelfile (functional-token format)
|
| 263 |
+
tools.json # 8-tool schema (v9, canonical mobile-actions format)
|
| 264 |
+
token_map.json # functional-token <-> tool-name map (v9)
|
| 265 |
README.md # this file
|
| 266 |
```
|
| 267 |
|
| 268 |
+
Legacy v6/v7 GGUFs are kept in repo history for reproducibility but should
|
| 269 |
+
not be used for new deployments — they require the schema-in-prompt
|
| 270 |
+
inference wrapper and pay the ~57 s cold-prefill cost.
|
| 271 |
+
|
| 272 |
## License
|
| 273 |
|
| 274 |
Released under the [Gemma Terms of Use](https://ai.google.dev/gemma/terms).
|
|
|
|
| 279 |
|
| 280 |
- Base model: <https://huggingface.co/google/functiongemma-270m-it>
|
| 281 |
- Octopus v2 paper: <https://arxiv.org/abs/2404.01744>
|
| 282 |
+
- Hardware demo + integration code (Synaptics Coralboard, Grinn HAT,
|
| 283 |
+
WLED-over-USB-CDC, full PyQt UI):
|
| 284 |
+
<https://github.com/synaptics-astra-demos/sl2610-examples> →
|
| 285 |
+
`Function_calling/`
|
|
|
functiongemma-physical-ai-v9-Q5_K_M.gguf
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:efa8d6f922ba40e0064b8efa3c330f36f843e933b62086c60964fbcafea22852
|
| 3 |
+
size 260047008
|
token_map.json
CHANGED
|
@@ -1,30 +1,25 @@
|
|
| 1 |
{
|
| 2 |
-
"version": "0.
|
| 3 |
-
"description": "Compressed token map for FunctionGemma CPU inference on SL2619.
|
| 4 |
"tokens": {
|
| 5 |
-
"
|
| 6 |
-
"
|
| 7 |
-
"
|
| 8 |
-
"
|
| 9 |
-
"
|
| 10 |
-
"
|
| 11 |
-
"
|
| 12 |
-
"
|
| 13 |
-
"get_system_status": "<tool_8>",
|
| 14 |
-
"respond": "<tool_9>"
|
| 15 |
},
|
| 16 |
"reverse": {
|
| 17 |
-
"<tool_0>": "
|
| 18 |
-
"<tool_1>": "
|
| 19 |
-
"<tool_2>": "
|
| 20 |
-
"<tool_3>": "
|
| 21 |
-
"<tool_4>": "
|
| 22 |
-
"<tool_5>": "
|
| 23 |
-
"<tool_6>": "
|
| 24 |
-
"<tool_7>": "
|
| 25 |
-
"<tool_8>": "get_system_status",
|
| 26 |
-
"<tool_9>": "respond",
|
| 27 |
-
"<tool_none>": null
|
| 28 |
},
|
| 29 |
"special_tokens": [
|
| 30 |
"<tool_0>",
|
|
@@ -35,11 +30,9 @@
|
|
| 35 |
"<tool_5>",
|
| 36 |
"<tool_6>",
|
| 37 |
"<tool_7>",
|
| 38 |
-
"<tool_8>",
|
| 39 |
-
"<tool_9>",
|
| 40 |
-
"<tool_none>",
|
| 41 |
"<end>"
|
| 42 |
],
|
| 43 |
"output_format": "<tool_N>(\"arg1\",\"arg2\",...)<end>",
|
| 44 |
-
"
|
| 45 |
}
|
|
|
|
| 1 |
{
|
| 2 |
+
"version": "0.4.0",
|
| 3 |
+
"description": "Compressed token map for FunctionGemma CPU inference on SL2619. v9: 8 tools (set_status_led, blink_status_led, set_neopixel_effect, play_buzzer, set_alarm, cancel_alarm, get_system_status, respond) + <end> terminator. Trained Octopus v2 style — functional tokens are the entire output vocabulary the model uses for routing; the tool schema (tools.json) is NOT loaded into the inference prompt. v9 drops the unused <tool_none> sentinel that v8's training pipeline reserved but never emitted.",
|
| 4 |
"tokens": {
|
| 5 |
+
"set_status_led": "<tool_0>",
|
| 6 |
+
"blink_status_led": "<tool_1>",
|
| 7 |
+
"set_neopixel_effect": "<tool_2>",
|
| 8 |
+
"play_buzzer": "<tool_3>",
|
| 9 |
+
"set_alarm": "<tool_4>",
|
| 10 |
+
"cancel_alarm": "<tool_5>",
|
| 11 |
+
"get_system_status": "<tool_6>",
|
| 12 |
+
"respond": "<tool_7>"
|
| 13 |
},
|
| 14 |
"reverse": {
|
| 15 |
+
"<tool_0>": "set_status_led",
|
| 16 |
+
"<tool_1>": "blink_status_led",
|
| 17 |
+
"<tool_2>": "set_neopixel_effect",
|
| 18 |
+
"<tool_3>": "play_buzzer",
|
| 19 |
+
"<tool_4>": "set_alarm",
|
| 20 |
+
"<tool_5>": "cancel_alarm",
|
| 21 |
+
"<tool_6>": "get_system_status",
|
| 22 |
+
"<tool_7>": "respond"
|
| 23 |
},
|
| 24 |
"special_tokens": [
|
| 25 |
"<tool_0>",
|
|
|
|
| 30 |
"<tool_5>",
|
| 31 |
"<tool_6>",
|
| 32 |
"<tool_7>",
|
| 33 |
"<end>"
|
| 34 |
],
|
| 35 |
"output_format": "<tool_N>(\"arg1\",\"arg2\",...)<end>",
|
| 36 |
+
"prompt_format": "<start_of_turn>user\\n{user_text}<end_of_turn>\\n<start_of_turn>model\\n",
|
| 37 |
+
"notes": "Argument order positional per canonical schema's required-first then optional declaration order. v9 trains Octopus v2 pure (no schema in prompt) — see prompt_format. set_neopixel_effect routing keyword: 'neopixels' (literal substring, case-insensitive) required in user prompt; otherwise the model routes to respond() asking the user to disambiguate (HAT status LEDs vs. neopixel ring)."
|
| 38 |
}
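
The `output_format` and `notes` fields above imply a small decoding step on the runtime side: map the functional token back to a tool name, then assign the positional arguments to parameter names in required-first, then-optional order. The following is a minimal sketch of that step. It assumes the argument list is a comma-separated run of JSON-style literals. `parse_call`, `name_arguments`, and `PARAM_ORDER` are hypothetical names, and `PARAM_ORDER` is a hand-copied excerpt covering only the three tools whose parameters are visible in the v9 `tools.json` diff.

```python
import json
import re

# The "reverse" map from token_map.json (v9).
REVERSE = {
    "<tool_0>": "set_status_led",
    "<tool_1>": "blink_status_led",
    "<tool_2>": "set_neopixel_effect",
    "<tool_3>": "play_buzzer",
    "<tool_4>": "set_alarm",
    "<tool_5>": "cancel_alarm",
    "<tool_6>": "get_system_status",
    "<tool_7>": "respond",
}

# Required parameters first, then optional ones in declaration order.
# Excerpt only: the remaining tools are not shown in this diff.
PARAM_ORDER = {
    "set_status_led": ["led", "state", "brightness"],
    "blink_status_led": ["led", "count", "speed"],
    "set_neopixel_effect": ["effect", "color", "palette", "speed", "intensity"],
}

CALL_RE = re.compile(r"^(<tool_\d+>)\((.*)\)<end>$", re.DOTALL)


def parse_call(text):
    """Split '<tool_N>(...)<end>' into (tool_name, positional_args)."""
    match = CALL_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a functional-token call: {text!r}")
    token, body = match.groups()
    if token not in REVERSE:
        raise ValueError(f"unknown functional token: {token}")
    # Assumption: arguments are JSON-style literals separated by commas.
    args = json.loads("[" + body + "]")
    return REVERSE[token], args


def name_arguments(tool_name, args):
    """Pair positional args with parameter names for tools listed in PARAM_ORDER."""
    order = PARAM_ORDER[tool_name]
    if len(args) > len(order):
        raise ValueError(f"too many arguments for {tool_name}")
    return dict(zip(order, args))


if __name__ == "__main__":
    name, args = parse_call('<tool_0>("red","on")<end>')
    print(name, name_arguments(name, args))
```

Trailing optional parameters that the model omits are simply absent from the resulting dictionary, so a dispatcher can fall back to the defaults documented in `tools.json`.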
|
tools.json
CHANGED
|
@@ -1,95 +1,84 @@
|
|
| 1 |
{
|
| 2 |
-
"version": "0.
|
| 3 |
-
"description": "Physical AI tool schema for Coral Dev Board (SL2619) FunctionGemma demo. Canonical mobile-actions format.
|
| 4 |
"tools": [
|
| 5 |
{
|
| 6 |
"function": {
|
| 7 |
-
"name": "
|
| 8 |
-
"description": "Turn
|
| 9 |
-
"parameters": {
|
| 10 |
-
"type": "OBJECT",
|
| 11 |
-
"properties": {}
|
| 12 |
-
}
|
| 13 |
-
}
|
| 14 |
-
},
|
| 15 |
-
{
|
| 16 |
-
"function": {
|
| 17 |
-
"name": "turn_off_lights",
|
| 18 |
-
"description": "Turn off all RGB LEDs and the Neopixel strip.",
|
| 19 |
-
"parameters": {
|
| 20 |
-
"type": "OBJECT",
|
| 21 |
-
"properties": {}
|
| 22 |
-
}
|
| 23 |
-
}
|
| 24 |
-
},
|
| 25 |
-
{
|
| 26 |
-
"function": {
|
| 27 |
-
"name": "set_led_color",
|
| 28 |
-
"description": "Set the color of RGB LEDs on the HAT or Neopixel strip.",
|
| 29 |
"parameters": {
|
| 30 |
"type": "OBJECT",
|
| 31 |
"properties": {
|
| 32 |
-
"
|
| 33 |
"type": "STRING",
|
| 34 |
-
"description": "
|
| 35 |
},
|
| 36 |
-
"
|
| 37 |
"type": "STRING",
|
| 38 |
-
"description": "
|
| 39 |
},
|
| 40 |
"brightness": {
|
| 41 |
"type": "INTEGER",
|
| 42 |
-
"description": "Brightness 0-100. Optional, default 100."
|
| 43 |
}
|
| 44 |
},
|
| 45 |
-
"required": ["
|
| 46 |
}
|
| 47 |
}
|
| 48 |
},
|
| 49 |
{
|
| 50 |
"function": {
|
| 51 |
-
"name": "
|
| 52 |
-
"description": "Blink LEDs a given number of times
|
| 53 |
"parameters": {
|
| 54 |
"type": "OBJECT",
|
| 55 |
"properties": {
|
| 56 |
"count": {
|
| 57 |
"type": "INTEGER",
|
| 58 |
"description": "Number of blinks. Default 3."
|
| 59 |
},
|
| 60 |
-
"color": {
|
| 61 |
-
"type": "STRING",
|
| 62 |
-
"description": "Color to blink. Default current color or white."
|
| 63 |
-
},
|
| 64 |
"speed": {
|
| 65 |
"type": "STRING",
|
| 66 |
"description": "One of 'slow', 'normal', 'fast'. Default 'normal'."
|
| 67 |
}
|
| 68 |
-
}
|
| 69 |
}
|
| 70 |
}
|
| 71 |
},
|
| 72 |
{
|
| 73 |
"function": {
|
| 74 |
-
"name": "
|
| 75 |
-
"description": "Play
|
| 76 |
"parameters": {
|
| 77 |
"type": "OBJECT",
|
| 78 |
"properties": {
|
| 79 |
-
"
|
| 80 |
"type": "STRING",
|
| 81 |
-
"description": "One of '
|
| 82 |
},
|
| 83 |
"color": {
|
| 84 |
"type": "STRING",
|
| 85 |
-
"description": "Color
|
| 86 |
},
|
| 87 |
"speed": {
|
| 88 |
"type": "STRING",
|
| 89 |
"description": "One of 'slow', 'normal', 'fast'. Default 'normal'."
|
| 90 |
}
|
| 91 |
},
|
| 92 |
-
"required": ["
|
| 93 |
}
|
| 94 |
}
|
| 95 |
},
|
|
@@ -165,7 +154,7 @@
|
|
| 165 |
{
|
| 166 |
"function": {
|
| 167 |
"name": "respond",
|
| 168 |
-
"description": "Reply to the user in natural language without taking any physical action. Use this when the request is out of scope (no matching tool), ambiguous (needs clarification), purely conversational (greetings, thanks), or impossible on this device. Do NOT use this when any physical-action tool fits the request.",
|
| 169 |
"parameters": {
|
| 170 |
"type": "OBJECT",
|
| 171 |
"properties": {
|
|
|
|
| 1 |
{
|
| 2 |
+
"version": "0.4.0",
|
| 3 |
+
"description": "Physical AI tool schema for Coral Dev Board (SL2619) FunctionGemma demo. Canonical mobile-actions format. v9: same 8-tool surface as v8, but the model is trained Octopus v2 style — functional tokens (<tool_0>..<tool_7>) emitted directly from a minimal user-only prompt with NO tool schema in the context. This shrinks the cold prefill from ~1088 prompt tokens (v7/v8 with the FunctionGemma developer turn) to ~13, taking on-board cold first-turn from ~57s to ~0.5s on a 2-core A55. The schema in this file is purely a developer/runtime contract (dispatcher arg validation, GGUF metadata, documentation) — it is NOT injected into the inference prompt. Surface routing keyword: 'neopixels' (literal) for the ring; 'LED' / 'light' / 'red light' / 'green light' / 'blue light' for the HAT status LEDs.",
|
| 4 |
"tools": [
|
| 5 |
{
|
| 6 |
"function": {
|
| 7 |
+
"name": "set_status_led",
|
| 8 |
+
"description": "Turn one or all of the HAT status LEDs on or off. The HAT has three individual LEDs (red, green, blue), each independently addressable. Invoke when the user mentions 'LED', 'LEDs', or 'the <color> light'.",
|
| 9 |
"parameters": {
|
| 10 |
"type": "OBJECT",
|
| 11 |
"properties": {
|
| 12 |
+
"led": {
|
| 13 |
"type": "STRING",
|
| 14 |
+
"description": "Which LED: 'red', 'green', 'blue', or 'all'."
|
| 15 |
},
|
| 16 |
+
"state": {
|
| 17 |
"type": "STRING",
|
| 18 |
+
"description": "'on' or 'off'."
|
| 19 |
},
|
| 20 |
"brightness": {
|
| 21 |
"type": "INTEGER",
|
| 22 |
+
"description": "Brightness 0-100. Optional, default 100. Ignored when state is 'off'."
|
| 23 |
}
|
| 24 |
},
|
| 25 |
+
"required": ["led", "state"]
|
| 26 |
}
|
| 27 |
}
|
| 28 |
},
|
| 29 |
{
|
| 30 |
"function": {
|
| 31 |
+
"name": "blink_status_led",
|
| 32 |
+
"description": "Blink one or all HAT status LEDs a given number of times. Invoke for 'blink/flash the <color> light' or 'blink the LEDs' style requests.",
|
| 33 |
"parameters": {
|
| 34 |
"type": "OBJECT",
|
| 35 |
"properties": {
|
| 36 |
+
"led": {
|
| 37 |
+
"type": "STRING",
|
| 38 |
+
"description": "Which LED: 'red', 'green', 'blue', or 'all'."
|
| 39 |
+
},
|
| 40 |
"count": {
|
| 41 |
"type": "INTEGER",
|
| 42 |
"description": "Number of blinks. Default 3."
|
| 43 |
},
|
| 44 |
"speed": {
|
| 45 |
"type": "STRING",
|
| 46 |
"description": "One of 'slow', 'normal', 'fast'. Default 'normal'."
|
| 47 |
}
|
| 48 |
+
},
|
| 49 |
+
"required": ["led"]
|
| 50 |
}
|
| 51 |
}
|
| 52 |
},
|
| 53 |
{
|
| 54 |
"function": {
|
| 55 |
+
"name": "set_neopixel_effect",
|
| 56 |
+
"description": "Play a visual effect on the neopixel ring (48-pixel WS2812B ring around the 7\" display, driven by WLED). Only invoke when the user explicitly mentions 'neopixels'. Use effect='off' to turn the ring off.",
|
| 57 |
"parameters": {
|
| 58 |
"type": "OBJECT",
|
| 59 |
"properties": {
|
| 60 |
+
"effect": {
|
| 61 |
"type": "STRING",
|
| 62 |
+
"description": "One of: 'solid', 'pulse', 'fade', 'chase', 'rainbow', 'sparkle', 'off', 'aurora', 'plasma', 'comet', 'twinkle', 'fireworks', 'police', 'heartbeat', 'loading', 'lightning', 'glitter', 'fire', 'sunrise'."
|
| 63 |
},
|
| 64 |
"color": {
|
| 65 |
"type": "STRING",
|
| 66 |
+
"description": "Color name (e.g. 'red', 'teal', 'amber', 'gold', 'violet') or 6-digit hex like '#FF8800'. Used by effects that take a primary color (solid, pulse, fade, chase, sparkle, comet). Ignored for rainbow and palette-driven effects."
|
| 67 |
+
},
|
| 68 |
+
"palette": {
|
| 69 |
+
"type": "STRING",
|
| 70 |
+
"description": "Color palette: 'auto', 'ocean', 'lava', 'forest', 'sunset', 'party', 'sherbet', 'c9', 'aurora', 'beach', 'fire', 'sakura', 'splash', 'pastel'. Most useful with aurora, plasma, fire, twinkle, comet."
|
| 71 |
},
|
| 72 |
"speed": {
|
| 73 |
"type": "STRING",
|
| 74 |
"description": "One of 'slow', 'normal', 'fast'. Default 'normal'."
|
| 75 |
+
},
|
| 76 |
+
"intensity": {
|
| 77 |
+
"type": "STRING",
|
| 78 |
+
"description": "One of 'low', 'medium', 'high'. Default 'medium'. Controls effect density / depth (sparkle density, fire height, comet tail length, aurora width)."
|
| 79 |
}
|
| 80 |
},
|
| 81 |
+
"required": ["effect"]
|
| 82 |
}
|
| 83 |
}
|
| 84 |
},
|
|
|
|
| 154 |
{
|
| 155 |
"function": {
|
| 156 |
"name": "respond",
|
| 157 |
+
"description": "Reply to the user in natural language without taking any physical action. Use this when the request is out of scope (no matching tool), ambiguous (needs clarification — e.g. surface keyword missing for LED requests), purely conversational (greetings, thanks), or impossible on this device. Do NOT use this when any physical-action tool fits the request.",
|
| 158 |
"parameters": {
|
| 159 |
"type": "OBJECT",
|
| 160 |
"properties": {
|