Update model card for v7 (10 tools, list_alarms removed)
README.md
CHANGED
@@ -25,16 +25,17 @@ SL2619 "Coral" edge board (Google IO 2026 demo).
 
 | Revision | File | Tool count | Notes |
 |----------|------|-----------:|-------|
+| **v7 (current)** | [`functiongemma-physical-ai-v7-Q5_K_M.gguf`](./functiongemma-physical-ai-v7-Q5_K_M.gguf) | 10 | `list_alarms` removed; alarm-query prompts route via `respond()`. 250-row eval: **86.8%** overall, **92.8%** single-tool, **75.0%** multi-tool exact-match, **0.0%** parse failure. |
+| v6 (previous) | [`functiongemma-physical-ai-v6-Q5_K_M.gguf`](./functiongemma-physical-ai-v6-Q5_K_M.gguf) | 11 | Camera + vision dropped. Single-tool routing 95.5%, multi-tool exact-match 23.9%. |
 | v4c (legacy) | [`functiongemma-physical-ai-Q4_K_M.gguf`](./functiongemma-physical-ai-Q4_K_M.gguf) | 13 | Earlier checkpoint, includes camera/scene tools. |
 
+Schema ships as [`tools.json`](./tools.json) (10 tools, current). Token-to-tool
 mapping is in [`token_map.json`](./token_map.json).
 
 ## Output format: function tokens
 
 Tool calls emit as **function tokens**: each tool name compiles to a single
+special-vocabulary token (`<tool_0>` … `<tool_9>` for v7) and a single
 `<end>` terminator. A complete call decodes in roughly 8–15 output tokens,
 vs ~30–80 for native FunctionGemma's
 `<start_function_call>call:NAME{...}<end_function_call>` syntax. On a
@@ -44,8 +45,9 @@ voice-UX latency.
 Sample output: `<tool_3>(3,"red")<end>` for `blink_lights(count=3, color="red")`.
 
 `<tool_0>` → `turn_on_lights`, `<tool_3>` → `blink_lights`,
+`<tool_8>` → `get_system_status`, `<tool_9>` → `respond` (v7 numbering;
+v6 used `<tool_9>` and `<tool_10>` for those, bumped down by one when
+`list_alarms` was removed). Full mapping in [`token_map.json`](./token_map.json).
 
 > ⚠️ Inference servers MUST stop generation on `<end_of_turn>` (or `<eos>`),
 > NOT on `<end>`. Multi-tool sequences emit `<tool_A>(args)<end><tool_B>(args)<end>`,
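To make the function-token format in this hunk concrete, here is a minimal sketch of splitting a raw completion into individual calls. It is not the README's own `parse_call` helper. `split_calls` and `CALL_RE` are hypothetical names, `TOKEN_MAP` holds only the four token-to-tool pairs quoted in this hunk (the full v7 table lives in `token_map.json`), and the empty-argument form `<tool_0>()<end>` is an assumption about how a no-argument call is written.

```python
import re

# Partial v7 map: only the four pairs quoted in the README text.
# The complete mapping ships in token_map.json.
TOKEN_MAP = {
    "<tool_0>": "turn_on_lights",
    "<tool_3>": "blink_lights",
    "<tool_8>": "get_system_status",
    "<tool_9>": "respond",
}

# One call is assumed to look like <tool_N>(args)<end>; args may be empty.
# Limitation: an argument string that itself contains ")<end>" would be cut short.
CALL_RE = re.compile(r"(<tool_\d+>)\((.*?)\)<end>", re.DOTALL)


def split_calls(raw: str) -> list[tuple[str, str]]:
    """Return (tool_name, raw_args) pairs in emission order."""
    calls = []
    for token, args in CALL_RE.findall(raw):
        # Unknown tokens are passed through verbatim so the caller can reject them.
        calls.append((TOKEN_MAP.get(token, token), args))
    return calls


single = '<tool_3>(3,"red")<end>'
multi = '<tool_0>()<end><tool_3>(3,"red")<end>'

print(split_calls(single))
print(split_calls(multi))

# If the server stopped at the first <end>, only the first call would survive:
truncated = multi.split("<end>")[0] + "<end>"
print(split_calls(truncated))
```

The last line shows why the stop condition matters: cutting generation at `<end>` silently drops every call after the first in a multi-tool sequence, so the stop string should be `<end_of_turn>` (or `<eos>`) as the warning says.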
@@ -150,23 +152,32 @@ print(parse_call(raw)) # ('turn_on_lights', '')
 
 ## Training data
 
+### v7 (current)
+
+- **Size**: 2,000 train / 250 eval (`coral_v7_compact.jsonl`).
+- **Schema change**: `list_alarms` removed. Out-of-scope alarm-query prompts
+("what alarms do I have?") are deliberately routed through `respond()`
+rather than answered by a query tool. Compact token map shifted accordingly:
+`get_system_status` is now `<tool_8>` (was `<tool_9>`), `respond` is
+`<tool_9>` (was `<tool_10>`).
+- **Multi-tool**: 84 of 250 eval rows (33.6%) are multi-tool sequences,
+matching the Google mobile-actions distribution.
+- **GGUF eval (Q5_K_M, greedy)**: overall **86.8%** (217/250), single-tool
+**92.8%** (154/166), multi-tool exact-match **75.0%** (63/84), parse
+failure **0.0%** (0/250). Per-tool F1 ranges from 0.74 (`respond`) to
+1.00 (`cancel_alarm`).
+- **Known weak spots** (informal on-device REPL): "tell me a joke" / "what
+alarms do I have" tend to misroute to `play_buzzer` instead of `respond`;
+more `respond()` negatives sharing keywords with physical-action tools
+would help in v8.
+
+### v6 (previous)
+
+- **Size**: 1,400 train / 150 eval (v5/v6 dataset lineage, `coral_v5_compact.jsonl`).
+- **Tool count**: 11. Cameras / vision tools dropped from earlier
+checkpoints; alarm-list tool kept.
+
+### v4 (legacy)
 
 - **Size**: 367 train / 100 eval.
 - **Multi-tool**: 13% (vs Google mobile-actions 33.4%).
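The v7 percentages quoted in this hunk can be re-derived from the raw counts it gives. The sketch below does only that arithmetic. The `counts` dictionary and the `pct` helper are illustrative names, not part of the repository's eval script.

```python
# (correct, total) pairs copied from the v7 notes above.
counts = {
    "overall": (217, 250),
    "single_tool": (154, 166),
    "multi_tool_exact": (63, 84),
    "parse_failure": (0, 250),
}


def pct(part: int, total: int) -> float:
    """Percentage rounded to one decimal, matching the README's precision."""
    return round(100.0 * part / total, 1)


rates = {name: pct(part, total) for name, (part, total) in counts.items()}

# The single-tool and multi-tool subsets should partition the eval set.
single_ok, single_n = counts["single_tool"]
multi_ok, multi_n = counts["multi_tool_exact"]
overall_ok, overall_n = counts["overall"]
partition_holds = (single_n + multi_n == overall_n) and (single_ok + multi_ok == overall_ok)

# Share of eval rows that are multi-tool sequences.
multi_share = pct(multi_n, overall_n)

for name, value in rates.items():
    print(f"{name}: {value}%")
print("partition holds:", partition_holds)
print("multi-tool share:", multi_share)
```

The subset counts add up to the overall figures (166 + 84 = 250 rows, 154 + 63 = 217 correct), so the three accuracy numbers are mutually consistent.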
@@ -188,16 +199,18 @@ The training recipe is a direct port of Brinq's SmartPanel v14 trainer
 for a smaller dataset:
 
 - **Full bf16 fine-tune (no LoRA)**.
+- **Mean-init** for new `<tool_0>..<tool_9>` and `<end>` special tokens
 (init = mean of existing input embeddings; random init under-converges
 for tiny models on small datasets).
 - **Completion-only loss mask**: hand-rolled, masking everything before
 `<start_of_turn>model\n`. TRL 0.25's `completion_only_loss=True` is a
 no-op on flat-text data and FunctionGemma's chat template lacks
 `{% generation %}` markers required for `assistant_only_loss`.
+- **8 epochs**, lr `3e-5`, cosine schedule, 0.1 warmup. (2,000 examples in
+v7; fewer epochs than v4's 15 because dataset size grew 5×.)
+- **Tool-token loss weight 4.0** to keep the new function tokens learning
+faster than the rest of the vocabulary (Gemma3's 262k-vocab dilutes the
+signal otherwise).
 - **Effective batch 16** = `per_device_train_batch_size=2 ×
 gradient_accumulation_steps=8` (kept this way to avoid the 8 GiB
 cross-entropy logit allocation OOM that bites Gemma3's 262k vocab).
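Three of the recipe bullets above (mean-init, the completion-only mask, and the tool-token loss weight) are easier to follow with toy values. The sketch below uses plain Python lists rather than tensors and is not the repository's trainer. All function names and token ids are hypothetical. It assumes the marker itself is masked along with the prompt, and it weights only the `<tool_N>` ids, since the README does not say whether `<end>` gets the 4.0 weight as well.

```python
IGNORE_INDEX = -100  # label value that PyTorch cross-entropy skips by default


def mean_init(embeddings, n_new):
    """Append n_new rows, each the column-wise mean of the existing rows."""
    dim = len(embeddings[0])
    mean_row = [sum(row[j] for row in embeddings) / len(embeddings) for j in range(dim)]
    return embeddings + [list(mean_row) for _ in range(n_new)]


def completion_labels(input_ids, marker):
    """Copy input_ids as labels, masking the prompt up to and including marker."""
    n = len(marker)
    for start in range(len(input_ids) - n + 1):
        if input_ids[start:start + n] == marker:
            cut = start + n
            return [IGNORE_INDEX] * cut + input_ids[cut:]
    # Marker absent: mask the whole row so it contributes no loss.
    return [IGNORE_INDEX] * len(input_ids)


def token_weights(labels, tool_ids, tool_weight=4.0):
    """0 for masked positions, tool_weight for tool tokens, 1 elsewhere."""
    weights = []
    for label in labels:
        if label == IGNORE_INDEX:
            weights.append(0.0)
        elif label in tool_ids:
            weights.append(tool_weight)
        else:
            weights.append(1.0)
    return weights


# Toy values; every id below is hypothetical.
MARKER = [5, 6]    # stands in for the token ids of the "<start_of_turn>model" line
TOOL_IDS = {900}   # stands in for the ids of the new <tool_N> tokens
ids = [1, 2, 3, 5, 6, 900, 40, 41, 901]

labels = completion_labels(ids, MARKER)
weights = token_weights(labels, TOOL_IDS)
table = mean_init([[1.0, 2.0], [3.0, 4.0]], n_new=2)
effective_batch = 2 * 8  # per_device_train_batch_size x gradient_accumulation_steps

print(labels)
print(weights)
print(table[-1], effective_batch)
```

In a real trainer the weights would multiply the per-token cross-entropy before reduction; how that reduction is normalized is not stated in the README, so it is left out here.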
@@ -218,29 +231,36 @@ for a smaller dataset:
 }
 ```
 
+## Eval results
 
+**v7 checkpoint (2,000 train / 250 eval), Q5_K_M GGUF, greedy decode:**
 
+| Metric | Result |
+|--------|--------|
+| Overall accuracy | 217 / 250 = **86.8%** |
+| Single-tool accuracy | 154 / 166 = **92.8%** |
+| Multi-tool exact-match | 63 / 84 = **75.0%** |
+| Parse failure rate | 0 / 250 = **0.0%** |
 
+Per-tool F1: `cancel_alarm` 1.00, `get_system_status` 0.96, `set_alarm` 0.93,
+`set_neopixel_pattern` 0.92, `turn_on_lights` 0.90, `blink_lights` 0.89,
+`turn_off_lights` 0.89, `set_led_color` 0.88, `play_buzzer` 0.83,
+`respond` 0.74. (`respond` is the lowest because the model occasionally
+chooses a physical-action tool with a hallucinated text argument when the
+prompt shares keywords with one, an issue the dispatcher's enum validation
+catches at runtime.)
 
+**On-device latency** (SL2619 / 2× Cortex-A55 @ 2 GHz, `performance` governor):
+~42 s cold prefill (one-time), ~1.6 s / turn warm, measured across a 33-prompt
+exhaustive REPL run on the actual Coralboard.
 
 ## Latency
 
+- **~1.1–1.3 s** per call on a laptop CPU (Ollama / standalone client above).
+- **~1.6 s / turn warm**, ~42 s cold prefill on SL2619 (2× Cortex-A55 @ 2 GHz)
+with the CPU governor pinned to `performance`. Measured 2026-05-05 on the
+Grinn Coralboard with the v7 GGUF + the `Function_calling/` demo from
+[BrinqAI/sl2610-examples](https://github.com/BrinqAI/sl2610-examples/tree/coralboard/functiongemma/Function_calling).
 
 ## ONNX exports (for compiler toolchains)
 
@@ -299,13 +319,15 @@ or runtime do its own dtype conversion / quantization downstream.
 ## Files
 
 ```
+functiongemma-physical-ai-v7-Q5_K_M.gguf   # 248 MB, GGUF Q5_K_M, 10-tool v7 schema (current)
+functiongemma-physical-ai-v6-Q5_K_M.gguf   # 248 MB, GGUF Q5_K_M, 11-tool v6 schema (previous)
+functiongemma-physical-ai-Q4_K_M.gguf      # 253 MB, GGUF Q4_K_M, v4c (legacy)
+Modelfile                                  # Ollama Modelfile (function-token format)
+tools.json                                 # 10-tool schema (mobile-actions format, current)
+token_map.json                             # function-token <-> tool-name map
+onnx/compact-fp32/                         # ONNX export, fp32, with KV cache (1.7 GB)
+onnx/compact-fp16/                         # ONNX export, fp16, with KV cache (833 MB); see ORT caveat above
+README.md                                  # this file
 ```
 
 ## License
@@ -319,5 +341,7 @@ By using this model you agree to those terms. Base model:
 - Base model: <https://huggingface.co/google/functiongemma-270m-it>
 - Octopus v2 paper: <https://arxiv.org/abs/2404.01744>
 - Hardware demo (Coralboard, Google IO 2026: full physical setup,
+WLED-over-USB-CDC, Grinn HAT, end-to-end voice + text REPL):
+<https://github.com/BrinqAI/sl2610-examples/tree/coralboard/functiongemma/Function_calling>
+(BrinqAI fork of the upstream Synaptics demo repo,
+[synaptics-astra-demos/sl2610-examples](https://github.com/synaptics-astra-demos/sl2610-examples)).