hmahadik committed 16 days ago

Commit 35645d8 (verified) · 1 parent: 408c7e9

Update model card for v7 (10 tools, list_alarms removed)

Files changed (1): README.md +75 -51

@@ -25,16 +25,17 @@ SL2619 "Coral" edge board (Google IO 2026 demo).
 | Revision | File | Tool count | Notes |
 |----------|------|-----------:|-------|
-| **v6 (current)** | [`functiongemma-physical-ai-v6-Q5_K_M.gguf`](./functiongemma-physical-ai-v6-Q5_K_M.gguf) | 11 | Camera + vision dropped. Single-tool routing **95.5%**, multi-tool exact-match 23.9%. |
 | v4c (legacy) | [`functiongemma-physical-ai-Q4_K_M.gguf`](./functiongemma-physical-ai-Q4_K_M.gguf) | 13 | Earlier checkpoint, includes camera/scene tools. |
-Schema ships as [`tools.json`](./tools.json) (11 tools, current). Token-to-tool
 mapping is in [`token_map.json`](./token_map.json).
 ## Output format — function tokens
 Tool calls emit as **function tokens**: each tool name compiles to a single
-special-vocabulary token (`<tool_0>` … `<tool_10>` for v6) and a single
 `<end>` terminator. A complete call decodes in roughly 8–15 output tokens,
 vs ~30–80 for native FunctionGemma's
 `<start_function_call>call:NAME{...}<end_function_call>` syntax. On a
@@ -44,8 +45,9 @@ voice-UX latency.
 Sample output: `<tool_3>(3,"red")<end>` for `blink_lights(count=3, color="red")`.
 `<tool_0>` → `turn_on_lights`, `<tool_3>` → `blink_lights`,
-`<tool_9>` → `get_system_status`, `<tool_10>` → `respond` (v6 numbering).
-Full mapping in [`token_map.json`](./token_map.json).
 > ⚠️ Inference servers MUST stop generation on `<end_of_turn>` (or `<eos>`),
 > NOT on `<end>`. Multi-tool sequences emit `<tool_A>(args)<end><tool_B>(args)<end>`,
@@ -150,23 +152,32 @@ print(parse_call(raw))      # ('turn_on_lights', '')
 ## Training data
-### v5 (current — use this for training)
-- **Size**: 1,400 train / 150 eval (v5 dataset, `coral_v5_compact.jsonl`).
-- **Multi-tool**: 292 multi-tool examples in train (20.9%), 50 in eval (33.3%). Google
-  mobile-actions target is 33.4%; train is capped by pool size — the ~450 Haiku-generated
-  multi-tool examples deduplicated to 343 unique. Future: spawn more agents.
-- **Generation**: base hand-written examples + `paraphrases_cache.json` (generated by parallel
-  Claude Haiku agents). 971 new single-tool + 450 new multi-tool paraphrases before dedup.
-- **Coverage fixes**: explicit brightness form ("set led red brightness 50") — 46 examples.
-  Bare alarm form ("set alarm 5 minutes", no preposition) — 36 examples. Both were zero in v4
-  and caused the two known smoke-test failures.
-- **Non-determinism fix**: `set_led_color_examples()` previously used unseeded `random.sample`;
-  now iterates all 18 templates × 12 colors deterministically (216 examples vs ~60).
-- **Eval harness**: `scripts/eval_harness.py` — greedy decode against eval JSONL, per-tool F1,
-  arg-match rate, multi-tool sequence accuracy. Run on GPU host post-training.
-### v4 (previous)
 - **Size**: 367 train / 100 eval.
 - **Multi-tool**: 13% (vs Google mobile-actions 33.4%).
@@ -188,16 +199,18 @@ The training recipe is a direct port of Brinq's SmartPanel v14 trainer
 for a smaller dataset:
 - **Full bf16 fine-tune (no LoRA)**.
-- **Mean-init** for new `<tool_0>..<tool_12>` and `<end>` special tokens
   (init = mean of existing input embeddings; random init under-converges
   for tiny models on small datasets).
 - **Completion-only loss mask**: hand-rolled, masking everything before
   `<start_of_turn>model\n`. TRL 0.25's `completion_only_loss=True` is a
   no-op on flat-text data and FunctionGemma's chat template lacks
   `{% generation %}` markers required for `assistant_only_loss`.
-- **15 epochs**, lr `3e-5`, cosine schedule, 0.1 warmup. (367 examples here
-  vs SmartPanel v14's ~21k — the higher epoch count compensates for the
-  smaller dataset.)
 - **Effective batch 16** = `per_device_train_batch_size=2 ×
   gradient_accumulation_steps=8` (kept this way to avoid the 8 GiB
   cross-entropy logit allocation OOM that bites Gemma3's 262k vocab).
@@ -218,29 +231,36 @@ for a smaller dataset:
 }
 ```
-## Smoke-test results
-**v4 checkpoint (367-example training):**
-| Smoke pass-rate |
-|-----------------|
-| 8 / 10 (80 %) |
-Note: 21/22 smoke prompts are NOT in the held-out eval set, so 80% measures training
-memorization, not generalization. The two failures — `set led red brightness 50`
-(hallucinated `acceptor(...)`) and `set alarm 5 minutes` (misrouted) — were caused by
-absent phrasing patterns, now fixed in v5.
-**v5 checkpoint: pending GPU training run.** Use `scripts/eval_harness.py` for
-proper per-tool precision/recall/F1 against the 150-example held-out eval set.
 ## Latency
-Measured against a local Ollama using the standalone client above:
-- **~1.1 – 1.3 s** per call on a laptop CPU.
-- Target on SL2619 (2× Cortex-A55 @ 2 GHz): **0.5 – 1.2 s** with the CPU
-  governor pinned to `performance`. On-device measurement pending.
 ## ONNX exports (for compiler toolchains)
@@ -299,13 +319,15 @@ or runtime do its own dtype conversion / quantization downstream.
 ## Files
 ```
-functiongemma-physical-ai-Q4_K_M.gguf   # 253 MB, GGUF Q4_K_M weights (Ollama / llama.cpp)
-Modelfile                                # Ollama Modelfile (function-token format)
-tools.json                               # 13-tool schema (mobile-actions format)
-token_map.json                           # function-token <-> tool-name map
-onnx/compact-fp32/                       # ONNX export, fp32, with KV cache (1.7 GB)
-onnx/compact-fp16/                       # ONNX export, fp16, with KV cache (833 MB) — see ORT caveat above
-README.md                                # this file
 ```
 ## License
@@ -319,5 +341,7 @@ By using this model you agree to those terms. Base model:
 - Base model: <https://huggingface.co/google/functiongemma-270m-it>
 - Octopus v2 paper: <https://arxiv.org/abs/2404.01744>
 - Hardware demo (Coralboard, Google IO 2026 — full physical setup,
-  WLED-over-USB-CDC, Grinn HAT, etc.):
-  <https://github.com/BrinqAI/coral-functiongemma-demo>

 | Revision | File | Tool count | Notes |
 |----------|------|-----------:|-------|
+| **v7 (current)** | [`functiongemma-physical-ai-v7-Q5_K_M.gguf`](./functiongemma-physical-ai-v7-Q5_K_M.gguf) | 10 | `list_alarms` removed; alarm-query prompts route via `respond()`. 250-row eval: **86.8%** overall, **92.8%** single-tool, **75.0%** multi-tool exact-match, **0.0%** parse failure. |
+| v6 (previous) | [`functiongemma-physical-ai-v6-Q5_K_M.gguf`](./functiongemma-physical-ai-v6-Q5_K_M.gguf) | 11 | Camera + vision dropped. Single-tool routing 95.5%, multi-tool exact-match 23.9%. |
 | v4c (legacy) | [`functiongemma-physical-ai-Q4_K_M.gguf`](./functiongemma-physical-ai-Q4_K_M.gguf) | 13 | Earlier checkpoint, includes camera/scene tools. |
+Schema ships as [`tools.json`](./tools.json) (10 tools, current). Token-to-tool
 mapping is in [`token_map.json`](./token_map.json).
 ## Output format — function tokens
 Tool calls emit as **function tokens**: each tool name compiles to a single
+special-vocabulary token (`<tool_0>` … `<tool_9>` for v7) and a single
 `<end>` terminator. A complete call decodes in roughly 8–15 output tokens,
 vs ~30–80 for native FunctionGemma's
 `<start_function_call>call:NAME{...}<end_function_call>` syntax. On a
 Sample output: `<tool_3>(3,"red")<end>` for `blink_lights(count=3, color="red")`.
 `<tool_0>` → `turn_on_lights`, `<tool_3>` → `blink_lights`,
+`<tool_8>` → `get_system_status`, `<tool_9>` → `respond` (v7 numbering;
+v6 used `<tool_9>` and `<tool_10>` for those — bumped down by one when
+`list_alarms` was removed). Full mapping in [`token_map.json`](./token_map.json).
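
To make the function-token format concrete, here is a minimal sketch of splitting a raw completion into (tool name, arguments) pairs. It is not the README's own `parse_call` helper: the name `parse_calls` is hypothetical, `TOKEN_MAP` holds only the four v7 entries quoted in the text (the full mapping is in `token_map.json`), and it assumes arguments are written as Python-compatible literals, which the model card does not guarantee.

```python
import ast
import re

# Partial v7 mapping, limited to the entries quoted in the README text.
# The full mapping lives in token_map.json; this dict is illustrative only.
TOKEN_MAP = {
    0: "turn_on_lights",
    3: "blink_lights",
    8: "get_system_status",
    9: "respond",
}

# One call looks like <tool_N>(args)<end>; calls may be concatenated.
CALL_RE = re.compile(r"<tool_(\d+)>\((.*?)\)<end>", re.DOTALL)


def parse_calls(raw):
    """Return a list of (tool_name, args_tuple), one per call found in raw."""
    calls = []
    for match in CALL_RE.finditer(raw):
        index = int(match.group(1))
        arg_text = match.group(2).strip()
        # Assumption: arguments are Python-style literals such as 3 or "red".
        args = ast.literal_eval(f"({arg_text},)") if arg_text else ()
        # Unknown indices stay visible instead of being silently dropped.
        name = TOKEN_MAP.get(index, f"<tool_{index}>")
        calls.append((name, args))
    return calls


print(parse_calls('<tool_3>(3,"red")<end>'))
print(parse_calls('<tool_0>()<end><tool_3>(2,"blue")<end>'))
```

The parser walks every match rather than stopping at the first `<end>`, since a single completion may carry several calls back to back.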
 > ⚠️ Inference servers MUST stop generation on `<end_of_turn>` (or `<eos>`),
 > NOT on `<end>`. Multi-tool sequences emit `<tool_A>(args)<end><tool_B>(args)<end>`,
 ## Training data
+### v7 (current)
+- **Size**: 2,000 train / 250 eval (`coral_v7_compact.jsonl`).
+- **Schema change**: `list_alarms` removed. Out-of-scope alarm-query prompts
+  ("what alarms do I have?") are deliberately routed through `respond()`
+  rather than answered by a query tool. Compact token map shifted accordingly:
+  `get_system_status` is now `<tool_8>` (was `<tool_9>`), `respond` is
+  `<tool_9>` (was `<tool_10>`).
+- **Multi-tool**: 84 of 250 eval rows (33.6%) are multi-tool sequences,
+  matching the Google mobile-actions distribution.
+- **GGUF eval (Q5_K_M, greedy)**: overall **86.8%** (217/250), single-tool
+  **92.8%** (154/166), multi-tool exact-match **75.0%** (63/84), parse
+  failure **0.0%** (0/250). Per-tool F1 ranges from 0.74 (`respond`) to
+  1.00 (`cancel_alarm`).
+- **Known weak spots** (informal on-device REPL): "tell me a joke" / "what
+  alarms do I have" tend to misroute to `play_buzzer` instead of `respond` —
+  more `respond()` negatives sharing keywords with physical-action tools
+  would help in v8.
+### v6 (previous)
+- **Size**: 1,400 train / 150 eval (v5/v6 dataset lineage, `coral_v5_compact.jsonl`).
+- **Tool count**: 11. Cameras / vision tools dropped from earlier
+  checkpoints; alarm-list tool kept.
+### v4 (legacy)
 - **Size**: 367 train / 100 eval.
 - **Multi-tool**: 13% (vs Google mobile-actions 33.4%).
 for a smaller dataset:
 - **Full bf16 fine-tune (no LoRA)**.
+- **Mean-init** for new `<tool_0>..<tool_9>` and `<end>` special tokens
   (init = mean of existing input embeddings; random init under-converges
   for tiny models on small datasets).
 - **Completion-only loss mask**: hand-rolled, masking everything before
   `<start_of_turn>model\n`. TRL 0.25's `completion_only_loss=True` is a
   no-op on flat-text data and FunctionGemma's chat template lacks
   `{% generation %}` markers required for `assistant_only_loss`.
+- **8 epochs**, lr `3e-5`, cosine schedule, 0.1 warmup. (v7 trains on 2,000
+  examples, roughly 5× the 367 used for v4, so it needs fewer epochs than
+  v4's 15.)
+- **Tool-token loss weight 4.0** so the new function tokens learn faster
+  than the rest of the vocabulary (otherwise Gemma3's 262k-token vocabulary
+  dilutes the signal).
 - **Effective batch 16** = `per_device_train_batch_size=2 ×
   gradient_accumulation_steps=8` (kept this way to avoid the 8 GiB
   cross-entropy logit allocation OOM that bites Gemma3's 262k vocab).
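
Three of the recipe items above (mean-init, the completion-only mask, and the tool-token loss weight) can be illustrated in plain Python. This is a sketch under stated assumptions, not the project's trainer, which presumably works on tensors: the function names, the toy token ids, masking the marker itself along with the prompt, and normalizing the loss by the sum of weights are all choices made here for illustration.

```python
import math

IGNORE_INDEX = -100  # label value conventionally skipped by cross-entropy losses


def mean_init(embeddings, num_new):
    """Append num_new rows, each the column-wise mean of the existing rows."""
    dim = len(embeddings[0])
    mean_row = [sum(row[j] for row in embeddings) / len(embeddings) for j in range(dim)]
    return embeddings + [list(mean_row) for _ in range(num_new)]


def completion_labels(token_ids, marker_ids):
    """Mask every position up to and including the first marker occurrence."""
    n = len(marker_ids)
    for start in range(len(token_ids) - n + 1):
        if token_ids[start:start + n] == marker_ids:
            cut = start + n
            return [IGNORE_INDEX] * cut + token_ids[cut:]
    # Marker not found: the example contributes nothing to the loss.
    return [IGNORE_INDEX] * len(token_ids)


def weighted_loss(log_probs, labels, tool_ids, tool_weight=4.0):
    """Weighted mean negative log-likelihood over unmasked positions."""
    total, norm = 0.0, 0.0
    for log_prob, label in zip(log_probs, labels):
        if label == IGNORE_INDEX:
            continue
        weight = tool_weight if label in tool_ids else 1.0
        total += -weight * log_prob
        norm += weight
    return total / norm if norm else 0.0


# Toy data: ids 7, 8 stand in for the "<start_of_turn>model\n" marker,
# id 42 stands in for a function token such as <tool_3>.
new_rows = mean_init([[1.0, 2.0], [3.0, 4.0]], num_new=2)[2:]
labels = completion_labels([5, 6, 7, 8, 42, 9], marker_ids=[7, 8])
loss = weighted_loss(
    [0.0, 0.0, 0.0, 0.0, math.log(0.5), math.log(0.25)], labels, tool_ids={42}
)
print(new_rows, labels, round(loss, 4))
```

With a weight of 4.0, an error on the single function token counts as much as errors on four ordinary completion tokens.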
 }
 ```
+## Eval results
+**v7 checkpoint (2,000 train / 250 eval), Q5_K_M GGUF, greedy decode:**
+| Metric | Result |
+|--------|--------|
+| Overall accuracy | 217 / 250 = **86.8%** |
+| Single-tool accuracy | 154 / 166 = **92.8%** |
+| Multi-tool exact-match | 63 / 84 = **75.0%** |
+| Parse failure rate | 0 / 250 = **0.0%** |
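
The counts in this table are mutually consistent, which a few lines of arithmetic can confirm. The snippet below only recomputes the percentages from the counts quoted in the diff; the variable names are illustrative.

```python
# Counts copied from the v7 eval table.
single_correct, single_total = 154, 166
multi_correct, multi_total = 63, 84
parse_failures = 0

total = single_total + multi_total
overall_correct = single_correct + multi_correct


def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100.0 * numerator / denominator, 1)


summary = {
    "overall": pct(overall_correct, total),
    "single_tool": pct(single_correct, single_total),
    "multi_tool": pct(multi_correct, multi_total),
    "parse_failure": pct(parse_failures, total),
    "multi_tool_share": pct(multi_total, total),
}
print(total, overall_correct, summary)
```

The single-tool and multi-tool rows sum to the overall row (154 + 63 = 217 of 166 + 84 = 250), and the multi-tool share of the eval set comes out at the 33.6% stated under Training data.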
+Per-tool F1: `cancel_alarm` 1.00, `get_system_status` 0.96, `set_alarm` 0.93,
+`set_neopixel_pattern` 0.92, `turn_on_lights` 0.90, `blink_lights` 0.89,
+`turn_off_lights` 0.89, `set_led_color` 0.88, `play_buzzer` 0.83,
+`respond` 0.74. (`respond` is the lowest because the model occasionally
+chooses a physical-action tool with a hallucinated text argument when the
+prompt shares keywords with one — an issue the dispatcher's enum validation
+catches at runtime.)
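
The runtime check mentioned here might look roughly like the sketch below. Everything in it is hypothetical: the diff does not show the dispatcher or the allowed values, so `ENUM_ARGS`, its contents, and `validate_call` are placeholders rather than the project's code.

```python
# Hypothetical enum schema: the real allowed values live in tools.json
# and are not reproduced in this diff.
ENUM_ARGS = {
    "play_buzzer": [{"short", "long", "alarm"}],
    "set_led_color": [{"red", "green", "blue"}],
}


def validate_call(name, args):
    """Return True when every enum-constrained argument is an allowed value."""
    allowed_per_position = ENUM_ARGS.get(name)
    if allowed_per_position is None:
        return True  # no enum constraints known for this tool
    if len(args) < len(allowed_per_position):
        return False  # a constrained argument is missing
    return all(arg in allowed for arg, allowed in zip(args, allowed_per_position))


# A misrouted call carrying free text as its argument is rejected.
print(validate_call("play_buzzer", ("tell me a joke",)))
print(validate_call("play_buzzer", ("short",)))
```

A dispatcher built this way could fall back to `respond` or ignore the call when validation fails; which of those the real dispatcher does is not stated in the diff.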
+**On-device latency** (SL2619 / 2× Cortex-A55 @ 2 GHz, `performance` governor):
+~42 s cold prefill (one-time), ~1.6 s / turn warm — measured across a 33-prompt
+exhaustive REPL run on the actual Coralboard.
 ## Latency
+- **~1.1 – 1.3 s** per call on a laptop CPU (Ollama / standalone client above).
+- **~1.6 s / turn warm**, ~42 s cold prefill on SL2619 (2× Cortex-A55 @ 2 GHz)
+  with the CPU governor pinned to `performance`. Measured 2026-05-05 on the
+  Grinn Coralboard with the v7 GGUF + the `Function_calling/` demo from
+  [BrinqAI/sl2610-examples](https://github.com/BrinqAI/sl2610-examples/tree/coralboard/functiongemma/Function_calling).
 ## ONNX exports (for compiler toolchains)
 ## Files
 ```
+functiongemma-physical-ai-v7-Q5_K_M.gguf  # 248 MB, GGUF Q5_K_M, 10-tool v7 schema (current)
+functiongemma-physical-ai-v6-Q5_K_M.gguf  # 248 MB, GGUF Q5_K_M, 11-tool v6 schema (previous)
+functiongemma-physical-ai-Q4_K_M.gguf     # 253 MB, GGUF Q4_K_M, v4c (legacy)
+Modelfile                                  # Ollama Modelfile (function-token format)
+tools.json                                 # 10-tool schema (mobile-actions format, current)
+token_map.json                             # function-token <-> tool-name map
+onnx/compact-fp32/                         # ONNX export, fp32, with KV cache (1.7 GB)
+onnx/compact-fp16/                         # ONNX export, fp16, with KV cache (833 MB) — see ORT caveat above
+README.md                                  # this file
 ```
 ## License
 - Base model: <https://huggingface.co/google/functiongemma-270m-it>
 - Octopus v2 paper: <https://arxiv.org/abs/2404.01744>
 - Hardware demo (Coralboard, Google IO 2026 — full physical setup,
+  WLED-over-USB-CDC, Grinn HAT, end-to-end voice + text REPL):
+  <https://github.com/BrinqAI/sl2610-examples/tree/coralboard/functiongemma/Function_calling>
+  (BrinqAI fork of the upstream Synaptics demo repo,
+  [synaptics-astra-demos/sl2610-examples](https://github.com/synaptics-astra-demos/sl2610-examples)).