hmahadik committed 21 days ago

Commit

e515857

verified

1 Parent(s): bfcf399

Rewrite README: rename, single function-token format, Octopus v2 citation

Files changed (1)

README.md +98 -73

@@ -8,87 +8,102 @@ tags:
   - function-calling
   - edge
   - on-device
   - synaptics-sl2619
-  - coral
   - gemma3
 pipeline_tag: text-generation
 inference: false
 ---
-# Coral FunctionGemma 270M
 Fine-tuned [`google/functiongemma-270m-it`](https://huggingface.co/google/functiongemma-270m-it)
-for the Coral physical-AI demo (Synaptics SL2619 edge board, Google IO 2026).
-Two output formats are provided:
-| File | Format | Sample output | Use when |
-|------|--------|----------------|----------|
-| `coral-functiongemma-v4c-compact-Q4_K_M.gguf` | **compact** | `<tool_3>(3,"red")<end>` | Default. ~8-15 output tokens per call → sub-second decode on a 2-core A55. |
-| `coral-functiongemma-v4c-native-Q4_K_M.gguf` | **native**  | `<start_function_call>call:blink_lights{count:<escape>3<escape>,color:<escape>red<escape>}<end_function_call>` | Drop-in for the existing Synaptics agentic runtime parser. ~30-80 output tokens. |
-The 13-tool schema covers LED control, Neopixel patterns, buzzer, alarms,
-system status, photo capture, scene description, and a `respond` fallback
-for chat / out-of-scope prompts. Full schema lives in the demo repo
-([function_gemma/schema/tools.json](https://github.com/BrinqAI/coral-functiongemma-demo/blob/main/function_gemma/schema/tools.json)).
-## Quick start (Ollama)
-Two install paths. **Pick the second** unless you know your client sets the
-stop tokens itself — `ollama pull hf.co/...` ignores the shipped Modelfile,
-so the compact format will run past `<end>` until it hits `num_predict`.
-### Option A — direct HF pull (defaults only)
-```bash
-ollama pull hf.co/BrinqAI/coral-functiongemma-270m:compact-Q4_K_M
-ollama pull hf.co/BrinqAI/coral-functiongemma-270m:native-Q4_K_M
-```
-Stop tokens (`<end>`, `<end_of_turn>`, `<eos>`) and runtime params
-(`temperature=0`, `num_ctx=1024`, `num_predict=80`) are **not** applied —
-Ollama generates a default Modelfile from the GGUF. Use only if your client
-injects stop tokens at request time (the demo `inference/backend.py` does
-this via `options.stop`).
-### Option B — local `ollama create` (recommended)
 ```bash
-# Download GGUF + Modelfile into the same dir
-# (huggingface_hub >= 1.0 ships the `hf` CLI; older installs use `huggingface-cli`)
-hf download BrinqAI/coral-functiongemma-270m \
-  coral-functiongemma-v4c-compact-Q4_K_M.gguf Modelfile.compact \
-  --local-dir ./coral-fg
-cd coral-fg
-ollama create coral-functiongemma:compact -f Modelfile.compact
-ollama run coral-functiongemma:compact
 ```
-Same flow for native: swap `compact` → `native` in both filenames and tag.
-This path bakes the stop tokens and decode params into the registered model.
 The model expects prompts built via the FunctionGemma chat template
 (developer role + user role, tools list passed via
 `tokenizer.apply_chat_template(..., tools=tools)`). Send to Ollama with
 `raw=true` so it forwards the prompt verbatim. Plain `ollama run` from the
-CLI does **not** pass tools and will degenerate to chat-style refusals — see
 [demo `inference/backend.py`](https://github.com/BrinqAI/coral-functiongemma-demo/blob/main/function_gemma/inference/backend.py)
 for the canonical client code.
 ## Training data
-- **Source**: `function_gemma/data/coral_v4_{compact,native}.jsonl` in the demo repo.
 - **Size**: 367 train / 100 eval examples.
 - **Mix**: paraphrase expansion + multi-tool sequences + `respond()`
-  fallbacks for ambiguous / out-of-scope prompts (so the model has a clean
-  exit when no tool fits, rather than hallucinating one).
-- **Buzzer schema**: pattern-only (binary GPIO on the Coral HAT — no PWM).
-  Old `frequency_hz` / `duration_seconds` prompts are routed through
-  `respond()` as out-of-scope negatives.
 ## Methodology
-Direct adaptation of the SmartPanel v14 trainer:
 - **Full bf16 fine-tune (no LoRA)**.
 - **Mean-init** for new `<tool_0>..<tool_12>` and `<end>` special tokens
@@ -98,9 +113,9 @@ Direct adaptation of the SmartPanel v14 trainer:
   `<start_of_turn>model\n`. TRL 0.25's `completion_only_loss=True` is a
   no-op on flat-text data and FunctionGemma's chat template lacks
   `{% generation %}` markers required for `assistant_only_loss`.
-- **15 epochs**, lr `3e-5`, cosine schedule, 0.1 warmup. (Coral has 367
-  examples vs SmartPanel v14's 21k — the higher epoch count compensates
-  for the smaller dataset.)
 - **Effective batch 16** = `per_device_train_batch_size=2 ×
   gradient_accumulation_steps=8` (kept this way to avoid the 8 GiB
   cross-entropy logit allocation OOM that bites Gemma3's 262k vocab).
@@ -111,44 +126,53 @@ Direct adaptation of the SmartPanel v14 trainer:
 The trainer source lives at
 [`function_gemma/training/train_coral_v4c.py`](https://github.com/BrinqAI/coral-functiongemma-demo/blob/main/function_gemma/training/train_coral_v4c.py)
-in the demo repo.
 ## Smoke-test results
-10-prompt Ollama smoke against the registered models (built-in
 `function_gemma/training/smoke_test_ollama.py`):
-| Model    | Smoke pass-rate |
-|----------|-----------------|
-| compact  | **8 / 10 (80 %)** |
-| native   | **7 / 10 (70 %)** |
-Both models hit the simple control prompts cleanly
-(`turn on the lights`, `blink red 3 times`, `play a beep`, `take a picture`,
-`good morning` → respond). Known weak prompts at 367-example scale:
-`set led red brightness 50` (compact emits a hallucinated `acceptor(...)` —
-likely Q4_K_M quantization artifact on `<tool_2>`), `set alarm 5 minutes`
-(compact misroutes), `what is cpu temp` (native hallucinates
-`get_cpu_temp` instead of `get_system_status`). Plan: paraphrase-expand
-the dataset to 2-3k examples for the next checkpoint.
 ## Latency
 Measured on the [demo](https://github.com/BrinqAI/coral-functiongemma-demo)
 with `inference/backend.py` against a local Ollama:
-- Compact format: **~1.1 - 1.3 s** per call on a laptop CPU; target on
-  SL2619 (2× Cortex-A55 @ 2 GHz) is **0.5 - 1.2 s** with the CPU governor
-  pinned to `performance`.
-- Native format: 2 - 5× slower decode (more output tokens).
 ## Files
 ```
-coral-functiongemma-v4c-compact-Q4_K_M.gguf   # 253 MB, primary
-coral-functiongemma-v4c-native-Q4_K_M.gguf    # 253 MB, fallback / Synaptics-runtime parity
-Modelfile.compact                              # Ollama Modelfile (compact)
-Modelfile.native                               # Ollama Modelfile (native)
 ```
 ## License
@@ -161,4 +185,5 @@ By using this model you agree to those terms. Base model:
 - Demo source: <https://github.com/BrinqAI/coral-functiongemma-demo>
 - Base model: <https://huggingface.co/google/functiongemma-270m-it>
 - Methodology reference (SmartPanel v14): internal — see demo README for the published recipe.

   - function-calling
   - edge
   - on-device
+  - physical-ai
+  - iot
+  - octopus-v2
   - synaptics-sl2619
   - gemma3
 pipeline_tag: text-generation
 inference: false
 ---
+# FunctionGemma 270M — Physical AI
 Fine-tuned [`google/functiongemma-270m-it`](https://huggingface.co/google/functiongemma-270m-it)
+for voice-controlled physical-AI / household-IoT actions. 13 callable tools
+(lights, neopixel patterns, buzzer, alarms, camera, scene description, system
+status, plus a `respond` natural-language fallback for ambiguous or
+out-of-scope prompts). Reference deployment: Synaptics SL2619 "Coral" edge
+board, Google I/O 2026 demo.
+The full 13-tool schema lives in the demo repo at
+[`function_gemma/schema/tools.json`](https://github.com/BrinqAI/coral-functiongemma-demo/blob/main/function_gemma/schema/tools.json).
+## Output format — function tokens
+This model emits tool calls as **function tokens**: each tool name maps to
+a single special-vocabulary token (`<tool_0>` … `<tool_12>`), and every call
+ends with a single `<end>` terminator. A complete call decodes in roughly 8–15
+output tokens, vs ~30–80 for native FunctionGemma's
+`<start_function_call>call:NAME{...}<end_function_call>` syntax. On a
+2-core Cortex-A55 this is the difference between sub-second and 2–5 s
+voice-UX latency.
+| File | Sample output | Output tokens |
+|------|---------------|---------------|
+| `functiongemma-physical-ai-Q4_K_M.gguf` | `<tool_3>(3,"red")<end>` | ~8–15 |
+The token-to-tool mapping (`<tool_0>` → `turn_on_lights`, …, `<tool_12>` →
+`respond`) is in
+[`function_gemma/schema/token_map.json`](https://github.com/BrinqAI/coral-functiongemma-demo/blob/main/function_gemma/schema/token_map.json).
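
A client has to turn the function-token string back into a structured call. The following is a minimal Python sketch of that step, not the demo's parser. It assumes the text between the parentheses is a comma-separated sequence of Python-style literals, as in the sample `<tool_3>(3,"red")<end>`. The `parse_calls` helper, its `token_map` argument, and `KNOWN_TOKENS` are hypothetical names; only the two mappings stated above are filled in, and the full table would come from `token_map.json`.

```python
import ast
import re

# Only the two mappings stated in the README; the rest live in token_map.json.
KNOWN_TOKENS = {"<tool_0>": "turn_on_lights", "<tool_12>": "respond"}

# One call: <tool_N>(args)<end>. The terminator may be absent at the end of
# the string if a stop sequence trimmed it (assumption).
CALL_RE = re.compile(r"<tool_(\d+)>\((.*?)\)(?:<end>|\s*$)", re.DOTALL)


def parse_calls(text, token_map=None):
    """Return a list of {"token", "name", "args"} dicts found in `text`."""
    token_map = token_map or {}
    calls = []
    for match in CALL_RE.finditer(text):
        token = f"<tool_{match.group(1)}>"
        # Wrap the argument text in brackets so zero, one, or many
        # arguments all evaluate to a list of literals.
        args = ast.literal_eval("[" + match.group(2) + "]")
        calls.append({"token": token, "name": token_map.get(token), "args": args})
    return calls


if __name__ == "__main__":
    print(parse_calls('<tool_3>(3,"red")<end>', KNOWN_TOKENS))
```

Tokens missing from the supplied map keep `name=None`, so a caller can reject them instead of guessing a tool.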
+## Quick start (Ollama)
 ```bash
+hf download BrinqAI/functiongemma-270m-physical-ai \
+  functiongemma-physical-ai-Q4_K_M.gguf Modelfile \
+  --local-dir ./fg-physical-ai
+cd fg-physical-ai
+ollama create functiongemma-physical-ai -f Modelfile
+ollama run functiongemma-physical-ai
 ```
+`ollama create -f Modelfile` is the documented install path because the
+shipped `Modelfile` bakes in the stop tokens (`<end>`, `<end_of_turn>`,
+`<eos>`) and decode parameters (`temperature=0`, `num_ctx=1024`,
+`num_predict=80`). Direct `ollama pull hf.co/...` does not apply these,
+and the function-token output will run past `<end>` until it hits
+`num_predict`. Only use the direct-pull path if your client injects stops
+at request time (the demo `inference/backend.py` does this via
+`options.stop`).
 The model expects prompts built via the FunctionGemma chat template
 (developer role + user role, tools list passed via
 `tokenizer.apply_chat_template(..., tools=tools)`). Send to Ollama with
 `raw=true` so it forwards the prompt verbatim. Plain `ollama run` from the
+CLI does **not** pass tools and will degenerate to chat-style refusals —
+see
 [demo `inference/backend.py`](https://github.com/BrinqAI/coral-functiongemma-demo/blob/main/function_gemma/inference/backend.py)
 for the canonical client code.
+> **Note on demo Ollama tag**: the demo backend currently defaults to
+> `OLLAMA_MODEL=functiongemma-coral:latest` (legacy name). Set
+> `OLLAMA_MODEL=functiongemma-physical-ai` explicitly until the demo repo
+> ships an updated default.
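
For clients that call Ollama directly, the request described above might look like the following Python sketch. It assumes Ollama's default local address (`http://localhost:11434`) and its `/api/generate` endpoint; `build_generate_payload` and `generate` are hypothetical helper names, and the prompt string is expected to come from `tokenizer.apply_chat_template(..., tools=tools)`. The stop tokens and decode parameters mirror the values listed for the shipped `Modelfile`, so they are redundant if the model was registered with `ollama create`.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default address
STOP_TOKENS = ["<end>", "<end_of_turn>", "<eos>"]


def build_generate_payload(prompt, model="functiongemma-physical-ai"):
    """Build a raw-mode request body; `prompt` is the fully templated text."""
    return {
        "model": model,
        "prompt": prompt,
        "raw": True,      # forward the prompt verbatim, no Ollama templating
        "stream": False,  # return one JSON object instead of a stream
        "options": {
            "stop": STOP_TOKENS,
            "temperature": 0,
            "num_ctx": 1024,
            "num_predict": 80,
        },
    }


def generate(prompt, url=OLLAMA_URL):
    """POST the payload to a running Ollama server and return the text."""
    body = json.dumps(build_generate_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Passing the stops in `options` keeps the direct-pull path usable, since that path does not apply the `Modelfile` defaults.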
 ## Training data
 - **Size**: 367 train / 100 eval examples.
 - **Mix**: paraphrase expansion + multi-tool sequences + `respond()`
+  fallbacks for ambiguous / out-of-scope prompts (so the model has a
+  clean exit when no tool fits, rather than hallucinating one).
+- **Buzzer schema**: pattern-only (binary GPIO on the reference HAT — no
+  PWM). Old `frequency_hz` / `duration_seconds` prompts are routed
+  through `respond()` as out-of-scope negatives.
 ## Methodology
+This model uses the **functional-token** approach introduced by Octopus v2
+(Chen and Li, 2024): special vocabulary tokens are added for each callable
+function, so each function name decodes as a single output token rather than a
+multi-token JSON string. On-device this collapses ~30–80-token native
+FunctionGemma calls down to ~8–15 tokens, enabling sub-second decode on a
+2-core Cortex-A55.
+The training recipe is a direct port of Brinq's SmartPanel v14 trainer
+(full bf16, mean-init for new tokens, completion-only loss mask), adapted
+for a smaller dataset:
 - **Full bf16 fine-tune (no LoRA)**.
 - **Mean-init** for new `<tool_0>..<tool_12>` and `<end>` special tokens
   `<start_of_turn>model\n`. TRL 0.25's `completion_only_loss=True` is a
   no-op on flat-text data and FunctionGemma's chat template lacks
   `{% generation %}` markers required for `assistant_only_loss`.
+- **15 epochs**, lr `3e-5`, cosine schedule, 0.1 warmup. (367 examples here
+  vs SmartPanel v14's ~21k — the higher epoch count compensates for the
+  smaller dataset.)
 - **Effective batch 16** = `per_device_train_batch_size=2 ×
   gradient_accumulation_steps=8` (kept this way to avoid the 8 GiB
   cross-entropy logit allocation OOM that bites Gemma3's 262k vocab).
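
The batch arithmetic in the last bullet can be checked with a short calculation. This is an illustrative sketch only: the README gives neither the sequence length nor the logit dtype, so the 512-token length and 4-byte (fp32) logits below are assumptions, as is taking the 262k vocabulary to be exactly 262,144 entries.

```python
VOCAB_SIZE = 262_144   # assumed exact value of the "262k" vocabulary
BYTES_PER_LOGIT = 4    # assumed fp32 logits in the loss computation


def logits_gib(batch_size, seq_len, vocab=VOCAB_SIZE, width=BYTES_PER_LOGIT):
    """Size in GiB of one [batch, seq_len, vocab] logit tensor."""
    return batch_size * seq_len * vocab * width / 2**30


per_device_batch = 2
grad_accum_steps = 8
effective_batch = per_device_batch * grad_accum_steps

print(effective_batch)                      # 16
print(logits_gib(effective_batch, 512))     # 8.0
print(logits_gib(per_device_batch, 512))    # 1.0
```

Under these assumptions a single batch of 16 would materialize an 8 GiB logit tensor, while micro-batches of 2 need 1 GiB each and reach the same effective batch through gradient accumulation.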
 The trainer source lives at
 [`function_gemma/training/train_coral_v4c.py`](https://github.com/BrinqAI/coral-functiongemma-demo/blob/main/function_gemma/training/train_coral_v4c.py)
+in the demo repo. (Filename is preserved for now and may be renamed in a
+follow-up cleanup pass.)
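
The completion-only loss mask named in the recipe can be illustrated with a small sketch. It is not taken from the trainer: it assumes the mask disables loss on every token up to and including the tokenized `<start_of_turn>model\n` marker, it uses `-100` as the ignore label (the default `ignore_index` of PyTorch's cross-entropy loss), and `mask_prompt_labels` is a hypothetical helper that works on plain lists of token ids.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss


def find_subsequence(seq, pattern):
    """Return the index of the first occurrence of `pattern` in `seq`, or -1."""
    width = len(pattern)
    for start in range(len(seq) - width + 1):
        if seq[start:start + width] == pattern:
            return start
    return -1


def mask_prompt_labels(input_ids, marker_ids):
    """Copy `input_ids` into labels, ignoring everything through the marker.

    `marker_ids` stands for the token ids of the response marker. If the
    marker is not found, the whole example is ignored.
    """
    start = find_subsequence(input_ids, marker_ids)
    if start < 0:
        return [IGNORE_INDEX] * len(input_ids)
    cut = start + len(marker_ids)
    return [IGNORE_INDEX] * cut + list(input_ids[cut:])


if __name__ == "__main__":
    # Toy ids: 7, 8 play the role of the marker; 9, 10 are the completion.
    print(mask_prompt_labels([5, 6, 7, 8, 9, 10], [7, 8]))
```

Only the completion ids survive as labels, so the loss is computed on the function-token output rather than on the prompt and tool schema.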
+### Citation
+```bibtex
+@article{chen2024octopusv2,
+  title   = {Octopus v2: On-device language model for super agent},
+  author  = {Chen, Wei and Li, Zhiyuan},
+  journal = {arXiv preprint arXiv:2404.01744},
+  year    = {2024},
+  url     = {https://arxiv.org/abs/2404.01744}
+}
+```
 ## Smoke-test results
+10-prompt Ollama smoke test against the registered model (built-in
 `function_gemma/training/smoke_test_ollama.py`):
+| Smoke pass-rate |
+|-----------------|
+| **8 / 10 (80 %)** |
+The model handles the simple control prompts cleanly (`turn on the
+lights`, `blink red 3 times`, `play a beep`, `take a picture`, `good
+morning` → respond). Known weak prompts at 367-example scale: `set led
+red brightness 50` (hallucinated `acceptor(...)` — likely Q4_K_M
+quantization artifact on `<tool_2>`) and `set alarm 5 minutes`
+(misroutes). Plan: paraphrase-expand the dataset to 2–3k examples for the
+next checkpoint.
 ## Latency
 Measured on the [demo](https://github.com/BrinqAI/coral-functiongemma-demo)
 with `inference/backend.py` against a local Ollama:
+- **~1.1 – 1.3 s** per call on a laptop CPU.
+- Target on SL2619 (2× Cortex-A55 @ 2 GHz): **0.5 – 1.2 s** with the CPU
+  governor pinned to `performance`. On-device measurement pending.
 ## Files
 ```
+functiongemma-physical-ai-Q4_K_M.gguf   # 253 MB
+Modelfile                                # Ollama Modelfile (function-token format)
+README.md                                # this file
 ```
 ## License
 - Demo source: <https://github.com/BrinqAI/coral-functiongemma-demo>
 - Base model: <https://huggingface.co/google/functiongemma-270m-it>
+- Octopus v2 paper: <https://arxiv.org/abs/2404.01744>
 - Methodology reference (SmartPanel v14): internal — see demo README for the published recipe.