hmahadik committed on
Commit e515857 · verified · 1 Parent(s): bfcf399

Rewrite README: rename, single function-token format, Octopus v2 citation

Files changed (1)
  1. README.md +98 -73
README.md CHANGED
@@ -8,87 +8,102 @@ tags:
8
  - function-calling
9
  - edge
10
  - on-device
 
 
 
11
  - synaptics-sl2619
12
- - coral
13
  - gemma3
14
  pipeline_tag: text-generation
15
  inference: false
16
  ---
17
 
18
- # Coral FunctionGemma 270M
19
 
20
  Fine-tuned [`google/functiongemma-270m-it`](https://huggingface.co/google/functiongemma-270m-it)
21
- for the Coral physical-AI demo (Synaptics SL2619 edge board, Google IO 2026).
22
- Two output formats are provided:
 
 
 
23
 
24
- | File | Format | Sample output | Use when |
25
- |------|--------|----------------|----------|
26
- | `coral-functiongemma-v4c-compact-Q4_K_M.gguf` | **compact** | `<tool_3>(3,"red")<end>` | Default. ~8-15 output tokens per call → sub-second decode on a 2-core A55. |
27
- | `coral-functiongemma-v4c-native-Q4_K_M.gguf` | **native** | `<start_function_call>call:blink_lights{count:<escape>3<escape>,color:<escape>red<escape>}<end_function_call>` | Drop-in for the existing Synaptics agentic runtime parser. ~30-80 output tokens. |
28
 
29
- The 13-tool schema covers LED control, Neopixel patterns, buzzer, alarms,
30
- system status, photo capture, scene description, and a `respond` fallback
31
- for chat / out-of-scope prompts. Full schema lives in the demo repo
32
- ([function_gemma/schema/tools.json](https://github.com/BrinqAI/coral-functiongemma-demo/blob/main/function_gemma/schema/tools.json)).
33
 
34
- ## Quick start (Ollama)
35
-
36
- Two install paths. **Pick the second** unless you know your client sets the
37
- stop tokens itself: `ollama pull hf.co/...` ignores the shipped Modelfile,
38
- so the compact format will run past `<end>` until it hits `num_predict`.
39
-
40
- ### Option A - direct HF pull (defaults only)
41
 
42
- ```bash
43
- ollama pull hf.co/BrinqAI/coral-functiongemma-270m:compact-Q4_K_M
44
- ollama pull hf.co/BrinqAI/coral-functiongemma-270m:native-Q4_K_M
45
- ```
46
 
47
- Stop tokens (`<end>`, `<end_of_turn>`, `<eos>`) and runtime params
48
- (`temperature=0`, `num_ctx=1024`, `num_predict=80`) are **not** applied;
49
- Ollama generates a default Modelfile from the GGUF. Use only if your client
50
- injects stop tokens at request time (the demo `inference/backend.py` does
51
- this via `options.stop`).
52
 
53
- ### Option B - local `ollama create` (recommended)
54
 
55
  ```bash
56
- # Download GGUF + Modelfile into the same dir
57
- # (huggingface_hub >= 1.0 ships the `hf` CLI; older installs use `huggingface-cli`)
58
- hf download BrinqAI/coral-functiongemma-270m \
59
- coral-functiongemma-v4c-compact-Q4_K_M.gguf Modelfile.compact \
60
- --local-dir ./coral-fg
61
-
62
- cd coral-fg
63
- ollama create coral-functiongemma:compact -f Modelfile.compact
64
- ollama run coral-functiongemma:compact
65
  ```
66
 
67
- Same flow for native: swap `compact` → `native` in both filenames and tag.
68
- This path bakes the stop tokens and decode params into the registered model.
 
 
 
 
 
 
69
 
70
  The model expects prompts built via the FunctionGemma chat template
71
  (developer role + user role, tools list passed via
72
  `tokenizer.apply_chat_template(..., tools=tools)`). Send to Ollama with
73
  `raw=true` so it forwards the prompt verbatim. Plain `ollama run` from the
74
- CLI does **not** pass tools and will degenerate to chat-style refusals; see
 
75
  [demo `inference/backend.py`](https://github.com/BrinqAI/coral-functiongemma-demo/blob/main/function_gemma/inference/backend.py)
76
  for the canonical client code.
77
 
 
 
 
 
 
78
  ## Training data
79
 
80
- - **Source**: `function_gemma/data/coral_v4_{compact,native}.jsonl` in the demo repo.
81
  - **Size**: 367 train / 100 eval examples.
82
  - **Mix**: paraphrase expansion + multi-tool sequences + `respond()`
83
- fallbacks for ambiguous / out-of-scope prompts (so the model has a clean
84
- exit when no tool fits, rather than hallucinating one).
85
- - **Buzzer schema**: pattern-only (binary GPIO on the Coral HAT, no PWM).
86
- Old `frequency_hz` / `duration_seconds` prompts are routed through
87
- `respond()` as out-of-scope negatives.
88
 
89
  ## Methodology
90
 
91
- Direct adaptation of the SmartPanel v14 trainer:
 
 
 
 
 
 
 
 
 
92
 
93
  - **Full bf16 fine-tune (no LoRA)**.
94
  - **Mean-init** for new `<tool_0>..<tool_12>` and `<end>` special tokens
@@ -98,9 +113,9 @@ Direct adaptation of the SmartPanel v14 trainer:
98
  `<start_of_turn>model\n`. TRL 0.25's `completion_only_loss=True` is a
99
  no-op on flat-text data and FunctionGemma's chat template lacks
100
  `{% generation %}` markers required for `assistant_only_loss`.
101
- - **15 epochs**, lr `3e-5`, cosine schedule, 0.1 warmup. (Coral has 367
102
- examples vs SmartPanel v14's 21k; the higher epoch count compensates
103
- for the smaller dataset.)
104
  - **Effective batch 16** = `per_device_train_batch_size=2 ×
105
  gradient_accumulation_steps=8` (kept this way to avoid the 8 GiB
106
  cross-entropy logit allocation OOM that bites Gemma3's 262k vocab).
@@ -111,44 +126,53 @@ Direct adaptation of the SmartPanel v14 trainer:
111
 
112
  The trainer source lives at
113
  [`function_gemma/training/train_coral_v4c.py`](https://github.com/BrinqAI/coral-functiongemma-demo/blob/main/function_gemma/training/train_coral_v4c.py)
114
- in the demo repo.
 
 
 
 
 
 
 
 
 
 
 
 
 
115
 
116
  ## Smoke-test results
117
 
118
- 10-prompt Ollama smoke against the registered models (built-in
119
  `function_gemma/training/smoke_test_ollama.py`):
120
 
121
- | Model | Smoke pass-rate |
122
- |----------|-----------------|
123
- | compact | **8 / 10 (80 %)** |
124
- | native | **7 / 10 (70 %)** |
125
 
126
- Both models hit the simple control prompts cleanly
127
- (`turn on the lights`, `blink red 3 times`, `play a beep`, `take a picture`,
128
- `good morning` → respond). Known weak prompts at 367-example scale:
129
- `set led red brightness 50` (compact emits a hallucinated `acceptor(...)`,
130
- likely Q4_K_M quantization artifact on `<tool_2>`), `set alarm 5 minutes`
131
- (compact misroutes), `what is cpu temp` (native hallucinates
132
- `get_cpu_temp` instead of `get_system_status`). Plan: paraphrase-expand
133
- the dataset to 2-3k examples for the next checkpoint.
134
 
135
  ## Latency
136
 
137
  Measured on the [demo](https://github.com/BrinqAI/coral-functiongemma-demo)
138
  with `inference/backend.py` against a local Ollama:
139
 
140
- - Compact format: **~1.1 - 1.3 s** per call on a laptop CPU; target on
141
- SL2619 (2× Cortex-A55 @ 2 GHz) is **0.5 - 1.2 s** with the CPU governor
142
- pinned to `performance`.
143
- - Native format: 2 - 5× slower decode (more output tokens).
144
 
145
  ## Files
146
 
147
  ```
148
- coral-functiongemma-v4c-compact-Q4_K_M.gguf # 253 MB, primary
149
- coral-functiongemma-v4c-native-Q4_K_M.gguf # 253 MB, fallback / Synaptics-runtime parity
150
- Modelfile.compact # Ollama Modelfile (compact)
151
- Modelfile.native # Ollama Modelfile (native)
152
  ```
153
 
154
  ## License
@@ -161,4 +185,5 @@ By using this model you agree to those terms. Base model:
161
 
162
  - Demo source: <https://github.com/BrinqAI/coral-functiongemma-demo>
163
  - Base model: <https://huggingface.co/google/functiongemma-270m-it>
 
164
  - Methodology reference (SmartPanel v14): internal; see demo README for the published recipe.
 
8
  - function-calling
9
  - edge
10
  - on-device
11
+ - physical-ai
12
+ - iot
13
+ - octopus-v2
14
  - synaptics-sl2619
 
15
  - gemma3
16
  pipeline_tag: text-generation
17
  inference: false
18
  ---
19
 
20
+ # FunctionGemma 270M - Physical AI
21
 
22
  Fine-tuned [`google/functiongemma-270m-it`](https://huggingface.co/google/functiongemma-270m-it)
23
+ for voice-controlled physical-AI / household-IoT actions. 13 callable tools
24
+ (lights, neopixel patterns, buzzer, alarms, camera, scene description, system
25
+ status, plus a `respond` natural-language fallback for ambiguous or
26
+ out-of-scope prompts). Reference deployment: Synaptics SL2619 "Coral" edge
27
+ board, Google IO 2026 demo.
28
 
29
+ The full 13-tool schema lives in the demo repo at
30
+ [`function_gemma/schema/tools.json`](https://github.com/BrinqAI/coral-functiongemma-demo/blob/main/function_gemma/schema/tools.json).
 
 
31
 
32
+ ## Output format: function tokens
 
 
 
33
 
34
+ This model emits tool calls as **function tokens**: each tool name is
35
+ compiled to a single special-vocabulary token (`<tool_0>` … `<tool_12>`)
36
+ and a single `<end>` terminator. A complete call decodes in roughly 8–15
37
+ output tokens, vs ~30–80 for native FunctionGemma's
38
+ `<start_function_call>call:NAME{...}<end_function_call>` syntax. On a
39
+ 2-core Cortex-A55 this is the difference between sub-second and 2–5 s
40
+ voice-UX latency.
41
 
42
+ | File | Sample output | Output tokens |
43
+ |------|---------------|---------------|
44
+ | `functiongemma-physical-ai-Q4_K_M.gguf` | `<tool_3>(3,"red")<end>` | ~8–15 |
 
45
 
46
+ The token-to-tool mapping (`<tool_0>` → `turn_on_lights`, …, `<tool_12>` →
47
+ `respond`) is in
48
+ [`function_gemma/schema/token_map.json`](https://github.com/BrinqAI/coral-functiongemma-demo/blob/main/function_gemma/schema/token_map.json).
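
As an illustration of consuming this format on the client side, here is a minimal parsing sketch. It is not the demo's parser. The `TOKEN_MAP` subset below is hypothetical: only `<tool_0>` and `<tool_12>` are named in this README, and `<tool_3>` → `blink_lights` is an assumption inferred from the sample output. The sketch also assumes arguments are positional JSON literals, and treats a trailing `<end>` as optional in case the runtime strips the stop sequence from the returned text.

```python
import json
import re

# Hypothetical subset of token_map.json; load the real file in practice.
TOKEN_MAP = {
    "<tool_0>": "turn_on_lights",
    "<tool_3>": "blink_lights",  # assumed from the sample output, not confirmed
    "<tool_12>": "respond",
}

# One call looks like <tool_N>(args)<end>; the terminator may be absent at
# the very end of the text if it was consumed as a stop sequence.
_CALL_RE = re.compile(r"(<tool_\d+>)\((.*?)\)(?:<end>|\s*$)", re.DOTALL)


def parse_calls(text, token_map=TOKEN_MAP):
    """Return a list of (tool_name, positional_args) tuples found in text."""
    calls = []
    for token, raw_args in _CALL_RE.findall(text):
        if token not in token_map:
            raise KeyError(f"no tool registered for {token}")
        # Assumes arguments are comma-separated JSON literals.
        args = json.loads(f"[{raw_args}]") if raw_args.strip() else []
        calls.append((token_map[token], args))
    return calls


print(parse_calls('<tool_3>(3,"red")<end>'))
# [('blink_lights', [3, 'red'])]
```

Raising on an unknown token, rather than skipping it, keeps a misrouted call from being silently dropped before it reaches the hardware layer.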
 
 
49
 
50
+ ## Quick start (Ollama)
51
 
52
  ```bash
53
+ hf download BrinqAI/functiongemma-270m-physical-ai \
54
+ functiongemma-physical-ai-Q4_K_M.gguf Modelfile \
55
+ --local-dir ./fg-physical-ai
56
+
57
+ cd fg-physical-ai
58
+ ollama create functiongemma-physical-ai -f Modelfile
59
+ ollama run functiongemma-physical-ai
 
 
60
  ```
61
 
62
+ `ollama create -f Modelfile` is the documented install path because the
63
+ shipped `Modelfile` bakes in the stop tokens (`<end>`, `<end_of_turn>`,
64
+ `<eos>`) and decode parameters (`temperature=0`, `num_ctx=1024`,
65
+ `num_predict=80`). Direct `ollama pull hf.co/...` does not apply these,
66
+ and the function-token output will run past `<end>` until it hits
67
+ `num_predict`. Only use the direct-pull path if your client injects stops
68
+ at request time (the demo `inference/backend.py` does this via
69
+ `options.stop`).
70
 
71
  The model expects prompts built via the FunctionGemma chat template
72
  (developer role + user role, tools list passed via
73
  `tokenizer.apply_chat_template(..., tools=tools)`). Send to Ollama with
74
  `raw=true` so it forwards the prompt verbatim. Plain `ollama run` from the
75
+ CLI does **not** pass tools and will degenerate to chat-style refusals;
76
+ see
77
  [demo `inference/backend.py`](https://github.com/BrinqAI/coral-functiongemma-demo/blob/main/function_gemma/inference/backend.py)
78
  for the canonical client code.
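
For orientation, a raw-mode request with request-time stops might look like the sketch below. It is not the demo's `backend.py`. It assumes Ollama's default local endpoint and its `/api/generate` route, and that `prompt` is already the fully rendered chat-template string (developer turn, tools list, and user turn). The helper names `build_request` and `generate` are illustrative.

```python
import json
import urllib.request

# Assumed default local Ollama endpoint.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="functiongemma-physical-ai"):
    """Build a raw-mode generate payload with stops and decode params."""
    return {
        "model": model,
        "prompt": prompt,   # must already be rendered by the chat template
        "raw": True,        # forward the prompt verbatim, no server templating
        "stream": False,
        "options": {
            "stop": ["<end>", "<end_of_turn>", "<eos>"],
            "temperature": 0,
            "num_ctx": 1024,
            "num_predict": 80,
        },
    }


def generate(prompt, url=OLLAMA_URL):
    """POST the payload and return the generated text."""
    body = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())["response"]
```

Passing the stops in `options` mirrors what the shipped `Modelfile` bakes in, so the same client also works against a model installed by direct pull.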
79
 
80
+ > **Note on demo Ollama tag**: the demo backend currently defaults to
81
+ > `OLLAMA_MODEL=functiongemma-coral:latest` (legacy name). Set
82
+ > `OLLAMA_MODEL=functiongemma-physical-ai` explicitly until the demo repo
83
+ > ships an updated default.
84
+
85
  ## Training data
86
 
 
87
  - **Size**: 367 train / 100 eval examples.
88
  - **Mix**: paraphrase expansion + multi-tool sequences + `respond()`
89
+ fallbacks for ambiguous / out-of-scope prompts (so the model has a
90
+ clean exit when no tool fits, rather than hallucinating one).
91
+ - **Buzzer schema**: pattern-only (binary GPIO on the reference HAT, no
92
+ PWM). Old `frequency_hz` / `duration_seconds` prompts are routed
93
+ through `respond()` as out-of-scope negatives.
94
 
95
  ## Methodology
96
 
97
+ This model uses the **functional-token** approach introduced by Octopus v2
98
+ (Chen and Li, 2024): special vocabulary tokens are added for each callable
99
+ function so a tool call decodes in a single output token rather than a
100
+ multi-token JSON string. On-device this collapses ~30–80-token native
101
+ FunctionGemma calls down to ~8–15 tokens, enabling sub-second decode on a
102
+ 2-core Cortex-A55.
103
+
104
+ The training recipe is a direct port of Brinq's SmartPanel v14 trainer
105
+ (full bf16, mean-init for new tokens, completion-only loss mask), adapted
106
+ for a smaller dataset:
107
 
108
  - **Full bf16 fine-tune (no LoRA)**.
109
  - **Mean-init** for new `<tool_0>..<tool_12>` and `<end>` special tokens
 
113
  `<start_of_turn>model\n`. TRL 0.25's `completion_only_loss=True` is a
114
  no-op on flat-text data and FunctionGemma's chat template lacks
115
  `{% generation %}` markers required for `assistant_only_loss`.
116
+ - **15 epochs**, lr `3e-5`, cosine schedule, 0.1 warmup. (367 examples here
117
+ vs SmartPanel v14's ~21k; the higher epoch count compensates for the
118
+ smaller dataset.)
119
  - **Effective batch 16** = `per_device_train_batch_size=2 ×
120
  gradient_accumulation_steps=8` (kept this way to avoid the 8 GiB
121
  cross-entropy logit allocation OOM that bites Gemma3's 262k vocab).
 
126
 
127
  The trainer source lives at
128
  [`function_gemma/training/train_coral_v4c.py`](https://github.com/BrinqAI/coral-functiongemma-demo/blob/main/function_gemma/training/train_coral_v4c.py)
129
+ in the demo repo. (Filename is preserved for now and may be renamed in a
130
+ follow-up cleanup pass.)
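
The logit-memory concern behind the effective-batch choice above can be checked with back-of-the-envelope arithmetic. The sketch below is illustrative only. The sequence length of 512 and float32 logits are assumptions (neither is stated here), and the exact vocabulary size of 262,144 is assumed from the "262k" figure. They are simply one combination that reproduces the 8 GiB number.

```python
VOCAB_SIZE = 262_144  # assumed exact value behind "262k"
BYTES_FP32 = 4        # assumes logits are upcast to float32 for the loss
GIB = 1024 ** 3


def logit_bytes(batch_size, seq_len, vocab_size=VOCAB_SIZE,
                bytes_per_value=BYTES_FP32):
    """Size of one [batch, seq, vocab] logits tensor in bytes."""
    return batch_size * seq_len * vocab_size * bytes_per_value


SEQ_LEN = 512  # hypothetical; the trainer's actual max length is not stated

print(logit_bytes(16, SEQ_LEN) / GIB)  # 8.0 -> a single batch of 16
print(logit_bytes(2, SEQ_LEN) / GIB)   # 1.0 -> micro-batch of 2
```

Under these assumptions, accumulating 8 micro-batches of 2 keeps the peak logits tensor at about one eighth of what a single batch of 16 would allocate.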
131
+
132
+ ### Citation
133
+
134
+ ```bibtex
135
+ @article{chen2024octopusv2,
136
+ title = {Octopus v2: On-device language model for super agent},
137
+ author = {Chen, Wei and Li, Zhiyuan},
138
+ journal = {arXiv preprint arXiv:2404.01744},
139
+ year = {2024},
140
+ url = {https://arxiv.org/abs/2404.01744}
141
+ }
142
+ ```
143
 
144
  ## Smoke-test results
145
 
146
+ 10-prompt Ollama smoke against the registered model (built-in
147
  `function_gemma/training/smoke_test_ollama.py`):
148
 
149
+ | Smoke pass-rate |
150
+ |-----------------|
151
+ | **8 / 10 (80 %)** |
 
152
 
153
+ The model handles the simple control prompts cleanly (`turn on the
154
+ lights`, `blink red 3 times`, `play a beep`, `take a picture`, `good
155
+ morning` → respond). Known weak prompts at 367-example scale: `set led
156
+ red brightness 50` (hallucinated `acceptor(...)`, likely Q4_K_M
157
+ quantization artifact on `<tool_2>`) and `set alarm 5 minutes`
158
+ (misroutes). Plan: paraphrase-expand the dataset to 2–3k examples for the
159
+ next checkpoint.
 
160
 
161
  ## Latency
162
 
163
  Measured on the [demo](https://github.com/BrinqAI/coral-functiongemma-demo)
164
  with `inference/backend.py` against a local Ollama:
165
 
166
+ - **~1.1–1.3 s** per call on a laptop CPU.
167
+ - Target on SL2619 (2× Cortex-A55 @ 2 GHz): **0.5–1.2 s** with the CPU
168
+ governor pinned to `performance`. On-device measurement pending.
 
169
 
170
  ## Files
171
 
172
  ```
173
+ functiongemma-physical-ai-Q4_K_M.gguf # 253 MB
174
+ Modelfile # Ollama Modelfile (function-token format)
175
+ README.md # this file
 
176
  ```
177
 
178
  ## License
 
185
 
186
  - Demo source: <https://github.com/BrinqAI/coral-functiongemma-demo>
187
  - Base model: <https://huggingface.co/google/functiongemma-270m-it>
188
+ - Octopus v2 paper: <https://arxiv.org/abs/2404.01744>
189
  - Methodology reference (SmartPanel v14): internal; see demo README for the published recipe.