---
title: Needle Playground
emoji: 🌵
colorFrom: green
colorTo: gray
sdk: static
pinned: false
license: mit
short_description: Cactus Needle (26M function-calling) in the browser
models:
- Cactus-Compute/needle
- onnx-community/needle-onnx
---
Needle Playground (Browser)
A 26M-parameter function-calling model running entirely in your browser: no server, no GPU, just onnxruntime-web on WASM. Tap a preset chip and watch the JSON appear.
Credits
- Model: Cactus-Compute/needle, the original Simple Attention Network, trained by Cactus Compute on 200B tokens of pre-training and 2B tokens of function-call post-training.
- ONNX export: onnx-community/needle-onnx, browser-ready ONNX artifacts produced by a JAX→PyTorch port plus `torch.onnx.export`, with byte-identical output parity against the upstream `needle.generate()`.
- Browser app: this Space, built with Vite + TypeScript, `onnxruntime-web` (WASM backend), and `sentencepiece-js` for tokenization. It mirrors the layout of the official `needle playground` CLI.
How it works
- Page load: fetch `encoder.onnx`, `decoder_step.onnx`, and `needle.model` from onnx-community/needle-onnx (~140 MB; cached by the browser after the first visit).
- Click a preset, or type your own query and tool definitions.
- On generate: the encoder runs once over `[query, <tools>, JSON.stringify(tools)]`, then the decoder runs step by step with a KV cache, seeded with `<eos>`, until the model emits `<eos>` again or 256 tokens are reached.
- The output is decoded with SentencePiece, stripped of the leading `<tool_call>` marker, parsed as JSON, and pretty-printed.
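The last step of the list above (strip the marker, parse, pretty-print) can be sketched as a small pure function. This is a minimal sketch, assuming the decoded SentencePiece text begins with a literal `<tool_call>` marker followed by the JSON payload; the name `parseToolCall` is hypothetical and not taken from this Space's source.

```typescript
// Hypothetical helper (not from this Space's source): turn decoded model text
// into pretty-printed JSON. Assumes the text starts with a literal "<tool_call>"
// marker, possibly preceded by whitespace.
const TOOL_CALL_MARKER = "<tool_call>";

function parseToolCall(decoded: string): string {
  let text = decoded.trimStart();
  // Strip the leading marker if it is present.
  if (text.startsWith(TOOL_CALL_MARKER)) {
    text = text.slice(TOOL_CALL_MARKER.length);
  }
  // JSON.parse throws a SyntaxError on malformed output;
  // the caller decides how to surface that in the UI.
  const parsed: unknown = JSON.parse(text.trim());
  // Pretty-print with two-space indentation.
  return JSON.stringify(parsed, null, 2);
}
```

For example, with an illustrative (hypothetical) payload, `parseToolCall('<tool_call>{"a":1}')` returns `'{\n  "a": 1\n}'`. Letting the parse error propagate keeps malformed generations visible instead of silently showing an empty result.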
Greedy argmax sampling with an EOS-only stop condition, matching Cactus's native `generate()` configuration exactly.
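The decode loop described above (seed with `<eos>`, take the argmax at each step, stop on `<eos>` or after 256 tokens) can be sketched independently of onnxruntime-web. This is a minimal sketch under stated assumptions: `DecoderStep` is a hypothetical callback standing in for one `decoder_step.onnx` run that takes the previous token and the KV cache and returns logits plus the updated cache; the real model's input/output names and cache layout are not shown here.

```typescript
// Hypothetical stand-in for one run of decoder_step.onnx: given the previous
// token id and the current KV cache, return next-token logits and the new cache.
type DecoderStep<Cache> = (
  prevToken: number,
  cache: Cache,
) => { logits: Float32Array; cache: Cache };

// Greedy sampling: index of the largest logit (the first one wins on ties).
function argmax(logits: Float32Array): number {
  let best = 0;
  for (let i = 1; i < logits.length; i++) {
    if (logits[i] > logits[best]) best = i;
  }
  return best;
}

// Greedy decoding with an EOS-only stop and a hard cap on decoder steps.
function greedyDecode<Cache>(
  step: DecoderStep<Cache>,
  initialCache: Cache,
  eosId: number,
  maxTokens = 256,
): number[] {
  const output: number[] = [];
  let cache = initialCache;
  let token = eosId; // seed the decoder with <eos>
  for (let i = 0; i < maxTokens; i++) {
    const result = step(token, cache);
    cache = result.cache; // carry the KV cache forward to the next step
    token = argmax(result.logits);
    if (token === eosId) break; // stop as soon as <eos> is emitted again
    output.push(token);
  }
  return output;
}
```

Keeping the model call behind a callback makes the stopping rule testable without downloading the ~140 MB of weights: a toy step whose cache is just a position counter, scripted to emit tokens 5, 7, 9 and then `<eos>` (id 0 in this toy), yields `[5, 7, 9]`. In the browser, the cache would instead hold the tensors returned by the previous `decoder_step.onnx` run.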
Source
The full source is available in this Space (browse the files above) and matches the code in the upstream project repo. Build with `npm install && npm run build`; start the dev server with `npm run dev`.
License
MIT (matching the upstream Cactus Needle license).