metadata
title: Needle Playground
emoji: 🌵
colorFrom: green
colorTo: gray
sdk: static
pinned: false
license: mit
short_description: Cactus Needle (26M function-calling) in the browser
models:
  - Cactus-Compute/needle
  - onnx-community/needle-onnx

Needle Playground (Browser)

A 26M-parameter function-calling model running entirely in your browser: no server, no GPU, just onnxruntime-web + WASM. Tap a preset chip and watch the JSON appear.

Credits

  • Model: Cactus-Compute/needle — the original Simple Attention Network, trained by Cactus Compute on 200B tokens of pre-training and 2B tokens of function-call post-training.
  • ONNX export: onnx-community/needle-onnx — browser-ready ONNX artifacts produced by a JAX→PyTorch port + torch.onnx.export, with byte-identical output parity against the upstream needle.generate().
  • Browser app: This Space — Vite + TypeScript, onnxruntime-web (WASM backend), sentencepiece-js for tokenization. Mirrors the layout of the official needle playground CLI.

How it works

  1. Page load: fetch encoder.onnx, decoder_step.onnx, needle.model from onnx-community/needle-onnx (~140 MB; cached by the browser after first visit).
  2. Click a preset, or type your own query + tool definitions.
  3. On generate: encoder runs once over [query, <tools>, JSON.stringify(tools)], then the decoder runs step-by-step with KV-cache, seeded with <eos>, until the model emits <eos> again or 256 tokens are reached.
  4. Output is decoded with SentencePiece, stripped of the leading <tool_call> marker, parsed as JSON, and pretty-printed.
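
The post-processing in step 4 can be sketched as follows. This is a minimal illustration, not the Space's actual code: the helper name `formatToolCall`, the literal marker string, and the fallback of returning the raw text when parsing fails are all assumptions made for the example.

```typescript
// Hypothetical helper illustrating step 4; names and the fallback
// behaviour are assumptions, not taken from the Space's source.
const TOOL_CALL_MARKER = "<tool_call>";

function formatToolCall(decoded: string): string {
  let text = decoded.trim();

  // Strip the leading marker if the model emitted one.
  if (text.startsWith(TOOL_CALL_MARKER)) {
    text = text.slice(TOOL_CALL_MARKER.length).trim();
  }

  try {
    // Round-trip through JSON.parse to validate, then pretty-print
    // with two-space indentation.
    return JSON.stringify(JSON.parse(text), null, 2);
  } catch {
    // Assumed fallback: show the raw text when it is not valid JSON.
    return text;
  }
}
```

Calling `formatToolCall('<tool_call>{"name":"get_weather"}')` would return the object pretty-printed across several lines; the `name` field here is only an illustrative payload, since the README does not specify the output schema.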

Decoding is greedy (argmax) with an EOS-only stop, which matches Cactus's native generate() configuration exactly.
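
A greedy decode loop of this kind might look like the sketch below. It abstracts the decoder behind a hypothetical `StepFn` callback (one token in, next-token logits out, with the KV-cache assumed to live inside the callback) so the loop stays independent of any particular onnxruntime-web wiring. The names `StepFn`, `argmax`, and `greedyDecode` are illustrative, and whether the 256-token cap counts the seed token is an assumption.

```typescript
// Hypothetical decoder step: feeds one token, returns logits over the
// vocabulary. The KV-cache is assumed to be managed inside this callback.
type StepFn = (token: number) => Promise<Float32Array>;

// Index of the largest logit (ties resolve to the lowest index).
function argmax(logits: Float32Array): number {
  let best = 0;
  for (let i = 1; i < logits.length; i++) {
    if (logits[i] > logits[best]) best = i;
  }
  return best;
}

async function greedyDecode(
  step: StepFn,
  eosId: number,
  maxTokens = 256,
): Promise<number[]> {
  const output: number[] = [];
  let token = eosId; // the decoder is seeded with <eos>

  for (let i = 0; i < maxTokens; i++) {
    const logits = await step(token);
    token = argmax(logits); // greedy: always take the most likely token
    if (token === eosId) break; // EOS is the only stop condition
    output.push(token);
  }
  return output;
}
```

The returned ids exclude both the seed and the terminating `<eos>`, so they can be handed straight to the tokenizer for decoding.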

Source

Full source is available in this Space (browse the files above); it is the same code as the upstream project repo. Build with npm install && npm run build; start a dev server with npm run dev.

License

MIT (matching the upstream Cactus Needle license).