Upload README.md with huggingface_hub
README.md
CHANGED
@@ -13,9 +13,9 @@ tags:
 
 # Needle
 
-
-
-
+We distilled Gemini 3.1 into a 26m parameter "[Simple Attention Network](docs/simple_attention_networks.md)" that you can even finetune locally on your Mac/PC.
+In production, Needle runs on [Cactus](https://github.com/cactus-compute/cactus) at 6000 toks/sec prefill and 1200 decode speed.
+Weights are fully open on [Cactus-Compute/needle](https://huggingface.co/Cactus-Compute/needle), as well as the dataset generation.
 
 | | |
 |---|---|
@@ -76,10 +76,6 @@ d=512, 8H/4KV, BPE=8192
 βββββββββββββ
 ```
 
-No feedforward layers. Each encoder block is gated self-attention; each decoder block is gated self-attention + gated cross-attention. The only nonlinearities are softmax and sigmoid.
-
-See [Simple Attention Networks](https://github.com/cactus-compute/needle/blob/main/docs/simple_attention_networks.md) for the full architectural breakdown.
-
 ## Quickstart
 
 ```bash
@@ -119,8 +115,8 @@ Finetune on your own tools via the web UI or CLI:
 # Web UI (generates data via Gemini, trains, evaluates, bundles result)
 needle ui
 
-# CLI
-python -m src.training.finetune data.jsonl
+# CLI (auto-downloads weights if not local)
+python -m src.training.finetune data.jsonl
 ```
 
 ## Links
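
Note on the architecture paragraph removed in the second hunk: it says each block is gated self-attention and that the only nonlinearities are softmax and sigmoid. The diff does not show how the gate is computed, so the following is only a minimal sketch under stated assumptions: a single head, no masking, no normalization or residual, and an elementwise sigmoid gate projected from the block input. The function and weight names (`gated_self_attention`, `wq`, `wk`, `wv`, `wg`) are hypothetical and are not taken from the Needle code.

```python
import math


def softmax(row):
    # Numerically stable softmax over one list of scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matmul(a, b):
    # (n x k) @ (k x m) -> (n x m) on plain nested lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gated_self_attention(x, wq, wk, wv, wg):
    """Assumed form: y = sigmoid(x Wg) * (softmax(q k^T / sqrt(d)) v)."""
    d = len(wq[0])
    q, k, v, g = matmul(x, wq), matmul(x, wk), matmul(x, wv), matmul(x, wg)
    k_t = [list(col) for col in zip(*k)]          # transpose of k
    scores = matmul(q, k_t)                       # token-to-token scores
    attn = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    mixed = matmul(attn, v)                       # attention-weighted values
    # Elementwise sigmoid gate; softmax and sigmoid are the only nonlinearities.
    return [[sigmoid(gi) * mi for gi, mi in zip(g_row, m_row)]
            for g_row, m_row in zip(g, mixed)]


identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
tokens = [[1.0, 0.0], [0.0, 1.0]]
# A zero gate projection gives sigmoid(0) = 0.5 for every gate value.
y = gated_self_attention(tokens, identity, identity, identity, zeros)
```

With a zero gate projection every gate value is 0.5, so each output row is half of the attention-weighted value row. The linked Simple Attention Networks document remains the authoritative description of the real blocks.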