Update README.md
README.md
CHANGED
|
@@ -1,13 +1,5 @@
|
|
| 1 |
-
---
|
| 2 |
-
title: Cactus-Compute
|
| 3 |
-
sdk: static
|
| 4 |
-
pinned: true
|
| 5 |
-
---
|
| 6 |
-
|
| 7 |
# Cactus
|
| 8 |
|
| 9 |
-
<img src="assets/banner.jpg" alt="Logo" style="border-radius: 30px; width: 100%;">
|
| 10 |
-
|
| 11 |
[![Docs][docs-shield]][docs-url]
|
| 12 |
[![Website][website-shield]][website-url]
|
| 13 |
[![GitHub][github-shield]][github-url]
|
|
@@ -15,7 +7,13 @@ pinned: true
|
|
| 15 |
[![Reddit][reddit-shield]][reddit-url]
|
| 16 |
[![Blog][blog-shield]][blog-url]
|
| 17 |
|
| 18 |
-
A
|
| 19 |
|
| 20 |
```
|
| 21 |
┌─────────────────┐
|
|
@@ -31,7 +29,7 @@ A hybrid low-latency energy-efficient AI engine for mobile devices & wearables.
|
|
| 31 |
└─────────────────┘ Custom attention, KV-cache quant, chunked prefill
|
| 32 |
```
|
| 33 |
|
| 34 |
-
## Quick Demo
|
| 35 |
|
| 36 |
- Step 1: `brew install cactus-compute/cactus/cactus`
|
| 37 |
- Step 2: `cactus transcribe` or `cactus run`
|
|
@@ -39,11 +37,12 @@ A hybrid low-latency energy-efficient AI engine for mobile devices & wearables.
|
|
| 39 |
## Cactus Engine
|
| 40 |
|
| 41 |
```cpp
|
| 42 |
-
#include cactus.h
|
| 43 |
|
| 44 |
cactus_model_t model = cactus_init(
|
| 45 |
"path/to/weight/folder",
|
| 46 |
"path to txt or dir of txts for auto-rag",
|
|
|
|
| 47 |
);
|
| 48 |
|
| 49 |
const char* messages = R"([
|
|
@@ -91,7 +90,7 @@ Example response from Gemma3-270m
|
|
| 91 |
## Cactus Graph
|
| 92 |
|
| 93 |
```cpp
|
| 94 |
-
#include cactus.h
|
| 95 |
|
| 96 |
CactusGraph graph;
|
| 97 |
auto a = graph.input({2, 3}, Precision::FP16);
|
|
@@ -117,8 +116,8 @@ graph.hard_reset();
|
|
| 117 |
|
| 118 |
| Reference | Language | Description |
|
| 119 |
|-----------|----------|-------------|
|
| 120 |
-
| [Engine API](cactus_engine.md) | C | Chat completion, streaming, tool calling, transcription, embeddings, RAG, vision, VAD, vector index, cloud handoff |
|
| 121 |
-
| [Graph API](cactus_graph.md) | C++ | Tensor operations, matrix multiplication, attention, normalization, activation functions |
|
| 122 |
| [Python SDK](/python/) | Python | Mac, Linux |
|
| 123 |
| [Swift SDK](/apple/) | Swift | iOS, macOS, tvOS, watchOS, Android |
|
| 124 |
| [Kotlin SDK](/android/) | Kotlin | Android, iOS (via KMP) |
|
|
@@ -126,7 +125,9 @@ graph.hard_reset();
|
|
| 126 |
| [Rust SDK](/rust/) | Rust | Mac, Linux |
|
| 127 |
| [React Native](https://github.com/cactus-compute/cactus-react-native) | JavaScript | iOS, Android |
|
| 128 |
|
| 129 |
-
|
| 130 |
|
| 131 |
- All weights INT4 quantised
|
| 132 |
- LFM: 1k-prefill / 100-decode, values are prefill tps / decode tps
|
|
@@ -134,7 +135,7 @@ graph.hard_reset();
|
|
| 134 |
- Parakeet: 20s audio input, values are latency / decode tps
|
| 135 |
- Missing latency = no NPU support yet
|
| 136 |
|
| 137 |
-
| Device | LFM 1.2B | LFMVL 1.6B | Parakeet 1.1B | RAM |
|
| 138 |
|--------|----------|------------|---------------|-----|
|
| 139 |
| Mac M4 Pro | 582/100 | 0.2s/98 | 0.1s/900k+ | 76MB |
|
| 140 |
| iPad/Mac M3 | 350/60 | 0.3s/69 | 0.3s/800k+ | 70MB |
|
|
@@ -153,20 +154,20 @@ graph.hard_reset();
|
|
| 153 |
|
| 154 |
| Model | Params | End2End ms | Latency ms | Decode toks/sec | NPU | RTF | WER |
|
| 155 |
|-------|--------|------------|------------|------------|-----|-----|-----|
|
| 156 |
-
| UsefulSensors/moonshine-base | 61M | 361
|
| 157 |
-
| openai/whisper-tiny | 39M | 232
|
| 158 |
-
| openai/whisper-base | 74M | 329
|
| 159 |
-
| openai/whisper-small | 244M | 856
|
| 160 |
-
| openai/whisper-medium | 769M | 2085
|
| 161 |
-
| nvidia/parakeet-ctc-0.6b | 600M | 201
|
| 162 |
-
| nvidia/parakeet-tdt-0.6b-v3 | 600M | 718
|
| 163 |
-
| nvidia/parakeet-ctc-1.1b | 1.1B | 279
|
| 164 |
| snakers4/silero-vad | - | - | - | - | - | - | - |
|
| 165 |
|
| 166 |
## Supported LLMs
|
| 167 |
|
| 168 |
- Gemma weights are often **gated** on HuggingFace, needs tokens
|
| 169 |
-
- Run `
|
| 170 |
|
| 171 |
| Model | Features |
|
| 172 |
|-------|----------|
|
|
@@ -188,6 +189,7 @@ graph.hard_reset();
|
|
| 188 |
| LiquidAI/LFM2-2.6B | completion, tools, embed |
|
| 189 |
| LiquidAI/LFM2-VL-450M | vision, txt & img embed, Apple NPU |
|
| 190 |
| LiquidAI/LFM2.5-VL-1.6B | vision, txt & img embed, Apple NPU |
|
|
|
|
| 191 |
| nomic-ai/nomic-embed-text-v2-moe | embed |
|
| 192 |
|
| 193 |
## Roadmap
|
|
@@ -202,11 +204,8 @@ graph.hard_reset();
|
|
| 202 |
| Feb 2026 | Done | Hybrid inference, INT4, lossless Quant (1.5x) |
|
| 203 |
| Mar 2026 | Coming | Qualcomm/Google NPUs, 5-11x faster Android |
|
| 204 |
| Apr 2026 | Coming | Mediatek/Exynos NPUs, Cactus@ICLR |
|
| 205 |
-
| May 2026 | Coming |
|
| 206 |
| Jun 2026 | Coming | Torch/JAX model transpilers |
|
| 207 |
-
| Jul 2026 | Coming | Wearables optimisations, Cactus@ICML |
|
| 208 |
-
| Aug 2026 | Coming | Orchestration |
|
| 209 |
-
| Sep 2026 | Coming | Full Cactus paper, chip manufacturer partners |
|
| 210 |
|
| 211 |
## Using this repo
|
| 212 |
|
|
@@ -278,7 +277,7 @@ graph.hard_reset();
|
|
| 278 |
2. [UCLA's BruinAI](https://bruinai.org/)
|
| 279 |
3. [Char (YC S25)](https://char.com/)
|
| 280 |
4. [Yale's AI Society](https://www.yale-ai.org/team)
|
| 281 |
-
5. [National
|
| 282 |
6. [UC Irvine's AI@UCI](https://aiclub.ics.uci.edu/)
|
| 283 |
7. [Imperial College's AI Society](https://www.imperialcollegeunion.org/csp/1391)
|
| 284 |
8. [University of Pennsylvania's AI@Penn](https://ai-at-penn-main-105.vercel.app/)
|
|
|
| 1 |
# Cactus
|
| 2 |
|
| 3 |
[![Docs][docs-shield]][docs-url]
|
| 4 |
[![Website][website-shield]][website-url]
|
| 5 |
[![GitHub][github-shield]][github-url]
|
|
|
|
| 7 |
[![Reddit][reddit-shield]][reddit-url]
|
| 8 |
[![Blog][blog-shield]][blog-url]
|
| 9 |
|
| 10 |
+
A low-latency AI engine for mobile devices & wearables. Main features:
|
| 11 |
+
|
| 12 |
+
- **Fast:** fastest inference on ARM CPU
|
| 13 |
+
- **Low RAM:** zero-copy memory mapping ensures 10x lower RAM use than other engines
|
| 14 |
+
- **Multimodal:** one SDK for speech, vision, and language models
|
| 15 |
+
- **Cloud fallback:** automatically route requests to cloud models if needed
|
| 16 |
+
- **Energy-efficient:** NPU-accelerated prefill
|
| 17 |
|
| 18 |
```
|
| 19 |
┌─────────────────┐
|
|
|
|
| 29 |
└─────────────────┘ Custom attention, KV-cache quant, chunked prefill
|
| 30 |
```
|
| 31 |
|
| 32 |
+
## Quick Demo (Mac)
|
| 33 |
|
| 34 |
- Step 1: `brew install cactus-compute/cactus/cactus`
|
| 35 |
- Step 2: `cactus transcribe` or `cactus run`
|
|
|
|
| 37 |
## Cactus Engine
|
| 38 |
|
| 39 |
```cpp
|
| 40 |
+
#include "cactus.h"
|
| 41 |
|
| 42 |
cactus_model_t model = cactus_init(
|
| 43 |
"path/to/weight/folder",
|
| 44 |
"path to txt or dir of txts for auto-rag",
|
| 45 |
+
false
|
| 46 |
);
|
| 47 |
|
| 48 |
const char* messages = R"([
|
|
|
|
| 90 |
## Cactus Graph
|
| 91 |
|
| 92 |
```cpp
|
| 93 |
+
#include "cactus.h"
|
| 94 |
|
| 95 |
CactusGraph graph;
|
| 96 |
auto a = graph.input({2, 3}, Precision::FP16);
|
|
|
|
| 116 |
|
| 117 |
| Reference | Language | Description |
|
| 118 |
|-----------|----------|-------------|
|
| 119 |
+
| [Engine API](docs/cactus_engine.md) | C | Chat completion, streaming, tool calling, transcription, embeddings, RAG, vision, VAD, vector index, cloud handoff |
|
| 120 |
+
| [Graph API](docs/cactus_graph.md) | C++ | Tensor operations, matrix multiplication, attention, normalization, activation functions |
|
| 121 |
| [Python SDK](/python/) | Python | Mac, Linux |
|
| 122 |
| [Swift SDK](/apple/) | Swift | iOS, macOS, tvOS, watchOS, Android |
|
| 123 |
| [Kotlin SDK](/android/) | Kotlin | Android, iOS (via KMP) |
|
|
|
|
| 125 |
| [Rust SDK](/rust/) | Rust | Mac, Linux |
|
| 126 |
| [React Native](https://github.com/cactus-compute/cactus-react-native) | JavaScript | iOS, Android |
|
| 127 |
|
| 128 |
+
> **Model weights:** Pre-converted weights for all supported models at [huggingface.co/Cactus-Compute](https://huggingface.co/Cactus-Compute).
|
| 129 |
+
|
| 130 |
+
## Benchmarks (CPU-only, no GPU)
|
| 131 |
|
| 132 |
- All weights INT4 quantised
|
| 133 |
- LFM: 1k-prefill / 100-decode, values are prefill tps / decode tps
|
|
|
|
| 135 |
- Parakeet: 20s audio input, values are latency / decode tps
|
| 136 |
- Missing latency = no NPU support yet
|
| 137 |
|
| 138 |
+
| Device | LFM 1.2B | LFMVL 1.6B | Parakeet 1.1B | VL RAM Usage |
|
| 139 |
|--------|----------|------------|---------------|-----|
|
| 140 |
| Mac M4 Pro | 582/100 | 0.2s/98 | 0.1s/900k+ | 76MB |
|
| 141 |
| iPad/Mac M3 | 350/60 | 0.3s/69 | 0.3s/800k+ | 70MB |
|
|
|
|
| 154 |
|
| 155 |
| Model | Params | End2End ms | Latency ms | Decode toks/sec | NPU | RTF | WER |
|
| 156 |
|-------|--------|------------|------------|------------|-----|-----|-----|
|
| 157 |
+
| UsefulSensors/moonshine-base | 61M | 361 | 182 | 262 | yes | 0.0180 | 0.1395 |
|
| 158 |
+
| openai/whisper-tiny | 39M | 232 | 137 | 581 | yes | 0.0116 | 0.1860 |
|
| 159 |
+
| openai/whisper-base | 74M | 329 | 178 | 358 | yes | 0.0164 | 0.1628 |
|
| 160 |
+
| openai/whisper-small | 244M | 856 | 332 | 108 | yes | 0.0428 | 0.0930 |
|
| 161 |
+
| openai/whisper-medium | 769M | 2085 | 923 | 49 | yes | 0.1041 | 0.0930 |
|
| 162 |
+
| nvidia/parakeet-ctc-0.6b | 600M | 201 | 201 | 5214285 | yes | 0.0101 | 0.0930 |
|
| 163 |
+
| nvidia/parakeet-tdt-0.6b-v3 | 600M | 718 | 718 | 3583333 | yes | 0.0359 | 0.0465 |
|
| 164 |
+
| nvidia/parakeet-ctc-1.1b | 1.1B | 279 | 278 | 4562500 | yes | 0.0139 | 0.1628 |
|
| 165 |
| snakers4/silero-vad | - | - | - | - | - | - | - |
|
| 166 |
|
| 167 |
## Supported LLMs
|
| 168 |
|
| 169 |
- Gemma weights are often **gated** on HuggingFace, needs tokens
|
| 170 |
+
- Run `huggingface-cli login` and input your huggingface token
|
| 171 |
|
| 172 |
| Model | Features |
|
| 173 |
|-------|----------|
|
|
|
|
| 189 |
| LiquidAI/LFM2-2.6B | completion, tools, embed |
|
| 190 |
| LiquidAI/LFM2-VL-450M | vision, txt & img embed, Apple NPU |
|
| 191 |
| LiquidAI/LFM2.5-VL-1.6B | vision, txt & img embed, Apple NPU |
|
| 192 |
+
| tencent/Youtu-LLM-2B | completion, tools, embed |
|
| 193 |
| nomic-ai/nomic-embed-text-v2-moe | embed |
|
| 194 |
|
| 195 |
## Roadmap
|
|
|
|
| 204 |
| Feb 2026 | Done | Hybrid inference, INT4, lossless Quant (1.5x) |
|
| 205 |
| Mar 2026 | Coming | Qualcomm/Google NPUs, 5-11x faster Android |
|
| 206 |
| Apr 2026 | Coming | Mediatek/Exynos NPUs, Cactus@ICLR |
|
| 207 |
+
| May 2026 | Coming | Wearables & custom chips optimisations |
|
| 208 |
| Jun 2026 | Coming | Torch/JAX model transpilers |
|
| 209 |
|
| 210 |
## Using this repo
|
| 211 |
|
|
|
|
| 277 |
2. [UCLA's BruinAI](https://bruinai.org/)
|
| 278 |
3. [Char (YC S25)](https://char.com/)
|
| 279 |
4. [Yale's AI Society](https://www.yale-ai.org/team)
|
| 280 |
+
5. [National University of Singapore's AI Society](https://www.nusaisociety.org/)
|
| 281 |
6. [UC Irvine's AI@UCI](https://aiclub.ics.uci.edu/)
|
| 282 |
7. [Imperial College's AI Society](https://www.imperialcollegeunion.org/csp/1391)
|
| 283 |
8. [University of Pennsylvania's AI@Penn](https://ai-at-penn-main-105.vercel.app/)
|