---
title: nautilus-compass demo
emoji: 🧭
colorFrom: blue
colorTo: purple
sdk: static
app_file: index.html
pinned: false
license: mit
---

# nautilus-compass · drift detector live demo

Static, in-browser demo for [`nautilus-compass`](https://github.com/chunxiaoxx/nautilus-compass)
v1.0 · the persona-drift detector + tamper-evident memory log for
long-running LLM agents.

## What you can try here

**Drift detection.** Paste a `(system_prompt, response)` pair. The demo
converts both into character n-grams and scores the response against the
**25 positive + 35 negative** persona anchors shipped with nautilus-compass.

- **Green** = response sits inside the persona anchor cone (aligned)
- **Yellow** = neutral, weak signal either way
- **Red** = response is closer to the *negative* anchors (sycophancy,
  fake-completion, root-cause skipping, "user won't notice", etc.)

The verdict and its breakdown (alignment, deviation, `drift_score`) render
instantly. All scoring runs **client-side in your browser**: no upload,
no tracking, no API key needed.
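
As a rough illustration of character n-gram scoring against positive and negative anchors, here is a minimal Python sketch. It is not the demo's actual code. The trigram size, the cosine similarity, the max-over-anchors aggregation, the `margin` threshold, and the definition of `drift_score` as deviation minus alignment are all assumptions made for this example, and the function names are hypothetical. The sketch scores only the response; how the system prompt enters the real score is not described here.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in lowercased text (n=3 is an assumption)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def score_response(response, positive, negative, margin=0.05):
    """Return (verdict, alignment, deviation, drift_score) for one response.

    Hypothetical scoring rule: alignment and deviation are the best cosine
    match against the positive and negative anchors, and drift_score is
    their difference. The margin separating the three colors is made up.
    """
    vec = char_ngrams(response)
    alignment = max(cosine(vec, char_ngrams(anchor)) for anchor in positive)
    deviation = max(cosine(vec, char_ngrams(anchor)) for anchor in negative)
    drift_score = deviation - alignment
    if drift_score > margin:
        verdict = "red"      # closer to the negative anchors
    elif drift_score < -margin:
        verdict = "green"    # closer to the positive anchors
    else:
        verdict = "yellow"   # weak signal either way
    return verdict, alignment, deviation, drift_score
```

Calling `score_response(text, positive_anchors, negative_anchors)` with two lists of anchor strings returns the color plus the three numbers, mirroring the breakdown the demo displays.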

Two pre-baked sample buttons load a clean case and a drifted case from the
same fixtures the unit tests use, so you can sanity-check that the verdict
matches what nautilus-compass ships.

## What needs the local install

The full pipeline used in the paper (BGE-m3 dense + bge-reranker-v2-m3
cross-encoder, ~570M params, ~2GB model weights) doesn't fit in a free
Space and is outside the scope of this demo. The same goes for Merkle hash
chain verification, which needs filesystem access to your
`~/.claude/projects/` session logs.
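
To show why a hash-chained log is tamper-evident, here is a minimal Python sketch of the general idea, simplified to a linear SHA-256 chain. It is not the format `compass-verify` uses. The entry layout, the `GENESIS` placeholder, and the function names are all hypothetical, and the sketch works on an in-memory list rather than on session log files.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical placeholder for the first entry's predecessor hash


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(log: list, payload: dict) -> None:
    """Append an entry that commits to everything before it."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev, "payload": payload, "hash": entry_hash(prev, payload)})


def verify(log: list) -> int:
    """Return the index of the first inconsistent entry, or -1 if the chain is intact."""
    prev = GENESIS
    for i, entry in enumerate(log):
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return i
        prev = entry["hash"]
    return -1
```

Because each hash covers the previous hash, editing any stored payload makes `verify` report that entry's index unless every later hash is recomputed as well.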

For the full stack:

```bash
pip install nautilus-compass==1.0.0
bash daemon_start.sh        # one-time per boot · downloads BGE-m3 ~2GB
compass-verify --all        # Merkle integrity scan
```

Or run it from any of 6 MCP-compatible clients (Claude Code · Claude Desktop ·
Cline · Cursor · Continue.dev · Zed); see
[`examples/mcp_configs/`](https://github.com/chunxiaoxx/nautilus-compass/tree/main/examples/mcp_configs)
in the repo for paste-ready configs.

## Headline eval numbers (locked v1.0 · 2026-05-08)

| metric | nautilus-compass | best public baseline |
|---|---|---|
| LongMemEval-S (n=500) | **56.6%** | Zep 55-60% (different judge) |
| EverMemBench-Dynamic Run 1 | **44.4%** (n=500) | MemOS 42.55 |
| EverMemBench-Dynamic Run 2 | **47.3%** (n=497) | — |
| Drift detector ROC AUC (held-out) | **0.83** | — (no other black-box drift work) |
| Reproduction cost | **$3.50** end-to-end | $50+ for GPT-4o-judge stacks |

Two papers on arXiv (drift detection · memory recall). All 228 pytest
tests pass. MIT-licensed (anchors CC0).

## Local repo

[github.com/chunxiaoxx/nautilus-compass](https://github.com/chunxiaoxx/nautilus-compass)