---
license: apache-2.0
language:
- en
tags:
- transformers
- safetensors
- text-generation
- cybersecurity
- penetration-testing
- vulnerability-research
- osint
- cwe
- tool-use
- reasoning
- chain-of-thought
- grpo
- quantum-classical
- kaon
- ibm-quantum
- aer
- merlin-research
- qwen3_5
base_model_relation: finetune
pipeline_tag: text-generation
---

# Mythoseek

<p align="center">
  <img src="banner.jpeg" alt="Mythoseek Banner" width="100%">
</p>

---

## Overview

Mythoseek is a 10B-parameter language model specialized for
cybersecurity: vulnerability research, penetration testing, OSINT,
and CWE-pattern reasoning. Fine-tuned from DeepSeek V4 Pro-Qwen3.5
9B Distilled on enterprise pentest reports and frontier-model
distillation traces, it brings capabilities previously confined to
closed-source cyber AI to the open community.

Developed at **Merlin Research** (Stockholm, Sweden) as part of the
**KAON** quantum-classical research program — a closed-loop framework
connecting IBM Quantum (ibm_kingston, Heron r2) with edge LLM
inference on Apple Silicon. OTOC scrambling measurements from real
IBM QPU jobs informed AER (Adaptive Entropy Regularization)
coefficient calibration during GRPO training.

---

## Training Pipeline

| Stage | Method | Details |
|---|---|---|
| 1 | SFT Distillation | Frontier model trace distillation |
| 2 | GRPO / RL | Verifiable rewards on cyber tasks |
| 3 | Tool-use SFT | Agent-style tool calling |
| 4 | CWE Grounding | CWE-pattern structured reasoning |

**Compute:** Google Cloud TPU v6 pods
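
The GRPO stage's core idea, group-relative advantage estimation, can be sketched in a few lines. This is a generic illustration of the technique, not the project's actual training code; the reward values are hypothetical:

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each completion's reward against
    its group's mean and standard deviation, so no learned value function
    (critic) is needed."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical verifiable rewards for 4 sampled completions of one
# cyber task (1.0 = the PoC crashes the target, 0.0 = it does not).
adv = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that beat their group's average get positive advantages and are reinforced; the rest are suppressed, which is what makes binary verifiable rewards usable without a critic.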

---

## Results

### CyberGym (arXiv:2506.02548)

**CyberGym** — UC Berkeley's large-scale cybersecurity benchmark,
1,507 real-world vulnerabilities from Google OSS-Fuzz across 188
projects. No partial credit, no LLM judge — pass requires a valid
PoC that crashes the pre-patch build.

<p align="center">
  <img src="CyberGym.jpeg" alt="CyberGym Results" width="100%">
</p>

| Level | Scaffold | pass@4 |
|---|---|---|
| Level 0 | Full scaffolding | 62% |
| Level 1 | Partial scaffolding | 34% |
| Level 2 | Minimal scaffolding | 12% |
| Level 3 | No scaffolding | 3% |

> For reference: Claude Mythos Preview leads the public leaderboard
> at 83.1% pass@1 (overall, closed model).
> Mythoseek is a 10B open-weight alternative.
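
pass@k figures like the ones above are commonly computed with the unbiased estimator of Chen et al. (2021); a minimal sketch (assuming `n` total attempts per task, of which `c` produced a valid PoC):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples drawn (without replacement) from n attempts, c of them
    correct, passes the check."""
    if n - c < k:
        # Every size-k subset must contain at least one correct attempt.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

score = pass_at_k(10, 2, 4)  # 10 attempts, 2 valid PoCs, pass@4
```

Averaging this quantity over all benchmark tasks gives the reported pass@4 percentage.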

### IFBench

<p align="center">
  <img src="IFBench.jpeg" alt="IFBench Results" width="100%">
</p>

---

## Intended Use

- Vulnerability research and CVE analysis
- Penetration testing assistance (OSINT, recon, XSS, SQLi)
- CWE classification and pattern recognition
- Security report generation
- Red team reasoning support

**Not intended for:** autonomous offensive operations,
unauthorized access, or malicious use.

---

## KAON Connection

This model is part of the **KAON** quantum-classical research program:

OTOC scrambling measurements on real quantum hardware (SYK model,
4–5 qubits, IBM job IDs: `d7a40irc6das739jkmb0`,
`d7cj3c95a5qc73doqri0`) produced entropy profiles that calibrated
AER coefficients during RL training. Correlation between OTOC decay
and token entropy: Spearman ρ = −0.733, p = 0.016 (n = 1000).
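
The reported statistic is a Spearman rank correlation, i.e. the Pearson correlation of the two variables' ranks. A minimal pure-Python sketch; the input arrays below are synthetic placeholders, not the measured OTOC or entropy data:

```python
def _ranks(xs):
    """Average 1-based ranks, with ties sharing their mean rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Synthetic illustration: OTOC decaying while token entropy rises
# gives a perfectly anti-monotone pair, hence rho near -1.
otoc_decay = [0.9, 0.7, 0.5, 0.35, 0.2]
token_entropy = [1.1, 1.4, 1.6, 1.9, 2.3]
rho = spearman_rho(otoc_decay, token_entropy)
```

A strongly negative rho, as reported (ρ = −0.733), indicates that faster OTOC decay co-occurs with higher token entropy, which is the signal used to calibrate the AER coefficients.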