---
license: apache-2.0
language:
- en
tags:
- transformers
- safetensors
- text-generation
- cybersecurity
- penetration-testing
- vulnerability-research
- osint
- cwe
- tool-use
- reasoning
- chain-of-thought
- grpo
- quantum-classical
- kaon
- ibm-quantum
- aer
- merlin-research
- qwen3_5
base_model_relation: finetune
pipeline_tag: text-generation
---
# Mythoseek
<p align="center">
<img src="banner.jpeg" alt="Mythoseek Banner" width="100%">
</p>
---
## Overview
Mythoseek is a 10B-parameter language model specialized for
cybersecurity: vulnerability research, penetration testing, OSINT,
and CWE-pattern reasoning. Fine-tuned from DeepSeek V4 Pro-Qwen3.5
9B Distilled on enterprise pentest reports and frontier-model
distillation traces, it brings closed-source-grade cyber AI capability
to the open community.
Developed at **Merlin Research** (Stockholm, Sweden) as part of the
**KAON** quantum-classical research program, a closed-loop framework
connecting IBM Quantum (ibm_kingston, Heron r2) with edge LLM
inference on Apple Silicon. Out-of-time-order correlator (OTOC)
scrambling measurements from real IBM QPU jobs informed the calibration
of AER (Adaptive Entropy Regularization) coefficients during GRPO
training.
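---
## Usage
A minimal inference sketch with the `transformers` text-generation pipeline. The repo id `merlin-research/mythoseek`, the system prompt, and the generation settings are illustrative assumptions, not values confirmed by this card; substitute the actual model id shown on this page.

```python
# Hypothetical usage sketch -- the repo id below is an assumption.
from typing import Dict, List


def make_messages(task: str) -> List[Dict[str, str]]:
    """Build a chat-format request for a security-analysis task."""
    return [
        {"role": "system",
         "content": "You are a security analyst. Reason step by step."},
        {"role": "user", "content": task},
    ]


if __name__ == "__main__":
    # Requires `pip install transformers torch` and enough memory
    # for a 10B model; a GPU or Apple Silicon device is recommended.
    from transformers import pipeline

    generator = pipeline("text-generation",
                         model="merlin-research/mythoseek")
    out = generator(
        make_messages("Classify the CWE for: strcpy(buf, user_input);"),
        max_new_tokens=512,
    )
    # With chat-format input, recent pipeline versions return the full
    # conversation; the last message is the model's reply.
    print(out[0]["generated_text"][-1]["content"])
```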
---
## Training Pipeline
| Stage | Method | Details |
|---|---|---|
| 1 | SFT Distillation | Frontier model trace distillation |
| 2 | GRPO / RL | Verifiable rewards on cyber tasks |
| 3 | Tool-use SFT | Agent-style tool calling |
| 4 | CWE Grounding | CWE-pattern structured reasoning |
**Compute:** Google Cloud TPU v6 pods
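Stage 2 uses GRPO, which scores each completion against the other samples drawn for the same prompt instead of a learned value baseline. A minimal sketch of that group-relative normalization (the 0/1 rewards are illustrative, and the AER entropy term is omitted):

```python
from statistics import mean, stdev
from typing import List


def group_relative_advantages(rewards: List[float],
                              eps: float = 1e-8) -> List[float]:
    """GRPO-style advantages: A_i = (r_i - mean(r)) / (std(r) + eps).
    Completions scoring above their group's mean get positive advantage
    and are reinforced; below-mean completions are suppressed."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: verifiable 0/1 rewards for four sampled attempts at one task.
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Because the baseline is the group mean, the advantages always sum to (numerically) zero within a group.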
---
## Results
### CyberGym (arXiv:2506.02548)
**CyberGym** is UC Berkeley's large-scale cybersecurity benchmark:
1,507 real-world vulnerabilities from Google OSS-Fuzz across 188
projects. There is no partial credit and no LLM judge; a pass requires
a valid PoC that crashes the pre-patch build.
<p align="center">
<img src="CyberGym.jpeg" alt="CyberGym Results" width="100%">
</p>
| Level | Scaffolding | pass@4 |
|---|---|---|
| Level 0 | Full scaffolding | 62% |
| Level 1 | Partial scaffolding | 34% |
| Level 2 | Minimal scaffolding | 12% |
| Level 3 | No scaffolding | 3% |
> For reference: Claude Mythos Preview leads the public leaderboard
> at 83.1% pass@1 (overall, closed model).
> Mythoseek is a 10B open-weight alternative.
### IFBench
<p align="center">
<img src="IFBench.jpeg" alt="IFBench Results" width="100%">
</p>
---
## Intended Use
- Vulnerability research and CVE analysis
- Penetration testing assistance (OSINT, recon, XSS, SQLi)
- CWE classification and pattern recognition
- Security report generation
- Red team reasoning support
**Not intended for:** autonomous offensive operations,
unauthorized access, or malicious use.
---
## KAON Connection
This model is part of the **KAON** quantum-classical research program:
OTOC scrambling measurements on real quantum hardware (SYK model,
4–5 qubits, IBM job IDs: `d7a40irc6das739jkmb0`,
`d7cj3c95a5qc73doqri0`) produced entropy profiles that calibrated
AER coefficients during RL training. Correlation between OTOC decay
and token entropy: Spearman ρ = −0.733, p = 0.016 (n = 1000). |
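The reported ρ can be reproduced from raw (OTOC decay, token entropy) pairs. A self-contained sketch of Spearman's ρ, computed as the Pearson correlation of the rank vectors with average ranks for ties (the data here is illustrative, and the p-value, which would come from a t-approximation or permutation test, is not computed):

```python
from typing import List, Sequence


def _ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, 1-based
        for idx in order[i:j + 1]:
            ranks[idx] = avg
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho = Pearson correlation of the two rank vectors."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

A perfectly monotone decreasing relationship, as stronger scrambling suppressing entropy would be, yields ρ = −1; the reported −0.733 indicates a strong but imperfect monotone trend.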