---
license: apache-2.0
language:
- en
tags:
- transformers
- safetensors
- text-generation
- cybersecurity
- penetration-testing
- vulnerability-research
- osint
- cwe
- tool-use
- reasoning
- chain-of-thought
- grpo
- quantum-classical
- kaon
- ibm-quantum
- aer
- merlin-research
- qwen3_5
base_model_relation: finetune
pipeline_tag: image-text-to-text
---

# Mythoseek

<p align="center">
  <img src="banner.jpeg" alt="Mythoseek Banner" width="100%">
</p>

---

## Overview

Mythoseek is a 10B-parameter language model specialized for
cybersecurity: vulnerability research, penetration testing, OSINT,
and CWE-pattern reasoning. It is fine-tuned from DeepSeek V4
Pro-Qwen3.5 9B Distilled on enterprise pentest reports and
frontier-model distillation traces, and aims to bring capabilities
usually found in closed-source cyber AI systems to the open community.

Mythoseek was developed at **Merlin Research** (Stockholm, Sweden) as
part of the **KAON** quantum-classical research program, a closed-loop
framework connecting IBM Quantum (ibm_kingston, Heron r2) with edge
LLM inference on Apple Silicon. OTOC (out-of-time-order correlator)
scrambling measurements from real IBM QPU jobs informed the
calibration of the AER (Adaptive Entropy Regularization) coefficient
during GRPO training.

---

## Training Pipeline

| Stage | Method | Details |
|---|---|---|
| 1 | SFT Distillation | Frontier model trace distillation |
| 2 | GRPO / RL | Verifiable rewards on cyber tasks |
| 3 | Tool-use SFT | Agent-style tool calling |
| 4 | CWE Grounding | CWE-pattern structured reasoning |
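
Stage 2 relies on GRPO, whose core idea is to score each sampled
completion relative to the other completions drawn for the same
prompt. The sketch below shows only that group-relative advantage
step. The binary rewards, the `eps` constant, and the choice of
population standard deviation are assumptions for illustration; the
actual reward functions and the AER term used for Mythoseek are not
described here and are not modeled.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one prompt's group of sampled completions.

    Each advantage is (reward - group mean) / (group std + eps).
    `eps` is an assumed stabilizer for groups where all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an illustrative choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four completions for one cyber task, scored by a
# verifiable check (1.0 = check passed, 0.0 = check failed).
rewards = [1.0, 0.0, 0.0, 0.0]
for reward, advantage in zip(rewards, group_relative_advantages(rewards)):
    print(f"reward={reward:.1f} advantage={advantage:+.3f}")
```

With verifiable rewards, a group in which every completion fails (or
every completion passes) yields zero advantages and therefore no
learning signal for that prompt.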

**Compute:** Google Cloud TPU v6 pods

---

## Results

### CyberGym (arXiv:2506.02548)

**CyberGym** is UC Berkeley's large-scale cybersecurity benchmark,
covering 1,507 real-world vulnerabilities from Google OSS-Fuzz across
188 projects. There is no partial credit and no LLM judge: a pass
requires a valid PoC that crashes the pre-patch build.

<p align="center">
  <img src="CyberGym.jpeg" alt="CyberGym Results" width="100%">
</p>

| Level | Scaffold | pass@4 |
|---|---|---|
| Level 0 | Full scaffolding | 62% |
| Level 1 | Partial scaffolding | 34% |
| Level 2 | Minimal scaffolding | 12% |
| Level 3 | No scaffolding | 3% |

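> For reference: Claude Mythos Preview (closed model) leads the public
> leaderboard at 83.1% pass@1 overall. That figure is pass@1, while
> the table above reports pass@4, so the two are not directly
> comparable. Mythoseek is a 10B open-weight alternative.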
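
Since the table reports pass@4 and the reference figure is pass@1, it
helps to see how pass@k is usually computed. The sketch below uses the
standard unbiased estimator from the code-generation literature (Chen
et al., 2021). Whether the CyberGym harness or the numbers above use
this exact estimator is an assumption, and the sample counts in the
example are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task.

    n: total samples drawn, c: samples that passed, k: attempt budget.
    Returns the probability that at least one of k samples, drawn
    without replacement from the n, is a passing one.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so any k samples include a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical task: 8 samples drawn, 2 produced a crashing PoC.
print(pass_at_k(8, 2, 1))  # estimate with a single attempt
print(pass_at_k(8, 2, 4))  # estimate with four attempts
```

For the same model and task, pass@4 is never lower than pass@1, which
is why scores at different k should not be ranked against each other.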

### IFBench

<p align="center">
  <img src="IFBench.jpeg" alt="IFBench Results" width="100%">
</p>

---

## Intended Use

- Vulnerability research and CVE analysis
- Penetration testing assistance (OSINT, recon, XSS, SQLi)
- CWE classification and pattern recognition
- Security report generation
- Red team reasoning support

**Not intended for:** autonomous offensive operations,
unauthorized access, or malicious use.

---

## KAON Connection

This model is part of the **KAON** quantum-classical research program.

OTOC scrambling measurements on real quantum hardware (SYK model,
4–5 qubits, IBM job IDs: `d7a40irc6das739jkmb0`,
`d7cj3c95a5qc73doqri0`) produced entropy profiles that calibrated
AER coefficients during RL training. The reported correlation between
OTOC decay and token entropy is Spearman ρ = −0.733, p = 0.016
(n = 10).
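
The significance of a Spearman correlation depends strongly on the
number of paired observations. As a rough aid for reading the figure
above, the sketch below applies the common t-distribution
approximation, t = ρ·sqrt((n − 2)/(1 − ρ²)) with n − 2 degrees of
freedom, at a few sample sizes. The sample sizes in the loop are
illustrative, the function names are hypothetical, and this is not the
analysis code used for KAON.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_t_pvalue(t: float, df: int, steps: int = 20000) -> float:
    """Two-sided p-value via Simpson's rule on [0, |t|] (steps must be even).

    Precision is limited to roughly 1e-10, so very small p-values
    are reported as approximately zero.
    """
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * total * h / 3  # probability that |T| <= |t|
    return max(0.0, 1.0 - central_mass)


def spearman_pvalue(rho: float, n: int) -> float:
    """Approximate two-sided p-value for Spearman's rho with n pairs."""
    if n < 3:
        raise ValueError("need at least 3 paired observations")
    if abs(rho) >= 1.0:
        return 0.0
    t = rho * math.sqrt((n - 2) / (1 - rho * rho))
    return two_sided_t_pvalue(t, n - 2)


# Illustrative sample sizes only.
for n in (6, 10, 20):
    print(f"n={n:2d}  p={spearman_pvalue(-0.733, n):.4f}")
```

The same ρ becomes far more significant as n grows, so a p-value is
only meaningful alongside the sample size it was computed from.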