---
license: apache-2.0
language:
- en
tags:
- transformers
- safetensors
- text-generation
- cybersecurity
- penetration-testing
- vulnerability-research
- osint
- cwe
- tool-use
- reasoning
- chain-of-thought
- grpo
- quantum-classical
- kaon
- ibm-quantum
- aer
- merlin-research
- qwen3_5
base_model_relation: finetune
pipeline_tag: image-text-to-text
---

# Mythoseek

<p align="center">
<img src="banner.jpeg" alt="Mythoseek Banner" width="100%">
</p>

---

## Overview

Mythoseek is a 10B-parameter language model specialized for
cybersecurity: vulnerability research, penetration testing, OSINT,
and CWE-pattern reasoning. It is fine-tuned from DeepSeek V4 Pro-Qwen3.5
9B Distilled on enterprise pentest reports and frontier-model
distillation traces, with the aim of bringing capabilities usually
limited to closed-source cyber AI systems to the open community.

Mythoseek was developed at **Merlin Research** (Stockholm, Sweden) as
part of the **KAON** quantum-classical research program, a closed-loop
framework connecting IBM Quantum (ibm_kingston, Heron r2) with edge LLM
inference on Apple Silicon. OTOC (out-of-time-order correlator)
scrambling measurements from real IBM QPU jobs informed the calibration
of AER (Adaptive Entropy Regularization) coefficients during GRPO
training.

---

## Training Pipeline

| Stage | Method | Details |
|---|---|---|
| 1 | SFT Distillation | Frontier model trace distillation |
| 2 | GRPO / RL | Verifiable rewards on cyber tasks |
| 3 | Tool-use SFT | Agent-style tool calling |
| 4 | CWE Grounding | CWE-pattern structured reasoning |

**Compute:** Google Cloud TPU v6 pods
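
Stage 2 relies on GRPO (Group Relative Policy Optimization) with verifiable rewards. As a rough illustration only, and not the training code used for Mythoseek, the sketch below shows the group-relative advantage that gives GRPO its name: several completions are sampled for one prompt, each receives a verifiable reward, and each reward is normalized against the mean and standard deviation of its own group. The function name, the `eps` constant, and the 0/1 reward values are assumptions made for this example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its sampling group (GRPO-style).

    rewards: verifiable rewards for completions sampled from ONE prompt.
    eps: small constant (an assumption) that avoids division by zero
         when every completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of 4 completions:
# 1.0 = the PoC crashed the target, 0.0 = it did not.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 0.0])
```

The single successful completion gets a positive advantage and the three failures get equal negative ones, so the policy update favors the successful trace without needing a learned value model.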

---

## Results

### CyberGym (arXiv:2506.02548)

**CyberGym** is UC Berkeley's large-scale cybersecurity benchmark,
built from 1,507 real-world vulnerabilities drawn from Google OSS-Fuzz
across 188 projects. There is no partial credit and no LLM judge: a
pass requires a valid PoC that crashes the pre-patch build.

<p align="center">
<img src="CyberGym.jpeg" alt="CyberGym Results" width="100%">
</p>

| Level | Scaffold | pass@4 |
|---|---|---|
| Level 0 | Full scaffolding | 62% |
| Level 1 | Partial scaffolding | 34% |
| Level 2 | Minimal scaffolding | 12% |
| Level 3 | No scaffolding | 3% |

> For reference: Claude Mythos Preview leads the public leaderboard
> at 83.1% pass@1 (overall, closed model).
> Mythoseek is a 10B open-weight alternative.
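
The table reports pass@4 (four attempts per task), while the leaderboard figure quoted for reference is pass@1. This card does not state how pass@4 was computed. One common choice is the unbiased pass@k estimator introduced with the HumanEval benchmark, which uses n attempts per task, of which c succeed, to estimate the probability that at least one of k sampled attempts passes: `1 - C(n-c, k) / C(n, k)`. The sketch below assumes that estimator; the function names and the attempt counts are hypothetical.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for one task: 1 - C(n-c, k) / C(n, k)."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k attempts failed,
    # so the estimate is exactly 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results, k):
    """Average the per-task estimates; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical counts: 3 tasks, 4 attempts each, with 1, 0 and 2 successes.
score = benchmark_pass_at_k([(4, 1), (4, 0), (4, 2)], k=4)
```

When k equals n, as in this example, the estimate for each task reduces to whether any attempt succeeded.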

### IFBench

<p align="center">
<img src="IFBench.jpeg" alt="IFBench Results" width="100%">
</p>

---

## Intended Use

- Vulnerability research and CVE analysis
- Penetration testing assistance (OSINT, recon, XSS, SQLi)
- CWE classification and pattern recognition
- Security report generation
- Red team reasoning support

**Not intended for:** autonomous offensive operations,
unauthorized access, or malicious use.

---

## KAON Connection

This model is part of the **KAON** quantum-classical research program.

OTOC scrambling measurements on real quantum hardware (SYK model,
4–5 qubits, IBM job IDs: `d7a40irc6das739jkmb0`,
`d7cj3c95a5qc73doqri0`) produced entropy profiles that were used to
calibrate AER coefficients during RL training. The reported correlation
between OTOC decay and token entropy is Spearman ρ = −0.733,
p = 0.016 (n = 1000).
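
Spearman's rank correlation is the Pearson correlation of the ranks of two series, so a negative value means one quantity tends to fall as the other rises. The sketch below computes it with the standard library only. The two short series are made-up placeholders, not the KAON measurements, and the function names are illustrative.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rank correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Placeholder data only: hypothetical OTOC decay values and token entropies.
otoc_decay = [0.10, 0.25, 0.40, 0.55, 0.70]
token_entropy = [2.9, 2.7, 2.8, 2.1, 1.8]
rho = spearman_rho(otoc_decay, token_entropy)
```

Because only ranks enter the calculation, the result is unchanged by any monotonic rescaling of either series, which is convenient when the two quantities are measured in unrelated units.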