---
license: apache-2.0
language:
- en
tags:
- transformers
- safetensors
- text-generation
- cybersecurity
- penetration-testing
- vulnerability-research
- osint
- cwe
- tool-use
- reasoning
- chain-of-thought
- grpo
- quantum-classical
- kaon
- ibm-quantum
- aer
- merlin-research
- qwen3_5
base_model_relation: finetune
pipeline_tag: image-text-to-text
---

# Mythoseek

Mythoseek Banner

---

## Overview

Mythoseek is a 10B-parameter language model specialized for cybersecurity: vulnerability research, penetration testing, OSINT, and CWE-pattern reasoning. Fine-tuned from DeepSeek V4 Pro-Qwen3.5 9B Distilled on enterprise pentest reports and frontier model distillation traces, it aims to bring capabilities usually found in closed-source cyber AI models to the open community.

Mythoseek was developed at **Merlin Research** (Stockholm, Sweden) as part of the **KAON** quantum-classical research program, a closed-loop framework connecting IBM Quantum (ibm_kingston, Heron r2) with edge LLM inference on Apple Silicon. Out-of-time-order correlator (OTOC) scrambling measurements from real IBM QPU jobs informed AER (Adaptive Entropy Regularization) coefficient calibration during GRPO training.

---

## Training Pipeline

| Stage | Method | Details |
|---|---|---|
| 1 | SFT Distillation | Frontier model trace distillation |
| 2 | GRPO / RL | Verifiable rewards on cyber tasks |
| 3 | Tool-use SFT | Agent-style tool calling |
| 4 | CWE Grounding | CWE-pattern structured reasoning |

**Compute:** Google Cloud TPU v6 pods

---

## Results

### CyberGym (arXiv:2506.02548)

**CyberGym** is UC Berkeley's large-scale cybersecurity benchmark: 1,507 real-world vulnerabilities from Google OSS-Fuzz across 188 projects. There is no partial credit and no LLM judge; a pass requires a valid PoC that crashes the pre-patch build.
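Results in this section are reported as pass@k. The card does not say how the figures were aggregated, so the following is only a minimal sketch of one common approach, assuming the standard unbiased pass@k estimator (1 − C(n−c, k) / C(n, k)) averaged over tasks. The function names and the task IDs are hypothetical, and each boolean stands for whether one attempt produced a PoC that crashed the pre-patch build.

```python
from math import comb
from typing import Mapping, Sequence


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task.

    n: attempts sampled, c: attempts that succeeded, k: attempt budget.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Probability that a random size-k subset of the n attempts
    # contains no successful attempt, subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: Mapping[str, Sequence[bool]], k: int) -> float:
    """Mean pass@k over tasks; results maps task ID -> per-attempt verdicts."""
    if not results:
        raise ValueError("results must contain at least one task")
    scores = [pass_at_k(len(v), sum(v), k) for v in results.values()]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Illustrative verdicts only; these are not CyberGym results.
    demo = {
        "task-a": [False, True, False, False],
        "task-b": [False, False, False, False],
    }
    print(benchmark_pass_at_k(demo, k=4))
```

When n equals k, as with four attempts scored at pass@4, the estimator reduces to "at least one attempt succeeded", which matches the benchmark's no-partial-credit rule.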

CyberGym Results

| Level | Scaffold | pass@4 |
|---|---|---|
| Level 0 | Full scaffolding | 62% |
| Level 1 | Partial scaffolding | 34% |
| Level 2 | Minimal scaffolding | 12% |
| Level 3 | No scaffolding | 3% |

> For reference: Claude Mythos Preview leads the public leaderboard
> at 83.1% pass@1 (overall, closed model).
> That figure is pass@1, whereas the table above reports pass@4,
> so the two are not directly comparable.
> Mythoseek is a 10B open-weight alternative.

### IFBench

IFBench Results

---

## Intended Use

- Vulnerability research and CVE analysis
- Penetration testing assistance (OSINT, recon, XSS, SQLi)
- CWE classification and pattern recognition
- Security report generation
- Red team reasoning support

**Not intended for:** autonomous offensive operations, unauthorized access, or malicious use.

---

## KAON Connection

This model is part of the **KAON** quantum-classical research program: OTOC scrambling measurements on real quantum hardware (SYK model, 4–5 qubits, IBM job IDs: `d7a40irc6das739jkmb0`, `d7cj3c95a5qc73doqri0`) produced entropy profiles that calibrated AER coefficients during RL training. Correlation between OTOC decay and token entropy: Spearman ρ = −0.733, p = 0.016 (n = 1000).
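For readers who want to run the same kind of check on their own measurements, here is a minimal sketch of a Spearman rank correlation in plain Python. It is not the KAON analysis code; the function names are hypothetical, and the sample values are made up for illustration rather than taken from the IBM jobs above.

```python
import math
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list:
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = shared
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up illustrative values, not measured data.
    otoc_decay = [0.10, 0.25, 0.40, 0.55, 0.70]
    token_entropy = [2.9, 2.6, 2.7, 2.1, 1.8]
    print(spearman_rho(otoc_decay, token_entropy))
```

A negative value means that higher OTOC decay goes with lower token entropy in rank order. The sketch omits the p-value; a statistics library such as SciPy can supply one.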