| --- |
| title: GhostLM |
| emoji: ๐ |
| colorFrom: purple |
| colorTo: gray |
| sdk: gradio |
| app_file: app.py |
| pinned: false |
| license: apache-2.0 |
| short_description: From-scratch 81M cybersecurity LM, v0.9 chat demo |
| --- |
| |
| # GhostLM Chat (v0.9) |
|
|
| Interactive Gradio chat for **GhostLM v0.9 chat**, an 81M-parameter |
| cybersecurity language model trained from scratch in PyTorch. |
|
|
| The Space ships a multi-turn chat interface backed by the v0.9 chat |
| weights. Generation uses the model's three role tokens |
| (`<|ghost_user|>`, `<|ghost_assistant|>`, `<|ghost_end|>`) and stops the |
| moment the assistant's `<|ghost_end|>` is sampled. |
|
|
| ## Bench numbers (v0.9 chat) |
|
|
| The v0.9 chat checkpoint is the bench winner of the ghost-small line on |
| every multiple-choice benchmark we ran: |
|
|
| | Benchmark | n | Score | |
| |---|---:|---:| |
| | [CTIBench MCQ](https://huggingface.co/datasets/AI4Sec/cti-bench), 2-permutation debiased | 2,500 | **28.9%** | |
| | in-repo CTF MCQ eval | 30 | **59.2%** | |
| | SecQA (external) | 210 | **39.3%** | |
| | free-form fact recall, hand-written | 50 | 1/50 (at floor) | |
|
|
| Free-form fact recall is at floor across the entire 81M ghost-small |
| rung by design. At this parameter count the model has the *register* |
| of cybersec writing but not the *facts* in any retrievable form. The |
| next rung (ghost-base ~360M, SmolLM2-360M shape) is gated on rented |
| GPU compute. Spec: [`docs/ghost_base_spec.md`](https://github.com/joemunene-by/GhostLM/blob/main/docs/ghost_base_spec.md). |
|
|
| ## Architecture |
|
|
| 6 layers, d_model 768, 12 heads, with RoPE + SwiGLU + RMSNorm. Pretrain |
| corpus: 273M tokens spanning PRIMUS-Seed, PRIMUS-FineWeb, NVD CVEs, |
| MITRE ATT&CK, CWE, CAPEC, OWASP, IETF RFCs, Exploit-DB, CTFtime, arXiv |
| cs.CR, plus a fact-dense Q&A set. Chat-tuned with the chat-v3 SFT recipe. |
| |
| ## Where the weights live |
| |
| The 324 MB slim weights are stored in the Models repo |
| [`Ghostgim/GhostLM-v0.9-experimental`](https://huggingface.co/Ghostgim/GhostLM-v0.9-experimental). |
| The Space's `app.py` calls `huggingface_hub.hf_hub_download` on first |
| launch and caches them locally. This keeps the Space comfortably under |
| HF's 1 GB free-tier LFS cap; the source code stays small and the |
| weights are versioned separately. |
|
|
| ## Source |
|
|
| GitHub: [`joemunene-by/GhostLM`](https://github.com/joemunene-by/GhostLM) |
|
|
| Run locally: |
|
|
| ```bash |
| git clone https://github.com/joemunene-by/GhostLM |
| cd GhostLM |
| pip install -r demo/requirements.txt |
| PYTHONPATH=. python3 demo/app.py |
| ``` |
|
|
| The model is small enough to run on a laptop CPU; expect ~10-25 s per |
| chat reply at the default 200-token cap. |
|
|
| ## Caveats |
|
|
| - **Hallucinates facts.** CVE IDs, CVSS scores, technique IDs, version |
| ranges are all unreliable. Outputs are register-shaped fiction, not |
| reference material. Verify against authoritative sources. |
| - **No general-knowledge tuning.** Outside cybersecurity the model |
| politely declines and returns to its domain. Don't expect it to |
| summarize a news article or write Python. |
| - **The MCQ wins do not mean factual recall.** The 28.9% on debiased |
| CTIBench measures the register-matching component of the |
| benchmark; the free-form fact recall floor (1/50) is the truth metric. |
|
|
| ## License |
|
|
| Apache 2.0. Built by Joe Munene. |
|
|