title: GhostLM
emoji: 🔐
colorFrom: purple
colorTo: gray
sdk: gradio
app_file: app.py
pinned: false
license: apache-2.0
short_description: From-scratch 81M cybersecurity LM, v0.9 chat demo
GhostLM Chat (v0.9)
Interactive Gradio chat demo for the GhostLM v0.9 chat checkpoint, an 81M-parameter cybersecurity language model trained from scratch in PyTorch.
The Space ships a multi-turn chat interface backed by the v0.9 chat
weights. Generation uses the model's three role tokens
(<|ghost_user|>, <|ghost_assistant|>, <|ghost_end|>) and stops the
moment the assistant's <|ghost_end|> is sampled.
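The role-token flow described above can be sketched roughly as follows. This is a minimal illustration, not the Space's actual `app.py`: the exact template (for example, whether user turns are also closed with `<|ghost_end|>`) is an assumption, and `build_prompt`, `generate_reply`, and the `next_token` callable are hypothetical names standing in for the real model call.

```python
from typing import Callable, List, Tuple

USER = "<|ghost_user|>"
ASSISTANT = "<|ghost_assistant|>"
END = "<|ghost_end|>"


def build_prompt(history: List[Tuple[str, str]], user_msg: str) -> str:
    """Flatten a multi-turn chat into one prompt string.

    Assumed layout: every turn is wrapped as <role>text<|ghost_end|>,
    and the prompt ends with an open assistant tag for the model to fill.
    """
    parts = []
    for user_turn, assistant_turn in history:
        parts.append(f"{USER}{user_turn}{END}")
        parts.append(f"{ASSISTANT}{assistant_turn}{END}")
    parts.append(f"{USER}{user_msg}{END}")
    parts.append(ASSISTANT)
    return "".join(parts)


def generate_reply(
    next_token: Callable[[str], str],
    prompt: str,
    max_new_tokens: int = 200,
) -> str:
    """Sample tokens until <|ghost_end|> appears or the token cap is hit.

    `next_token` is a hypothetical stand-in for one sampling step of the
    model: it receives the text so far and returns the next token as text.
    """
    reply: List[str] = []
    for _ in range(max_new_tokens):
        tok = next_token(prompt + "".join(reply))
        if tok == END:
            break  # stop the moment the end token is sampled
        reply.append(tok)
    return "".join(reply)
```

Keeping the stop check inside the sampling loop means nothing after `<|ghost_end|>` reaches the user, and the 200-token default mirrors the reply cap mentioned in the local-run notes.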
Bench numbers (v0.9 chat)
Among ghost-small checkpoints, v0.9 chat scores highest on every multiple-choice benchmark we ran:
| Benchmark | n | Score |
|---|---|---|
| CTIBench MCQ, 2-permutation debiased | 2,500 | 28.9% |
| in-repo CTF MCQ eval | 30 | 59.2% |
| SecQA (external) | 210 | 39.3% |
| free-form fact recall, hand-written | 50 | 1/50 (at floor) |
Free-form fact recall is at floor across the entire 81M ghost-small
rung by design. At this parameter count the model has the register
of cybersec writing but not the facts in any retrievable form. The
next rung (ghost-base ~360M, SmolLM2-360M shape) is gated on rented
GPU compute. Spec: docs/ghost_base_spec.md.
Architecture
6 layers, d_model 768, 12 heads, with RoPE + SwiGLU + RMSNorm. Pretrain corpus: 273M tokens spanning PRIMUS-Seed, PRIMUS-FineWeb, NVD CVEs, MITRE ATT&CK, CWE, CAPEC, OWASP, IETF RFCs, Exploit-DB, CTFtime, arXiv cs.CR, plus a fact-dense Q&A set. Chat-tuned with the chat-v3 SFT recipe.
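As a rough sanity check on the 81M figure, the parameter count can be estimated from the shape above. The README does not state the vocabulary size, the feed-forward width, or whether embeddings are tied, so the values used below (a GPT-2-sized vocabulary of 50,257, `d_ff` of 2048, tied embeddings, no biases) are assumptions for illustration only. `estimate_params` is a hypothetical helper, not part of the GhostLM codebase.

```python
def estimate_params(
    n_layers: int,
    d_model: int,
    d_ff: int,
    vocab_size: int,
    tied_embeddings: bool = True,
) -> int:
    """Approximate parameter count for a decoder-only transformer
    with SwiGLU feed-forward blocks and RMSNorm, without bias terms."""
    embed = vocab_size * d_model       # token embedding table
    attn = 4 * d_model * d_model       # Q, K, V and output projections
    mlp = 3 * d_model * d_ff           # SwiGLU: gate, up and down projections
    norms = 2 * d_model                # two RMSNorm gain vectors per block
    total = embed + n_layers * (attn + mlp + norms)
    total += d_model                   # final RMSNorm gain
    if not tied_embeddings:
        total += vocab_size * d_model  # separate output projection
    return total


# Assumed values: d_ff and vocab_size are NOT given in this README.
approx = estimate_params(n_layers=6, d_model=768, d_ff=2048, vocab_size=50_257)
print(f"{approx / 1e6:.1f}M parameters")
```

Under these assumed values the estimate lands near 81M, with roughly half of the parameters in the embedding table. RoPE adds no learned parameters, and the head count does not change the total for standard multi-head attention. The agreement does not confirm the assumed `d_ff` or vocabulary size.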
Where the weights live
The 324 MB slim weights are stored in the Models repo
Ghostgim/GhostLM-v0.9-experimental.
The Space's app.py calls huggingface_hub.hf_hub_download on first
launch and caches them locally. This keeps the Space comfortably under
HF's 1 GB free-tier LFS cap; the source code stays small and the
weights are versioned separately.
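A minimal sketch of the download-and-cache step, assuming the standard `huggingface_hub.hf_hub_download` call named above. The checkpoint filename is not given in this README, so it is left as a parameter; `fetch_weights` and `fp32_size_mb` are hypothetical helper names. The size helper shows that 324 MB is what 81M parameters occupy at 4 bytes each, which is consistent with fp32 storage, although the storage dtype is not stated here.

```python
def fp32_size_mb(n_params: int) -> float:
    """Size in decimal megabytes of n_params stored as 4-byte floats."""
    return n_params * 4 / 1_000_000


def fetch_weights(repo_id: str, filename: str) -> str:
    """Download a file from a Hugging Face model repo and return its local path.

    hf_hub_download stores the file in the local Hugging Face cache, so
    repeated calls reuse the cached copy instead of downloading again.
    """
    # Imported lazily so the size helper works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_id, filename=filename)


print(fp32_size_mb(81_000_000))  # 324.0
```

A call would look like `fetch_weights("Ghostgim/GhostLM-v0.9-experimental", filename)`, where `filename` is whatever the checkpoint file in that repo is actually called.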
Source
GitHub: joemunene-by/GhostLM
Run locally:
git clone https://github.com/joemunene-by/GhostLM
cd GhostLM
pip install -r demo/requirements.txt
PYTHONPATH=. python3 demo/app.py
The model is small enough to run on a laptop CPU; expect ~10-25 s per chat reply at the default 200-token cap.
Caveats
- Hallucinates facts. CVE IDs, CVSS scores, technique IDs, and version ranges are all unreliable. Outputs are register-shaped fiction, not reference material. Verify against authoritative sources.
- No general-knowledge tuning. Outside cybersecurity the model politely declines and returns to its domain. Don't expect it to summarize a news article or write Python.
- The MCQ wins do not indicate factual recall. The 28.9% on debiased CTIBench measures the register-matching component of the benchmark; the free-form fact recall floor (1/50) is the truth metric.
License
Apache 2.0. Built by Joe Munene.