---
title: GhostLM
emoji: 🔐
colorFrom: purple
colorTo: gray
sdk: gradio
app_file: app.py
pinned: false
license: apache-2.0
short_description: From-scratch 81M cybersecurity LM, v0.9 chat demo
---

GhostLM Chat (v0.9)

An interactive Gradio chat demo for GhostLM v0.9 chat, an 81M-parameter cybersecurity language model trained from scratch in PyTorch.

The Space ships a multi-turn chat interface backed by the v0.9 chat weights. Generation uses the model's three role tokens (<|ghost_user|>, <|ghost_assistant|>, <|ghost_end|>) and stops the moment the assistant's <|ghost_end|> is sampled.
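A minimal sketch of how a multi-turn prompt could be serialized with the three role tokens, and how decoding can stop as soon as <|ghost_end|> is sampled. The exact layout (no separators between tokens, user turns also closed with <|ghost_end|>) is an assumption, not taken from app.py, and `next_token` is a hypothetical stand-in for one sampling step of the real model.

```python
from typing import Callable

USER, ASSISTANT, END = "<|ghost_user|>", "<|ghost_assistant|>", "<|ghost_end|>"


def build_prompt(history: list[tuple[str, str]], user_msg: str) -> str:
    """Serialize (user, assistant) turns, then open a new assistant turn.

    ASSUMED layout: every turn is closed with <|ghost_end|>; app.py may differ.
    """
    parts = [f"{USER}{u}{END}{ASSISTANT}{a}{END}" for u, a in history]
    parts.append(f"{USER}{user_msg}{END}{ASSISTANT}")
    return "".join(parts)


def generate(prompt: str, next_token: Callable[[str], str], max_new_tokens: int = 200) -> str:
    """Sample until the assistant emits <|ghost_end|> or the token cap is hit.

    `next_token` is a HYPOTHETICAL hook standing in for one model sampling step.
    """
    text, reply = prompt, []
    for _ in range(max_new_tokens):
        tok = next_token(text)
        if tok == END:  # stop the moment the end-of-turn token is sampled
            break
        reply.append(tok)
        text += tok
    return "".join(reply)


# Usage with a scripted stub in place of the model:
scripted = iter(["SQL", " injection", END, "never reached"])
print(generate(build_prompt([], "What is SQLi?"), lambda _: next(scripted)))
```

The stop check runs before the token is appended, so <|ghost_end|> itself never appears in the reply shown to the user.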

Bench numbers (v0.9 chat)

The v0.9 chat checkpoint is the top scorer of the ghost-small line on every multiple-choice benchmark we ran:

| Benchmark | n | Score |
| --- | --- | --- |
| CTIBench MCQ, 2-permutation debiased | 2,500 | 28.9% |
| In-repo CTF MCQ eval | 30 | 59.2% |
| SecQA (external) | 210 | 39.3% |
| Free-form fact recall, hand-written | 50 | 1/50 (at floor) |
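The README does not spell out how the 2-permutation debiasing is done. One common scheme, sketched here purely as an assumption, asks each question twice, once with the options in their original order and once reversed, and averages the two outcomes; this halves the credit a model can earn from a fixed positional preference when the correct answers cluster in one slot. `choose` is a hypothetical stand-in for the model's answer picker.

```python
from typing import Callable, Sequence

# (question stem, answer options, index of the correct option)
Question = tuple[str, Sequence[str], int]


def debiased_accuracy(
    questions: Sequence[Question],
    choose: Callable[[str, Sequence[str]], int],
) -> float:
    """Average accuracy over two orderings: original and reversed (ASSUMED scheme)."""
    score = 0.0
    for stem, options, answer in questions:
        n = len(options)
        perms = [list(range(n)), list(range(n - 1, -1, -1))]
        hits = 0
        for perm in perms:
            shown = [options[i] for i in perm]
            picked = choose(stem, shown)        # index into the shown order
            hits += int(perm[picked] == answer)  # map back to the original index
        score += hits / len(perms)
    return score / len(questions)


# A "model" that always picks the first option, on a set where every answer is option A:
qs = [("Q1", ["A", "B", "C", "D"], 0), ("Q2", ["A", "B", "C", "D"], 0)]
print(debiased_accuracy(qs, lambda stem, shown: 0))  # 0.5 (plain accuracy would be 1.0)
```

A stricter variant credits a question only when both orderings are answered correctly; which convention this repo uses is not stated here.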

Free-form fact recall sits at the floor across the entire 81M ghost-small rung, as expected at this scale: at this parameter count the model has picked up the register of cybersecurity writing but does not hold the facts in any retrievable form. The next rung (ghost-base, ~360M parameters, SmolLM2-360M shape) is gated on rented GPU compute. Spec: docs/ghost_base_spec.md.

Architecture

6 layers, d_model 768, 12 heads, with RoPE + SwiGLU + RMSNorm. Pretrain corpus: 273M tokens spanning PRIMUS-Seed, PRIMUS-FineWeb, NVD CVEs, MITRE ATT&CK, CWE, CAPEC, OWASP, IETF RFCs, Exploit-DB, CTFtime, arXiv cs.CR, plus a fact-dense Q&A set. Chat-tuned with the chat-v3 SFT recipe.
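As a sanity check on the 81M figure, the sketch below counts parameters for a decoder-only stack with the stated shape. The vocabulary size (GPT-2 BPE plus the three role tokens), the SwiGLU hidden size of 2,048 (8/3 of d_model), bias-free projections, and tied input/output embeddings are all assumptions, not facts from this README. Under them the count lands at about 81.1M, and at 4 bytes per fp32 weight that is about 324 MB, in line with the slim checkpoint size quoted in the next section.

```python
from dataclasses import dataclass


@dataclass
class GhostSmallConfig:
    # Stated in the README:
    n_layers: int = 6
    d_model: int = 768
    n_heads: int = 12  # heads partition d_model, so they do not change the count
    # ASSUMPTIONS (not stated in the README):
    vocab_size: int = 50_260  # GPT-2 BPE (50,257) + 3 role tokens
    ffn_hidden: int = 2_048   # SwiGLU hidden size = 8/3 * d_model
    tie_embeddings: bool = True


def count_params(cfg: GhostSmallConfig) -> int:
    d = cfg.d_model
    attn = 4 * d * d              # Q, K, V, O projections, no biases; RoPE adds no weights
    ffn = 3 * d * cfg.ffn_hidden  # SwiGLU: gate, up and down projections
    norms = 2 * d                 # two RMSNorm gain vectors per block
    per_layer = attn + ffn + norms
    embed = cfg.vocab_size * d
    lm_head = 0 if cfg.tie_embeddings else cfg.vocab_size * d
    final_norm = d
    return cfg.n_layers * per_layer + embed + lm_head + final_norm


n = count_params(GhostSmallConfig())
print(f"{n:,} params")                  # 81,076,992 params
print(f"{n * 4 / 1e6:.1f} MB in fp32")  # 324.3 MB in fp32
```

Roughly half the budget sits in the embedding table at this size, which is one reason a 6-layer model with d_model 768 reaches 81M parameters.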

Where the weights live

The 324 MB slim weights are stored in the model repo Ghostgim/GhostLM-v0.9-experimental. The Space's app.py calls huggingface_hub.hf_hub_download on first launch and caches the weights locally. This keeps the Space comfortably under HF's 1 GB free-tier LFS cap: the source code stays small and the weights are versioned separately.
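A minimal sketch of the download-and-cache step, assuming the weights live in a single file. The filename below is hypothetical (check the model repo's file list for the real one), and this is not a copy of the Space's app.py.

```python
REPO_ID = "Ghostgim/GhostLM-v0.9-experimental"
# HYPOTHETICAL filename: replace with the actual file in the model repo.
WEIGHTS_FILENAME = "ghostlm_v0.9_chat_slim.pt"


def fetch_weights(filename: str = WEIGHTS_FILENAME) -> str:
    """Return a local path to the weights, downloading them only if not already cached."""
    # Imported lazily so this module loads even where huggingface_hub is not installed.
    from huggingface_hub import hf_hub_download

    # hf_hub_download stores files in the local Hugging Face cache
    # (~/.cache/huggingface/hub by default) and reuses them on later calls.
    return hf_hub_download(repo_id=REPO_ID, filename=filename)
```

If the app should never reach the network after the first launch, hf_hub_download also accepts `local_files_only=True`, which resolves the path from the cache alone.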

Source

GitHub: joemunene-by/GhostLM

Run locally:

git clone https://github.com/joemunene-by/GhostLM
cd GhostLM
pip install -r demo/requirements.txt
PYTHONPATH=. python3 demo/app.py

The model is small enough to run on a laptop CPU; expect ~10-25 s per chat reply at the default 200-token cap.

Caveats

  • Hallucinates facts. CVE IDs, CVSS scores, technique IDs, and version ranges are all unreliable. Outputs are register-shaped fiction, not reference material; verify anything you use against authoritative sources.
  • No general-knowledge tuning. Outside cybersecurity the model politely declines and returns to its domain. Don't expect it to summarize a news article or write Python.
  • MCQ scores are not evidence of factual recall. The 28.9% on debiased CTIBench reflects the register-matching component of the benchmark; the free-form fact recall floor (1/50) is the honest measure of what the model actually knows.

License

Apache 2.0. Built by Joe Munene.