	---
	title: GhostLM
	emoji: 🔐
	colorFrom: purple
	colorTo: gray
	sdk: gradio
	app_file: app.py
	pinned: false
	license: apache-2.0
	short_description: From-scratch 81M cybersecurity LM, v0.9 chat demo
	---

	# GhostLM Chat (v0.9)

	Interactive Gradio chat for GhostLM v0.9 chat, an 81M-parameter
	cybersecurity language model trained from scratch in PyTorch.

	The Space ships a multi-turn chat interface backed by the v0.9 chat
	weights. Generation uses the model's three role tokens
	(`<|ghost_user|>`, `<|ghost_assistant|>`, `<|ghost_end|>`) and stops the
	moment the assistant's `<|ghost_end|>` is sampled.
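
The turn format and stopping rule can be sketched as follows. This is a minimal illustration, not the Space's actual `app.py`: only the three token strings and the 200-token default cap come from this README. The prompt layout (each turn closed by `<|ghost_end|>`, no separators) and the names `build_prompt`, `generate_reply`, and `sample_next` are assumptions.

```python
from typing import Callable, List, Tuple

USER = "<|ghost_user|>"
ASSISTANT = "<|ghost_assistant|>"
END = "<|ghost_end|>"


def build_prompt(history: List[Tuple[str, str]], user_msg: str) -> str:
    """Flatten (user, assistant) turns into one prompt string.

    Assumed layout: role token + text + END per turn, finishing with an
    open assistant turn for the model to complete.
    """
    parts = []
    for user_text, assistant_text in history:
        parts.append(f"{USER}{user_text}{END}")
        parts.append(f"{ASSISTANT}{assistant_text}{END}")
    parts.append(f"{USER}{user_msg}{END}")
    parts.append(ASSISTANT)
    return "".join(parts)


def generate_reply(
    sample_next: Callable[[str], str],
    prompt: str,
    max_new_tokens: int = 200,
) -> str:
    """Sample token strings until END appears or the cap is reached.

    `sample_next` is a hypothetical stand-in for the model: it takes the
    text so far and returns the next token as a string.
    """
    reply: List[str] = []
    for _ in range(max_new_tokens):
        token = sample_next(prompt + "".join(reply))
        if token == END:
            break  # stop as soon as the end token is sampled; do not emit it
        reply.append(token)
    return "".join(reply)
```

Passing the sampler in as a callable keeps the stopping logic independent of the model, so it can be exercised with a scripted token stream.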

	## Bench numbers (v0.9 chat)

	The v0.9 chat checkpoint scores highest in the ghost-small line on
	every multiple-choice benchmark we ran:

	| Benchmark | n | Score |
	|---|---:|---:|
	| [CTIBench MCQ](https://huggingface.co/datasets/AI4Sec/cti-bench), 2-permutation debiased | 2,500 | 28.9% |
	| in-repo CTF MCQ eval | 30 | 59.2% |
	| SecQA (external) | 210 | 39.3% |
	| free-form fact recall, hand-written | 50 | 1/50 (at floor) |

	Free-form fact recall is at floor across the entire 81M ghost-small
	rung by design. At this parameter count the model has the register
	of cybersecurity writing but not the facts in any retrievable form. The
	next rung (ghost-base, ~360M, SmolLM2-360M shape) is gated on rented
	GPU compute. Spec: [`docs/ghost_base_spec.md`](https://github.com/joemunene-by/GhostLM/blob/main/docs/ghost_base_spec.md).

	## Architecture

	6 layers, d_model 768, 12 heads, with RoPE + SwiGLU + RMSNorm. Pretrain
	corpus: 273M tokens spanning PRIMUS-Seed, PRIMUS-FineWeb, NVD CVEs,
	MITRE ATT&CK, CWE, CAPEC, OWASP, IETF RFCs, Exploit-DB, CTFtime, arXiv
	cs.CR, plus a fact-dense Q&A set. Chat-tuned with the chat-v3 SFT recipe.
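
As a rough cross-check, the shape above can be turned into a parameter count. The vocabulary size (50,257, the GPT-2 BPE size), the SwiGLU hidden width (2,048), tied input/output embeddings, and bias-free linear layers are all assumptions made for illustration; none of them is stated in this README, and `estimate_params` is a hypothetical helper. Under those assumptions the total comes out at about 81M, in line with the stated model size.

```python
def estimate_params(n_layers=6, d_model=768, d_ff=2048, vocab_size=50257):
    """Estimate the parameter count of a decoder-only transformer.

    Assumes tied embeddings, no bias terms, and standard multi-head
    attention. RoPE adds no learned parameters, and the head count does
    not change the total. d_ff and vocab_size are assumed values.
    """
    embedding = vocab_size * d_model      # shared with the output head (assumed tied)
    attention = 4 * d_model * d_model     # Q, K, V, and output projections
    swiglu = 3 * d_model * d_ff           # gate, up, and down projections
    norms = 2 * d_model                   # two RMSNorm gain vectors per block
    per_layer = attention + swiglu + norms
    final_norm = d_model                  # RMSNorm before the output head
    return embedding + n_layers * per_layer + final_norm


total = estimate_params()
print(f"{total / 1e6:.1f}M parameters")
```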

	## Where the weights live

	The 324 MB slim weights are stored in the Models repo
	[`Ghostgim/GhostLM-v0.9-experimental`](https://huggingface.co/Ghostgim/GhostLM-v0.9-experimental).
	The Space's `app.py` calls `huggingface_hub.hf_hub_download` on first
	launch and caches them locally. This keeps the Space comfortably under
	HF's 1 GB free-tier LFS cap; the source code stays small and the
	weights are versioned separately.
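
A minimal sketch of that download step, assuming the `huggingface_hub` package is installed. The filename below is a placeholder, not the real file name in the Models repo, and `fetch_weights` is a hypothetical helper rather than the code in `app.py`. The second helper checks the stated size: 324 MB at 4 bytes per parameter (assuming fp32 storage) works out to 81M parameters.

```python
from pathlib import Path

REPO_ID = "Ghostgim/GhostLM-v0.9-experimental"
WEIGHTS_FILENAME = "ghostlm_v0.9_chat.pt"  # placeholder; check the repo for the real name


def fetch_weights(repo_id: str = REPO_ID, filename: str = WEIGHTS_FILENAME) -> Path:
    """Download the weights on first call; later calls reuse the local cache."""
    # Imported lazily so this module still imports without the dependency.
    from huggingface_hub import hf_hub_download

    return Path(hf_hub_download(repo_id=repo_id, filename=filename))


def implied_params_millions(size_mb: float, bytes_per_param: int = 4) -> float:
    """Parameter count (in millions) implied by a checkpoint size in MB.

    Assumes the file holds only weights, stored at bytes_per_param each
    (4 for fp32).
    """
    return size_mb / bytes_per_param
```

`hf_hub_download` returns the path of the cached file, so the result of `fetch_weights()` can be handed straight to whatever loader the app uses.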

	## Source

	GitHub: [`joemunene-by/GhostLM`](https://github.com/joemunene-by/GhostLM)

	Run locally:

	```bash
	git clone https://github.com/joemunene-by/GhostLM
	cd GhostLM
	pip install -r demo/requirements.txt
	PYTHONPATH=. python3 demo/app.py
	```

	The model is small enough to run on a laptop CPU; expect ~10-25 s per
	chat reply at the default 200-token cap.

	## Caveats

	- Hallucinates facts. CVE IDs, CVSS scores, technique IDs, and version
	ranges are all unreliable. Outputs are register-shaped fiction, not
	reference material. Verify against authoritative sources.
	- No general-knowledge tuning. Outside cybersecurity the model
	politely declines and returns to its domain. Don't expect it to
	summarize a news article or write Python.
	- The MCQ wins do not imply factual recall. The 28.9% on debiased
	CTIBench measures the register-matching component of the
	benchmark; the free-form fact recall floor (1/50) is the number that
	reflects what the model actually knows.

	## License

	Apache 2.0. Built by Joe Munene.